jd-opensource
Unified multimodal model for vision and generation
Top 39.7% on SourcePulse
JoyAI-Image is a unified multimodal foundation model addressing image understanding, text-to-image generation, and instruction-guided editing. It targets researchers and developers seeking advanced spatial reasoning and controllable image manipulation, providing a single, integrated solution for diverse visual AI tasks.
How It Works
The architecture combines an 8B Multimodal Large Language Model (MLLM) with a 16B Multimodal Diffusion Transformer (MMDiT). Its core innovation lies in a closed-loop collaboration: enhanced spatial understanding from the MLLM improves the MMDiT's generation and editing capabilities, while generative transformations provide complementary data for spatial reasoning. This bidirectional feedback loop aims to awaken and strengthen spatial intelligence within the model.
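The parameter counts above give a rough sense of the hardware needed just to hold the weights. The sketch below is a back-of-the-envelope estimate, not a figure from the project: the 8B and 16B counts come from the description, while the bytes-per-parameter values are standard dtype widths assumed here. It ignores activations, any additional components such as a VAE, and framework overhead, so real usage will be higher.

```python
# Parameter counts are taken from the description above.
# The dtype choices are assumptions; the source does not state
# which precision the released checkpoints use.
PARAMS = {"MLLM": 8e9, "MMDiT": 16e9}
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "int8": 1}


def weight_memory_gib(num_params: float, dtype: str) -> float:
    """Memory needed to store the weights alone, in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


def total_weight_memory_gib(dtype: str) -> float:
    """Combined weight memory for both components at one precision."""
    return sum(weight_memory_gib(n, dtype) for n in PARAMS.values())


if __name__ == "__main__":
    for dtype in BYTES_PER_PARAM:
        print(f"{dtype}: {total_weight_memory_gib(dtype):.1f} GiB")
```

Under these assumptions, holding both components in bf16 takes roughly 45 GiB for weights alone, which suggests a single high-memory accelerator or some form of offloading or sharding.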
Quick Start & Requirements
Create and activate a conda environment (conda create -n joyai python=3.10 -y, conda activate joyai), followed by pip install -e . to install the package.
Highlighted Details
JoyAI-Image-Und for understanding and JoyAI-Image-Edit for instruction-guided editing.
Maintenance & Community
The project is actively hiring Research Scientists, Engineers, and Interns for next-generation generative models. Interested candidates can send resumes to huanghaoyang.ocean@jd.com. No community channels (e.g., Discord, Slack) are listed.
Licensing & Compatibility
Licensed under Apache 2.0, which permits commercial use and modification.
Limitations & Caveats
Several advanced models, including JoyAI-Image-Edit-Distilled, JoyAI-Image-Edit-Plus (multi-image editing), and the core JoyAI-Image text-to-image model, are marked as "To be released," indicating they are not yet available. The README also references a speculative gpt-5 model for prompt rewriting.