Image generation model for multimodal prompts
OmniGen is a unified image generation model designed to simplify multi-modal image creation, enabling users to generate diverse images from various prompts without additional plugins or preprocessing. It targets researchers and users seeking a flexible, all-in-one solution for tasks like text-to-image, subject-driven generation, and image editing.
How It Works
OmniGen employs a unified architecture that automatically interprets features from multi-modal inputs (text and images) based on the prompt. This approach eliminates the need for external modules like ControlNet or IP-Adapter, streamlining the generation process and allowing for direct control through natural language and image references.
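Because images are referenced directly from the prompt instead of being routed through separate adapter modules, the prompt placeholders and the list of input images have to stay in sync. The following is a minimal, standard-library-only sketch of that bookkeeping. It assumes 1-based placeholders of the form <|image_N|>. The helper bind_image_references is hypothetical and is not part of the OmniGen package.

```python
import re
from typing import List

# Matches placeholders like <|image_1|>, capturing the index N.
# Assumption: indices are 1-based, following the <|image_1|> form.
PLACEHOLDER = re.compile(r"<\|image_(\d+)\|>")


def bind_image_references(prompt: str, input_images: List[str]) -> List[str]:
    """Hypothetical helper: return input images in placeholder order.

    Raises ValueError if a placeholder has no matching image, or if an
    image is supplied but never referenced in the prompt.
    """
    indices = [int(n) for n in PLACEHOLDER.findall(prompt)]
    for idx in indices:
        if not 1 <= idx <= len(input_images):
            raise ValueError(f"<|image_{idx}|> has no matching input image")
    unused = set(range(1, len(input_images) + 1)) - set(indices)
    if unused:
        raise ValueError(f"input images never referenced: {sorted(unused)}")
    # Each placeholder resolves to the image at position N - 1.
    return [input_images[idx - 1] for idx in indices]


if __name__ == "__main__":
    prompt = "The man in <|image_1|> is reading next to the dog in <|image_2|>."
    print(bind_image_references(prompt, ["man.jpg", "dog.jpg"]))
```

Validating before generation is a design choice. A mismatched index then fails loudly instead of silently producing an image that ignores one of the references.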
Quick Start & Requirements
Install with pip install -e . after cloning the repository. For CUDA 11.8, install the matching PyTorch build:
pip install torch==2.3.1+cu118 torchvision --extra-index-url https://download.pytorch.org/whl/cu118
Highlighted Details
Input images are referenced inside the text prompt with placeholders such as <|image_1|>.
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
See docs/inference.md.