OmniGen by VectorSpaceLab

Image generation model for multimodal prompts

Created 1 year ago
4,257 stars

Top 11.5% on SourcePulse

Project Summary

OmniGen is a unified image generation model designed to simplify multi-modal image creation, enabling users to generate diverse images from various prompts without additional plugins or preprocessing. It targets researchers and users seeking a flexible, all-in-one solution for tasks like text-to-image, subject-driven generation, and image editing.

How It Works

OmniGen employs a unified architecture that automatically interprets features from multi-modal inputs (text and images) based on the prompt. This approach eliminates the need for external modules like ControlNet or IP-Adapter, streamlining the generation process and allowing for direct control through natural language and image references.
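To make the idea of a single interleaved text-and-image input concrete, here is a minimal sketch that splits a prompt into ordered text and image segments. It is an illustration only, not OmniGen's actual preprocessing: the function name split_prompt and the segment tuples are hypothetical, and the only detail taken from the project is the <|image_N|> placeholder form.

```python
import re
from typing import List, Tuple, Union

# Placeholders of the form <|image_1|>, <|image_2|>, ...
PLACEHOLDER = re.compile(r"<\|image_(\d+)\|>")


def split_prompt(prompt: str) -> List[Tuple[str, Union[str, int]]]:
    """Split a prompt into ordered ("text", str) and ("image", index) segments.

    Hypothetical helper for illustration; not part of the OmniGen codebase.
    """
    segments: List[Tuple[str, Union[str, int]]] = []
    pos = 0
    for match in PLACEHOLDER.finditer(prompt):
        if match.start() > pos:
            # Text that appears before this placeholder
            segments.append(("text", prompt[pos:match.start()]))
        segments.append(("image", int(match.group(1))))
        pos = match.end()
    if pos < len(prompt):
        # Trailing text after the last placeholder
        segments.append(("text", prompt[pos:]))
    return segments


example = split_prompt("Put the cat from <|image_1|> on the sofa in <|image_2|>.")
```

In this sketch, the ordered segment list stands in for the single sequence that a unified model would consume, in place of routing images through a separate adapter module.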

Quick Start & Requirements

Highlighted Details

  • Supports text-to-image, subject-driven generation, identity-preserving generation, image editing, and image-conditioned generation.
  • Handles multi-modal prompts with image placeholders (e.g., <|image_1|>).
  • Offers LoRA fine-tuning capabilities with provided scripts.
  • Available in the Hugging Face Diffusers library.
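Because prompts refer to input images through numbered placeholders, a quick sanity check before generation can catch mismatches early. The sketch below assumes placeholders are numbered from 1 and that each supplied image is referenced at least once; both are assumptions, not documented rules. The helpers referenced_images and check_prompt are hypothetical and do not call any OmniGen or Diffusers API.

```python
import re
from typing import List


def referenced_images(prompt: str) -> List[int]:
    """Return the sorted, distinct image indices referenced in the prompt."""
    return sorted({int(n) for n in re.findall(r"<\|image_(\d+)\|>", prompt)})


def check_prompt(prompt: str, image_paths: List[str]) -> None:
    """Raise ValueError unless the placeholders are exactly 1..len(image_paths).

    Assumption (not confirmed by the project docs): indices start at 1 and
    every supplied image must be referenced by a placeholder.
    """
    found = referenced_images(prompt)
    expected = list(range(1, len(image_paths) + 1))
    if found != expected:
        raise ValueError(
            f"prompt references images {found}, but {expected} were expected"
        )


# Passes silently: one placeholder, one image path (the path is a placeholder value)
check_prompt("The woman in <|image_1|> waves at the camera.", ["woman.png"])
```

A plain text-to-image prompt passes with an empty image list, since no placeholders are expected in that case.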

Maintenance & Community

Licensing & Compatibility

  • Licensed under the MIT License, permitting commercial use and closed-source linking.

Limitations & Caveats

  • The README notes that OmniGen still has room for improvement due to limited resources. Specific resource requirements for efficient operation are detailed in docs/inference.md.

Health Check

  • Last commit: 3 months ago
  • Responsiveness: 1 day
  • Pull requests (30d): 0
  • Issues (30d): 0

Star History

  • 28 stars in the last 30 days
