Dataset and model for GPT-4o-level image generation
This repository provides the ShareGPT-4o-Image dataset, a collection of 92K image generation samples derived from GPT-4o, and Janus-4o, a multimodal large language model fine-tuned on this dataset. It aims to advance open-source multimodal models by aligning them with GPT-4o's image generation capabilities. The intended audience is researchers and developers in AI and computer vision.
How It Works
The dataset comprises 45K text-to-image and 46K text-and-image-to-image pairs, offering high-quality data for training. Janus-4o, built upon Janus-Pro-7B, leverages this dataset to achieve both text-to-image and image-editing functionalities. The model's architecture likely incorporates a vision encoder and a language model, with a specialized generation head for image synthesis, enabling it to process and generate images based on textual and visual inputs.
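The two sample types described above can be pictured as records that differ only in whether an input image is present. The following is a minimal sketch, not the dataset's actual schema: the class name GenerationSample, its fields, and the file paths are hypothetical and chosen only for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class GenerationSample:
    """Hypothetical record layout; the real dataset's field names may differ."""
    prompt: str                         # text instruction
    output_image: str                   # path to the generated image
    input_image: Optional[str] = None   # present only for editing samples


def task_type(sample: GenerationSample) -> str:
    """Classify a sample by whether it carries an input image."""
    if sample.input_image is None:
        return "text-to-image"
    return "text-and-image-to-image"


def count_by_task(samples: Iterable[GenerationSample]) -> Counter:
    """Tally samples per task type."""
    return Counter(task_type(s) for s in samples)


# Two made-up samples: one pure generation, one edit of the first result.
samples = [
    GenerationSample(prompt="a red bicycle on a beach",
                     output_image="out/0001.png"),
    GenerationSample(prompt="make the bicycle blue",
                     output_image="out/0002.png",
                     input_image="out/0001.png"),
]
counts = count_by_task(samples)
```

A training pipeline could use a split like this to format generation and editing samples differently, though how Janus-4o actually handles the two cases is not described here.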
Quick Start & Requirements
Install: pip install -e .
Gradio demo: pip install -e .[gradio]
Model: FreedomIntelligence/Janus-4o-7B
Training: uses accelerate and deepspeed. The command specifies the FreedomIntelligence/ShareGPT-4o-Image dataset.
Highlighted Details
Maintenance & Community
The project is associated with FreedomIntelligence and appears to be actively developed, with a linked paper on arXiv. Further community interaction channels are not explicitly mentioned in the README.
Licensing & Compatibility
The repository does not explicitly state a license. The Janus-4o-7B model is available on Hugging Face. Compatibility for commercial use or closed-source linking is not specified.
Limitations & Caveats
The README states that while Janus-4o shows improvements, it still lags behind GPT-4o-Image in overall performance. The dataset is described as "distilled" from GPT-4o-Image, meaning its samples are GPT-4o outputs, so it reflects that model's generations rather than independently collected or human-made images.