ShareGPT-4o-Image by FreedomIntelligence

Dataset and model for GPT-4o-level image generation

Created 8 months ago

283 stars

Top 92.5% on SourcePulse

Project Summary

This repository provides the ShareGPT-4o-Image dataset, a collection of 92K image generation samples derived from GPT-4o, and Janus-4o, a multimodal large language model fine-tuned on this dataset. It aims to advance open-source multimodal models by aligning them with GPT-4o's image generation capabilities, targeting researchers and developers in AI and computer vision.

How It Works

The dataset comprises 45K text-to-image pairs and 46K text-and-image-to-image (image-editing) pairs. Janus-4o, built on Janus-Pro-7B, is fine-tuned on this data to support both text-to-image generation and image editing. Janus-Pro decouples visual encoding: a SigLIP vision encoder handles image understanding, while a VQ image tokenizer converts images into discrete tokens that the language model predicts autoregressively through a dedicated generation head. The tokenizer's decoder then maps the predicted tokens back to pixels.
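To make the generation side more concrete, here is a minimal, stdlib-only sketch of the per-token decoding loop in a Janus-style autoregressive image generator. It assumes, as in the upstream Janus generation example, a 384x384 output cut into 16x16 patches (so 24 x 24 = 576 image tokens) and classifier-free guidance that combines conditional and unconditional logits. The function names, toy vocabulary size, and random stand-in logits are hypothetical and are not taken from the Janus-4o code.

```python
import math
import random
from typing import List, Sequence


def image_token_count(img_size: int, patch_size: int) -> int:
    """Number of discrete image tokens for a square image cut into square patches."""
    if img_size % patch_size != 0:
        raise ValueError("img_size must be a multiple of patch_size")
    side = img_size // patch_size
    return side * side


def cfg_logits(cond: Sequence[float], uncond: Sequence[float], weight: float) -> List[float]:
    """Classifier-free guidance: move logits away from the unconditional prediction."""
    return [u + weight * (c - u) for c, u in zip(cond, uncond)]


def sample_token(logits: Sequence[float], temperature: float, rng: random.Random) -> int:
    """Sample one token id from temperature-scaled softmax weights."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max before exp() for numerical stability
    weights = [math.exp(x - peak) for x in scaled]
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]


if __name__ == "__main__":
    rng = random.Random(0)
    n_tokens = image_token_count(384, 16)  # 24 x 24 grid of image tokens
    vocab = 8  # toy codebook size (hypothetical); a real VQ codebook is far larger
    tokens = []
    for _ in range(n_tokens):
        # Random stand-ins for the model's prompt-conditioned and unconditional logits.
        cond = [rng.gauss(0.0, 1.0) for _ in range(vocab)]
        uncond = [rng.gauss(0.0, 1.0) for _ in range(vocab)]
        guided = cfg_logits(cond, uncond, weight=5.0)
        tokens.append(sample_token(guided, temperature=1.0, rng=rng))
    print(n_tokens, len(tokens))
```

In a real model the two logit vectors come from running the same prompt with and without its conditioning text, and the sampled token is fed back as input for the next step; the resulting 576 token ids are then decoded into an image.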

Quick Start & Requirements

Installation: Clone the Janus repository and, from its root, install it with pip install -e . For the Gradio demo, install the extra instead with pip install -e .[gradio]
Prerequisites: Python, PyTorch, and Hugging Face Transformers. A CUDA-capable GPU is required for inference and training.
Inference: The README provides detailed Python code snippets for both text-to-image and text-and-image-to-image generation using the FreedomIntelligence/Janus-4o-7B model.
Training: Training scripts are provided and require accelerate and deepspeed. The example training command uses the FreedomIntelligence/ShareGPT-4o-Image dataset.
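Before running the inference or training scripts, a quick sanity check can confirm that the prerequisites listed above are in place. The helper below is hypothetical and not part of the repository: it only checks whether each package can be located (accelerate and deepspeed matter only for training) and, if PyTorch is present, whether torch.cuda.is_available() reports a GPU.

```python
import importlib
import importlib.util
from typing import Dict

# Packages named in the summary; accelerate and deepspeed are needed only for training.
PACKAGES = ("torch", "transformers", "accelerate", "deepspeed")


def check_prerequisites() -> Dict[str, bool]:
    """Report which packages can be found and whether a CUDA GPU is visible to PyTorch."""
    status = {name: importlib.util.find_spec(name) is not None for name in PACKAGES}
    status["cuda"] = False
    if status["torch"]:
        # Import torch only when it is installed, then ask it about CUDA devices.
        torch = importlib.import_module("torch")
        status["cuda"] = bool(torch.cuda.is_available())
    return status


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name:12s} {'ok' if ok else 'MISSING'}")
```

Using find_spec rather than a bare import keeps the check cheap and avoids crashing when an optional training dependency is absent.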

Highlighted Details

Dataset contains 92,256 image generation samples from GPT-4o.
Janus-4o model supports both text-to-image and text-and-image-to-image generation.
Fine-tuning Janus-Pro on the dataset yields noticeable gains in image generation.
Training code is provided to reproduce Janus-4o from Janus-Pro-7B.

Maintenance & Community

The project is associated with FreedomIntelligence and appears to be actively developed, with a linked paper on arXiv. Further community interaction channels are not explicitly mentioned in the README.

Licensing & Compatibility

The repository does not explicitly state a license. The Janus-4o-7B model weights are hosted on Hugging Face. Terms for commercial use or closed-source linking are not specified.

Limitations & Caveats

The README states that although Janus-4o improves on Janus-Pro, it still lags behind GPT-4o-Image in overall performance. The dataset is described as "distilled" from GPT-4o-Image, meaning its target images were produced by GPT-4o-Image rather than collected from other sources.


Explore Similar Projects

NextFlow by ByteVisionLab

OmniGen2 by VectorSpaceLab

NextStep-1 by stepfun-ai

UltraPixel by catcathh

Lumina-mGPT by Alpha-VLLM

diffusion-self-distillation by primecai

ELLA by TencentQQGYLab

BLIP3o by JiuhaiChen

RPG-DiffusionMaster by YangLing0818

HunyuanVideo-I2V by Tencent-Hunyuan

image-gpt by openai

Janus by deepseek-ai