ShareGPT-4o-Image by FreedomIntelligence

Dataset and model for GPT-4o-level image generation

Created 2 months ago
262 stars

Top 97.3% on SourcePulse

Project Summary

This repository provides the ShareGPT-4o-Image dataset, a collection of 92K image generation samples derived from GPT-4o, and Janus-4o, a multimodal large language model fine-tuned on this dataset. It aims to advance open-source multimodal models by aligning them with GPT-4o's image generation capabilities, targeting researchers and developers in AI and computer vision.

How It Works

The dataset comprises roughly 45K text-to-image and 46K text-and-image-to-image pairs (92,256 samples in total), with all images generated by GPT-4o. Janus-4o, built on Janus-Pro-7B, is fine-tuned on this data to support both text-to-image generation and image editing. The architecture is not detailed here; it likely combines a vision encoder and a language model with a specialized generation head for image synthesis, which lets the model condition on both textual and visual inputs.
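To make the two pair types concrete, here is a minimal sketch of how a sample might be represented and routed to its task. The class name `GenerationSample`, its fields, and the helper functions are hypothetical and chosen for illustration; they are not the dataset's actual schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GenerationSample:
    """One training pair. Field names are illustrative assumptions."""
    prompt: str                         # text instruction
    output_image: str                   # path to the GPT-4o-generated target image
    input_image: Optional[str] = None   # source image; set only for editing pairs


def task_type(sample: GenerationSample) -> str:
    """Classify a sample by whether it carries an input image."""
    if sample.input_image is None:
        return "text-to-image"
    return "text-and-image-to-image"


def count_by_task(samples) -> dict:
    """Tally samples per task type."""
    counts = {"text-to-image": 0, "text-and-image-to-image": 0}
    for sample in samples:
        counts[task_type(sample)] += 1
    return counts
```

Under this assumed layout, a text-to-image pair simply omits `input_image`, so a single record type can cover both halves of the dataset.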

Quick Start & Requirements

  • Installation: Clone the Janus repository and, from its root directory, run pip install -e . to install it. For the Gradio demo, run pip install -e .[gradio] instead.
  • Prerequisites: Requires Python, PyTorch, and Hugging Face Transformers. A CUDA-capable GPU is required for inference and training.
  • Inference: The README provides detailed Python code snippets for both text-to-image and text-and-image-to-image generation using the FreedomIntelligence/Janus-4o-7B model.
  • Training: Training scripts are available and require accelerate and deepspeed. The training command takes the FreedomIntelligence/ShareGPT-4o-Image dataset as its data source.
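Before running inference or training, it can help to confirm that the prerequisites listed above are present. The following is a minimal pre-flight sketch, not part of the repository; the function names `is_installed` and `check_prerequisites` are hypothetical, and the package list is taken from the requirements stated above.

```python
import importlib.util

# Package names taken from the prerequisites listed above.
INFERENCE_PACKAGES = ("torch", "transformers")
TRAINING_PACKAGES = ("accelerate", "deepspeed")


def is_installed(name: str) -> bool:
    """Return True if a top-level package can be located without importing it."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        return False


def check_prerequisites(include_training: bool = False) -> dict:
    """Report which required packages are present and whether CUDA is usable."""
    names = INFERENCE_PACKAGES + (TRAINING_PACKAGES if include_training else ())
    report = {name: is_installed(name) for name in names}
    report["cuda"] = False
    if report["torch"]:
        try:
            import torch  # imported lazily so the check works without PyTorch
            report["cuda"] = bool(torch.cuda.is_available())
        except Exception:
            report["cuda"] = False
    return report


if __name__ == "__main__":
    for name, ok in check_prerequisites(include_training=True).items():
        print(f"{name}: {'ok' if ok else 'missing'}")
```

Calling `check_prerequisites(include_training=True)` also covers accelerate and deepspeed, which are needed only for the training scripts.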

Highlighted Details

  • Dataset contains 92,256 image generation samples from GPT-4o.
  • Janus-4o model supports both text-to-image and text-and-image-to-image generation.
  • Fine-tuning Janus-Pro on the dataset yields noticeable gains in image generation.
  • Training code is provided to reproduce Janus-4o from Janus-Pro-7B.

Maintenance & Community

The project is associated with FreedomIntelligence and appears to be actively developed, with a linked paper on arXiv. Further community interaction channels are not explicitly mentioned in the README.

Licensing & Compatibility

The repository does not explicitly state a license. The Janus-4o-7B model is available on Hugging Face. Terms for commercial use or closed-source linking are not specified.

Limitations & Caveats

The README states that although Janus-4o improves on Janus-Pro, it still lags behind GPT-4o-Image in overall performance. The dataset is described as "distilled" from GPT-4o-Image, meaning its images are GPT-4o outputs rather than independently collected data.

Health Check

  • Last commit: 1 month ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 1
  • Star history: 5 stars in the last 30 days
