ShareGPT-4o-Image by FreedomIntelligence

Dataset and model for GPT-4o-level image generation

Created 9 months ago
284 stars

Top 92.0% on SourcePulse

Project Summary

This repository provides the ShareGPT-4o-Image dataset, a collection of 92K image generation samples derived from GPT-4o, and Janus-4o, a multimodal large language model fine-tuned on this dataset. It aims to advance open-source multimodal models by aligning them with GPT-4o's image generation capabilities, targeting researchers and developers in AI and computer vision.

How It Works

The dataset comprises roughly 45K text-to-image and 46K text-and-image-to-image pairs, about 92K samples in total. Janus-4o, built on Janus-Pro-7B, is fine-tuned on this data to support both text-to-image generation and image editing. The architecture is not described in detail here; it likely combines a vision encoder and a language model with a dedicated generation head for image synthesis, so that the model can take text and images as input and produce images as output.
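To make the two sample types concrete, here is a minimal sketch of how one might represent and tally them. The record layout (prompt, output_image, input_image) is a hypothetical illustration, not the dataset's actual column names, and the rule "an input image means an editing sample" is an assumption made for this example.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class GenerationSample:
    """Hypothetical record layout; the real dataset's fields may differ."""

    prompt: str                         # text instruction
    output_image: str                   # path to the target image
    input_image: Optional[str] = None   # present only for editing samples

    @property
    def task(self) -> str:
        # Assumption: a sample with a source image is an editing sample.
        if self.input_image is None:
            return "text_to_image"
        return "text_and_image_to_image"


def count_tasks(samples: Iterable[GenerationSample]) -> Counter:
    """Count how many samples fall under each task type."""
    return Counter(sample.task for sample in samples)


if __name__ == "__main__":
    demo = [
        GenerationSample("a red fox in a forest", "fox.png"),
        GenerationSample("make it snowy", "fox_snow.png", input_image="fox.png"),
    ]
    for task, count in count_tasks(demo).items():
        print(f"{task}: {count}")
```

Run over the full dataset, a tally like this would be expected to land near the 45K and 46K figures quoted above.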

Quick Start & Requirements

  • Installation: Clone the Janus repository and, from its root directory, run pip install -e . to install it in editable mode. For the Gradio demo, run pip install -e .[gradio] instead.
  • Prerequisites: Python, PyTorch, and Hugging Face Transformers. A CUDA-capable GPU is needed for inference and training.
  • Inference: The README provides detailed Python code snippets for both text-to-image and text-and-image-to-image generation using the FreedomIntelligence/Janus-4o-7B model.
  • Training: Training scripts are provided and require accelerate and deepspeed. The training command takes FreedomIntelligence/ShareGPT-4o-Image as its dataset.
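Before installing, it can help to confirm that the prerequisites listed above are present. The following is a minimal sketch using only the standard library plus an optional PyTorch import; the helper names (find_missing, cuda_available) and the grouping of packages into inference and training sets are assumptions for illustration, not part of the repository.

```python
import importlib.util
from typing import Iterable, List

# Package names taken from the prerequisites and training notes above.
INFERENCE_PACKAGES = ("torch", "transformers")
TRAINING_PACKAGES = ("accelerate", "deepspeed")


def find_missing(packages: Iterable[str]) -> List[str]:
    """Return the packages that cannot be found in the current environment."""
    return [name for name in packages if importlib.util.find_spec(name) is None]


def cuda_available() -> bool:
    """Report whether PyTorch is installed and can see a CUDA device."""
    if importlib.util.find_spec("torch") is None:
        return False
    import torch  # imported lazily so the script runs without PyTorch

    return torch.cuda.is_available()


if __name__ == "__main__":
    print("Missing for inference:", find_missing(INFERENCE_PACKAGES) or "none")
    print("Missing for training:", find_missing(TRAINING_PACKAGES) or "none")
    print("CUDA available:", cuda_available())
```

The check only verifies that packages are importable; it does not validate versions, which the summary does not specify.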

Highlighted Details

  • Dataset contains 92,256 image generation samples from GPT-4o.
  • Janus-4o model supports both text-to-image and text-and-image-to-image generation.
  • Fine-tuning Janus-Pro on the dataset yields noticeable gains in image generation.
  • Training code is provided to reproduce Janus-4o from Janus-Pro-7B.

Maintenance & Community

The project is maintained by FreedomIntelligence and is accompanied by a paper on arXiv. The README does not mention any other community channels.

Licensing & Compatibility

The repository does not explicitly state a license. The Janus-4o model is available on Hugging Face. Terms for commercial use or for use in closed-source products are not specified.

Limitations & Caveats

The README states that Janus-4o, despite its improvements, still lags behind GPT-4o-Image in overall performance. The dataset is described as "distilled" from GPT-4o-Image, meaning its images are synthetic outputs generated by that model rather than independently collected data.

Health Check

  • Last commit: 8 months ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 2 stars in the last 30 days
