Lumina-mGPT-2.0 by Alpha-VLLM

Image generation model for broad tasks

Created 5 months ago
1,068 stars

Top 35.4% on SourcePulse

View on GitHub
Project Summary

Lumina-mGPT 2.0 is a stand-alone, decoder-only autoregressive model designed for a wide array of image generation tasks, including text-to-image, image pair generation, subject-driven generation, and multi-turn image editing. It aims to unify these capabilities within a single, flexible framework, targeting researchers and developers working on advanced image synthesis and manipulation.

How It Works

This model employs an autoregressive approach, generating images token by token: each visual token is predicted from the preceding tokens and the conditioning input. It uses a decoder-only transformer architecture, similar to large language models but adapted for visual data. MoVQGAN weights are central to the image tokenization and reconstruction process, mapping between pixels and discrete tokens to enable efficient, high-quality generation.
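
The token-by-token generation loop can be illustrated with a toy sketch. Everything here is hypothetical (the tiny codebook, the 4x4 grid, and the stand-in `next_token_logits` function); it is not Lumina-mGPT's actual API, only the general shape of decoder-only autoregressive image generation over VQ tokens.

```python
import random

# Toy sketch of autoregressive image generation over discrete visual tokens.
# All names and sizes are hypothetical illustrations, not the real model.

CODEBOOK_SIZE = 16  # hypothetical; real VQ codebooks are far larger
GRID = 4            # a 4x4 token grid stands in for a full-resolution image

def next_token_logits(prefix):
    """Stand-in for the transformer: deterministic pseudo-logits
    conditioned on the tokens generated so far."""
    rng = random.Random(len(prefix) + sum(prefix))
    return [rng.random() for _ in range(CODEBOOK_SIZE)]

def generate(seed_token=0):
    """Generate one token at a time, each conditioned on the prefix."""
    tokens = [seed_token]
    while len(tokens) < GRID * GRID:
        logits = next_token_logits(tokens)
        tokens.append(max(range(CODEBOOK_SIZE), key=logits.__getitem__))  # greedy pick
    # Reshape the flat token stream into the 2-D grid a VQ detokenizer
    # (e.g. MoVQGAN, in the real system) would decode back to pixels.
    return [tokens[r * GRID:(r + 1) * GRID] for r in range(GRID)]

grid = generate()
```

In the real pipeline the greedy pick would be replaced by sampling from the model's softmax distribution, and the finished grid would be decoded to pixels by the MoVQGAN checkpoint mentioned in the prerequisites.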

Quick Start & Requirements

  • Installation: Clone the repository, create a conda environment (python=3.10), activate it, and install dependencies using pip install -r requirements.txt and a specific flash-attn wheel.
  • Prerequisites: CUDA 12, PyTorch 2.3, and downloading MoVQGAN weights (movqgan_270M.ckpt).
  • Inference: Run python generate_examples/generate.py with specified model paths and generation parameters. Acceleration options include --speculative_jacobi and --quant.
  • Resources: Inference on an A100 GPU requires approximately 80 GB of memory for the base model, reducing to 33.8 GB with speculative Jacobi decoding and quantization.
  • Links: User Demo, Installation, Checkpoints

Highlighted Details

  • Unified framework for diverse image generation tasks.
  • Offers speculative Jacobi decoding and model quantization for accelerated inference.
  • With these acceleration techniques, inference on an A100 drops to 304 s and 33.8 GB of GPU memory.
  • Provides a 7B parameter model capable of 768px image generation.
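
The speculative Jacobi decoding mentioned above can be sketched in miniature. The idea: instead of n strictly sequential model calls, refine a whole draft block of tokens in parallel until it reaches the same fixed point greedy sequential decoding would produce. The code below is a minimal illustration under toy assumptions; `next_token` is a made-up deterministic stand-in, not the repository's model or implementation.

```python
def next_token(prefix):
    """Deterministic stand-in for the model's greedy next-token prediction."""
    return (sum(prefix) * 7 + len(prefix)) % 11

def sequential_decode(prompt, n):
    """Baseline: n sequential model calls, one per generated token."""
    seq = list(prompt)
    for _ in range(n):
        seq.append(next_token(seq))
    return seq

def jacobi_decode(prompt, n):
    """Jacobi iteration: re-predict every draft position in parallel
    (one 'parallel model call' per loop pass) until the draft stops
    changing, i.e. reaches the sequential-greedy fixed point."""
    draft = [0] * n
    iterations = 0
    while True:
        iterations += 1
        new = [next_token(list(prompt) + draft[:i]) for i in range(n)]
        if new == draft:
            break
        draft = new
    return list(prompt) + draft, iterations

prompt, n = (3,), 5
out, iters = jacobi_decode(prompt, n)
assert out == sequential_decode(prompt, n)
```

Each loop pass here plays the role of one batched forward pass; since early positions stabilize after a few passes, the number of sequential steps is often well below n in practice, which is where the speed-up comes from. The repository's actual method adds speculative acceptance on top of this basic scheme.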

Maintenance & Community

The project is associated with the Alpha VLLM Group at Shanghai AI Lab. Open positions are advertised.

Licensing & Compatibility

The repository does not explicitly state a license in the README. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The "All-in-One Inference & Checkpoints" release and the technical report are marked as not yet available. The required flash-attn wheel must be chosen carefully from the builds at the provided link.

Health Check

  • Last Commit: 2 weeks ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 140 stars in the last 30 days

Explore Similar Projects

Starred by Vincent Weisser (Cofounder of Prime Intellect), Patrick von Platen (Author of Hugging Face Diffusers; Research Engineer at Mistral), and 2 more.

IP-Adapter by tencent-ailab

  • Top 0.3% on SourcePulse, 6k stars
  • Adapter for image prompt in text-to-image diffusion models
  • Created 2 years ago; updated 1 year ago