Lumina-mGPT-2.0 by Alpha-VLLM

Image generation model for a broad range of tasks

created 4 months ago
749 stars

Top 47.3% on sourcepulse

View on GitHub
Project Summary

Lumina-mGPT 2.0 is a stand-alone, decoder-only autoregressive model designed for a wide array of image generation tasks, including text-to-image, image pair generation, subject-driven generation, and multi-turn image editing. It aims to unify these capabilities within a single, flexible framework, targeting researchers and developers working on advanced image synthesis and manipulation.

How It Works

This model employs an autoregressive approach, generating images token by token, with each discrete visual token predicted from the preceding tokens and the conditioning input (e.g., a text prompt or reference image). It uses a decoder-only transformer architecture, similar to large language models, adapted for visual data. The MoVQGAN weights handle image tokenization and reconstruction: images are encoded into a grid of discrete codes for the transformer and decoded back to pixels after generation.
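
The following is a minimal, purely illustrative Python sketch of that pipeline, assuming toy stand-ins for every component: none of the names correspond to Lumina-mGPT 2.0's actual API, the "transformer" is a deterministic dummy, and the MoVQGAN decoder is replaced by a trivial mapping. It only shows the token-by-token flow from a conditioning prompt to a decoded pixel grid.

    import random

    VOCAB = 1024     # toy codebook size; the real MoVQGAN codebook differs
    GRID = 4         # toy 4x4 token grid; a 768px image uses a far larger grid


    def sample_next_token(context):
        """Stand-in for the transformer: draws a token given all preceding tokens."""
        random.seed(sum(context))        # deterministic toy "distribution"
        return random.randrange(VOCAB)


    def detokenize(image_tokens):
        """Stand-in for the MoVQGAN decoder: maps the token grid back to pixels."""
        return [[image_tokens[r * GRID + c] % 256 for c in range(GRID)]
                for r in range(GRID)]


    def generate(prompt_tokens):
        image_tokens = []
        for _ in range(GRID * GRID):     # one discrete code per grid cell
            context = list(prompt_tokens) + image_tokens
            image_tokens.append(sample_next_token(context))   # each new token conditions the next
        return detokenize(image_tokens)


    print(generate([101, 57, 8]))        # toy "text-to-image" call on already-encoded prompt tokens

In the real model, this same sequential loop is what the acceleration options described below (speculative Jacobi decoding, quantization) are designed to speed up.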

Quick Start & Requirements

  • Installation: Clone the repository, create a conda environment (python=3.10), activate it, and install dependencies using pip install -r requirements.txt and a specific flash-attn wheel.
  • Prerequisites: CUDA 12, PyTorch 2.3, and the MoVQGAN weights (movqgan_270M.ckpt), which must be downloaded separately.
  • Inference: Run python generate_examples/generate.py with specified model paths and generation parameters. Acceleration options include --speculative_jacobi and --quant.
  • Resources: Inference on an A100 GPU requires approximately 80 GB of memory for the base model, reducing to 33.8 GB with speculative Jacobi decoding and quantization.
  • Links: User Demo, Installation, Checkpoints

Highlighted Details

  • Unified framework for diverse image generation tasks.
  • Offers speculative Jacobi decoding and model quantization for accelerated inference (a toy sketch of the Jacobi idea follows this list).
  • Achieves 304s inference time and 33.8 GB GPU memory usage with acceleration techniques on an A100.
  • Provides a 7B parameter model capable of 768px image generation.
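
As a rough illustration of the idea behind the --speculative_jacobi option, here is a toy Python sketch of plain (greedy) Jacobi decoding, the parallel-decoding scheme that speculative Jacobi decoding builds on. It is not the project's implementation: the real method operates on transformer logits and uses probabilistic acceptance for sampled tokens, while this sketch uses a deterministic stand-in model and an assumed window size so the fixed-point logic is easy to follow.

    VOCAB = 97       # toy vocabulary size (hypothetical)
    WINDOW = 8       # number of draft tokens refined in parallel per iteration


    def next_token(seq):
        """Toy stand-in for argmax p(x_t | x_<t); any deterministic function works."""
        return (sum(seq) * 31 + len(seq) * 7) % VOCAB


    def jacobi_decode(prompt, n_new):
        accepted = list(prompt)
        target = len(prompt) + n_new
        draft = [0] * WINDOW                 # initial guesses for the upcoming tokens
        while len(accepted) < target:
            # One Jacobi iteration: recompute every draft position in parallel,
            # each conditioned on the previous iteration's draft tokens before it.
            new = [next_token(accepted + draft[:i]) for i in range(len(draft))]
            # If new[0..m-1] match the old draft, they already equal the greedy
            # continuation, and new[m] is exact as well, so accept m+1 tokens.
            m = 0
            while m < len(draft) and new[m] == draft[m]:
                m += 1
            n_accept = min(m + 1, len(draft), target - len(accepted))
            accepted.extend(new[:n_accept])
            draft = new[n_accept:] + [0] * n_accept   # slide the window forward
        return accepted


    def sequential_decode(prompt, n_new):
        out = list(prompt)
        for _ in range(n_new):
            out.append(next_token(out))      # ordinary one-token-at-a-time decoding
        return out


    # The parallel decoder reproduces sequential greedy decoding exactly, which is
    # what makes this style of acceleration lossless; the speedup comes from
    # accepting several tokens per model call whenever the draft window converges.
    assert jacobi_decode([3, 1, 4], 64) == sequential_decode([3, 1, 4], 64)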

Maintenance & Community

The project is associated with the Alpha VLLM Group at Shanghai AI Lab. Open positions are advertised.

Licensing & Compatibility

The repository does not explicitly state a license in the README. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The "All-in-One Inference & Checkpoints" and "Technical Report" are marked as not yet released. The specific version of flash-attn required needs careful selection from the provided link.

Health Check

  • Last commit: 23 hours ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 1

Star History

77 stars in the last 90 days
