Lumina-mGPT-2.0 by Alpha-VLLM

Image generation model for broad tasks

Created 5 months ago
1,068 stars

Top 35.4% on SourcePulse

View on GitHub
Project Summary

Lumina-mGPT 2.0 is a stand-alone, decoder-only autoregressive model designed for a wide array of image generation tasks, including text-to-image, image pair generation, subject-driven generation, and multi-turn image editing. It aims to unify these capabilities within a single, flexible framework, targeting researchers and developers working on advanced image synthesis and manipulation.

How It Works

This model employs an autoregressive approach, generating images token by token: each visual token is predicted from the preceding tokens and the conditioning input. It uses a decoder-only transformer architecture, similar to large language models but adapted for visual data. MoVQGAN weights are central to the image tokenization and reconstruction process, mapping between pixels and discrete tokens to enable efficient, high-quality generation.
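
The token-by-token generation loop can be illustrated with a toy sketch. Everything here is hypothetical (the tiny codebook, the 4x4 grid, and the stand-in `next_token_logits` function); it is not Lumina-mGPT's actual API, only the general shape of decoder-only autoregressive image generation over VQ tokens.

```python
import random

# Toy sketch of autoregressive image generation over discrete visual tokens.
# All names and sizes are hypothetical illustrations, not the real model.

CODEBOOK_SIZE = 16  # hypothetical; real VQ codebooks are far larger
GRID = 4            # a 4x4 token grid stands in for a full-resolution image

def next_token_logits(prefix):
    """Stand-in for the transformer: deterministic pseudo-logits
    conditioned on the tokens generated so far."""
    rng = random.Random(len(prefix) + sum(prefix))
    return [rng.random() for _ in range(CODEBOOK_SIZE)]

def generate(seed_token=0):
    """Generate one token at a time, each conditioned on the prefix."""
    tokens = [seed_token]
    while len(tokens) < GRID * GRID:
        logits = next_token_logits(tokens)
        tokens.append(max(range(CODEBOOK_SIZE), key=logits.__getitem__))  # greedy pick
    # Reshape the flat token stream into the 2-D grid a VQ detokenizer
    # (e.g. MoVQGAN, in the real system) would decode back to pixels.
    return [tokens[r * GRID:(r + 1) * GRID] for r in range(GRID)]

grid = generate()
```

In the real pipeline the greedy pick would be replaced by sampling from the model's softmax distribution, and the finished grid would be decoded to pixels by the MoVQGAN checkpoint mentioned in the prerequisites.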

Quick Start & Requirements

  • Installation: Clone the repository, create a conda environment (python=3.10), activate it, and install dependencies using pip install -r requirements.txt and a specific flash-attn wheel.
  • Prerequisites: CUDA 12, PyTorch 2.3, and downloading MoVQGAN weights (movqgan_270M.ckpt).
  • Inference: Run python generate_examples/generate.py with specified model paths and generation parameters. Acceleration options include --speculative_jacobi and --quant.
  • Resources: Inference on an A100 GPU requires approximately 80 GB of memory for the base model, reducing to 33.8 GB with speculative Jacobi decoding and quantization.
  • Links: User Demo, Installation, Checkpoints

Highlighted Details

  • Unified framework for diverse image generation tasks.
  • Offers speculative Jacobi decoding and model quantization for accelerated inference.
  • With these acceleration techniques, inference on an A100 drops to 304 s and 33.8 GB of GPU memory.
  • Provides a 7B parameter model capable of 768px image generation.
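
The speculative Jacobi decoding mentioned above can be sketched in miniature. The idea: instead of n strictly sequential model calls, refine a whole draft block of tokens in parallel until it reaches the same fixed point greedy sequential decoding would produce. The code below is a minimal illustration under toy assumptions; `next_token` is a made-up deterministic stand-in, not the repository's model or implementation.

```python
def next_token(prefix):
    """Deterministic stand-in for the model's greedy next-token prediction."""
    return (sum(prefix) * 7 + len(prefix)) % 11

def sequential_decode(prompt, n):
    """Baseline: n sequential model calls, one per generated token."""
    seq = list(prompt)
    for _ in range(n):
        seq.append(next_token(seq))
    return seq

def jacobi_decode(prompt, n):
    """Jacobi iteration: re-predict every draft position in parallel
    (one 'parallel model call' per loop pass) until the draft stops
    changing, i.e. reaches the sequential-greedy fixed point."""
    draft = [0] * n
    iterations = 0
    while True:
        iterations += 1
        new = [next_token(list(prompt) + draft[:i]) for i in range(n)]
        if new == draft:
            break
        draft = new
    return list(prompt) + draft, iterations

prompt, n = (3,), 5
out, iters = jacobi_decode(prompt, n)
assert out == sequential_decode(prompt, n)
```

Each loop pass here plays the role of one batched forward pass; since early positions stabilize after a few passes, the number of sequential steps is often well below n in practice, which is where the speed-up comes from. The repository's actual method adds speculative acceptance on top of this basic scheme.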

Maintenance & Community

The project is associated with the Alpha VLLM Group at Shanghai AI Lab. Open positions are advertised.

Licensing & Compatibility

The repository does not explicitly state a license in the README. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The "All-in-One Inference & Checkpoints" release and the technical report are marked as not yet available. The required flash-attn wheel must be chosen carefully from the builds at the provided link.

Health Check

  • Last Commit: 2 weeks ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 140 stars in the last 30 days

Explore Similar Projects

Starred by Vincent Weisser (Cofounder of Prime Intellect), Patrick von Platen (Author of Hugging Face Diffusers; Research Engineer at Mistral), and 2 more.

IP-Adapter by tencent-ailab

  • Top 0.3% on SourcePulse, 6k stars
  • Adapter for image prompt in text-to-image diffusion models
  • Created 2 years ago; updated 1 year ago