Image generation model for broad tasks
Lumina-mGPT 2.0 is a stand-alone, decoder-only autoregressive model designed for a wide array of image generation tasks, including text-to-image, image pair generation, subject-driven generation, and multi-turn image editing. It aims to unify these capabilities within a single, flexible framework, targeting researchers and developers working on advanced image synthesis and manipulation.
How It Works
This model generates images autoregressively, producing one image token at a time conditioned on all preceding tokens and on the conditioning input (for example, a text prompt). It uses a decoder-only transformer architecture, similar to large language models, adapted for visual data. The MoVQGAN weights are crucial for image tokenization and reconstruction: the tokenizer maps images to discrete token sequences, and its decoder turns generated tokens back into pixels, enabling efficient, high-quality generation.
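The decoding loop described above can be sketched in a few lines. This is a toy illustration, not Lumina-mGPT 2.0's actual code: the `toy_logits` function stands in for the transformer, and the vocabulary size and greedy sampling are arbitrary choices for the sketch.

```python
import numpy as np

def toy_logits(prefix, vocab_size=16):
    # Stand-in for the transformer: deterministic pseudo-logits from the prefix.
    rng = np.random.default_rng(sum(prefix) + len(prefix))
    return rng.standard_normal(vocab_size)

def generate_tokens(prompt_tokens, num_image_tokens):
    # Autoregressive decoding: each new token conditions on all prior tokens.
    seq = list(prompt_tokens)
    for _ in range(num_image_tokens):
        logits = toy_logits(seq)
        seq.append(int(np.argmax(logits)))  # greedy pick for illustration
    return seq[len(prompt_tokens):]

# Generate 8 image tokens from a 3-token "prompt"; in the real model these
# discrete tokens would then be decoded into pixels by the MoVQGAN decoder.
tokens = generate_tokens([1, 2, 3], num_image_tokens=8)
print(len(tokens))  # 8
```

In the real pipeline the prompt tokens come from the text encoder's vocabulary and the image tokens index MoVQGAN's codebook; the loop structure, however, is the same.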
Quick Start & Requirements
Create a conda environment with Python 3.10 (python=3.10), activate it, and install dependencies using pip install -r requirements.txt, along with a specific flash-attn wheel. Download the MoVQGAN checkpoint (movqgan_270M.ckpt). Then run python generate_examples/generate.py with the required model paths and generation parameters. Acceleration options include --speculative_jacobi and --quant.
Highlighted Details
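The setup steps can be sketched as a shell session. The environment name is an assumption, the flash-attn wheel filename is a placeholder, and the generation flags beyond the two acceleration options should be taken from the repository's own instructions.

```shell
# Environment setup (environment name "lumina_mgpt2" is a placeholder).
conda create -n lumina_mgpt2 python=3.10 -y
conda activate lumina_mgpt2
pip install -r requirements.txt

# Install the flash-attn wheel that matches your CUDA/PyTorch build
# (select it from the link in the repo; filename below is a placeholder).
# pip install flash_attn-<version>-<tags>.whl

# Run generation, adding the model paths and generation parameters
# documented in the repository; acceleration flags are from the README.
python generate_examples/generate.py --speculative_jacobi --quant
```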
Maintenance & Community
The project is associated with the Alpha VLLM Group at Shanghai AI Lab. Open positions are advertised.
Licensing & Compatibility
The repository does not explicitly state a license in the README. Compatibility for commercial use or closed-source linking is not specified.
Limitations & Caveats
The "All-in-One Inference & Checkpoints" and "Technical Report" are marked as not yet released. The required flash-attn wheel must be selected carefully from the provided link to match your CUDA and PyTorch versions.