Lumina-Image-2.0 by Alpha-VLLM

Unified framework for image generation, released with a technical report

Created 1 year ago

861 stars

Top 41.4% on SourcePulse

Project Summary

Lumina-Image 2.0 is a unified and efficient image generation framework aimed at researchers and developers working on AI image synthesis. It is designed to produce high-quality images while staying flexible enough to fit into existing workflows.

How It Works

Lumina-Image 2.0 is built on a diffusion model architecture and supports several solvers for inference, including Midpoint, Euler, and DPM Solver. The framework emphasizes efficiency and unification: a single codebase covers checkpoints, fine-tuning, and inference. It also integrates with Hugging Face Diffusers and ComfyUI.
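To make the solver choice concrete, here is a minimal sketch of how an Euler step and a Midpoint step differ when integrating an ODE. It is not the repository's sampler: the names `euler_step`, `midpoint_step`, `integrate`, and `toy_velocity` are hypothetical, and the toy velocity dx/dt = -x stands in for the learned network so that the exact answer is known.

```python
import math
from typing import Callable

# A velocity field f(x, t). In a real sampler this would be the trained
# network; here it is a toy function (an assumption for illustration).
Velocity = Callable[[float, float], float]
Step = Callable[[Velocity, float, float, float], float]


def euler_step(f: Velocity, x: float, t: float, dt: float) -> float:
    """First-order step: one velocity evaluation at the start of the interval."""
    return x + dt * f(x, t)


def midpoint_step(f: Velocity, x: float, t: float, dt: float) -> float:
    """Second-order step: two evaluations, the second at the interval midpoint."""
    x_mid = x + 0.5 * dt * f(x, t)
    return x + dt * f(x_mid, t + 0.5 * dt)


def integrate(step: Step, f: Velocity, x0: float,
              t0: float = 0.0, t1: float = 1.0, num_steps: int = 10) -> float:
    """Apply `step` repeatedly to move x from time t0 to time t1."""
    dt = (t1 - t0) / num_steps
    x, t = x0, t0
    for _ in range(num_steps):
        x = step(f, x, t, dt)
        t += dt
    return x


def toy_velocity(x: float, t: float) -> float:
    """dx/dt = -x, whose exact solution is x0 * exp(-t)."""
    return -x


exact = math.exp(-1.0)
euler_err = abs(integrate(euler_step, toy_velocity, 1.0) - exact)
midpoint_err = abs(integrate(midpoint_step, toy_velocity, 1.0) - exact)
print(f"Euler error:    {euler_err:.5f}")
print(f"Midpoint error: {midpoint_err:.5f}")
```

With the same number of steps, the Midpoint step costs two velocity evaluations per step instead of one and gives a smaller error on this toy problem. That is the usual trade-off between solver cost and accuracy.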

Quick Start & Requirements

Install: Create a conda environment, install PyTorch with CUDA 12.1 support, then run pip install -r requirements.txt. Installing flash-attn is also recommended.
Prerequisites: Python 3.11, PyTorch 2.1.0, torchvision 0.16.0, torchaudio 2.1.0, CUDA 12.1.
Data: List the links to your data files in ./configs/data.yaml. The image-text pairs themselves are stored in JSON format.
Resources: Requires a GPU with CUDA 12.1.
Demos & Docs: Hugging Face Space demo: https://huggingface.co/spaces/Alpha-VLLM/Lumina-Image-2.0, Diffusers API: https://huggingface.co/docs/diffusers/main/en/api/pipelines/lumina2.
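The version pins in the prerequisites are easy to get wrong when reusing an existing environment, so a quick check can help before installing the rest of the requirements. The sketch below compares installed package versions against the versions listed above. The helper names `base_version` and `check_environment` are hypothetical and not part of the repository. It checks only the Python packages, not the CUDA toolkit or GPU driver.

```python
import sys
from importlib import metadata
from typing import Optional

# Versions taken from the prerequisites listed above.
REQUIRED_PYTHON = (3, 11)
REQUIRED_PACKAGES = {
    "torch": "2.1.0",
    "torchvision": "0.16.0",
    "torchaudio": "2.1.0",
}


def base_version(version: str) -> str:
    """Drop a local build suffix such as '+cu121' from a version string."""
    return version.split("+", 1)[0]


def check_environment(installed: Optional[dict] = None,
                      python_version: Optional[tuple] = None) -> list:
    """Return a list of human-readable mismatches (empty if everything matches).

    `installed` maps package name to version string. When omitted, versions
    are looked up in the current environment.
    """
    problems = []
    py = tuple(python_version or sys.version_info[:2])
    if py != REQUIRED_PYTHON:
        problems.append(
            f"Python {py[0]}.{py[1]} found, expected "
            f"{REQUIRED_PYTHON[0]}.{REQUIRED_PYTHON[1]}"
        )
    for package, wanted in REQUIRED_PACKAGES.items():
        if installed is not None:
            found = installed.get(package)
        else:
            try:
                found = metadata.version(package)
            except metadata.PackageNotFoundError:
                found = None
        if found is None:
            problems.append(f"{package} is not installed (expected {wanted})")
        elif base_version(found) != wanted:
            problems.append(f"{package} {found} found, expected {wanted}")
    return problems
```

Calling `check_environment()` with no arguments inspects the active environment. Passing a dictionary of versions makes the function easy to test without PyTorch installed.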

Highlighted Details

Supports 1024 resolution with a 2.6B parameter model.
Integrates with Hugging Face Diffusers and ComfyUI.
Offers fine-tuning code and LoRA support.
Includes a technical report and multiple demo interfaces.

Maintenance & Community

The project is under active development, with recent updates and releases that include Lumina-Accessory for fine-tuning. Community engagement is encouraged via a WeChat group.

Licensing & Compatibility

The project provides checkpoints and code for research purposes. The README does not explicitly state licensing terms for commercial use. The checkpoints are publicly available on Hugging Face.

Limitations & Caveats

The project is actively under development, with features like "Unified multi-image generation" and "Control" listed as not yet implemented. The primary weight files are in .pth format, requiring specific handling for inference.

Health Check

Last Commit

3 months ago

Responsiveness

1 day

Star History

8 stars in the last 30 days