In-Context-LoRA by ali-vilab

IC-LoRA: Diffusion Transformer framework for visual generation tasks

created 9 months ago
1,979 stars

Top 22.7% on sourcepulse

Project Summary

In-Context LoRA (IC-LoRA) provides a flexible framework for adapting diffusion transformers to a wide array of visual generation tasks. It enables users to condition image generation on custom image sets, facilitating applications like virtual try-on, product design, and visual effects. The target audience includes researchers and developers working with diffusion models who need adaptable and controllable image generation capabilities.

How It Works

IC-LoRA concatenates condition and target images into a single composite image and describes that composite with a natural language prompt. Because the task is expressed through the data rather than the architecture, the approach is task-agnostic: the same framework can be fine-tuned for diverse applications without architectural changes. The fine-tuned diffusion transformer can then either generate a customizable set of images with intrinsic relationships or condition a new image set on existing images.
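The concatenation step can be illustrated with a minimal sketch. It is not the project's code: images are modeled as plain lists of RGB rows so the example runs without dependencies, the left-to-right layout is an assumption, and `solid` and `concat_panels` are hypothetical helper names. A real pipeline would use an image library for this step.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]  # rows of RGB pixels


def solid(width: int, height: int, color: Pixel) -> Image:
    """Create a solid-color placeholder image (hypothetical helper)."""
    return [[color] * width for _ in range(height)]


def concat_panels(panels: List[Image]) -> Image:
    """Join equally tall panels left to right into one composite image.

    The horizontal layout is an assumption made for this sketch.
    """
    if not panels:
        raise ValueError("need at least one panel")
    height = len(panels[0])
    if any(len(p) != height for p in panels):
        raise ValueError("all panels must share the same height")
    # Row y of the composite is row y of every panel, placed side by side.
    return [[px for p in panels for px in p[y]] for y in range(height)]


condition = solid(4, 3, (255, 0, 0))  # stands in for a condition image
target = solid(4, 3, (0, 0, 255))     # stands in for a target image
composite = concat_panels([condition, target])
print(len(composite[0]), len(composite))  # composite width and height
```

The composite keeps the panel height and sums the panel widths, so the model sees condition and target as regions of one image rather than as separate inputs.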

Quick Start & Requirements

  • Install/Run: Use the provided AI-Toolkit. Place sample training data (data/movie-shots.zip) and configuration (config/movie-shots.yml) into the toolkit. Run training with python run.py config/movie-shots.yml.
  • Prerequisites: Requires a single GPU with at least 24GB of memory. Resolution can be adjusted for different GPU memory limits.
  • Setup Time: Training completes in a few hours.
  • Links: Paper, Project Page, Model Zoo, Community Creations
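The run step above can also be wrapped in a small launcher. This is a sketch under assumptions: `toolkit_dir` is a hypothetical path to a local AI-Toolkit checkout, the config is assumed to sit at `config/movie-shots.yml` inside it (matching the command in the list), and `build_train_command` and `run_training` are illustrative names, not part of the toolkit.

```python
import subprocess
import sys
from pathlib import Path
from typing import List


def build_train_command(config: str = "config/movie-shots.yml") -> List[str]:
    """Return the quick-start training command as an argument list."""
    # sys.executable stands in for "python" so the current interpreter is used.
    return [sys.executable, "run.py", config]


def run_training(toolkit_dir: Path, config: str = "config/movie-shots.yml") -> int:
    """Run training from inside the toolkit checkout and return the exit code.

    toolkit_dir is a hypothetical location; adjust it to the local checkout.
    """
    config_path = toolkit_dir / config
    if not config_path.is_file():
        raise FileNotFoundError(f"missing config: {config_path}")
    return subprocess.run(build_train_command(config), cwd=toolkit_dir).returncode
```

Checking for the config file before launching gives a clear error when the sample files have not yet been copied into the toolkit.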

Highlighted Details

  • Offers 10 pretrained models for tasks like Film Storyboard Generation and Visual Identity Design.
  • Supports ComfyUI integration with community-developed nodes and workflows.
  • Introduced IDEA-Bench, a benchmark for assessing zero-shot generalization in generative models.
  • Predecessor, Group Diffusion Transformers, supported 30 visual generation tasks zero-shot.

Maintenance & Community

The project actively showcases community innovations and provides 10 pretrained models. Links to community creations and a model zoo are available.

Licensing & Compatibility

This repository uses FLUX as the base model. Users must comply with FLUX's license. The training data may contain copyrighted material; commercial use requires obtaining necessary permissions and ensuring compliance with copyright laws.

Limitations & Caveats

The framework requires task-specific fine-tuning for optimal performance in diverse applications. The provided training data is for reference and educational purposes only, with commercial use requiring separate permissions.

Health Check

  • Last commit: 7 months ago
  • Responsiveness: 1 day
  • Pull requests (30d): 0
  • Issues (30d): 1
  • Star history: 152 stars in the last 90 days
