In-Context-LoRA by ali-vilab

IC-LoRA: Diffusion Transformer framework for visual generation tasks

Created 1 year ago

2,058 stars

Top 21.2% on SourcePulse

Project Summary

In-Context LoRA (IC-LoRA) provides a flexible framework for adapting diffusion transformers to a wide array of visual generation tasks. It enables users to condition image generation on custom image sets, facilitating applications like virtual try-on, product design, and visual effects. The target audience includes researchers and developers working with diffusion models who need adaptable and controllable image generation capabilities.

How It Works

IC-LoRA concatenates condition and target images into a single composite image and describes that composite with a natural language prompt. The approach is task-agnostic: the same framework can be fine-tuned for diverse applications without architectural changes. It supports two modes of use: generating a set of images with intrinsic relationships, or conditioning a new image set on existing images.
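The composite-image idea can be sketched in a few lines. This is a minimal illustration, not the project's own preprocessing code: the function names make_composite and split_composite are hypothetical, the panels are assumed to be NumPy arrays of shape height x width x 3 that already share the same height, and a left-to-right layout is assumed.

```python
import numpy as np


def make_composite(panels):
    """Concatenate H x W x 3 panels left to right into one image array.

    Hypothetical helper: assumes every panel already has the same height.
    """
    if not panels:
        raise ValueError("need at least one panel")
    heights = {panel.shape[0] for panel in panels}
    if len(heights) != 1:
        raise ValueError("all panels must share the same height")
    return np.concatenate(panels, axis=1)


def split_composite(composite, num_panels):
    """Cut a composite back into equal-width panels (inverse of the above
    when all panels have the same width)."""
    if composite.shape[1] % num_panels != 0:
        raise ValueError("composite width is not divisible by num_panels")
    return np.split(composite, num_panels, axis=1)


# Placeholder data: a black "condition" panel and a white "target" panel.
condition = np.zeros((512, 512, 3), dtype=np.uint8)
target = np.full((512, 512, 3), 255, dtype=np.uint8)

composite = make_composite([condition, target])
print(composite.shape)  # (512, 1024, 3)
```

Under these assumptions, a training sample is the composite plus one prompt describing all panels, and the target panel can be recovered from a generated composite with split_composite(composite, 2)[1].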

Quick Start & Requirements

Install/Run: Training uses the AI-Toolkit. Copy the sample training data (data/movie-shots.zip) and the configuration file (config/movie-shots.yml) into the toolkit, then start training with: python run.py config/movie-shots.yml
Prerequisites: A single GPU with at least 24 GB of memory. The training resolution can be adjusted to fit GPUs with different memory limits.
Setup Time: Training completes in a few hours.
Links: Paper, Project Page, Model Zoo, Community Creations
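Before launching a multi-hour training run, it can help to check that the files named above are where the command expects them. The sketch below is a hypothetical helper, not part of AI-Toolkit: build_train_command is an invented name, and the layout (run.py at the toolkit root, the config under config/ inside it) is an assumption drawn only from the command shown above.

```python
import sys
from pathlib import Path


def build_train_command(toolkit_dir, config_rel="config/movie-shots.yml"):
    """Return the training command after checking that the inputs exist.

    Assumed layout (not confirmed by the project): run.py sits at the
    toolkit root and the config path is relative to that root.
    """
    toolkit = Path(toolkit_dir)
    missing = [
        name for name in ("run.py", config_rel)
        if not (toolkit / name).is_file()
    ]
    if missing:
        raise FileNotFoundError(f"missing in {toolkit}: {', '.join(missing)}")
    # Equivalent to: python run.py config/movie-shots.yml
    return [sys.executable, "run.py", config_rel]


# Usage (starts the actual training, so it is left commented out):
# import subprocess
# subprocess.run(build_train_command("path/to/toolkit"), cwd="path/to/toolkit", check=True)
```

Failing fast on a missing file is cheaper than discovering the problem after the base model has loaded.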

Highlighted Details

Offers 10 pretrained models for tasks like Film Storyboard Generation and Visual Identity Design.
Supports ComfyUI integration with community-developed nodes and workflows.
Introduced IDEA-Bench, a benchmark for assessing zero-shot generalization in generative models.
Its predecessor, Group Diffusion Transformers, supported 30 visual generation tasks in a zero-shot setting.

Maintenance & Community

The project actively showcases community innovations. It links to a gallery of community creations and to a model zoo with 10 pretrained models.

Licensing & Compatibility

This repository uses FLUX as the base model. Users must comply with FLUX's license. The training data may contain copyrighted material; commercial use requires obtaining necessary permissions and ensuring compliance with copyright laws.

Limitations & Caveats

Although the framework is task-agnostic, each application needs its own task-specific fine-tuning for best results. The provided training data is for reference and educational purposes only; commercial use requires separate permissions.
