Text-to-video generation research paper focusing on transparency
Top 41.8% on sourcepulse
TransPixeler enables text-to-video generation of RGBA content (RGB plus an alpha channel for transparency), a capability crucial for visual effects and seamless scene integration. It targets researchers and developers in computer vision and generative AI who want to extend existing video models to transparency applications. The primary benefit is the ability to generate videos with controllable transparency, enhancing realism and creative possibilities.
How It Works
TransPixeler adapts pre-trained diffusion transformer (DiT) video models for RGBA generation. It extends the token sequence with alpha-specific tokens and uses LoRA-based fine-tuning to generate the RGB and alpha channels jointly. The attention mechanism is adjusted to preserve the original model's RGB quality while keeping the RGB and alpha outputs consistent, even with limited training data.
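The sketch below illustrates this token-extension idea in PyTorch. It is a minimal illustration under our own assumptions, not the authors' implementation: the names JointRGBADenoiser and alpha_embed are hypothetical, and in the actual method the trainable parameters are LoRA adapters inside the DiT's attention layers rather than a single embedding.

```python
import torch
import torch.nn as nn

class JointRGBADenoiser(nn.Module):
    """Sketch: extend a pretrained DiT's token sequence with alpha-specific tokens."""

    def __init__(self, dit: nn.Module, dim: int):
        super().__init__()
        self.dit = dit  # frozen pretrained video DiT (LoRA adapters would live inside it)
        # Learnable embedding that tags tokens as belonging to the alpha stream.
        self.alpha_embed = nn.Parameter(torch.zeros(1, 1, dim))

    def forward(self, rgb_tokens: torch.Tensor):
        # rgb_tokens: (batch, seq, dim) noisy latent tokens of the RGB video.
        alpha_tokens = rgb_tokens + self.alpha_embed      # initialize the alpha stream
        joint = torch.cat([rgb_tokens, alpha_tokens], 1)  # doubled sequence length
        out = self.dit(joint)                             # self-attention couples the two streams
        seq = rgb_tokens.shape[1]
        return out[:, :seq], out[:, seq:]                 # (denoised RGB, denoised alpha)

# Usage with a stand-in "DiT" (a toy MLP) just to show the shapes:
if __name__ == "__main__":
    dim = 64
    toy_dit = nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))
    model = JointRGBADenoiser(toy_dit, dim)
    rgb, alpha = model(torch.randn(2, 16, dim))
    print(rgb.shape, alpha.shape)  # torch.Size([2, 16, 64]) twice
```

Because the alpha tokens attend to the RGB tokens through the shared, frozen attention, the alpha output stays aligned with the RGB content while the pretrained RGB pathway is left largely intact.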
Quick Start & Requirements
Install dependencies with pip install -r requirements.txt within a conda environment (Python 3.10 recommended). Launch the demo with python app.py, or run inference from the command line with python cli.py --lora_path /path/to/lora --prompt "...". For joint generation, check out the wan branch and ensure the data follows the 001.mp4, 001_seg.mp4, 001.txt structure (a layout check is sketched below).
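To make the expected layout concrete, here is a small hypothetical helper (find_samples is not part of the repository, and the assumption that the .txt file holds the text prompt is ours) that pairs each RGB video with its _seg companion and text file:

```python
from pathlib import Path

def find_samples(data_dir: str) -> list[tuple[Path, Path, Path]]:
    """Collect (video, modality video, text) triplets like 001.mp4 / 001_seg.mp4 / 001.txt."""
    root = Path(data_dir)
    triplets = []
    for video in sorted(root.glob("*.mp4")):
        if video.stem.endswith("_seg"):
            continue  # this is a modality video, not an RGB sample
        seg = video.with_name(video.stem + "_seg.mp4")
        txt = video.with_suffix(".txt")  # assumed to contain the prompt/caption
        if seg.exists() and txt.exists():
            triplets.append((video, seg, txt))
        else:
            print(f"incomplete sample: {video.name}")
    return triplets

print(find_samples("./data"))
```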
Highlighted Details
The wan branch supports joint generation of RGB and associated modalities (e.g., segmentation maps, alpha masks) with Wan2.1.
Maintenance & Community
Development has centered on the wan branch for joint generation, with Hunyuan, LTX, and ComfyUI integration listed as roadmap additions.
Licensing & Compatibility
License information for commercial use is absent.
Limitations & Caveats
The project is presented as research for CVPR 2025, implying it may be experimental. Hardware requirements for training or advanced inference (beyond the stated ~24GB of VRAM for the provided LoRA weights) are not detailed.
Last updated 2 months ago; currently marked inactive.