TransPixeler by wileewang

Text-to-video generation research paper focusing on transparency

created 7 months ago
880 stars

Top 41.8% on sourcepulse

Project Summary

TransPixeler enables text-to-video generation of RGBA content (RGB plus an alpha channel for transparency), a capability crucial for visual effects and seamless scene compositing. It targets researchers and developers in computer vision and generative AI who want to extend existing video models to transparency applications. The primary benefit is the ability to generate videos with controllable transparency, which broadens both realism and creative possibilities.

How It Works

TransPixeler adapts pre-trained diffusion transformer (DiT) video models for RGBA generation. It incorporates alpha-specific tokens and utilizes LoRA-based fine-tuning to jointly generate RGB and alpha channels. This approach optimizes attention mechanisms to maintain the original model's RGB quality while ensuring high consistency between the RGB and alpha outputs, even with limited training data.
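The attention design described above can be sketched as a mask over the joint token sequence. This is a minimal illustration, not the repository's implementation: the token counts are toy values, and the specific rule shown (blocking text queries from attending to the appended alpha tokens, while RGB and alpha tokens attend to each other freely) is an assumption about how "preserve RGB quality, keep RGB/alpha consistent" could be encoded.

```python
import numpy as np

# Illustrative token counts; real DiT video models use thousands of tokens.
n_text, n_rgb, n_alpha = 4, 6, 6
n_total = n_text + n_rgb + n_alpha

# Start from full self-attention over the joint [text | RGB | alpha] sequence.
attn_mask = np.ones((n_total, n_total), dtype=bool)

# Assumed rule: block text queries from attending to the appended alpha
# tokens, so the text-conditioned RGB pathway behaves as in the
# pre-trained model.
alpha_start = n_text + n_rgb
attn_mask[:n_text, alpha_start:] = False

# RGB and alpha tokens still attend to each other in both directions,
# which is what ties the generated alpha channel to the RGB content.
assert attn_mask[n_text, alpha_start]      # RGB -> alpha: allowed
assert not attn_mask[0, alpha_start]       # text -> alpha: blocked
```

A mask like this would be applied inside each attention layer, while LoRA adapters on the projection weights supply the alpha-specific capacity without touching the frozen RGB weights.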

Quick Start & Requirements

  • Install via pip install -r requirements.txt within a conda environment (Python 3.10 recommended).
  • Requires LoRA weights for inference.
  • Local inference demo available via python app.py.
  • CLI inference: python cli.py --lora_path /path/to/lora --prompt "..."
  • For joint generation with Wan2.1, check out the wan branch and ensure data follows the 001.mp4, 001_seg.mp4, 001.txt structure.
  • Official Hugging Face demo available.
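Put together, a typical setup-and-inference session based on the steps above might look like the following. The environment name is a placeholder, and the LoRA path and prompt are left as the placeholders the README uses:

```shell
# Create an isolated environment (Python 3.10 recommended) and install deps.
conda create -n transpixeler python=3.10 -y
conda activate transpixeler
pip install -r requirements.txt

# Launch the local inference demo.
python app.py

# Or run CLI inference with downloaded LoRA weights (placeholder values).
python cli.py --lora_path /path/to/lora --prompt "..."
```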

Highlighted Details

  • CVPR 2025 accepted paper.
  • Supports Text-to-RGBA Video and Image-to-RGBA Video.
  • LoRA weights provided for Text-to-Video + RGBA using THUDM/CogVideoX-5B (49 frames, ~24GB VRAM).
  • New wan branch supports joint generation of RGB and associated modalities (e.g., segmentation maps, alpha masks) with Wan2.1.

Maintenance & Community

  • Active development with recent updates including a new wan branch for joint generation and roadmap additions for Hunyuan, LTX, and ComfyUI integration.
  • Discord and WeChat groups available for discussion and collaboration.
  • Project page and arXiv paper available.

Licensing & Compatibility

  • License details are not explicitly stated in the README.
  • Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The project is research code accompanying a CVPR 2025 paper and should be treated as experimental. Hardware requirements for training, or for inference scenarios beyond the stated ~24GB VRAM for the provided LoRA weights, are not documented. License information, including terms for commercial use, is absent.

Health Check
Last commit

2 months ago

Responsiveness

Inactive

Pull Requests (30d)
0
Issues (30d)
1
Star History
36 stars in the last 90 days
