LAMP by RQ-Wu

LAMP: Few-shot video generation research paper (CVPR 2024)

created 1 year ago
279 stars

Top 94.1% on sourcepulse

Project Summary

LAMP is a few-shot text-to-video generation framework designed for researchers and practitioners in computer vision and generative AI. It enables users to learn custom motion patterns from a small set of videos (8-16) and then generate new videos based on these learned motions, offering a more efficient approach to specialized video synthesis compared to training large models from scratch.

How It Works

LAMP leverages a motion pattern learning approach, building upon a pre-trained text-to-image diffusion model (specifically Stable Diffusion v1.4). It fine-tunes the model to capture the temporal dynamics and motion characteristics present in a small dataset of videos. This allows the model to generate novel video sequences that adhere to a specific learned motion, while also supporting video editing tasks by modifying existing video content based on new prompts.

Quick Start & Requirements

  • Installation: Clone the repository, create a Conda environment (conda create -n LAMP python=3.8) and activate it, install PyTorch 1.12.1 with CUDA 11.3 support and xformers, then install the remaining dependencies with pip install -r requirements.txt.
  • Prerequisites: Ubuntu 18.04+, CUDA 11.3, Python 3.8, PyTorch 1.12.1, and git-lfs for downloading weights. A GPU with at least 15 GB VRAM is required for training.
  • Resources: Pre-trained checkpoints and training data are available via Baidu Disk and Google Drive links.
  • Links: arXiv paper, project website, Colab notebook (note: the Colab link may not be current).
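The installation steps above can be collected into a single setup script. This is a hedged sketch, not the repository's official instructions: the repository URL (github.com/RQ-Wu/LAMP) and the PyTorch/xformers install commands are assumptions based on the stated version requirements (PyTorch 1.12.1, CUDA 11.3); the README may pin different package builds.

```shell
# Hypothetical setup sketch for LAMP -- repo URL and wheel index are assumed.
git clone https://github.com/RQ-Wu/LAMP.git
cd LAMP

# Create and activate the Conda environment named in the README.
conda create -n LAMP python=3.8 -y
conda activate LAMP

# Install PyTorch 1.12.1 built against CUDA 11.3, then xformers.
pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 \
    --extra-index-url https://download.pytorch.org/whl/cu113
pip install xformers

# Remaining project dependencies.
pip install -r requirements.txt

# git-lfs is needed to pull pre-trained weights.
git lfs install
```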

Highlighted Details

  • Accepted to CVPR 2024.
  • Supports few-shot learning for custom motion patterns with 8-16 training videos.
  • Offers functionality for both text-to-video generation and video editing.
  • Built upon the Tune-A-Video framework.

Maintenance & Community

The repository is maintained by Ruiqi Wu. The project is based on the Tune-A-Video codebase. Further community interaction details (e.g., Discord/Slack) are not explicitly mentioned in the README.

Licensing & Compatibility

Licensed under Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0). This license strictly prohibits commercial use without formal permission.

Limitations & Caveats

The primary limitation is the non-commercial use restriction imposed by the CC BY-NC 4.0 license. Commercial applications would require explicit permission from the authors. The specific CUDA version requirement (11.3) might also pose a compatibility challenge for users with different CUDA setups.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 3 stars in the last 90 days

