LongCat-Video by meituan-longcat

Foundational video generation model for diverse creative tasks

Created 1 week ago

969 stars

Top 38.1% on SourcePulse

View on GitHub
Project Summary

Summary

LongCat-Video is a 13.6B-parameter foundational video generation model supporting Text-to-Video, Image-to-Video, and Video-Continuation tasks. A single unified architecture covers all three, and the model efficiently generates high-quality, minutes-long videos, positioning it as a step toward world models for researchers and developers.

How It Works

The model employs one unified architecture across its video generation tasks. Natively pretrained on Video-Continuation, it produces extended videos without quality degradation. Efficient 720p, 30 fps inference is achieved through coarse-to-fine generation along the temporal and spatial axes combined with Block Sparse Attention, and output quality is further improved with multi-reward RLHF (GRPO).
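The summary above credits Block Sparse Attention for much of the inference efficiency. As a rough, generic illustration of that technique (not LongCat-Video's actual kernel; the block size, the selection heuristic, and every name below are assumptions), a PyTorch sketch might look like this:

```python
import torch

def block_sparse_attention(q, k, v, block_size=64, keep_ratio=0.25):
    # Illustrative block-sparse attention, NOT the repo's implementation.
    # q, k, v: (batch, heads, seq_len, head_dim); seq_len is assumed
    # divisible by block_size for brevity.
    B, H, L, D = q.shape
    nb = L // block_size
    # Mean-pool each block into a coarse summary used to rank key blocks.
    qb = q.reshape(B, H, nb, block_size, D).mean(dim=3)        # (B, H, nb, D)
    kb = k.reshape(B, H, nb, block_size, D).mean(dim=3)        # (B, H, nb, D)
    block_scores = qb @ kb.transpose(-1, -2) / D ** 0.5        # (B, H, nb, nb)
    k_keep = max(1, int(keep_ratio * nb))
    top = block_scores.topk(k_keep, dim=-1).indices            # (B, H, nb, k_keep)
    # Dense boolean mask at block granularity; a real kernel would skip
    # pruned blocks entirely rather than masking a dense score matrix.
    block_mask = torch.zeros(B, H, nb, nb, dtype=torch.bool, device=q.device)
    block_mask.scatter_(-1, top, True)
    mask = block_mask.repeat_interleave(block_size, 2).repeat_interleave(block_size, 3)
    attn = (q @ k.transpose(-1, -2)) / D ** 0.5                # (B, H, L, L)
    attn = attn.masked_fill(~mask, float("-inf")).softmax(dim=-1)
    return attn @ v

# Tiny smoke test.
q = k = v = torch.randn(1, 2, 256, 32)
print(block_sparse_attention(q, k, v).shape)  # torch.Size([1, 2, 256, 32])
```

Here, mean-pooled block summaries rank key blocks for each query block, and only the top fraction is attended to, which is what makes the attention cost sub-quadratic in practice.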

Quick Start & Requirements

Installation involves cloning the repo, creating a Python 3.10 Conda environment, and installing PyTorch (CUDA 12.4), FlashAttention-2, and the remaining dependencies via pip install -r requirements.txt; ninja and psutil are prerequisites. Model weights are available on Hugging Face. Demos are provided via torchrun scripts and a Streamlit interface.
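Concretely, those steps correspond to roughly the following shell session. The repository URL and exact package pins are reconstructed from this summary rather than copied from the README, so treat them as assumptions and defer to the project's own instructions:

```bash
# Reconstructed install sketch; verify against the official README.
git clone https://github.com/meituan-longcat/LongCat-Video.git
cd LongCat-Video

conda create -n longcat-video python=3.10 -y
conda activate longcat-video

# Build prerequisites named above.
pip install ninja psutil

# PyTorch with CUDA 12.4 wheels, then FlashAttention-2.
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu124
pip install flash-attn --no-build-isolation

pip install -r requirements.txt
```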

Highlighted Details

  • Unified architecture for T2V, I2V, and VC tasks.
  • Generates minutes-long videos without color drift or quality degradation.
  • Efficient 720p, 30 fps inference.
  • Performance comparable to leading models on internal benchmarks.
  • Dense 13.6B-parameter model.

Maintenance & Community

Maintained by the Meituan LongCat Team. Contact is via email (longcat-team@meituan.com) and a WeChat group; no other community channels or public roadmap are documented.

Licensing & Compatibility

Model weights and contributions are under the MIT License, which does not grant rights to Meituan trademarks or patents. While generally permissive for commercial use, users must carefully assess accuracy, safety, and fairness for sensitive applications and comply with all applicable laws and regulations.

Limitations & Caveats

The model has not been exhaustively evaluated for every downstream application. Users should account for the general limitations of large generative models (e.g., performance variation across languages) and assess accuracy, safety, and fairness before deploying it in sensitive or high-risk scenarios, ensuring compliance with applicable legal requirements.

Health Check

  • Last commit: 1 day ago
  • Responsiveness: Inactive
  • Pull requests (30d): 2
  • Issues (30d): 28
  • Star history: 988 stars in the last 10 days

Explore Similar Projects

Starred by Jiaming Song (Chief Scientist at Luma AI) and Yineng Zhang (Inference Lead at SGLang; Research Scientist at Together AI).

LightX2V by ModelTC

Video generation inference framework for efficient synthesis

  • 741 stars · 6.3% on SourcePulse
  • Created 7 months ago · Updated 15 hours ago