LongCat-Video by meituan-longcat

Foundational video generation model for diverse creative tasks

Created 1 week ago

969 stars

Top 38.1% on SourcePulse

View on GitHub
Project Summary

Summary

LongCat-Video is a 13.6B-parameter foundational video generation model supporting Text-to-Video, Image-to-Video, and Video-Continuation tasks. A single unified architecture covers all three, and the model efficiently generates high-quality, minutes-long videos, positioning it as a step toward world models for researchers and developers.

How It Works

The model employs one unified architecture across its video generation tasks. Natively pretrained on Video-Continuation, it produces extended videos without quality degradation. Efficient 720p, 30 fps inference is achieved through coarse-to-fine generation along the temporal and spatial axes combined with Block Sparse Attention, and output quality is further improved with multi-reward RLHF (GRPO).
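The summary above credits Block Sparse Attention for much of the inference efficiency. As a rough, generic illustration of that technique (not LongCat-Video's actual kernel; the block size, the selection heuristic, and every name below are assumptions), a PyTorch sketch might look like this:

```python
import torch

def block_sparse_attention(q, k, v, block_size=64, keep_ratio=0.25):
    # Illustrative block-sparse attention, NOT the repo's implementation.
    # q, k, v: (batch, heads, seq_len, head_dim); seq_len is assumed
    # divisible by block_size for brevity.
    B, H, L, D = q.shape
    nb = L // block_size
    # Mean-pool each block into a coarse summary used to rank key blocks.
    qb = q.reshape(B, H, nb, block_size, D).mean(dim=3)        # (B, H, nb, D)
    kb = k.reshape(B, H, nb, block_size, D).mean(dim=3)        # (B, H, nb, D)
    block_scores = qb @ kb.transpose(-1, -2) / D ** 0.5        # (B, H, nb, nb)
    k_keep = max(1, int(keep_ratio * nb))
    top = block_scores.topk(k_keep, dim=-1).indices            # (B, H, nb, k_keep)
    # Dense boolean mask at block granularity; a real kernel would skip
    # pruned blocks entirely rather than masking a dense score matrix.
    block_mask = torch.zeros(B, H, nb, nb, dtype=torch.bool, device=q.device)
    block_mask.scatter_(-1, top, True)
    mask = block_mask.repeat_interleave(block_size, 2).repeat_interleave(block_size, 3)
    attn = (q @ k.transpose(-1, -2)) / D ** 0.5                # (B, H, L, L)
    attn = attn.masked_fill(~mask, float("-inf")).softmax(dim=-1)
    return attn @ v

# Tiny smoke test.
q = k = v = torch.randn(1, 2, 256, 32)
print(block_sparse_attention(q, k, v).shape)  # torch.Size([1, 2, 256, 32])
```

Here, mean-pooled block summaries rank key blocks for each query block, and only the top fraction is attended to, which is what makes the attention cost sub-quadratic in practice.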

Quick Start & Requirements

Installation involves cloning the repo, creating a Python 3.10 Conda environment, and installing PyTorch (CUDA 12.4), FlashAttention-2, and the remaining dependencies via pip install -r requirements.txt; ninja and psutil are prerequisites. Model weights are available on Hugging Face. Demos are provided via torchrun scripts and a Streamlit interface.
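Concretely, those steps correspond to roughly the following shell session. The repository URL and exact package pins are reconstructed from this summary rather than copied from the README, so treat them as assumptions and defer to the project's own instructions:

```bash
# Reconstructed install sketch; verify against the official README.
git clone https://github.com/meituan-longcat/LongCat-Video.git
cd LongCat-Video

conda create -n longcat-video python=3.10 -y
conda activate longcat-video

# Build prerequisites named above.
pip install ninja psutil

# PyTorch with CUDA 12.4 wheels, then FlashAttention-2.
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu124
pip install flash-attn --no-build-isolation

pip install -r requirements.txt
```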

Highlighted Details

  • Unified architecture for T2V, I2V, and VC tasks.
  • Generates minutes-long videos without color drift or quality degradation.
  • Efficient 720p, 30 fps inference.
  • Performance comparable to leading models on internal benchmarks.
  • Dense 13.6B-parameter model.

Maintenance & Community

Maintained by the Meituan LongCat Team. Contact is via email (longcat-team@meituan.com) and a WeChat group; no other community channels or public roadmap are documented.

Licensing & Compatibility

Model weights and contributions are under the MIT License, which does not grant rights to Meituan trademarks or patents. While generally permissive for commercial use, users must carefully assess accuracy, safety, and fairness for sensitive applications and comply with all applicable laws and regulations.

Limitations & Caveats

The model has not been exhaustively evaluated for every downstream application. Users should account for the general limitations of large generative models (e.g., performance variation across languages) and assess accuracy, safety, and fairness before deploying it in sensitive or high-risk scenarios, ensuring compliance with applicable legal requirements.

Health Check

  • Last commit: 1 day ago
  • Responsiveness: Inactive
  • Pull requests (30d): 2
  • Issues (30d): 28
  • Star history: 988 stars in the last 10 days

Explore Similar Projects

Starred by Jiaming Song (Chief Scientist at Luma AI) and Yineng Zhang (Inference Lead at SGLang; Research Scientist at Together AI).

LightX2V by ModelTC

Video generation inference framework for efficient synthesis

  • 741 stars · 6.3% on SourcePulse
  • Created 7 months ago · Updated 15 hours ago