A Survey on Video Diffusion Models
This repository serves as a comprehensive survey of video diffusion models, cataloging research across generation, editing, completion, enhancement, prediction, and understanding tasks. It targets researchers and practitioners in AI, computer vision, and multimedia, providing a structured overview of the rapidly evolving field of AI-powered video synthesis and manipulation.
How It Works
The survey categorizes video diffusion models by core architecture (e.g., U-Net or Transformer backbones) and by conditioning mechanism (e.g., text, pose, motion, sound, image). It systematically lists and describes the relevant research papers, highlighting each one's contribution to generation quality, controllability, or efficiency. This organization maps the landscape from foundational techniques to specialized applications; the sketch below illustrates the denoising loop these models share.
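For orientation only: most models cataloged here share the same reverse-diffusion core, iteratively denoising a 5-D video latent under some conditioning signal. The following is a minimal, hypothetical PyTorch sketch of that loop; `ToyDenoiser`, `sample`, and all hyperparameters are illustrative placeholders, not drawn from any surveyed paper.

```python
import torch
import torch.nn as nn

class ToyDenoiser(nn.Module):
    """Stand-in for the U-Net or Transformer backbones the survey catalogs."""
    def __init__(self, channels=4, cond_dim=16):
        super().__init__()
        self.conv = nn.Conv3d(channels, channels, kernel_size=3, padding=1)
        self.cond_proj = nn.Linear(cond_dim, channels)

    def forward(self, x, t, cond):
        # A real model would also embed the timestep t; omitted for brevity.
        scale = self.cond_proj(cond)[:, :, None, None, None]
        return self.conv(x) + scale  # predicted noise, same shape as x

@torch.no_grad()
def sample(model, steps=50, shape=(1, 4, 8, 32, 32), cond_dim=16):
    """DDPM-style ancestral sampling over a (batch, channels, frames, H, W) latent."""
    betas = torch.linspace(1e-4, 0.02, steps)
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)
    cond = torch.randn(shape[0], cond_dim)  # placeholder for a text embedding
    x = torch.randn(shape)                  # start from pure Gaussian noise
    for t in reversed(range(steps)):
        eps = model(x, t, cond)
        # Reverse-step mean: (x_t - beta_t / sqrt(1 - abar_t) * eps) / sqrt(alpha_t)
        x = (x - betas[t] / torch.sqrt(1.0 - alpha_bars[t]) * eps) / torch.sqrt(alphas[t])
        if t > 0:
            x = x + torch.sqrt(betas[t]) * torch.randn_like(x)
    return x

video_latent = sample(ToyDenoiser())
print(video_latent.shape)  # torch.Size([1, 4, 8, 32, 32])
```

The surveyed papers vary the backbone inside `ToyDenoiser`, the conditioning pathway, and the sampler itself, but this basic iterate-and-denoise structure is common to nearly all of them.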
Quick Start & Requirements
This repository is a curated list of research papers and contains no executable code. To run any of the underlying models, visit the linked GitHub repositories and follow their individual setup instructions and dependency requirements.
Maintenance & Community
The survey is updated periodically, with the latest version available on arXiv, and has been accepted for publication in ACM Computing Surveys (CSUR). Contact information is provided for suggestions and feedback.
Licensing & Compatibility
The survey repository itself does not impose licensing restrictions; each linked project carries its own license.
Limitations & Caveats
As a survey, this repository provides neither direct access to nor implementations of the described models. For practical use, refer to the original papers and their associated codebases.