Dataset for facial text-to-video generation research
CelebV-Text is a large-scale dataset designed to address the lack of high-quality, text-annotated video data for facial text-to-video generation tasks. It targets researchers and developers in AI-driven video editing and generation, providing a comprehensive resource to advance facial animation and manipulation based on textual descriptions.
How It Works
The dataset comprises 70,000 in-the-wild face video clips, totaling approximately 279 hours. Each video is paired with 20 semi-automatically generated text descriptions that precisely capture both static and dynamic facial attributes. This approach ensures rich, relevant annotations covering general appearances, detailed features, lighting conditions, actions, emotions, and light directions, facilitating more accurate and controllable text-to-video synthesis.
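The figures above imply a rough average clip length and a total description count. The short calculation below is a sketch using only the numbers quoted in this section; the function and constant names are illustrative, and the average is approximate because the 279-hour total is itself approximate.

```python
# Figures quoted in the section above.
NUM_CLIPS = 70_000
TOTAL_HOURS = 279            # approximate total duration
DESCRIPTIONS_PER_CLIP = 20


def dataset_stats(num_clips, total_hours, descriptions_per_clip):
    """Return (average clip length in seconds, total number of descriptions)."""
    avg_clip_seconds = total_hours * 3600 / num_clips
    total_descriptions = num_clips * descriptions_per_clip
    return avg_clip_seconds, total_descriptions


if __name__ == "__main__":
    avg_seconds, total_descriptions = dataset_stats(
        NUM_CLIPS, TOTAL_HOURS, DESCRIPTIONS_PER_CLIP
    )
    print(f"Average clip length: about {avg_seconds:.1f} s")
    print(f"Total text descriptions: {total_descriptions:,}")
```

Under these numbers, a clip runs about 14 seconds on average, and the dataset carries 1.4 million text descriptions in total.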
Quick Start & Requirements
Dependencies: youtube_dl and opencv-python.
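Before running any download or processing step, it can help to confirm that both dependencies are importable. The check below is a minimal sketch, not part of the project's own tooling: `find_missing` and `REQUIRED_MODULES` are hypothetical names, and the only assumption about the packages is their import names (`youtube_dl`, and `cv2` for opencv-python).

```python
from importlib.util import find_spec

# Import names of the two dependencies; opencv-python is imported as cv2.
REQUIRED_MODULES = ["youtube_dl", "cv2"]


def find_missing(modules):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in modules if find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are importable.")
```

Any module reported as missing can typically be installed with pip under its package name (`youtube_dl` or `opencv-python`).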
Highlighted Details
Maintenance & Community
The project is affiliated with OpenXDLab. Updates are provided via GitHub issues. Links to related work and potential future interests are listed.
Licensing & Compatibility
The CelebV-Text dataset is available for non-commercial research purposes only. Redistribution and commercial exploitation are strictly prohibited. Copies are allowed for internal use within a single organization.
Limitations & Caveats
The dataset is strictly for non-commercial research use. Users agree not to reproduce, duplicate, copy, sell, trade, resell, or exploit any portion for commercial purposes. Further distribution is also restricted.