facebookresearch / Action100M: Large-scale video action dataset with hierarchical annotations
Top 80.6% on SourcePulse
Action100M addresses the need for a large-scale, hierarchically annotated dataset for video action understanding. It gives researchers a comprehensive resource for training and evaluating models that recognize and describe actions at multiple granularities, advancing video analysis and AI research.
How It Works
The dataset structures video content into a hierarchical Tree-of-Captions, enabling multi-level action annotation. Initial captioning and action labeling are produced with large models such as PLM-3B and Llama-3.2-Vision-11B, augmented by human-curated detailed summaries, action phrases, and actor identification. The result is rich, temporally localized action descriptions across different levels of detail.
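A minimal sketch of what such a Tree-of-Captions could look like as a data structure; the field names, nesting, and example captions below are hypothetical and are not the dataset's actual schema:

```python
# Illustrative Tree-of-Captions node: coarse captions at the root,
# finer, temporally localized action phrases in the children.
from dataclasses import dataclass, field
from typing import List


@dataclass
class CaptionNode:
    start_s: float  # start of the temporal span covered by this node (seconds)
    end_s: float    # end of the temporal span (seconds)
    caption: str    # action description at this level of detail
    children: List["CaptionNode"] = field(default_factory=list)

    def leaves(self) -> List["CaptionNode"]:
        """Return the finest-grained captions under this node."""
        if not self.children:
            return [self]
        return [leaf for child in self.children for leaf in child.leaves()]


# A video-level summary with coarser segments and finer action phrases beneath them.
root = CaptionNode(0.0, 60.0, "A person prepares a meal in a kitchen", children=[
    CaptionNode(0.0, 25.0, "Chops vegetables on a cutting board", children=[
        CaptionNode(0.0, 10.0, "Picks up a knife and an onion"),
        CaptionNode(10.0, 25.0, "Slices the onion into rings"),
    ]),
    CaptionNode(25.0, 60.0, "Cooks the vegetables in a pan"),
])
print([leaf.caption for leaf in root.leaves()])
```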
Quick Start & Requirements
Data can be loaded directly from the 🤗 Hugging Face facebook/action100m-preview repository using the datasets library with streaming=True, as sketched below. Examples for loading from local Parquet files and for visualization are provided in usage.ipynb. No hardware prerequisites beyond a standard Python environment are specified.
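A minimal sketch of the streaming load path, assuming the default configuration and a "train" split; check the dataset card or usage.ipynb for the exact split names and record fields:

```python
# Stream the preview dataset from Hugging Face without downloading it in full.
from datasets import load_dataset

ds = load_dataset("facebook/action100m-preview", split="train", streaming=True)

# Inspect the first record; the field names printed here depend on the
# dataset's actual schema and are not assumed in advance.
first = next(iter(ds))
print(first.keys())
```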
Highlighted Details
Last updated: 1 week ago
Activity: Inactive