Action100M by facebookresearch

Large-scale video action dataset with hierarchical annotations

Created 2 weeks ago


345 stars

Top 80.6% on SourcePulse

View on GitHub
Project Summary

Action100M addresses the need for a large-scale, hierarchically annotated dataset for video action understanding. It gives researchers a comprehensive resource for training and evaluating models that recognize and describe actions at multiple granularities, supporting advances in video analysis and AI.

How It Works

The dataset structures video content into a hierarchical Tree-of-Captions, enabling multi-level action annotation. It leverages multimodal models such as PLM-3B and Llama-3.2-Vision-11B for initial captioning and action labeling, augmented with human-curated detailed summaries, action phrases, and actor identification. The result is rich, temporally localized action descriptions across different levels of detail.
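The tree structure itself is not shown in this summary, but the following is a minimal sketch of how a Tree-of-Captions node might be modeled. The class name CaptionNode and fields like start_sec, verb_phrase, and actor are illustrative assumptions, not the dataset's actual schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class CaptionNode:
    """One node in a Tree-of-Captions: a temporally localized action
    description that child nodes refine at a finer granularity.
    All names here are hypothetical, not the published schema."""
    start_sec: float                   # segment start time in the video
    end_sec: float                     # segment end time in the video
    caption: str                       # action description at this level
    verb_phrase: Optional[str] = None  # e.g. "chops vegetables"
    actor: Optional[str] = None        # e.g. "chef"
    children: List["CaptionNode"] = field(default_factory=list)

def leaf_captions(node: CaptionNode) -> List[str]:
    """Collect the finest-grained captions beneath a node."""
    if not node.children:
        return [node.caption]
    return [c for child in node.children for c in leaf_captions(child)]

# Coarse root caption refined by two finer-grained child segments.
root = CaptionNode(0.0, 30.0, "A chef prepares a salad", children=[
    CaptionNode(0.0, 12.0, "The chef chops vegetables",
                verb_phrase="chops vegetables", actor="chef"),
    CaptionNode(12.0, 30.0, "The chef mixes the salad",
                verb_phrase="mixes the salad", actor="chef"),
])
print(leaf_captions(root))
```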

Quick Start & Requirements

Data can be loaded directly from the 🤗 Hugging Face facebook/action100m-preview repository using the datasets library with streaming=True, as sketched below. Examples for loading from local parquet files and for visualization are available in usage.ipynb. No hardware prerequisites beyond a standard Python environment are documented.
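A minimal loading sketch under two assumptions: that the preview repository exposes a train split, and that local copies are plain parquet files (the data/*.parquet glob below is a hypothetical path). The repository's usage.ipynb remains the authoritative reference.

```python
from datasets import load_dataset

# Stream records from the Hugging Face Hub without downloading
# the full dataset up front.
stream = load_dataset(
    "facebook/action100m-preview",
    split="train",   # assumed split name
    streaming=True,
)
print(next(iter(stream)).keys())  # inspect the available fields

# Alternatively, load previously downloaded parquet shards;
# "data/*.parquet" is a hypothetical local path.
local = load_dataset("parquet", data_files="data/*.parquet")
```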

Highlighted Details

  • Hierarchical Structure: Organizes video segments into a multi-level Tree-of-Captions for granular action analysis.
  • Multi-Model Annotation Pipeline: Utilizes PLM-3B and LLama-3.2-Vision-11B for automated annotation, complemented by detailed human-verified GPT annotations.
  • Rich Action Semantics: Annotations include brief and detailed action summaries, verb phrases, and identified actors (a hypothetical record is sketched after this list).
  • Large Scale: As the name implies, the dataset operates at the scale of roughly 100 million video action annotations.
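To make the annotation fields above concrete, the following shows what a single record might look like. Every key name here is an assumption; inspect a streamed record (as in the quick-start snippet) to confirm the real schema.

```python
# Hypothetical annotation record illustrating the semantics listed
# above; field names are assumptions, not the published schema.
example_record = {
    "video_id": "abc123",
    "start_sec": 12.0,
    "end_sec": 18.5,
    "brief_summary": "A person chops vegetables.",
    "detailed_summary": (
        "A chef dices an onion on a cutting board, "
        "then sweeps the pieces into a pan."
    ),
    "verb_phrases": ["chops an onion", "sweeps pieces into a pan"],
    "actors": ["chef"],
}
print(example_record["verb_phrases"])
```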
Health Check

Last Commit: 1 week ago
Responsiveness: Inactive
Pull Requests (30d): 4
Issues (30d): 1
Star History

349 stars in the last 14 days
