Awesome_Long_Form_Video_Understanding by ttengwang

Curated list of research on long-term video understanding

created 3 years ago
283 stars

Top 93.3% on sourcepulse

Project Summary

This repository serves as a curated collection of research papers, datasets, and tools focused on the challenging domain of long-form video understanding. It targets researchers and practitioners in computer vision and natural language processing, providing a centralized resource for exploring methods that analyze complex activities and events unfolding over extended durations.

How It Works

The collection is organized by task, including representation learning, efficient modeling, large language model integration, action localization, dense captioning, temporal grounding, and video prediction. It highlights papers that employ techniques like hierarchical consistency, multimodal temporal contrastive learning, memory-augmented transformers, and various LLM-based approaches to tackle the complexities of untrimmed, real-world videos.

Quick Start & Requirements

This is a curated list of research papers and datasets, not a runnable software package. Specific requirements will vary per individual paper or dataset. Links to associated GitHub repositories and datasets are provided within the README.

Highlighted Details

  • Extensive coverage of Temporal Action Localization, including surveys and representative papers from 2017 to 2023.
  • A dedicated section for Long-Term Video Large Language Models, featuring recent advancements from 2023-2024.
  • Comprehensive lists of datasets relevant to long-form video understanding, with details on annotations, sources, and tasks.
  • Includes links to video feature extractors and benchmarks for evaluating multimodal video models.

Maintenance & Community

The README invites contributions. It does not name specific contributors or link to community channels (e.g., Discord or Slack).

Licensing & Compatibility

The repository itself is not licensed as a software package. Individual papers and datasets will have their own licenses, which must be checked for compatibility with commercial or closed-source use.

Limitations & Caveats

Some README sections are marked "TODO," so curation is incomplete in places and details may be missing. The repository is a reference list, not a ready-to-use framework.

Health Check

  • Last commit: 8 months ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 16 stars in the last 90 days
