Curated list of research on long-term video understanding
This repository serves as a curated collection of research papers, datasets, and tools focused on the challenging domain of long-form video understanding. It targets researchers and practitioners in computer vision and natural language processing, providing a centralized resource for exploring methods that analyze complex activities and events unfolding over extended durations.
How It Works
The collection is organized by task, including representation learning, efficient modeling, large language model integration, action localization, dense captioning, temporal grounding, and video prediction. It highlights papers that employ techniques like hierarchical consistency, multimodal temporal contrastive learning, memory-augmented transformers, and various LLM-based approaches to tackle the complexities of untrimmed, real-world videos.
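To make one of these techniques concrete, here is a minimal, hypothetical sketch of memory-augmented attention for long videos, in the spirit of the memory-augmented transformer papers the list collects. It assumes PyTorch; all class, parameter, and variable names are illustrative and not taken from any specific paper in the collection.

```python
import torch
import torch.nn as nn

class MemoryAugmentedAttention(nn.Module):
    """Illustrative block: the current clip's tokens cross-attend to a
    FIFO memory bank summarizing past clips (all names hypothetical)."""

    def __init__(self, dim: int = 256, num_heads: int = 4, memory_size: int = 128):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # Memory bank of compressed past-clip summaries: (1, memory_size, dim).
        self.register_buffer("memory", torch.zeros(1, memory_size, dim))

    def forward(self, clip_feats: torch.Tensor) -> torch.Tensor:
        # clip_feats: (batch, tokens, dim) features for the current short clip.
        memory = self.memory.expand(clip_feats.size(0), -1, -1)
        # Queries come from the current clip; keys/values from long-range memory.
        out, _ = self.attn(clip_feats, memory, memory)
        # FIFO update: append a mean-pooled clip summary, drop the oldest slot.
        summary = clip_feats.detach().mean(dim=(0, 1)).view(1, 1, -1)
        self.memory = torch.cat([self.memory[:, 1:], summary], dim=1)
        return out

# Usage: process a long video clip-by-clip; the memory bank carries
# context across clips instead of attending over all frames at once.
model = MemoryAugmentedAttention()
for clip in torch.randn(10, 2, 16, 256):  # 10 clips of (batch=2, tokens=16, dim=256)
    contextualized = model(clip)
```

The design point this illustrates is shared across the surveyed methods: keeping a bounded memory keeps attention cost constant per clip, rather than quadratic in total video length.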
Quick Start & Requirements
This is a curated list of research papers and datasets, not a runnable software package. Requirements vary by individual paper and dataset; links to the associated GitHub repositories and datasets are provided in the README.
Maintenance & Community
This is an active repository with a call for contributions. Specific contributor details or community links (e.g., Discord/Slack) are not provided in the README.
Licensing & Compatibility
The repository itself does not declare a software license. Individual papers and datasets carry their own licenses, which must be checked before commercial or closed-source use.
Limitations & Caveats
The README marks some sections as "TODO," indicating ongoing curation and potentially incomplete coverage. It is a reference list, not a ready-to-use framework.