Awesome-LLMs-for-Video-Understanding  by yunlong10

Survey of video understanding via LLMs

created 2 years ago
2,582 stars

Top 18.7% on sourcepulse

Project Summary

This repository is a comprehensive, curated list of recent research papers, code repositories, and datasets on Large Language Models for video understanding (Vid-LLMs). It targets researchers and practitioners in computer vision and natural language processing, offering a structured overview of the rapidly evolving Vid-LLM landscape.

How It Works

The project categorizes Vid-LLMs based on their architectural approach and functional role, such as "Video Analyzer × LLM" or "Video Embedder × LLM," further detailing how LLMs are employed as summarizers, managers, text decoders, or regressors. It also outlines pre-training and instruction-tuning strategies, including adapter-based fine-tuning methods. The repository provides a taxonomy of tasks, datasets, and benchmarks relevant to Vid-LLMs.
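One way to picture the two-axis taxonomy described above is as a small data structure. The sketch below is illustrative only: it uses just the category and role names mentioned in this summary (the repository may define more), and the `Entry` class, the `group_entries` helper, and the `model-a`/`model-b`/`model-c` names are hypothetical placeholders, not items from the list.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class Architecture(Enum):
    # Only the two categories named in the summary; the repository may define more.
    VIDEO_ANALYZER_X_LLM = "Video Analyzer × LLM"
    VIDEO_EMBEDDER_X_LLM = "Video Embedder × LLM"


class LLMRole(Enum):
    # The functional roles named in the summary.
    SUMMARIZER = "summarizer"
    MANAGER = "manager"
    TEXT_DECODER = "text decoder"
    REGRESSOR = "regressor"


@dataclass(frozen=True)
class Entry:
    name: str  # hypothetical model name, for illustration only
    architecture: Architecture
    role: LLMRole


def group_entries(entries):
    """Group entry names by their (architecture, role) pair."""
    groups = defaultdict(list)
    for entry in entries:
        groups[(entry.architecture, entry.role)].append(entry.name)
    return dict(groups)


# Placeholder entries; the pairings are illustrative, not taken from the repository.
entries = [
    Entry("model-a", Architecture.VIDEO_ANALYZER_X_LLM, LLMRole.SUMMARIZER),
    Entry("model-b", Architecture.VIDEO_EMBEDDER_X_LLM, LLMRole.TEXT_DECODER),
    Entry("model-c", Architecture.VIDEO_EMBEDDER_X_LLM, LLMRole.TEXT_DECODER),
]

for (arch, role), names in group_entries(entries).items():
    print(f"{arch.value} / {role.value}: {', '.join(names)}")
```

Keying on the (architecture, role) pair reflects the summary's description of classification along both axes at once. Whether every combination actually occurs in the repository is not something this sketch assumes.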

Quick Start & Requirements

This repository is a curated list of resources, not a runnable software package. It links to external papers and code repositories, each with its own setup requirements.

Highlighted Details

  • Comprehensive survey of Vid-LLMs, updated to June 2024 with ~100 new models and 15 benchmarks.
  • Introduces a novel taxonomy for classifying Vid-LLMs based on video representation and LLM functionality.
  • Details various Vid-LLM tasks including recognition, captioning, grounding, retrieval, and question answering.
  • Includes extensive lists of pre-training and fine-tuning datasets, along with benchmarks for evaluation.

Maintenance & Community

The project is actively maintained by a large team of contributors from multiple universities, including the University of Rochester and Southern University of Science and Technology. Contributions are welcomed via pull requests.

Licensing & Compatibility

The repository does not state a license of its own. It links to numerous external research papers and code repositories, each under its own license, so users must consult the license of each linked project to determine usage terms and compatibility.

Limitations & Caveats

As a curated list, this repository provides no runnable code or pre-trained models of its own. For implementation details, dependencies, and usage restrictions, users must go to the individual linked projects. Because the field moves quickly and the list is updated often, users should check linked resources for their latest versions.

Health Check

  • Last commit: 5 days ago
  • Responsiveness: 1 week
  • Pull requests (30d): 3
  • Issues (30d): 1
  • Star history: 352 stars in the last 90 days
