Awesome-Streaming-Video-Understanding  by sotayang

Always-on AI for real-time streaming video understanding

Created 5 months ago
255 stars

Top 98.7% on SourcePulse

Project Summary

This repository curates the latest research papers, models, and datasets for streaming (online) video understanding, aiming to build always-on, real-time AI assistants capable of continuous multimodal perception. It addresses the unique challenges of operating under real-time, causal constraints, where decisions must be made without future knowledge, focusing on proactive decision-making and efficient resource management for perpetual processing. The collection serves as a comprehensive reference for researchers and practitioners pushing the frontier of interactive video AI.

How It Works

The list organizes streaming video understanding research around two core challenges. Proactive Streaming Models focus on determining when to act (e.g., generating a response, asking for clarification, or remaining silent), using methods such as token-driven triggering or dedicated classifiers. Reactive Streaming Models address how to sustain perpetual processing through efficient long-context management, including KV cache optimization, hierarchical memory, and retrieval augmentation. Together, the two categories give a structured overview of the techniques proposed for each challenge.
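As a rough illustration of the "when to act" problem under causal constraints, the following minimal sketch decides per frame whether to respond or stay silent using only the current and past frames. Everything here is an assumption for illustration: the `Frame` type, the `relevance` score (standing in for whatever a trigger token or dedicated classifier would produce), and the `threshold` and `cooldown` parameters are hypothetical and not taken from any paper in the list.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Tuple


@dataclass
class Frame:
    timestamp: float  # seconds since the stream started
    relevance: float  # hypothetical trigger score in [0, 1]


def proactive_decisions(
    frames: Iterable[Frame],
    threshold: float = 0.8,  # assumed score needed to trigger a response
    cooldown: float = 2.0,   # assumed minimum gap between responses, in seconds
) -> Iterator[Tuple[float, str]]:
    """Yield a (timestamp, action) decision for every incoming frame."""
    last_response = float("-inf")
    for frame in frames:  # causal: a decision never looks at later frames
        cooled_down = frame.timestamp - last_response >= cooldown
        if frame.relevance >= threshold and cooled_down:
            last_response = frame.timestamp
            yield frame.timestamp, "respond"
        else:
            yield frame.timestamp, "silent"


stream = [Frame(0.0, 0.1), Frame(1.0, 0.9), Frame(2.0, 0.95), Frame(4.0, 0.85)]
decisions = list(proactive_decisions(stream))
print(decisions)
```

In this toy stream the frame at 2.0 s scores highly but is suppressed by the cooldown, which is one simple way to keep an always-on assistant from responding on every frame.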

Quick Start & Requirements

This repository is a curated collection of research resources rather than a single runnable project. Specific installation, setup, and execution instructions, along with detailed prerequisites (e.g., GPU, CUDA, Python versions), will vary significantly depending on the individual papers and models linked within the list. Users are directed to the respective GitHub repositories or arXiv pages for each specific project to find this information.

Highlighted Details

  • Focuses on "always-on, real-time video AI systems" and "J.A.R.V.I.S.-like continuous multimodal perception."
  • Organizes research into distinct categories: Proactive Streaming Models (triggering mechanisms) and Reactive Streaming Models (efficient long-context processing).
  • Features an extensive, regularly updated list of recent papers (2024-2026) with links to code, venues, and detailed methodologies.
  • Covers a wide array of techniques, from token-driven triggering and KV cache management to retrieval augmentation and computational optimizations.
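To make the long-context side of the list more concrete, here is a minimal sketch of one possible bounded-memory policy: keep a few of the earliest entries plus a sliding window of the most recent ones, so memory stays constant no matter how long the stream runs. The `BoundedFrameMemory` class and its parameters are hypothetical, and this simple eviction rule is an illustrative assumption rather than the method of any specific paper in the collection.

```python
from collections import deque
from typing import Deque, List


class BoundedFrameMemory:
    """Keep the first `num_anchor` entries plus the `window` most recent ones."""

    def __init__(self, num_anchor: int = 2, window: int = 4) -> None:
        self.num_anchor = num_anchor
        self.anchors: List[str] = []
        # A deque with maxlen silently drops the oldest entry when full.
        self.recent: Deque[str] = deque(maxlen=window)

    def append(self, entry: str) -> None:
        if len(self.anchors) < self.num_anchor:
            self.anchors.append(entry)
        else:
            self.recent.append(entry)

    def context(self) -> List[str]:
        """Return the entries a model would attend to at this step."""
        return self.anchors + list(self.recent)


memory = BoundedFrameMemory(num_anchor=2, window=3)
for i in range(10):
    memory.append(f"frame-{i}")
print(memory.context())
```

Entries evicted from the window are simply lost in this sketch; hierarchical memory and retrieval augmentation, as mentioned above, are ways to keep some of that information reachable.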

Maintenance & Community

Contributions are welcomed via pull requests or issues. The project is maintained by Zhenyu Yang and Krongrong Zhang, who can be contacted via email at yangzhenyu2022@ia.ac.cn and zhangkr2025@shanghaitech.edu.cn, respectively. No specific community channels like Discord or Slack are listed.

Licensing & Compatibility

Licensing information for the collection itself or the individual projects linked within is not provided in the README. Users should consult the licenses of specific papers, code repositories, or datasets they intend to use.

Limitations & Caveats

As a curated list of research, this repository does not offer a unified installation or deployment path. Users must individually assess and integrate specific models, each with its own dependencies and potential limitations. The list emphasizes recent work (2024-2026) in a rapidly evolving field, so many of the linked methods may be experimental. The core challenges of real-time processing and causal constraints remain active areas of research.

Health Check
Last Commit: 1 day ago
Responsiveness: Inactive
Pull Requests (30d): 11
Issues (30d): 0
Star History: 40 stars in the last 30 days
