MovieChat by rese1f

Video QA for long video understanding (CVPR 2024 paper)

Created 2 years ago
650 stars

Project Summary

MovieChat tackles long video understanding with a "sparse memory" approach that lowers GPU memory requirements compared with dense token processing. It is aimed at researchers and practitioners who work with long videos and need a more memory-efficient alternative.

How It Works

MovieChat uses a sparse memory mechanism to handle videos of more than 10,000 frames on standard hardware such as a single 24GB GPU. Rather than processing every frame densely, it selectively processes keyframes and their associated information. The project reports a 10,000x reduction in memory cost per frame compared with dense methods.
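The idea of keeping a bounded, sparse memory can be illustrated with a small sketch. It is not the project's code: the summary above only says that frames are processed selectively, so the specific strategy used here (greedily averaging the most similar adjacent pair of frame features until a fixed budget is met) is an assumption chosen for illustration, and the names `cosine`, `consolidate`, and `budget` are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def consolidate(frames, budget):
    """Shrink a sequence of frame features to at most `budget` entries.

    Assumed strategy (illustrative only): repeatedly average the pair of
    adjacent entries with the highest cosine similarity, so redundant
    neighbouring frames collapse into one entry and temporal order is kept.
    """
    if budget < 1:
        raise ValueError("budget must be at least 1")
    memory = [list(f) for f in frames]
    while len(memory) > budget:
        sims = [cosine(memory[i], memory[i + 1]) for i in range(len(memory) - 1)]
        i = max(range(len(sims)), key=sims.__getitem__)  # most similar adjacent pair
        merged = [(x + y) / 2 for x, y in zip(memory[i], memory[i + 1])]
        memory[i:i + 2] = [merged]  # replace the pair with its average
    return memory
```

Because the memory never grows beyond `budget` entries, its size stays constant regardless of video length, which is the property that makes very long videos tractable in this kind of design.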

Quick Start & Requirements

  • Install: pip install MovieChat (version 0.6.3 recommended).
  • Prerequisites:
    • LLaMA weights (Hugging Face format).
    • Vicuna delta weights (v0).
    • MiniGPT-4 model (trained linear layer).
    • Pretrained MovieChat weights.
    • ffprobe, which ships with ffmpeg (sudo apt-get install ffmpeg on Ubuntu).
  • Setup: Requires downloading multiple model checkpoints.
  • Links: MovieChat_Onevision, MovieChat-1K leaderboard, Gradio demo.
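Since setup depends on several separately downloaded checkpoints plus ffprobe, a small pre-flight check can catch missing pieces before a run. The sketch below uses only the Python standard library. The function name `missing_prerequisites` and every path in the example are hypothetical placeholders, not locations that MovieChat itself defines.

```python
import shutil
from pathlib import Path


def missing_prerequisites(checkpoints, tools=("ffprobe",)):
    """Return a list describing command-line tools and checkpoint paths that are absent.

    `checkpoints` maps a human-readable name to a local path.
    """
    missing = [f"tool: {tool}" for tool in tools if shutil.which(tool) is None]
    missing += [
        f"checkpoint: {name}"
        for name, path in checkpoints.items()
        if not Path(path).exists()
    ]
    return missing


if __name__ == "__main__":
    # Hypothetical local paths; replace with wherever the weights were downloaded.
    checkpoints = {
        "LLaMA weights": "ckpt/llama",
        "Vicuna delta weights": "ckpt/vicuna-delta-v0",
        "MiniGPT-4 linear layer": "ckpt/minigpt4.pth",
        "MovieChat weights": "ckpt/moviechat.pth",
    }
    for item in missing_prerequisites(checkpoints):
        print("missing", item)
```

An empty result means every listed path exists and ffprobe is on the PATH; it does not verify that the files are the correct versions.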

Highlighted Details

  • Achieves 62.3 Global Acc. on MovieChat-1K with 2048 frames.
  • MovieChat+ variant reaches 71.2 Global Acc. on MovieChat-1K.
  • MovieChat-Onevision variant achieves 79.0 Global Acc. on MovieChat-1K.
  • Supports both global video understanding and breakpoint-specific queries.

Maintenance & Community

The project accompanies a CVPR 2024 paper. Updates since publication include a release to lmms-eval and a new version built on LLaVA-OneVision. The dataset's ground truth and raw videos have been released on Hugging Face.

Licensing & Compatibility

The project is intended for non-commercial research use only.

Limitations & Caveats

Dataset features have not been released yet because of copyright concerns and size limitations; their release is planned for a later date. The README also notes a possible RuntimeError related to video file initialization and suggests a workaround.

Health Check

  • Last commit: 7 months ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 8 stars in the last 30 days
