Video QA for long video understanding (CVPR 2024 paper)
Top 51.3% on SourcePulse
MovieChat addresses the challenge of long video understanding by proposing a novel "sparse memory" approach that significantly reduces computational requirements. It's designed for researchers and practitioners working with extensive video content, offering a more memory-efficient alternative to dense token processing.
How It Works
MovieChat employs a sparse memory mechanism to handle videos exceeding 10,000 frames, with a reported 10,000x reduction in memory cost per frame compared to dense methods. Rather than processing every frame densely, it selectively processes keyframes and their associated information. This makes long-video comprehension feasible on standard hardware, such as a 24GB GPU.
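To make the idea of a bounded, sparse memory concrete, here is a minimal sketch in plain Python. It is not MovieChat's code or API: the function names (cosine, consolidate), the toy feature vectors, and the rule of merging the most similar adjacent pair by averaging are all assumptions chosen for illustration. The point it shows is only that memory can stay at a fixed size no matter how many frames arrive.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def consolidate(frames, capacity):
    """Shrink a sequence of frame features to at most `capacity` entries.

    Hypothetical helper: repeatedly merges the most similar pair of
    adjacent frames (by averaging) until the memory fits.
    """
    if capacity < 1:
        raise ValueError("capacity must be at least 1")
    memory = [list(f) for f in frames]
    while len(memory) > capacity:
        # Similarity of each frame to its right-hand neighbour.
        sims = [cosine(memory[i], memory[i + 1]) for i in range(len(memory) - 1)]
        i = max(range(len(sims)), key=sims.__getitem__)
        merged = [(x + y) / 2 for x, y in zip(memory[i], memory[i + 1])]
        memory[i:i + 2] = [merged]  # replace the pair with one merged entry
    return memory


# Four toy "frames" forming two near-duplicate pairs, squeezed into 2 slots.
frames = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
compact = consolidate(frames, capacity=2)
```

In this toy run the two redundant pairs collapse into one entry each, so the distinct content survives while the stored size is capped by capacity rather than by video length.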
Quick Start & Requirements
Install: pip install MovieChat (version 0.6.3 recommended).
Prerequisite: ffprobe installed (sudo apt-get install ffmpeg on Ubuntu).
Highlighted Details
Maintenance & Community
The project is associated with CVPR 2024 and has seen recent updates, including releases to lmms-eval and a new version using LLaVA-OneVision. Dataset components (ground truth, raw videos) have been released on Hugging Face.
Licensing & Compatibility
The project is intended for non-commercial research use only.
Limitations & Caveats
Due to copyright concerns and size limitations, the release of dataset features is planned for a later date. The README notes a potential RuntimeError related to video file initialization, with a suggested workaround.
7 months ago
Inactive