VideoRAG by HKUDS

PyTorch code for retrieval-augmented generation with long-context videos

Created 11 months ago

2,277 stars

Top 19.7% on SourcePulse

1 Expert Loves This Project

Chip Huyen

Author of "AI Engineering", "Designing Machine Learning Systems"

Project Summary

VideoRAG is a PyTorch framework for retrieval-augmented generation designed to process and understand extremely long-context videos, enabling users to query vast amounts of video content. It targets researchers and developers working with extensive video datasets, offering a structured approach to knowledge extraction and question answering from hours of footage.

How It Works

VideoRAG employs a novel dual-channel architecture. It combines graph-driven textual knowledge grounding to model cross-video semantic relationships with hierarchical multimodal context encoding for preserving spatiotemporal visual patterns. This approach dynamically constructs knowledge graphs to maintain semantic coherence across multiple videos, optimizing retrieval efficiency through adaptive multimodal fusion.
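As a rough illustration of the dual-channel idea described above, the toy sketch below scores video segments through two channels and then fuses the scores. The textual channel counts entities shared with a one-hop expansion over a small entity graph, and the visual channel uses cosine similarity between embeddings. Everything here is hypothetical: `Segment`, `text_score`, `visual_score`, `retrieve`, and the fixed `alpha` weight are illustrative names and simple stand-ins, not VideoRAG's actual implementation (which the summary describes as using adaptive fusion).

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class Segment:
    """Hypothetical container for one video segment."""
    video_id: str
    entities: set          # entity names extracted from captions or transcripts
    embedding: list        # visual embedding of the segment


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def expand_entities(query_entities, graph):
    """One-hop expansion of the query entities over an adjacency dict."""
    expanded = set(query_entities)
    for entity in query_entities:
        expanded |= graph.get(entity, set())
    return expanded


def text_score(segment, query_entities, graph):
    """Textual channel: fraction of expanded query entities found in the segment."""
    expanded = expand_entities(query_entities, graph)
    if not expanded:
        return 0.0
    return len(segment.entities & expanded) / len(expanded)


def visual_score(segment, query_embedding):
    """Visual channel: embedding similarity between query and segment."""
    return cosine(segment.embedding, query_embedding)


def retrieve(segments, query_entities, query_embedding, graph, alpha=0.5, top_k=2):
    """Fuse both channels with a fixed weight (an assumption) and return the top_k segments."""
    scored = [
        (alpha * text_score(s, query_entities, graph)
         + (1 - alpha) * visual_score(s, query_embedding), s)
        for s in segments
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [segment for _, segment in scored[:top_k]]
```

Because the graph is shared across all segments, a query about one entity can surface segments from other videos that mention only a related entity, which is the cross-video behavior the summary attributes to the knowledge graph.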

Quick Start & Requirements

Installation: Requires a conda environment with Python 3.11, PyTorch 2.1.2, and specific versions of libraries like accelerate, bitsandbytes, moviepy, pytorchvideo, timm, fvcore, eva-decord, ctranslate2, faster_whisper, neo4j, hnswlib, xxhash, nano-vectordb, transformers, tiktoken, openai, tenacity, and ImageBind (installed from source).
Checkpoints: Requires downloading checkpoints for MiniCPM-V, Whisper, and ImageBind.
Hardware: A single NVIDIA RTX 3090 (24GB VRAM) is sufficient for processing hundreds of hours of video.
Documentation: VideoRAG GitHub Repository
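Before running anything, it can help to confirm that the packages listed above are present in the active conda environment. The snippet below is a minimal sketch using only the standard library. The `REQUIRED` list copies the names from the installation line, with `torch` assumed as the distribution name for PyTorch. Some names may need adjusting to match what pip actually registers, and ImageBind is left out because it is installed from source. The helper names `installed_versions` and `missing` are hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version

# Names taken from the installation list; treat them as assumptions and
# adjust if a distribution is registered under a different name.
REQUIRED = [
    "torch", "accelerate", "bitsandbytes", "moviepy", "pytorchvideo",
    "timm", "fvcore", "eva-decord", "ctranslate2", "faster_whisper",
    "neo4j", "hnswlib", "xxhash", "nano-vectordb", "transformers",
    "tiktoken", "openai", "tenacity",
]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


def missing(names):
    """Return the names that are not installed in the current environment."""
    return [name for name, ver in installed_versions(names).items() if ver is None]


if __name__ == "__main__":
    for name, ver in installed_versions(REQUIRED).items():
        print(f"{name}: {ver or 'NOT INSTALLED'}")
```

This only reports what is installed. It does not check that the versions match the ones pinned by the repository, so the output still needs to be compared against the README's install commands.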

Highlighted Details

Efficiently processes hundreds of hours of video on a single RTX 3090.
Distills extensive video content into a structured, multi-modal knowledge graph.
Utilizes a multi-modal retrieval paradigm to align text and visual content.
Introduces the "LongerVideos" benchmark with over 160 videos totaling 134+ hours.

Maintenance & Community

The project is associated with HKUDS and cites foundational work from nano-graphrag and LightRAG. Further community engagement details are not explicitly provided in the README.

Licensing & Compatibility

The repository lists a LICENSE file, but the specific license type and its implications for commercial use or closed-source linking are not detailed in the README.

Limitations & Caveats

Currently tested only in an English environment; multi-language support requires modification of the WhisperModel. The evaluation process involves uploading requests to OpenAI, which may incur costs and requires API key management.
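Since the evaluation step sends requests to OpenAI, one common way to handle the key is to read it from the `OPENAI_API_KEY` environment variable (the variable the official OpenAI Python client reads by default) and fail early if it is absent, before any paid request is made. The sketch below assumes that convention. The helpers `require_api_key` and `mask` are hypothetical and are not part of VideoRAG.

```python
import os


def require_api_key(env=None, var="OPENAI_API_KEY"):
    """Return the API key from the environment, or raise if it is not set."""
    env = os.environ if env is None else env
    key = env.get(var, "").strip()
    if not key:
        raise RuntimeError(f"{var} is not set; export it before running the evaluation.")
    return key


def mask(key, visible=4):
    """Hide all but the last few characters so the key can be logged safely."""
    if len(key) <= visible:
        return "*" * len(key)
    return "*" * (len(key) - visible) + key[-visible:]
```

Keeping the key out of source files and printing only the masked form reduces the chance of leaking it through commits or logs.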

Health Check

Last Commit: 2 weeks ago
Responsiveness: 1 day

Star History

946 stars in the last 30 days