PyTorch code for retrieval-augmented generation with long-context videos
VideoRAG is a PyTorch framework for retrieval-augmented generation designed to process and understand extremely long-context videos, enabling users to query vast amounts of video content. It targets researchers and developers working with extensive video datasets, offering a structured approach to knowledge extraction and question answering from hours of footage.
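To illustrate the intended workflow, here is a hypothetical quick-start sketch. The import path, class names, and method signatures are illustrative assumptions, not the repository's documented API; consult the VideoRAG README for the real interface.

```python
# Hypothetical quick-start sketch; all names below are assumptions.
from videorag import VideoRAG, QueryParam  # assumed module layout

rag = VideoRAG(working_dir="./videorag-workdir")  # assumed constructor

# Index hours of footage once; transcripts, knowledge graphs, and visual
# embeddings would be built during this step and reused across queries.
rag.insert_video(video_path_list=["lecture_part1.mp4", "lecture_part2.mp4"])

# Ask a question whose answer may span multiple videos.
answer = rag.query(
    query="What safety precautions were demonstrated across the lectures?",
    param=QueryParam(mode="videorag"),  # assumed retrieval-mode flag
)
print(answer)
```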
How It Works
VideoRAG employs a novel dual-channel architecture: graph-driven textual knowledge grounding, which models cross-video semantic relationships, paired with hierarchical multimodal context encoding, which preserves spatiotemporal visual patterns. The framework dynamically constructs knowledge graphs to maintain semantic coherence across multiple videos and optimizes retrieval efficiency through adaptive multimodal fusion.
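To make the dual-channel idea concrete, here is a minimal, self-contained sketch, not VideoRAG's actual implementation: one channel scores video chunks through an entity-to-chunk mapping of the kind a textual knowledge graph would provide, the other through visual-embedding similarity (using hnswlib, which appears in the requirements), and a weighted fusion combines the two. The fixed fusion weight stands in for the adaptive fusion the project describes.

```python
# Conceptual dual-channel retrieval sketch (illustrative names throughout).
import numpy as np
import hnswlib  # listed in the requirements; used here for the visual index

# --- Channel 1: graph-driven textual grounding (toy adjacency structure) ---
# entity -> set of chunk ids whose transcripts mention it
entity_to_chunks = {
    "volcano": {0, 2},
    "eruption": {2},
    "lava": {2, 3},
}

def text_scores(query_entities, num_chunks):
    scores = np.zeros(num_chunks)
    for ent in query_entities:
        for chunk in entity_to_chunks.get(ent, ()):
            scores[chunk] += 1.0
    return scores / max(scores.max(), 1.0)  # normalize to [0, 1]

# --- Channel 2: visual-embedding similarity over chunk embeddings ---
dim, num_chunks = 64, 4
rng = np.random.default_rng(0)
chunk_embs = rng.standard_normal((num_chunks, dim)).astype(np.float32)

index = hnswlib.Index(space="cosine", dim=dim)
index.init_index(max_elements=num_chunks)
index.add_items(chunk_embs, np.arange(num_chunks))

def visual_scores(query_emb):
    labels, dists = index.knn_query(query_emb, k=num_chunks)
    scores = np.zeros(num_chunks)
    scores[labels[0]] = 1.0 - dists[0]  # cosine distance -> similarity
    return scores

# --- Fusion: weighted combination of the two channels ---
query_emb = rng.standard_normal((1, dim)).astype(np.float32)
alpha = 0.5  # fixed here; VideoRAG describes adapting this per query
fused = alpha * text_scores({"volcano", "lava"}, num_chunks) \
      + (1 - alpha) * visual_scores(query_emb)
print("chunk ranking (best first):", np.argsort(-fused))
```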
Quick Start & Requirements
Requirements: accelerate, bitsandbytes, moviepy, pytorchvideo, timm, fvcore, eva-decord, ctranslate2, faster_whisper, neo4j, hnswlib, xxhash, nano-vectordb, transformers, tiktoken, openai, tenacity, and ImageBind (installed from source).

Highlighted Details
Maintenance & Community
The project is associated with HKUDS and cites foundational work from nano-graphrag and LightRAG. Further community engagement details are not explicitly provided in the README.
Licensing & Compatibility
The repository lists a LICENSE file, but the specific license type and its implications for commercial use or closed-source linking are not detailed in the README.
Limitations & Caveats
The framework has currently been tested only in an English environment; multi-language support requires modifying the WhisperModel configuration. The evaluation process uploads requests to OpenAI, which may incur costs and requires API key management.
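For reference, a hedged sketch of what such a modification might look like. Where exactly VideoRAG constructs its WhisperModel is an assumption, but the faster_whisper calls below are that library's real public API.

```python
# Enabling non-English transcription with faster_whisper (a listed dependency).
from faster_whisper import WhisperModel

model = WhisperModel("large-v2", device="cuda", compute_type="float16")

# language=None asks Whisper to auto-detect instead of assuming English;
# a fixed code such as "zh" or "de" pins transcription to that language.
segments, info = model.transcribe("video_audio.wav", language=None)

print(f"detected language: {info.language} (p={info.language_probability:.2f})")
for seg in segments:
    print(f"[{seg.start:.1f}s -> {seg.end:.1f}s] {seg.text}")
```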