video-search-and-summarization by NVIDIA-AI-Blueprints

Video analytics and Q&A powered by generative AI

Created 1 year ago

415 stars

Top 70.8% on SourcePulse

Project Summary

This NVIDIA AI Blueprint provides a framework for ingesting and analyzing large video datasets to generate insights and summaries and to support interactive Q&A. It targets video analysts, IT engineers, and GenAI/ML engineers who want to build custom video analytics AI agents, and it offers a plug-and-play setup with extensive customization options for advanced users. The blueprint uses NVIDIA NIM microservices and generative AI models to bring video understanding to applications such as smart space monitoring and warehouse automation.

How It Works

The system processes video through an ingestion pipeline that decodes video segments, selects frames from each segment, and generates detailed captions with a Vision-Language Model (VLM). In parallel, it produces computer vision metadata and audio transcriptions. This enriched data is indexed into both a vector database and a graph database. The core intelligence resides in the Context-Aware Retrieval-Augmented Generation (CA-RAG) module, which combines Vector RAG and Graph-RAG. Retrieving context from both databases improves temporal reasoning, anomaly detection, and multi-hop question answering over long videos and large video collections.
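To make the dual-retrieval idea concrete, here is a toy sketch, assuming hand-made caption embeddings and a small entity graph held in memory. It is not the blueprint's CA-RAG code: the captions, vectors, graph edges, and function names (vector_retrieve, graph_retrieve, retrieve_context) are all hypothetical. Vector retrieval ranks captions by cosine similarity to the query, while graph retrieval walks relations outward from entities named in the query, which is what enables multi-hop answers.

```python
import math
from collections import deque

# Hypothetical "vector database": chunk captions with hand-made embeddings.
CAPTIONS = {
    "chunk-0": ("00:00-00:10 A forklift enters aisle 3.", [0.9, 0.1, 0.0]),
    "chunk-1": ("00:10-00:20 A worker walks past aisle 3 without a vest.", [0.2, 0.8, 0.1]),
    "chunk-2": ("00:20-00:30 A pallet falls near the loading dock.", [0.1, 0.2, 0.9]),
}

# Hypothetical "graph database": entity -> list of (relation, entity) edges.
GRAPH = {
    "forklift": [("enters", "aisle 3")],
    "worker": [("walks past", "aisle 3")],
    "aisle 3": [("adjacent to", "loading dock")],
    "loading dock": [("location of", "pallet fall")],
}

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def vector_retrieve(query_vec, k=2):
    """Return the k captions whose embeddings are most similar to the query."""
    ranked = sorted(CAPTIONS.values(), key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

def graph_retrieve(entities, max_hops=2):
    """Breadth-first walk from the query entities; each traversed edge becomes a fact."""
    facts, seen = [], set(entities)
    queue = deque((entity, 0) for entity in entities)
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue  # do not expand beyond the hop limit
        for relation, target in GRAPH.get(node, []):
            facts.append(f"{node} {relation} {target}")
            if target not in seen:
                seen.add(target)
                queue.append((target, depth + 1))
    return facts

def retrieve_context(query_vec, entities, k=2, max_hops=2):
    """Merge both retrievers' results, dropping duplicates while keeping order."""
    merged = vector_retrieve(query_vec, k) + graph_retrieve(entities, max_hops)
    return list(dict.fromkeys(merged))
```

For a query like "Where did the forklift go, and what is next to it?", a query vector close to chunk-0 plus the entity "forklift" returns the two nearest captions and the two-hop chain "forklift enters aisle 3", "aisle 3 adjacent to loading dock". In the real system this merged context would be passed to an LLM; that step is omitted here.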

Quick Start & Requirements

Installation: Deployment options include Docker Compose, Helm charts (for x86 platforms), and Brev Launchable notebooks.
Prerequisites: Requires an NVIDIA AI Enterprise developer license for locally hosted NIMs, NVIDIA API catalog keys, a supported NVIDIA driver (e.g., 580.65.06+), CUDA 13.0+, NVIDIA Container Toolkit 1.13.5+, and Docker 27.5.1+. Helm deployments additionally require Kubernetes v1.31.2+ and NVIDIA GPU Operator v23.9+.
Hardware: Minimum GPU requirements vary significantly by deployment type and model configuration. Reduced-compute and single-GPU deployments can run on one GPU (e.g., 1x H100/A100 80GB), while the default local deployment needs up to 8 high-end GPUs (e.g., 8x H200/A100 80GB). Remote deployments require a GPU with at least 8 GB of VRAM.
Documentation: Detailed instructions are available at the official documentation link provided in the README.
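Because the prerequisites are expressed as minimum versions, a pre-flight check has to compare them numerically rather than as strings ("580.9" sorts after "580.65.06" as text but is the older driver). The following is a minimal sketch, assuming you have already collected installed versions as plain dotted strings (for example from nvidia-smi or docker --version output, parsing not shown). The component keys and function names are hypothetical; only the version numbers come from the Prerequisites line.

```python
# Minimum versions taken from the Prerequisites line; key names are hypothetical.
MINIMUM_VERSIONS = {
    "nvidia_driver": "580.65.06",
    "cuda": "13.0",
    "nvidia_container_toolkit": "1.13.5",
    "docker": "27.5.1",
    "kubernetes": "1.31.2",  # Helm deployments only
    "gpu_operator": "23.9",  # Helm deployments only
}
HELM_ONLY = {"kubernetes", "gpu_operator"}

def parse_version(text):
    """Turn '580.65.06' or 'v1.31.2' into a tuple of ints, e.g. (580, 65, 6).

    Assumes purely numeric dotted versions; suffixes such as '-ce' are not handled.
    """
    return tuple(int(part) for part in text.strip().lstrip("v").split("."))

def meets_minimum(installed, minimum):
    """Compare numerically, padding with zeros so '13.0' equals '13.0.0'."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    return a + (0,) * (width - len(a)) >= b + (0,) * (width - len(b))

def check_prerequisites(installed, helm=False):
    """Return names of components that are missing or older than the minimum."""
    failures = []
    for name, minimum in MINIMUM_VERSIONS.items():
        if name in HELM_ONLY and not helm:
            continue  # Kubernetes and GPU Operator matter only for Helm installs
        version = installed.get(name)
        if version is None or not meets_minimum(version, minimum):
            failures.append(name)
    return failures
```

A Docker Compose host reporting driver 580.65.06, CUDA 13.0, Container Toolkit 1.17.8, and Docker 27.5.1 passes with an empty failure list; the same host checked with helm=True reports the missing Kubernetes and GPU Operator entries.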

Highlighted Details

Powered by NVIDIA NIM microservices, utilizing models like Cosmos-Reason1-7B, Llama-3.1-70b-instruct, and Llama-3.2-nv-embedqa-1b-v2.
Employs Context-Aware Retrieval-Augmented Generation (CA-RAG) integrating both Vector and Graph RAG for advanced video understanding.
Offers flexible deployment strategies including Docker Compose for development, Helm for production, and Brev Launchable for quick starts.
Supports comprehensive video analysis including summarization, Q&A, and alert generation.

Maintenance & Community

The provided README does not detail specific community channels (like Discord or Slack), active maintainers, or sponsorship information.

Licensing & Compatibility

The project license is provided in the repository's LICENSE file. As an NVIDIA AI Blueprint, usage may be tied to the NVIDIA AI Enterprise license, particularly for accessing proprietary models and services.

Limitations & Caveats

The VSS (Video Search and Summarization) Engine 2.4.0 container has known CVEs: CVE-2024-8966, CVE-2025-4565, and CVE-2025-3887. The README states that these do not affect VSS, given the specific dependency versions and usage patterns involved. For CVE-2025-3887, which concerns the GStreamer H.265 codec parser, it still advises users either to ensure that no malicious streams are added or to build patched GStreamer libraries. Helm deployments are supported only on x86 platforms.
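One way to act on the "ensure malicious streams are not added" advice is to screen live-stream URLs against an allowlist before they are registered. The sketch below is a hypothetical application-level guard, not part of VSS: the host names, schemes, and function names are assumptions, and how accepted streams are then added to VSS is not shown. It does not patch GStreamer, and a compromised trusted camera could still deliver a malicious stream, so rebuilding with patched GStreamer libraries remains the stronger fix.

```python
from urllib.parse import urlparse

# Hypothetical allowlist: hosts of cameras you operate and trust.
TRUSTED_HOSTS = {"cam-dock-01.example.internal", "10.0.12.40"}
ALLOWED_SCHEMES = {"rtsp", "rtsps"}

def is_trusted_stream(url):
    """Accept a stream URL only if both its scheme and its host are allowlisted."""
    parsed = urlparse(url)
    # .hostname strips credentials and port and lowercases the host (None if absent).
    return parsed.scheme in ALLOWED_SCHEMES and parsed.hostname in TRUSTED_HOSTS

def filter_streams(urls):
    """Split candidate URLs into (accepted, rejected) lists, preserving order."""
    accepted, rejected = [], []
    for url in urls:
        (accepted if is_trusted_stream(url) else rejected).append(url)
    return accepted, rejected
```

Rejected URLs can be logged for review instead of being silently dropped, which makes it easier to spot misconfigured or unexpected sources.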

Explore Similar Projects

VideoMind by yeliudev

beyondllm by aiplanethub

BilibiliHistoryFetcher by LifeArchiveProject

stream.new by muxinc

aiWatchdog by mozhang-ah

autoclip_mvp by zhouxiaoka

SwanLab by SwanHubX

Director by video-db

langchain4j-aideepin by moyangzhan

VideoPipe by sherlockchou86

EFAK by smartloli

awesome-llm-apps by Shubhamsaboo