CLI tool for video analysis using LLMs, CV, and ASR
This project provides a tool for analyzing video content using Large Language Models (LLMs), Computer Vision, and Automatic Speech Recognition. It's designed for researchers and developers who need to extract detailed, natural language descriptions from videos, leveraging either local LLM deployments or cloud-based APIs.
How It Works
The system operates in three stages: frame extraction and audio processing, frame analysis, and video reconstruction. It uses OpenCV for intelligent keyframe extraction and Whisper for high-quality audio transcription. Each keyframe is then analyzed by a vision LLM (like Llama3.2 Vision) to capture details, with context from previous frames maintained. Finally, these analyses are combined chronologically with the audio transcript to generate a comprehensive video description.
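The keyframe-selection step described above can be sketched in a few lines. This is a minimal illustration of difference-based keyframe picking, not the project's actual implementation; the function name, the grayscale-array input, and the threshold value are all assumptions chosen for clarity.

```python
import numpy as np

def select_keyframes(frames, threshold=12.0):
    """Keep frames that differ noticeably from the last kept frame.

    `frames`: iterable of grayscale frames as 2-D numpy uint8 arrays.
    `threshold`: mean absolute pixel difference (0-255 scale) that
    triggers a new keyframe. Both choices are illustrative only.
    """
    keyframes = []
    last = None
    for i, frame in enumerate(frames):
        if last is None:
            # Always keep the first frame as the initial reference.
            keyframes.append((i, frame))
            last = frame
            continue
        # Compare against the last kept frame, not the previous frame,
        # so slow drifts still accumulate into a scene change.
        diff = np.abs(frame.astype(np.int16) - last.astype(np.int16)).mean()
        if diff > threshold:
            keyframes.append((i, frame))
            last = frame
    return keyframes

# Synthetic demo: three identical dark frames, then a bright scene change.
frames = [np.zeros((4, 4), dtype=np.uint8)] * 3 + [np.full((4, 4), 200, dtype=np.uint8)]
picked = [i for i, _ in select_keyframes(frames)]
print(picked)  # -> [0, 3]
```

In a real pipeline the frames would come from OpenCV (`cv2.VideoCapture`), and each selected keyframe would then be passed to the vision LLM along with the running context from prior frames.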
Quick Start & Requirements
Install with pip install . (or pip install -e . for development). To run analysis against a local vision model, pull it first with Ollama (ollama pull llama3.2-vision).
Highlighted Details
Maintenance & Community
The project welcomes contributions and provides guidelines in docs/CONTRIBUTING.md.
Licensing & Compatibility
Licensed under the Apache License 2.0, allowing for commercial use and integration with closed-source projects.
Limitations & Caveats
The project is primarily designed for Linux and macOS; Windows compatibility for local LLM execution might require additional setup. Performance is heavily dependent on the chosen LLM and hardware.