video-analyzer by byjlw

CLI tool for video analysis using LLMs, CV, and ASR

created 8 months ago
964 stars

Top 39.0% on sourcepulse

Project Summary

This project provides a tool for analyzing video content using Large Language Models (LLMs), Computer Vision, and Automatic Speech Recognition. It's designed for researchers and developers who need to extract detailed, natural language descriptions from videos, leveraging either local LLM deployments or cloud-based APIs.

How It Works

The system operates in three stages: frame extraction and audio processing, frame analysis, and video reconstruction. It uses OpenCV for intelligent keyframe extraction and Whisper for high-quality audio transcription. Each keyframe is then analyzed by a vision LLM (such as Llama 3.2 Vision), carrying context forward from previous frames. Finally, the frame analyses are combined chronologically with the audio transcript to generate a comprehensive video description.
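The three-stage flow above can be sketched in plain Python. This is an illustrative sketch only: the real tool uses OpenCV for keyframes and Whisper for transcription, whereas here frames are plain lists of pixel intensities and the "vision LLM" is a stub, so every function and field name below is hypothetical.

```python
def select_keyframes(frames, threshold=30):
    """Keep frames that differ enough from the last kept frame
    (a crude stand-in for OpenCV-based keyframe extraction)."""
    keyframes = [(0, frames[0])]
    for i, frame in enumerate(frames[1:], start=1):
        prev = keyframes[-1][1]
        diff = sum(abs(a - b) for a, b in zip(frame, prev)) / len(frame)
        if diff >= threshold:
            keyframes.append((i, frame))
    return keyframes

def analyze_frame(index, frame, context):
    """Stand-in for the vision-LLM call; receives context from prior frames."""
    description = f"frame {index}: mean intensity {sum(frame) / len(frame):.0f}"
    return {"frame": index, "description": description, "context": list(context)}

def describe_video(frames, transcript):
    """Stage 1: keyframes; stage 2: per-frame analysis with running context;
    stage 3: combine chronological analyses with the audio transcript."""
    keyframes = select_keyframes(frames)
    context, analyses = [], []
    for index, frame in keyframes:
        analysis = analyze_frame(index, frame, context)
        analyses.append(analysis)
        context.append(analysis["description"])
    return {"frame_analyses": analyses, "transcript": transcript}

result = describe_video(
    frames=[[10, 10, 10], [12, 11, 10], [200, 190, 180], [201, 191, 181]],
    transcript="hello world",
)
print([a["frame"] for a in result["frame_analyses"]])  # → [0, 2]
```

Near-duplicate frames (indices 1 and 3) fall below the difference threshold and are skipped, which is the point of keyframe selection: the expensive per-frame LLM call only runs on frames that add new visual information.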

Quick Start & Requirements

  • Install: pip install . or pip install -e . for development.
  • Prerequisites: Python 3.11+, FFmpeg.
  • Local LLM Requirements: 16GB RAM (32GB recommended), GPU with 12GB+ VRAM or Apple M Series with 32GB+ RAM.
  • Setup: Requires installing FFmpeg and optionally Ollama with a vision model (ollama pull llama3.2-vision).
  • Docs: USAGES.md, DESIGN.md

Highlighted Details

  • Can run entirely locally, with no cloud services or API keys required.
  • Supports OpenAI-compatible LLM services (OpenRouter, OpenAI) for scalability.
  • Features intelligent keyframe extraction and high-quality audio transcription.
  • Generates detailed JSON output including frame-by-frame analysis and a final description.
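The JSON output mentioned above might look roughly like the following. The exact schema is not documented here, so all field names are assumptions chosen to illustrate the idea of frame-by-frame analyses plus a final description in one document.

```python
import json

# Hypothetical output shape; the real tool's field names may differ.
output = {
    "metadata": {"video": "demo.mp4", "frames_analyzed": 2},
    "transcript": {"text": "hello and welcome"},
    "frame_analyses": [
        {"frame": 0, "timestamp": 0.0, "description": "title card"},
        {"frame": 42, "timestamp": 1.4, "description": "speaker at a desk"},
    ],
    "final_description": "A short clip of a speaker greeting the viewer.",
}

# Round-trip through JSON to show the document serializes cleanly.
text = json.dumps(output, indent=2)
parsed = json.loads(text)
print(parsed["frame_analyses"][1]["description"])  # → speaker at a desk
```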

Maintenance & Community

The project welcomes contributions and provides guidelines in docs/CONTRIBUTING.md.

Licensing & Compatibility

Licensed under the Apache License 2.0, allowing for commercial use and integration with closed-source projects.

Limitations & Caveats

The project is primarily designed for Linux and macOS; Windows compatibility for local LLM execution might require additional setup. Performance is heavily dependent on the chosen LLM and hardware.

Health Check

  • Last commit: 3 months ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 2
Star History
186 stars in the last 90 days

