claude-video-vision by jordanrendric

Claude plugin for video and audio perception

Created 1 month ago
685 stars

Top 49.2% on SourcePulse

Project Summary

This Claude Code plugin lets Claude process and understand video content by extracting visual frames and analyzing the audio track. It targets Claude Code users who need video analysis in their workflows, acting as a perception layer that gives Claude direct visual and auditory data from a video.

How It Works

The plugin functions as a perception layer for Claude, leveraging ffmpeg for frame extraction and offering flexible backends for audio processing, including the Gemini API, local Whisper (via whisper.cpp or openai-whisper), or the OpenAI API. Video frames are sent to Claude as images, while audio is provided as transcriptions with timestamps. This multimodal approach allows Claude to "see" video frames directly and "hear" the audio content, with adaptive extraction capabilities that automatically adjust frame rate, resolution, and time range based on the user's query.
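
The frame-extraction step described above can be sketched as a small helper that builds an ffmpeg command line. This is only an illustration under stated assumptions, not the plugin's actual code: the function name, its parameters, and the default values are hypothetical. The ffmpeg options used (-ss, -t, -i, -vf with the fps and scale filters, -q:v) are standard ones.

```python
from typing import List, Optional


def build_ffmpeg_frame_command(
    video_path: str,
    output_pattern: str,
    fps: float = 1.0,
    width: int = 512,
    start: Optional[float] = None,
    duration: Optional[float] = None,
) -> List[str]:
    """Build an ffmpeg argument list that samples frames from a video.

    Hypothetical helper: the name, parameters, and defaults are assumptions
    made for illustration, not taken from the plugin.
    """
    cmd = ["ffmpeg", "-hide_banner", "-loglevel", "error"]
    # Placing -ss and -t before -i applies them to the input, so ffmpeg
    # seeks to the start time and reads only the requested time range.
    if start is not None:
        cmd += ["-ss", f"{start:.3f}"]
    if duration is not None:
        cmd += ["-t", f"{duration:.3f}"]
    cmd += ["-i", video_path]
    # fps=N keeps N frames per second; scale=W:-2 resizes to width W and
    # picks an even height that preserves the aspect ratio.
    cmd += ["-vf", f"fps={fps:g},scale={width}:-2"]
    # -q:v sets JPEG quality (lower is better); %04d in the output pattern
    # numbers the extracted frames.
    cmd += ["-q:v", "3", output_pattern]
    return cmd


if __name__ == "__main__":
    # One frame every two seconds, 640 px wide, from 1:30 for 30 seconds.
    print(" ".join(build_ffmpeg_frame_command(
        "in.mp4", "frames/%04d.jpg", fps=0.5, width=640, start=90, duration=30)))
```

Returning an argument list rather than a shell string means the result can be passed straight to subprocess.run without shell quoting concerns.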

Quick Start & Requirements

To install, run /plugin marketplace add https://github.com/jordanrendric/claude-video-vision and then /plugin install claude-video-vision inside Claude Code. The MCP server installs itself via npx. Prerequisites are Node.js 20+ and ffmpeg (the setup wizard provides installation guidance), plus an API key if you use the Gemini or OpenAI backend. The local Whisper backend may require brew install whisper-cpp on macOS. An interactive setup wizard (/setup-video-vision) walks through configuration, and Whisper models download automatically on first use.
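
Before installing, it can help to confirm the two hard prerequisites named above. The following is a rough sketch of such a check, not part of the plugin: the function names are hypothetical, and it only assumes that node --version prints a string like v20.11.1 and that ffmpeg is discoverable on the PATH.

```python
import re
import shutil
import subprocess
from typing import List, Optional

MIN_NODE_MAJOR = 20  # from the stated "Node.js 20+" requirement


def parse_node_major(version_output: str) -> Optional[int]:
    """Extract the major version from `node --version` output such as 'v20.11.1'."""
    match = re.match(r"\s*v?(\d+)\.", version_output)
    return int(match.group(1)) if match else None


def missing_prerequisites(
    node_version_output: Optional[str], ffmpeg_path: Optional[str]
) -> List[str]:
    """Return the prerequisites that appear to be missing (hypothetical helper)."""
    missing = []
    major = parse_node_major(node_version_output) if node_version_output else None
    if major is None or major < MIN_NODE_MAJOR:
        missing.append("Node.js 20+")
    if not ffmpeg_path:
        missing.append("ffmpeg")
    return missing


def detect_node_version() -> Optional[str]:
    """Run `node --version`; return None if node is absent or the call fails."""
    try:
        result = subprocess.run(
            ["node", "--version"], capture_output=True, text=True, check=True
        )
    except (OSError, subprocess.CalledProcessError):
        return None
    return result.stdout


if __name__ == "__main__":
    problems = missing_prerequisites(detect_node_version(), shutil.which("ffmpeg"))
    print("All prerequisites found." if not problems else "Missing: " + ", ".join(problems))
```

API keys and the optional whisper-cpp install are left out of this check because they depend on which audio backend is chosen.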

Highlighted Details

  • Multimodal Perception: Claude receives video frames as images and audio transcriptions with timestamps, enabling direct visual and auditory understanding.
  • Flexible Backends: Supports Gemini API, local Whisper (fully offline option), and OpenAI API for audio processing.
  • Adaptive Extraction: Automatically adjusts frame rate, resolution, and time range based on the user's specific question about the video.
  • Auto-Installation: Whisper models are downloaded automatically to ~/.claude-video-vision/models/ on first use.
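
One way to picture the adaptive extraction listed above is a frame budget: sample densely on short clips and sparsely on long ones so the number of images sent to Claude stays bounded. The sketch below is purely illustrative. The budget of 40 frames, the 2 fps cap, and the function name are assumptions, and the plugin's real heuristics are not described in this summary.

```python
from typing import Tuple


def choose_sampling(
    duration_s: float, max_frames: int = 40, max_fps: float = 2.0
) -> Tuple[float, int]:
    """Pick a sampling rate that keeps the frame count within a budget.

    Hypothetical heuristic: max_frames and max_fps are illustrative defaults,
    not values taken from the plugin.
    """
    if duration_s <= 0 or max_frames <= 0 or max_fps <= 0:
        raise ValueError("duration_s, max_frames, and max_fps must be positive")
    # Short clips are capped by max_fps; long clips are capped by the budget.
    fps = min(max_fps, max_frames / duration_s)
    # Round to absorb floating-point error, then clamp to the range [1, max_frames].
    frames = min(max_frames, max(1, round(duration_s * fps)))
    return fps, frames


if __name__ == "__main__":
    for seconds in (10, 60, 600):
        fps, frames = choose_sampling(seconds)
        print(f"{seconds:>4}s clip: {fps:.3f} fps, about {frames} frames")
```

A narrower time range raises the effective sampling density under the same budget, which is one plausible reason to restrict the range when a question concerns a specific moment in the video.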

Maintenance & Community

The project is at v1.0.0, its initial release. It has been tested on macOS (Apple Silicon) using the local backend with whisper.cpp. The primary contributor is Jordan Vasconcelos, reachable on GitHub as @jordanrendric.

Licensing & Compatibility

The project is released under the MIT License, a permissive license that allows commercial use and integration into closed-source projects, provided the copyright and license notice are retained.

Limitations & Caveats

As an initial release (v1.0.0), the plugin's stability and feature set are likely to evolve. Testing has focused on macOS with the local Whisper backend, so behavior and performance may differ on other operating systems or with other backend configurations.

Health Check

Last Commit: 1 week ago
Responsiveness: Inactive
Pull Requests (30d): 17
Issues (30d): 9
Star History: 335 stars in the last 30 days
