jordanrendric
Claude plugin for video and audio perception
Top 49.2% on SourcePulse
This Claude Code plugin lets Claude process and understand video content by extracting visual frames and transcribing audio. It targets Claude Code users who need video analysis in their workflows, acting as a perception layer that supplies Claude with visual and auditory data taken directly from the video.
How It Works
The plugin functions as a perception layer for Claude, leveraging ffmpeg for frame extraction and offering flexible backends for audio processing, including the Gemini API, local Whisper (via whisper.cpp or openai-whisper), or the OpenAI API. Video frames are sent to Claude as images, while audio is provided as transcriptions with timestamps. This multimodal approach allows Claude to "see" video frames directly and "hear" the audio content, with adaptive extraction capabilities that automatically adjust frame rate, resolution, and time range based on the user's query.
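The frame-extraction step described above can be sketched as a small Python helper that builds an ffmpeg command line. This is an illustrative sketch only, not the plugin's actual invocation: the function name build_frame_extraction_cmd, the default values, and the output file pattern are all assumptions. The ffmpeg options used (-ss, -t, -i, -vf with the fps and scale filters, -q:v) are standard ffmpeg options.

```python
from typing import List, Optional


def build_frame_extraction_cmd(
    video_path: str,
    out_dir: str,
    fps: float = 1.0,
    width: int = 512,
    start: Optional[float] = None,
    end: Optional[float] = None,
) -> List[str]:
    """Build an ffmpeg argument list that samples frames as JPEG files.

    Hypothetical helper: the parameter names and defaults are assumptions
    chosen for illustration, not values taken from the plugin.
    """
    cmd = ["ffmpeg", "-hide_banner", "-y"]

    offset = start if start is not None else 0.0
    if start is not None:
        # Placing -ss before -i seeks in the input, which is fast.
        cmd += ["-ss", f"{start:.3f}"]
    if end is not None:
        # -t limits the duration read from the input, measured from the seek point.
        cmd += ["-t", f"{end - offset:.3f}"]

    cmd += ["-i", video_path]
    # fps sets how many frames per second are kept; scale resizes to the
    # requested width, with -2 picking an even height that keeps the aspect ratio.
    cmd += ["-vf", f"fps={fps},scale={width}:-2"]
    # JPEG quality for the output frames (lower means higher quality).
    cmd += ["-q:v", "3"]
    cmd.append(f"{out_dir}/frame_%05d.jpg")
    return cmd


if __name__ == "__main__":
    # Sample one frame every two seconds between 0:10 and 0:25.
    print(" ".join(build_frame_extraction_cmd("talk.mp4", "frames", fps=0.5, start=10, end=25)))
```

Returning the argument list rather than running ffmpeg keeps the sketch easy to inspect; a caller could pass the list to subprocess.run once the output directory exists. Lowering fps or width is one plausible way an adaptive layer could trade detail for fewer, smaller images on long videos.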
Quick Start & Requirements
To install, run /plugin marketplace add https://github.com/jordanrendric/claude-video-vision followed by /plugin install claude-video-vision within Claude Code. The MCP server auto-installs via npx. Prerequisites are Node.js 20+, ffmpeg (the setup wizard provides installation guidance), and, for the cloud audio backends, an API key for Gemini or OpenAI. Local Whisper setup may require brew install whisper-cpp on macOS. An interactive setup wizard (/setup-video-vision) guides users through configuration, and Whisper models auto-download on first use.
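Before running the install commands, the two local prerequisites can be checked with a short script. The sketch below is an assumption-laden illustration and not part of the plugin: the function names parse_node_major and check_prerequisites are hypothetical, and the script only confirms that node reports a major version of 20 or higher and that an ffmpeg executable is on the PATH.

```python
import re
import shutil
import subprocess
from typing import Dict, Optional

MIN_NODE_MAJOR = 20  # the listing states Node.js 20+ is required


def parse_node_major(version_output: str) -> Optional[int]:
    """Extract the major version from `node --version` output such as 'v20.11.1'."""
    match = re.match(r"v?(\d+)\.", version_output.strip())
    return int(match.group(1)) if match else None


def check_prerequisites() -> Dict[str, bool]:
    """Report whether Node.js (20+) and ffmpeg appear to be available."""
    results = {"node": False, "ffmpeg": shutil.which("ffmpeg") is not None}

    node = shutil.which("node")
    if node:
        try:
            out = subprocess.run(
                [node, "--version"], capture_output=True, text=True, timeout=10
            ).stdout
        except (OSError, subprocess.SubprocessError):
            out = ""
        major = parse_node_major(out)
        results["node"] = major is not None and major >= MIN_NODE_MAJOR

    return results


if __name__ == "__main__":
    for tool, ok in check_prerequisites().items():
        print(f"{tool}: {'ok' if ok else 'missing or too old'}")
```

The check does not cover API keys or a local Whisper install, since those depend on which audio backend is chosen in the setup wizard.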
Highlighted Details
Whisper models auto-download to ~/.claude-video-vision/models/ on first use.
Maintenance & Community
This project is at v1.0.0, its initial release. It has been tested on macOS (Apple Silicon) using the local backend with whisper.cpp. The primary contributor is Jordan Vasconcelos, reachable on GitHub as @jordanrendric.
Licensing & Compatibility
The project is released under the MIT License, which is permissive and generally allows for commercial use and integration into closed-source projects without significant restrictions.
Limitations & Caveats
As an initial release (v1.0.0), the plugin's stability and feature set are likely to evolve. Testing has primarily focused on macOS with the local Whisper backend, suggesting potential compatibility or performance differences on other operating systems or with different backend configurations.