bradautomates: AI video analysis and summarization tool
Top 28.2% on SourcePulse
This project gives Claude the ability to process and understand video content, working around the fact that AI models cannot directly "watch" videos. It targets users who need deep analysis of video material, such as content creators, marketers, and researchers, and lets Claude draw insights from both the visual and audio streams.
How It Works
The system uses yt-dlp to download video from a wide range of public URLs, and it also accepts local files. ffmpeg then extracts frames at an adaptive rate, adjusting the frame budget to the video's duration to keep token usage in check. For the transcript, it prefers free native captions; when none are available, it falls back to a Whisper API (preferably Groq's whisper-large-v3 for cost and speed, or OpenAI's whisper-1). The combined data (timestamped transcript and visual frames) is then passed to Claude's multimodal capabilities for analysis.
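The adaptive frame rate and the transcript fallback described above can be sketched roughly as follows. This is an illustrative sketch, not the project's code: the names frame_rate_for, ffmpeg_frame_command and pick_transcript_source are hypothetical, and the budget values (80 frames, at most 1 frame per second) are assumptions chosen only to show the idea.

```python
from typing import List, Optional

# Hypothetical tuning values; the project's real budget is not stated in the source.
MAX_FRAMES = 80   # assumed cap on the number of frames sent to the model
MAX_FPS = 1.0     # assumed upper bound on the sampling rate


def frame_rate_for(duration_s: float,
                   max_frames: int = MAX_FRAMES,
                   max_fps: float = MAX_FPS) -> float:
    """Return a sampling rate (frames/second) that keeps the frame count within budget."""
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    # Short videos are sampled at max_fps; longer ones are sampled more sparsely.
    return min(max_fps, max_frames / duration_s)


def ffmpeg_frame_command(video_path: str, out_dir: str, fps: float) -> List[str]:
    """Build (but do not run) an ffmpeg command that samples frames with the fps filter."""
    return ["ffmpeg", "-i", video_path,
            "-vf", f"fps={fps:.4f}",
            f"{out_dir}/frame_%04d.jpg"]


def pick_transcript_source(has_captions: bool,
                           groq_key: Optional[str],
                           openai_key: Optional[str]) -> str:
    """Prefer native captions, then Groq Whisper, then OpenAI Whisper."""
    if has_captions:
        return "captions"
    if groq_key:
        return "groq:whisper-large-v3"
    if openai_key:
        return "openai:whisper-1"
    return "none"  # no captions and no API key: frames only


if __name__ == "__main__":
    fps = frame_rate_for(800)  # a video of roughly 13 minutes
    print(ffmpeg_frame_command("input.mp4", "frames", fps))
    print(pick_transcript_source(False, None, "sk-example"))
```

Under these assumed values, a one-minute clip is sampled at the full rate while an 800-second video drops to about one frame every ten seconds, which is the general trade-off the paragraph describes.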
Quick Start & Requirements
Plugin marketplace: /plugin marketplace add bradautomates/claude-video then /plugin install watch@claude-video.
Skill file: download watch.skill from releases and upload via Settings → Capabilities → Skills.
Codex skills directory: git clone https://github.com/bradautomates/claude-video.git ~/.codex/skills/watch
Claude skills directory: git clone https://github.com/bradautomates/claude-video.git ~/.claude/skills/watch
Prerequisites: yt-dlp and ffmpeg are required. These are automatically installed via brew on macOS during the first run, with specific commands provided for Linux and Windows. An API key for Whisper transcription (Groq or OpenAI) is necessary only for videos lacking native captions.
Repository: https://github.com/bradautomates/claude-video
Highlighted Details
yt-dlp integration.
Maintenance & Community
The README does not detail specific contributors, sponsorships, or community channels (e.g., Discord/Slack). Maintenance appears driven by the single author, bradautomates.
Licensing & Compatibility
Limitations & Caveats
The tool does not support private platforms or videos requiring authentication. Whisper transcription has an approximate 50-minute audio limit per file, necessitating the use of native captions or focused --start/--end flags for longer content. Videos exceeding 10 minutes may trigger a "sparse scan" warning, recommending focused analysis for optimal results and token efficiency.
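One way to work within the approximate 50-minute Whisper limit is to plan consecutive windows ahead of time and analyze each one with --start/--end. The sketch below only computes the window boundaries; the helper names are hypothetical, and the HH:MM:SS format is an assumption, since the source does not say which timestamp format the flags accept.

```python
from typing import List, Tuple

WHISPER_LIMIT_S = 50 * 60  # approximate per-file audio limit mentioned above


def to_hms(seconds: int) -> str:
    """Format a second count as HH:MM:SS (assumed format, see lead-in)."""
    hours, rem = divmod(seconds, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def plan_windows(duration_s: int,
                 limit_s: int = WHISPER_LIMIT_S) -> List[Tuple[str, str]]:
    """Split a video into consecutive (start, end) windows no longer than limit_s."""
    if duration_s <= 0 or limit_s <= 0:
        raise ValueError("duration_s and limit_s must be positive")
    return [
        (to_hms(start), to_hms(min(start + limit_s, duration_s)))
        for start in range(0, duration_s, limit_s)
    ]


if __name__ == "__main__":
    # A two-hour video yields three windows: 50, 50 and 20 minutes.
    for start, end in plan_windows(2 * 60 * 60):
        print(f"--start {start} --end {end}")
```

Shorter windows also raise the frame density within each pass, which may help avoid the sparse-scan warning on long videos.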