claude-video by bradautomates

AI video analysis and summarization tool

Created 2 months ago

7,215 stars

Top 7.1% on SourcePulse

Project Summary

This project lets Claude process and understand video content, working around the fact that AI models cannot directly "watch" videos. It targets users who need deep analysis of video material, such as content creators, marketers, and researchers, by giving the model both the visual and the audio stream to draw insights from.

How It Works

The system uses yt-dlp to download video from a wide range of public URLs; local files are also accepted. ffmpeg then extracts frames at an adaptive rate, adjusting the frame budget to the video's duration to limit token usage. For the transcript, the tool prefers free native captions; when none are available, it falls back to a Whisper API (preferably Groq's whisper-large-v3 for cost and speed, or OpenAI's whisper-1). The combined data, a timestamped transcript plus the extracted frames, is then passed to Claude's multimodal capabilities for analysis.
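The adaptive frame rate described above can be illustrated with a minimal sketch. It assumes a fixed cap on the number of frames and a base sampling rate for short clips; `MAX_FRAMES`, `BASE_FPS`, and both function names are hypothetical and are not taken from the project. The only external detail relied on is ffmpeg's `fps` video filter.

```python
# Hypothetical values for illustration; the project's actual budget may differ.
MAX_FRAMES = 60   # assumed cap on frames passed to the model
BASE_FPS = 1.0    # assumed sampling rate for short clips


def sampling_fps(duration_s: float, max_frames: int = MAX_FRAMES,
                 base_fps: float = BASE_FPS) -> float:
    """Return a frame rate that keeps the total frame count within max_frames."""
    if duration_s <= 0:
        raise ValueError("duration must be positive")
    # Short clips use the base rate; longer clips are sampled more sparsely.
    return min(base_fps, max_frames / duration_s)


def ffmpeg_frame_command(video_path: str, out_dir: str, duration_s: float) -> list[str]:
    """Build (but do not run) an ffmpeg command that samples frames at that rate."""
    fps = sampling_fps(duration_s)
    return [
        "ffmpeg", "-i", video_path,
        "-vf", f"fps={fps:.4f}",          # ffmpeg's fps filter sets the output frame rate
        f"{out_dir}/frame_%04d.jpg",      # numbered JPEG frames
    ]
```

Under these assumed numbers, a 30-second clip is sampled at one frame per second, while a 10-minute video drops to one frame every 10 seconds so the total stays at 60 frames.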

Quick Start & Requirements

Primary install / run command (varies by platform):
- Claude Code: /plugin marketplace add bradautomates/claude-video then /plugin install watch@claude-video.
- claude.ai (web): Download watch.skill from releases and upload via Settings → Capabilities → Skills.
- Codex: git clone https://github.com/bradautomates/claude-video.git ~/.codex/skills/watch.
- Manual/Dev: git clone https://github.com/bradautomates/claude-video.git ~/.claude/skills/watch.
Non-default prerequisites: yt-dlp and ffmpeg are required. These are automatically installed via brew on macOS during the first run, with specific commands provided for Linux and Windows. An API key for Whisper transcription (Groq or OpenAI) is necessary only for videos lacking native captions.
Estimated setup time or resource footprint: Initial setup involves dependency checks and potential installations, followed by optional API key configuration. Subsequent runs are fast.
Links: GitHub repository: https://github.com/bradautomates/claude-video.
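To check the two prerequisites before a first run, a small script along the following lines may help. This is a sketch, not the project's own dependency check; `REQUIRED_TOOLS` and `missing_tools` are hypothetical names, and the lookup relies only on the standard library's `shutil.which`.

```python
import shutil

# Executables named as prerequisites in the section above.
REQUIRED_TOOLS = ("yt-dlp", "ffmpeg")


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools that cannot be found on PATH."""
    # `which` is injectable so the check can be exercised without touching PATH.
    return [tool for tool in tools if which(tool) is None]
```

Calling `missing_tools()` returns an empty list when both executables are on PATH, and otherwise names the ones that still need installing.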

Highlighted Details

- Supports numerous video sources via yt-dlp integration.
- Intelligent frame extraction balances visual detail with token cost constraints.
- Prioritizes free native captions, with efficient Whisper API fallback.
- Enables Claude to ground answers in visual evidence and audio transcripts.
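The caption-first fallback order can be sketched as a small decision function. The environment variable names `GROQ_API_KEY` and `OPENAI_API_KEY`, the function name, and the returned labels are assumptions made for illustration; the summary does not say how the tool reads its keys.

```python
import os
from typing import Mapping, Optional


def choose_transcript_source(has_captions: bool,
                             env: Optional[Mapping[str, str]] = None) -> str:
    """Pick a transcript source: native captions first, then Groq, then OpenAI."""
    env = os.environ if env is None else env
    if has_captions:
        return "captions"                 # free, no API key needed
    if env.get("GROQ_API_KEY"):           # assumed variable name
        return "groq:whisper-large-v3"    # preferred fallback for cost and speed
    if env.get("OPENAI_API_KEY"):         # assumed variable name
        return "openai:whisper-1"
    return "none"                         # no captions and no key: frames only
```

Passing `env` explicitly makes the ordering easy to inspect without changing real environment variables.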

Maintenance & Community

The README does not detail specific contributors, sponsorships, or community channels (e.g., Discord/Slack). Maintenance appears driven by the single author, bradautomates.

Licensing & Compatibility

License type: MIT License.
Compatibility notes: Works with Claude's multimodal "Read" tool. The MIT license permits commercial use, but users bear the cost of third-party APIs such as Groq or OpenAI when the Whisper fallback is used.

Limitations & Caveats

The tool does not support private platforms or videos requiring authentication. Whisper transcription has an approximate 50-minute audio limit per file, necessitating the use of native captions or focused --start/--end flags for longer content. Videos exceeding 10 minutes may trigger a "sparse scan" warning, recommending focused analysis for optimal results and token efficiency.
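The two duration thresholds above can be expressed as a quick pre-flight check. This is a sketch under the stated limits (roughly 50 minutes for Whisper audio, 10 minutes for the sparse-scan warning); the constant names, the function name, and the message wording are hypothetical and do not reproduce the tool's actual output.

```python
WHISPER_LIMIT_S = 50 * 60          # approximate Whisper audio limit per file
SPARSE_SCAN_THRESHOLD_S = 10 * 60  # beyond this, frames are spread thinly


def duration_advice(duration_s: float, has_captions: bool) -> list[str]:
    """Return advisory notes for a video of the given length."""
    notes = []
    if duration_s > SPARSE_SCAN_THRESHOLD_S:
        notes.append("sparse scan: consider narrowing with --start/--end")
    # The Whisper limit only matters when no native captions are available.
    if not has_captions and duration_s > WHISPER_LIMIT_S:
        notes.append("audio exceeds the Whisper limit: use native captions or --start/--end")
    return notes
```

For example, a one-hour video without captions yields both notes, while the same video with native captions yields only the sparse-scan note.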

Explore Similar Projects

BiliSum by lycohana

easyvideotrans by sutro-planet

Clip-Anything by SamurAIGPT

ViNote by zrt-ai-lab

remotion-video-skill by wshuyi

prompt-lens by raojiacui

yt-dlp-mcp by kevinwatt

buttercut by barefootford

claude-watch by taoufik123-collab

claude-video-vision by jordanrendric

video-analyzer by byjlw

video-use by browser-use