claude-video by bradautomates

AI video analysis and summarization tool

Created 1 month ago
1,414 stars

Top 28.2% on SourcePulse

Project Summary

This project gives Claude the ability to process and understand video content, working around the fact that AI models cannot directly "watch" a video. It targets users who need in-depth analysis of video material, such as content creators, marketers, or researchers, by letting Claude draw insights from both the visual and the audio streams.

How It Works

The system uses yt-dlp to download video from a wide range of public URLs, and it also accepts local files. ffmpeg then extracts frames at an adaptive rate, scaling the frame budget to the video's duration to keep token usage in check. For transcription, free native captions are used whenever they exist; when they are absent, the tool falls back to a Whisper API, preferring Groq's whisper-large-v3 for cost and speed, with OpenAI's whisper-1 as the alternative. The timestamped transcript and the extracted frames are then passed together to Claude's multimodal capabilities for analysis.
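The adaptive frame budget and the caption-first transcription order can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the project's code: the helper names (plan_frames, ffmpeg_frame_cmd, pick_transcript_source), the one-frame-every-2-seconds baseline, and the 8 to 80 frame clamp are all hypothetical. Only the overall policy (fewer frames per second for longer videos; captions first, then Groq's whisper-large-v3, then OpenAI's whisper-1) comes from the description above. The ffmpeg options used (-i, -vf fps=..., -q:v) are standard.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FramePlan:
    frame_count: int  # total frames to extract
    fps: float        # sampling rate handed to ffmpeg's fps filter


def plan_frames(duration_s: float, max_frames: int = 80, min_frames: int = 8) -> FramePlan:
    """Hypothetical budget: about one frame every 2 seconds, clamped to
    [min_frames, max_frames] so long videos do not blow up token usage."""
    if duration_s <= 0:
        raise ValueError("duration must be positive")
    wanted = int(duration_s / 2)  # assumed baseline density
    count = max(min_frames, min(max_frames, wanted))
    return FramePlan(frame_count=count, fps=count / duration_s)


def ffmpeg_frame_cmd(video_path: str, out_dir: str, plan: FramePlan) -> List[str]:
    """Build an ffmpeg argument list that samples JPEG frames at plan.fps."""
    return [
        "ffmpeg", "-i", video_path,
        "-vf", f"fps={plan.fps:.4f}",  # uniform sampling at the planned rate
        "-q:v", "3",                   # JPEG quality (lower is better)
        f"{out_dir}/frame_%04d.jpg",
    ]


def pick_transcript_source(has_captions: bool,
                           groq_key: Optional[str],
                           openai_key: Optional[str]) -> str:
    """Caption-first fallback order: captions, then Groq, then OpenAI."""
    if has_captions:
        return "captions"
    if groq_key:
        return "groq:whisper-large-v3"
    if openai_key:
        return "openai:whisper-1"
    return "none"  # no captions and no key: transcript unavailable


plan = plan_frames(600)  # a 10-minute video
print(plan.frame_count, ffmpeg_frame_cmd("talk.mp4", "frames", plan))
```

Clamping the frame count, rather than fixing a frame rate, is what keeps the token cost of a long video bounded: under these assumed numbers, a 10-minute video and a 2-hour video both produce 80 frames, just spaced further apart in the longer one.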

Quick Start & Requirements

  • Primary install / run command (varies by platform):
    • Claude Code: /plugin marketplace add bradautomates/claude-video then /plugin install watch@claude-video.
    • claude.ai (web): Download watch.skill from releases and upload via Settings → Capabilities → Skills.
    • Codex: git clone https://github.com/bradautomates/claude-video.git ~/.codex/skills/watch.
    • Manual/Dev: git clone https://github.com/bradautomates/claude-video.git ~/.claude/skills/watch.
  • Non-default prerequisites: yt-dlp and ffmpeg are required. These are automatically installed via brew on macOS during the first run, with specific commands provided for Linux and Windows. An API key for Whisper transcription (Groq or OpenAI) is necessary only for videos lacking native captions.
  • Estimated setup time or resource footprint: Not quantified in the README. The first run checks for (and may install) dependencies, and a Whisper API key can optionally be configured; subsequent runs skip this setup and are fast.
  • Links: GitHub repository: https://github.com/bradautomates/claude-video.
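A pre-flight check matching the prerequisites listed above can be sketched with the Python standard library. This is an illustration, not the skill's installer: missing_tools and whisper_key_missing are hypothetical helpers, and the environment variable names GROQ_API_KEY and OPENAI_API_KEY are assumed conventions that the summary above does not confirm. shutil.which is the standard way to locate an executable on PATH.

```python
import os
import shutil
from typing import Callable, List, Mapping, Optional

REQUIRED_TOOLS = ("yt-dlp", "ffmpeg")


def missing_tools(which: Callable[[str], Optional[str]] = shutil.which) -> List[str]:
    """Return the required executables that cannot be found on PATH."""
    return [tool for tool in REQUIRED_TOOLS if which(tool) is None]


def whisper_key_missing(has_captions: bool, env: Mapping[str, str]) -> bool:
    """True only when a Whisper key is actually needed (no native captions)
    and neither assumed key variable is set."""
    if has_captions:
        return False
    return not (env.get("GROQ_API_KEY") or env.get("OPENAI_API_KEY"))


if __name__ == "__main__":
    absent = missing_tools()
    if absent:
        print("Install before first use:", ", ".join(absent))
    if whisper_key_missing(has_captions=False, env=os.environ):
        print("No Groq/OpenAI key set; caption-less videos cannot be transcribed.")
```

Passing `which` and `env` as parameters keeps the checks testable without touching the real PATH or environment, and reflects the point above that a key only matters for videos without native captions.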

Highlighted Details

  • Supports numerous video sources via yt-dlp integration.
  • Intelligent frame extraction balances visual detail with token cost constraints.
  • Prioritizes free native captions, with efficient Whisper API fallback.
  • Enables Claude to ground answers in visual evidence and audio transcripts.

Maintenance & Community

The README does not name specific contributors, sponsorships, or community channels (e.g., Discord/Slack). Maintenance appears to be driven by a single author, bradautomates.

Licensing & Compatibility

  • License type: MIT License.
  • Compatibility notes: Works with Claude's multimodal "Read" tool. Commercial use is permitted under the MIT license, but users bear the cost of third-party APIs such as Groq or OpenAI whenever the Whisper fallback is used.

Limitations & Caveats

The tool does not support private platforms or videos that require authentication. Whisper transcription is limited to roughly 50 minutes of audio per file, so longer content needs either native captions or a narrower time range selected with the --start/--end flags. Videos longer than 10 minutes may trigger a "sparse scan" warning, which recommends a focused analysis of a shorter segment for better results and token efficiency.
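For a long video without captions, the roughly 50-minute Whisper limit suggests splitting the work into bounded time ranges, one run per range. The Python sketch below computes such ranges as HH:MM:SS strings that could be passed to --start/--end. The 50-minute cap and the 10-minute sparse-scan threshold come from the text above; the helper names (hms, plan_windows, is_sparse), the 45-minute default window (chosen to leave headroom under the cap), and the assumption that the flags accept HH:MM:SS timestamps are all hypothetical.

```python
from typing import List, Tuple

WHISPER_LIMIT_S = 50 * 60  # approximate per-file Whisper limit noted above
SPARSE_SCAN_S = 10 * 60    # videos longer than this may get a sparse-scan warning


def hms(seconds: int) -> str:
    """Format whole seconds as HH:MM:SS."""
    h, rem = divmod(seconds, 3600)
    m, s = divmod(rem, 60)
    return f"{h:02d}:{m:02d}:{s:02d}"


def plan_windows(duration_s: int, window_s: int = 45 * 60) -> List[Tuple[str, str]]:
    """Split [0, duration_s) into consecutive (start, end) ranges, each no
    longer than window_s, which must stay within the Whisper limit."""
    if not 0 < window_s <= WHISPER_LIMIT_S:
        raise ValueError("window must be positive and within the Whisper limit")
    windows = []
    start = 0
    while start < duration_s:
        end = min(start + window_s, duration_s)
        windows.append((hms(start), hms(end)))
        start = end
    return windows


def is_sparse(duration_s: int) -> bool:
    """Mirror the 10-minute threshold described above."""
    return duration_s > SPARSE_SCAN_S


for start, end in plan_windows(2 * 3600):  # a 2-hour caption-less video
    print(start, end)  # each pair is one hypothetical --start/--end run
```

For a two-hour video this yields three ranges: 00:00:00 to 00:45:00, 00:45:00 to 01:30:00, and 01:30:00 to 02:00:00. Each range is still longer than 10 minutes, so narrower windows would also help avoid the sparse-scan case if frame density matters more than coverage.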

Health Check

  • Last commit: 2 weeks ago
  • Responsiveness: Inactive
  • Pull requests (30d): 16
  • Issues (30d): 9
  • Star history: 1,424 stars in the last 30 days
