yt-short-clipper by jipraks

AI-powered tool for YouTube short-form video generation

Created 5 months ago

868 stars

Top 40.6% on SourcePulse

Project Summary

This project addresses the challenge of efficiently repurposing long-form YouTube content into engaging short-form videos for platforms like TikTok, Instagram Reels, and YouTube Shorts. It targets content creators and media professionals seeking to automate the time-consuming process of identifying highlights, editing, and formatting clips, thereby increasing content reach and engagement with minimal manual effort.

How It Works

The pipeline uses AI to automate the transformation from long video to short clips. It first downloads the YouTube video and its subtitles with yt-dlp. GPT-4 then analyzes the transcript to identify engaging segments of 60-120 seconds and generates hook text for the intros. Each segment is clipped, converted to a 9:16 aspect ratio with speaker tracking (OpenCV or MediaPipe), and given an AI-generated TTS voiceover for its hook. Finally, the Whisper API generates word-by-word highlighted captions, FFmpeg burns them into the video, and the result is a ready-to-publish clip with SEO-optimized metadata.
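The clipping and 9:16 conversion steps can be illustrated with a minimal sketch. This is not the project's actual code: the function names (select_segments, crop_window, ffmpeg_clip_command), the segment dictionary shape, the 1080x1920 output size, and the AAC audio codec are all assumptions made for illustration. The sketch filters candidate segments to the 60-120 second range, computes a full-height 9:16 crop window centered on a tracked face position, and builds an FFmpeg argument list.

```python
from typing import Dict, List, Tuple


def select_segments(candidates: List[Dict[str, float]],
                    min_len: float = 60.0,
                    max_len: float = 120.0) -> List[Dict[str, float]]:
    """Keep candidate segments whose duration lies in [min_len, max_len] seconds."""
    return [s for s in candidates if min_len <= s["end"] - s["start"] <= max_len]


def crop_window(src_w: int, src_h: int, face_cx: float) -> Tuple[int, int, int, int]:
    """Return (w, h, x, y) for a full-height 9:16 crop centered on face_cx.

    face_cx is the horizontal center of the tracked face in pixels
    (hypothetical input; any face detector could supply it).
    """
    # Width of a 9:16 window at full source height, rounded down to an even number.
    crop_w = min(src_w, int(src_h * 9 / 16)) // 2 * 2
    # Center on the face, then clamp so the window stays inside the frame.
    x = int(face_cx - crop_w / 2)
    x = max(0, min(x, src_w - crop_w))
    return crop_w, src_h, x, 0


def ffmpeg_clip_command(src: str, dst: str, start: float, end: float,
                        crop: Tuple[int, int, int, int]) -> List[str]:
    """Build an FFmpeg argument list that cuts [start, end] and crops to portrait."""
    w, h, x, y = crop
    return [
        "ffmpeg", "-y",
        "-ss", f"{start:.3f}",          # seek to the segment start
        "-i", src,
        "-t", f"{end - start:.3f}",     # segment duration in seconds
        "-vf", f"crop={w}:{h}:{x}:{y},scale=1080:1920",  # assumed output size
        "-c:a", "aac",                  # assumed audio codec
        dst,
    ]
```

The returned list can be passed to subprocess.run(cmd, check=True). A static crop per clip is the simplest case; following a moving speaker would require recomputing the window over time.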

Quick Start & Requirements

Install and run (for developers): clone the repository; install the system dependencies (Python 3.10+, FFmpeg 4.4+, yt-dlp); install the Python dependencies with pip install -r requirements.txt; then run python app.py.
Non-default prerequisites: API keys for AI providers (e.g., OpenAI for GPT-4, Whisper, and TTS; Google Gemini; Groq; Anthropic Claude).
Links: English Guide, Panduan Indonesia, BUILD.md, CONTRIBUTING.md.
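
Before running the app, it may help to confirm that the system dependencies listed above are present. The following is a small preflight sketch, not part of the project: the helper names missing_tools and python_ok are hypothetical, and it only checks that ffmpeg and yt-dlp are on PATH and that the interpreter is 3.10 or newer. It does not verify the FFmpeg 4.4+ version requirement.

```python
import shutil
import sys


def missing_tools(tools=("ffmpeg", "yt-dlp"), which=shutil.which):
    """Return the names of required executables that are not found on PATH."""
    return [tool for tool in tools if which(tool) is None]


def python_ok(version=sys.version_info, minimum=(3, 10)):
    """Return True if the interpreter version meets the minimum (major, minor)."""
    return tuple(version[:2]) >= tuple(minimum)


if __name__ == "__main__":
    if not python_ok():
        print("Python 3.10+ is required")
    for tool in missing_tools():
        print(f"Missing system dependency: {tool}")
```

The which parameter is injectable so the check can be exercised without the tools installed.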

Highlighted Details

AI-driven highlight detection using GPT-4 to pinpoint engaging moments.
Portrait video conversion (9:16) with two face/speaker tracking modes: fast OpenCV or more accurate MediaPipe.
Automated generation of attention-grabbing intro "hook" scenes with AI text and TTS voiceover.
CapCut-style word-by-word highlighted captions generated via OpenAI Whisper API.
Automatic generation of SEO-optimized titles, descriptions, and tags for each clip.
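
The word-by-word caption style mentioned above can be sketched as follows. This is an illustration under assumptions, not the project's implementation: it assumes word-level timestamps are already available as dictionaries with word, start, and end keys, and the function name highlight_cues, the chunk size, and the bracket marker for the active word are hypothetical. A real renderer would replace the brackets with a color or style change.

```python
from typing import Dict, List, Tuple, Union

Word = Dict[str, Union[str, float]]


def highlight_cues(words: List[Word], chunk_size: int = 3) -> List[Tuple[float, float, str]]:
    """Turn word timestamps into (start, end, text) cues with one active word each.

    Words are shown in groups of chunk_size. For every word in a group, one cue
    is emitted that displays the whole group and marks that word with brackets.
    """
    cues = []
    for i in range(0, len(words), chunk_size):
        chunk = words[i:i + chunk_size]
        for active, current in enumerate(chunk):
            text = " ".join(
                f"[{w['word']}]" if j == active else str(w["word"])
                for j, w in enumerate(chunk)
            )
            cues.append((current["start"], current["end"], text))
    return cues
```

Each cue lasts exactly as long as its active word, so the highlight advances in step with the speech.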

Maintenance & Community

Contributions are welcomed via GitHub issues for bug reports and feature requests, and pull requests for code improvements. A detailed CONTRIBUTING.md guide is available, including a Git tutorial for beginners.

Licensing & Compatibility

This project is licensed under the MIT License. The disclaimer specifies the tool is for personal/educational use only, and users must ensure they have the rights to the content they process and respect YouTube's Terms of Service.

Limitations & Caveats

The project's disclaimer limits the tool to personal/educational use; users are responsible for holding the rights to the content they process and for complying with YouTube's Terms of Service. The MediaPipe face-tracking mode is noted as 2-3 times slower than the OpenCV alternative.

Health Check

Last Commit: 5 days ago
Responsiveness: Inactive

Star History

81 stars in the last 30 days