AI-Video-Transcriber by wendy7756

Transcribe and summarize videos using AI

Created 9 months ago

2,677 stars

Top 17.1% on SourcePulse

Project Summary

This project is an AI-powered tool for transcribing and summarizing video content from more than 30 platforms, including YouTube and TikTok. It targets users who need to analyze video content efficiently, and it combines high-accuracy speech-to-text, AI-driven text optimization, and multi-language summarization so that information can be extracted from a video without watching it in full.

How It Works

The system uses yt-dlp to download video content, Faster-Whisper for speech-to-text transcription, and the OpenAI API both to optimize the transcript (correcting typos, completing sentences, structuring paragraphs) and to generate summaries in multiple languages. When the requested summary language differs from the detected transcript language, it also runs a conditional translation step using GPT-4o. The backend is a FastAPI application that exposes the API, and the frontend is a responsive, mobile-friendly web interface.
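The stages described above can be sketched as a small orchestration function. This is a minimal illustration under stated assumptions, not the project's actual code: run_pipeline, needs_translation, primary_subtag, and PipelineResult are hypothetical names, and the real stages (yt-dlp, Faster-Whisper, the OpenAI API) are passed in as plain callables so the control flow can be read and run on its own. Translating the optimized transcript, rather than the summary, is also an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class PipelineResult:
    transcript: str
    summary: str
    translation: Optional[str]  # None when no translation was needed


def primary_subtag(tag: str) -> str:
    """Reduce a language tag such as 'en-US' or 'zh_CN' to 'en' or 'zh'."""
    return tag.strip().lower().replace("_", "-").split("-")[0]


def needs_translation(detected_lang: str, summary_lang: str) -> bool:
    """True only when the requested summary language differs from the transcript's."""
    return primary_subtag(detected_lang) != primary_subtag(summary_lang)


def run_pipeline(
    url: str,
    summary_lang: str,
    download: Callable[[str], str],
    transcribe: Callable[[str], Tuple[str, str]],
    optimize: Callable[[str], str],
    summarize: Callable[[str, str], str],
    translate: Callable[[str, str], str],
) -> PipelineResult:
    # Stage 1: fetch the media (yt-dlp in the project); returns a local file path.
    audio_path = download(url)
    # Stage 2: speech-to-text; assumed to return (text, detected language).
    raw_text, detected_lang = transcribe(audio_path)
    # Stage 3: clean up typos, incomplete sentences, and paragraphing.
    transcript = optimize(raw_text)
    # Stage 4: summarize in the language the user asked for.
    summary = summarize(transcript, summary_lang)
    # Stage 5: translate only when the two languages differ.
    translation = None
    if needs_translation(detected_lang, summary_lang):
        translation = translate(transcript, summary_lang)
    return PipelineResult(transcript, summary, translation)
```

Passing the stages in as callables keeps the sketch independent of any particular library version; in a real setup each callable would wrap the corresponding tool.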

Quick Start & Requirements

Install / run: Install via the automatic script (./install.sh), Docker (docker-compose up -d), or a manual Python setup (pip install -r requirements.txt). Start the service with python3 start.py.
Non-default prerequisites: Python 3.8+ and FFmpeg. An OpenAI API key is required for the AI summary and optimization features.
Links: Project repository at https://github.com/wendy7756/AI-Video-Transcriber
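Before running any of the install options, the listed prerequisites can be verified with a short script. The following is a hedged sketch rather than part of the project: check_prerequisites is a hypothetical helper, and the environment variable name OPENAI_API_KEY is an assumption based on the OpenAI SDK convention, not something confirmed by this summary.

```python
import os
import shutil
import sys


def check_prerequisites(env=None, which=shutil.which, version=None):
    """Report which of the documented prerequisites are satisfied.

    The parameters exist only so the check can be exercised with fake
    values; by default the real environment is inspected.
    """
    env = os.environ if env is None else env
    version = sys.version_info[:2] if version is None else tuple(version)
    return {
        # The summary lists Python 3.8+ as a requirement.
        "python_3_8_plus": version >= (3, 8),
        # FFmpeg must be discoverable on PATH.
        "ffmpeg_on_path": which("ffmpeg") is not None,
        # Assumed variable name; only the AI summary/optimization features need it.
        "openai_api_key_set": bool(env.get("OPENAI_API_KEY", "").strip()),
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'ok' if ok else 'missing'}")
```

A missing API key is reported separately from the other two checks because, per the summary, transcription itself does not depend on it.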

Highlighted Details

Supports transcription and summarization for 30+ video platforms via yt-dlp.
Features AI-driven transcript optimization, including typo correction and intelligent paragraphing.
Enables multi-language summaries and conditional translation using GPT-4o.
Offers a mobile-friendly interface for accessibility.

Maintenance & Community

The README provides no details about maintainers, community channels (e.g., Discord, Slack), or a public roadmap. For contact, it points users to the issue tracker or to the primary developer directly.

Licensing & Compatibility

The README does not specify the project's license, making its terms of use and compatibility for commercial or closed-source projects unclear.

Limitations & Caveats

The AI summarization and text optimization features depend on a valid OpenAI API key; without one, functionality is reduced. Transcription speed depends on video length, the chosen Whisper model size, and hardware performance. For very long videos, users are advised to use the --prod flag to avoid SSE disconnections. Memory usage can be substantial, particularly with larger Whisper models, and 4 GB or more of RAM is recommended.
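For context on the SSE caveat: Server-Sent Events are plain-text messages sent over one long-lived HTTP response, so a long silent stretch (for example, while a large file is being transcribed) can lead proxies or browsers to drop the connection. The sketch below shows the generic wire format and a comment-line heartbeat, a common mitigation. It is an illustration only; this summary does not say how the project frames its events or what the --prod flag changes, and sse_event and sse_keepalive are hypothetical helper names.

```python
import json
from typing import Optional


def sse_event(data: dict, event: Optional[str] = None) -> str:
    """Serialize one Server-Sent Event.

    json.dumps emits a single line by default, so one 'data:' field is
    enough; a blank line terminates the event.
    """
    lines = []
    if event:
        lines.append(f"event: {event}")
    lines.append(f"data: {json.dumps(data, ensure_ascii=False)}")
    return "\n".join(lines) + "\n\n"


def sse_keepalive() -> str:
    """A comment line (leading ':') that clients ignore but that keeps the stream active."""
    return ": keep-alive\n\n"


# Example: a progress update followed by a heartbeat.
stream = sse_event({"progress": 50}, event="progress") + sse_keepalive()
```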

Health Check

Last Commit

3 weeks ago

Responsiveness

Inactive

Star History

123 stars in the last 30 days