AI-Video-Transcriber  by wendy7756

Transcribe and summarize videos using AI

Created 9 months ago
2,677 stars

Top 17.1% on SourcePulse

GitHubView on GitHub
Project Summary

This project provides an AI-powered tool for transcribing and summarizing video content from over 30 platforms, including YouTube and TikTok. It targets users needing efficient video content analysis, offering high-accuracy speech-to-text, AI-driven text optimization, and multi-language summarization capabilities, significantly streamlining the process of extracting and understanding information from video media.

How It Works

The system leverages yt-dlp for downloading video content, Faster-Whisper for accurate speech-to-text transcription, and the OpenAI API for advanced AI text optimization (correcting typos, completing sentences, structuring paragraphs) and generating summaries in multiple languages. It also includes conditional translation using GPT-4o when the summary language differs from the detected transcript language. The backend is built with FastAPI, providing a robust API, while the frontend offers a responsive, mobile-friendly interface.

Quick Start & Requirements

  • Primary install / run command: Installation can be done via an automatic script (./install.sh), Docker (docker-compose up -d), or manual Python setup (pip install -r requirements.txt). The service is started with python3 start.py.
  • Non-default prerequisites: Python 3.8+, FFmpeg. An OpenAI API key is required for AI summary and optimization features.
  • Links: Project repository: https://github.com/wendy7756/AI-Video-Transcriber.

Highlighted Details

  • Supports transcription and summarization for 30+ video platforms via yt-dlp.
  • Features AI-driven transcript optimization, including typo correction and intelligent paragraphing.
  • Enables multi-language summaries and conditional translation using GPT-4o.
  • Offers a mobile-friendly interface for accessibility.

Maintenance & Community

No specific details regarding maintainers, community channels (e.g., Discord, Slack), or a public roadmap are provided in the README. Contact is directed towards submitting issues or reaching out to the primary developer.

Licensing & Compatibility

The README does not specify the project's license, making its terms of use and compatibility for commercial or closed-source projects unclear.

Limitations & Caveats

AI-powered summarization and text optimization features are dependent on a valid OpenAI API key; functionality is reduced without one. Transcription speed is influenced by video length, chosen Whisper model size, and hardware performance. For very long videos, users are advised to use the --prod flag to prevent potential SSE disconnections. Memory usage can be substantial, particularly with larger Whisper models, with recommendations for 4GB+ RAM.

Health Check
Last Commit

3 weeks ago

Responsiveness

Inactive

Pull Requests (30d)
2
Issues (30d)
1
Star History
123 stars in the last 30 days

Explore Similar Projects

Feedback? Help us improve.