AudioToText by Carleslc

CLI tool for audio transcription and translation

created 2 years ago
351 stars

Top 80.4% on sourcepulse

Project Summary

This project transcribes and translates audio files using OpenAI's Whisper model and the DeepL translation API. It targets researchers, developers, and other users who need to process audio content, and offers several output formats and three ways to run it: Google Colab, a local CLI, and Jupyter notebooks.

How It Works

Whisper, OpenAI's automatic speech recognition (ASR) model, handles transcription and translation into English. Users can run Whisper locally at one of several model sizes or call the OpenAI API, which may give faster inference. For translation into languages other than English, the project integrates the DeepL API, which supports a wide range of target languages. Transcriptions can be written as TXT, VTT, SRT, TSV, or JSON, and the caption formats (VTT, SRT) can be loaded by media players.
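To make the caption output concrete, here is a minimal sketch of turning timed segments into SRT text. It assumes segments shaped like the ones the openai-whisper Python package returns (dicts with "start", "end", and "text"); the function names and the sample data are hypothetical, and this is not the project's own writer.

```python
from typing import Dict, List


def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: List[Dict]) -> str:
    """Render segments (dicts with 'start', 'end', 'text') as SRT cues."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = srt_timestamp(seg["start"])
        end = srt_timestamp(seg["end"])
        # Each cue is: running number, time range, then the text.
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    # Cues are separated by a blank line.
    return "\n".join(blocks)


# Hypothetical sample data, not real transcription output.
sample = [
    {"start": 0.0, "end": 2.5, "text": " Hello there."},
    {"start": 2.5, "end": 5.0, "text": " Goodbye."},
]
print(segments_to_srt(sample))
```

VTT differs mainly in its "WEBVTT" header line and in using a period instead of a comma before the milliseconds, so the same segment list could feed either format.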

Quick Start & Requirements

  • CLI Installation: pip install -U openai-whisper
  • Prerequisites: Python 3.8-3.10, ffmpeg (installation instructions provided for macOS, Windows, Ubuntu, Arch Linux).
  • API Keys: OpenAI API key (optional, for faster inference) and DeepL API key (optional, for translation).
  • Resources: Local execution without a GPU is possible but slow; smaller Whisper models can be used on CPU. Google Colab provides cloud GPU access.
  • Documentation: Google Colab, CLI Usage Examples, Whisper CLI Usage
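
The prerequisites above can be checked before a local run. The following is a small sketch using only the Python standard library; the function names are hypothetical, and the 3.8-3.10 bounds are taken from the prerequisites listed here rather than from any check the project itself performs.

```python
import shutil
import sys
from typing import List, Optional, Sequence

# Version range stated in the prerequisites (inclusive).
SUPPORTED_MIN = (3, 8)
SUPPORTED_MAX = (3, 10)


def python_supported(version: Sequence[int]) -> bool:
    """Return True if (major, minor) falls inside the stated range."""
    return SUPPORTED_MIN <= tuple(version[:2]) <= SUPPORTED_MAX


def environment_problems(version: Sequence[int], ffmpeg_path: Optional[str]) -> List[str]:
    """Collect human-readable problems; an empty list means all checks passed."""
    problems = []
    if not python_supported(version):
        problems.append(
            f"Python {version[0]}.{version[1]} is outside the stated 3.8-3.10 range"
        )
    if ffmpeg_path is None:
        problems.append("ffmpeg was not found on PATH")
    return problems


if __name__ == "__main__":
    # shutil.which returns None when the executable is not on PATH.
    for problem in environment_problems(sys.version_info, shutil.which("ffmpeg")):
        print("warning:", problem)
```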

Highlighted Details

  • Supports transcription and translation for numerous languages via Whisper.
  • DeepL integration enables translation to various target languages.
  • Flexible output formats include TXT, VTT, SRT, TSV, and JSON.
  • Offers both a command-line interface (CLI) and Jupyter Notebook for usage.

Maintenance & Community

The repository is maintained by Carleslc, though the last commit was about a year ago. The README does not mention any community interaction channels.

Licensing & Compatibility

The README does not state a license, so commercial use or closed-source linking would need clarification from the author.

Limitations & Caveats

Local CPU execution is significantly slower than GPU or API usage. Whisper's translation task targets English only, so other target languages require DeepL. The README specifies Python 3.8-3.10, so newer Python versions may not be supported.
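
The split between Whisper and DeepL can be pictured as a simple routing decision. The sketch below is an illustration only: the function name and the step labels are hypothetical, and it assumes a pipeline that transcribes first and then translates the text, which this summary does not confirm as the project's actual flow.

```python
from typing import List, Optional


def plan_steps(target_lang: Optional[str]) -> List[str]:
    """Return hypothetical step labels for the requested target language."""
    if target_lang is None:
        # No translation requested: transcribe in the source language.
        return ["whisper:transcribe"]
    if target_lang.lower().startswith("en"):
        # Whisper's translate task produces English directly.
        return ["whisper:translate"]
    # Any other target: transcribe, then translate the text with DeepL.
    return ["whisper:transcribe", "deepl:translate"]


for lang in (None, "EN", "ES"):
    print(lang, plan_steps(lang))
```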

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 day
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 26 stars in the last 90 days
