AudioToText by Carleslc

CLI tool for audio transcription and translation

Created 2 years ago

372 stars

Top 76.1% on SourcePulse

Project Summary

This project transcribes and translates audio files using OpenAI's Whisper model and the DeepL translation API. It targets researchers, developers, and other users who need to process audio content, and it offers flexible output formats and several ways to run it: Google Colab, a local CLI, and Jupyter notebooks.

How It Works

The core functionality relies on the Whisper ASR model, which handles transcription and translation into English. Users can run Whisper locally in one of several model sizes or call the OpenAI API for potentially faster inference. For translation into languages other than English, the project integrates with the DeepL API, which supports a wide range of target languages. Output is available in multiple formats (TXT, VTT, SRT, TSV, JSON), including caption files suitable for media players.
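The routing described above can be sketched as a small decision function. This is an illustrative sketch only, not the project's actual code: the function name plan_steps and the step labels are hypothetical, and it assumes that English targets go through Whisper's translate task while every other target is transcribed first and then passed to DeepL.

```python
from typing import List, Optional


def plan_steps(target_lang: Optional[str]) -> List[str]:
    """Return the ordered processing steps for a requested target language.

    Hypothetical helper for illustration; the step labels are made up.
    None means "keep the spoken language", i.e. plain transcription.
    """
    if target_lang is None:
        return ["whisper:transcribe"]
    if target_lang.lower().startswith("en"):
        # Whisper's own translate task produces English text directly.
        return ["whisper:translate"]
    # Any other target: transcribe first, then hand the text to DeepL.
    return ["whisper:transcribe", "deepl:" + target_lang.upper()]
```

For example, plan_steps("es") yields a transcription step followed by a DeepL step, while plan_steps("en") needs Whisper alone.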

Quick Start & Requirements

CLI Installation: pip install -U openai-whisper
Prerequisites: Python 3.8-3.10, ffmpeg (installation instructions provided for macOS, Windows, Ubuntu, Arch Linux).
API Keys: OpenAI API key (optional, for faster inference) and DeepL API key (optional, for translation).
Resources: Local execution without a GPU is possible but slow; smaller Whisper models can be used on CPU. Google Colab provides cloud GPU access.
Documentation: Google Colab, CLI Usage Examples, Whisper CLI Usage
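Before installing, it can help to confirm the two prerequisites listed above. The following is a minimal sketch using only the Python standard library; the helper names are hypothetical, and the version bounds are taken from the 3.8-3.10 range stated in the README summary.

```python
import shutil
import sys

# Supported range as stated in the prerequisites (inclusive on both ends).
SUPPORTED_MIN = (3, 8)
SUPPORTED_MAX = (3, 10)


def python_supported(version=None) -> bool:
    """Check whether a (major, minor, ...) version falls in the stated range.

    Defaults to the running interpreter when no version is given.
    """
    major_minor = tuple((version or sys.version_info)[:2])
    return SUPPORTED_MIN <= major_minor <= SUPPORTED_MAX


def ffmpeg_available() -> bool:
    """Report whether an ffmpeg executable can be found on PATH."""
    return shutil.which("ffmpeg") is not None
```

Calling python_supported() and ffmpeg_available() with no arguments checks the current environment; passing a tuple such as (3, 11) checks a specific interpreter version.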

Highlighted Details

Supports transcription and translation for numerous languages via Whisper.
DeepL integration enables translation to various target languages.
Flexible output formats include TXT, VTT, SRT, TSV, and JSON.
Usable from a command-line interface (CLI) or a Jupyter notebook.
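To illustrate what one of the listed output formats involves, here is a minimal sketch that renders timed segments as SRT captions. It assumes segments shaped like the entries Whisper returns from a transcription (dictionaries with "start" and "end" in seconds plus "text"); the function names are hypothetical and this is not the project's own writer.

```python
from typing import Dict, Iterable


def srt_timestamp(seconds: float) -> str:
    """Format a time offset in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    millis = int(round(seconds * 1000))
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments: Iterable[Dict]) -> str:
    """Render segments as numbered SRT cues separated by blank lines."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = srt_timestamp(seg["start"])
        end = srt_timestamp(seg["end"])
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    return "\n".join(blocks)
```

Each cue consists of a running index, a start and end time joined by "-->", and the caption text; VTT follows a similar layout with a period instead of a comma before the milliseconds.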

Maintenance & Community

The repository appears to be actively maintained by Carleslc. Community interaction channels are not explicitly mentioned in the README.

Licensing & Compatibility

The project's licensing is not explicitly stated in the README. Compatibility for commercial use or closed-source linking would require clarification on the license.

Limitations & Caveats

Local CPU execution is significantly slower than GPU or API usage. Whisper's built-in translation targets English only, so translating into any other language depends on DeepL and a DeepL API key. The README specifies Python 3.8-3.10, so newer Python versions may not be compatible.

Explore Similar Projects

yt-transcriber by pmarreck

Stage-Whisper by Stage-Whisper

whispering by Sharrnah

openlrc by zh-plus

babelfish.ai by supabase-community

Speech-Translate by Dadangdut33

whispo by egoist

generate-subtitles by mayeaux

writeout.ai by beyondcode

noScribe by kaixxx

vibe by thewh1teagle

pyvideotrans by jianchang512