aTrain  by JuergenFleiss

GUI tool for offline speech transcription, speaker diarization

Created 1 year ago
891 stars

Top 40.7% on SourcePulse

Project Summary

aTrain is a GUI tool for offline, privacy-preserving speech-to-text transcription and speaker diarization. It is designed for researchers and other users who need to process sensitive audio without uploading it to the cloud. It transcribes audio in 99 languages and exports results to qualitative analysis software such as MAXQDA, ATLAS.ti, and NVivo.

How It Works

aTrain uses the faster-whisper implementation of Whisper for transcription and pyannote.audio for speaker diarization. Both run locally, so audio never leaves the machine, which supports privacy and GDPR compliance. faster-whisper is also considerably faster than the standard Whisper implementation, especially on NVIDIA GPUs.
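Transcription and diarization produce two separate timelines: text segments with timestamps, and speaker turns with timestamps. The summary does not describe how aTrain combines them, so the following is only a minimal sketch of one common approach, assigning each text segment to the speaker whose turn overlaps it most. The `Segment` and `Turn` types and the `assign_speakers` function are hypothetical names for illustration, not part of aTrain, faster-whisper, or pyannote.audio.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Segment:
    """A transcribed piece of text with start/end times in seconds (hypothetical type)."""
    start: float
    end: float
    text: str


@dataclass
class Turn:
    """A diarization result: one speaker talking from start to end (hypothetical type)."""
    start: float
    end: float
    speaker: str


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length in seconds of the intersection of two time intervals (0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(
    segments: List[Segment], turns: List[Turn]
) -> List[Tuple[Optional[str], Segment]]:
    """Label each segment with the speaker whose turn overlaps it the most.

    Segments that overlap no turn are labeled None.
    """
    labeled = []
    for seg in segments:
        best_speaker: Optional[str] = None
        best_overlap = 0.0
        for turn in turns:
            shared = overlap(seg.start, seg.end, turn.start, turn.end)
            if shared > best_overlap:
                best_overlap = shared
                best_speaker = turn.speaker
        labeled.append((best_speaker, seg))
    return labeled


if __name__ == "__main__":
    # Made-up example data, not real model output.
    segments = [Segment(0.0, 2.0, "Hello."), Segment(2.0, 5.0, "Hi there.")]
    turns = [Turn(0.0, 2.2, "SPEAKER_00"), Turn(2.2, 6.0, "SPEAKER_01")]
    for speaker, seg in assign_speakers(segments, turns):
        print(f"[{seg.start:.1f}-{seg.end:.1f}] {speaker}: {seg.text}")
```

Maximum overlap is a simple heuristic. It can mislabel a segment in which two people speak, which is one reason real pipelines often align at the word level instead.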

Quick Start & Requirements

  • Install: Windows users can use the Microsoft Store or download an installer from the BANDAS-Center Website. Beta versions are available for macOS (Apple Silicon) and Debian.
  • Prerequisites: An NVIDIA GPU with the CUDA toolkit installed is recommended for significantly faster transcription.
  • Links: Microsoft Store, BANDAS-Center Website

Highlighted Details

  • Transcription takes roughly 3x the audio length on modern CPUs and as little as 20% of the audio length on NVIDIA GPUs.
  • Speaker diarization capabilities using pyannote.audio.
  • Supports 99 languages with varying transcription quality.
  • Outputs are compatible with MAXQDA, ATLAS.ti, and NVivo, including timestamped audio playback.
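To make the speed figures above concrete, here is a small sketch that turns them into a rough processing-time estimate. The factors 3.0 (CPU) and 0.2 (GPU) are taken directly from the figures quoted in this summary. Actual times depend on hardware, model size, and audio content, and the function name `estimate_minutes` is a hypothetical helper, not part of aTrain.

```python
# Rough multipliers from the figures quoted above (assumed to be typical, not guaranteed).
CPU_FACTOR = 3.0  # about 3x the audio length on a modern CPU
GPU_FACTOR = 0.2  # down to about 20% of the audio length on an NVIDIA GPU


def estimate_minutes(audio_minutes: float, use_gpu: bool) -> float:
    """Estimate transcription time in minutes for a recording of the given length."""
    if audio_minutes < 0:
        raise ValueError("audio_minutes must be non-negative")
    factor = GPU_FACTOR if use_gpu else CPU_FACTOR
    return audio_minutes * factor


if __name__ == "__main__":
    interview = 60.0  # a one-hour recording
    print(f"CPU: ~{estimate_minutes(interview, use_gpu=False):.0f} min")
    print(f"GPU: ~{estimate_minutes(interview, use_gpu=True):.0f} min")
```

By these figures, a one-hour interview would take roughly three hours on a CPU and about twelve minutes on a supported GPU.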

Maintenance & Community

Developed by researchers at the University of Graz and tested by Know-Center Graz. A developer wiki is available for contributions.

Licensing & Compatibility

The README does not explicitly state the license. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The macOS and Debian builds are still in beta, so they may be less stable than the Windows release. Batch processing and customizable settings are on the roadmap but not yet implemented.

Health Check

  • Last Commit: 13 hours ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 40 stars in the last 30 days
