NavodPeiris/speechlib: Audio AI library for speaker-aware transcription
Top 99.6% on SourcePulse
Speechlib is a Python library that performs speaker diarization, speaker recognition, and transcription on audio files to generate transcripts with identified speaker names. It serves researchers and developers by providing a unified pipeline for extracting structured, speaker-attributed information from audio, simplifying analysis and content understanding.
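The end product is a speaker-attributed transcript with real names rather than anonymous speaker labels. The name-assignment idea can be sketched as follows; this is a generic illustration of embedding matching by cosine similarity, not speechlib's actual implementation, and the `identify` helper and its embeddings are hypothetical:

```python
"""Sketch: assign a known speaker name to a diarized segment by comparing
its voice embedding against reference embeddings derived from the user's
voices_folder samples (hypothetical data shapes, not speechlib's API)."""
import math

def cosine(a, b):
    # Cosine similarity between two equal-length embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def identify(segment_embedding, references, threshold=0.5):
    """references: {speaker_name: embedding}.
    Returns the best-matching name, or "unknown" if nothing clears the
    similarity threshold."""
    best_name, best_score = "unknown", threshold
    for name, ref in references.items():
        score = cosine(segment_embedding, ref)
        if score > best_score:
            best_name, best_score = name, score
    return best_name
```

In practice the embeddings would come from a speaker-embedding model; the threshold trades off false matches against segments left as "unknown".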
How It Works
The library employs a multi-stage process starting with audio preprocessing: converting various formats to WAV, ensuring mono channel, and re-encoding to 16-bit PCM. It then utilizes pyannote-audio for speaker diarization and faster-whisper (or other Whisper variants/AssemblyAI) for transcription. Speaker recognition is performed by matching voice samples from a user-provided voices_folder to assign names to transcribed segments.
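The preprocessing stage described above can be sketched with the standard library alone. This is a minimal illustration, not speechlib's own code: it downmixes an already-decoded 16-bit WAV file to mono, whereas real inputs in other formats would first need decoding (e.g. via ffmpeg):

```python
"""Sketch of the mono/16-bit PCM preprocessing step using only the
standard library's wave module (illustrative, not speechlib's code)."""
import struct
import wave

def to_mono_16bit(src_path: str, dst_path: str) -> None:
    with wave.open(src_path, "rb") as src:
        n_ch = src.getnchannels()
        width = src.getsampwidth()
        rate = src.getframerate()
        frames = src.readframes(src.getnframes())
    if width != 2:
        raise ValueError("this sketch only handles 16-bit PCM input")
    # Unpack interleaved little-endian 16-bit samples.
    samples = struct.unpack("<%dh" % (len(frames) // 2), frames)
    if n_ch > 1:
        # Downmix by averaging the channels of each frame.
        samples = [
            sum(samples[i:i + n_ch]) // n_ch
            for i in range(0, len(samples), n_ch)
        ]
    with wave.open(dst_path, "wb") as dst:
        dst.setnchannels(1)
        dst.setsampwidth(2)
        dst.setframerate(rate)
        dst.writeframes(struct.pack("<%dh" % len(samples), *samples))
```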
Quick Start & Requirements
pip install speechlib
Requires access to the gated pyannote/speaker-diarization@2.1 model on Hugging Face; some environments also need libcublas11 (e.g., !apt install libcublas11 in a notebook).
Highlighted Details
Supports faster-whisper (with optional quantization), custom Whisper models, Hugging Face models, and AssemblyAI. With faster-whisper, the "tiny" model transcribes in ~64 s (diarization 24 s, recognition 10 s); the "large" model takes ~343 s.
Maintenance & Community
No specific details regarding maintenance, community channels (e.g., Discord, Slack), or notable contributors were present in the provided README snippet.
Licensing & Compatibility
No explicit license information was found in the provided README snippet.
Limitations & Caveats
Running on Windows without administrator privileges may cause an OSError: [WinError 1314]. Quantization, while speeding up faster-whisper, may reduce transcription accuracy. Access to certain gated Hugging Face models requires explicit user permission and an API token. Performance benchmarks are from Google Colab tests and exclude model download times.
Last updated 2 weeks ago · Inactive