Real-time audio applications framework
Top 28.1% on SourcePulse
Diart is a Python framework for building real-time AI-powered audio applications, specializing in speaker diarization. It enables developers to recognize different speakers in live or recorded audio streams with state-of-the-art performance, offering a flexible pipeline that can be customized, benchmarked, and served via WebSockets.
How It Works
Diart combines speaker segmentation and speaker embedding models within an incremental clustering algorithm. This approach refines accuracy as a conversation progresses. The framework supports custom AI pipelines, hyper-parameter tuning, and web serving. It is built upon pyannote.audio models, leveraging their segmentation and embedding capabilities for efficient and accurate speaker diarization.
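A minimal streaming sketch based on diart's documented Python API (SpeakerDiarization, MicrophoneAudioSource, StreamingInference, RTTMWriter); exact constructor arguments can vary between versions, and the output path is a placeholder:

```python
from diart import SpeakerDiarization
from diart.sources import MicrophoneAudioSource
from diart.inference import StreamingInference
from diart.sinks import RTTMWriter

# Default pipeline: pyannote segmentation and embedding models
# combined through incremental clustering
pipeline = SpeakerDiarization()

# Capture live audio from the default microphone
mic = MicrophoneAudioSource()

# Apply the pipeline to the stream, refining speaker labels as audio arrives
inference = StreamingInference(pipeline, mic)
inference.attach_observers(RTTMWriter(mic.uri, "output/live.rttm"))

# Blocks until the source closes; returns the accumulated annotation
prediction = inference()
```

The same loop applies to pre-recorded audio by swapping the microphone for a file-based source such as diart's FileAudioSource, which covers the "live or recorded" use case mentioned above.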
Quick Start & Requirements
Install with pip install diart. Runtime dependencies include ffmpeg < 4.4, portaudio == 19.6.X, and libsndfile >= 1.2.2. A Conda environment file (environment.yml) is provided for easier setup. The default pipeline relies on the pretrained pyannote models (pyannote/segmentation, pyannote/embedding).
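For the WebSocket serving mentioned above, a rough sketch using diart's WebSocketAudioSource is shown below; the host, port, and exact constructor signature are illustrative assumptions to verify against the current documentation:

```python
from diart import SpeakerDiarization
from diart.inference import StreamingInference
from diart.sources import WebSocketAudioSource

pipeline = SpeakerDiarization()

# Receive raw audio chunks from a remote client over a WebSocket
# (host and port are placeholders)
source = WebSocketAudioSource(pipeline.config.sample_rate, "0.0.0.0", 7007)

inference = StreamingInference(pipeline, source)

# Send each incremental prediction back to the client as RTTM text
inference.attach_hooks(lambda outputs: source.send(outputs[0].to_rttm()))

# Runs until the client disconnects
prediction = inference()
```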
Highlighted Details
Composable processing blocks for building custom pipelines (e.g., SpeakerSegmentation, OverlapAwareSpeakerEmbedding).
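A sketch of how these blocks might be composed directly, assuming the model-loading helpers SegmentationModel.from_pyannote and EmbeddingModel.from_pyannote and a (batch, samples, channels) waveform layout, both of which should be checked against the current API:

```python
import torch
from diart import models as m
from diart.blocks import SpeakerSegmentation, OverlapAwareSpeakerEmbedding

# Wrap the pretrained pyannote models (names taken from the requirements above)
segmentation = SpeakerSegmentation(m.SegmentationModel.from_pyannote("pyannote/segmentation"))
embedding = OverlapAwareSpeakerEmbedding(m.EmbeddingModel.from_pyannote("pyannote/embedding"))

# A batch of 5-second mono chunks at 16 kHz (random data, for illustration only;
# the (batch, samples, channels) layout is an assumption)
waveform = torch.randn(4, 5 * 16000, 1)

# Per-frame speaker activity, then one embedding per detected speaker,
# weighted to account for overlapping speech
segments = segmentation(waveform)
embeddings = embedding(waveform, segments)
```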
Maintenance & Community
The project is associated with research from Université Paris-Saclay and CNRS. The README includes a citation for the core research paper and notes on reproducibility, recommending pyannote.audio < 3.1.
Licensing & Compatibility
Limitations & Caveats
Transcription and speaker-aware transcription features are listed as "coming soon." Reproducing exact benchmark results may require specific versions of pyannote.audio.
Last activity: 7 months ago (Inactive).