diart by juanmc2005

Real-time audio applications framework

Created 4 years ago
1,460 stars

Top 28.1% on SourcePulse

Project Summary

Diart is a Python framework for building real-time, AI-powered audio applications, with a focus on speaker diarization: determining who speaks when in live or recorded audio streams. It reports state-of-the-art performance and offers a flexible pipeline that can be customized, benchmarked, and served via WebSockets.

How It Works

Diart combines a speaker segmentation model and a speaker embedding model with an incremental clustering algorithm, so speaker assignments become more accurate as a conversation progresses. The framework supports custom AI pipelines, hyper-parameter tuning, and web serving. It is built on pyannote.audio models, which provide the segmentation and embedding stages.
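To make the idea of incremental clustering concrete, here is a minimal sketch in plain Python. It is not diart's actual algorithm: the class name, the threshold value, and the two-dimensional toy embeddings are all assumptions chosen for illustration. Each incoming embedding is assigned to the closest known speaker centroid by cosine distance, or starts a new speaker when nothing is close enough.

```python
import math


def cosine_distance(a, b):
    """Cosine distance between two vectors (1.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


class IncrementalSpeakerClustering:
    """Hypothetical toy clustering: one running centroid per speaker."""

    def __init__(self, new_speaker_threshold=0.5):
        # Assumed threshold: distances above it open a new speaker.
        self.new_speaker_threshold = new_speaker_threshold
        # Each centroid is the sum of the embeddings assigned to it.
        # Cosine distance ignores scale, so the sum acts as a mean direction.
        self.centroids = []

    def assign(self, embedding):
        """Return the speaker index for `embedding` and update that centroid."""
        best_index, best_distance = None, float("inf")
        for index, centroid in enumerate(self.centroids):
            distance = cosine_distance(embedding, centroid)
            if distance < best_distance:
                best_index, best_distance = index, distance

        if best_index is None or best_distance > self.new_speaker_threshold:
            self.centroids.append(list(embedding))
            return len(self.centroids) - 1

        self.centroids[best_index] = [
            c + e for c, e in zip(self.centroids[best_index], embedding)
        ]
        return best_index


# Toy stream: two "speakers" pointing along different axes.
clustering = IncrementalSpeakerClustering(new_speaker_threshold=0.5)
stream = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [1.0, 0.05]]
labels = [clustering.assign(embedding) for embedding in stream]
print(labels)  # [0, 0, 1, 1, 0]
```

Because every assignment also updates a centroid, later embeddings are compared against a better estimate of each speaker, which is the sense in which accuracy can improve over the course of a conversation.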

Quick Start & Requirements

  • Installation: pip install diart
  • Prerequisites: ffmpeg < 4.4, portaudio == 19.6.X, libsndfile >= 1.2.2. A Conda environment file (environment.yml) is provided for easier setup.
  • Pyannote Models: The default models (pyannote/segmentation, pyannote/embedding) require accepting their user conditions and logging in through the Hugging Face CLI.
  • Resources: Runs on CPU and GPU (tested on an RTX 4060 Max-Q). Latency benchmarks are provided for several models.
  • Documentation: The README links to sections on installation, streaming, models, tuning, pipelines, WebSockets, and research papers.
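The version constraints listed under Prerequisites can be checked with a few tuple comparisons. The sketch below is only an illustration: the helper names are hypothetical, and the version strings are supplied by the caller rather than detected from the system.

```python
def parse_version(text):
    """Turn a dotted version string such as '4.3.1' into (4, 3, 1)."""
    return tuple(int(part) for part in text.split("."))


def meets_prerequisites(ffmpeg, portaudio, libsndfile):
    """Check versions against ffmpeg < 4.4, portaudio == 19.6.X, libsndfile >= 1.2.2."""
    return {
        "ffmpeg": parse_version(ffmpeg) < (4, 4),
        # Only the first two components are constrained for portaudio.
        "portaudio": parse_version(portaudio)[:2] == (19, 6),
        "libsndfile": parse_version(libsndfile) >= (1, 2, 2),
    }


# Example version strings (assumed values, not read from the system).
report = meets_prerequisites(ffmpeg="4.3.1", portaudio="19.6.0", libsndfile="1.2.2")
print(report)
```

A result of False for any key indicates that the corresponding library falls outside the stated range.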

Highlighted Details

  • Supports real-time streaming from microphones or audio files.
  • Offers hyper-parameter optimization using Optuna for custom tuning.
  • Enables building custom pipelines by combining modular blocks (e.g., SpeakerSegmentation, OverlapAwareSpeakerEmbedding).
  • Provides WebSocket compatibility for serving pipelines over the web.
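Real-time processing of a microphone or file stream usually means regrouping incoming audio into overlapping fixed-length windows before each model call. The summary does not describe how diart buffers audio, so the following is a generic sketch: the function name, the window and step durations, and the toy sample rate are all assumptions.

```python
def sliding_windows(chunks, sample_rate, window_seconds, step_seconds):
    """Yield overlapping fixed-size windows as chunks arrive from a stream."""
    window = int(round(window_seconds * sample_rate))
    step = int(round(step_seconds * sample_rate))
    if window <= 0 or step <= 0:
        raise ValueError("window and step must cover at least one sample")

    buffer = []
    for chunk in chunks:
        buffer.extend(chunk)
        # Emit every full window currently available, then slide forward.
        while len(buffer) >= window:
            yield buffer[:window]
            del buffer[:step]


# Toy stream: 12 samples at 4 Hz, delivered in chunks of 3 samples.
incoming = [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10, 11]]
windows = list(
    sliding_windows(incoming, sample_rate=4, window_seconds=2.0, step_seconds=0.5)
)
for w in windows:
    print(w[0], len(w))
```

Each window could then be passed to a segmentation block and an embedding block in turn. A short step relative to the window gives frequent updates at the cost of more model calls.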

Maintenance & Community

The project is associated with research from Université Paris-Saclay and CNRS. The README cites the core research paper and includes reproducibility notes, which recommend pyannote.audio<3.1.

Licensing & Compatibility

  • License: MIT License.
  • Compatibility: The license permits commercial use and closed-source linking, including modification, distribution, and sale of the software, provided the copyright and license notice are retained.

Limitations & Caveats

Transcription and speaker-aware transcription features are listed as "coming soon." Reproducing exact benchmark results may require specific versions of pyannote.audio.

Health Check

  • Last commit: 7 months ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 58 stars in the last 30 days
