narcotic-sh
Speaker diarization pipeline for rapid, accurate audio analysis
Top 98.6% on SourcePulse
Summary
Senko is a high-performance speaker diarization pipeline designed for speed and accuracy, processing an hour of audio in seconds on modern hardware. It targets engineers and researchers needing efficient audio segmentation, offering significant speedups over traditional methods and powering applications like the Zanshin media player.
How It Works
Senko modifies the 3D-Speaker diarization pipeline in several ways to improve speed and efficiency. Voice activity detection uses either Pyannote segmentation-3.0 or Silero VAD. Fbank features are extracted upfront, on the GPU via kaldifeat on NVIDIA hardware and otherwise on the CPU using all cores. Speaker embeddings are generated with batched inference of the CAM++ model. Clustering uses RAPIDS for GPU acceleration on NVIDIA hardware (CUDA compute capability 7.0+), or UMAP+HDBSCAN. On macOS, models run through CoreML for hardware acceleration on Apple Silicon.
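The batched embedding step can be pictured with a small sketch. This is not Senko's code: the window length, shift, batch size, and the helper names `split_into_windows` and `batched` are all assumptions chosen for illustration. It shows speech regions from a VAD stage being cut into fixed-length windows and grouped so that an embedding model would be called once per batch rather than once per window.

```python
from typing import Iterator, List, Tuple

Window = Tuple[float, float]  # (start_sec, end_sec)


def split_into_windows(start: float, end: float,
                       window: float = 1.5, shift: float = 0.75) -> List[Window]:
    """Cut one speech region into fixed-length, overlapping windows.

    The window and shift values are illustrative assumptions.
    """
    if end - start <= window:
        # Region is shorter than one window: keep it whole.
        return [(start, end)]
    windows = []
    t = start
    while t + window < end:
        windows.append((t, t + window))
        t += shift
    # Align the final window to the end of the region so no audio is dropped.
    windows.append((end - window, end))
    return windows


def batched(items: List[Window], batch_size: int) -> Iterator[List[Window]]:
    """Yield consecutive batches so a model can run once per batch."""
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]


# Speech regions as a VAD stage might report them (made-up values).
speech_regions = [(0.0, 4.0), (6.0, 7.0)]
windows = [w for s, e in speech_regions for w in split_into_windows(s, e)]
batches = list(batched(windows, batch_size=4))

for batch in batches:
    # A real pipeline would run the embedding model on the whole batch here.
    print(len(batch), batch)
```

Larger batches generally keep an accelerator busier at the cost of more memory, so the batch size is a tuning knob rather than a fixed value.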
Quick Start & Requirements
Installation involves creating a Python 3.13 virtual environment with uv and then running one of the following uv pip commands, depending on the target hardware:
uv pip install "git+https://github.com/narcotic-sh/senko.git[nvidia]"
uv pip install "git+https://github.com/narcotic-sh/senko.git[nvidia-old]"
uv pip install "git+https://github.com/narcotic-sh/senko.git"
Prerequisites include gcc/clang (Linux/WSL) or Xcode Command Line Tools (macOS). NVIDIA installations require CUDA 12 capable drivers.
See examples/diarize.py for usage examples and DOCS.md for detailed documentation.
Highlighted Details
reaper_speech_diarizer, scribe (for speaker-attributed transcripts), and verbatim (for multilingual speech-to-text).
Maintenance & Community
A Discord server is available for community support, feature suggestions, and development discussions. Specific details on core contributors or sponsorships are not provided in the README.
Licensing & Compatibility
The README does not explicitly state the project's license. Compatibility is noted for Linux, macOS, and WSL. Native Windows installation details are in WINDOWS.md and may have specific limitations.
Limitations & Caveats
Performance is sensitive to audio recording quality; background noise or low fidelity degrades accuracy. Highly similar voices may be misclassified, and distinct recording conditions for the same speaker can lead to multiple speaker detections. The pipeline currently does not output overlapping speaker segments, though this is a planned feature.
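For downstream code, output without overlapping segments means each instant belongs to at most one speaker, so per-speaker talk time is a plain sum of segment durations. The sketch below assumes a hypothetical segment format (dicts with `start`, `end`, and `speaker` keys) and hypothetical helper names; check DOCS.md for the actual output structure.

```python
from collections import defaultdict
from typing import Dict, List

Segment = Dict[str, object]  # hypothetical shape: {"start", "end", "speaker"}


def has_overlap(segments: List[Segment]) -> bool:
    """Return True if any two segments overlap in time."""
    ordered = sorted(segments, key=lambda s: s["start"])
    return any(prev["end"] > cur["start"]
               for prev, cur in zip(ordered, ordered[1:]))


def talk_time(segments: List[Segment]) -> Dict[str, float]:
    """Sum segment durations per speaker (valid when nothing overlaps)."""
    totals: Dict[str, float] = defaultdict(float)
    for seg in segments:
        totals[seg["speaker"]] += seg["end"] - seg["start"]
    return dict(totals)


# Made-up segments for illustration only.
segments = [
    {"start": 0.0, "end": 2.5, "speaker": "SPEAKER_01"},
    {"start": 2.5, "end": 4.0, "speaker": "SPEAKER_02"},
    {"start": 5.0, "end": 6.5, "speaker": "SPEAKER_01"},
]

overlap = has_overlap(segments)
totals = talk_time(segments)
print(overlap, totals)
```

If overlapping segments are added later, a check like `has_overlap` would flag inputs where the simple per-speaker sum no longer partitions the timeline.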