wespeaker by wenet-e2e

Speaker toolkit for verification, recognition, and diarization research

created 3 years ago
980 stars

Top 38.5% on sourcepulse

Project Summary

WeSpeaker is a comprehensive toolkit for speaker embedding learning, supporting speaker verification, recognition, and diarization. It targets researchers and developers who want to reproduce or extend state-of-the-art speaker modeling, and offers both a command-line interface and a Python API.
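Verification and recognition both come down to comparing fixed-length speaker embeddings. As a minimal sketch of closed-set recognition, assuming embeddings have already been extracted (the three-dimensional vectors below are toy values, not WeSpeaker output), each enrolled speaker is reduced to the mean of their embeddings and a test embedding is assigned to the most cosine-similar centroid. The helpers `enroll` and `identify` are hypothetical names, not part of WeSpeaker's API.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length embeddings."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def enroll(utterances: Dict[str, List[List[float]]]) -> Dict[str, List[float]]:
    """Hypothetical helper: average each speaker's enrollment embeddings into one centroid."""
    centroids = {}
    for speaker, embs in utterances.items():
        dim = len(embs[0])
        centroids[speaker] = [sum(e[i] for e in embs) / len(embs) for i in range(dim)]
    return centroids


def identify(test_emb: Sequence[float], centroids: Dict[str, List[float]]) -> Tuple[str, float]:
    """Hypothetical helper: return the enrolled speaker with the highest cosine score."""
    best = max(centroids, key=lambda spk: cosine(test_emb, centroids[spk]))
    return best, cosine(test_emb, centroids[best])


if __name__ == "__main__":
    # Toy enrollment data: two utterances per speaker.
    enrolled = enroll({
        "alice": [[0.9, 0.1, 0.0], [1.0, 0.0, 0.1]],
        "bob": [[0.0, 1.0, 0.2], [0.1, 0.9, 0.0]],
    })
    print(identify([0.8, 0.2, 0.1], enrolled))
```

An open-set variant would also reject the best match when its score falls below a tuned threshold.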

How It Works

WeSpeaker centers on learning robust speaker embeddings with neural architectures such as ECAPA-TDNN, ResNet, and ERes2Net. Features can be extracted online or loaded as pre-extracted Kaldi-format features, which gives flexibility in data handling. The toolkit reports state-of-the-art results on benchmarks such as VoxCeleb and CNCeleb, and supports techniques such as self-supervised learning and score calibration.
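Scoring and calibration can be illustrated without the toolkit. This is a minimal sketch, assuming two embeddings are already available: cosine similarity gives a raw verification score, and plain linear logistic calibration (a common baseline, not necessarily what WeSpeaker's recipes implement) maps raw scores to log-odds fitted on labeled target (same-speaker) and non-target trials. The function names and the development scores are made up for illustration.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Raw verification score: cosine similarity of two speaker embeddings."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def fit_linear_calibration(target: List[float], nontarget: List[float],
                           lr: float = 0.5, steps: int = 5000) -> Tuple[float, float]:
    """Fit (a, b) so that sigmoid(a * s + b) approximates P(same speaker | raw score s).

    Plain batch gradient descent on the logistic (cross-entropy) loss.
    """
    trials = [(s, 1.0) for s in target] + [(s, 0.0) for s in nontarget]
    a, b = 1.0, 0.0
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for s, y in trials:
            p = 1.0 / (1.0 + math.exp(-(a * s + b)))
            grad_a += (p - y) * s
            grad_b += p - y
        a -= lr * grad_a / len(trials)
        b -= lr * grad_b / len(trials)
    return a, b


def calibrated_score(s: float, a: float, b: float) -> float:
    """Map a raw cosine score to calibrated log-odds."""
    return a * s + b


if __name__ == "__main__":
    enroll_emb, test_emb = [0.9, 0.1, 0.3], [0.8, 0.2, 0.25]
    raw = cosine(enroll_emb, test_emb)
    # Toy development-set scores (invented for illustration, not WeSpeaker results).
    a, b = fit_linear_calibration(target=[0.72, 0.65, 0.80, 0.35],
                                  nontarget=[0.10, 0.25, 0.05, 0.40])
    llr = calibrated_score(raw, a, b)
    print(f"raw={raw:.3f} calibrated={llr:.2f} accept={llr > 0}")
```

With equal numbers of target and non-target trials, the fitted log-odds can be read approximately as a log-likelihood ratio, so 0 is a natural accept/reject threshold.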

Quick Start & Requirements

  • Install: pip install git+https://github.com/wenet-e2e/wespeaker.git
  • Development Install: Requires PyTorch >= 1.12.1, torchaudio, cudatoolkit=11.3, and Python 3.9.
  • Prerequisites: CUDA 11.3 is recommended for GPU acceleration.
  • Setup: The development setup involves creating a conda environment and installing the dependencies listed above.
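Before a development install, it may help to confirm the environment matches the requirements above. The sketch below is a hypothetical helper using only the standard library: it compares installed version strings numerically (ignoring local tags such as `+cu117`) against the stated PyTorch minimum and reports the Python version. It does not check the CUDA toolkit version.

```python
import re
import sys
from importlib import metadata

MIN_TORCH = "1.12.1"  # minimum PyTorch version stated in the Quick Start


def version_tuple(v: str) -> tuple:
    """Turn '1.13.0+cu117' into (1, 13, 0); pre-release suffixes like 'a0' are dropped."""
    parts = []
    for piece in v.split("+")[0].split(".")[:3]:
        m = re.match(r"\d+", piece)
        parts.append(int(m.group()) if m else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def meets_minimum(installed: str, required: str) -> bool:
    """True if the installed version is numerically >= the required one."""
    return version_tuple(installed) >= version_tuple(required)


def report() -> None:
    """Hypothetical environment report; prints findings rather than raising."""
    py = sys.version_info
    print(f"Python {py.major}.{py.minor} (the development setup targets 3.9)")
    for pkg, minimum in (("torch", MIN_TORCH), ("torchaudio", None)):
        try:
            installed = metadata.version(pkg)
        except metadata.PackageNotFoundError:
            print(f"{pkg}: not installed")
            continue
        if minimum is None:
            print(f"{pkg} {installed}")
        else:
            ok = meets_minimum(installed, minimum)
            print(f"{pkg} {installed}: {'OK' if ok else 'older than ' + minimum}")


if __name__ == "__main__":
    report()
```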

Highlighted Details

  • Supports speaker verification, recognition, and diarization tasks.
  • Achieves state-of-the-art performance on VoxCeleb and CNCeleb datasets.
  • Offers various frontend options, including Whisper-encoder and WavLM.
  • Includes recipes for NIST SRE16 and VoxConverse datasets.
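For diarization, a common pipeline embeds short windows of audio and clusters the embeddings so that each cluster corresponds to one speaker. The sketch below stands in for that step with simple average-linkage agglomerative clustering on cosine similarity, assuming sorted, non-overlapping segments with precomputed embeddings. It is a simplified illustration, not necessarily the clustering used in WeSpeaker's diarization recipes; `agglomerative`, `to_turns`, and the 0.5 threshold are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def agglomerative(embs: List[Sequence[float]], threshold: float) -> List[int]:
    """Hypothetical helper: merge the most similar clusters until the best
    average-linkage cosine similarity drops below `threshold`."""
    clusters = [[i] for i in range(len(embs))]

    def link(c1: List[int], c2: List[int]) -> float:
        return sum(cosine(embs[i], embs[j]) for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > 1:
        best, pair = -2.0, (0, 1)
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                s = link(clusters[x], clusters[y])
                if s > best:
                    best, pair = s, (x, y)
        if best < threshold:
            break
        x, y = pair
        clusters[x].extend(clusters[y])  # y > x, so deleting y keeps x's index valid
        del clusters[y]
    labels = [0] * len(embs)
    for k, members in enumerate(clusters):
        for i in members:
            labels[i] = k
    return labels


def to_turns(segments: List[Tuple[float, float]], labels: List[int]) -> List[Tuple[float, float, int]]:
    """Merge consecutive segments with the same label into (start, end, speaker) turns."""
    turns: List[Tuple[float, float, int]] = []
    for (start, end), lab in zip(segments, labels):
        if turns and turns[-1][2] == lab:
            turns[-1] = (turns[-1][0], end, lab)
        else:
            turns.append((start, end, lab))
    return turns


if __name__ == "__main__":
    # Toy 2-D "embeddings" for four 1.5 s windows; real embeddings are much larger.
    segments = [(0.0, 1.5), (1.5, 3.0), (3.0, 4.5), (4.5, 6.0)]
    embeddings = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [1.0, 0.0]]
    labels = agglomerative(embeddings, threshold=0.5)
    for start, end, spk in to_turns(segments, labels):
        print(f"{start:5.1f} {end:5.1f} spk{spk}")
```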

Maintenance & Community

The project is actively maintained with frequent updates, including support for new models and techniques. Community discussions are facilitated via WeChat.

Licensing & Compatibility

The README does not explicitly state a license. Check the repository's license file before commercial use or closed-source linking.

Limitations & Caveats

The absence of a clearly stated license is a significant caveat for adoption, particularly in commercial or closed-source environments.

Health Check

  • Last commit: 1 month ago
  • Responsiveness: 1 day
  • Pull requests (30d): 0
  • Issues (30d): 2
  • Star history: 100 stars in the last 90 days

Explore Similar Projects

Starred by Stas Bekman (Author of Machine Learning Engineering Open Book; Research Engineer at Snowflake).

awesome-diarization by wq2012

0.2%
2k
List of resources for speaker diarization
created 6 years ago
updated 1 week ago
Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems) and Jeff Hammerbacher (Cofounder of Cloudera).

AudioGPT by AIGC-Audio

0.1%
10k
Audio processing and generation research project
created 2 years ago
updated 1 year ago