Easy-Voice-Toolkit by Spr-Aachen

Local AI voice toolkit for audio processing, recognition, transcription, and conversion

created 2 years ago
802 stars

Top 44.9% on sourcepulse

Project Summary

Easy Voice Toolkit is a user-friendly, locally deployable AI audio processing suite for voice recognition, transcription, and conversion. It targets users who need an integrated workflow for audio manipulation, from raw files to speech models, with a focus on voice conversion.

How It Works

The toolkit integrates several open-source projects, including audio-slicer, VoiceprintRecognition, whisper, and GPT-SoVITS. Its design is modular: users can run a single tool on its own or chain the tools into a complete voice conversion workflow that turns raw audio files into a speech model step by step.
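
The "pick one tool or chain them" idea can be sketched in a few lines of Python. This is only an illustration of the pattern, not the toolkit's actual code: `Pipeline`, `make_stage`, and the stage names are hypothetical, and each stage is a placeholder that records that it ran instead of calling the real project behind it.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

# A stage receives a context dict and returns an updated copy of it.
Context = Dict[str, Any]
Stage = Callable[[Context], Context]


def make_stage(name: str) -> Stage:
    """Build a placeholder stage that only records that it ran.

    Hypothetical helper: a real stage would invoke the underlying tool here.
    """
    def stage(ctx: Context) -> Context:
        history = list(ctx.get("history", []))
        history.append(name)
        return {**ctx, "history": history}
    return stage


@dataclass
class Pipeline:
    """Runs a user-selected list of stages in order."""
    stages: List[Stage] = field(default_factory=list)

    def run(self, ctx: Context) -> Context:
        for stage in self.stages:
            ctx = stage(ctx)
        return ctx


# Illustrative labels; the mapping to each project's role is an assumption.
slice_audio = make_stage("slice")               # assumed role of audio-slicer
filter_speaker = make_stage("filter_speaker")   # assumed role of VoiceprintRecognition
transcribe = make_stage("transcribe")           # assumed role of whisper
train_model = make_stage("train")               # assumed role of GPT-SoVITS

if __name__ == "__main__":
    # Full chain: raw audio through to a trained model.
    full = Pipeline([slice_audio, filter_speaker, transcribe, train_model])
    print(full.run({"input": "raw_audio/"})["history"])

    # A single tool used on its own.
    print(Pipeline([transcribe]).run({"input": "clip.wav"})["history"])
```

Because every stage shares the same signature, any subset can be run alone or reordered without changing the others, which is the property the modular design relies on.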

Quick Start & Requirements

  • Installation: Clone the repository together with its submodules (git clone --recurse-submodules), then install the dependencies with pip install -r requirements.txt.
  • Prerequisites: Python 3.8+ and PyTorch (the project provides an install example for CUDA 11.8).
  • System: Currently supports Windows only.
  • Resources: A ready-to-use portable package that bundles all dependencies and models is available for easier setup.
  • Demo: A Google Colab demo is available.
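
Before installing from source, it can help to confirm that the machine meets the requirements listed above. The following is a minimal sketch using only the Python standard library; `check_environment` is a hypothetical helper, not part of the toolkit, and it only tests whether a `torch` package can be found, not which CUDA build is installed.

```python
import importlib.util
import platform
import sys
from typing import Dict, Optional, Tuple


def check_environment(
    python_version: Optional[Tuple[int, int]] = None,
    system: Optional[str] = None,
    torch_installed: Optional[bool] = None,
) -> Dict[str, bool]:
    """Compare the current (or supplied) environment with the listed requirements.

    Arguments left as None are detected from the running interpreter.
    """
    if python_version is None:
        python_version = sys.version_info[:2]
    if system is None:
        system = platform.system()  # "Windows" on Windows
    if torch_installed is None:
        # find_spec locates the package without importing it.
        torch_installed = importlib.util.find_spec("torch") is not None

    return {
        "python_3_8_plus": tuple(python_version) >= (3, 8),
        "windows": system == "Windows",
        "torch_installed": bool(torch_installed),
    }


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {'ok' if ok else 'missing'}")
```

The optional arguments exist so the check can be exercised with made-up values; called with no arguments, it inspects the interpreter it runs in.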

Highlighted Details

  • Supports Chinese, English, and Japanese for most functions.
  • Includes tools for audio processing, voice recognition, transcription, dataset creation, model training, and voice conversion.
  • Offers both a lightweight installer and a large, ready-to-use portable package.
  • Planned features include LLM integration and a refactor of the client in C++ (Qt).

Maintenance & Community

  • The backend is marked "WIP", which indicates that development is still active.
  • Contact details provided for feedback and suggestions.

Licensing & Compatibility

  • The project is free and open-source ("Natürlich~♪").
  • Users are responsible for dataset authorization.
  • Distribution or public sharing requires attribution to the original author and source.
  • Not intended for production environments.

Limitations & Caveats

  • Currently limited to Windows OS.
  • Users must manage dataset authorization and are solely responsible for any infringement issues.
  • When output is distributed, the terms require clearly stating that voice conversion was used and giving details of the input source.

Health Check

  • Last commit: 1 week ago
  • Responsiveness: 1 day
  • Pull requests (30d): 0
  • Issues (30d): 0

Star History

  • 36 stars in the last 90 days
