Easy-Voice-Toolkit by Spr-Aachen

Local AI voice toolkit for audio processing, recognition, transcription, and conversion

Created 3 years ago

874 stars

Top 40.9% on SourcePulse

Project Summary

Easy Voice Toolkit is a user-friendly, locally deployable AI audio processing suite for voice recognition, transcription, and conversion. It targets users who need an integrated workflow for audio manipulation, from raw files to speech models, with a focus on voice conversion.

How It Works

The toolkit integrates several open-source projects, including audio-slicer, VoiceprintRecognition, whisper, and GPT-SoVITS. Its design is modular: users can run a single tool on its own or chain the tools into a complete voice conversion workflow, turning raw audio files into speech models step by step.
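The "select specific tools or chain them" idea can be sketched as a small pipeline runner. This is only an illustration under stated assumptions: the stage names, the `state` dictionary, and `run_pipeline` are hypothetical and are not the toolkit's actual API, and each stage body is a placeholder rather than a call into audio-slicer, whisper, or the other integrated projects.

```python
from typing import Callable, Dict, List

# Hypothetical stage type: takes the working state and returns an updated copy.
Stage = Callable[[Dict[str, object]], Dict[str, object]]


def slice_audio(state: Dict[str, object]) -> Dict[str, object]:
    # Placeholder: a real stage would split the source audio into clips.
    clips = [f"{state['source']}#clip{i}" for i in range(2)]
    return {**state, "clips": clips}


def transcribe(state: Dict[str, object]) -> Dict[str, object]:
    # Placeholder: a real stage would run speech-to-text on each clip.
    # Falls back to the raw source so the stage also works on its own.
    clips = state.get("clips", [state["source"]])
    return {**state, "transcripts": {clip: "" for clip in clips}}


def build_dataset(state: Dict[str, object]) -> Dict[str, object]:
    # Placeholder: pair each clip with its transcript as training rows.
    rows = sorted(state.get("transcripts", {}).items())
    return {**state, "dataset": rows}


# Registry of available stages (names are assumptions for this sketch).
STAGES: Dict[str, Stage] = {
    "slice": slice_audio,
    "transcribe": transcribe,
    "dataset": build_dataset,
}


def run_pipeline(source: str, selected: List[str]) -> Dict[str, object]:
    """Run only the selected stages, in the order given."""
    state: Dict[str, object] = {"source": source, "history": []}
    for name in selected:
        state = STAGES[name](state)
        state["history"] = list(state["history"]) + [name]
    return state


if __name__ == "__main__":
    # Chain every stage, or pass a shorter list to run a single tool.
    result = run_pipeline("voice.wav", ["slice", "transcribe", "dataset"])
    print(result["history"])
```

Keeping each stage as a plain function over a shared state is one simple way to let a stage run alone or as part of a chain; the real project may organize this differently.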

Quick Start & Requirements

Installation: Clone the repository with submodules (git clone --recurse-submodules) and install dependencies using pip install -r requirements.txt.
Prerequisites: Python 3.8+ and PyTorch (an installation example for CUDA 11.8 is provided).
System: Currently supports Windows only.
Resources: A "Ready-to-use portable package" is available for easier setup, containing all dependencies and models.
Demo: Google Colab demo available.
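Before installing from source, the stated requirements (Python 3.8+ and Windows) can be checked with a short script. This is a minimal sketch, not part of the project: `check_environment` and its parameters are hypothetical names, and it does not verify PyTorch or CUDA.

```python
import platform
import sys
from typing import List, Optional, Tuple

MIN_PYTHON = (3, 8)           # minimum version stated in the prerequisites
SUPPORTED_SYSTEM = "Windows"  # the only OS listed as supported


def check_environment(
    version: Optional[Tuple[int, int]] = None,
    system: Optional[str] = None,
) -> List[str]:
    """Return a list of problems; an empty list means the checks passed.

    The arguments default to the running interpreter and OS and exist
    only so the function can be exercised with other values.
    """
    if version is None:
        version = (sys.version_info[0], sys.version_info[1])
    if system is None:
        system = platform.system()

    problems: List[str] = []
    if version < MIN_PYTHON:
        problems.append(f"Python {version[0]}.{version[1]} is older than 3.8")
    if system != SUPPORTED_SYSTEM:
        problems.append(f"{system} is not supported (Windows only)")
    return problems


if __name__ == "__main__":
    issues = check_environment()
    print("OK" if not issues else "\n".join(issues))
```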

Highlighted Details

Supports Chinese, English, and Japanese for most functions.
Includes tools for audio processing, voice recognition, transcription, dataset creation, model training, and voice conversion.
Offers both a lightweight installer and a large, ready-to-use portable package.
Future features include LLM integration and a C++ (Qt) client refactor.

Maintenance & Community

Backend development is marked "WIP", which indicates the project is under active development.
Contact details provided for feedback and suggestions.

Licensing & Compatibility

The project is free and open-source ("Natürlich~♪").
Users are responsible for dataset authorization.
Distribution or public sharing requires attribution to the original author and source.
Not intended for production environments.

Limitations & Caveats

Currently limited to Windows OS.
Users must manage dataset authorization and are solely responsible for any infringement issues.
Distribution terms require clearly stating that voice changing was used and identifying the input source.

Health Check

Last Commit: 1 week ago
Responsiveness: 1 day
Pull Requests (30d): 1
Issues (30d): 0

Star History

3 stars in the last 30 days

Explore Similar Projects

praises by ElmTran

Text-to-speech tool for easy reading

Created 1 year ago

Updated 7 months ago

SpeechGPT-2.0-preview by OpenMOSS

Real-time spoken dialogue system with GPT-4o-level capabilities

Created 1 year ago

Updated 1 year ago

local_llm_assistant by nickbild

Local voice assistant for verbal requests, running on Raspberry Pi

Created 2 years ago

Updated 2 years ago

Starred by Travis Fischer (Founder of Agentic).

ollama-voice-mac by apeatling

Offline voice assistant for macOS

Created 2 years ago

Updated 6 months ago

MMVC_Trainer by isletennos

Voice conversion trainer for real-time voice changer

Created 4 years ago

Updated 1 year ago

easyVoice by cosin2077

Text-to-speech tool for long texts and multi-character dubbing

Created 11 months ago

Updated 1 month ago

Starred by Shawn Wang (Editor of Latent Space) and Magnus Müller (Cofounder of Browser Use).

noScribe by kaixxx

GUI tool for local AI-powered audio transcription

Created 2 years ago

Updated 2 days ago

seed-vc by Plachtaa

CLI tool for zero-shot voice/singing voice conversion, supporting real-time

Created 1 year ago

Updated 10 months ago

Kokoro-FastAPI by remsky

FastAPI wrapper for Kokoro-82M text-to-speech model

Created 1 year ago

Updated 1 month ago

Starred by Tim J. Baek (Founder of Open WebUI).

piper by rhasspy

Local neural text-to-speech system

Created 3 years ago

Updated 6 months ago

sherpa-onnx by k2-fsa

Speech toolkit for local, offline speech AI tasks via ONNX

Created 3 years ago

Updated 1 day ago

Starred by Georgios Konstantopoulos (CTO, General Partner at Paradigm) and Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems").

GPT-SoVITS by RVC-Boss

Few-shot voice cloning and TTS web UI

Created 2 years ago

Updated 2 weeks ago
