mimic-recording-studio by MycroftAI

Docker app for recording voice samples to train a TTS voice with Mimic2

Created 7 years ago

510 stars

Top 61.4% on SourcePulse

Project Summary

Mimic Recording Studio is a Docker-based application for collecting the voice samples needed to train a custom Text-to-Speech (TTS) voice with Mycroft's Mimic 2 engine. It is aimed at individuals or teams who want to create their own high-quality synthetic voice, and it simplifies the data collection step of that process.

How It Works

The application runs as two Docker services: a React frontend and a Python/Flask backend. The backend handles audio processing, including automatic silence trimming via FFmpeg, and stores recordings and their metadata in a SQLite database. The frontend provides a web interface where users record phrases, play back audio, and view basic metrics. Because both services are containerized, setup is the same on any platform that can run Docker.
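The silence-trimming step can be illustrated with a short sketch. This is not the project's actual code: the function name `build_trim_command`, the -50 dB threshold, and the choice of FFmpeg's `silenceremove` and `areverse` filters are assumptions made for illustration. The sketch only builds the command, so it runs without FFmpeg installed.

```python
from typing import List


def build_trim_command(src: str, dst: str, threshold_db: int = -50) -> List[str]:
    """Return an ffmpeg command that strips leading and trailing silence.

    Hypothetical helper: the filter chain and threshold are assumptions,
    not taken from the Mimic Recording Studio backend.
    """
    # Remove silence from the start of the stream.
    strip_start = f"silenceremove=start_periods=1:start_threshold={threshold_db}dB"
    # Reverse the audio, strip the (now leading) tail silence, reverse back.
    audio_filter = ",".join([strip_start, "areverse", strip_start, "areverse"])
    return ["ffmpeg", "-y", "-i", src, "-af", audio_filter, dst]


if __name__ == "__main__":
    # Print the command rather than executing it.
    print(" ".join(build_trim_command("raw.wav", "trimmed.wav")))
```

To actually trim a file, the returned list could be passed to `subprocess.run(cmd, check=True)` on a machine where FFmpeg is installed.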

Quick Start & Requirements

Install: git clone https://github.com/MycroftAI/mimic-recording-studio.git && cd mimic-recording-studio && docker-compose up
Prerequisites: Docker and Docker Compose. Manual setup without Docker additionally requires Python 3.5+ and FFmpeg for the backend, and Node.js with npm or yarn for the frontend.
Setup: The first docker-compose up may take a while because it builds the containers.
Docs: Quick Start, Backend Functions, Frontend Functions
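
Before running the install command, it can help to confirm that the prerequisites are on the PATH. The following is a minimal sketch; the helper name `check_tools` is hypothetical and not part of the project.

```python
import shutil
from typing import Dict, Iterable


def check_tools(tools: Iterable[str] = ("docker", "docker-compose")) -> Dict[str, bool]:
    """Map each tool name to whether an executable with that name is on PATH."""
    return {tool: shutil.which(tool) is not None for tool in tools}


if __name__ == "__main__":
    for tool, found in check_tools().items():
        print(f"{tool}: {'found' if found else 'missing'}")
```

Note that newer Docker installations ship Compose as the `docker compose` plugin, so a missing `docker-compose` executable does not necessarily mean Compose is unavailable.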

Highlighted Details

Supports custom corpora in CSV format for training in multiple languages.
Backend automatically trims silence from WAV recordings using FFmpeg.
SQLite database stores recording metadata, allowing for advanced querying of data.
Provides recommendations on recording environment and technique to help ensure voice quality.
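
The CSV corpus and SQLite points above can be sketched together. Everything specific here is an assumption for illustration: the corpus layout (one phrase per row, first column), the `recordings` table, and its columns are hypothetical and may differ from the project's real corpus format and schema.

```python
import csv
import io
import sqlite3

# Hypothetical corpus: one phrase per row, first column only.
CORPUS_CSV = "Hello there.\nHow is the weather today?\nSet a timer for ten minutes.\n"


def load_phrases(csv_text: str) -> list:
    """Read the first column of every non-empty CSV row as a phrase."""
    return [row[0] for row in csv.reader(io.StringIO(csv_text)) if row]


def open_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical recordings table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE recordings (user_id TEXT, phrase TEXT, duration_s REAL)"
    )
    return conn


def seconds_recorded(conn: sqlite3.Connection, user_id: str) -> float:
    """Total recorded duration for one user, 0.0 if the user has no rows."""
    row = conn.execute(
        "SELECT COALESCE(SUM(duration_s), 0.0) FROM recordings WHERE user_id = ?",
        (user_id,),
    ).fetchone()
    return row[0]


if __name__ == "__main__":
    phrases = load_phrases(CORPUS_CSV)
    conn = open_demo_db()
    conn.executemany(
        "INSERT INTO recordings VALUES (?, ?, ?)",
        [("alice", phrases[0], 1.5), ("alice", phrases[1], 2.25)],
    )
    print(len(phrases), "phrases;", seconds_recorded(conn, "alice"), "seconds recorded")
```

Keeping metadata in SQLite means questions such as "how much audio has this speaker recorded" become a single query instead of a scan over the WAV files.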

Maintenance & Community

Support is available via the Mycroft Forum and Mycroft Chat. Contributions via Pull Requests are welcomed.

Licensing & Compatibility

The project is licensed under the Apache License 2.0. Voice recordings donated to Mycroft must be licensed under the Creative Commons CC0 Public Domain license for use in TTS applications.

Limitations & Caveats

Creating a high-quality voice requires significant effort: an estimated 15,000-20,000 recorded phrases. The project also notes that switching to a new corpus requires resetting the SQLite database.
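
The source does not describe the reset procedure. As a sketch only, assuming that a reset amounts to moving the existing database file aside so the backend starts from an empty one, it could look like this. The helper `reset_database` and the file name `demo.db` are hypothetical; the real database location should be taken from the project's documentation.

```python
import tempfile
from pathlib import Path
from typing import Optional


def reset_database(db_path: Path) -> Optional[Path]:
    """Move an existing database file to '<name>.bak' and return the backup path.

    Returns None when there is no database file to reset.
    """
    if not db_path.exists():
        return None
    backup = db_path.with_name(db_path.name + ".bak")
    db_path.replace(backup)  # overwrites an older backup with the same name
    return backup


if __name__ == "__main__":
    # Demonstrate on a throwaway file instead of a real database.
    with tempfile.TemporaryDirectory() as tmp:
        db = Path(tmp) / "demo.db"
        db.write_bytes(b"placeholder")
        print("backed up to", reset_database(db))
```

Keeping a backup rather than deleting the file outright preserves the metadata for recordings made against the previous corpus. It is probably safest to stop the running containers before moving the file.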
