mimic-recording-studio  by MycroftAI

Docker app for recording voice samples to train a TTS voice with Mimic2

created 6 years ago
513 stars

Top 62.0% on sourcepulse

Project Summary

Mimic Recording Studio is a Docker-based application designed for collecting voice samples to train custom Text-to-Speech (TTS) voices using Mycroft's Mimic 2 engine. It targets individuals or teams looking to create unique, high-quality synthetic voices, simplifying the data collection process.

How It Works

The application runs as two Docker services: a React frontend and a Python/Flask backend. The backend handles audio processing, including automatic silence trimming via FFmpeg, and stores recordings and metadata in a SQLite database. The frontend provides a web interface where users record phrases, play back audio, and view basic metrics. Running both services in containers keeps setup simple and consistent across platforms.
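To make the silence-trimming step concrete, here is a minimal Python sketch of how a backend could call FFmpeg to strip leading and trailing silence from a WAV file. It is not taken from the project's code: the function names, the threshold, and the minimum-silence duration are assumptions chosen for illustration, and the filter chain (FFmpeg's `silenceremove` combined with `areverse`) is one common approach rather than the project's confirmed method.

```python
import subprocess
from typing import List


def build_trim_command(src: str, dst: str,
                       threshold_db: int = -50,
                       min_silence_s: float = 0.1) -> List[str]:
    """Build an ffmpeg command that trims silence at both ends of a file.

    Hypothetical helper: the threshold and duration defaults are
    illustrative assumptions, not values used by the project.
    """
    # silenceremove with start_periods=1 drops silence at the start only.
    trim = (
        f"silenceremove=start_periods=1"
        f":start_duration={min_silence_s}"
        f":start_threshold={threshold_db}dB"
    )
    # Reverse the audio, trim the (former) tail, then reverse back.
    filter_chain = f"{trim},areverse,{trim},areverse"
    return ["ffmpeg", "-y", "-i", src, "-af", filter_chain, dst]


def trim_silence(src: str, dst: str) -> None:
    """Run the trim command; requires ffmpeg to be installed on PATH."""
    subprocess.run(build_trim_command(src, dst), check=True)
```

Calling `trim_silence("raw.wav", "trimmed.wav")` would write a trimmed copy next to the original. Keeping command construction separate from execution makes the filter chain easy to inspect or adjust without running FFmpeg.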

Quick Start & Requirements

  • Install: git clone https://github.com/MycroftAI/mimic-recording-studio.git && cd mimic-recording-studio && docker-compose up
  • Prerequisites: Docker and Docker Compose. A manual (non-Docker) setup additionally requires Python 3.5+ and FFmpeg for the backend, and Node.js with npm or yarn for the frontend.
  • Setup: The first docker-compose up run may take some time while the containers are built.
  • Docs: Quick Start, Backend Functions, Frontend Functions

Highlighted Details

  • Supports custom corpora in CSV format for training in multiple languages.
  • Backend automatically trims silence from WAV recordings using FFmpeg.
  • SQLite database stores recording metadata, allowing for advanced querying of data.
  • Recommendations provided for optimal recording environments and techniques to ensure voice quality.
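As an illustration of the kind of query that recording metadata in SQLite makes possible, the following Python sketch totals recorded audio per speaker. The table name `recordings` and its columns (`speaker`, `phrase`, `duration_s`) are hypothetical and invented for this example; the project's actual schema may differ, so treat this as a pattern rather than a working query against a real studio database.

```python
import sqlite3
from typing import Tuple


def make_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a HYPOTHETICAL schema and sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE recordings (speaker TEXT, phrase TEXT, duration_s REAL)"
    )
    conn.executemany(
        "INSERT INTO recordings VALUES (?, ?, ?)",
        [
            ("alice", "Hello there.", 1.5),
            ("alice", "Good morning.", 2.0),
            ("bob", "Testing.", 1.0),
        ],
    )
    conn.commit()
    return conn


def recording_totals(conn: sqlite3.Connection, speaker: str) -> Tuple[float, int]:
    """Return (total seconds recorded, number of recordings) for one speaker."""
    row = conn.execute(
        "SELECT COALESCE(SUM(duration_s), 0.0), COUNT(*) "
        "FROM recordings WHERE speaker = ?",
        (speaker,),  # parameterized to avoid SQL injection
    ).fetchone()
    return row[0], row[1]
```

With the demo data, `recording_totals(make_demo_db(), "alice")` reports two recordings. A query like this could help track progress toward a target corpus size.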

Maintenance & Community

Support is available via the Mycroft Forum and Mycroft Chat. Contributions via pull requests are welcome.

Licensing & Compatibility

The project is licensed under the Apache License 2.0. Voice recordings donated to Mycroft must be licensed under the Creative Commons CC0 Public Domain license for use in TTS applications.

Limitations & Caveats

Creating a high-quality voice requires significant effort: an estimated 15,000-20,000 recorded phrases. The project also notes that switching to a new corpus requires resetting the SQLite database.

Health Check

  • Last commit: 2 years ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 5 stars in the last 90 days
