CosyVoice_For_Windows  by v3ucn

Windows version of a voice model

created 1 year ago
717 stars

Top 48.9% on sourcepulse

Project Summary

This repository provides a Windows-specific build of CosyVoice, a text-to-speech (TTS) model. It supports zero-shot, cross-lingual, and instruction-based voice synthesis, and targets researchers and developers working on multilingual speech generation on Windows.

How It Works

CosyVoice uses a multi-stage pipeline, likely combining components for acoustic modeling, vocoding, and possibly style or speaker embedding. The project targets accelerated inference on Windows, which requires specific versions of Python, CUDA, and cuDNN. It supports three inference modes:

  • Zero-shot: voice cloning from a short audio sample.
  • Cross-lingual: synthesizing speech in one language from a prompt in another.
  • Instruction-based: generating speech from text plus a speaker description.
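The three modes differ mainly in what they need besides the text to speak. The sketch below makes that concrete with a small request validator. It is not the CosyVoice API: `SynthesisRequest`, `REQUIRED_INPUTS`, and the field names are hypothetical, and the inputs listed per mode are inferred only from the mode descriptions above.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical mapping from mode to the inputs it needs.
# Inferred from the mode descriptions; the real interface may differ.
REQUIRED_INPUTS = {
    "zero_shot": {"text", "prompt_audio"},
    "cross_lingual": {"text", "prompt_audio"},
    "instruct": {"text", "instruction"},
}


@dataclass
class SynthesisRequest:
    mode: str
    text: str
    prompt_audio: Optional[str] = None  # path to a short reference recording
    instruction: Optional[str] = None   # natural-language speaker description

    def missing_inputs(self):
        """Return the sorted names of inputs the chosen mode still needs."""
        if self.mode not in REQUIRED_INPUTS:
            raise ValueError(f"unknown mode: {self.mode}")
        candidates = ("text", "prompt_audio", "instruction")
        provided = {name for name in candidates if getattr(self, name)}
        return sorted(REQUIRED_INPUTS[self.mode] - provided)
```

A request such as `SynthesisRequest("zero_shot", "Hello")` reports `prompt_audio` as missing, while an instruct request with an instruction string reports nothing missing.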

Quick Start & Requirements

  • Installation: Clone the repository, create a conda environment with Python 3.11, install dependencies via pip install -r requirements.txt, and install PyTorch with CUDA support (pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121). A specific DeepSpeed build for Windows is also required.
  • Prerequisites: Python 3.11, CUDA 12.1+, cuDNN 9.4+, Git LFS.
  • Models: The pre-trained models (CosyVoice-300M, CosyVoice-300M-SFT, CosyVoice-300M-Instruct) and the speech_kantts_ttsfrd resource must be downloaded separately.
  • Demo: A web UI can be launched with python3 webui.py.
  • Docs: CosyVoice Paper, CosyVoice Demos, CosyVoice Studio, CosyVoice Code.
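Because the setup depends on specific versions, it can help to check them before installing anything else. The following is a minimal sketch, assuming the minimums listed under Prerequisites (Python 3.11, CUDA 12.1+, cuDNN 9.4+). The function names are hypothetical and the version strings are supplied by the caller rather than detected automatically.

```python
import re
import sys

# Minimums taken from the prerequisites listed above.
REQUIRED_PYTHON = (3, 11)
MIN_CUDA = (12, 1)
MIN_CUDNN = (9, 4)


def parse_version(text):
    """Turn the leading dotted-number part of a string into a tuple of ints."""
    match = re.match(r"\d+(?:\.\d+)*", text.strip())
    if match is None:
        raise ValueError(f"not a version string: {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(found, minimum):
    """Return True when the found version is at least the minimum."""
    return parse_version(found) >= minimum


def check_environment(cuda_version, cudnn_version, python_version=None):
    """Return a list of human-readable problems; an empty list means OK."""
    if python_version is None:
        python_version = tuple(sys.version_info[:3])
    problems = []
    if tuple(python_version[:2]) != REQUIRED_PYTHON:
        problems.append("Python 3.11 is required")
    if not meets_minimum(cuda_version, MIN_CUDA):
        problems.append("CUDA 12.1 or newer is required")
    if not meets_minimum(cudnn_version, MIN_CUDNN):
        problems.append("cuDNN 9.4 or newer is required")
    return problems
```

Calling `check_environment("12.4", "9.4.0")` inside the conda environment returns an empty list when the interpreter is Python 3.11. If PyTorch is already installed, the string in `torch.version.cuda` can be passed as the CUDA version.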

Highlighted Details

  • Supports zero-shot, cross-lingual, and instruction-based TTS.
  • Optimized for Windows environments with specific dependency requirements.
  • Offers a web UI for quick experimentation.
  • Provides a Docker image for deployment.

Maintenance & Community

The project acknowledges borrowing code from several other open-source projects (FunASR, FunCodec, Matcha-TTS, AcademiCodec, WeNet). Discussion is primarily through GitHub Issues.

Licensing & Compatibility

The README does not explicitly state a license for this repository. Anyone planning commercial use should first review the license of the underlying CosyVoice project.

Limitations & Caveats

The setup is specific to Windows and requires precise versions of CUDA and other dependencies, which may be difficult to manage. The project is presented as a "version for Windows environment," which suggests it may lag behind the official release or introduce platform-specific issues.

Health Check

  • Last commit: 8 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 57 stars in the last 90 days
