vits_with_chatgpt-gpt3  by Paraworks

Chatbot with text-to-speech using VITS and optional ChatGPT/ChatGLM

created 2 years ago
388 stars

Top 75.0% on sourcepulse

GitHubView on GitHub
Project Summary

This repository provides a text-to-speech (TTS) system leveraging VITS and integrates with large language models like GPT-3.5/GPT-3 and ChatGLM for conversational AI applications. It targets developers and researchers building interactive voice agents or chatbots.

How It Works

The system utilizes the VITS (Variational Inference with adversarial learning for end-to-end Text-to-Speech) model for high-quality speech synthesis. It integrates with external LLMs via API calls to generate responses, which are then fed into the VITS model for speech output. The architecture supports custom chat servers and offers a flexible configuration for different LLM backends and speech processing pipelines.

Quick Start & Requirements

  • Installation: Clone the repository and install dependencies using pip install -r requirements.txt.
  • Prerequisites: Python 3.8+, Anaconda, Git, FFmpeg. Optional: CUDA for GPU acceleration, pyopenjtalk for Japanese speech synthesis (requires CMake).
  • Setup: The README outlines detailed setup steps for Linux and Windows, including environment creation and dependency installation.
  • Links: Hugging Face Repo

Highlighted Details

  • Supports multiple LLM backends: GPT-3.5/GPT-3 API, ChatGLM.
  • Offers a web UI for chatbot configuration and VITS model loading.
  • Includes an ONNX export tool for the VITS model.
  • Provides guidance on handling Japanese text processing for VITS.

Maintenance & Community

Information regarding maintainers, community channels, or roadmaps is not explicitly detailed in the provided README.

Licensing & Compatibility

The repository's licensing is not specified in the README. Compatibility for commercial use or closed-source linking would depend on the underlying licenses of VITS and the LLM APIs used.

Limitations & Caveats

The README notes that using pyopenjtalk for Japanese synthesis may yield suboptimal results, suggesting an alternative cleaner. It also warns about potential issues with specific dependency versions (e.g., protobuf, transformers) when using ChatGLM. The project's status (e.g., alpha, beta) and long-term maintenance are not clear.

Health Check
Last commit

1 year ago

Responsiveness

1 day

Pull Requests (30d)
0
Issues (30d)
0
Star History
1 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen Chip Huyen(Author of AI Engineering, Designing Machine Learning Systems).

WeClone by xming521

0.6%
15k
Digital twin one-stop solution
created 1 year ago
updated 5 days ago
Feedback? Help us improve.