Chatbot with text-to-speech using VITS and optional ChatGPT/ChatGLM
Top 75.0% on sourcepulse
This repository provides a text-to-speech (TTS) system leveraging VITS and integrates with large language models like GPT-3.5/GPT-3 and ChatGLM for conversational AI applications. It targets developers and researchers building interactive voice agents or chatbots.
How It Works
The system utilizes the VITS (Variational Inference with adversarial learning for end-to-end Text-to-Speech) model for high-quality speech synthesis. It integrates with external LLMs via API calls to generate responses, which are then fed into the VITS model for speech output. The architecture supports custom chat servers and offers a flexible configuration for different LLM backends and speech processing pipelines.
Quick Start & Requirements
pip install -r requirements.txt
.Highlighted Details
Maintenance & Community
Information regarding maintainers, community channels, or roadmaps is not explicitly detailed in the provided README.
Licensing & Compatibility
The repository's licensing is not specified in the README. Compatibility for commercial use or closed-source linking would depend on the underlying licenses of VITS and the LLM APIs used.
Limitations & Caveats
The README notes that using pyopenjtalk for Japanese synthesis may yield suboptimal results, suggesting an alternative cleaner. It also warns about potential issues with specific dependency versions (e.g., protobuf, transformers) when using ChatGLM. The project's status (e.g., alpha, beta) and long-term maintenance are not clear.
1 year ago
1 day