Linly-Talker  by Kedreamix

Digital avatar conversational system

created 1 year ago
2,815 stars

Top 17.3% on sourcepulse

GitHubView on GitHub
Project Summary

Linly-Talker is an AI-powered digital human conversational system designed for interactive dialogue and visual generation. It targets users seeking to create and engage with virtual personas, offering a rich feature set for personalized human-AI interaction.

How It Works

This system integrates multiple AI models for speech recognition (Whisper, FunASR), text-to-speech (Edge TTS, PaddleTTS, CosyVoice), voice cloning (GPT-SoVITS, XTTS, CosyVoice), large language models (Linly, Qwen, Gemini-Pro, ChatGPT), and talking head generation (SadTalker, Wav2Lip, ER-NeRF, MuseTalk). It leverages a Gradio-based WebUI for an interactive experience, allowing users to upload images and engage in multi-turn conversations with AI-driven digital humans.

Quick Start & Requirements

  • Installation: Clone the repository, set up a Conda environment (Python 3.10 recommended), and install dependencies via requirements_webui.txt. PyTorch installation requires specifying CUDA version (e.g., conda install pytorch==2.4.1 torchvision==0.19.1 torchaudio==2.4.1 pytorch-cuda=11.8 -c pytorch -c nvidia). Model downloads are handled via a provided script (scripts/download_models.sh) or manual methods (Hugging Face, ModelScope).
  • Prerequisites: NVIDIA GPU with CUDA support is essential. Specific CUDA versions (11.8, 12.1, 12.4) are listed for PyTorch installation.
  • Resources: Model downloads can be substantial. Setup time depends on download speed and dependency installation.
  • Links: Colab Notebook, Hugging Face Models, API Documentation.

Highlighted Details

  • Supports real-time conversation with MuseTalk integration.
  • Offers one-shot voice cloning with GPT-SoVITS and CosyVoice (3-10 seconds of audio).
  • Integrates multiple LLMs including Gemini-Pro and Qwen.
  • Gradio WebUI allows for custom character image uploads and multi-module/model selection.
  • Includes API documentation for programmatic access.

Maintenance & Community

The project is actively updated, with frequent additions of new models and features. Community interaction is encouraged via GitHub issues and pull requests.

Licensing & Compatibility

The project is licensed under MIT. However, users must comply with the licenses of all integrated third-party models and components. Commercial use may be restricted by these underlying licenses.

Limitations & Caveats

Installation can be complex due to numerous dependencies and specific version requirements. Some models, like ER-NeRF, may require specific setup or model replacements for optimal results. Edge TTS has reported IP restrictions.

Health Check
Last commit

5 months ago

Responsiveness

1 week

Pull Requests (30d)
0
Issues (30d)
1
Star History
201 stars in the last 90 days

Explore Similar Projects

Starred by Addy Osmani Addy Osmani(Engineering Leader on Google Chrome), Victor Taelin Victor Taelin(Author of Bend, Kind, HVM), and
1 more.

chatbox by chatboxai

0.3%
36k
Desktop client app for AI models/LLMs
created 2 years ago
updated 6 days ago
Feedback? Help us improve.