Linly-Talker by Kedreamix

Digital avatar conversational system

Created 1 year ago
2,926 stars

Top 16.3% on SourcePulse

Project Summary

Linly-Talker is a digital human conversational system that combines spoken dialogue with generated talking-head video. It targets users who want to create and converse with virtual personas, and it lets them choose among several models at each stage to personalize the interaction.

How It Works

The system integrates multiple AI models for speech recognition (Whisper, FunASR), text-to-speech (Edge TTS, PaddleTTS, CosyVoice), voice cloning (GPT-SoVITS, XTTS, CosyVoice), large language models (Linly, Qwen, Gemini-Pro, ChatGPT), and talking head generation (SadTalker, Wav2Lip, ER-NeRF, MuseTalk). A Gradio-based WebUI ties these together: users upload an image and hold multi-turn conversations with the resulting AI-driven digital human.
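The model categories above suggest a four-stage conversational turn: speech recognition, language model, speech synthesis, then talking-head generation. The following is a minimal sketch of that flow under stated assumptions, not the project's actual code. Every name in it (TalkerPipeline, turn, and the fake_* stand-ins) is hypothetical, and the stand-ins only pass bytes and strings around so the example runs without any model installed.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical stage signatures; these are NOT Linly-Talker's real interfaces.
Transcriber = Callable[[bytes], str]                     # audio -> text (ASR stage)
ChatModel = Callable[[List[Tuple[str, str]], str], str]  # (history, user text) -> reply
Synthesizer = Callable[[str], bytes]                     # reply text -> audio (TTS stage)
Animator = Callable[[bytes, bytes], bytes]               # (portrait, audio) -> video


@dataclass
class TalkerPipeline:
    """Chains the four stages and keeps multi-turn conversation history."""
    asr: Transcriber
    llm: ChatModel
    tts: Synthesizer
    avatar: Animator
    history: List[Tuple[str, str]] = field(default_factory=list)

    def turn(self, portrait: bytes, user_audio: bytes) -> bytes:
        text = self.asr(user_audio)           # 1. speech recognition
        reply = self.llm(self.history, text)  # 2. language model sees prior turns
        self.history.append((text, reply))    # remember this exchange
        speech = self.tts(reply)              # 3. text-to-speech
        return self.avatar(portrait, speech)  # 4. talking-head generation


# Stand-in stages so the sketch runs without any model downloads.
def fake_asr(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_llm(history: List[Tuple[str, str]], text: str) -> str:
    return f"reply {len(history) + 1}: {text}"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


def fake_avatar(portrait: bytes, audio: bytes) -> bytes:
    return portrait + b"|" + audio


pipeline = TalkerPipeline(fake_asr, fake_llm, fake_tts, fake_avatar)
video = pipeline.turn(b"face", b"hello")
print(video)
```

Treating each stage as a plain callable mirrors the idea of swapping one model for another (for example Whisper for FunASR) without touching the rest of the turn logic.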

Quick Start & Requirements

  • Installation: Clone the repository, create a Conda environment (Python 3.10 recommended), and install dependencies from requirements_webui.txt. The PyTorch install command must specify the CUDA version (e.g., conda install pytorch==2.4.1 torchvision==0.19.1 torchaudio==2.4.1 pytorch-cuda=11.8 -c pytorch -c nvidia). Models can be downloaded with the provided script (scripts/download_models.sh) or manually from Hugging Face or ModelScope.
  • Prerequisites: An NVIDIA GPU with CUDA support is required. The PyTorch installation instructions list CUDA 11.8, 12.1, and 12.4.
  • Resources: Model downloads can be substantial. Setup time depends on download speed and dependency installation.
  • Links: Colab Notebook, Hugging Face Models, API Documentation.
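Because the PyTorch command depends on the CUDA version, a small helper can make the choice explicit. The sketch below only builds the command string and does not run it. The function name conda_install_command is hypothetical. The package pins are copied from the CUDA 11.8 example above, and reusing them unchanged for 12.1 and 12.4 is an assumption that should be checked against the PyTorch installation instructions.

```python
from typing import List

# CUDA versions listed under Prerequisites.
SUPPORTED_CUDA = ("11.8", "12.1", "12.4")


def conda_install_command(cuda_version: str) -> List[str]:
    """Build (but do not run) the conda command for a given CUDA version.

    Assumption: the 2.4.1 / 0.19.1 pins from the 11.8 example apply to every
    listed CUDA version; only the pytorch-cuda selector changes.
    """
    if cuda_version not in SUPPORTED_CUDA:
        raise ValueError(
            f"CUDA {cuda_version} is not one of {', '.join(SUPPORTED_CUDA)}"
        )
    return [
        "conda", "install",
        "pytorch==2.4.1", "torchvision==0.19.1", "torchaudio==2.4.1",
        f"pytorch-cuda={cuda_version}",
        "-c", "pytorch", "-c", "nvidia",
    ]


# Prints the same command quoted in the installation step for CUDA 11.8.
print(" ".join(conda_install_command("11.8")))
```

Returning a list of arguments rather than one string keeps the result directly usable with subprocess.run if the command is later executed from a setup script.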

Highlighted Details

  • Supports real-time conversation with MuseTalk integration.
  • Offers one-shot voice cloning with GPT-SoVITS and CosyVoice from 3-10 seconds of reference audio.
  • Integrates multiple LLMs including Gemini-Pro and Qwen.
  • Gradio WebUI allows for custom character image uploads and multi-module/model selection.
  • Includes API documentation for programmatic access.

Maintenance & Community

The project is actively updated, with frequent additions of new models and features. Community interaction is encouraged via GitHub issues and pull requests.

Licensing & Compatibility

The project is released under the MIT License. Users must still comply with the licenses of all integrated third-party models and components, which may restrict commercial use.

Limitations & Caveats

Installation can be complex because of the many dependencies and their specific version requirements. Some models, such as ER-NeRF, may need additional setup or replacement model files to produce good results. Users have reported IP-based restrictions when using Edge TTS.

Health Check

  • Last commit: 6 months ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 1

Star History

70 stars in the last 30 days
