Linly-Talker by Kedreamix

Digital avatar conversational system

Created 2 years ago

3,083 stars

Top 15.4% on SourcePulse

Project Summary

Linly-Talker is an AI-powered digital human conversational system that combines spoken dialogue with talking-head video generation. It targets users who want to create and converse with virtual personas.

How It Works

The system integrates multiple AI models for speech recognition (Whisper, FunASR), text-to-speech (Edge TTS, PaddleTTS, CosyVoice), voice cloning (GPT-SoVITS, XTTS, CosyVoice), large language models (Linly, Qwen, Gemini-Pro, ChatGPT), and talking head generation (SadTalker, Wav2Lip, ER-NeRF, MuseTalk). A Gradio-based WebUI lets users upload a character image and hold multi-turn conversations with the resulting AI-driven digital human.
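The model categories above suggest a four-stage conversational turn: speech recognition, language model, speech synthesis, then talking head generation. The following is a minimal sketch of that flow, not the project's actual code. `TalkerPipeline`, the stage type aliases, and the `fake_*` stand-ins are all hypothetical names invented for illustration, and the stage order is an assumption inferred from the description.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical stage signatures; none of these names come from Linly-Talker.
History = List[Tuple[str, str]]                 # (user text, reply) pairs
ASR = Callable[[bytes], str]                    # audio -> transcript
LLM = Callable[[History, str], str]             # (history, user text) -> reply
TTS = Callable[[str], bytes]                    # reply text -> audio
TalkingHead = Callable[[bytes, bytes], bytes]   # (portrait, audio) -> video


@dataclass
class TalkerPipeline:
    """Chains the four stages and keeps history for multi-turn dialogue."""
    asr: ASR
    llm: LLM
    tts: TTS
    head: TalkingHead
    history: History = field(default_factory=list)

    def turn(self, portrait: bytes, user_audio: bytes) -> bytes:
        text = self.asr(user_audio)            # 1. transcribe the user
        reply = self.llm(self.history, text)   # 2. generate a reply
        self.history.append((text, reply))     # remember the exchange
        speech = self.tts(reply)               # 3. synthesize the reply
        return self.head(portrait, speech)     # 4. animate the portrait


# Stand-in stages so the sketch runs without any model weights.
def fake_asr(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_llm(history: History, text: str) -> str:
    return f"reply {len(history) + 1}: {text}"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


def fake_head(portrait: bytes, audio: bytes) -> bytes:
    return portrait + b"|" + audio


pipeline = TalkerPipeline(fake_asr, fake_llm, fake_tts, fake_head)
video = pipeline.turn(b"face", b"hello")
```

Because each stage is just a callable, swapping one backend for another (for example, a different speech recognizer) would only change the function passed to the constructor.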

Quick Start & Requirements

Installation: Clone the repository, create a Conda environment (Python 3.10 recommended), and install dependencies from requirements_webui.txt. The PyTorch install command must name the CUDA version (e.g., conda install pytorch==2.4.1 torchvision==0.19.1 torchaudio==2.4.1 pytorch-cuda=11.8 -c pytorch -c nvidia). Models are downloaded with the provided script (scripts/download_models.sh) or manually from Hugging Face or ModelScope.
Prerequisites: An NVIDIA GPU with CUDA support is required. The PyTorch installation instructions list CUDA versions 11.8, 12.1, and 12.4.
Resources: Model downloads can be substantial. Setup time depends on download speed and dependency installation.
Links: Colab Notebook, Hugging Face Models, API Documentation.
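Since the PyTorch command differs only in its CUDA version, a small helper can build it for any of the listed versions. This is a sketch under one assumption: that the package pins shown for CUDA 11.8 (pytorch 2.4.1, torchvision 0.19.1, torchaudio 2.4.1) also apply to 12.1 and 12.4. The summary only gives the 11.8 example, so check the repository's own instructions. The function name `pytorch_install_command` is hypothetical.

```python
# CUDA versions the Quick Start lists for PyTorch installation.
SUPPORTED_CUDA = ("11.8", "12.1", "12.4")


def pytorch_install_command(cuda_version: str) -> str:
    """Build the conda command for one of the listed CUDA versions.

    Assumption: the same package pins are valid for every listed version.
    """
    if cuda_version not in SUPPORTED_CUDA:
        raise ValueError(
            f"CUDA {cuda_version} is not one of {', '.join(SUPPORTED_CUDA)}"
        )
    parts = [
        "conda", "install",
        "pytorch==2.4.1",
        "torchvision==0.19.1",
        "torchaudio==2.4.1",
        f"pytorch-cuda={cuda_version}",
        "-c", "pytorch",
        "-c", "nvidia",
    ]
    return " ".join(parts)


command = pytorch_install_command("11.8")
print(command)
```

For "11.8" this reproduces the command quoted in the installation step; any other value outside the listed versions raises a `ValueError` rather than producing a command that may not resolve.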

Highlighted Details

Supports real-time conversation with MuseTalk integration.
Offers one-shot voice cloning with GPT-SoVITS and CosyVoice from a 3-10 second reference audio clip.
Integrates multiple LLMs including Gemini-Pro and Qwen.
Gradio WebUI allows for custom character image uploads and multi-module/model selection.
Includes API documentation for programmatic access.
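The 3-10 second figure for voice cloning can be checked before a clip is handed to a cloning model. The sketch below uses only Python's standard `wave` module and treats the range as a guideline taken from the summary, not as a limit enforced by GPT-SoVITS or CosyVoice. The helper names (`clip_duration_seconds`, `is_usable_reference`, `make_silent_wav`) are hypothetical.

```python
import io
import wave

# Guideline range from the feature list, in seconds.
MIN_SECONDS, MAX_SECONDS = 3.0, 10.0


def clip_duration_seconds(wav_file) -> float:
    """Return the duration of a WAV file (path or file-like object)."""
    with wave.open(wav_file, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_usable_reference(wav_file) -> bool:
    """True if the clip length falls inside the 3-10 second guideline."""
    return MIN_SECONDS <= clip_duration_seconds(wav_file) <= MAX_SECONDS


def make_silent_wav(seconds: float, sample_rate: int = 16000) -> io.BytesIO:
    """Create an in-memory mono 16-bit WAV of silence, for demonstration."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * int(seconds * sample_rate))
    buffer.seek(0)
    return buffer


print(is_usable_reference(make_silent_wav(5.0)))
print(is_usable_reference(make_silent_wav(1.0)))
```

A real reference clip would be passed as a file path instead of the generated silence used here.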

Maintenance & Community

The project has a history of frequent model and feature additions, although the Health Check below records the last commit as 10 months ago. Community interaction is encouraged via GitHub issues and pull requests.

Licensing & Compatibility

The project is licensed under MIT. However, users must comply with the licenses of all integrated third-party models and components. Commercial use may be restricted by these underlying licenses.

Limitations & Caveats

Installation can be complex because of the many dependencies and their specific version requirements. Some models, such as ER-NeRF, may require specific setup or model replacements for optimal results. Edge TTS is reported to be subject to IP-based restrictions.
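Because several text-to-speech backends are supported, one way to cope with an online service that rejects requests is to try backends in order and use the first that succeeds. The sketch below illustrates that idea only; it is not how Linly-Talker handles Edge TTS failures. `synthesize_with_fallback` and the two stand-in backends are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical backend type: text in, synthesized audio bytes out.
Backend = Tuple[str, Callable[[str], bytes]]


def synthesize_with_fallback(
    text: str, backends: Sequence[Backend]
) -> Tuple[str, bytes]:
    """Try each backend in order; return (backend name, audio) on success."""
    errors: List[str] = []
    for name, synthesize in backends:
        try:
            return name, synthesize(text)
        except Exception as exc:  # collect the failure and move on
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all TTS backends failed: " + "; ".join(errors))


# Stand-ins: an online service that refuses the request, and a local one.
def blocked_online_tts(text: str) -> bytes:
    raise ConnectionError("request rejected")


def local_tts(text: str) -> bytes:
    return text.encode("utf-8")


used, audio = synthesize_with_fallback(
    "hi", [("online", blocked_online_tts), ("local", local_tts)]
)
```

Here the online stand-in fails, so the local stand-in produces the audio. If every backend fails, the raised error lists each backend's failure message.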

Health Check

Last Commit: 10 months ago
Responsiveness: Inactive

Star History

59 stars in the last 30 days