AI-Digital-Human  by Jack-Cherish

AI digital human pipeline using open-source tools

created 2 years ago
308 stars

Top 88.2% on sourcepulse

Project Summary

This project provides a framework for creating AI-driven digital humans, aimed at users who want to generate realistic virtual characters. It simplifies the process by integrating several open-source AI models for image enhancement, natural language processing, speech synthesis, and facial animation.

How It Works

The system orchestrates a pipeline of specialized AI models. Image super-resolution and face restoration are handled by CodeFormer. Large language model capabilities are provided by ChatGLM2-6B for generating text responses. Text-to-speech is achieved using vits, which can be fine-tuned with custom voice data. Finally, SadTalker drives facial animations on static images using the synthesized audio, creating a lip-synced digital human.
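
The flow described above can be sketched as a chain of four stages. This is a minimal illustration only, not the project's actual code: the class name `DigitalHumanPipeline`, its field names, and the placeholder stage functions are all hypothetical, and no real CodeFormer, ChatGLM2-6B, vits, or SadTalker API is called. Each stage is an injected callable so that the real models could be plugged in behind the same interface.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class DigitalHumanPipeline:
    """Hypothetical orchestration of the four stages named in the summary."""

    restore_face: Callable[[bytes], bytes]          # stand-in for CodeFormer
    generate_reply: Callable[[str], str]            # stand-in for ChatGLM2-6B
    synthesize_speech: Callable[[str], bytes]       # stand-in for vits
    animate_face: Callable[[bytes, bytes], bytes]   # stand-in for SadTalker

    def run(self, portrait: bytes, user_text: str) -> bytes:
        # 1. Enhance the static portrait.
        restored = self.restore_face(portrait)
        # 2. Produce a text response to the user's input.
        reply = self.generate_reply(user_text)
        # 3. Turn the response into audio.
        audio = self.synthesize_speech(reply)
        # 4. Drive the restored portrait with the synthesized audio.
        return self.animate_face(restored, audio)


# Placeholder stages (assumptions for illustration): each one only tags
# its input so the data flow between stages is visible.
pipeline = DigitalHumanPipeline(
    restore_face=lambda img: b"restored:" + img,
    generate_reply=lambda text: "reply to " + text,
    synthesize_speech=lambda text: b"audio:" + text.encode("utf-8"),
    animate_face=lambda img, audio: img + b"|" + audio,
)

video = pipeline.run(b"portrait", "hello")
```

Keeping the stages as plain callables means any one model can be swapped (for example, a different text-to-speech backend) without touching the orchestration logic.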

Quick Start & Requirements

Highlighted Details

  • Leverages CodeFormer for high-quality facial restoration.
  • Integrates ChatGLM2-6B for flexible conversational AI.
  • Supports custom voice training with vits for personalized speech synthesis.
  • Utilizes SadTalker for audio-driven facial animation.

Maintenance & Community

The project is actively under development, with the author planning a series of video tutorials and code releases. Community engagement details (e.g., Discord, Slack) are not yet specified.

Licensing & Compatibility

The project itself does not specify a license. However, it integrates several components with their own licenses:

  • CodeFormer: MIT License
  • ChatGLM2-6B: Apache License 2.0
  • vits: Not explicitly stated in the README, but the linked repository is under the MIT License.
  • SadTalker: Not explicitly stated in the README, but the linked repository is under the Apache License 2.0.
  • Gradio: Apache License 2.0

Compatibility for commercial use depends on the licenses of the individual integrated components.

Limitations & Caveats

The project is still in development, with a significant portion of the promised tutorials and installation packages yet to be released. Some components used in earlier demonstrations were non-open-source, and the current open-source replacements (ChatGLM2-6B, SadTalker) may result in slightly different output quality.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 2 stars in the last 90 days
