AI digital human pipeline using open-source tools
This project provides a framework for creating AI-driven digital humans, aimed at users who want to generate realistic virtual characters for a range of applications. It simplifies the process by integrating several open-source AI models covering image enhancement, natural language processing, speech synthesis, and facial animation.
How It Works
The system orchestrates a pipeline of specialized AI models. CodeFormer handles image super-resolution and face restoration. ChatGLM2-6B provides the large language model that generates text responses. Text-to-speech is performed by VITS, which can be fine-tuned on custom voice data. Finally, SadTalker drives facial animation on a static image using the synthesized audio, producing a lip-synced digital human. A high-level sketch of this flow is shown below.
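The following is a minimal orchestration sketch, not the project's actual code: it assumes each model exposes its own command-line inference script, and every script name, path, and flag below is a hypothetical placeholder that will differ per installation.

```python
# Hypothetical end-to-end pipeline: restore portrait -> generate reply ->
# synthesize speech -> animate face. All invoked scripts are placeholders.
import subprocess
from pathlib import Path

WORKDIR = Path("output")
WORKDIR.mkdir(exist_ok=True)


def run(cmd: list[str]) -> None:
    """Run one pipeline stage and fail fast if it errors."""
    print("running:", " ".join(cmd))
    subprocess.run(cmd, check=True)


def build_digital_human(portrait: str, user_prompt: str) -> Path:
    # 1. Face restoration / super-resolution on the source portrait (CodeFormer).
    restored = WORKDIR / "portrait_restored.png"
    run(["python", "codeformer_infer.py", "--input", portrait, "--output", str(restored)])

    # 2. Generate a text reply with the language model (ChatGLM2-6B).
    reply_txt = WORKDIR / "reply.txt"
    run(["python", "chatglm2_infer.py", "--prompt", user_prompt, "--output", str(reply_txt)])

    # 3. Synthesize speech from the reply (VITS, optionally a fine-tuned voice).
    speech_wav = WORKDIR / "reply.wav"
    run(["python", "vits_infer.py", "--text-file", str(reply_txt), "--output", str(speech_wav)])

    # 4. Animate the restored portrait with the synthesized audio (SadTalker),
    #    producing the final lip-synced video.
    video = WORKDIR / "digital_human.mp4"
    run(["python", "sadtalker_infer.py", "--image", str(restored),
         "--audio", str(speech_wav), "--output", str(video)])
    return video


if __name__ == "__main__":
    print(build_digital_human("portrait.png", "Introduce yourself in one sentence."))
```

Each stage only passes file paths forward, so any component can be swapped out (e.g. a different TTS model) without touching the rest of the pipeline.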
Quick Start & Requirements
Highlighted Details
Maintenance & Community
The project is actively under development, with the author planning a series of video tutorials and code releases. Community engagement details (e.g., Discord, Slack) are not yet specified.
Licensing & Compatibility
The project itself does not specify a license. However, it integrates several components (CodeFormer, ChatGLM2-6B, VITS, SadTalker) that each carry their own licenses, which should be reviewed individually before use.
Limitations & Caveats
The project is still in development, with a significant portion of the promised tutorials and installation packages yet to be released. Some components used in earlier demonstrations were non-open-source, and the current open-source replacements (ChatGLM2-6B, SadTalker) may result in slightly different output quality.