AI digital human pipeline using open-source tools
This project provides a framework for creating AI-driven digital humans, aimed at users who want to generate realistic virtual characters for a range of applications. It simplifies the process by integrating several open-source AI models covering image enhancement, natural language processing, speech synthesis, and facial animation.
How It Works
The system orchestrates a pipeline of specialized AI models. CodeFormer handles image super-resolution and face restoration. ChatGLM2-6B provides the large language model that generates text responses. Text-to-speech is performed by VITS, which can be fine-tuned on custom voice data. Finally, SadTalker drives facial animation on a static image using the synthesized audio, producing a lip-synced digital human. A high-level sketch of this flow is shown below.
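The following is a minimal orchestration sketch, not the project's actual code: it assumes each model exposes its own command-line inference script, and every script name, path, and flag below is a hypothetical placeholder that will differ per installation.

```python
# Hypothetical end-to-end pipeline: restore portrait -> generate reply ->
# synthesize speech -> animate face. All invoked scripts are placeholders.
import subprocess
from pathlib import Path

WORKDIR = Path("output")
WORKDIR.mkdir(exist_ok=True)


def run(cmd: list[str]) -> None:
    """Run one pipeline stage and fail fast if it errors."""
    print("running:", " ".join(cmd))
    subprocess.run(cmd, check=True)


def build_digital_human(portrait: str, user_prompt: str) -> Path:
    # 1. Face restoration / super-resolution on the source portrait (CodeFormer).
    restored = WORKDIR / "portrait_restored.png"
    run(["python", "codeformer_infer.py", "--input", portrait, "--output", str(restored)])

    # 2. Generate a text reply with the language model (ChatGLM2-6B).
    reply_txt = WORKDIR / "reply.txt"
    run(["python", "chatglm2_infer.py", "--prompt", user_prompt, "--output", str(reply_txt)])

    # 3. Synthesize speech from the reply (VITS, optionally a fine-tuned voice).
    speech_wav = WORKDIR / "reply.wav"
    run(["python", "vits_infer.py", "--text-file", str(reply_txt), "--output", str(speech_wav)])

    # 4. Animate the restored portrait with the synthesized audio (SadTalker),
    #    producing the final lip-synced video.
    video = WORKDIR / "digital_human.mp4"
    run(["python", "sadtalker_infer.py", "--image", str(restored),
         "--audio", str(speech_wav), "--output", str(video)])
    return video


if __name__ == "__main__":
    print(build_digital_human("portrait.png", "Introduce yourself in one sentence."))
```

Each stage only passes file paths forward, so any component can be swapped out (e.g. a different TTS model) without touching the rest of the pipeline.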
Quick Start & Requirements
Highlighted Details
Maintenance & Community
The project is actively under development, with the author planning a series of video tutorials and code releases. Community engagement details (e.g., Discord, Slack) are not yet specified.
Licensing & Compatibility
The project itself does not specify a license. However, it integrates several components (CodeFormer, ChatGLM2-6B, VITS, SadTalker) that each carry their own licenses, which should be reviewed individually before use.
Limitations & Caveats
The project is still in development, with a significant portion of the promised tutorials and installation packages yet to be released. Some components used in earlier demonstrations were non-open-source, and the current open-source replacements (ChatGLM2-6B, SadTalker) may result in slightly different output quality.