I_am_a_person  by yangkang2021

GPT digital human tech notes (not an open-source project)

created 2 years ago
483 stars

Top 64.4% on sourcepulse

Project Summary

This repository collects technical notes on building real-time, interactive GPT-driven digital humans. It is a knowledge base for researchers and developers who want to understand the components and techniques behind such systems, not a deployable open-source project.

How It Works

The notes break the system into modular stages: data preprocessing (video segmentation, face detection and recognition, matting), appearance generation (AI art, face swapping), input processing (speech recognition), core intelligence (large language models for role-play and chat), output generation (text-to-speech for both speech and singing), and digital human driving (motion capture, 3D reconstruction, NeRF, Gaussian Splatting). Treating each stage separately makes the full pipeline behind an interactive digital human easier to study piece by piece.
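The runtime half of this pipeline (speech in, reply out, avatar driven) can be sketched as a chain of interchangeable stages. The sketch below is illustrative only: the repository ships no code, and every name here (`Turn`, `recognize_speech`, `generate_reply`, `synthesize_speech`, `drive_avatar`, `run_turn`) is a hypothetical stand-in. Each stub fakes its work with plain strings so the example runs without any model installed.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Turn:
    """State carried through one user interaction (hypothetical structure)."""
    audio_in: bytes
    transcript: str = ""
    reply_text: str = ""
    audio_out: bytes = b""
    frames: List[str] = field(default_factory=list)


# A stage takes the turn state, fills in its part, and returns it.
Stage = Callable[[Turn], Turn]


def recognize_speech(turn: Turn) -> Turn:
    # Placeholder for a speech recognition model: treat the bytes as UTF-8 text.
    turn.transcript = turn.audio_in.decode("utf-8")
    return turn


def generate_reply(turn: Turn) -> Turn:
    # Placeholder for a large language model call.
    turn.reply_text = f"You said: {turn.transcript}"
    return turn


def synthesize_speech(turn: Turn) -> Turn:
    # Placeholder for a text-to-speech model: encode the text back to bytes.
    turn.audio_out = turn.reply_text.encode("utf-8")
    return turn


def drive_avatar(turn: Turn) -> Turn:
    # Placeholder for lip-sync or 3D driving: one fake frame per spoken word.
    words = turn.reply_text.split()
    turn.frames = [f"frame_{i}:{word}" for i, word in enumerate(words)]
    return turn


DEFAULT_STAGES: List[Stage] = [
    recognize_speech,
    generate_reply,
    synthesize_speech,
    drive_avatar,
]


def run_turn(turn: Turn, stages: List[Stage] = DEFAULT_STAGES) -> Turn:
    """Run every stage in order, passing the turn state along."""
    for stage in stages:
        turn = stage(turn)
    return turn


if __name__ == "__main__":
    result = run_turn(Turn(audio_in=b"hello"))
    print(result.reply_text, len(result.frames))
```

Because every stage shares one signature, a stub can be replaced by a wrapper around a real model without touching the rest of the chain, which mirrors the modular breakdown in the notes.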

Quick Start & Requirements

This is a collection of notes and links to external projects, not a single installable package. Requirements depend on the sub-component being explored. Most need Python and a deep learning framework (PyTorch or TensorFlow), many need supporting tools such as OpenCV and FFmpeg, and the larger models generally need a CUDA-capable GPU.
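Since each linked project has its own prerequisites, a quick probe of the local environment can save time before trying one. The following is a minimal sketch using only the Python standard library. The helper name `check_environment` is hypothetical, and the module and binary names passed in (`torch`, `tensorflow`, `cv2`, `ffmpeg`) are assumed from the tools named above rather than taken from any specific linked project.

```python
import importlib.util
import shutil
from typing import Dict, Iterable


def check_environment(
    modules: Iterable[str], binaries: Iterable[str]
) -> Dict[str, Dict[str, bool]]:
    """Report which top-level Python modules and command-line tools are present.

    Modules are probed with importlib.util.find_spec, which locates a module
    without importing it. Binaries are looked up on PATH with shutil.which.
    """
    return {
        "modules": {
            name: importlib.util.find_spec(name) is not None for name in modules
        },
        "binaries": {name: shutil.which(name) is not None for name in binaries},
    }


if __name__ == "__main__":
    report = check_environment(
        modules=["torch", "tensorflow", "cv2"],  # cv2 is OpenCV's import name
        binaries=["ffmpeg"],
    )
    for kind, results in report.items():
        for name, present in results.items():
            print(f"{kind[:-1]:<7} {name:<12} {'found' if present else 'missing'}")
```

This only reports presence, not version or GPU support. If PyTorch is installed, `torch.cuda.is_available()` is the usual follow-up check for a usable CUDA device.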

Highlighted Details

  • Coverage of current models for each stage, including TransNetV2 for video segmentation (shot boundary detection), Wav2Lip for lip-sync, the segmentation models DeepLabV3 and SAM2 for matting, and vision-language models such as MiniCPM-V and Phi-3-v.
  • Sections on speech synthesis covering general TTS (VITS, XTTS) and, under singing TTS, so-vits-svc (a singing voice conversion project) and MaskGCT.
  • Notes on 3D representations for digital humans, specifically NeRF and Gaussian Splatting.
  • References to recent research on 3D human synthesis, such as Apple's HUGS and Meta's ExAvatar.

Maintenance & Community

The repository is maintained by yangkang2021. Its content consists mainly of links to external GitHub repositories and research papers, so most community activity happens in those referenced projects rather than in this repository.

Licensing & Compatibility

No license is specified for the notes themselves. Each linked external project carries its own license, which may be permissive (for example MIT or Apache) or restrictive, so check the individual license before any commercial use or closed-source integration.

Limitations & Caveats

This repository is a technical reference guide, not a ready-to-use software package. Users must set up, configure, and integrate the linked open-source projects and models themselves, which can be complex and resource-intensive.

Health Check

  • Last commit: 1 month ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0

Star History

32 stars in the last 90 days
