RAD-NeRF by ashawkey

PyTorch implementation of real-time neural talking head synthesis

created 2 years ago
922 stars

Top 40.4% on sourcepulse

Project Summary

RAD-NeRF is a PyTorch re-implementation of real-time neural radiance talking portrait synthesis based on audio-spatial decomposition. It targets researchers and developers who generate and animate realistic avatars from audio input. Given a driving audio track, it renders a dynamic talking portrait of the trained subject.

How It Works

RAD-NeRF combines Neural Radiance Fields (NeRF) with audio-spatial decomposition. During pre-processing, it extracts facial landmarks, semantic parsing maps, and head poses from the input video and audio. Its core contribution is synthesizing novel views of a portrait whose lips stay in sync with an input audio stream, while reaching real-time speed through efficient rendering and an optimized NeRF representation.
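To make the rendering step concrete, here is a minimal sketch of the standard NeRF volume-rendering rule for a single camera ray, in plain Python. It is the generic textbook formulation, not RAD-NeRF's own ray-marching code, and it leaves out audio conditioning entirely. The function name `composite_ray` and its arguments are hypothetical and chosen for illustration.

```python
import math


def composite_ray(densities, colors, deltas):
    """Alpha-composite samples along one ray, front to back.

    densities: non-negative volume density (sigma) at each sample
    colors:    (r, g, b) tuple at each sample, values in [0, 1]
    deltas:    distance between consecutive samples
    Returns the accumulated pixel color and the leftover transmittance.
    """
    transmittance = 1.0           # fraction of light not yet absorbed
    pixel = [0.0, 0.0, 0.0]
    for sigma, rgb, delta in zip(densities, colors, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)   # opacity of this segment
        weight = transmittance * alpha           # contribution to the pixel
        for c in range(3):
            pixel[c] += weight * rgb[c]
        transmittance *= 1.0 - alpha
    return pixel, transmittance


if __name__ == "__main__":
    # Two samples: a thin red one in front of a dense blue one.
    print(composite_ray([0.5, 50.0], [(1, 0, 0), (0, 0, 1)], [0.1, 0.1]))
```

In a talking-head model the densities and colors would come from a network that also sees audio features, so the same compositing loop yields a different mouth shape for each audio frame.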

Quick Start & Requirements

  • Install: Clone the repository and install dependencies via pip install -r requirements.txt. Ubuntu users need sudo apt install portaudio19-dev.
  • Prerequisites: PyTorch 1.12, CUDA 11.6, Python 3.x. Requires downloading pre-trained models and pose sequence files. Data pre-processing involves installing pytorch3d and downloading specific models (face-parsing, Basel Face Model).
  • Setup: Initial setup and data pre-processing are time-consuming; the duration grows with the length of the input video and the number of pre-processing steps that are run.
  • Links: Project Page | Arxiv
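Before running the install steps above, it can help to confirm that the local environment matches the listed versions. The sketch below is a hypothetical helper, not a script shipped with the repository. It assumes only the version numbers named in the prerequisites (PyTorch 1.12, CUDA 11.6) and degrades gracefully when PyTorch is not installed.

```python
import importlib.util
import sys

# Versions listed in the prerequisites above.
EXPECTED = {"torch": "1.12", "cuda": "11.6"}


def matches(found, expected_prefix):
    """True when `found` equals the expected version or is a patch release of it."""
    if found is None:
        return False
    return found == expected_prefix or found.startswith(expected_prefix + ".")


def probe_environment():
    """Collect Python, PyTorch, and CUDA versions without failing if torch is absent."""
    report = {
        "python": sys.version.split()[0],
        "torch": None,
        "cuda": None,
        "gpu_available": False,
    }
    if importlib.util.find_spec("torch") is not None:
        import torch

        report["torch"] = torch.__version__        # e.g. "1.12.1+cu116"
        report["cuda"] = torch.version.cuda        # None on CPU-only builds
        report["gpu_available"] = torch.cuda.is_available()
    return report


if __name__ == "__main__":
    env = probe_environment()
    print(env)
    for name, wanted in EXPECTED.items():
        status = "ok" if matches(env[name], wanted) else "differs or missing"
        print(f"{name}: expected {wanted}.x, found {env[name]} ({status})")
```

A mismatch does not necessarily mean the project will fail to run; it only flags a difference from the configuration listed above.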

Highlighted Details

  • Achieves real-time inference at approximately 40 FPS on a V100 GPU with ~2GB VRAM.
  • Supports both Wav2Vec and DeepSpeech for audio feature extraction.
  • Offers a GUI for visualization and interactive testing.
  • Includes scripts for data pre-processing, training, and inference.
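For context on the throughput figure above, 40 FPS corresponds to a budget of 25 ms per frame. The sketch below shows that arithmetic and a simple way to time any frame-producing callable. `measure_fps` and `render_frame` are hypothetical names, not part of the repository, and the stand-in workload is a sleep rather than a real renderer.

```python
import time


def frame_budget_ms(fps):
    """Time available per frame, in milliseconds, at a given frame rate."""
    return 1000.0 / fps


def measure_fps(render_frame, n_frames=50, warmup=5):
    """Average frames per second of `render_frame` over `n_frames` calls.

    Warm-up calls are excluded from timing. When timing GPU work with
    PyTorch, call torch.cuda.synchronize() inside `render_frame`, since
    CUDA kernels otherwise run asynchronously and the timer stops early.
    """
    for _ in range(warmup):
        render_frame()
    start = time.perf_counter()
    for _ in range(n_frames):
        render_frame()
    elapsed = time.perf_counter() - start
    return n_frames / max(elapsed, 1e-9)   # guard against a zero reading


if __name__ == "__main__":
    print(f"budget at 40 FPS: {frame_budget_ms(40)} ms")
    # Stand-in workload: sleep for 5 ms instead of rendering.
    print(f"measured: {measure_fps(lambda: time.sleep(0.005)):.1f} FPS")
```

Measured values depend on the GPU, the output resolution, and the model configuration, so results on other hardware will differ from the V100 figure.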

Maintenance & Community

The project is based on AD-NeRF for data pre-processing and torch-ngp for the NeRF framework. The GUI is built with DearPyGui. No specific community channels (Discord/Slack) or active maintenance signals are mentioned in the README.

Licensing & Compatibility

The repository does not explicitly state a license. However, given its reliance on other projects, users should verify licensing compatibility for commercial or closed-source use.

Limitations & Caveats

The installation and data pre-processing steps are complex and require downloading multiple external files and models. The project is tested on Ubuntu 22.04, and compatibility with other operating systems may vary. Training can be memory-intensive, especially when preloading data to GPU.
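A rough calculation shows why preloading frames to the GPU can be costly. The sketch below estimates the size of a preloaded set of RGB frames. The resolution, frame count, and float32 storage are illustrative assumptions, not values taken from the repository.

```python
def preload_bytes(n_frames, height, width, channels=3, bytes_per_value=4):
    """Bytes needed to hold all frames as dense arrays (float32 by default)."""
    return n_frames * height * width * channels * bytes_per_value


def to_gib(n_bytes):
    """Convert bytes to gibibytes."""
    return n_bytes / 2**30


if __name__ == "__main__":
    # Hypothetical clip: 6000 frames at 512x512.
    as_float32 = preload_bytes(6000, 512, 512)
    as_uint8 = preload_bytes(6000, 512, 512, bytes_per_value=1)
    print(f"float32: {to_gib(as_float32):.2f} GiB")
    print(f"uint8:   {to_gib(as_uint8):.2f} GiB")
```

Under these assumptions the float32 frames alone take about 17.6 GiB, which is why keeping data on the CPU or on disk may be necessary on smaller GPUs.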

Health Check

  • Last commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 7 stars in the last 90 days
