ER-NeRF by Fictionarry

Talking head synthesis via efficient region-aware neural radiance fields

created 2 years ago
1,195 stars

Top 33.4% on sourcepulse

Project Summary

This repository provides an implementation of ER-NeRF, an efficient region-aware neural radiance field method for synthesizing high-fidelity talking portraits. It targets researchers and developers in computer vision and graphics working on realistic human avatar generation and animation from speech. The key benefit is high-quality, dynamic portrait synthesis with controllable facial expressions.

How It Works

ER-NeRF uses Neural Radiance Fields (NeRFs) to represent the 3D scene, focusing on human heads and torsos. Its region-aware design, as described in the paper, relates audio features to the spatial regions of the face they affect instead of conditioning every point in the same way. This lets different facial regions respond differently to speech and supports high-fidelity rendering of talking portraits.
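To make the NeRF part concrete, here is a minimal sketch of the standard volume-rendering step that turns density and color samples along a camera ray into one pixel. It is generic NeRF compositing in pure Python, not code from the ER-NeRF repository; the function name `composite_ray` and its argument layout are assumptions made for illustration.

```python
import math

def composite_ray(densities, colors, deltas):
    """Alpha-composite samples along one ray, front to back.

    densities: per-sample volume density sigma_i (>= 0)
    colors:    per-sample RGB tuples with channels in [0, 1]
    deltas:    distance covered by each sample along the ray
    Returns (rgb, accumulated_opacity).
    """
    transmittance = 1.0  # fraction of light that has not been absorbed yet
    rgb = [0.0, 0.0, 0.0]
    for sigma, color, delta in zip(densities, colors, deltas):
        # Opacity contributed by this sample.
        alpha = 1.0 - math.exp(-sigma * delta)
        weight = transmittance * alpha
        for ch in range(3):
            rgb[ch] += weight * color[ch]
        # Light that survives this sample reaches the next one.
        transmittance *= 1.0 - alpha
    return tuple(rgb), 1.0 - transmittance

if __name__ == "__main__":
    # An empty sample in front of a dense green one: the pixel comes out green.
    print(composite_ray([0.0, 1e9], [(1, 0, 0), (0, 1, 0)], [0.1, 0.1]))
```

In a talking-head NeRF the densities and colors would come from a network conditioned on audio features; here they are plain lists so the compositing rule can be read on its own.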

Quick Start & Requirements

  • Installation: Requires PyTorch 1.12.1, CUDA 11.3, and Python 3.10. Installation involves creating a conda environment, installing PyTorch and dependencies, and then installing pytorch3d from source.
  • Prerequisites: Requires downloading face-parsing and 3DMM models, and potentially Basel Face Model 2009.
  • Data: Example Obama video provided; custom videos require preprocessing.
  • Usage: Scripts for testing and training are provided, with specific commands for head-only or head+torso models.
  • Links: Paper, Project Page, Video
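Because the setup pins specific versions, a quick check before installing the remaining dependencies can save time. The sketch below compares the local Python, PyTorch, and CUDA versions against the ones listed above. It is not part of the repository; `EXPECTED`, `find_mismatches`, and `detect_versions` are hypothetical names, and the prefix-matching rule is an assumption about what counts as a compatible version.

```python
import platform

# Versions as stated in the summary above; treat them as the tested setup.
EXPECTED = {"python": "3.10", "torch": "1.12.1", "cuda": "11.3"}

def find_mismatches(found, expected=EXPECTED):
    """Return readable mismatches between found and expected versions.

    A found version matches when it equals the expected one or extends it
    with a "." or "+" suffix, so "3.10.12" matches "3.10" and
    "1.12.1+cu113" matches "1.12.1".
    """
    problems = []
    for name, want in expected.items():
        have = found.get(name)
        if have is None:
            problems.append(f"{name}: not found (expected {want})")
        elif not (have == want
                  or have.startswith(want + ".")
                  or have.startswith(want + "+")):
            problems.append(f"{name}: found {have}, expected {want}")
    return problems

def detect_versions():
    """Collect local versions; torch entries are omitted if it is not installed."""
    found = {"python": platform.python_version()}
    try:
        import torch
    except ImportError:
        return found
    found["torch"] = str(torch.__version__)
    found["cuda"] = torch.version.cuda  # None for CPU-only builds
    return found

if __name__ == "__main__":
    report = find_mismatches(detect_versions())
    for line in report or ["environment matches the summary"]:
        print(line)
```

A mismatch does not necessarily mean the code will fail; it only flags a departure from the configuration named in the summary.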

Highlighted Details

  • Achieves PSNR of 35.607, LPIPS of 0.0178, and LMD of 2.525 for head-only synthesis on the Obama dataset.
  • Supports inference with target audio using various ASR models (DeepSpeech, Wav2Vec, HuBERT).
  • Allows for optional head pose smoothing during inference.
  • Code heavily relies on and acknowledges RAD-NeRF, DFRF, GeneFace, and AD-NeRF.
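For readers less familiar with the metrics quoted above, PSNR is computed from the mean squared error between a rendered frame and its reference, with higher values being better. The sketch below shows the standard formula on flat pixel sequences. It is not the repository's evaluation script, and the function name `psnr` and the `max_value` parameter are assumptions for illustration.

```python
import math

def psnr(reference, rendered, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences.

    max_value is the largest possible pixel value (1.0 for normalized
    images, 255 for 8-bit images).
    """
    if len(reference) != len(rendered) or not reference:
        raise ValueError("inputs must be non-empty and the same length")
    mse = sum((a - b) ** 2 for a, b in zip(reference, rendered)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)

if __name__ == "__main__":
    # A uniform error of 0.1 on normalized pixels gives roughly 20 dB.
    print(psnr([0.0, 0.0], [0.1, 0.1]))
```

Since the scale is logarithmic, a gap of 10 dB between two models corresponds to a tenfold difference in mean squared error.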

Maintenance & Community

The project is associated with ICCV 2023. Recent updates mention related work like InsTaG (CVPR 2025) and TalkingGaussian (ECCV 2024), indicating active development in the research group. No specific community links (Discord/Slack) are provided in the README.

Licensing & Compatibility

The repository does not explicitly state a license. Given its academic nature and reliance on other projects, users should verify licensing for commercial use.

Limitations & Caveats

Installation pins older versions of PyTorch and CUDA, which may be hard to set up on newer systems. Some datasets are not distributed due to copyright, so users must source them independently. The "head+torso" model scores markedly worse on PSNR and LPIPS than the head-only model.

Health Check

  • Last commit: 4 months ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 30 stars in the last 90 days
