DICE-Talk by toto222

Create emotional talking portraits with diffusion models

Created 4 months ago
254 stars

Top 99.1% on SourcePulse

Project Summary

DICE-Talk is a diffusion-based method for generating emotional talking head videos from input images and audio. It addresses the challenge of creating vivid and diverse emotional expressions in speaking portraits by disentangling identity and emotion. The project targets researchers and developers working on AI-driven animation and content creation, offering a novel approach to controllable emotional synthesis in facial animation.

How It Works

The core of DICE-Talk is a diffusion-based generative model designed to disentangle identity and emotion. It employs a correlation-aware approach to ensure that generated emotions are consistent with the input audio and visual cues, leading to more realistic and diverse emotional expressions in talking portraits. Key components include pre-trained models for audio processing (Whisper), motion guidance (pose_guider), and video generation (stable-video-diffusion-img2vid-xt).
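The disentangled, correlation-aware conditioning described above can be pictured with a toy sketch (pure Python, purely illustrative and not the actual DICE-Talk code): the identity vector passes through unchanged, while the emotion vector is gated by its correlation with the audio features before both are handed to the generator.

```python
import math

def cosine_similarity(a, b):
    """Correlation between two feature vectors: dot product over norms."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def condition(identity, emotion, audio):
    """Toy 'correlation-aware' conditioning.

    The emotion vector is weighted by how strongly it correlates with
    the audio features (clamped to be non-negative), then appended to
    the identity vector, which is left untouched -- a crude stand-in
    for identity/emotion disentanglement.
    """
    w = max(0.0, cosine_similarity(emotion, audio))
    return identity + [w * e for e in emotion]
```

For example, an emotion vector perfectly aligned with the audio features passes through at full strength, while an orthogonal one is suppressed entirely; the identity part of the conditioning signal is the same in both cases.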

Quick Start & Requirements

  • Installation: Install PyTorch with the appropriate CUDA version (e.g., cu118) and ffmpeg, then run pip install -r requirements.txt.
  • Prerequisites: Python 3.10, Linux, and a GPU with 20GB+ VRAM.
  • Model Download: Use huggingface-cli to download checkpoints for DICE-Talk, stable-video-diffusion-img2vid-xt, and whisper-tiny.
  • Running Demo: Execute python3 demo.py with paths to the input image, audio, and emotion.
  • Running GUI: Launch python3 gradio_app.py for an interactive interface.
  • Documentation: Visual demos are available on the project page.
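The steps above condense into a setup sketch like the following. The checkpoint repository ids and the demo.py flag names are assumptions for illustration, not taken from the project; consult the project README for the authoritative commands.

```shell
# Environment: Python 3.10 on Linux, GPU with 20GB+ VRAM.
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118
sudo apt install ffmpeg            # system dependency for audio/video I/O
pip install -r requirements.txt

# Download checkpoints (repo ids below are assumptions, not verified;
# the DICE-Talk checkpoint repo id is given in the project README):
huggingface-cli download stabilityai/stable-video-diffusion-img2vid-xt
huggingface-cli download openai/whisper-tiny

# Command-line demo (flag names are hypothetical):
python3 demo.py --image face.png --audio speech.wav --emotion happy

# Or launch the interactive Gradio GUI:
python3 gradio_app.py
```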

Highlighted Details

  • Accepted at ACM MM '25, a peer-reviewed venue.
  • Generates vivid and diverse emotions for speaking portraits.
  • Disentangles identity and emotion using a correlation-aware approach.
  • Provides both command-line demo and Gradio-based GUI for easy interaction.

Maintenance & Community

The project released its initial version in April 2025 with ongoing updates planned. No specific community channels (like Discord or Slack) or notable contributors/sponsorships are mentioned in the provided text.

Licensing & Compatibility

The provided README does not specify a license. This is a critical omission for evaluating adoption and compatibility, especially for commercial use.

Limitations & Caveats

The project is described as an "initial version" with "continuous updates," suggesting it may still be under active development or in an alpha/beta state. It requires a high-end GPU (20GB+ VRAM) and is tested on Linux, potentially limiting its accessibility on other operating systems or lower-spec hardware. The absence of a specified license poses a significant adoption blocker.

Health Check

  • Last Commit: 1 month ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 1
  • Star History: 13 stars in the last 30 days
