AI code reverse-engineered from a white paper
This repository aims to reverse-engineer the VASA-1 model using Claude 3.5 Sonnet, focusing on generating talking head videos from a single image and audio. It's an experimental project for researchers and developers interested in understanding and potentially replicating advanced audio-driven facial animation techniques.
How It Works
The project breaks down the VASA model into distinct stages, with a focus on training the Stage 1 and Stage 2 components. It uses a Diffusion Transformer architecture for motion generation, conditioned on audio and facial features. Key components include a `VASAFaceEncoder` for disentangled facial representations, a `VASADiffusionTransformer` for motion synthesis, and a `VideoGenerator` that employs a sliding-window approach. The training infrastructure is managed by `VASATrainer`, which leverages PyTorch and the `accelerate` library for distributed training.
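As a rough, self-contained sketch of the sliding-window idea (the stand-in module, tensor shapes, window size, and interfaces below are illustrative assumptions, not the repository's actual API):

```python
# Minimal sketch of sliding-window motion generation conditioned on audio and a face code.
# All shapes, window sizes, and module interfaces here are assumptions for illustration.
import torch
import torch.nn as nn

class ToyMotionModel(nn.Module):
    """Stand-in for the diffusion transformer: maps (audio window, face code) -> motion latents."""
    def __init__(self, audio_dim=128, face_dim=64, motion_dim=32):
        super().__init__()
        self.net = nn.Linear(audio_dim + face_dim, motion_dim)

    def forward(self, audio_window, face_code):
        # Pool the audio window over time and fuse it with the identity/appearance code.
        pooled = audio_window.mean(dim=1)
        return self.net(torch.cat([pooled, face_code], dim=-1))

def generate_motion(model, audio_feats, face_code, window=25, stride=20):
    """Slide a fixed-size window over the audio features and stitch the per-window motion."""
    chunks = []
    for start in range(0, audio_feats.shape[1] - window + 1, stride):
        audio_window = audio_feats[:, start:start + window]  # (B, window, audio_dim)
        chunks.append(model(audio_window, face_code))         # (B, motion_dim)
    return torch.stack(chunks, dim=1)                         # (B, num_windows, motion_dim)

audio_feats = torch.randn(1, 200, 128)  # dummy audio features for one clip
face_code = torch.randn(1, 64)          # dummy disentangled face representation
motion = generate_motion(ToyMotionModel(), audio_feats, face_code)
print(motion.shape)  # torch.Size([1, 9, 32])
```

A sliding window like this lets arbitrarily long audio be processed in fixed-size chunks, keeping per-step memory bounded while each motion chunk stays conditioned on local audio context.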
Quick Start & Requirements
- Use `accelerate launch` for the training commands (see the sketch after this list).
- Dependencies include `accelerate`, `wandb`, and `mprof` (for memory profiling).
- Specific hardware requirements (e.g., multi-GPU, CUDA) are implied by the training commands.
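The repository's exact training entry point isn't reproduced here, but an `accelerate`-based training loop generally looks like the sketch below (the script name `train.py`, the toy model, and the data are placeholders, not the project's real components):

```python
# Minimal accelerate training-loop sketch; launch with, e.g.:  accelerate launch train.py
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset
from accelerate import Accelerator

accelerator = Accelerator()

model = nn.Linear(16, 1)  # placeholder model
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
dataset = TensorDataset(torch.randn(64, 16), torch.randn(64, 1))  # placeholder data
dataloader = DataLoader(dataset, batch_size=8)

# prepare() moves everything to the right device(s) and wraps it for distributed training.
model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)

for inputs, targets in dataloader:
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(model(inputs), targets)
    accelerator.backward(loss)  # used instead of loss.backward() so gradients sync across processes
    optimizer.step()
```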
Highlighted Details
Maintenance & Community
This appears to be a personal, experimental project with updates shared via GitHub issues. No specific community channels or roadmap are indicated.
Licensing & Compatibility
The repository does not explicitly state a license. Given its experimental nature and reliance on reverse-engineering, commercial use or integration into closed-source projects may be restricted.
Limitations & Caveats
This is a work-in-progress ("WIP") with ongoing development and potential for instability. Users may encounter Out-of-Memory (OOM) errors during training, and the code's direct applicability or completeness for replicating the original VASA-1 model is not guaranteed.