Diffusion model research paper implementation
HelloMeme provides a framework for generating high-fidelity images and videos conditioned on reference inputs, targeting researchers and developers in generative AI and computer vision. It enables users to create novel visual content by transferring styles and poses from reference images or videos to generated outputs.
How It Works
HelloMeme integrates "Spatial Knitting Attentions" into diffusion models so that high-level, fidelity-rich conditions can be embedded into the generation process. This gives tighter control over generation, enabling precise style and pose transfer. The system builds on pre-trained diffusion models and custom adapters (ReferenceAdapter, HMControlNet) for detailed manipulation of the generated content.
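The core pattern behind spatial knitting is to run attention along the rows of a 2D feature map and then along its columns, preserving the 2D layout while conditions are "knitted" in. The snippet below is a minimal, illustrative PyTorch sketch of that row-then-column pattern using standard `nn.MultiheadAttention` self-attention; the class name is hypothetical, and the repository's actual adapters (ReferenceAdapter, HMControlNet) differ in how condition features are attended to.

```python
# Illustrative sketch only (assumption): row-wise then column-wise attention over a
# 2D feature map, the basic "spatial knitting" pattern. Not the repository's module.
import torch
import torch.nn as nn


class SpatialKnittingAttention(nn.Module):
    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.row_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.col_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        # feat: (B, C, H, W) feature map from a diffusion U-Net block
        b, c, h, w = feat.shape

        # 1) attend along rows: each of the B*H rows is a sequence of W tokens
        rows = feat.permute(0, 2, 3, 1).reshape(b * h, w, c)
        rows, _ = self.row_attn(rows, rows, rows)
        feat = rows.reshape(b, h, w, c).permute(0, 3, 1, 2)

        # 2) attend along columns: each of the B*W columns is a sequence of H tokens
        cols = feat.permute(0, 3, 2, 1).reshape(b * w, h, c)
        cols, _ = self.col_attn(cols, cols, cols)
        return cols.reshape(b, w, h, c).permute(0, 3, 2, 1)


if __name__ == "__main__":
    x = torch.randn(1, 64, 32, 32)            # toy feature map
    out = SpatialKnittingAttention(64)(x)     # output keeps the input shape
    print(out.shape)                          # torch.Size([1, 64, 32, 32])
```

Because each pass attends over only W (or H) tokens rather than the full H×W grid, the two knitted passes stay cheaper than full 2D attention while still mixing information across both spatial axes.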
Quick Start & Requirements
Install the dependencies with `pip install diffusers==0.31.0 transformers einops scipy opencv-python tqdm pillow onnxruntime-gpu onnx safetensors accelerate peft imageio imageio[ffmpeg] torchvision`. Clone the repository and run `python inference_image.py` or `python inference_video.py`. For the Gradio app, `pip install gradio` and run `python app.py`.
Maintenance & Community
The project is actively updated, with recent additions including HelloMemeV3, Modelscope Demo, and a Gradio app rewrite. Contact: Shengkai Zhang (songkey@pku.edu.cn).
Licensing & Compatibility
The repository does not explicitly state a license. Compatibility for commercial use or closed-source linking is not specified.
Limitations & Caveats
The README notes that frequent updates to the `diffusers` library may cause dependency conflicts; `diffusers==0.31.0` is the currently tested and supported version. For video generation with significant face movement, setting `trans_ratio=0` is recommended to prevent distorted outputs.
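Since the `diffusers` pin is the main compatibility constraint, a small runtime guard can surface a mismatched environment before a long generation run starts. The check below is a minimal sketch, not part of the repository.

```python
# Minimal sketch: fail fast if the installed diffusers release differs from the
# version the README lists as tested (0.31.0).
import diffusers

if diffusers.__version__ != "0.31.0":
    raise RuntimeError(
        f"HelloMeme is tested against diffusers==0.31.0; found {diffusers.__version__}. "
        "Newer releases may introduce breaking changes or dependency conflicts."
    )
```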