EasyAnimate by aigc-apps

High-resolution, long AI video generation using transformer-based diffusion

created 1 year ago
2,189 stars

Top 21.1% on sourcepulse

Project Summary

EasyAnimate is an end-to-end solution for generating high-resolution and long videos using transformer-based diffusion models. It targets researchers and developers looking to create AI-generated videos, train custom models, and explore advanced control mechanisms. The project offers a comprehensive pipeline from data preprocessing to model training and inference, enabling the generation of videos with various resolutions and frame rates.

How It Works

EasyAnimate leverages Diffusion Transformer (DiT) models for video and image generation, offering a unified architecture for both tasks. It supports training custom baseline and LoRA models for style transfer and fine-tuning. The pipeline includes components for data preprocessing, VAE training (optional), and DiT training, allowing for a complete workflow from raw data to generated video content.
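
As a rough orientation, inference follows the usual text-to-video diffusion pattern: load the pretrained DiT pipeline, prompt it, and export the decoded frames. The sketch below is a minimal, hedged example; the pipeline class (EasyAnimatePipeline from Hugging Face diffusers), the checkpoint id, and the call parameters are assumptions drawn from the standard diffusers video-pipeline interface, so check the repo's README and predict scripts for the authoritative entry points.

    # Hedged sketch of text-to-video inference with the DiT-based pipeline.
    # Pipeline class, checkpoint id, and parameter names are assumptions;
    # the repo's own predict scripts are the reference implementation.
    import torch
    from diffusers import EasyAnimatePipeline
    from diffusers.utils import export_to_video

    pipe = EasyAnimatePipeline.from_pretrained(
        "alibaba-pai/EasyAnimateV5.1-12b-zh",  # assumed checkpoint id
        torch_dtype=torch.bfloat16,
    )
    pipe.to("cuda")

    video = pipe(
        prompt="A golden retriever running through shallow waves at sunset",
        num_frames=49,            # V5.1 targets up to 49 frames at 8 fps
        num_inference_steps=50,
        guidance_scale=6.0,
    ).frames[0]

    export_to_video(video, "easyanimate_sample.mp4", fps=8)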

Quick Start & Requirements

  • Installation: Docker is recommended for ease of setup. Local installation requires Python 3.10/3.11, PyTorch 2.2.0, CUDA 11.8/12.1, and cuDNN 8+.
  • Hardware: High-end GPUs are recommended, with specific memory requirements detailed for different model sizes (7B, 12B) and resolutions. 16GB VRAM is the minimum for basic functionality, while 40GB+ is needed for higher resolutions and frame counts (see the pre-flight check sketched after this list).
  • Disk Space: Approximately 60GB is required for saving weights.
  • Resources: Links to Aliyun DSW (free GPU time), ComfyUI integration, and Docker images are provided.
  • Documentation: Quick start guides and usage instructions are available.
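
As a quick sanity check against the requirements above, a small pre-flight script (a sketch, not part of the project) can report the installed Python/PyTorch/CUDA versions, available VRAM, and free disk space:

    # Hypothetical pre-flight check against the requirements listed above
    # (Python 3.10/3.11, PyTorch 2.2.0, CUDA 11.8/12.1, >=16 GB VRAM, ~60 GB disk).
    import shutil
    import sys

    import torch

    print(f"Python     : {sys.version.split()[0]}  (3.10/3.11 expected)")
    print(f"PyTorch    : {torch.__version__}  (2.2.0 expected)")
    print(f"CUDA ready : {torch.cuda.is_available()}  (CUDA 11.8/12.1 + cuDNN 8+ expected)")

    if torch.cuda.is_available():
        vram_gb = torch.cuda.get_device_properties(0).total_memory / 1024**3
        print(f"GPU VRAM   : {vram_gb:.1f} GB  (16 GB minimum, 40 GB+ for high-res/long clips)")

    disk_gb = shutil.disk_usage(".").free / 1024**3
    print(f"Disk free  : {disk_gb:.1f} GB  (~60 GB needed for saving weights)")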

Highlighted Details

  • Supports video generation up to 1024x1024 resolution, 49 frames at 8 fps (V5.1), and up to 144 frames at 24 fps (V4).
  • Offers various control mechanisms including Canny, Pose, Depth, trajectory, and camera control.
  • Includes options for memory-saving inference (CPU offloading, quantization) to accommodate consumer-grade GPUs (see the offloading sketch after this list).
  • Provides a complete training pipeline for custom model and LoRA development.
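
The memory-saving options trade speed for VRAM. The sketch below uses the standard diffusers offload hooks (enable_model_cpu_offload / enable_sequential_cpu_offload); the pipeline class and checkpoint id remain assumptions, and the repo's own scripts expose an equivalent GPU-memory-mode switch (including a quantized variant) whose exact name should be taken from the README.

    # Hedged sketch of memory-saving inference via CPU offloading.
    # enable_model_cpu_offload() keeps only the active sub-model on the GPU;
    # enable_sequential_cpu_offload() is slower but uses even less VRAM.
    import torch
    from diffusers import EasyAnimatePipeline

    pipe = EasyAnimatePipeline.from_pretrained(
        "alibaba-pai/EasyAnimateV5.1-12b-zh",  # assumed checkpoint id
        torch_dtype=torch.bfloat16,
    )

    pipe.enable_model_cpu_offload()          # moderate savings, modest slowdown
    # pipe.enable_sequential_cpu_offload()   # maximum savings, noticeably slower
    # The repo also documents a quantized (float8) offload mode; see its README.

    video = pipe(
        prompt="A drone shot over terraced rice fields at dawn",
        num_inference_steps=50,
        guidance_scale=6.0,
    ).frames[0]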

Maintenance & Community

The project is actively updated, with recent versions (V5.1) adding features such as the Qwen2-VL text encoder and new sampling methods. Community support is available via DingTalk and WeChat groups.

Licensing & Compatibility

The project is licensed under the Apache License (Version 2.0), which permits commercial use and linking with closed-source projects.

Limitations & Caveats

High-end GPU hardware is strongly recommended for optimal performance, especially for higher resolutions and frame counts. Some older GPUs may require modifications to run. Memory-saving modes can impact generation speed.

Health Check

  • Last commit: 4 months ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 3
  • Star History: 68 stars in the last 90 days

Explore Similar Projects

Starred by Ying Sheng (author of SGLang), Chip Huyen (author of AI Engineering and Designing Machine Learning Systems), and 1 more.

Open-Sora-Plan by PKU-YuanGroup
Open-source project aiming to reproduce a Sora-like T2V model
Top 0.1%, 12k stars
created 1 year ago, updated 2 weeks ago