EasyAnimate by aigc-apps

High-resolution, long AI video generation using transformer-based diffusion

created 1 year ago
2,189 stars

Top 21.1% on sourcepulse

Project Summary

EasyAnimate is an end-to-end solution for generating high-resolution and long videos using transformer-based diffusion models. It targets researchers and developers looking to create AI-generated videos, train custom models, and explore advanced control mechanisms. The project offers a comprehensive pipeline from data preprocessing to model training and inference, enabling the generation of videos with various resolutions and frame rates.

How It Works

EasyAnimate leverages Diffusion Transformer (DiT) models for video and image generation, offering a unified architecture for both tasks. It supports training custom baseline and LoRA models for style transfer and fine-tuning. The pipeline includes components for data preprocessing, VAE training (optional), and DiT training, allowing for a complete workflow from raw data to generated video content.
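
As a rough orientation, inference follows the usual text-to-video diffusion pattern: load the pretrained DiT pipeline, prompt it, and export the decoded frames. The sketch below is a minimal, hedged example; the pipeline class (EasyAnimatePipeline from Hugging Face diffusers), the checkpoint id, and the call parameters are assumptions drawn from the standard diffusers video-pipeline interface, so check the repo's README and predict scripts for the authoritative entry points.

    # Hedged sketch of text-to-video inference with the DiT-based pipeline.
    # Pipeline class, checkpoint id, and parameter names are assumptions;
    # the repo's own predict scripts are the reference implementation.
    import torch
    from diffusers import EasyAnimatePipeline
    from diffusers.utils import export_to_video

    pipe = EasyAnimatePipeline.from_pretrained(
        "alibaba-pai/EasyAnimateV5.1-12b-zh",  # assumed checkpoint id
        torch_dtype=torch.bfloat16,
    )
    pipe.to("cuda")

    video = pipe(
        prompt="A golden retriever running through shallow waves at sunset",
        num_frames=49,            # V5.1 targets up to 49 frames at 8 fps
        num_inference_steps=50,
        guidance_scale=6.0,
    ).frames[0]

    export_to_video(video, "easyanimate_sample.mp4", fps=8)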

Quick Start & Requirements

  • Installation: Docker is recommended for ease of setup. Local installation requires Python 3.10/3.11, PyTorch 2.2.0, CUDA 11.8/12.1, and cuDNN 8+.
  • Hardware: High-end GPUs are recommended, with specific memory requirements detailed for different model sizes (7B, 12B) and resolutions. 16GB VRAM is the minimum for basic functionality, while 40GB+ is needed for higher resolutions and frame counts (see the pre-flight check sketched after this list).
  • Disk Space: Approximately 60GB is required for saving weights.
  • Resources: Links to Aliyun DSW (free GPU time), ComfyUI integration, and Docker images are provided.
  • Documentation: Quick start guides and usage instructions are available.
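
As a quick sanity check against the requirements above, a small pre-flight script (a sketch, not part of the project) can report the installed Python/PyTorch/CUDA versions, available VRAM, and free disk space:

    # Hypothetical pre-flight check against the requirements listed above
    # (Python 3.10/3.11, PyTorch 2.2.0, CUDA 11.8/12.1, >=16 GB VRAM, ~60 GB disk).
    import shutil
    import sys

    import torch

    print(f"Python     : {sys.version.split()[0]}  (3.10/3.11 expected)")
    print(f"PyTorch    : {torch.__version__}  (2.2.0 expected)")
    print(f"CUDA ready : {torch.cuda.is_available()}  (CUDA 11.8/12.1 + cuDNN 8+ expected)")

    if torch.cuda.is_available():
        vram_gb = torch.cuda.get_device_properties(0).total_memory / 1024**3
        print(f"GPU VRAM   : {vram_gb:.1f} GB  (16 GB minimum, 40 GB+ for high-res/long clips)")

    disk_gb = shutil.disk_usage(".").free / 1024**3
    print(f"Disk free  : {disk_gb:.1f} GB  (~60 GB needed for saving weights)")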

Highlighted Details

  • Supports video generation up to 1024x1024 resolution, 49 frames at 8 fps (V5.1), and up to 144 frames at 24 fps (V4).
  • Offers various control mechanisms including Canny, Pose, Depth, trajectory, and camera control.
  • Includes options for memory-saving inference (CPU offloading, quantization) to accommodate consumer-grade GPUs (see the offloading sketch after this list).
  • Provides a complete training pipeline for custom model and LoRA development.
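
The memory-saving options trade speed for VRAM. The sketch below uses the standard diffusers offload hooks (enable_model_cpu_offload / enable_sequential_cpu_offload); the pipeline class and checkpoint id remain assumptions, and the repo's own scripts expose an equivalent GPU-memory-mode switch (including a quantized variant) whose exact name should be taken from the README.

    # Hedged sketch of memory-saving inference via CPU offloading.
    # enable_model_cpu_offload() keeps only the active sub-model on the GPU;
    # enable_sequential_cpu_offload() is slower but uses even less VRAM.
    import torch
    from diffusers import EasyAnimatePipeline

    pipe = EasyAnimatePipeline.from_pretrained(
        "alibaba-pai/EasyAnimateV5.1-12b-zh",  # assumed checkpoint id
        torch_dtype=torch.bfloat16,
    )

    pipe.enable_model_cpu_offload()          # moderate savings, modest slowdown
    # pipe.enable_sequential_cpu_offload()   # maximum savings, noticeably slower
    # The repo also documents a quantized (float8) offload mode; see its README.

    video = pipe(
        prompt="A drone shot over terraced rice fields at dawn",
        num_inference_steps=50,
        guidance_scale=6.0,
    ).frames[0]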

Maintenance & Community

The project is actively updated, with recent versions (V5.1) adding features such as the Qwen2-VL text encoder and new sampling methods. Community support is available via DingTalk and WeChat groups.

Licensing & Compatibility

The project is licensed under the Apache License (Version 2.0), which permits commercial use and linking with closed-source projects.

Limitations & Caveats

High-end GPU hardware is strongly recommended for optimal performance, especially for higher resolutions and frame counts. Some older GPUs may require modifications to run. Memory-saving modes can impact generation speed.

Health Check

  • Last commit: 4 months ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 3
  • Star History: 68 stars in the last 90 days

Explore Similar Projects

Starred by Ying Sheng (author of SGLang), Chip Huyen (author of AI Engineering and Designing Machine Learning Systems), and 1 more.

Open-Sora-Plan by PKU-YuanGroup
Open-source project aiming to reproduce a Sora-like T2V model
Top 0.1%, 12k stars
created 1 year ago, updated 2 weeks ago