EasyAnimate by aigc-apps

Video generator for high-resolution, long AI videos using transformer diffusion

Created 1 year ago
2,207 stars

Top 20.5% on SourcePulse

Project Summary

EasyAnimate is an end-to-end solution for generating high-resolution and long videos using transformer-based diffusion models. It targets researchers and developers looking to create AI-generated videos, train custom models, and explore advanced control mechanisms. The project offers a comprehensive pipeline from data preprocessing to model training and inference, enabling the generation of videos with various resolutions and frame rates.

How It Works

EasyAnimate leverages Diffusion Transformer (DiT) models for video and image generation, offering a unified architecture for both tasks. It supports training custom baseline and LoRA models for style transfer and fine-tuning. The pipeline includes components for data preprocessing, VAE training (optional), and DiT training, allowing for a complete workflow from raw data to generated video content.
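The reverse-diffusion loop at the heart of this pipeline can be illustrated with a toy sketch in plain Python. This is a conceptual stand-in, not EasyAnimate's actual code: the real pipeline denoises video latents with a DiT conditioned on text embeddings and then decodes them with the VAE, whereas here a trivial function plays the role of the denoiser.

```python
import math
import random

def make_beta_schedule(num_steps: int, beta_start: float = 1e-4, beta_end: float = 0.02):
    """Linear noise schedule, as in DDPM."""
    step = (beta_end - beta_start) / max(num_steps - 1, 1)
    return [beta_start + i * step for i in range(num_steps)]

def toy_denoise(latent, num_steps: int = 50, seed: int = 0):
    """Reverse diffusion over a 1-D 'latent' (a list of floats)."""
    rng = random.Random(seed)
    betas = make_beta_schedule(num_steps)
    alphas = [1.0 - b for b in betas]
    alpha_bars, prod = [], 1.0
    for a in alphas:
        prod *= a
        alpha_bars.append(prod)

    x = list(latent)
    for t in reversed(range(num_steps)):
        # Stand-in denoiser: in EasyAnimate this call would be the
        # transformer (DiT) predicting the noise in the video latent.
        predicted_noise = [0.1 * v for v in x]
        coef = betas[t] / math.sqrt(1.0 - alpha_bars[t])
        x = [(v - coef * n) / math.sqrt(alphas[t])
             for v, n in zip(x, predicted_noise)]
        if t > 0:  # add sampling noise on every step except the last
            sigma = math.sqrt(betas[t])
            x = [v + sigma * rng.gauss(0.0, 1.0) for v in x]
    return x

sample = toy_denoise([1.0, -0.5, 0.25])
```

In the full system, the denoised latent would then pass through the VAE decoder to produce pixel-space video frames.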

Quick Start & Requirements

  • Installation: Docker is recommended for ease of setup. Local installation requires Python 3.10/3.11, PyTorch 2.2.0, CUDA 11.8/12.1, and cuDNN 8+.
  • Hardware: High-end GPUs are recommended, with specific memory requirements detailed for different model sizes (7B, 12B) and resolutions. 16GB VRAM is the minimum for basic functionality, while 40GB+ is needed for higher resolutions and frame counts.
  • Disk Space: Approximately 60GB is required for saving weights.
  • Resources: Links to Aliyun DSW (free GPU time), ComfyUI integration, and Docker images are provided.
  • Documentation: Quick start guides and usage instructions are available.
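The version constraints above can be checked programmatically before attempting a local install. The helper below is illustrative only (the `check_environment` name and return shape are ours, not part of EasyAnimate), and simply encodes the quick-start requirements as data:

```python
import sys

# Version constraints taken from the quick-start notes
# (Python 3.10/3.11, PyTorch 2.2.0, CUDA 11.8/12.1).
SUPPORTED_PYTHON = {(3, 10), (3, 11)}
SUPPORTED_CUDA = {"11.8", "12.1"}
REQUIRED_TORCH = "2.2.0"

def check_environment(py_version, torch_version, cuda_version):
    """Return a list of human-readable problems; empty means OK."""
    problems = []
    if tuple(py_version[:2]) not in SUPPORTED_PYTHON:
        problems.append(
            f"Python {py_version[0]}.{py_version[1]} is not 3.10/3.11")
    if torch_version != REQUIRED_TORCH:
        problems.append(f"PyTorch {torch_version} != {REQUIRED_TORCH}")
    if cuda_version not in SUPPORTED_CUDA:
        problems.append(f"CUDA {cuda_version} is not 11.8/12.1")
    return problems

# Example: check the running interpreter against a hypothetical CUDA stack.
issues = check_environment(sys.version_info, "2.2.0", "12.1")
```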

Highlighted Details

  • Supports video generation up to 1024x1024 resolution, 49 frames at 8fps (V5.1), and up to 144 frames at 24fps (V4).
  • Offers various control mechanisms including Canny, Pose, Depth, trajectory, and camera control.
  • Includes options for memory-saving inference (CPU offloading, quantization) to accommodate consumer-grade GPUs.
  • Provides a complete training pipeline for custom model and LoRA development.
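The idea behind the memory-saving CPU-offload mode can be sketched without torch: keep every submodule on the CPU and move it to the GPU only while it runs, trading transfer time for peak VRAM. The `OffloadRunner` class and string device names below are our illustration; EasyAnimate exposes this behavior through its own inference options rather than this API.

```python
class OffloadRunner:
    """Toy sequential CPU offload: at most one submodule 'resides'
    on the GPU at a time."""

    def __init__(self, modules):
        # module name -> current device; everything starts on CPU
        self.devices = {name: "cpu" for name in modules}
        self.peak_gpu_modules = 0

    def run(self, order):
        """Run submodules in `order`, offloading each after its step."""
        log = []
        for name in order:
            self.devices[name] = "cuda"       # load just-in-time
            on_gpu = sum(d == "cuda" for d in self.devices.values())
            self.peak_gpu_modules = max(self.peak_gpu_modules, on_gpu)
            log.append(f"{name} ran on cuda")
            self.devices[name] = "cpu"        # offload immediately
        return log

# A video pipeline's typical submodules, run in sequence.
runner = OffloadRunner(["text_encoder", "transformer", "vae"])
trace = runner.run(["text_encoder", "transformer", "vae"])
```

Because only one submodule occupies the GPU at any moment, peak memory is bounded by the largest single module rather than the whole pipeline, which is what makes consumer-grade GPUs viable at the cost of extra host-device transfers.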

Maintenance & Community

The project is actively updated, with recent versions (V5.1) incorporating new features like Qwen2 VL text encoder and advanced sampling methods. Community support is available via DingTalk and WeChat groups.

Licensing & Compatibility

The project is licensed under the Apache License (Version 2.0), which permits commercial use and linking with closed-source projects.

Limitations & Caveats

High-end GPU hardware is strongly recommended for optimal performance, especially for higher resolutions and frame counts. Some older GPUs may require modifications to run. Memory-saving modes can impact generation speed.

Health Check

  • Last Commit: 6 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 19 stars in the last 30 days

Explore Similar Projects

DiffSynth-Studio by modelscope

Open-source project for diffusion model exploration. 10k stars. Created 1 year ago; updated 13 hours ago. Starred by Yineng Zhang (Inference Lead at SGLang; Research Scientist at Together AI), Rodrigo Nader (Cofounder of Langflow), and 1 more.

mmagic by open-mmlab

AIGC toolbox for image/video editing and generation. 7k stars. Created 6 years ago; updated 1 year ago. Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), Luca Antiga (CTO of Lightning AI), and 2 more.