mmagic by open-mmlab

AIGC toolbox for image/video editing and generation

Created 6 years ago

7,380 stars

Top 7.0% on SourcePulse

Project Summary

MMagic is an advanced AIGC toolkit for multimodal creation, offering a comprehensive suite of state-of-the-art generative models for image and video synthesis, editing, and restoration. It targets researchers and AIGC enthusiasts seeking a flexible and powerful platform for tasks like text-to-image generation, image/video enhancement, and 3D-aware generation.

How It Works

MMagic is built upon the OpenMMLab 2.0 framework, leveraging MMEngine and MMCV for a modular and efficient design. It supports a wide array of generative models, including diffusion models (Stable Diffusion, ControlNet, DreamBooth) and GANs (StyleGAN, BigGAN), enabling flexible experimentation and customization through a Lego-like component-based approach. This architecture facilitates easy integration of new algorithms and supports distributed training for dynamic architectures.
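
The component-based design described above can be illustrated with a small registry sketch in plain Python. This is not MMEngine's implementation; the `Registry` class, the `MODELS` name, and `ToyUpsampler` are hypothetical stand-ins that only show the general idea of building a component from a config dictionary whose `type` key names a registered class.

```python
from typing import Dict, Type


class Registry:
    """Tiny name -> class registry (illustrative only, not MMEngine's API)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._modules: Dict[str, Type] = {}

    def register(self, cls: Type) -> Type:
        """Class decorator that records cls under its class name."""
        if cls.__name__ in self._modules:
            raise KeyError(f"{cls.__name__} is already registered in {self.name}")
        self._modules[cls.__name__] = cls
        return cls

    def build(self, cfg: dict):
        """Instantiate the class named by cfg['type'] with the remaining keys."""
        args = dict(cfg)  # copy so the caller's config is not mutated
        cls = self._modules[args.pop("type")]
        return cls(**args)


MODELS = Registry("models")  # hypothetical registry name


@MODELS.register
class ToyUpsampler:
    """Hypothetical component: multiplies an image size by a scale factor."""

    def __init__(self, scale: int = 2) -> None:
        self.scale = scale

    def __call__(self, size: int) -> int:
        return size * self.scale


# Swapping components means editing the config, not the calling code.
cfg = dict(type="ToyUpsampler", scale=4)
model = MODELS.build(cfg)
print(model(64))  # 256
```

Because the calling code only sees `build(cfg)`, a new algorithm can be added by registering one more class and pointing the config at it.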

Quick Start & Requirements

Installation: pip3 install openmim, then mim install 'mmcv>=2.0.0' mmengine mmagic (quote the version specifier so the shell does not treat > as a redirect).
Prerequisites: PyTorch (2.0+ recommended), Python (3.9+ recommended).
Quick Start Example: The project provides a Python snippet that demonstrates text-to-image generation with MMagicInferencer.
Documentation: https://mmagic.readthedocs.io/en/latest/
Installation Guide: https://mmagic.readthedocs.io/en/latest/get_started/install.html
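
A text-to-image call along the lines of the upstream quick start might look like the sketch below. It assumes MMagic is installed and that `MMagicInferencer` accepts `model_name="stable_diffusion"` and an `infer(text=..., result_out_dir=...)` call, as in the project's published example. The `text_to_image` wrapper and the default output path are hypothetical, so check the documentation linked above before relying on the details.

```python
from pathlib import Path


def text_to_image(prompt: str, out_path: str = "output/sd_res.png") -> Path:
    """Generate one image from a text prompt (hypothetical helper)."""
    # Imported lazily so this module can be loaded without MMagic installed.
    from mmagic.apis import MMagicInferencer

    target = Path(out_path)
    target.parent.mkdir(parents=True, exist_ok=True)

    # Assumption: the first call may download model weights, which can be slow.
    inferencer = MMagicInferencer(model_name="stable_diffusion")
    inferencer.infer(text=prompt, result_out_dir=str(target))
    return target
```

Calling `text_to_image("A panda is having dinner at KFC")` would then write the result to `output/sd_res.png`, provided the model loads successfully on the local hardware.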

Highlighted Details

Supports 11 new models across 4 new tasks, including Text2Image (ControlNet, DreamBooth, Stable Diffusion), 3D-aware Generation (EG3D), Image Restoration (NAFNet, Restormer), and Image Colorization.
Offers "magic" diffusion features like Stable Diffusion/Disco Diffusion support, finetuning (DreamBooth, LoRA), ControlNet integration, xFormers acceleration, and MultiFrame Render for video generation.
Upgraded framework with MMEngine/MMCV 2.0, supporting unified data formats, flexible evaluation loops for various metrics, and visualization via TensorBoard/WandB.
Extensive Model Zoo covering GANs, Image Restoration, Super-Resolution, Video tasks, Inpainting, Matting, Text-to-Image, and 3D-aware generation.

Maintenance & Community

The most recent release listed is v1.2.0 (December 2023); the project also takes community contributions.
Part of the larger OpenMMLab ecosystem.
Issue reporting and ongoing projects are tracked on GitHub.

Licensing & Compatibility

Released under the Apache 2.0 license.
The license is permissive for commercial use, but review the LICENSES file first, since terms for individual components may differ.

Limitations & Caveats

The project recommends PyTorch 2.0+ and Python 3.9+, and installation takes several steps through MIM. The large number of models and features may also mean a learning curve for new users.
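
Since version mismatches are a likely source of installation trouble, a short pre-flight check can help. The sketch below compares the local interpreter and any installed `torch` package against the recommended minimums listed under Quick Start & Requirements. The helper names and the warning wording are hypothetical, and the check is not part of MMagic.

```python
import re
import sys
from importlib import metadata

RECOMMENDED_PYTHON = (3, 9)  # from the prerequisites listed above
RECOMMENDED_TORCH = (2, 0)


def parse_version(text: str) -> tuple:
    """Return the leading numeric components, e.g. '2.1.0+cu118' -> (2, 1, 0)."""
    numbers = []
    for piece in text.split("+")[0].split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        numbers.append(int(match.group()))
    return tuple(numbers)


def check_environment() -> list:
    """Collect human-readable warnings; an empty list means all checks passed."""
    warnings = []
    if sys.version_info[:2] < RECOMMENDED_PYTHON:
        warnings.append(
            f"Python {sys.version_info[0]}.{sys.version_info[1]} is older than 3.9"
        )
    try:
        torch_version = metadata.version("torch")
    except metadata.PackageNotFoundError:
        warnings.append("PyTorch is not installed")
    else:
        if parse_version(torch_version)[:2] < RECOMMENDED_TORCH:
            warnings.append(f"PyTorch {torch_version} is older than 2.0")
    return warnings


for message in check_environment():
    print("warning:", message)
```

Running this before `mim install` gives an early hint about mismatches; it only reads package metadata, so it does not import PyTorch itself.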

Health Check

Last commit: 1 year ago
Responsiveness: 1 day
Star history: 20 stars in the last 30 days