mmagic  by open-mmlab

AIGC toolbox for image/video editing and generation

created 6 years ago
7,221 stars

Top 7.3% on sourcepulse

Project Summary

MMagic is an AIGC (AI-generated content) toolkit for multimodal creation. It bundles generative models for image and video synthesis, editing, and restoration, and targets researchers and AIGC enthusiasts who want a flexible platform for tasks such as text-to-image generation, image/video enhancement, and 3D-aware generation.

How It Works

MMagic is built upon the OpenMMLab 2.0 framework, leveraging MMEngine and MMCV for a modular and efficient design. It supports a wide array of generative models, including diffusion models (Stable Diffusion, ControlNet, DreamBooth) and GANs (StyleGAN, BigGAN), enabling flexible experimentation and customization through a Lego-like component-based approach. This architecture facilitates easy integration of new algorithms and supports distributed training for dynamic architectures.
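The component-based design described above can be illustrated with a small, pure-Python sketch of a registry that builds objects from config dictionaries. It mimics the general pattern (a `type` key selects a registered class) and is not MMEngine's actual implementation; `Registry`, `MODELS`, `ToyUpsampler`, and `ToyPipeline` are hypothetical names defined here only for illustration.

```python
from typing import Callable, Dict


class Registry:
    """Minimal name -> class registry (illustrative, not MMEngine's Registry)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._modules: Dict[str, type] = {}

    def register_module(self) -> Callable[[type], type]:
        """Return a class decorator that records the class under its own name."""
        def decorator(cls: type) -> type:
            if cls.__name__ in self._modules:
                raise KeyError(f"{cls.__name__} is already registered in {self.name}")
            self._modules[cls.__name__] = cls
            return cls
        return decorator

    def build(self, cfg: dict):
        """Instantiate the class named by cfg['type'] with the remaining keys."""
        args = dict(cfg)  # copy so the caller's config is not mutated
        cls_name = args.pop("type")
        if cls_name not in self._modules:
            raise KeyError(f"{cls_name} is not registered in {self.name}")
        return self._modules[cls_name](**args)


MODELS = Registry("models")


@MODELS.register_module()
class ToyUpsampler:
    """Stand-in component: scales an (height, width) pair by a fixed factor."""

    def __init__(self, scale: int = 2) -> None:
        self.scale = scale

    def __call__(self, size):
        height, width = size
        return (height * self.scale, width * self.scale)


@MODELS.register_module()
class ToyPipeline:
    """Stand-in composite: builds its stages from nested configs."""

    def __init__(self, stages) -> None:
        self.stages = [MODELS.build(stage_cfg) for stage_cfg in stages]

    def __call__(self, size):
        for stage in self.stages:
            size = stage(size)
        return size


# Swapping a component means editing the config, not the pipeline code.
cfg = dict(
    type="ToyPipeline",
    stages=[
        dict(type="ToyUpsampler", scale=2),
        dict(type="ToyUpsampler", scale=4),
    ],
)
pipeline = MODELS.build(cfg)
print(pipeline((32, 32)))  # (256, 256)
```

Because every component is looked up by name, adding a new algorithm in this sketch only requires registering one more class and referencing it from a config.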

Quick Start & Requirements

Highlighted Details

  • Adds 11 new models across 4 new tasks: Text2Image (ControlNet, DreamBooth, Stable Diffusion), 3D-aware Generation (EG3D), Image Restoration (NAFNet, Restormer), and Image Colorization.
  • Offers "magic" diffusion features: Stable Diffusion and Disco Diffusion support, fine-tuning (DreamBooth, LoRA), ControlNet integration, xFormers acceleration, and MultiFrame Render for video generation.
  • Upgraded framework with MMEngine/MMCV 2.0, supporting unified data formats, flexible evaluation loops for various metrics, and visualization via TensorBoard/WandB.
  • Extensive Model Zoo covering GANs, Image Restoration, Super-Resolution, Video tasks, Inpainting, Matting, Text-to-Image, and 3D-aware generation.

Maintenance & Community

  • Latest release noted: v1.2.0 (Dec 2023); the project accepts community contributions.
  • Part of the larger OpenMMLab ecosystem.
  • Issue reporting and ongoing projects are tracked on GitHub.

Licensing & Compatibility

  • Released under the Apache 2.0 license.
  • Permissive for commercial use, but users are advised to review LICENSES for specifics.

Limitations & Caveats

The project requires specific versions of PyTorch and Python, and installation involves multiple steps using MIM. While comprehensive, the vast number of models and features may present a learning curve for new users.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 95 stars in the last 90 days
