Survey paper for multimodal image synthesis/editing & visual AIGC
This repository is a survey and curated collection of research papers, code, and projects on Multimodal Image Synthesis and Editing (MISE) in the generative AI era. It gives researchers and practitioners a structured overview of the field, organized by data modality and model architecture.
How It Works
The project organizes a vast landscape of MISE research, classifying papers and associated resources into taxonomies based on data modality (e.g., text, audio) and model architectures (e.g., Diffusion-based, GAN-based, Autoregressive). This structured approach allows users to navigate and discover relevant advancements, understand the evolution of techniques, and identify key contributions in areas like text-to-image synthesis, image editing, and 3D generation.
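As a rough illustration of the two-level taxonomy described above, the following minimal sketch groups entries first by data modality and then by model architecture. The `Entry` class, the `build_index` helper, and the placeholder titles are assumptions made for this example; the repository itself is a list of links and does not ship such code.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """One listed resource. All fields are illustrative placeholders."""
    title: str
    modality: str       # e.g. "text", "audio"
    architecture: str   # e.g. "Diffusion-based", "GAN-based", "Autoregressive"


def build_index(entries):
    """Group entry titles by modality, then by architecture."""
    index = defaultdict(lambda: defaultdict(list))
    for entry in entries:
        index[entry.modality][entry.architecture].append(entry.title)
    return index


# Hypothetical entries; these are not real papers from the repository.
ENTRIES = [
    Entry("Placeholder paper A", "text", "Diffusion-based"),
    Entry("Placeholder paper B", "text", "GAN-based"),
    Entry("Placeholder paper C", "audio", "Autoregressive"),
    Entry("Placeholder paper D", "text", "Diffusion-based"),
]

INDEX = build_index(ENTRIES)

if __name__ == "__main__":
    # Print the taxonomy as an indented outline.
    for modality, by_arch in sorted(INDEX.items()):
        print(modality)
        for arch, titles in sorted(by_arch.items()):
            print(f"  {arch}: {', '.join(titles)}")
```

Nesting by modality first mirrors the order in which the summary names the two axes; swapping the nesting order would work equally well if architecture were the primary way of browsing.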
Quick Start & Requirements
This repository is primarily a curated list of resources, not a runnable software package. No installation or execution commands are provided. The "quick start" involves browsing the categorized links to papers and projects.
Maintenance & Community
The project is maintained by fnzhan and is open for community contributions through pull requests. The primary paper associated with the survey is "Multimodal Image Synthesis and Editing: The Generative AI Era" by Zhan et al. (TPAMI 2023).
Licensing & Compatibility
The repository itself does not specify a license. The linked papers and projects carry their own licenses, which vary. Consult each project's license for compatibility and usage restrictions.
Limitations & Caveats
As a curated list, this repository provides neither executable code nor direct access to the models. Users must follow external links to find, and potentially run, the associated research projects. Given the volume of listed papers, it is a broad overview rather than a deep dive into any single technique.