LayoutGPT by weixi-feng

Research paper for visual planning & generation using LLMs

Created 2 years ago

401 stars


Project Summary

LayoutGPT uses large language models (LLMs) to plan and generate visual layouts for both 2D images and 3D scenes. It targets researchers and developers working on AI-driven content creation, scene synthesis, and multimodal AI, and offers an approach to compositional visual generation in which the LLM decides which objects appear and where they go.

How It Works

LayoutGPT prompts an LLM to turn a text description into a structured layout, such as bounding boxes for objects in an image or object placements in a room. External generative models then render that layout: GLIGEN for 2D image synthesis and ATISS for 3D scene generation. Because the layout is an explicit intermediate step, this compositional approach gives fine-grained control over individual scene elements and their spatial relationships.
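The plan-then-render flow can be sketched in Python. This is a minimal illustration, not LayoutGPT's code: the CSS-like line format, the 256-pixel canvas, and every helper name (Box, to_css, build_prompt, parse_layout, normalize) are hypothetical. The sketch serializes example layouts into a few-shot prompt, parses the LLM's reply back into boxes, and rescales them to normalized [x0, y0, x1, y1] coordinates, the kind of grounding input a layout-conditioned generator such as GLIGEN consumes (check the GLIGEN repository for its exact input format).

```python
import re
from dataclasses import dataclass

# Hypothetical CSS-like layout line, e.g.
#   "cat {width: 128px; height: 64px; left: 0px; top: 192px; }"
LINE_RE = re.compile(
    r"(?P<name>[\w ]+?)\s*\{\s*width:\s*(?P<w>\d+)px;\s*height:\s*(?P<h>\d+)px;"
    r"\s*left:\s*(?P<x>\d+)px;\s*top:\s*(?P<y>\d+)px;\s*\}"
)

@dataclass
class Box:
    name: str
    x: int  # left edge, pixels
    y: int  # top edge, pixels (y grows downward)
    w: int  # width, pixels
    h: int  # height, pixels

def to_css(boxes):
    """Serialize boxes into the assumed CSS-like text, one object per line."""
    return "\n".join(
        f"{b.name} {{width: {b.w}px; height: {b.h}px; left: {b.x}px; top: {b.y}px; }}"
        for b in boxes
    )

def build_prompt(demos, query):
    """Few-shot prompt: each demo is (caption, boxes); the query caption comes last."""
    parts = [f"Prompt: {caption}\nLayout:\n{to_css(boxes)}" for caption, boxes in demos]
    parts.append(f"Prompt: {query}\nLayout:")
    return "\n\n".join(parts)

def parse_layout(text):
    """Extract boxes from an LLM reply; lines that do not match are skipped."""
    boxes = []
    for line in text.splitlines():
        m = LINE_RE.search(line)
        if m:
            boxes.append(Box(m["name"].strip(), int(m["x"]), int(m["y"]),
                             int(m["w"]), int(m["h"])))
    return boxes

def normalize(boxes, canvas=256):
    """Map pixel boxes to (name, [x0, y0, x1, y1]) in [0, 1], clipped to the canvas."""
    out = []
    for b in boxes:
        coords = (b.x / canvas, b.y / canvas,
                  (b.x + b.w) / canvas, (b.y + b.h) / canvas)
        out.append((b.name, [min(max(v, 0.0), 1.0) for v in coords]))
    return out
```

A reply line such as "cat {width: 128px; height: 64px; left: 0px; top: 192px; }" parses to a single cat box, which normalize maps to [0.0, 0.75, 0.5, 1.0]. Lines that do not match the pattern, such as a chatty preamble from the model, are skipped rather than raising an error.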

Quick Start & Requirements

Installation: create an environment with conda create -n layoutgpt python=3.8 -y, activate it, then run pip install -r requirements.txt. GLIGEN, GLIP, and ATISS each require additional setup.
Prerequisites: Python 3.8, Conda, PyTorch, OpenAI API authentication, and Blender (for visualization). GLIGEN and GLIP checkpoints and the 3D-FUTURE data are also required.
Data: Download NSR-1K (2D) and 3D-FUTURE (3D) datasets.
Links: Project Page, arXiv, GLIGEN, GLIP, ATISS.
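Before running anything, a quick environment check can catch missing prerequisites early. The sketch below is hypothetical and not part of the repository: it assumes the OpenAI key is read from the OPENAI_API_KEY environment variable and that Blender is installed as a blender executable on PATH; adjust both to match how your setup is actually configured.

```python
import os
import shutil
import sys

def check_prereqs(env=None, which=shutil.which):
    """Return a list of problems found; an empty list means every check passed.

    env and which are injectable so the checks can be exercised without
    touching the real environment.
    """
    env = os.environ if env is None else env
    problems = []
    # The Quick Start creates a Python 3.8 environment.
    if sys.version_info[:2] != (3, 8):
        problems.append(
            f"expected Python 3.8, found {sys.version_info[0]}.{sys.version_info[1]}"
        )
    # Assumption: the OpenAI key is supplied through OPENAI_API_KEY.
    if not env.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set")
    # Assumption: Blender is on PATH as "blender"; only 3D visualization needs it.
    if which("blender") is None:
        problems.append("blender not found on PATH (needed only for 3D visualization)")
    return problems

if __name__ == "__main__":
    for msg in check_prereqs():
        print("WARNING:", msg)
```

Run it inside the activated layoutgpt environment; it only prints warnings and changes nothing.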

Highlighted Details

Supports both 2D image layout generation and 3D indoor scene synthesis.
Integrates with LLMs like GPT-4 and Llama-2 for layout planning.
Uses GLIGEN to generate 2D images from layouts and ATISS for 3D scene synthesis.
Includes evaluation scripts for layout quality and generated images.
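The repository's evaluation scripts are not reproduced here. As a hedged illustration of what layout-quality checks can look like, the hypothetical helpers below score a 2D layout on object counts and on a pairwise spatial relation, taking boxes as (name, [x0, y0, x1, y1]) pairs in normalized image coordinates. Comparing box centers is a simplifying assumption, not necessarily the rule the project's scripts use.

```python
from collections import Counter

def count_accuracy(boxes, expected):
    """Return 1.0 if every expected object appears exactly the expected number of times.

    boxes: list of (name, [x0, y0, x1, y1]); expected: dict mapping name -> count.
    Object names missing from expected are ignored.
    """
    counts = Counter(name for name, _ in boxes)
    return float(all(counts[name] == n for name, n in expected.items()))

def center(box):
    """Center point of an [x0, y0, x1, y1] box."""
    x0, y0, x1, y1 = box
    return (x0 + x1) / 2, (y0 + y1) / 2

def relation_holds(a, b, relation):
    """Check whether box a stands in `relation` to box b by comparing centers.

    Image coordinates are assumed: y grows downward, so "above" means a smaller y.
    """
    (ax, ay), (bx, by) = center(a), center(b)
    checks = {
        "left of": ax < bx,
        "right of": ax > bx,
        "above": ay < by,
        "below": ay > by,
    }
    if relation not in checks:
        raise ValueError(f"unsupported relation: {relation!r}")
    return checks[relation]
```

For a cat box at [0.0, 0.75, 0.5, 1.0] and a dog box at [0.75, 0.75, 1.0, 1.0], count_accuracy(boxes, {"cat": 1, "dog": 1}) returns 1.0 and relation_holds(cat, dog, "left of") returns True. Because count_accuracy ignores names absent from expected, extra objects are not penalized.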

Maintenance & Community

The project was presented at NeurIPS 2023.
Recent updates include Llama-2 support and Blender rendering scripts.
The project relies on external repositories (GLIGEN, GLIP, ATISS), and users should refer to their respective communities.

Licensing & Compatibility

The README does not explicitly state a license for the LayoutGPT code itself.
It notes that use of the downstream generation code is governed by the terms set by the original authors of GLIGEN, GLIP, and ATISS, so commercial use may be restricted by these dependencies.

Limitations & Caveats

The project's disclaimer states that this code is not the official code of the original creators of GLIGEN, GLIP, and ATISS and may be subject to retraction. Users must comply with the original projects' terms when running downstream generation.


Explore Similar Projects

CE3D by Fangkang515

scene-language by zzyunzhi

LLaVA-3D by ZCMax

EmbodiedGen by HorizonRobotics

ShapeLLM-Omni by JAMESYJL

WonderWorld by KovenYu

Awesome-3D-Scene-Generation by hzxie

WonderJourney by KovenYu

MultiDiffusion by omerbt

RPG-DiffusionMaster by YangLing0818

HunyuanWorld-1.0 by Tencent-Hunyuan

shap-e by openai