LayoutGPT by weixi-feng

Research paper for visual planning & generation using LLMs

created 2 years ago
369 stars

Top 77.7% on sourcepulse

Project Summary

LayoutGPT addresses the challenge of generating and planning visual layouts for both 2D images and 3D scenes using large language models (LLMs). It targets researchers and developers working on AI-driven content creation, scene synthesis, and multimodal AI, offering a novel approach to compositional visual generation.

How It Works

LayoutGPT leverages LLMs to interpret textual prompts and generate structured layout descriptions (bounding boxes, object placements). It then utilizes external generative models like GLIGEN for 2D image synthesis and ATISS for 3D scene generation, enabling a compositional approach to visual planning and creation. This method allows for fine-grained control over scene elements and their spatial relationships.
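The step from LLM text to a structured layout can be sketched as follows. This is a minimal illustration, not the repository's code: it assumes the LLM answers with CSS-like rules in pixel units on a 512-pixel square canvas, and it converts them to normalized corner boxes of the kind a layout-conditioned generator might accept. The response format, the `parse_layout` helper, the canvas size, and the example text are all hypothetical.

```python
import re
from typing import List, Tuple

# Hypothetical CSS-like layout text, standing in for an LLM response.
EXAMPLE_RESPONSE = """
cat {height: 128px; width: 256px; top: 64px; left: 32px; }
dog {height: 100px; width: 100px; top: 300px; left: 400px; }
"""

# One rule looks like: "<label> { <prop>: <n>px; ... }"
_RULE = re.compile(r"([\w\s-]+?)\s*\{([^}]*)\}")
_PROP = re.compile(r"([\w-]+)\s*:\s*(-?\d+(?:\.\d+)?)\s*px")

Box = Tuple[float, float, float, float]


def parse_layout(text: str, canvas: int = 512) -> List[Tuple[str, Box]]:
    """Parse CSS-like rules into (label, normalized [x0, y0, x1, y1]) pairs."""
    layout = []
    for name, body in _RULE.findall(text):
        props = {key: float(value) for key, value in _PROP.findall(body)}
        if not {"left", "top", "width", "height"} <= props.keys():
            continue  # skip rules that are missing a coordinate
        x0 = props["left"] / canvas
        y0 = props["top"] / canvas
        x1 = (props["left"] + props["width"]) / canvas
        y1 = (props["top"] + props["height"]) / canvas
        # Clamp to [0, 1] so out-of-range LLM output still yields a valid box.
        box = tuple(min(max(v, 0.0), 1.0) for v in (x0, y0, x1, y1))
        layout.append((name.strip(), box))
    return layout


if __name__ == "__main__":
    for label, box in parse_layout(EXAMPLE_RESPONSE):
        print(label, box)
```

Keeping the parsed layout as plain label and box pairs keeps the planning step independent of whichever downstream generator consumes it.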

Quick Start & Requirements

  • Installation: run conda create -n layoutgpt python=3.8 -y, then pip install -r requirements.txt. GLIGEN, GLIP, and ATISS each require additional setup.
  • Prerequisites: Python 3.8, Conda, PyTorch, OpenAI API authentication, and Blender (for visualization). GLIGEN and GLIP checkpoints and the 3D-FUTURE data are also needed.
  • Data: Download NSR-1K (2D) and 3D-FUTURE (3D) datasets.
  • Links: Project Page, arXiv, GLIGEN, GLIP, ATISS.

Highlighted Details

  • Supports both 2D image layout generation and 3D indoor scene synthesis.
  • Integrates with LLMs like GPT-4 and Llama-2 for layout planning.
  • Utilizes GLIGEN for 2D image generation from layouts and ATISS for 3D scene synthesis.
  • Includes evaluation scripts for layout quality and generated images.

Maintenance & Community

  • The project was presented at NeurIPS 2023.
  • Recent updates include Llama-2 support and Blender rendering scripts.
  • The project relies on external repositories (GLIGEN, GLIP, ATISS), and users should refer to their respective communities.

Licensing & Compatibility

  • The README does not explicitly state a license for the LayoutGPT code itself.
  • The README notes that use of the downstream generation code is governed by the terms set by the original authors of GLIGEN, GLIP, and ATISS, so commercial use may be restricted by these dependencies.

Limitations & Caveats

The project disclaimer states that the code is not the official code of the original creators of GLIGEN, GLIP, and ATISS, and may be subject to retraction. Users must comply with the terms of the original projects for downstream generation.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1+ week
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 32 stars in the last 90 days
