threestudio by threestudio-project

Framework for 3D content generation from text/images using 2D diffusion

created 2 years ago
6,830 stars

Top 7.6% on sourcepulse

Project Summary

threestudio is a unified framework for generating 3D content from text or images, leveraging 2D text-to-image diffusion models. It aims to provide a flexible and extensible platform for researchers and developers to experiment with and implement various 3D generation techniques.

How It Works

The framework employs score distillation sampling (SDS) and related techniques to optimize 3D representations (like NeRFs or SDFs) based on guidance from pre-trained 2D diffusion models. It supports multiple 3D representations and guidance methods, allowing for diverse generation approaches. The modular design enables easy integration of new methods and customization of the generation pipeline.
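
The SDS update described above can be illustrated with a toy, one-dimensional sketch. This is not threestudio code: the "3D representation" is a single scalar `theta`, the "renderer" is the identity, and the pre-trained diffusion model is replaced by a closed-form noise predictor for data assumed to follow N(MU, S^2). The names `MU`, `S`, `noise_schedule`, `predict_noise`, `sds_grad`, and `optimize` are all hypothetical. The gradient follows the commonly stated SDS form, w(t) * (eps_pred - eps) * dx/dtheta, with w(t) = 1.

```python
import math
import random

MU, S = 3.0, 0.5  # assumed mean and std of the toy 1-D "data" distribution


def noise_schedule(t):
    """Cosine schedule: returns (alpha_t, sigma_t) with alpha^2 + sigma^2 = 1."""
    return math.cos(0.5 * math.pi * t), math.sin(0.5 * math.pi * t)


def predict_noise(x_t, t):
    """Closed-form optimal noise prediction when clean data is N(MU, S^2).

    Stands in for the epsilon-prediction of a pre-trained 2D diffusion model.
    """
    alpha, sigma = noise_schedule(t)
    return sigma * (x_t - alpha * MU) / (alpha**2 * S**2 + sigma**2)


def sds_grad(theta, rng):
    """One-sample SDS gradient estimate with w(t) = 1."""
    t = rng.uniform(0.02, 0.98)    # random diffusion time
    alpha, sigma = noise_schedule(t)
    eps = rng.gauss(0.0, 1.0)      # noise injected into the render
    x = theta                      # toy "render": x = g(theta), dx/dtheta = 1
    x_t = alpha * x + sigma * eps  # noised render
    # SDS skips the denoiser's Jacobian; constant factors are folded into w(t).
    return predict_noise(x_t, t) - eps


def optimize(theta=0.0, steps=3000, lr=0.01, seed=0):
    """Plain gradient descent on theta using stochastic SDS gradients."""
    rng = random.Random(seed)
    for _ in range(steps):
        theta -= lr * sds_grad(theta, rng)
    return theta


if __name__ == "__main__":
    print(f"theta after optimization: {optimize():.3f} (target mode: {MU})")
```

In this toy setting the parameter drifts toward the mode of the assumed data distribution, which mirrors the mode-seeking behavior usually attributed to SDS. In a real pipeline the identity renderer would be a differentiable NeRF or SDF renderer, and the closed-form predictor would be a text-conditioned diffusion model.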

Quick Start & Requirements

  • Installation: pip install -r requirements.txt, with Python >= 3.8 and PyTorch >= 1.12 already installed. Docker installation is also supported.
  • Hardware: NVIDIA GPU with at least 6GB VRAM and a working CUDA installation.
  • Dependencies: PyTorch; Ninja is recommended to speed up building CUDA extensions. Using DeepFloyd IF requires logging in to Hugging Face and accepting the model's license agreement.
  • Documentation: installation.md, DOCUMENTATION.md
  • Demo: Hugging Face Spaces, or a self-hosted service
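
As a rough illustration of what a first run might look like after installation, the sketch below assembles a training command without executing it. The script name `launch.py`, the flags `--config`, `--train`, and `--gpu`, the config path `configs/dreamfusion-sd.yaml`, and the override key `system.prompt_processor.prompt` are assumptions based on the upstream README and should be verified against the repository before use. The helper `build_launch_command` is hypothetical and not part of threestudio.

```python
import shlex


def build_launch_command(config, prompt, gpu=0):
    """Return the argv list for a training run (nothing is executed here).

    All script names, flags, and keys are assumptions taken from the
    upstream README; check them against the repository before relying on them.
    """
    return [
        "python", "launch.py",
        "--config", config,   # YAML file selecting the method and its settings
        "--train",            # run optimization rather than export or test
        "--gpu", str(gpu),    # index of the GPU to use
        f"system.prompt_processor.prompt={prompt}",  # command-line override
    ]


if __name__ == "__main__":
    cmd = build_launch_command(
        "configs/dreamfusion-sd.yaml",  # assumed config path
        "a delicious hamburger",
    )
    # shlex.join quotes the prompt so the line can be pasted into a shell.
    print(shlex.join(cmd))
```

Building the command as a list keeps the prompt as a single argument even when it contains spaces, which avoids shell-quoting mistakes when the command is later passed to `subprocess.run`.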

Highlighted Details

  • Supports numerous state-of-the-art 3D generation methods including DreamFusion, Magic3D, ProlificDreamer, Zero-1-to-3, and Gaussian Splatting.
  • Offers a custom extension system for adding new methods.
  • Includes features like prompt debiasing, Perp-Neg, and VRAM optimization techniques.
  • Provides a Gradio web interface for easier experimentation.

Maintenance & Community

  • Developed with contributions from various researchers; recent activity has slowed (see the health check below, which reports the last commit 7 months ago).
  • Discord server for community discussion.
  • GitHub Issues for bug reports and feature requests.

Licensing & Compatibility

  • The project code is released under the Apache-2.0 license. The models it relies on (e.g., Stable Diffusion, DeepFloyd IF) carry their own licenses, which may restrict commercial use or require accepting specific agreements.

Limitations & Caveats

  • Achieving high-quality results often requires significant VRAM and computational resources.
  • Some implementations are experimental or unofficial re-implementations of original papers.
  • Results can be sensitive to hyperparameters and may require extensive tuning.

Health Check

  • Last commit: 7 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 1
  • Issues (30d): 1
  • Star history: 124 stars in the last 90 days
