Step1X-3D by stepfun-ai

Framework for high-fidelity, controllable textured 3D asset generation

created 2 months ago
746 stars

Top 47.5% on sourcepulse

Project Summary

Step1X-3D is a framework for generating high-fidelity, controllable, textured 3D assets. It is aimed at researchers and developers in 3D generative AI, and it uses a two-stage architecture that bridges 2D and 3D generation paradigms to improve both controllability and quality.

How It Works

The framework uses a two-stage, 3D-native architecture. The first stage, a hybrid VAE-DiT model, generates watertight geometry represented as a truncated signed distance function (TSDF). It relies on perceiver-based latent encoding and sharp-edge sampling to preserve fine detail. The second stage is an SD-XL-based texture synthesis module that keeps textures consistent across views through geometric conditioning and latent-space synchronization. This design allows 2D control techniques such as LoRA to be transferred directly to 3D synthesis.
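To make the geometry representation concrete, here is a minimal NumPy sketch of a TSDF sampled on a regular grid. It uses an analytic sphere in place of a real mesh, and the function names and the truncation width tau are illustrative assumptions, not Step1X-3D's code. Because a sphere is closed (watertight), every grid point has a well-defined inside/outside sign; truncation then keeps informative values only in a thin band around the surface.

```python
import numpy as np


def sphere_sdf(points: np.ndarray, center: np.ndarray, radius: float) -> np.ndarray:
    """Exact signed distance to a sphere: negative inside, positive outside."""
    return np.linalg.norm(points - center, axis=-1) - radius


def truncate_sdf(sdf: np.ndarray, tau: float) -> np.ndarray:
    """Clamp distances to [-tau, tau] and rescale to [-1, 1], giving a TSDF."""
    return np.clip(sdf, -tau, tau) / tau


def make_grid(resolution: int, bound: float = 1.0) -> np.ndarray:
    """Regular grid of query points in [-bound, bound]^3, shape (R, R, R, 3)."""
    axis = np.linspace(-bound, bound, resolution)
    xs, ys, zs = np.meshgrid(axis, axis, axis, indexing="ij")
    return np.stack([xs, ys, zs], axis=-1)


# Hypothetical example: a sphere of radius 0.5 with a truncation band of 0.1.
grid = make_grid(resolution=33)
tsdf = truncate_sdf(sphere_sdf(grid, center=np.zeros(3), radius=0.5), tau=0.1)

# Voxels strictly inside the band carry distance information; the rest saturate at +/-1.
near_surface = np.abs(tsdf) < 1.0
```

The surface itself is the zero level set of the field, which a mesher such as marching cubes would extract; points far from it all saturate at +1 (outside) or -1 (inside).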

Quick Start & Requirements

  • Install: With CUDA 12.4 set up, clone the repository, create a conda environment (python=3.10), and install the dependencies with pip install -r requirements.txt. PyTorch and torch-cluster must be installed in specific versions (see Prerequisites).
  • Prerequisites: CUDA 12.4, Python 3.10, PyTorch 2.5.1.
  • Resources: Inference requires roughly 27-29 GB of GPU memory.
  • Links: Hugging Face Demo, Model Weights, Technical Report
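The prerequisites above can be checked before downloading any weights. The sketch below is a hypothetical helper, not part of the repository, that compares the running environment with the versions listed here. Treating the 27-29 GB figure as decimal GB and using its upper end as the threshold are assumptions; note also that torch.version.cuda reports the CUDA version PyTorch was built against, not the system toolkit.

```python
import sys
from typing import List, Optional, Tuple

# Targets copied from the Quick Start notes.
REQUIRED_PYTHON = (3, 10)
REQUIRED_TORCH = "2.5.1"
REQUIRED_CUDA = "12.4"
# Assumption: use the upper end of the quoted ~27-29 GB inference footprint.
MIN_GPU_MEM_GB = 29.0


def check_requirements(
    python: Tuple[int, int],
    torch_version: Optional[str],
    cuda_version: Optional[str],
    gpu_mem_gb: Optional[float],
) -> List[str]:
    """Return human-readable problems; an empty list means everything matches."""
    problems = []
    if tuple(python[:2]) != REQUIRED_PYTHON:
        problems.append(f"expected Python 3.10, found {python[0]}.{python[1]}")
    if torch_version is None:
        problems.append("PyTorch is not installed")
    elif torch_version.split("+")[0] != REQUIRED_TORCH:
        # Builds carry a local suffix such as "+cu124"; compare the base version only.
        problems.append(f"expected PyTorch {REQUIRED_TORCH}, found {torch_version}")
    if cuda_version != REQUIRED_CUDA:
        problems.append(f"expected CUDA {REQUIRED_CUDA}, found {cuda_version}")
    if gpu_mem_gb is None or gpu_mem_gb < MIN_GPU_MEM_GB:
        problems.append(f"need about {MIN_GPU_MEM_GB:.0f} GB of GPU memory, found {gpu_mem_gb}")
    return problems


def detect_environment():
    """Collect the values check_requirements() needs from the running process."""
    python = (sys.version_info.major, sys.version_info.minor)
    try:
        import torch
    except ImportError:
        return python, None, None, None
    mem_gb = None
    if torch.cuda.is_available():
        # total_memory is reported in bytes; divide by 1e9 for decimal GB.
        mem_gb = torch.cuda.get_device_properties(0).total_memory / 1e9
    return python, torch.__version__, torch.version.cuda, mem_gb


if __name__ == "__main__":
    for problem in check_requirements(*detect_environment()):
        print("WARNING:", problem)
```

Keeping the comparison logic in a pure function separates it from detection, so the rules can be adjusted if the project's requirements change.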

Highlighted Details

  • Generates high-fidelity geometry and versatile texture maps, with strong alignment between surface geometry and texture.
  • Supports direct transfer of 2D control techniques (e.g., LoRA) to 3D synthesis.
  • Releases UIDs for 800K high-quality 3D assets, along with the training code.
  • Reports state-of-the-art performance among open-source methods.

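LoRA itself is a compact idea: freeze a pretrained weight matrix W and learn a low-rank update B A scaled by alpha / r, so only r(d_in + d_out) parameters are trained. The NumPy sketch below shows that standard formulation; the class name and hyperparameters are hypothetical, and it does not reproduce how Step1X-3D attaches adapters to its geometry or texture models.

```python
import numpy as np


class LoRALinear:
    """Frozen linear layer plus a low-rank update: y = x W^T + (alpha / r) x A^T B^T."""

    def __init__(self, weight: np.ndarray, rank: int, alpha: float, rng: np.random.Generator):
        out_features, in_features = weight.shape
        self.weight = weight  # frozen base weight, shape (out, in)
        # Down-projection A starts small and random; up-projection B starts at zero,
        # so the adapted layer initially behaves exactly like the base layer.
        self.A = rng.normal(0.0, 0.01, size=(rank, in_features))
        self.B = np.zeros((out_features, rank))
        self.scale = alpha / rank

    def __call__(self, x: np.ndarray) -> np.ndarray:
        # Base path plus the low-rank path; W itself is never modified.
        return x @ self.weight.T + self.scale * (x @ self.A.T) @ self.B.T

    def merged_weight(self) -> np.ndarray:
        """Fold the adapter into a single dense matrix for deployment."""
        return self.weight + self.scale * self.B @ self.A


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    layer = LoRALinear(rng.normal(size=(64, 128)), rank=4, alpha=8.0, rng=rng)
    trainable = layer.A.size + layer.B.size
    print(f"trainable adapter params: {trainable} vs full layer: {layer.weight.size}")
```

Because the update can be folded into W with merged_weight(), a trained adapter adds no extra inference cost once merged.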
Maintenance & Community

The project was released on May 13, 2025, with all planned open-source components (technical report, inference code, model weights, training code, dataset UIDs, demo) now available.

Licensing & Compatibility

Licensed under the Apache License 2.0, permitting commercial use and integration with closed-source projects.

Limitations & Caveats

The project is newly released. Planned future work includes more controllable models (multi-view, bounding-box, and skeleton conditioning) and ComfyUI integration.

Health Check

  • Last commit: 1 month ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 6
  • Star History: 747 stars in the last 90 days
