Isaac-GR00T by NVIDIA

Open foundation model for humanoid robot reasoning and skills

created 4 months ago
4,544 stars

Top 11.0% on sourcepulse

View on GitHub
1 Expert Loves This Project
Project Summary

NVIDIA Isaac GR00T N1 is an open foundation model for generalized humanoid robot reasoning and skills, designed for researchers and professionals in robotics. It enables cross-embodiment manipulation tasks by processing multimodal inputs (language, images) and can be adapted to specific robots and environments through fine-tuning.

How It Works

GR00T N1 utilizes a vision-language foundation model combined with a diffusion transformer head that denoises continuous actions. This architecture is trained on a large dataset of real, synthetic, and internet-scale video data, allowing it to generalize across different robot embodiments and tasks. The model outputs actions based on multimodal inputs, facilitating robot control.
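To make the "diffusion head that denoises continuous actions" idea concrete, here is a minimal NumPy sketch of the sampling loop. Everything here is illustrative: `toy_denoiser` stands in for GR00T's learned, vision-language-conditioned noise predictor, and the 7-dimensional action vector and step count are arbitrary assumptions, not the model's real configuration.

```python
import numpy as np

def toy_denoiser(x, t, target):
    # Stand-in for the learned noise predictor: here it simply estimates
    # the noise as the offset from a fixed target action. The real model
    # conditions this prediction on language and image embeddings.
    return x - target

def sample_actions(target, steps=50, dim=7, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(dim)          # start from pure Gaussian noise
    for t in range(steps, 0, -1):
        eps_hat = toy_denoiser(x, t, target)
        x = x - (1.0 / steps) * eps_hat   # simple Euler-style denoising step
    return x

target = np.zeros(7)                      # hypothetical "clean" action
actions = sample_actions(target)          # iteratively denoised action vector
```

Each iteration moves the noisy sample a small step toward the predicted clean action, which is the core mechanic behind diffusion-based action heads.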

Quick Start & Requirements

  • Installation: Clone the repository and install dependencies within a conda environment with Python 3.10 and CUDA 12.4. Key dependencies include flash-attn==2.7.1.post4.
  • Prerequisites: Ubuntu 20.04/22.04, NVIDIA GPU (H100, L40, RTX 4090, A6000 for fine-tuning; RTX 3090, RTX 4090, A6000 for inference), CUDA 12.4, ffmpeg, libsm6, libxext6.
  • Resources: Fine-tuning is GPU-intensive (H100 or L40 nodes recommended); inference runs comfortably on the consumer and workstation GPUs listed above.
  • Documentation: Jupyter notebooks and detailed guides are available in the ./getting_started folder.
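Putting the installation bullets together, the setup might look like the following shell sketch. This assumes a CUDA 12.4 machine; the repository URL and exact `pip` flags should be checked against the project's README before use.

```
# Sketch of the documented setup: conda env with Python 3.10 on CUDA 12.4
conda create -n gr00t python=3.10 -y
conda activate gr00t

git clone https://github.com/NVIDIA/Isaac-GR00T
cd Isaac-GR00T
pip install -e .

# flash-attn is pinned and is the usual source of CUDA-version trouble
pip install --no-build-isolation flash-attn==2.7.1.post4
```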

Highlighted Details

  • Cross-Embodiment: Trained on diverse data for generalization across different humanoid robots.
  • Fine-tuning: Supports full fine-tuning and parameter-efficient LoRA fine-tuning for customization.
  • Inference: Provides an inference service in server/client modes.
  • Evaluation: Includes scripts for offline policy evaluation and plotting results.
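The server/client inference mode mentioned above can be sketched with the pattern below. This is a hypothetical stand-in using only the Python standard library: the port, the JSON message format, and the zero-action "policy" are all invented for illustration and are not GR00T's actual service API.

```python
import json
import socket
import threading

def serve_once(sock):
    # Toy "policy server": accepts one connection, reads a JSON observation,
    # and replies with a zero action vector of the requested dimension.
    conn, _ = sock.accept()
    with conn:
        req = json.loads(conn.recv(4096).decode())
        reply = {"action": [0.0] * req["action_dim"]}
        conn.sendall(json.dumps(reply).encode())

# Server side: bind to an ephemeral localhost port and serve in a thread.
server = socket.socket()
server.bind(("127.0.0.1", 0))
server.listen(1)
port = server.getsockname()[1]
threading.Thread(target=serve_once, args=(server,), daemon=True).start()

# Client side: send an observation, receive the predicted action.
client = socket.socket()
client.connect(("127.0.0.1", port))
client.sendall(json.dumps({"obs": [0.1, 0.2], "action_dim": 7}).encode())
reply = json.loads(client.recv(4096).decode())
client.close()
```

The real service would replace the toy policy with a model forward pass, but the request/response split is the same: the robot-side client stays lightweight while the GPU-backed server does the heavy inference.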

Maintenance & Community

The project is maintained by NVIDIA. Contribution guidelines are available in CONTRIBUTING.md.

Licensing & Compatibility

Licensed under the Apache-2.0 license, permitting commercial use and integration with closed-source projects.

Limitations & Caveats

The model requires CUDA 12.4 and may hit compatibility issues on other versions, particularly when building the flash-attn module. Fine-tuning performance varies significantly with hardware; H100 or L40 nodes are recommended for best results.

Health Check
Last commit

1 week ago

Responsiveness

1 day

Pull Requests (30d)
12
Issues (30d)
46
Star History
968 stars in the last 90 days

Explore Similar Projects

Starred by Jeff Hammerbacher (Cofounder of Cloudera) and Stas Bekman (Author of Machine Learning Engineering Open Book; Research Engineer at Snowflake).

InternEvo by InternLM

Lightweight training framework for model pre-training
Top 1.0% on sourcepulse · 402 stars
created 1 year ago · updated 1 week ago