Isaac-GR00T by NVIDIA

Open foundation model for humanoid robot reasoning and skills

created 4 months ago
4,544 stars

Top 11.0% on sourcepulse

View on GitHub
1 Expert Loves This Project
Project Summary

NVIDIA Isaac GR00T N1 is an open foundation model for generalized humanoid robot reasoning and skills, designed for researchers and professionals in robotics. It enables cross-embodiment manipulation tasks by processing multimodal inputs (language, images) and can be adapted to specific robots and environments through fine-tuning.

How It Works

GR00T N1 utilizes a vision-language foundation model combined with a diffusion transformer head that denoises continuous actions. This architecture is trained on a large dataset of real, synthetic, and internet-scale video data, allowing it to generalize across different robot embodiments and tasks. The model outputs actions based on multimodal inputs, facilitating robot control.
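To make the "diffusion head that denoises continuous actions" idea concrete, here is a minimal NumPy sketch of the sampling loop. Everything here is illustrative: `toy_denoiser` stands in for GR00T's learned, vision-language-conditioned noise predictor, and the 7-dimensional action vector and step count are arbitrary assumptions, not the model's real configuration.

```python
import numpy as np

def toy_denoiser(x, t, target):
    # Stand-in for the learned noise predictor: here it simply estimates
    # the noise as the offset from a fixed target action. The real model
    # conditions this prediction on language and image embeddings.
    return x - target

def sample_actions(target, steps=50, dim=7, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(dim)          # start from pure Gaussian noise
    for t in range(steps, 0, -1):
        eps_hat = toy_denoiser(x, t, target)
        x = x - (1.0 / steps) * eps_hat   # simple Euler-style denoising step
    return x

target = np.zeros(7)                      # hypothetical "clean" action
actions = sample_actions(target)          # iteratively denoised action vector
```

Each iteration moves the noisy sample a small step toward the predicted clean action, which is the core mechanic behind diffusion-based action heads.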

Quick Start & Requirements

  • Installation: Clone the repository and install dependencies within a conda environment with Python 3.10 and CUDA 12.4. Key dependencies include flash-attn==2.7.1.post4.
  • Prerequisites: Ubuntu 20.04/22.04, NVIDIA GPU (H100, L40, RTX 4090, A6000 for fine-tuning; RTX 3090, RTX 4090, A6000 for inference), CUDA 12.4, ffmpeg, libsm6, libxext6.
  • Resources: Fine-tuning is GPU-intensive (H100 or L40 nodes recommended); inference runs comfortably on the consumer and workstation GPUs listed above.
  • Documentation: Jupyter notebooks and detailed guides are available in the ./getting_started folder.
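Putting the installation bullets together, the setup might look like the following shell sketch. This assumes a CUDA 12.4 machine; the repository URL and exact `pip` flags should be checked against the project's README before use.

```
# Sketch of the documented setup: conda env with Python 3.10 on CUDA 12.4
conda create -n gr00t python=3.10 -y
conda activate gr00t

git clone https://github.com/NVIDIA/Isaac-GR00T
cd Isaac-GR00T
pip install -e .

# flash-attn is pinned and is the usual source of CUDA-version trouble
pip install --no-build-isolation flash-attn==2.7.1.post4
```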

Highlighted Details

  • Cross-Embodiment: Trained on diverse data for generalization across different humanoid robots.
  • Fine-tuning: Supports full fine-tuning and parameter-efficient LoRA fine-tuning for customization.
  • Inference: Provides an inference service in server/client modes.
  • Evaluation: Includes scripts for offline policy evaluation and plotting results.
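The server/client inference mode mentioned above can be sketched with the pattern below. This is a hypothetical stand-in using only the Python standard library: the port, the JSON message format, and the zero-action "policy" are all invented for illustration and are not GR00T's actual service API.

```python
import json
import socket
import threading

def serve_once(sock):
    # Toy "policy server": accepts one connection, reads a JSON observation,
    # and replies with a zero action vector of the requested dimension.
    conn, _ = sock.accept()
    with conn:
        req = json.loads(conn.recv(4096).decode())
        reply = {"action": [0.0] * req["action_dim"]}
        conn.sendall(json.dumps(reply).encode())

# Server side: bind to an ephemeral localhost port and serve in a thread.
server = socket.socket()
server.bind(("127.0.0.1", 0))
server.listen(1)
port = server.getsockname()[1]
threading.Thread(target=serve_once, args=(server,), daemon=True).start()

# Client side: send an observation, receive the predicted action.
client = socket.socket()
client.connect(("127.0.0.1", port))
client.sendall(json.dumps({"obs": [0.1, 0.2], "action_dim": 7}).encode())
reply = json.loads(client.recv(4096).decode())
client.close()
```

The real service would replace the toy policy with a model forward pass, but the request/response split is the same: the robot-side client stays lightweight while the GPU-backed server does the heavy inference.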

Maintenance & Community

The project is maintained by NVIDIA. Contribution guidelines are available in CONTRIBUTING.md.

Licensing & Compatibility

Licensed under the Apache-2.0 license, permitting commercial use and integration with closed-source projects.

Limitations & Caveats

The model requires CUDA 12.4 and may hit compatibility issues on other versions, particularly when building the flash-attn module. Fine-tuning performance varies significantly with hardware; H100 or L40 nodes are recommended for best results.

Health Check
Last commit

1 week ago

Responsiveness

1 day

Pull Requests (30d)
12
Issues (30d)
46
Star History
968 stars in the last 90 days

Explore Similar Projects

Starred by Jeff Hammerbacher (Cofounder of Cloudera) and Stas Bekman (Author of Machine Learning Engineering Open Book; Research Engineer at Snowflake).

InternEvo by InternLM

Lightweight training framework for model pre-training
Top 1.0% on sourcepulse · 402 stars
created 1 year ago · updated 1 week ago