openpi-comet by mli0603

Multimodal VLA training and evaluation framework

Created 5 months ago
251 stars

Top 99.8% on SourcePulse

Project Summary

Team Comet's OpenPi Comet repository provides a unified framework for training Vision-Language-Action (VLA) models, built for the 2025 BEHAVIOR Challenge. It covers pre-training, post-training, data generation, and evaluation on the BEHAVIOR-1K dataset. The codebase lets researchers and engineers reproduce a competitive end-to-end VLA training strategy, one that placed 2nd in the challenge and was improved further afterwards.

How It Works

The framework employs a distributed training infrastructure supporting multi-dataset sharding. It offers diverse pre-training setups, incorporating hierarchical instructions (global, subtask, skill) and multimodal observations (RGB, depth, point cloud, etc.). Post-training is achieved via Rejection Sampling Fine-Tuning (RFT), which includes automated dataset construction and simulation rollouts. The approach prioritizes native OpenPi compatibility for seamless integration.
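As a rough illustration of what multi-dataset sharding means in a distributed setup, the sketch below flattens several datasets into one global index and gives each rank a disjoint, strided slice. The function name `shard_multi_dataset` and the strided scheme are hypothetical; they are not taken from openpi-comet, whose actual sharding logic may differ (for example, by weighting datasets or shuffling first).

```python
from typing import List, Tuple


def shard_multi_dataset(
    dataset_sizes: List[int], rank: int, world_size: int
) -> List[Tuple[int, int]]:
    """Return the (dataset_id, sample_index) pairs assigned to one rank.

    Hypothetical scheme: concatenate every dataset into a single global
    index list, then give rank r every world_size-th entry starting at r.
    The shards are disjoint and together cover every sample exactly once.
    """
    if not 0 <= rank < world_size:
        raise ValueError("rank must be in [0, world_size)")
    global_index = [
        (dataset_id, sample_idx)
        for dataset_id, size in enumerate(dataset_sizes)
        for sample_idx in range(size)
    ]
    return global_index[rank::world_size]
```

For two datasets of sizes 3 and 2 split across two ranks, rank 0 receives `[(0, 0), (0, 2), (1, 1)]` and rank 1 receives `[(0, 1), (1, 0)]`. Strided slicing keeps per-rank shard sizes within one sample of each other, at the cost of ignoring dataset boundaries.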

Quick Start & Requirements

  • Installation: Clone repositories (openpi-comet, BEHAVIOR-1K), install uv, sync dependencies (uv sync), install package (uv pip install -e .), and activate environment (source .venv/bin/activate). Install BEHAVIOR dependencies separately.
  • Prerequisites: NVIDIA GPU with more than 8 GB of memory for inference, more than 22.5 GB for LoRA fine-tuning, or more than 70 GB for full fine-tuning. Tested on Ubuntu 22.04.
  • Links:
    • Codebase: https://github.com/mli0603/openpi-comet.git
    • BEHAVIOR-1K: https://github.com/StanfordVL/BEHAVIOR-1K.git
    • Pre-trained weights and RFT dataset (comet-1.5k) released Dec 2025/Jan 2026.
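The GPU memory thresholds listed under Prerequisites can be turned into a quick pre-flight check. This is a minimal sketch: the mode names and the `feasible_modes` helper are hypothetical, and only the numeric thresholds come from the requirements above. Because each requirement is stated as "more than", the comparison is strict.

```python
from typing import Dict, List

# GPU memory thresholds (GB) from the stated prerequisites.
# Mode names are illustrative labels, not flags of the real training script.
REQUIREMENTS_GB: Dict[str, float] = {
    "inference": 8.0,
    "lora_finetune": 22.5,
    "full_finetune": 70.0,
}


def feasible_modes(gpu_memory_gb: float) -> List[str]:
    """List the modes whose requirement is strictly below the available memory."""
    return [mode for mode, need in REQUIREMENTS_GB.items() if gpu_memory_gb > need]
```

A 24 GB card yields `["inference", "lora_finetune"]`, and an 80 GB card yields all three modes. With PyTorch installed, `torch.cuda.get_device_properties(0).total_memory / 1e9` reports the first GPU's memory in GB as an input for this check.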

Highlighted Details

  • Secured 2nd place in the 2025 BEHAVIOR Challenge with a Q-score of 0.2514 on the held-out test set.
  • Achieved a Q-score of 0.345 on the Public Validation set with refined training strategies.
  • Supports distributed training and multi-dataset sharding.
  • Incorporates hierarchical instructions and multimodal observations for pre-training.
  • Features Rejection Sampling Fine-Tuning (RFT) with automated dataset generation.
  • Maintains native compatibility with the official OpenPi framework.
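The core idea behind Rejection Sampling Fine-Tuning (RFT) is to run the current policy several times per task, discard failed rollouts, and keep the successful ones as new fine-tuning data. The sketch below shows only that filtering loop under stated assumptions: `Episode`, `collect_rft_dataset`, and `toy_rollout` are hypothetical names, and `toy_rollout` is a pseudo-random stand-in for a BEHAVIOR simulator rollout, not the repository's actual pipeline.

```python
import random
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Episode:
    task: str
    actions: List[int]
    success: bool


def collect_rft_dataset(
    rollout: Callable[[str, int], Episode],
    tasks: List[str],
    rollouts_per_task: int,
) -> List[Episode]:
    """Rejection sampling: run several seeded rollouts per task, keep successes."""
    kept: List[Episode] = []
    for task in tasks:
        for seed in range(rollouts_per_task):
            episode = rollout(task, seed)
            if episode.success:  # reject failed rollouts
                kept.append(episode)
    return kept


def toy_rollout(task: str, seed: int) -> Episode:
    """Hypothetical stand-in for a simulator rollout.

    Actions and the success flag are pseudo-random but deterministic
    for a given (task, seed) pair.
    """
    rng = random.Random(f"{task}-{seed}")
    actions = [rng.randrange(4) for _ in range(5)]
    return Episode(task=task, actions=actions, success=rng.random() < 0.3)
```

Calling `collect_rft_dataset(toy_rollout, ["pick_cup", "open_door"], 20)` returns only successful episodes. In a full RFT loop, those episodes would be used to fine-tune the policy before the next round of rollouts.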

Maintenance & Community

Developed by Team Comet, the project encourages community feedback and contributions via GitHub issues and discussions. Key contributors are listed in the citation. No explicit roadmap or dedicated community channels (like Discord/Slack) are mentioned.

Licensing & Compatibility

The repository's license is not explicitly stated in the provided README, which may pose a risk for commercial use or integration into closed-source projects. The codebase has been tested on Ubuntu 22.04 and is designed for native compatibility with OpenPi.

Limitations & Caveats

The current training script does not support multi-node distributed training. GPU memory requirements are steep, ranging from more than 8 GB for inference to more than 70 GB for full fine-tuning. The lack of a clearly stated license is a significant caveat for adoption.

Health Check

  • Last commit: 4 months ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 6 stars in the last 30 days
