cliport by cliport

Robotic manipulation via imitation learning using language-conditioned policies

created 3 years ago
504 stars

Top 62.6% on sourcepulse

Project Summary

CLIPort is an end-to-end imitation learning agent for robotic manipulation tasks, designed to learn a single, generalizable policy from limited demonstrations. It combines CLIP's semantic understanding with TransporterNets' spatial precision, enabling robots to perform tasks based on natural language instructions. The target audience includes robotics researchers and engineers seeking to develop more adaptable and language-aware robotic systems.

How It Works

CLIPort integrates CLIP's visual-language understanding with TransporterNets' spatial reasoning. It uses CLIP to interpret natural language commands (the "what") and TransporterNets to predict precise end-effector movements (the "where"). This dual-pathway approach allows for learning generalizable skills from a small number of demonstrations, bridging the gap between high-level semantic goals and low-level robotic actions.
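
As a rough illustration of this two-stream design (a minimal sketch, not the actual cliport code; all module and variable names here are hypothetical), a spatial stream preserves scene geometry while a semantic stream is gated by a language embedding, and the fused features produce a dense per-pixel action heatmap:

    # Illustrative sketch only -- not the actual cliport implementation.
    # A spatial stream ("where") and a language-gated semantic stream ("what")
    # are fused into a dense per-pixel action heatmap, Transporter-style.
    import torch
    import torch.nn as nn

    class TwoStreamPolicy(nn.Module):
        def __init__(self, lang_dim=512, feat_dim=64):
            super().__init__()
            # Spatial stream: fully convolutional, preserves scene geometry.
            self.spatial = nn.Sequential(
                nn.Conv2d(3, feat_dim, 3, padding=1), nn.ReLU(),
                nn.Conv2d(feat_dim, feat_dim, 3, padding=1), nn.ReLU(),
            )
            # Semantic stream: image features modulated by a language embedding
            # (a stand-in for CLIP's text/visual encoders).
            self.semantic = nn.Conv2d(3, feat_dim, 3, padding=1)
            self.lang_gate = nn.Linear(lang_dim, feat_dim)
            self.head = nn.Conv2d(2 * feat_dim, 1, 1)  # dense heatmap head

        def forward(self, rgb, lang_emb):
            spat = self.spatial(rgb)                           # B x F x H x W
            gate = torch.sigmoid(self.lang_gate(lang_emb))     # B x F
            sem = self.semantic(rgb) * gate[:, :, None, None]  # language gating
            logits = self.head(torch.cat([spat, sem], dim=1))  # B x 1 x H x W
            return logits.flatten(1).softmax(dim=1)            # pixel distribution

    policy = TwoStreamPolicy()
    rgb = torch.randn(1, 3, 160, 160)   # top-down RGB observation
    lang = torch.randn(1, 512)          # e.g., a CLIP text embedding
    print(policy(rgb, lang).shape)      # torch.Size([1, 25600])

The softmax over pixels mirrors the Transporter-style formulation: the action (e.g., a pick location) is read off as the argmax of a heatmap, which is what lets the policy stay fully convolutional and spatially precise.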

Quick Start & Requirements

  • Installation: Clone the repository, create a virtual environment (Python 3.8 recommended), install the requirements (pip install -r requirements.txt), and install the package in development mode (python setup.py develop).
  • Prerequisites: an NVIDIA GPU with 8.5–9.5 GB of memory; PyTorch v1.7.1 and torchvision v0.8.2 with CUDA support.
  • Quickstart: Download the pre-trained models (sh scripts/quickstart_download.sh), generate test data (python cliport/demos.py), and evaluate (python cliport/eval.py). The full command sequence is collected below.
  • Resources: The full dataset requires ~1.6 TB of storage. Pre-trained checkpoints are available via Google Drive.
  • Docs: cliport.github.io
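
For convenience, the steps above collected in order. The repository URL is assumed from the project name, and demos.py / eval.py may take additional task and config arguments documented on the project site; the invocations below follow the bullets verbatim.

    # Quickstart, collected from the steps above.
    # Repo URL assumed from the project name; demos.py / eval.py may take
    # additional task/config arguments -- see cliport.github.io.
    git clone https://github.com/cliport/cliport.git
    cd cliport

    python3.8 -m venv .venv          # Python 3.8 recommended
    source .venv/bin/activate
    pip install -r requirements.txt
    python setup.py develop          # install in development mode

    sh scripts/quickstart_download.sh   # fetch pre-trained models
    python cliport/demos.py             # generate test data
    python cliport/eval.py              # evaluate the downloaded checkpoints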

Highlighted Details

  • Learns a single policy for multiple tabletop manipulation tasks.
  • Leverages CLIP for language understanding and TransporterNets for spatial prediction.
  • Supports both single-task and multi-task training from scratch.
  • Includes scripts for dataset generation, training, evaluation, and video recording.

Maintenance & Community

The project was presented at CoRL 2021. Issues can be filed via the GitHub issue tracker.

Licensing & Compatibility

The project incorporates code from Google Ravens (Apache 2.0), OpenAI CLIP (MIT), and Pytorch-UNet (GPL 3.0). The GPL 3.0 license on the UNet component carries copyleft obligations that can restrict linking it into closed-source or commercial products.

Limitations & Caveats

The author describes the code as "tired grad student" quality, and training currently supports only a batch size of 1 due to memory constraints. Rotation augmentation may cause issues for tasks that depend on precise spatial relationships. Multi-task models are not trained on the full seen-object split for certain tasks, which can disadvantage them relative to single-task models. The README suggests averaging evaluation metrics over multiple runs with different seeds; this is not automated (a minimal helper is sketched below).
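
The seed-averaging step takes only a few lines to script. A minimal sketch, where run_eval is a hypothetical stand-in for a function that performs one evaluation run and returns a score:

    # Hypothetical helper (not part of cliport): average an evaluation
    # metric over several seeds, since the README leaves this step manual.
    import statistics

    def average_over_seeds(run_eval, seeds=(0, 1, 2)):
        """run_eval(seed) -> float score for one evaluation run."""
        scores = [run_eval(seed) for seed in seeds]
        return statistics.mean(scores), statistics.stdev(scores)

    # Usage with a stand-in evaluation function:
    mean, std = average_over_seeds(lambda seed: 0.90 + 0.01 * seed)
    print(f"success rate: {mean:.3f} +/- {std:.3f}")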

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 18 stars in the last 90 days
