Human-Robot Collaboration Embodied Agent
This project demonstrates a robotic arm controlled by large language models (LLMs) and multimodal AI, enabling human-robot collaboration through an embodied agent. It targets researchers and developers who want to integrate advanced AI with physical robotics for tasks such as object manipulation and instruction following, and it provides a practical framework for building robotic agents that can understand and act on natural language and visual cues.
How It Works
The system orchestrates a chain of LLMs and multimodal vision-language models (VLMs) to interpret user commands, process visual input from a camera, and generate robotic arm movements. It supports various state-of-the-art models, including Yi-Large, Claude 3 Opus, and Baidu Wenxin for LLM tasks, and GPT-4o, Yi-Vision, and CogVLM2 for multimodal understanding. This modular approach allows for flexibility in model selection and integration, leveraging the strengths of different AI providers.
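To make the flow concrete, the sketch below outlines one way such a command-to-motion pipeline could look: capture a camera frame, ask a vision-language model to turn the user's instruction plus the image into a target pose, then drive the arm. This is a minimal sketch under stated assumptions, not the project's actual code: it assumes an OpenAI-compatible chat API, the pymycobot driver, a fixed gripper orientation, and a model that replies with plain JSON coordinates.

```python
# Minimal sketch of the command -> vision -> motion pipeline.
# Assumptions (not taken from the project): OpenAI-compatible VLM API,
# pymycobot driver, and a JSON reply with a target coordinate in millimetres.
import base64
import json

import cv2
from openai import OpenAI
from pymycobot.mycobot import MyCobot

client = OpenAI()                       # reads OPENAI_API_KEY from the environment
arm = MyCobot("/dev/ttyAMA0", 1000000)  # typical serial port/baud on a Raspberry Pi

def capture_frame_b64() -> str:
    """Grab one frame from the default camera and return it as base64 JPEG."""
    cap = cv2.VideoCapture(0)
    ok, frame = cap.read()
    cap.release()
    if not ok:
        raise RuntimeError("camera read failed")
    _, buf = cv2.imencode(".jpg", frame)
    return base64.b64encode(buf.tobytes()).decode()

def plan_motion(instruction: str, image_b64: str) -> dict:
    """Ask a vision-language model to map instruction + image to a target pose."""
    response = client.chat.completions.create(
        model="gpt-4o",  # any of the supported VLMs could be substituted here
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": f"Instruction: {instruction}. "
                         'Reply only with JSON {"x": mm, "y": mm, "z": mm}.'},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
            ],
        }],
    )
    return json.loads(response.choices[0].message.content)

def execute(instruction: str) -> None:
    """End-to-end: see, plan, move."""
    target = plan_motion(instruction, capture_frame_b64())
    # send_coords expects [x, y, z, rx, ry, rz]; orientation fixed here for brevity
    arm.send_coords([target["x"], target["y"], target["z"], -90, 0, -90], 40, 1)

if __name__ == "__main__":
    execute("Pick up the red block and place it in the box")
```

In the real system the coordinate handling, prompting, and model routing are more involved; the point of the sketch is only the division of labour between the language model, the vision model, and the arm driver.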
Quick Start & Requirements
Running the project requires a Mycobot 280 Pi robotic arm on a Raspberry Pi 4B, API keys for the chosen LLM/VLM providers, and a working camera plus audio input/output on the host system.
Highlighted Details
Maintenance & Community
The project is associated with TommyZihao and acknowledges contributions from various institutions and individuals, including Zero1.ai, Manos, Baidu PaddlePaddle, and Elephant Robotics. Further community engagement details are not explicitly provided in the README.
Licensing & Compatibility
The licensing information is not explicitly stated in the README. Suitability for commercial use or closed-source linking would require clarification from the maintainers.
Limitations & Caveats
The project relies on specific hardware (Mycobot 280 Pi, Raspberry Pi 4B) and requires careful configuration of external API keys and system audio/video devices. The setup process may be complex for users unfamiliar with robotics or AI model integration.
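Because much of the setup friction comes from external credentials and devices, a small pre-flight check can save time. The snippet below is an illustrative sketch only: the environment-variable names and camera index are assumptions, since the project's actual key-management scheme is not described here.

```python
# Illustrative pre-flight check; variable names and device indices are
# assumptions, not the project's actual configuration scheme.
import os

import cv2

REQUIRED_KEYS = ["OPENAI_API_KEY", "YI_API_KEY", "BAIDU_API_KEY"]  # hypothetical names

def preflight() -> None:
    """Fail early if API keys or the camera are missing."""
    missing = [k for k in REQUIRED_KEYS if not os.environ.get(k)]
    if missing:
        raise SystemExit(f"Missing API keys: {', '.join(missing)}")
    cap = cv2.VideoCapture(0)  # default camera index; adjust for your setup
    if not cap.isOpened():
        raise SystemExit("No camera found at index 0")
    cap.release()
    print("API keys and camera look OK")

if __name__ == "__main__":
    preflight()
```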