vlm_arm by TommyZihao

Human-robot collaboration embodied agent

created 1 year ago
892 stars

Top 41.4% on sourcepulse

Project Summary

This project demonstrates a robotic arm controlled by large language models (LLMs) and multimodal AI, enabling human-robot collaboration for embodied intelligence. It targets researchers and developers interested in integrating advanced AI with physical robotics for tasks like object manipulation and instruction following. The primary benefit is a practical framework for creating intelligent robotic agents that can understand and act upon natural language and visual cues.

How It Works

The system orchestrates a chain of LLMs and multimodal vision-language models (VLMs) to interpret user commands, process visual input from a camera, and generate robotic arm movements. It supports various state-of-the-art models, including Yi-Large, Claude 3 Opus, and Baidu Wenxin for LLM tasks, and GPT-4o, Yi-Vision, and CogVLM2 for multimodal understanding. This modular approach allows for flexibility in model selection and integration, leveraging the strengths of different AI providers.
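
To make the data flow concrete, here is a minimal, hedged sketch of the perceive-plan-act pattern described above. It is not the project's actual code: the prompt wording, JSON schema, camera index, and serial port are assumptions, and it uses GPT-4o via the OpenAI client plus the pymycobot library that drives the Mycobot 280 Pi.

```python
# Minimal perceive -> plan -> act sketch (illustrative only; not the
# repository's actual pipeline). Assumes an OpenAI key in OPENAI_API_KEY,
# a camera at index 0, and a Mycobot 280 Pi on its default serial port.
import base64
import json

import cv2
from openai import OpenAI
from pymycobot.mycobot import MyCobot

client = OpenAI()                       # reads OPENAI_API_KEY from the env
arm = MyCobot("/dev/ttyAMA0", 1000000)  # typical port/baud on the 280 Pi

def capture_frame_b64() -> str:
    """Grab one camera frame and return it as a base64-encoded JPEG."""
    cap = cv2.VideoCapture(0)
    ok, frame = cap.read()
    cap.release()
    if not ok:
        raise RuntimeError("camera read failed")
    _, jpeg = cv2.imencode(".jpg", frame)
    return base64.b64encode(jpeg.tobytes()).decode()

def plan(instruction: str) -> dict:
    """Send the instruction plus the current frame to a VLM, get JSON back."""
    frame_b64 = capture_frame_b64()
    resp = client.chat.completions.create(
        model="gpt-4o",
        response_format={"type": "json_object"},
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": f'{instruction} Reply in JSON: {{"x": ..., "y": ...}}'},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{frame_b64}"}},
            ],
        }],
    )
    return json.loads(resp.choices[0].message.content)

target = plan("Locate the red block in the image.")
print("VLM target (pixel coordinates):", target)
# A real pipeline maps pixels to arm coordinates via hand-eye calibration
# (omitted here) before commanding a move, e.g.:
#   arm.send_coords([x_mm, y_mm, 150, 0, 180, 0], 40, 0)
```

The repository's own scripts add speech input/output and vary which model handles which stage; the sketch only illustrates the camera-to-model-to-arm loop.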

Quick Start & Requirements

  • Installation: Requires Python 3.12 and the project's Python dependencies.
  • Prerequisites:
    • Elephant Robotics Mycobot 280 Pi robotic arm.
    • Raspberry Pi 4B with Ubuntu 20.04.
    • Camera, microphone, and speaker setup.
    • API keys for chosen LLMs and VLMs (e.g., GPT-4o, Claude 3 Opus).
  • Setup: Involves configuring API keys, microphone/speaker IDs, and camera/voice functionality; a generic configuration sketch follows this list. Detailed setup guides are available: 复现教程 (reproduction tutorial) and 开机教程 (power-on tutorial).
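
The project keeps keys and device IDs in its own configuration files; as a generic, hedged sketch of the same setup steps (the environment-variable names and the use of the sounddevice library are assumptions, not the project's actual config):

```python
# Illustrative configuration sketch (names and libraries are assumptions,
# not the project's actual config module).
import os

import sounddevice as sd  # pip install sounddevice

# API keys for whichever LLM/VLM providers you enable.
OPENAI_API_KEY = os.environ["OPENAI_API_KEY"]        # e.g. GPT-4o
ANTHROPIC_API_KEY = os.environ["ANTHROPIC_API_KEY"]  # e.g. Claude 3 Opus

# Microphone/speaker indices differ per machine; list the audio devices
# and pick the IDs to plug into the project's audio settings.
print(sd.query_devices())
```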

Highlighted Details

  • Integrates with leading LLMs like GPT-4o and Claude 3 Opus.
  • Supports multimodal understanding via models like GPT-4V and CogVLM2.
  • Demonstrates practical applications of embodied AI in robotics.
  • Provides links to extensive video tutorials and demonstrations on Bilibili.

Maintenance & Community

The project is associated with TommyZihao and acknowledges contributions from various institutions and individuals, including Zero1.ai, Manos, Baidu PaddlePaddle, and Elephant Robotics. Further community engagement details are not explicitly provided in the README.

Licensing & Compatibility

No license is stated in the README; compatibility with commercial use or closed-source linking would need to be clarified with the author.

Limitations & Caveats

The project relies on specific hardware (Mycobot 280 Pi, Raspberry Pi 4B) and requires careful configuration of external API keys and system audio/video devices. The setup process may be complex for users unfamiliar with robotics or AI model integration.

Health Check

  • Last commit: 1 month ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 1
  • Star history: 108 stars in the last 90 days
