llama_ros by mgonzs13

ROS 2 integration for GGUF LLMs and VLMs

Created 3 years ago
252 stars

Top 99.6% on SourcePulse

Project Summary

This repository provides ROS 2 packages for integrating llama.cpp and llava.cpp (GGUF LLMs and VLMs) into robotics applications. It targets ROS 2 developers who want to run optimized language and vision models directly within their robotic systems, with features such as real-time LoRA adaptation and multimodal understanding.

How It Works

The project exposes llama.cpp and llava.cpp functionality through ROS 2 nodes (llama_node, llava_node). Models are loaded in the GGUF format, and the nodes support GBNF grammars for constrained generation and speculative decoding for faster inference. This lets ROS 2 applications use LLM/VLM capabilities, including image and audio processing, from within their existing ROS 2 workflows.
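GBNF grammars constrain generation by restricting which tokens the sampler may pick at each step. The plain-Python sketch below illustrates that masking idea only. It assumes a hypothetical seven-token vocabulary, a "grammar" reduced to a fixed set of allowed strings (standing in for a rule like root ::= "yes" | "no"), and a hand-written scoring function in place of a real model. It does not use any llama_ros or llama.cpp API.

```python
import math

# Hypothetical toy vocabulary; real llama.cpp vocabularies come from the model.
VOCAB = ["ye", "s", "no", "n", "o", "maybe", "<eos>"]
EOS = "<eos>"

# Stand-in for a GBNF rule such as:  root ::= "yes" | "no"
ALLOWED_OUTPUTS = {"yes", "no"}


def allowed_tokens(prefix: str) -> set:
    """Tokens that keep `prefix` on a path to at least one allowed output."""
    legal = set()
    for tok in VOCAB:
        if tok == EOS:
            if prefix in ALLOWED_OUTPUTS:  # may stop only on a complete match
                legal.add(tok)
        elif any(out.startswith(prefix + tok) for out in ALLOWED_OUTPUTS):
            legal.add(tok)
    return legal


def toy_logits(prefix: str) -> dict:
    """Hypothetical model scores; unconstrained, it would always pick "maybe"."""
    scores = {tok: 0.0 for tok in VOCAB}
    scores.update({"maybe": 5.0, "n": 1.0, "no": 0.5, EOS: 0.2})
    return scores


def constrained_greedy(logits_fn, max_steps: int = 10) -> str:
    """Greedy decoding with illegal tokens masked to -inf before the argmax."""
    text = ""
    for _ in range(max_steps):
        legal = allowed_tokens(text)
        masked = {tok: (score if tok in legal else -math.inf)
                  for tok, score in logits_fn(text).items()}
        best = max(masked, key=masked.get)
        if best == EOS or masked[best] == -math.inf:
            break  # grammar satisfied, or no legal continuation exists
        text += best
    return text


print(constrained_greedy(toy_logits))  # prints: no
```

Without the mask the toy model would emit "maybe"; with it, the highest-scoring legal path spells "no". A real GBNF grammar is a full context-free grammar, so the sampler has to track grammar parse state rather than compare against a fixed list of strings as this sketch does.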

Quick Start & Requirements

  • Installation: Requires ROS 2, Python, and optionally the CUDA Toolkit. Installation involves cloning the repository, syncing Python dependencies with uv sync, installing ROS dependencies with rosdep, and building with colcon build. For CUDA support, build with colcon build --cmake-args -DGGML_CUDA=ON. Docker images are also available for several ROS 2 distros.
  • Prerequisites: CUDA Toolkit (for GPU acceleration), ROS 2 (Humble, Iron, Jazzy, Kilted, Rolling).

Highlighted Details

  • Multimodal Support: Integrates llava.cpp for Visual Language Models (VLMs), enabling image and audio input processing.
  • Speculative Decoding: Accelerates text generation by letting a smaller draft model propose tokens, which the main model then verifies in parallel.
  • LoRA Adapters: Supports dynamic loading and scaling of LoRA adapters, so model behavior can be adapted at runtime without retraining.
  • LangChain Integration: Offers ROS 2 clients and LangChain integrations for LLM/VLM functionalities, RAG, embeddings, and reranking.
  • ROS 2 CLI: Includes ros2 llama launch and ros2 llama prompt commands for streamlined interaction.
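The speculative decoding idea listed above can be made concrete with a toy, greedy-only version of draft-and-verify. Assumptions: both "models" are hypothetical deterministic functions over a fixed sentence, the draft length k=4 is arbitrary, and the batched verification pass a real engine performs is only counted here, not implemented.

```python
EOS = "<eos>"

# Hypothetical stand-ins for two GGUF models: deterministic greedy "next token"
# functions over a fixed sentence. The draft model is wrong about one word.
TARGET_TEXT = "the robot picks up the red cube".split()


def target_next(prefix: list) -> str:
    """Greedy next token of the (hypothetical) large target model."""
    i = len(prefix)
    return TARGET_TEXT[i] if i < len(TARGET_TEXT) else EOS


def draft_next(prefix: list) -> str:
    """Greedy next token of the (hypothetical) small draft model."""
    tok = target_next(prefix)
    return "blue" if tok == "red" else tok  # the draft's one systematic mistake


def speculative_decode(k: int = 4, max_tokens: int = 32):
    """Greedy draft-and-verify; returns (tokens, number of target passes)."""
    out, target_passes = [], 0
    while len(out) < max_tokens and (not out or out[-1] != EOS):
        # 1. The cheap draft model proposes k tokens autoregressively.
        draft = []
        for _ in range(k):
            draft.append(draft_next(out + draft))
        # 2. The target model scores all k positions; a real engine does this
        #    in one batched forward pass, so we count it as a single pass.
        target_passes += 1
        accepted = []
        for tok in draft:
            expected = target_next(out + accepted)
            accepted.append(expected)  # always keep the target's own choice
            if tok != expected or expected == EOS:
                break  # first mismatch (or end of text) discards the rest
        else:
            # Every draft token matched: the same pass yields one bonus token.
            accepted.append(target_next(out + accepted))
        out.extend(accepted)
    tokens = out[:-1] if out and out[-1] == EOS else out
    return tokens, target_passes


tokens, passes = speculative_decode()
print(" ".join(tokens), passes)  # prints: the robot picks up the red cube 3
```

In this run, 8 tokens (7 words plus end-of-sequence) cost 3 target passes instead of 8, and the output is exactly what greedy decoding with the target model alone would produce. Real engines typically also have to handle non-greedy sampling and discard cached state for rejected draft tokens, which this sketch leaves out.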

Maintenance & Community

The project shows signs of active maintenance with CI/CD pipelines across multiple ROS 2 distributions and recent commits. It lists multiple contributors, indicating a collaborative effort. No specific community channels (like Discord/Slack) are detailed in the README.

Licensing & Compatibility

The project is released under the MIT License, which is permissive for commercial use and integration into closed-source projects.

Limitations & Caveats

GPU acceleration requires a manual CUDA Toolkit installation and building with the -DGGML_CUDA=ON flag. Speculative decoding is not compatible with embedding or reranking models and requires context.n_parallel: 1. Running large language models typically demands substantial compute resources (CPU, RAM, VRAM).
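The speculative decoding restrictions above can be checked before a node is launched. The sketch below is hypothetical and not part of llama_ros: apart from n_parallel (mirroring context.n_parallel), the LlamaConfig fields, the check_speculative function, and the draft.gguf path are invented for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LlamaConfig:
    """Hypothetical, flattened view of a llama_ros model config.

    n_parallel mirrors context.n_parallel from the README; every other field
    name here is illustrative and NOT a real llama_ros parameter name.
    """
    n_parallel: int = 1
    embedding: bool = False
    reranking: bool = False
    draft_model: Optional[str] = None  # set => speculative decoding requested


def check_speculative(cfg: LlamaConfig) -> List[str]:
    """Return a list of problems; an empty list means the config looks valid."""
    if cfg.draft_model is None:
        return []  # speculative decoding not requested, nothing to check
    problems = []
    if cfg.n_parallel != 1:
        problems.append(
            f"speculative decoding needs n_parallel == 1 (got {cfg.n_parallel})")
    if cfg.embedding or cfg.reranking:
        problems.append(
            "speculative decoding is not supported for embedding/reranking models")
    return problems


print(check_speculative(LlamaConfig(n_parallel=4, draft_model="draft.gguf")))
```

Returning a list of messages rather than raising on the first error lets a launch script report every conflicting setting at once.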

Health Check

  • Last Commit: 4 days ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 2
  • Issues (30d): 1
  • Star History: 4 stars in the last 30 days
