ToolkenGPT by Ber666

Research code for augmenting frozen LLMs with tools via embeddings

created 2 years ago
264 stars

Top 97.5% on sourcepulse

Project Summary

ToolkenGPT augments frozen large language models (LLMs) with a large set of tools through lightweight tool embeddings, enabling them to perform complex tasks that require external functionality. Presented as a NeurIPS 2023 oral and winner of the Best Paper Award at SoCalNLP 2023, it targets researchers and developers aiming to extend LLM capabilities beyond the model's inherent knowledge.

How It Works

ToolkenGPT introduces a "tool embedding" mechanism that lets a pre-trained, frozen LLM interact with external tools. Each tool is represented as a token embedding (a "toolken") appended to the model's output vocabulary, so the LLM can predict a tool call just like an ordinary word token. When a toolken is predicted, decoding switches to an argument-generation mode, the tool is executed, and its result is inserted back into the context before generation continues. Because only the small toolken embedding matrix is trained, the approach avoids costly fine-tuning of the entire LLM.
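The prediction side of this mechanism can be sketched in a few lines: word logits and toolken logits are scored jointly, and any predicted id beyond the word vocabulary is treated as a tool call. This is a toy illustration with made-up dimensions and tool names, not the repository's actual code:

```python
import numpy as np

rng = np.random.default_rng(0)

VOCAB_SIZE, N_TOOLS, DIM = 100, 3, 8
TOOL_NAMES = ["add", "multiply", "sqrt"]  # hypothetical toolkens

# Frozen LM head rows and the small toolken matrix (unit-normalized here
# only so the toy example is deterministic).
W_vocab = rng.normal(size=(VOCAB_SIZE, DIM))
W_vocab /= np.linalg.norm(W_vocab, axis=1, keepdims=True)
W_tool = rng.normal(size=(N_TOOLS, DIM))
W_tool /= np.linalg.norm(W_tool, axis=1, keepdims=True)

def next_token(hidden):
    """Score word tokens and toolkens jointly; ids >= VOCAB_SIZE are tools."""
    logits = np.concatenate([W_vocab @ hidden, W_tool @ hidden])
    return int(np.argmax(logits))

# A hidden state that aligns with the "multiply" toolken embedding.
hidden = W_tool[1]
tok = next_token(hidden)
if tok >= VOCAB_SIZE:
    print("toolken predicted:", TOOL_NAMES[tok - VOCAB_SIZE])
```

In the real system, predicting a toolken would pause normal decoding, generate the tool's arguments, run the tool, and splice the result back into the text.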

Quick Start & Requirements

  • Installation: Requires acquiring LLaMA checkpoints from MetaAI and installing dependencies.
  • Prerequisites: LLaMA-13B/33B checkpoints, Python, PyTorch, and CUDA-capable GPUs (>= 24GB memory each, e.g., 2-4 GPUs for LLaMA-33B). Specific datasets (GSM8K-XL, FuncQA, VirtualHome, KAMEL) need to be downloaded separately.
  • Setup: Requires obtaining LLaMA weights and preparing datasets, which can be time-consuming.
  • Resources: Training and inference demand significant GPU resources.
  • Documentation: Links to LLaMA official repo and VirtualHome repo are provided for data acquisition and setup.

Highlighted Details

  • NeurIPS 2023 (oral) and Best Paper Award at SoCalNLP 2023.
  • Augments frozen LLMs, reducing computational cost compared to full fine-tuning.
  • Supports multiple tool-use scenarios including GSM8K-XL, FuncQA (1-hop and multi-hop), VirtualHome, and KAMEL.
  • Provides specific training and inference commands for various datasets and configurations.
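The cost saving from keeping the LLM frozen can be illustrated with a single training step in which gradients flow only into the toolken matrix. This is a minimal numpy sketch under that assumption (toy dimensions, not the repository's training code):

```python
import numpy as np

rng = np.random.default_rng(1)
VOCAB_SIZE, N_TOOLS, DIM, LR = 50, 2, 4, 0.1

W_vocab = rng.normal(size=(VOCAB_SIZE, DIM))  # frozen LM head, never updated
W_tool = rng.normal(size=(N_TOOLS, DIM))      # trainable toolken embeddings

def step(hidden, target_id):
    """One cross-entropy SGD step; gradients flow only into W_tool."""
    global W_tool
    logits = np.concatenate([W_vocab @ hidden, W_tool @ hidden])
    p = np.exp(logits - logits.max())
    p /= p.sum()
    p[target_id] -= 1.0                       # dL/dlogits for cross-entropy
    W_tool -= LR * np.outer(p[VOCAB_SIZE:], hidden)
    return float(-np.log(p[target_id] + 1.0))  # loss before the update

hidden = rng.normal(size=DIM)
frozen_before = W_vocab.copy()
# Supervise the first toolken (id VOCAB_SIZE) on a fixed hidden state.
losses = [step(hidden, VOCAB_SIZE) for _ in range(20)]
```

After these steps the loss on the supervised toolken drops while `W_vocab` is bit-for-bit unchanged, which is the point: the trainable state is a handful of embedding rows rather than billions of LLM parameters.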

Maintenance & Community

The project is associated with the authors of the NeurIPS 2023 paper. No specific community channels (Discord, Slack) or roadmap are mentioned in the README.

Licensing & Compatibility

The README does not explicitly state the license for the ToolkenGPT code. However, it relies on LLaMA checkpoints, which have their own usage terms. Compatibility with commercial or closed-source applications would depend on the LLaMA license and the ToolkenGPT license once clarified.

Limitations & Caveats

The project requires access to LLaMA model checkpoints, which are not provided directly and have specific distribution terms. The setup process involves downloading large datasets and potentially fixing bugs in external repositories (e.g., VirtualHome), indicating a non-trivial setup effort.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 3 stars in the last 90 days
