AgentTuning by THUDM

Agent tuning for generalized LLM agent abilities

Created 2 years ago

1,474 stars

Top 27.6% on SourcePulse

View on GitHub

4 Experts Love This Project

Johannes Hagemann

Cofounder of Prime Intellect

Vincent Weisser

Cofounder of Prime Intellect

Elvis Saravia

Founder of DAIR.AI

Lianmin Zheng

Coauthor of SGLang, vLLM

Project Summary

AgentTuning is a framework for instruction-tuning Large Language Models (LLMs) to enhance their generalized agent abilities. It targets researchers and developers aiming to build more capable AI agents that can perform diverse, real-world tasks. The project provides the AgentInstruct dataset and AgentLM models, demonstrating improved performance on unseen agent tasks while retaining general language proficiency.

How It Works

AgentTuning utilizes the AgentInstruct dataset, a collection of 1,866 high-quality interaction trajectories across six real-world scenarios. These trajectories are meticulously filtered for precision and include detailed thought explanations (Chain-of-Thought) to guide agent decision-making. The AgentLM models are then trained on this dataset, mixed with ShareGPT data, and follow the Llama-2-chat conversation format. This approach aims to imbue LLMs with robust agentic capabilities through targeted, high-quality interaction data.

Quick Start & Requirements

Install/Run: Use Docker Compose for inference: cd docker && docker compose -f agentlm-70b.yml up.
Prerequisites: Requires Docker and a GPU for running the AgentLM-70b instance. Evaluation scripts may require additional setup like FastChat.
Resources: Running AgentLM-70b requires significant GPU resources.
Links: Project Page, AgentLM-70B Docker, AgentInstruct Dataset

Highlighted Details

Open-sourced AgentInstruct dataset (1,866 interactions, 6 tasks, CoT, rigorously filtered).
AgentLM models available in 7B, 13B, and 70B parameter sizes.
Demonstrates robust generalization on unseen agent tasks.
Evaluation scripts provided for held-in and held-out tasks, including benchmarks like MMLU, GSM8k, and MT-Bench.

Maintenance & Community

Developed by THUDM.
Paper available on arXiv: 2310.12823.

Licensing & Compatibility

The specific license for the dataset and models is not explicitly stated in the README, but models are hosted on Hugging Face, typically implying Apache 2.0 or similar for the code, while model weights might have specific terms. Compatibility for commercial use should be verified.

Limitations & Caveats