AgentTuning  by THUDM

Agent tuning for generalized LLM agent abilities

created 1 year ago
1,454 stars

Top 28.7% on sourcepulse

GitHubView on GitHub
1 Expert Loves This Project
Project Summary

AgentTuning is a framework for instruction-tuning Large Language Models (LLMs) to enhance their generalized agent abilities. It targets researchers and developers aiming to build more capable AI agents that can perform diverse, real-world tasks. The project provides the AgentInstruct dataset and AgentLM models, demonstrating improved performance on unseen agent tasks while retaining general language proficiency.

How It Works

AgentTuning utilizes the AgentInstruct dataset, a collection of 1,866 high-quality interaction trajectories across six real-world scenarios. These trajectories are meticulously filtered for precision and include detailed thought explanations (Chain-of-Thought) to guide agent decision-making. The AgentLM models are then trained on this dataset, mixed with ShareGPT data, and follow the Llama-2-chat conversation format. This approach aims to imbue LLMs with robust agentic capabilities through targeted, high-quality interaction data.

Quick Start & Requirements

  • Install/Run: Use Docker Compose for inference: cd docker && docker compose -f agentlm-70b.yml up.
  • Prerequisites: Requires Docker and a GPU for running the AgentLM-70b instance. Evaluation scripts may require additional setup like FastChat.
  • Resources: Running AgentLM-70b requires significant GPU resources.
  • Links: Project Page, AgentLM-70B Docker, AgentInstruct Dataset

Highlighted Details

  • Open-sourced AgentInstruct dataset (1,866 interactions, 6 tasks, CoT, rigorously filtered).
  • AgentLM models available in 7B, 13B, and 70B parameter sizes.
  • Demonstrates robust generalization on unseen agent tasks.
  • Evaluation scripts provided for held-in and held-out tasks, including benchmarks like MMLU, GSM8k, and MT-Bench.

Maintenance & Community

  • Developed by THUDM.
  • Paper available on arXiv: 2310.12823.

Licensing & Compatibility

  • The specific license for the dataset and models is not explicitly stated in the README, but models are hosted on Hugging Face, typically implying Apache 2.0 or similar for the code, while model weights might have specific terms. Compatibility for commercial use should be verified.

Limitations & Caveats

  • Held-in task evaluation results might not be fully reproducible due to the active development of the AgentBench framework.
Health Check
Last commit

1 year ago

Responsiveness

1 day

Pull Requests (30d)
0
Issues (30d)
0
Star History
27 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen Chip Huyen(Author of AI Engineering, Designing Machine Learning Systems) and Robert Stojnic Robert Stojnic(Creator of Papers with Code).

Agent-S by simular-ai

1.2%
6k
Agentic framework for autonomous computer interaction
created 9 months ago
updated 1 day ago
Feedback? Help us improve.