Agent framework for LLM-based agent development and evaluation
AgentGym provides a unified framework for evaluating and developing generalist Large Language Model (LLM)-based agents across diverse interactive environments. It aims to facilitate research into LLM agents capable of self-evolution and broad task adaptation, offering a platform, a curated trajectory dataset (AgentTraj-L), and a benchmark suite (AgentEval).
How It Works
AgentGym decouples agents from its 14 interactive environment types (including web navigation, text games, and tool use) by exposing each environment as an encapsulated HTTP service. These services provide standardized APIs for environment creation, observation, querying available actions, stepping, and resetting. An AgentController component orchestrates agent interaction with these services for evaluation, data collection, and training. The framework uses the ReAct format for uniform interaction and supports real-time feedback and concurrent requests.
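The environment-service API described above can be sketched in Python. The stub below runs in-process rather than over HTTP, and the method names mirroring the endpoints (create, observation, available_actions, step, reset) are illustrative assumptions, not the real agentenv server interface.

```python
# Illustrative stand-in for an AgentGym-style environment service.
# Endpoint names and payload shapes are assumptions based on the
# description above, not the actual agentenv HTTP API.

class EnvServiceStub:
    """Mimics an encapsulated environment HTTP service in-process."""

    def __init__(self):
        self._envs = {}
        self._next_id = 0

    def create(self) -> int:
        """POST /create -> new environment id."""
        env_id = self._next_id
        self._next_id += 1
        self._envs[env_id] = {"steps": 0, "done": False}
        return env_id

    def observation(self, env_id: int) -> str:
        """GET /observation -> current textual observation."""
        return f"step {self._envs[env_id]['steps']}: a door is to the north"

    def available_actions(self, env_id: int) -> list:
        """GET /available_actions -> legal actions in this state."""
        return ["go north", "look"]

    def step(self, env_id: int, action: str):
        """POST /step -> (observation, reward, done)."""
        env = self._envs[env_id]
        env["steps"] += 1
        env["done"] = env["steps"] >= 3  # toy termination condition
        reward = 1.0 if env["done"] else 0.0
        return self.observation(env_id), reward, env["done"]

    def reset(self, env_id: int) -> str:
        """POST /reset -> initial observation."""
        self._envs[env_id] = {"steps": 0, "done": False}
        return self.observation(env_id)


def run_episode(service: EnvServiceStub, max_turns: int = 10) -> float:
    """Controller-style loop: observe, choose an action, step until done."""
    env_id = service.create()
    service.reset(env_id)
    total_reward = 0.0
    for _ in range(max_turns):
        action = service.available_actions(env_id)[0]  # trivial fixed policy
        _, reward, done = service.step(env_id, action)
        total_reward += reward
        if done:
            break
    return total_reward
```

In the real framework an AgentController would issue these calls over HTTP to each environment server, which is what allows many environments (and concurrent episodes) to sit behind one uniform client loop.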
Quick Start & Requirements
Install the agentenv package: pip install agentenv. Each environment's server lives in its own agentenv-* directory.
Highlighted Details
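The ReAct format the framework uses for uniform interaction interleaves a reasoning step with an action in each model turn. A minimal parser for such a turn is sketched below; the literal "Thought:" / "Action:" tags follow the ReAct convention and are an assumption about the wire format, not confirmed AgentGym behavior.

```python
import re

def parse_react_turn(completion: str):
    """Split one ReAct-style model completion into (thought, action).

    Assumes the conventional "Thought: ... Action: ..." layout; the
    exact tags used by AgentGym may differ.
    """
    match = re.search(
        r"Thought:\s*(?P<thought>.*?)\s*Action:\s*(?P<action>.*)",
        completion,
        flags=re.DOTALL,
    )
    if match is None:
        raise ValueError("completion is not in Thought/Action form")
    return match.group("thought").strip(), match.group("action").strip()
```

For example, parse_react_turn("Thought: The door is north.\nAction: go north") yields the thought and the action as separate strings, so the action can be forwarded to an environment server while the thought is kept in the trajectory.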
Maintenance & Community
The project is actively developed, with recent releases of the paper, model, and datasets in June 2024. Contributions for new environments are welcomed. Contact: zhxi22@m.fudan.edu.cn.
Licensing & Compatibility
The repository appears to be under a permissive license, but specific licensing for the datasets (AgentTraj-L, AgentEval) and the model (AgentEvol-7B) should be verified on their respective Hugging Face pages. Compatibility for commercial use is likely, but requires confirmation of underlying licenses.
Limitations & Caveats
The framework is newly released, and while it includes 14 environments, the breadth and depth of support for each may vary. The "AgentEvol" method is presented as a novel approach, and its performance and generalizability beyond the reported experiments would require further validation.