AgentSims by py499372727

Open-source sandbox for LLM evaluation via task-based agent simulations

Created 2 years ago
942 stars

Top 38.7% on SourcePulse

Project Summary

AgentSims provides an open-source sandbox for evaluating Large Language Models (LLMs) through task-based simulations. It addresses limitations of existing evaluation methods by offering customizable task building, robust benchmarks, and objective metrics. This makes it suitable for researchers across disciplines who want to test specific LLM capabilities.

How It Works

AgentSims places LLM agents in a simulated environment where they perform tasks. The authors argue that this allows more comprehensive and objective evaluation than traditional benchmarks. The system is highly customizable: users can build their own evaluation tasks through an interactive GUI, or write code for new support mechanisms such as memory and planning systems.
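To make the idea of a task-based simulation with pluggable support mechanisms concrete, here is a minimal sketch of one agent loop in which memory and planning are swappable components. All class and method names (`Memory`, `Planner`, `ListMemory`, `PromptPlanner`, `Agent.step`) are hypothetical illustrations, not AgentSims' actual API, and the LLM call is replaced by a stub function.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Protocol


class Memory(Protocol):
    """Hypothetical interface for a pluggable memory system."""

    def store(self, event: str) -> None: ...
    def recall(self, k: int) -> List[str]: ...


class Planner(Protocol):
    """Hypothetical interface for a pluggable planning system."""

    def plan(self, goal: str, context: List[str]) -> str: ...


@dataclass
class ListMemory:
    """Simplest possible memory: keep every event, recall the k most recent."""

    events: List[str] = field(default_factory=list)

    def store(self, event: str) -> None:
        self.events.append(event)

    def recall(self, k: int) -> List[str]:
        return self.events[-k:] if k > 0 else []


@dataclass
class PromptPlanner:
    """Builds a prompt from the goal plus recalled context and asks an LLM."""

    llm: Callable[[str], str]

    def plan(self, goal: str, context: List[str]) -> str:
        prompt = f"Goal: {goal}\nRecent events: {'; '.join(context)}\nNext action:"
        return self.llm(prompt)


@dataclass
class Agent:
    name: str
    goal: str
    memory: Memory
    planner: Planner

    def step(self, observation: str) -> str:
        # Record the observation, plan from recent memory, then record the action.
        self.memory.store(observation)
        action = self.planner.plan(self.goal, self.memory.recall(3))
        self.memory.store(f"{self.name} did: {action}")
        return action


def fake_llm(prompt: str) -> str:
    """Stand-in for a real LLM call; returns a canned action."""
    return "walk to cafe" if "cafe" in prompt else "look around"


if __name__ == "__main__":
    agent = Agent("Alice", "buy coffee at the cafe", ListMemory(), PromptPlanner(fake_llm))
    for obs in ["morning at home", "door is open"]:
        print(agent.step(obs))
```

Because the agent depends only on the two small interfaces, a different memory (for example, retrieval by relevance instead of recency) or planner can be swapped in without touching the loop.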

Quick Start & Requirements

  • Install: pip install -r requirements.txt, or install the dependencies directly with pip install tornado mysql-connector-python websockets openai_async
  • Prerequisites: Python 3.9.x, MySQL 8.0.31.
  • Setup: Create config/api_key.json containing your LLM API keys, create the snapshot and logs directories, and initialize the MySQL database.
  • Docs: Wiki
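The file-system part of the setup can be scripted. The sketch below uses only the standard library to create the snapshot and logs directories and write config/api_key.json. The function name `prepare_workspace` and the JSON key names (`gpt-4`, `gpt-3.5`) are assumptions for illustration only; check the project's Wiki for the schema AgentSims actually reads. MySQL initialization is not covered.

```python
import json
import tempfile
from pathlib import Path


def prepare_workspace(root: Path, api_key: str) -> Path:
    """Create the directories and key file described in the setup step.

    Hypothetical helper: the key names written below are placeholders,
    not the confirmed AgentSims schema.
    """
    # Directories the setup step asks for; exist_ok makes reruns safe.
    for name in ("snapshot", "logs"):
        (root / name).mkdir(parents=True, exist_ok=True)

    config_dir = root / "config"
    config_dir.mkdir(parents=True, exist_ok=True)
    key_file = config_dir / "api_key.json"
    # Hypothetical schema: one entry per model name.
    key_file.write_text(json.dumps({"gpt-4": api_key, "gpt-3.5": api_key}, indent=2))
    return key_file


if __name__ == "__main__":
    # Demo in a throwaway directory so no real config is overwritten.
    with tempfile.TemporaryDirectory() as tmp:
        print(prepare_workspace(Path(tmp), "sk-placeholder").read_text())
```

To use it for real, call `prepare_workspace(Path.cwd(), your_key)` from the repository root after adjusting the key names.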

Highlighted Details

  • Task-based evaluation framework for LLMs.
  • Interactive GUI for task building and agent/building creation.
  • Supports custom memory and planning systems for agents.
  • Evaluation via customizable QA forms during simulation.
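Evaluation through QA forms amounts to putting questions to an agent during the simulation and scoring its answers. A minimal sketch follows, assuming exact-match scoring after light text normalization; the `QAItem` structure, `normalize`, and `score_qa_form` are hypothetical and do not reflect AgentSims' actual form format or metrics.

```python
import string
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class QAItem:
    """One hypothetical form entry: a question and its reference answer."""

    question: str
    expected: str


def normalize(text: str) -> str:
    """Lowercase, strip ASCII punctuation, and collapse whitespace."""
    table = str.maketrans("", "", string.punctuation)
    return " ".join(text.lower().translate(table).split())


def score_qa_form(form: List[QAItem], ask: Callable[[str], str]) -> float:
    """Ask every question and return the fraction answered correctly."""
    if not form:
        return 0.0
    correct = sum(normalize(ask(item.question)) == normalize(item.expected) for item in form)
    return correct / len(form)


if __name__ == "__main__":
    form = [
        QAItem("Where did Alice go this morning?", "The cafe"),
        QAItem("Who did Alice meet?", "Bob"),
    ]
    # Stand-in for querying a live agent: canned answers keyed by question.
    canned = {"Where did Alice go this morning?": "the cafe.", "Who did Alice meet?": "Carol"}
    print(score_qa_form(form, lambda q: canned.get(q, "")))  # prints 0.5
```

Exact match is the simplest choice; a real evaluation might instead use fuzzy matching or an LLM judge, which would slot in by replacing the comparison inside `score_qa_form`.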

Maintenance & Community

  • Developed by PTA Studio, focused on open-source NLP architecture and AI games.
  • Contact: zhaohaoran@buaa.edu.cn
  • Citation: @misc{lin2023agentsims, title={AgentSims: An Open-Source Sandbox for Large Language Model Evaluation}, author={Jiaju Lin and Haoran Zhao and Aochi Zhang and Yiting Wu and Huqiuyue Ping and Qin Chen}, year={2023}, eprint={2308.04026}, archivePrefix={arXiv}, primaryClass={cs.AI}}

Licensing & Compatibility

  • License not explicitly stated in the README.
  • Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The project recommends deploying on macOS or Linux for stability. The README does not specify a license, which may affect commercial or closed-source use. Initial setup involves several manual steps, including database configuration and API key management.

Health Check

  • Last Commit: 2 years ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 9 stars in the last 30 days

Explore Similar Projects

Starred by Elie Bursztein (Cybersecurity Lead at Google DeepMind), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 7 more.

SuperAGI by TransformerOptimus

Top 0.1% · 17k stars
Open-source framework for autonomous AI agent development
Created 2 years ago
Updated 1 year ago
Starred by Wes McKinney (Author of Pandas), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 22 more.

autogen by microsoft

Top 0.5% · 57k stars
Agentic framework for multi-agent AI applications
Created 2 years ago
Updated 5 days ago
Starred by Lilian Weng (Cofounder of Thinking Machines Lab), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 59 more.

AutoGPT by Significant-Gravitas

Top 0.1% · 183k stars
AI agent platform for building, deploying, and running autonomous workflows
Created 3 years ago
Updated 23 hours ago