AgentSims  by py499372727

Open-source sandbox for LLM evaluation via task-based agent simulations

Created 2 years ago
899 stars

Top 40.4% on SourcePulse

Project Summary

AgentSims provides an open-source sandbox for evaluating Large Language Models (LLMs) through task-based simulations. It addresses limitations in existing evaluation methods by offering customizable task building, robust benchmarks, and objective metrics, making it suitable for researchers across disciplines seeking to test specific LLM capacities.

How It Works

AgentSims runs LLM agents through tasks inside a simulated environment. The project presents this as a more comprehensive and objective form of evaluation than traditional benchmarks. The system is highly customizable: users can build their own evaluation tasks through an interactive GUI, or write code to add new support mechanisms such as memory and planning systems.
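To make the idea of swappable memory and planning systems concrete, here is a minimal sketch. It is not the AgentSims API: every name in it (Memory, Planner, ListMemory, LLMPlanner, Agent, fake_llm) is hypothetical, and the LLM call is replaced by a deterministic stub so the example runs without an API key.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Protocol


class Memory(Protocol):
    """Hypothetical interface: anything that can store and recall observations."""
    def store(self, observation: str) -> None: ...
    def recall(self, query: str) -> List[str]: ...


class Planner(Protocol):
    """Hypothetical interface: anything that picks the next action for a task."""
    def next_action(self, task: str, context: List[str]) -> str: ...


@dataclass
class ListMemory:
    """Keeps every observation; recalls those sharing at least one word with the query."""
    items: List[str] = field(default_factory=list)

    def store(self, observation: str) -> None:
        self.items.append(observation)

    def recall(self, query: str) -> List[str]:
        words = set(query.lower().split())
        return [o for o in self.items if words & set(o.lower().split())]


@dataclass
class LLMPlanner:
    """Delegates planning to any text-in, text-out callable (a stand-in for an LLM)."""
    llm: Callable[[str], str]

    def next_action(self, task: str, context: List[str]) -> str:
        prompt = f"Task: {task}\nRelevant memory: {context}\nNext action:"
        return self.llm(prompt)


@dataclass
class Agent:
    name: str
    memory: Memory
    planner: Planner

    def step(self, task: str, observation: str) -> str:
        # Store the new observation, recall what is relevant, then plan.
        self.memory.store(observation)
        context = self.memory.recall(task)
        return self.planner.next_action(task, context)


def fake_llm(prompt: str) -> str:
    # Deterministic stub used only for illustration.
    return "ask the mayor" if "mayor" in prompt else "explore"


agent = Agent("alice", ListMemory(), LLMPlanner(fake_llm))
first_action = agent.step("find the mayor", "the mayor works at the town hall")
```

Because Agent depends only on the two small interfaces, a different memory or planner can be dropped in without touching the simulation loop, which is the kind of customization the summary describes.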

Quick Start & Requirements

  • Install: pip install tornado mysql-connector-python websockets openai_async, or alternatively pip install -r requirements.txt
  • Prerequisites: Python 3.9.x, MySQL 8.0.31.
  • Setup: Create config/api_key.json containing your LLM API keys, create the snapshot/logs directories, and initialize the MySQL database.
  • Docs: Wiki
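The manual setup steps above could be scripted roughly as follows. This is a sketch under stated assumptions: it reads "snapshot/logs" as two top-level folders, and the key-file contents passed in are a placeholder, since the schema of config/api_key.json is not given here (check the project wiki for the real format). The helper names prepare_workspace and python_version_ok are hypothetical, and MySQL initialization is not covered.

```python
import json
import sys
from pathlib import Path


def prepare_workspace(root: Path, api_keys: dict) -> Path:
    """Create the folders and key file named in the setup step, under `root`."""
    # Assumption: "snapshot/logs" means two separate top-level folders.
    for folder in ("snapshot", "logs"):
        (root / folder).mkdir(parents=True, exist_ok=True)

    config_dir = root / "config"
    config_dir.mkdir(parents=True, exist_ok=True)

    key_file = config_dir / "api_key.json"
    if not key_file.exists():  # never overwrite a file that may hold real keys
        # Assumption: the JSON layout is a placeholder, not the project's schema.
        key_file.write_text(json.dumps(api_keys, indent=2), encoding="utf-8")
    return key_file


def python_version_ok(version=sys.version_info) -> bool:
    """Return True when the interpreter is a 3.9.x release, as the prerequisites ask."""
    return tuple(version[:2]) == (3, 9)
```

A call such as prepare_workspace(Path("."), {"example-model": "YOUR_KEY_HERE"}) would be run once from the repository root; the existence check means re-running it leaves an already filled-in key file untouched.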

Highlighted Details

  • Task-based evaluation framework for LLMs.
  • Interactive GUI for building tasks and creating agents and buildings.
  • Supports custom memory and planning systems for agents.
  • Evaluation via customizable QA forms during simulation.
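As an illustration of QA-form evaluation, the sketch below scores an agent's answers against a small form. The QAItem and score_form names, the substring-matching rule, and the sample questions are all assumptions made for this example; the actual form format and scoring used by AgentSims are not described here.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class QAItem:
    question: str
    expected: str  # reference answer, matched case-insensitively as a substring


def score_form(answer_fn: Callable[[str], str], form: List[QAItem]) -> float:
    """Return the fraction of questions whose answer contains the expected text."""
    if not form:
        return 0.0
    hits = sum(
        1
        for item in form
        if item.expected.lower() in answer_fn(item.question).lower()
    )
    return hits / len(form)


# Hypothetical form and a stub agent standing in for an LLM-backed one.
form = [
    QAItem("Where does the mayor work?", "town hall"),
    QAItem("What time does the shop open?", "9am"),
]


def stub_agent(question: str) -> str:
    return "At the Town Hall." if "mayor" in question else "I do not know."


accuracy = score_form(stub_agent, form)
```

Substring matching is the simplest objective metric to show; a real form could swap in exact match, multiple choice, or a rubric without changing the surrounding loop.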

Maintenance & Community

  • Developed by PTA Studio, which focuses on open-source NLP architecture and AI games.
  • Contact: zhaohaoran@buaa.edu.cn
  • Citation:
    @misc{lin2023agentsims,
      title={AgentSims: An Open-Source Sandbox for Large Language Model Evaluation},
      author={Jiaju Lin and Haoran Zhao and Aochi Zhang and Yiting Wu and Huqiuyue Ping and Qin Chen},
      year={2023},
      eprint={2308.04026},
      archivePrefix={arXiv},
      primaryClass={cs.AI}
    }

Licensing & Compatibility

  • License not explicitly stated in the README.
  • Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The project recommends deploying on macOS or Linux for stability. Because the README does not state a license, the terms for commercial or closed-source use are unclear. Initial setup involves several manual steps, including database configuration and API key management.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 day
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 6 stars in the last 30 days
