Evaluate LLM agents' tool-use capabilities
MCP-Bench is an evaluation framework for benchmarking the tool-use capabilities of Large Language Model (LLM) agents on complex, real-world tasks through the Model Context Protocol (MCP). It provides an end-to-end pipeline for measuring how effectively LLMs discover, select, and use tools. It is aimed at researchers and developers working on LLM agents.
How It Works
The framework employs the Model Context Protocol (MCP) to facilitate LLM interaction with a suite of 28 diverse real-world services, referred to as MCP servers. It orchestrates benchmark runs, evaluating LLMs on their ability to understand task requirements, select appropriate tools from available MCP servers, and execute them effectively. This approach allows for systematic measurement of tool-use proficiency across various complex scenarios.
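To make the protocol side more concrete, here is a minimal Python sketch of the two JSON-RPC 2.0 requests an MCP client sends to discover tools (tools/list) and invoke one (tools/call). It is not MCP-Bench's harness. Transport (stdio or HTTP), the initialization handshake, and response handling are omitted. The keyword-based pick_tool function is a hypothetical stand-in for the tool choice an LLM agent makes during a benchmark run, and the weather tools in the usage example are invented for illustration.

```python
from __future__ import annotations

import json
from itertools import count

# Monotonically increasing request ids, as JSON-RPC requires unique ids per session.
_ids = count(1)


def make_request(method: str, params: dict | None = None) -> dict:
    """Return a JSON-RPC 2.0 request object with a fresh integer id."""
    request = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        request["params"] = params
    return request


def list_tools_request() -> dict:
    # Asks an MCP server to enumerate the tools it exposes.
    return make_request("tools/list")


def call_tool_request(name: str, arguments: dict) -> dict:
    # Asks an MCP server to run one named tool with the given arguments.
    return make_request("tools/call", {"name": name, "arguments": arguments})


def pick_tool(tools: list[dict], keyword: str) -> str | None:
    """Hypothetical toy selection rule: first tool whose description
    mentions the keyword. In a real benchmark run, the LLM makes this choice."""
    for tool in tools:
        if keyword.lower() in tool.get("description", "").lower():
            return tool["name"]
    return None


if __name__ == "__main__":
    # Invented tools/list result from an imaginary weather server.
    tools = [
        {"name": "get_forecast", "description": "Return a weather forecast for a city"},
        {"name": "get_alerts", "description": "List active weather alerts"},
    ]
    print(json.dumps(list_tools_request()))
    chosen = pick_tool(tools, "forecast")
    if chosen is not None:
        print(json.dumps(call_tool_request(chosen, {"city": "Boston"})))
```

In a benchmark setting, the interesting part is the step this sketch fakes: deciding which tool to call and with which arguments, given only the tool descriptions returned by tools/list.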
Quick Start & Requirements
Installation involves cloning the repository (https://github.com/accenture/mcp-bench.git), setting up a Conda environment with Python 3.10, and running an installation script for the MCP servers. Key requirements include obtaining API keys for multiple services (OpenRouter, Azure OpenAI, NPS, NASA, Hugging Face, Google Maps, NCI). Some API key registration portals may require a US IP address. Model providers can be browsed at https://openrouter.ai/models.
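Because setup depends on a specific Python version and several external keys, a quick preflight check can save a failed run. The Python sketch below assumes the keys are supplied as environment variables. The variable names in REQUIRED_KEYS are hypothetical placeholders, so check the project README for the names MCP-Bench actually reads.

```python
import os
import sys

# Hypothetical environment-variable names, one per service listed in the README.
# Replace them with the names MCP-Bench actually expects.
REQUIRED_KEYS = [
    "OPENROUTER_API_KEY",
    "AZURE_OPENAI_API_KEY",
    "NPS_API_KEY",
    "NASA_API_KEY",
    "HF_TOKEN",
    "GOOGLE_MAPS_API_KEY",
    "NCI_API_KEY",
]


def missing_keys(env: dict[str, str], required: list[str]) -> list[str]:
    """Return the required names that are unset or empty in env."""
    return [name for name in required if not env.get(name)]


def python_ok(version: tuple[int, ...]) -> bool:
    # The README specifies a Conda environment with Python 3.10.
    return tuple(version[:2]) == (3, 10)


def check_setup(env: dict[str, str], version: tuple[int, ...] = tuple(sys.version_info[:2])) -> list[str]:
    """Collect human-readable problems. An empty list means the checks passed."""
    problems = []
    if not python_ok(version):
        problems.append(f"Python {version[0]}.{version[1]} detected; the README targets 3.10")
    missing = missing_keys(env, REQUIRED_KEYS)
    if missing:
        problems.append("Missing API keys: " + ", ".join(missing))
    return problems


if __name__ == "__main__":
    issues = check_setup(dict(os.environ))
    print("\n".join(issues) if issues else "Environment looks ready.")
```

The check reports problems instead of exiting. This keeps it usable both as a standalone script and as a function called from other tooling.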
Maintenance & Community
The README is the project's primary documentation. It does not describe maintenance status, community channels (such as Discord or Slack), or sponsorships. The citation lists several authors, which indicates a research-oriented project.
Licensing & Compatibility
The README does not state which open-source license applies to the MCP-Bench repository.
Limitations & Caveats
Obtaining the required API keys may be difficult, because some registration portals may require a US IP address (see Issue #10). Configuring several external API keys also adds complexity to the initial setup.