inspect_ai  by UKGovernmentBEIS

Framework for large language model evaluations

created 1 year ago
1,184 stars

Top 33.6% on sourcepulse

Project Summary

Inspect is a Python framework for evaluating large language models (LLMs), developed by the UK AI Security Institute. It offers built-in components for prompt engineering, tool usage, multi-turn dialogue, and model-graded evaluations, enabling users to systematically assess LLM performance.
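The core shape of such an evaluation — a dataset of samples, a solver that produces model output, and a scorer that grades it — can be sketched in plain Python. The names below (`Sample`, `solver`, `scorer`, `run_eval`) are illustrative stand-ins, not the inspect_ai API; a real solver would call an LLM.

```python
from dataclasses import dataclass

@dataclass
class Sample:
    input: str   # prompt sent to the model
    target: str  # reference answer used by the scorer

def solver(sample: Sample) -> str:
    """Stand-in for a model call; a real solver would query an LLM."""
    return {"What is 2 + 2?": "4"}.get(sample.input, "")

def scorer(output: str, target: str) -> bool:
    """Exact-match scoring; a model-graded scorer would ask a second LLM instead."""
    return output.strip() == target.strip()

def run_eval(dataset: list[Sample]) -> float:
    """Score every sample and return mean accuracy."""
    results = [scorer(solver(s), s.target) for s in dataset]
    return sum(results) / len(results)

dataset = [Sample("What is 2 + 2?", "4")]
print(run_eval(dataset))  # 1.0
```

Frameworks like Inspect supply prebuilt solvers (prompting, multi-turn dialogue, tool use) and scorers so that only the dataset and evaluation logic need to be written per task.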

How It Works

Inspect has a modular architecture that external Python packages can extend, so new elicitation and scoring techniques can be integrated without modifying the core framework.
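A common way to support this kind of extensibility is a decorator-based registry that third-party packages populate at import time. The sketch below illustrates that pattern in general terms; `register_scorer` and `SCORERS` are hypothetical names, not Inspect's actual registration mechanism.

```python
from typing import Callable

# Registry mapping scorer names to scoring functions.
SCORERS: dict[str, Callable[[str, str], bool]] = {}

def register_scorer(name: str):
    """Decorator that adds a scoring function to the registry."""
    def wrap(fn: Callable[[str, str], bool]):
        SCORERS[name] = fn
        return fn
    return wrap

@register_scorer("exact")
def exact(output: str, target: str) -> bool:
    return output.strip() == target.strip()

# A third-party package would register its own scorer the same way:
@register_scorer("contains")
def contains(output: str, target: str) -> bool:
    return target in output

print(sorted(SCORERS))  # ['contains', 'exact']
```

With this design, an evaluation can look up a scorer by name, and new techniques become available simply by importing the package that registers them.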

Quick Start & Requirements

  • Install with: pip install inspect-ai
  • Development setup requires cloning the repository and installing optional dependencies with pip install -e ".[dev]".
  • Pre-commit hooks can be installed via make hooks.
  • Linting, formatting, and tests are available via make check and make test.
  • Recommended VS Code extensions include Python, Ruff, and MyPy.
  • Official documentation is available at https://inspect.aisi.org.uk/.

Highlighted Details

  • Comprehensive framework for LLM evaluations.
  • Built-in support for prompt engineering, tool usage, and multi-turn dialogue.
  • Facilitates model-graded evaluations.
  • Extensible architecture for custom elicitation and scoring techniques.
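Model-graded evaluation, highlighted above, means a second "grader" model judges the candidate answer against a reference. The sketch below uses a stubbed grader and a hypothetical prompt template to show the flow; in practice the grader would be an LLM call, and the prompt and parsing here are illustrative, not Inspect's.

```python
# Hypothetical grading prompt; a real one would be more elaborate.
GRADE_PROMPT = (
    "Question: {question}\nReference: {target}\nAnswer: {answer}\n"
    "Reply GRADE: C if the answer matches the reference, else GRADE: I."
)

def stub_grader(prompt: str) -> str:
    """Pretend LLM: judges 'correct' iff the reference appears in the answer."""
    ref = prompt.split("Reference: ")[1].split("\n")[0]
    ans = prompt.split("Answer: ")[1].split("\n")[0]
    return "GRADE: C" if ref in ans else "GRADE: I"

def model_graded_score(question: str, answer: str, target: str,
                       grader=stub_grader) -> bool:
    """Format the grading prompt, call the grader, parse its verdict."""
    verdict = grader(GRADE_PROMPT.format(
        question=question, target=target, answer=answer))
    return verdict.strip().endswith("GRADE: C")

print(model_graded_score("What is 2 + 2?", "The answer is 4", "4"))  # True
```

The appeal of this approach is that it scores free-form answers that exact matching would reject, at the cost of depending on the grader model's judgment.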

Maintenance & Community

The project is developed by the UK AI Security Institute. Further community engagement details are not specified in the README.

Licensing & Compatibility

The license is not specified in the README.

Limitations & Caveats

The README does not specify licensing details, which may impact commercial use or closed-source integration.

Health Check

  • Last commit: 23 hours ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 108
  • Issues (30d): 19
  • Star History: 279 stars in the last 90 days
