preparedness by openai

Code repo for OpenAI preparedness evals

  • Created 4 months ago
  • 812 stars
  • Top 44.5% on sourcepulse

Project Summary

This repository provides code for multiple "Preparedness" evaluations, built on the nanoeval and alcatraz frameworks. It is aimed at researchers and developers evaluating AI system preparedness and offers a structured approach to benchmarking.

How It Works

The project uses the nanoeval and alcatraz libraries to define and execute evaluation benchmarks. This modular approach lets diverse evaluation suites, such as PaperBench and the forthcoming SWELancer and MLE-bench, share the same infrastructure for systematic assessment of AI capabilities.

Quick Start & Requirements

  • Primary install: pip install -e project/<project_name> for nanoeval, alcatraz, and nanoeval_alcatraz (see the sketch after this list).
  • Prerequisites: Python 3.11 (3.12 is untested; 3.13 breaks the chz dependency).
  • Setup: a bash script must be run to install project dependencies.
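
A minimal sketch of the install sequence implied by the first bullet, assuming the repository has been cloned and a Python 3.11 environment is active; the project/ paths follow the pattern quoted above:

  # Editable installs, one per framework named in the README
  pip install -e project/nanoeval
  pip install -e project/alcatraz
  pip install -e project/nanoeval_alcatraz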

Highlighted Details

  • Built on the nanoeval and alcatraz evaluation frameworks.
  • Includes PaperBench evaluation.
  • Forthcoming evaluations: SWELancer and MLE-bench.

Maintenance & Community

No specific community channels or maintenance details are provided in the README.

Licensing & Compatibility

The repository does not specify a license.

Limitations & Caveats

Python 3.12 is untested, and Python 3.13 is known to break the chz dependency. The repository is focused on specific evaluation frameworks and does not offer broader AI development tools.

Health Check

  • Last commit: 2 days ago
  • Responsiveness: 1 day
  • Pull requests (30d): 31
  • Issues (30d): 3
  • Star history: 100 stars in the last 90 days
