Code repo for OpenAI preparedness evals
This repository provides code for multiple "Preparedness" evaluations, leveraging the nanoeval and alcatraz frameworks. It is intended for researchers and developers working on evaluating AI system preparedness, offering a structured approach to benchmarking.
How It Works
The project uses the nanoeval and alcatraz libraries to define and execute evaluation benchmarks. This modular approach supports diverse evaluation suites, such as PaperBench and the forthcoming MLE-bench, enabling systematic assessment of AI capabilities.
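As a rough sketch of that pattern (the names below are hypothetical and do not reflect the actual nanoeval API), an eval can be modelled as a self-contained, configurable class with an async run method that scores a pluggable solver:

# Illustrative sketch only: class and function names are hypothetical, not the
# real nanoeval API; this shows the general shape of a modular eval that
# scores a pluggable solver.
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable

Solver = Callable[[str], Awaitable[str]]

@dataclass(frozen=True)
class ToyArithmeticEval:
    n_samples: int = 3

    async def run(self, solver: Solver) -> float:
        # Ask a few trivial questions and report accuracy.
        correct = 0
        for i in range(self.n_samples):
            answer = await solver(f"What is {i} + {i}?")
            correct += int(answer.strip() == str(i + i))
        return correct / self.n_samples

async def echo_solver(prompt: str) -> str:
    # Stand-in for a model call: parses the prompt and answers directly.
    lhs, _, rhs = prompt.removeprefix("What is ").removesuffix("?").partition(" + ")
    return str(int(lhs) + int(rhs))

if __name__ == "__main__":
    print(asyncio.run(ToyArithmeticEval().run(echo_solver)))

Swapping in a model-backed solver and a real task set keeps the same structure, which is the kind of modularity the repository's eval suites rely on.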
Quick Start & Requirements
Install each project in editable mode:

pip install -e project/<project_name>

where <project_name> is nanoeval, alcatraz, or nanoeval_alcatraz. The projects depend on the chz configuration library (note the Python version caveats under Limitations & Caveats).
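As a hedged illustration of that configuration style, the sketch below assumes chz exposes a dataclass-like @chz.chz decorator as described in its README; the field names are illustrative, not the repo's actual schema.

import chz

# Hedged sketch: assumes chz provides a dataclass-like @chz.chz decorator.
# The fields below are illustrative, not the repo's actual configuration schema.
@chz.chz
class EvalConfig:
    eval_name: str = "paperbench"  # illustrative field
    concurrency: int = 4           # illustrative field

if __name__ == "__main__":
    cfg = EvalConfig(concurrency=8)
    print(cfg.eval_name, cfg.concurrency)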
Highlighted Details
Maintenance & Community
No specific community channels or maintenance details are provided in the README.
Licensing & Compatibility
The repository does not specify a license.
Limitations & Caveats
Python 3.12 is untested, and Python 3.13 is known to break the chz dependency. The repository is focused on specific evaluation frameworks and does not offer broader AI development tools.