Benchmark for data science code generation
Ranked in the 99.6th percentile on SourcePulse
DS-1000 provides a benchmark for evaluating data science code generation models, pairing natural language prompts with reliable, execution-based checking. It targets researchers and developers building AI assistants for data science tasks, offering a standardized way to measure model performance across seven widely used Python libraries (NumPy, Pandas, SciPy, Matplotlib, Scikit-learn, PyTorch, and TensorFlow).
How It Works
The benchmark consists of 1000 data science problems, each with a natural language prompt, execution context, and evaluation logic. Generated code is executed within a sandboxed environment that includes test execution and string validation functions. This approach ensures that solutions are not only syntactically correct but also produce the expected outputs for given inputs and library states.
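As a rough illustration of that flow (a sketch only: the `code_context` field and the `test_execution` / `test_string` function names are assumptions based on the upstream repository, not something this summary guarantees), checking one generated solution could look like:

```python
def check_solution(problem: dict, generated_code: str) -> bool:
    """Run one problem's bundled tests against a model completion.

    Sketch only: assumes each problem carries a `code_context` string
    that defines `test_execution(solution)` and, for some problems,
    `test_string(solution)`.
    """
    test_program = (
        problem["code_context"]
        + "\n"
        + f"solution = {generated_code!r}\n"
        + "test_execution(solution)\n"
    )
    if "test_string(" in problem["code_context"]:
        test_program += "test_string(solution)\n"
    try:
        # The bundled tests raise (typically AssertionError) when outputs differ.
        exec(test_program, {})
        return True
    except Exception:
        return False
```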
Quick Start & Requirements
Create the environment with `conda env create -f environment.yml` and activate it with `conda activate ds1000-3.10`, then install the evaluation dependencies with `pip install datasets tqdm`. The benchmark itself can be loaded either from the Hugging Face Hub via `load_dataset("xlangai/DS-1000")` or from the local `data/ds1000.jsonl.gz` file.
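A minimal loading sketch covering both options (the `"test"` split name and the record layout of the local file are assumptions beyond the commands above):

```python
import gzip
import json

from datasets import load_dataset

# Option 1: load from the Hugging Face Hub (assumes a "test" split).
ds = load_dataset("xlangai/DS-1000", split="test")

# Option 2: read the repository's gzipped JSON-lines copy.
with gzip.open("data/ds1000.jsonl.gz", "rt", encoding="utf-8") as fh:
    problems = [json.loads(line) for line in fh]

print(f"{len(problems)} problems loaded locally; hub copy has {len(ds)} rows")
```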
Highlighted Details
Maintenance & Community
The project is associated with ICML 2023 and includes citation information for the original paper. The repository's last update was roughly 9 months ago, and its activity is currently marked as inactive.
Licensing & Compatibility
The repository does not explicitly state a license.
Limitations & Caveats
A small percentage of executions are stateful, requiring each problem to be run in an independent process. Minor inconsistencies with the original dataset may exist due to import handling. The dataset may contain a small number of errors inherent in human-labeled data.
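Because of that statefulness, a harness typically wraps each check in its own process. A minimal sketch (reusing the hypothetical `check_solution` helper from the earlier example, with an assumed timeout) could look like:

```python
import multiprocessing as mp


def _worker(problem: dict, generated_code: str, queue: "mp.Queue") -> None:
    # Runs in the child process; result is passed back through the queue.
    queue.put(check_solution(problem, generated_code))


def check_in_subprocess(problem: dict, generated_code: str,
                        timeout: float = 120.0) -> bool:
    """Run one problem in a fresh process so leaked global state
    cannot affect later problems."""
    queue: "mp.Queue" = mp.Queue()
    proc = mp.Process(target=_worker, args=(problem, generated_code, queue))
    proc.start()
    proc.join(timeout)
    if proc.is_alive():  # treat hangs as failures
        proc.terminate()
        proc.join()
        return False
    return queue.get() if not queue.empty() else False
```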