DS-1000 by xlang-ai

Benchmark for data science code generation

Created 2 years ago
256 stars

Top 98.6% on SourcePulse

Project Summary

DS-1000 is a benchmark for evaluating data science code generation models. It pairs natural language prompts with execution-based evaluation, giving researchers and developers who build AI assistants for data science tasks a standardized way to measure model performance across several libraries.

How It Works

The benchmark consists of 1000 data science problems, each with a natural language prompt, an execution context, and evaluation logic. Generated code is executed within a sandboxed environment and checked by test execution and string validation functions. A solution therefore has to do more than parse: it must produce the expected outputs for the given inputs and library state.
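
The check described above can be illustrated with a toy example. This is a minimal sketch, not the benchmark's actual evaluation code: the context string, the helper `passes`, and the function names `test_execution` and `test_string` are assumptions chosen to mirror the description, and the real contexts are more elaborate.

```python
# Toy stand-in for one problem's evaluation logic. Everything in this string is
# hypothetical and only mirrors the "test execution + string validation" idea.
TOY_CONTEXT = '''
def generate_test_case():
    return [3, 1, 2]

def test_execution(solution):
    data = generate_test_case()
    env = {"data": list(data)}
    exec(solution, env)  # run the candidate code against a fresh input
    assert env["result"] == sorted(data)

def test_string(solution):
    # Surface-level check: reject solutions that use an explicit loop.
    assert "for " not in solution and "while " not in solution
'''


def passes(context: str, solution: str) -> bool:
    """Return True if `solution` satisfies both checks defined in `context`."""
    namespace = {}
    exec(context, namespace)  # defines the two check functions
    try:
        namespace["test_execution"](solution)
        namespace["test_string"](solution)
    except Exception:
        return False
    return True


print(passes(TOY_CONTEXT, "result = sorted(data)"))  # correct output, no loop
print(passes(TOY_CONTEXT, "result = data"))          # wrong output
```

Running untrusted model output through `exec` is only reasonable inside an isolated environment, which is why the sandboxing mentioned above matters.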

Quick Start & Requirements

  • Create and activate the environment: conda env create -f environment.yml, then conda activate ds1000-3.10.
  • Additional dependencies: pip install datasets tqdm.
  • The dataset can be loaded from Hugging Face (load_dataset("xlangai/DS-1000")) or a local data/ds1000.jsonl.gz file.
  • Official project page: https://github.com/xlang-ai/DS-1000
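
The two loading options listed above might be combined as follows. This is a sketch under stated assumptions: the helper names `load_problems` and `load_ds1000` are hypothetical, and the `"test"` split name used for the Hugging Face fallback is an assumption to verify against the dataset card.

```python
import gzip
import json
from pathlib import Path


def load_problems(path):
    """Read one JSON object per line from a gzip-compressed JSONL file."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]


def load_ds1000(local_path="data/ds1000.jsonl.gz"):
    """Prefer the local file; otherwise fall back to Hugging Face."""
    if Path(local_path).exists():
        return load_problems(local_path)
    # Imported lazily so the local path works without the extra dependency.
    from datasets import load_dataset  # requires: pip install datasets
    # Assumption: the problems live in a split named "test".
    return list(load_dataset("xlangai/DS-1000")["test"])
```

Keeping the local reader free of third-party imports means the benchmark file can be inspected even on a machine without network access.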

Highlighted Details

  • Evaluates models on 7 popular data science libraries: Matplotlib, NumPy, Pandas, PyTorch, SciPy, Scikit-learn, and TensorFlow.
  • Simplified dataset format hosted on Hugging Face for improved usability.
  • Includes reference solutions and evaluation scripts for testing generated code.
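
Because the benchmark spans seven libraries, results are often easier to read when broken down per library. The sketch below assumes a hypothetical list of result records with `library` and `passed` keys; neither the record layout nor the function name comes from the project.

```python
from collections import defaultdict


def pass_rate_by_library(results):
    """Return {library: fraction of problems passed} for hypothetical records."""
    totals = defaultdict(int)
    passed = defaultdict(int)
    for record in results:
        totals[record["library"]] += 1
        passed[record["library"]] += int(record["passed"])
    return {library: passed[library] / totals[library] for library in totals}


# Made-up records for illustration only; these are not benchmark results.
example = [
    {"library": "Pandas", "passed": True},
    {"library": "Pandas", "passed": False},
    {"library": "NumPy", "passed": True},
]
print(pass_rate_by_library(example))  # {'Pandas': 0.5, 'NumPy': 1.0}
```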

Maintenance & Community

The project is associated with ICML 2023 and includes citation information for the original paper.

Licensing & Compatibility

The repository does not explicitly state a license.

Limitations & Caveats

A small percentage of executions are stateful, so each problem should be run in its own process. Minor inconsistencies with the original dataset may exist due to import handling. The dataset may also contain a small number of errors, as is typical of human-labeled data.
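
One way to keep state from leaking between problems is to launch a fresh interpreter per test program. The sketch below uses only the standard library; the helper name `run_isolated` and the 30-second default timeout are assumptions, not part of the project's scripts.

```python
import subprocess
import sys


def run_isolated(program: str, timeout: float = 30.0) -> bool:
    """Run `program` in a fresh Python interpreter; True if it exits cleanly."""
    try:
        completed = subprocess.run(
            [sys.executable, "-c", program],
            capture_output=True,  # keep the child's output out of our stdout
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return False
    return completed.returncode == 0


# State set by one run is not visible to the next, since each is a new process.
run_isolated("import builtins; builtins.leak = 1")
print(run_isolated("import builtins; assert not hasattr(builtins, 'leak')"))
```

A failed assertion or uncaught exception in the child produces a nonzero exit code, so the same helper can report test failures as well as crashes and hangs.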

Health Check

  • Last commit: 11 months ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 1 star in the last 30 days
