MultiPL-E by nuprl

Benchmark for evaluating code generation LLMs across multiple programming languages

created 3 years ago
268 stars

Top 95.6% on SourcePulse

Project Summary

MultiPL-E is a benchmark system for evaluating Large Language Models (LLMs) on code generation tasks across multiple programming languages. It translates existing Python-based unit test-driven benchmarks like HumanEval and MBPP into 18 other languages, enabling comprehensive multilingual code LLM assessment.

How It Works

MultiPL-E employs a two-stage process: first, it generates code completions using LLMs, and second, it executes these completions against translated unit tests. The system's core innovation lies in its flexible translation framework, which allows users to define language-specific translators and execution scripts, facilitating the extension of benchmarks to new programming languages. This approach simplifies the creation of polyglot evaluation suites.

Quick Start & Requirements

  • Install: pip3 install aiohttp numpy tqdm pytest datasets torch transformers
  • Prerequisites: Python 3.8+, Docker or Podman.
  • Setup: Clone the repository (git clone https://github.com/nuprl/MultiPL-E), cd MultiPL-E.
  • Resources: Generation requires a GPU (e.g., ~13 GB VRAM for SantaCoder with batch-size 20). Execution requires a containerized environment or manually installed toolchains for each target language.
  • Docs: BigCode Code Generation LM Harness

Highlighted Details

  • Translates HumanEval and MBPP benchmarks to 18 languages.
  • Supports evaluation of various code generation LLMs (e.g., SantaCoder).
  • Provides a containerized execution environment with pre-installed toolchains.
  • Detailed instructions for adding support for new languages and benchmarks.
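To give a feel for what a per-language translator does, here is a toy sketch, with Lua chosen arbitrarily as the target. The function name and the regex-based approach are invented for illustration; MultiPL-E's real translators also handle doctests, stop tokens, and type annotations, not just the function header.

```python
import re

def python_sig_to_lua(prompt: str) -> str:
    """Toy translator sketch: rewrite a Python function header in a
    HumanEval-style prompt into a Lua stub, dropping type annotations
    and default values (which Lua does not express in the signature)."""
    def repl(m: re.Match) -> str:
        name, args = m.group(1), m.group(2)
        # Keep only the parameter names: "a: int = 0" -> "a"
        names = [a.split(":")[0].split("=")[0].strip()
                 for a in args.split(",") if a.strip()]
        return f"local function {name}({', '.join(names)})"
    return re.sub(r"def\s+(\w+)\((.*?)\)\s*:", repl, prompt)
```

A matching execution script would then run the stub plus translated tests with the Lua interpreter, mirroring the translator/executor pairing the framework asks for when adding a new language.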

Maintenance & Community

The project is authored by a large team from Northeastern University and other institutions. Contributions are welcome, and contributors are acknowledged in the changelog.

Licensing & Compatibility

The repository does not explicitly state a license in the README. This requires clarification for commercial use or integration into closed-source projects.

Limitations & Caveats

The README does not specify a license, which is a significant caveat for adoption. While the system supports adding new languages, the process for statically typed languages is noted as more challenging.
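Much of the extra difficulty with statically typed targets comes from type translation: every Python annotation in a prompt needs a concrete counterpart in the target language. A minimal sketch, with Rust as an arbitrary example (the mapping table and function name are invented, not MultiPL-E's API):

```python
# Invented mapping for illustration: Python annotations to Rust types.
PY_TO_RUST = {
    "int": "i32",
    "float": "f64",
    "str": "String",
    "bool": "bool",
}

def translate_type(py_type: str) -> str:
    """Recursively map a Python type annotation to a Rust type.
    Raises KeyError for types the toy table does not cover, which is
    precisely the kind of gap that makes typed targets harder."""
    py_type = py_type.strip()
    if py_type.startswith("List[") and py_type.endswith("]"):
        return f"Vec<{translate_type(py_type[5:-1])}>"
    return PY_TO_RUST[py_type]
```

Dynamically typed targets can simply drop the annotations, which is why extending the benchmark to them tends to be easier.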

Health Check

  • Last commit: 1 week ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star history: 6 stars in the last 30 days
