AlphaCodium by Codium-ai

Code generation research paper implementation

Created 2 years ago

3,914 stars

Top 12.3% on SourcePulse

8 Experts Love This Project

Andrej Karpathy

Founder of Eureka Labs; Formerly at Tesla, OpenAI; Author of CS 231n

and 4 more!

Project Summary

AlphaCodium is the official implementation of a test-based, multi-stage, iterative flow that improves Large Language Model (LLM) performance on code generation tasks, particularly competitive programming problems. It targets researchers and developers who want LLM-generated code to be more accurate and robust, meaning both syntactically correct and functionally sound.

How It Works

AlphaCodium employs a "flow engineering" approach, prioritizing structured, multi-stage interactions with LLMs over simple prompt engineering. The core methodology involves iterative refinement of generated code, guided by test cases (both public and AI-generated) and self-reflection mechanisms. This test-driven, iterative process allows the LLM to identify and correct errors, leading to higher success rates compared to single-shot prompting.
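The test-guided refinement loop described above can be sketched in a few lines. This is a minimal illustration, not the repository's code: `failing_tests`, `iterate_on_tests`, and `stub_propose` are hypothetical names, and the "model" is a stub that returns a buggy function first and a repaired one once it is shown failing tests.

```python
from typing import Callable, List, Tuple

# Hypothetical types for this sketch: a candidate solution is a callable,
# and a test is an (input, expected_output) pair.
Candidate = Callable[[int], int]
Test = Tuple[int, int]


def failing_tests(candidate: Candidate, tests: List[Test]) -> List[Test]:
    """Return the tests the candidate gets wrong; exceptions count as failures."""
    failures = []
    for arg, expected in tests:
        try:
            ok = candidate(arg) == expected
        except Exception:
            ok = False
        if not ok:
            failures.append((arg, expected))
    return failures


def iterate_on_tests(
    propose: Callable[[List[Test]], Candidate],
    tests: List[Test],
    max_rounds: int = 5,
) -> Tuple[Candidate, int, bool]:
    """Request a candidate, run the tests, feed failures back, and repeat."""
    if max_rounds < 1:
        raise ValueError("max_rounds must be at least 1")
    feedback: List[Test] = []
    for round_number in range(1, max_rounds + 1):
        candidate = propose(feedback)
        feedback = failing_tests(candidate, tests)
        if not feedback:
            return candidate, round_number, True
    return candidate, max_rounds, False


def stub_propose(feedback: List[Test]) -> Candidate:
    """Stand-in for an LLM call (assumption): buggy first, fixed after feedback."""
    if not feedback:
        return lambda x: x + x  # buggy attempt at squaring
    return lambda x: x * x


solution, rounds_used, solved = iterate_on_tests(stub_propose, [(2, 4), (3, 9)])
print(rounds_used, solved)
```

In a real flow, `propose` would wrap an LLM call whose prompt includes the failing tests, and the test list would combine the public tests with AI-generated ones.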

Quick Start & Requirements

Install:
1. Create and activate a virtual environment: python3 -m venv venv && source ./venv/bin/activate
2. Install dependencies: pip install -r requirements.txt
3. Copy alpha_codium/settings/.secrets_template.toml to alpha_codium/settings/.secrets.toml and add your OpenAI API key.
4. Download and extract the CodeContests dataset to the project root.
Prerequisites: Python 3, OpenAI API key.
Running:
- Solve a specific problem: python -m alpha_codium.solve_problem --dataset_name /path/to/dataset --split_name test --problem_number 0
- Solve an entire dataset split: python -m alpha_codium.solve_dataset --dataset_name /path/to/dataset --split_name test --database_solution_path /path/to/output/dir/dataset_output.json
Links: Paper, Dataset
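To try several problems in a row, the solve_problem command above can be assembled programmatically. The helper below is a hypothetical convenience wrapper, not part of the repository; it uses only the flags shown above and substitutes `sys.executable` for `python` so that the active virtual environment's interpreter is used.

```python
import shlex
import sys
from typing import List


def solve_problem_command(
    dataset_path: str, problem_number: int, split_name: str = "test"
) -> List[str]:
    """Build the argument list for one alpha_codium.solve_problem run."""
    return [
        sys.executable, "-m", "alpha_codium.solve_problem",
        "--dataset_name", dataset_path,
        "--split_name", split_name,
        "--problem_number", str(problem_number),
    ]


# Print the commands for the first three problems of the test split.
# To execute one, pass the list to subprocess.run(cmd, check=True)
# from the project root with the virtual environment active.
for number in range(3):
    print(shlex.join(solve_problem_command("/path/to/dataset", number)))
```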

Highlighted Details

Achieves a 44% pass@5 accuracy on the CodeContests validation set using GPT-4, a significant improvement from 19% with a single prompt.
Emphasizes "flow engineering" over traditional prompt engineering, with about 95% of the effort reported as going into the flow design.
The approach is language-agnostic, demonstrated with Python generation.
Uses LLM calls sparingly (15-20 calls per solution), four orders of magnitude fewer than AlphaCode.
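For readers unfamiliar with the pass@5 figure above, here is a minimal sketch of how such a metric can be computed. It assumes the simple definition "a problem counts as solved if any of its first k attempts passes"; the paper's exact evaluation protocol may differ, and the results dictionary below is made-up illustration data, not output from AlphaCodium.

```python
from typing import Dict, List


def pass_at_k(results: Dict[str, List[bool]], k: int) -> float:
    """Fraction of problems with at least one passing attempt among the first k."""
    if k < 1:
        raise ValueError("k must be at least 1")
    if not results:
        raise ValueError("results must contain at least one problem")
    solved = sum(1 for attempts in results.values() if any(attempts[:k]))
    return solved / len(results)


# Made-up outcomes: each list holds pass/fail flags for successive attempts.
example_results = {
    "problem_0": [False, True],
    "problem_1": [False, False, False, False, False],
    "problem_2": [True],
    "problem_3": [False, False, False, False, False],
}

print(pass_at_k(example_results, k=1))  # 0.25
print(pass_at_k(example_results, k=5))  # 0.5
```

With this definition, raising k can only keep the score the same or increase it, which is why pass@5 is always at least as high as pass@1 on the same attempts.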

Maintenance & Community

The project is associated with CodiumAI. The README mentions updates to the AlphaCodium leaderboard with new GPT and Claude models, indicating ongoing activity.

Licensing & Compatibility

The repository does not explicitly state a license in the provided README. Users should verify licensing for commercial use or integration with closed-source projects.

Limitations & Caveats

The README notes that even for LLMs with large context windows, injecting too much information can cause the model to ignore critical details. Although the number of LLM calls is far lower than AlphaCode's, 15-20 calls per solution may still be substantial for some applications. The approach is described as language-agnostic, but the implementation and dataset are Python-centric.

Health Check

Last Commit

1 year ago

Responsiveness

1 week

Star History

13 stars in the last 30 days