Paper2Code by going-doer

Multi-agent LLM system for paper-to-code generation

Created 8 months ago

3,969 stars

Top 12.2% on SourcePulse

1 Expert Loves This Project

Jeff Hammerbacher

Cofounder of Cloudera

Project Summary

Paper2Code is a multi-agent large language model (LLM) system that generates code repositories from machine learning papers. It targets researchers and developers who want to reproduce or build on published ML work without implementing each paper from scratch.

How It Works

Paper2Code runs a three-stage pipeline of planning, analysis, and code generation, with a specialized LLM agent handling each stage. Splitting the work this way lets each agent focus on a single task, which is intended to make the generated implementation more accurate and more faithful to the paper's content.
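The three stages can be pictured as a simple chain in which each stage receives the paper plus the outputs of the stages before it. The following is a minimal sketch of that idea, not the project's actual code: the `LLM` callable, the `PipelineResult` class, the prompt tags, and the `echo_llm` toy model are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stand-in for an LLM call: takes a prompt, returns text.
LLM = Callable[[str], str]


@dataclass
class PipelineResult:
    plan: str
    analysis: str
    code: str


def run_pipeline(paper_text: str, llm: LLM) -> PipelineResult:
    """Run planning, analysis, and code generation in sequence."""
    # Stage 1: planning sees only the paper.
    plan = llm(f"[planning]\nPaper:\n{paper_text}")
    # Stage 2: analysis sees the paper and the plan.
    analysis = llm(f"[analysis]\nPaper:\n{paper_text}\nPlan:\n{plan}")
    # Stage 3: code generation sees the paper and both earlier outputs.
    code = llm(
        f"[coding]\nPaper:\n{paper_text}\nPlan:\n{plan}\nAnalysis:\n{analysis}"
    )
    return PipelineResult(plan=plan, analysis=analysis, code=code)


def echo_llm(prompt: str) -> str:
    """Toy model (assumption): reports which stage tag it was given."""
    stage = prompt.split("]", 1)[0].lstrip("[")
    return f"<{stage} output>"
```

Calling `run_pipeline("some paper text", echo_llm)` exercises the data flow without any API access; a real backend would replace `echo_llm` with a function that calls the chosen model.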

Quick Start & Requirements

OpenAI API: pip install openai, then export OPENAI_API_KEY=" " (with your own key between the quotes), then cd scripts && bash run.sh
Open Source Models (vLLM): pip install vllm, cd scripts && bash run_llm.sh (default model: deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct)
Prerequisites: Python, openai or vllm, tiktoken for evaluation. PDF to JSON conversion requires cloning allenai/s2orc-doc2json and running its GROBID service.
Resources: OpenAI API usage incurs costs ($0.50-$0.70 estimated for o3-mini). Running open-source models requires vLLM setup.
Links: Paper on arXiv, s2orc-doc2json
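The two launch paths above differ only in the script they run and in whether an API key is needed. As a rough illustration, a small helper could pick the script and fail early when the key is missing. This is a sketch under assumptions: `launch_command` is a hypothetical helper, not part of the repository, and only the script names `run.sh` and `run_llm.sh` come from the quick start.

```python
import os
from typing import Mapping


def launch_command(backend: str, env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the command to run from the scripts directory.

    Hypothetical helper: 'openai' maps to run.sh, 'vllm' to run_llm.sh.
    """
    if backend == "openai":
        # A blank or whitespace-only key is treated as missing.
        if not env.get("OPENAI_API_KEY", "").strip():
            raise RuntimeError("OPENAI_API_KEY is not set")
        return ["bash", "run.sh"]
    if backend == "vllm":
        # The vLLM path runs a local model, so no API key is checked.
        return ["bash", "run_llm.sh"]
    raise ValueError(f"unknown backend: {backend!r}")
```

The returned list can be passed to `subprocess.run(..., cwd="scripts", check=True)`, which mirrors the `cd scripts && bash ...` commands listed above.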

Highlighted Details

Outperforms strong baselines on Paper2Code and PaperBench benchmarks.
Supports both OpenAI API and open-source LLMs via vLLM.
Includes a model-based evaluation framework for generated repositories (reference-based and reference-free).
Provides example usage for the "Attention Is All You Need" paper.
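The difference between reference-based and reference-free evaluation can be illustrated with a short sketch: the judge model sees the paper and the generated repository, and in the reference-based case also sees a reference implementation. Everything below is an assumption made for illustration, including the `Judge` callable, the prompt wording, the integer score, and averaging over several samples; the README summary does not specify these details.

```python
from statistics import mean
from typing import Callable, Optional

# Hypothetical judge: takes an evaluation prompt, returns an integer score.
Judge = Callable[[str], int]


def build_eval_prompt(
    paper: str, generated_repo: str, reference_repo: Optional[str] = None
) -> str:
    """Build the judge prompt; the reference is included only if given."""
    parts = [
        "Rate how faithfully the repository implements the paper.",
        f"Paper:\n{paper}",
        f"Generated repository:\n{generated_repo}",
    ]
    if reference_repo is not None:
        # Reference-based mode: the judge can compare against known code.
        parts.append(f"Reference repository:\n{reference_repo}")
    return "\n\n".join(parts)


def evaluate(
    judge: Judge,
    paper: str,
    generated_repo: str,
    reference_repo: Optional[str] = None,
    n_samples: int = 3,
) -> float:
    """Average the judge's score over n_samples calls (assumed scheme)."""
    prompt = build_eval_prompt(paper, generated_repo, reference_repo)
    return mean(judge(prompt) for _ in range(n_samples))
```

Passing `reference_repo=None` gives the reference-free variant, which is the only option when no official implementation of the paper exists.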

Maintenance & Community

The project is hosted under the going-doer GitHub account; allenai appears only as the owner of the s2orc-doc2json dependency. Further community or maintenance details are not explicitly provided in the README.

Licensing & Compatibility

The README does not explicitly state the license for the Paper2Code project itself. It also relies on external repositories such as s2orc-doc2json, which may carry their own licenses. Compatibility for commercial use is not specified.

Limitations & Caveats

The system's performance is dependent on the quality of the input PDF and the capabilities of the chosen LLM. Estimated costs for API usage are provided, and users must manage their own API keys. The PDF-to-JSON conversion step requires setting up a separate service.

Health Check

Last Commit

1 month ago

Responsiveness

1 week

Star History

97 stars in the last 30 days