Paper2Code by going-doer

Multi-agent LLM system for paper-to-code generation

created 3 months ago
3,064 stars

Top 16.0% on sourcepulse

Project Summary

Paper2Code is a multi-agent large language model (LLM) system that generates code repositories from machine learning papers. It targets researchers and developers who want to reproduce or build on published ML work without hand-translating a paper into an implementation.

How It Works

Paper2Code uses a three-stage pipeline (planning, analysis, and code generation), with each stage handled by a specialized LLM agent. Splitting the work this way lets each agent focus on one task, which is intended to yield implementations that are more accurate and more faithful to the paper. The goal is reproducible code generated directly from the paper's content.
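The three-stage flow can be illustrated with a minimal Python sketch. This is not the project's actual code: the `LLM` callable type, the `PipelineState` class, the prompt strings, and the fixed `file_list` argument are all hypothetical names chosen for illustration, and the model is replaced by a stub so the example runs offline.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical type: any function that maps a prompt string to a completion string.
LLM = Callable[[str], str]


@dataclass
class PipelineState:
    """Artifacts passed from one stage to the next (illustrative only)."""
    paper_text: str
    plan: str = ""
    analyses: Dict[str, str] = field(default_factory=dict)
    files: Dict[str, str] = field(default_factory=dict)


def plan_stage(llm: LLM, state: PipelineState, file_list: List[str]) -> None:
    # Stage 1: produce one overall plan from the paper text.
    state.plan = llm(f"PLAN\n{state.paper_text}\nFiles: {', '.join(file_list)}")


def analysis_stage(llm: LLM, state: PipelineState, file_list: List[str]) -> None:
    # Stage 2: produce a per-file analysis that is conditioned on the plan.
    for name in file_list:
        state.analyses[name] = llm(f"ANALYZE {name}\nPlan:\n{state.plan}")


def coding_stage(llm: LLM, state: PipelineState, file_list: List[str]) -> None:
    # Stage 3: generate each file from its own analysis.
    for name in file_list:
        state.files[name] = llm(f"CODE {name}\nAnalysis:\n{state.analyses[name]}")


def run_pipeline(llm: LLM, paper_text: str, file_list: List[str]) -> PipelineState:
    # Assumption: the file list is supplied by the caller to keep the sketch short.
    state = PipelineState(paper_text)
    plan_stage(llm, state, file_list)
    analysis_stage(llm, state, file_list)
    coding_stage(llm, state, file_list)
    return state


def echo_llm(prompt: str) -> str:
    # Stand-in model: echoes the first prompt line so the data flow is traceable.
    return "<" + prompt.splitlines()[0] + ">"


if __name__ == "__main__":
    result = run_pipeline(echo_llm, "paper text here", ["model.py", "train.py"])
    for filename, content in result.files.items():
        print(filename, content)
```

Swapping `echo_llm` for a function that calls a real model would leave the stage structure unchanged, which is the point of keeping each stage behind the same callable interface.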

Quick Start & Requirements

  • OpenAI API: pip install openai, export OPENAI_API_KEY=" ", cd scripts && bash run.sh
  • Open Source Models (vLLM): pip install vllm, cd scripts && bash run_llm.sh (default model: deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct)
  • Prerequisites: Python, openai or vllm, tiktoken for evaluation. PDF to JSON conversion requires cloning allenai/s2orc-doc2json and running its GROBID service.
  • Resources: OpenAI API usage incurs costs ($0.50-$0.70 estimated for o3-mini). Running open-source models requires vLLM setup.
  • Links: Paper on arXiv, s2orc-doc2json
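Since API usage is billed by token, a rough budget check can be sketched as below. The function name, the token counts, and the per-million-token rates are all placeholder assumptions, not figures from the README or from any provider's price list; substitute the current rates for the model in use.

```python
def estimate_cost_usd(
    input_tokens: int,
    output_tokens: int,
    usd_per_million_input: float,
    usd_per_million_output: float,
) -> float:
    """Return the estimated cost in USD for one run, given token counts and rates."""
    values = (input_tokens, output_tokens, usd_per_million_input, usd_per_million_output)
    if min(values) < 0:
        raise ValueError("token counts and rates must be non-negative")
    total = input_tokens * usd_per_million_input + output_tokens * usd_per_million_output
    return total / 1_000_000


if __name__ == "__main__":
    # Placeholder inputs: 200k prompt tokens, 80k completion tokens,
    # and hypothetical rates of $1.00 / $4.00 per million tokens.
    cost = estimate_cost_usd(200_000, 80_000, 1.00, 4.00)
    print(f"estimated cost: ${cost:.2f}")
```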

Highlighted Details

  • Outperforms strong baselines on Paper2Code and PaperBench benchmarks.
  • Supports both OpenAI API and open-source LLMs via vLLM.
  • Includes a model-based evaluation framework for generated repositories (reference-based and reference-free).
  • Provides example usage for the "Attention Is All You Need" paper.
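The difference between the reference-based and reference-free evaluation modes can be sketched as follows. This is an illustration under stated assumptions, not the project's evaluation code: the `Judge` type, the prompt wording, and the choice to average several judge samples are all hypothetical, and the judge here is a stub rather than a real model.

```python
from statistics import mean
from typing import Callable, Optional

# Hypothetical type: a judge maps an evaluation prompt to a numeric score.
Judge = Callable[[str], float]


def build_eval_prompt(
    paper_text: str,
    generated_repo: str,
    reference_repo: Optional[str] = None,
) -> str:
    """Build the judge prompt; the reference section is added only when available."""
    parts = [
        "Rate how faithfully the repository implements the paper.",
        "## Paper",
        paper_text,
        "## Generated repository",
        generated_repo,
    ]
    if reference_repo is not None:
        # Reference-based mode: the judge can also compare against known-good code.
        parts += ["## Reference repository", reference_repo]
    return "\n".join(parts)


def evaluate(
    judge: Judge,
    paper_text: str,
    generated_repo: str,
    reference_repo: Optional[str] = None,
    samples: int = 3,
) -> float:
    """Average several judge scores (assumption: sampling reduces judge variance)."""
    if samples < 1:
        raise ValueError("samples must be at least 1")
    prompt = build_eval_prompt(paper_text, generated_repo, reference_repo)
    return mean(judge(prompt) for _ in range(samples))


def stub_judge(prompt: str) -> float:
    # Stand-in judge: returns a fixed score depending on which mode was used.
    return 4.0 if "## Reference repository" in prompt else 3.0


if __name__ == "__main__":
    print(evaluate(stub_judge, "paper", "generated code"))                    # reference-free
    print(evaluate(stub_judge, "paper", "generated code", "reference code"))  # reference-based
```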

Maintenance & Community

The project is hosted under the going-doer GitHub account; allenai appears only as the maintainer of the s2orc-doc2json dependency. The README gives no further community or maintenance details.

Licensing & Compatibility

The README does not state a license for Paper2Code itself. The project also depends on external repositories such as s2orc-doc2json, which may carry their own licenses. Suitability for commercial use is not specified.

Limitations & Caveats

Output quality depends on the quality of the input PDF and on the capabilities of the chosen LLM. API usage incurs costs (estimates are provided), and users must supply and manage their own API keys. The PDF-to-JSON conversion step requires setting up a separate GROBID service through s2orc-doc2json.

Health Check

  • Last commit: 2 weeks ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0

Star History

1,863 stars in the last 90 days
