Multi-agent LLM system for paper-to-code generation
This project provides Paper2Code, a multi-agent large language model (LLM) system that automates the generation of code repositories from machine learning papers. It targets researchers and developers who want to reproduce or build on published ML work quickly, and aims to save time by translating research papers into functional code.
How It Works
Paper2Code uses a three-stage pipeline of planning, analysis, and code generation, with each stage handled by a specialized LLM agent. The modular design lets each agent focus on a single step, which is intended to yield more accurate and faithful implementations. The goal is high-quality, reproducible code produced directly from the paper's content.
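The three stages can be pictured as a chain in which each stage's output becomes the next stage's input. The sketch below is a hypothetical illustration, not Paper2Code's actual code: the LLM is any function from prompt to reply, the prompts are invented, and the file list is passed in up front for brevity.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical LLM interface: any function that maps a prompt string to a reply string.
LLM = Callable[[str], str]


@dataclass
class PipelineResult:
    plan: str
    analyses: dict[str, str]
    files: dict[str, str] = field(default_factory=dict)


def run_pipeline(paper_text: str, file_names: list[str], llm: LLM) -> PipelineResult:
    """Hypothetical three-stage sketch: planning -> analysis -> generation."""
    # Stage 1 (planning): a single call turns the paper into an implementation plan.
    plan = llm(f"Plan a code repository for this paper.\nPaper:\n{paper_text}")

    # Stage 2 (analysis): one call per file, conditioned on the paper and the plan.
    analyses = {
        name: llm(f"Analyze what {name} must implement.\nPaper:\n{paper_text}\nPlan:\n{plan}")
        for name in file_names
    }

    # Stage 3 (generation): write files in the given order, passing each file's
    # analysis plus all previously generated files as context.
    result = PipelineResult(plan=plan, analyses=analyses)
    for name in file_names:
        existing = "\n\n".join(result.files.values())
        result.files[name] = llm(
            f"Write {name}.\nAnalysis:\n{analyses[name]}\nExisting code:\n{existing}"
        )
    return result
```

In a real setup, llm would wrap an OpenAI or vLLM client call, and the file list would more plausibly come from the planning stage than from the caller. A stub such as lambda prompt: prompt.splitlines()[0] is enough to exercise the control flow offline.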
Quick Start & Requirements
To run with the OpenAI API:
pip install openai
export OPENAI_API_KEY="<YOUR_OPENAI_API_KEY>"
cd scripts && bash run.sh
To run with an open-source model via vLLM (default model: deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct):
pip install vllm
cd scripts && bash run_llm.sh
Requirements: openai or vllm, plus tiktoken for evaluation. PDF-to-JSON conversion requires cloning allenai/s2orc-doc2json and running its GROBID service. The README lists estimated API costs (for o3-mini). Running open-source models requires a vLLM setup.
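Before a paper reaches the agents, the JSON produced by s2orc-doc2json has to be turned into prompt text. The sketch below flattens such a file into section-headed plain text. The JSON layout it expects (a top-level "title" and a "pdf_parse" object holding "abstract" and "body_text" paragraph lists with "section" and "text" keys) is an assumption based on the S2ORC format and should be checked against your actual output; the function name is hypothetical.

```python
import json


def paper_json_to_text(raw_json: str) -> str:
    """Flatten an S2ORC-style JSON parse into plain text for an LLM prompt.

    Assumed layout (verify against real s2orc-doc2json output):
    {"title": ..., "pdf_parse": {"abstract": [...], "body_text": [...]}},
    where each paragraph is a dict with "section" and "text" keys.
    """
    doc = json.loads(raw_json)
    parse = doc.get("pdf_parse", {})
    lines = [doc.get("title", "")]
    current_section = None
    for para in parse.get("abstract", []) + parse.get("body_text", []):
        section = para.get("section") or ""
        # Emit a section header only when the section name changes.
        if section and section != current_section:
            lines.append(f"## {section}")
            current_section = section
        lines.append(para.get("text", ""))
    # Drop empty lines (e.g. a missing title or empty paragraph).
    return "\n".join(line for line in lines if line)
```

Grouping paragraphs under their section names keeps the paper's structure visible to the planning agent without passing citation spans and other parser metadata.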
Maintenance & Community
The project is associated with going-doer and allenai. Further community or maintenance details are not explicitly provided in the README.
Licensing & Compatibility
The README does not explicitly state a license for Paper2Code itself. It relies on external repositories such as s2orc-doc2json, which may have their own licenses. Compatibility with commercial use is not specified.
Limitations & Caveats
The system's performance depends on the quality of the input PDF and the capabilities of the chosen LLM. The README provides estimated API costs, and users must supply and manage their own API keys. The PDF-to-JSON conversion step requires running a separate GROBID service.
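API cost scales with the tokens sent and received across every agent call. The sketch below totals token usage over a run and applies per-million-token prices. The prices are placeholders, not actual o3-mini rates, and the per-call token counts are assumed to come from the usage fields returned by the API; all names here are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CallUsage:
    # Token counts for one LLM call (e.g. read from the API response's usage data).
    input_tokens: int
    output_tokens: int


def estimate_cost_usd(calls: list[CallUsage],
                      usd_per_m_input: float,
                      usd_per_m_output: float) -> float:
    """Sum token usage over all calls and convert to USD using per-million-token prices."""
    total_in = sum(c.input_tokens for c in calls)
    total_out = sum(c.output_tokens for c in calls)
    return (total_in * usd_per_m_input + total_out * usd_per_m_output) / 1_000_000


# Placeholder prices; substitute your provider's current rates.
run = [CallUsage(100_000, 20_000), CallUsage(100_000, 30_000)]
print(f"${estimate_cost_usd(run, usd_per_m_input=1.0, usd_per_m_output=4.0):.2f}")
```

With these placeholder numbers the run uses 200,000 input and 50,000 output tokens, so the estimate is 0.2 + 0.2 = 0.40 USD. Since every stage re-sends the paper as context, input tokens tend to dominate long papers.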