monitors4codegen by microsoft

Research paper code/data for monitor-guided code LM decoding via static analysis

Created 2 years ago

277 stars

Top 93.6% on SourcePulse

Project Summary

This repository provides the code and data for "Monitor-Guided Decoding of Code LMs with Static Analysis of Repository Context," a NeurIPS 2023 paper. It introduces Monitor-Guided Decoding (MGD) to improve code generation by using static analysis monitors to guide Large Language Models (LLMs). The project is relevant for researchers and engineers working on code generation, LLM evaluation, and static analysis integration.

How It Works

The core innovation is Monitor-Guided Decoding (MGD), which leverages static analysis to constrain LLM output during generation. A "monitor" component, built using the multilspy library, queries language servers for static analysis results (e.g., type information, method signatures). These results are then used to guide the LLM's decoding process, preventing common errors like "symbol not found" and improving code correctness. This approach enhances compilation rates and ground-truth matching without requiring model retraining.
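To make the idea concrete, here is a minimal toy sketch of monitor-guided greedy decoding. It is not the repository's implementation: the vocabulary, the `fake_lm` scoring function, and the hard-coded list of valid identifiers (standing in for what a language server would report) are all hypothetical, chosen only to show how masking disallowed tokens steers the output.

```python
import math

# Hypothetical toy vocabulary; a real LM uses a large subword vocabulary.
VOCAB = ["get", "set", "Name", "Age", "Id", "(", ")", "<eos>"]


def allowed_token_ids(prefix: str, valid_identifiers: list[str]) -> set[int]:
    """Ids of tokens that keep `prefix` a prefix of some valid identifier."""
    allowed = {
        token_id
        for token_id, token in enumerate(VOCAB)
        if any(name.startswith(prefix + token) for name in valid_identifiers)
    }
    # Once the identifier is complete, allow "(" so the call can begin.
    if prefix in valid_identifiers:
        allowed.add(VOCAB.index("("))
    return allowed


def mask_logits(logits: list[float], allowed: set[int]) -> list[float]:
    """Set the score of every disallowed token to -inf."""
    return [x if i in allowed else -math.inf for i, x in enumerate(logits)]


def greedy_decode(score_fn, valid_identifiers, use_monitor=True, max_steps=8) -> str:
    """Greedy decoding of one identifier, optionally filtered by the monitor."""
    prefix = ""
    for _ in range(max_steps):
        if use_monitor:
            allowed = allowed_token_ids(prefix, valid_identifiers)
        else:
            allowed = set(range(len(VOCAB)))
        if not allowed:
            break
        scores = mask_logits(score_fn(prefix), allowed)
        best = max(range(len(VOCAB)), key=lambda i: scores[i])
        if VOCAB[best] == "(":
            break  # identifier finished
        prefix += VOCAB[best]
    return prefix


def fake_lm(prefix: str) -> list[float]:
    """Stand-in for an LM: it prefers the non-existent method `getAge`."""
    logits = [0.0] * len(VOCAB)
    if prefix == "":
        logits[VOCAB.index("get")] = 2.0
        logits[VOCAB.index("set")] = 1.0
    elif prefix == "get":
        logits[VOCAB.index("Age")] = 3.0
        logits[VOCAB.index("Name")] = 2.0
        logits[VOCAB.index("Id")] = 1.0
    else:
        logits[VOCAB.index("(")] = 2.0
    return logits


# Assumed static-analysis result: the members actually defined on the type.
members = ["getName", "getId"]
unguided = greedy_decode(fake_lm, members, use_monitor=False)
guided = greedy_decode(fake_lm, members, use_monitor=True)
```

In this toy run the unguided decoder follows the model's preference and emits a symbol that does not exist, while the guided decoder falls back to the highest-scoring token that the (assumed) static analysis allows. The model itself is unchanged in both cases, which is why no retraining is involved.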

Quick Start & Requirements

Installation: Create a Python virtual environment (venv or conda) with Python 3.10+. Install dependencies via pip install -r requirements.txt.
Prerequisites: Python 3.10+, Git LFS for large data files.
Evaluation: From the repository root, run python3 evaluation_scripts/eval_results.py <inference_results_csv> <pragmatic_code_filecontents_json> <output_directory> to reproduce the paper's results. For a sample run, use python3 evaluation_scripts/eval_results.py inference_results/dotprompts_results_sample.csv datasets/PragmaticCode/fileContentsByRepo.json results_sample/.
Datasets: PragmaticCode (Java projects) and DotPrompts (method completion examples) are available at Zenodo.
Documentation: Usage examples for multilspy are in its repository tests.
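For scripted runs, the evaluation command above could be wrapped in a small Python helper. This is a sketch only: the helper names `build_eval_command` and `run_sample_evaluation` are hypothetical, the script path and sample arguments are copied from the commands listed above, and using `sys.executable` in place of `python3` is an assumption made so the active virtual environment's interpreter is used.

```python
import subprocess
import sys


def build_eval_command(results_csv: str, file_contents_json: str, output_dir: str) -> list[str]:
    """Build the argument list for the evaluation script (paths relative to the repo root)."""
    return [
        sys.executable,  # assumption: use the current interpreter instead of "python3"
        "evaluation_scripts/eval_results.py",
        results_csv,
        file_contents_json,
        output_dir,
    ]


def run_sample_evaluation(repo_root: str) -> None:
    """Run the sample evaluation from `repo_root`; raises if the script exits non-zero."""
    cmd = build_eval_command(
        "inference_results/dotprompts_results_sample.csv",
        "datasets/PragmaticCode/fileContentsByRepo.json",
        "results_sample/",
    )
    subprocess.run(cmd, cwd=repo_root, check=True)
```

Calling `run_sample_evaluation("/path/to/monitors4codegen")` would require a cloned repository with the Git LFS files pulled and the dependencies installed.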

Highlighted Details

MGD improves compilation rates by 19-25% and boosts ground-truth match across granularities.
Supports multiple monitors for joint property enforcement.
multilspy library provides a unified interface to language servers for static analysis.
Includes datasets (PragmaticCode, DotPrompts) and inference results for various LLMs.
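One plausible way to read "joint property enforcement" is that each monitor proposes a set of permitted next tokens and only tokens accepted by every monitor survive. The sketch below illustrates that intersection. The two monitors (a valid-member-name monitor and a blocklist monitor), the vocabulary, and all function names are hypothetical and are not taken from the repository.

```python
from typing import Callable

# A monitor maps (decoded prefix, vocabulary) to the set of permitted token ids.
Monitor = Callable[[str, list[str]], set[int]]


def make_prefix_monitor(valid_names: list[str]) -> Monitor:
    """Permit tokens that keep the prefix a prefix of some valid name."""
    def monitor(prefix: str, vocab: list[str]) -> set[int]:
        return {
            i for i, tok in enumerate(vocab)
            if any(name.startswith(prefix + tok) for name in valid_names)
        }
    return monitor


def make_blocklist_monitor(blocked_names: set[str]) -> Monitor:
    """Permit every token except those that would complete a blocked name."""
    def monitor(prefix: str, vocab: list[str]) -> set[int]:
        return {i for i, tok in enumerate(vocab) if prefix + tok not in blocked_names}
    return monitor


def joint_allowed(prefix: str, vocab: list[str], monitors: list[Monitor]) -> set[int]:
    """Intersect the permitted sets of all monitors."""
    allowed = set(range(len(vocab)))
    for monitor in monitors:
        allowed &= monitor(prefix, vocab)
    return allowed


vocab = ["Name", "Id", "Legacy", "Age"]
monitors = [
    make_prefix_monitor(["getName", "getId", "getLegacy"]),  # assumed type members
    make_blocklist_monitor({"getLegacy"}),                   # assumed forbidden call
]
permitted = joint_allowed("get", vocab, monitors)
permitted_tokens = sorted(vocab[i] for i in permitted)
```

Here `Legacy` is rejected by the blocklist monitor and `Age` by the member-name monitor, so only tokens satisfying both properties remain available to the decoder.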

Maintenance & Community

The multilspy library has been migrated to its own repository (microsoft/multilspy). Contributions are welcome via pull requests, subject to a Contributor License Agreement (CLA). The project follows the Microsoft Open Source Code of Conduct.

Licensing & Compatibility

The repository's license is not explicitly stated in the README. The multilspy library can be downloaded as a zip archive from GitHub, but that distribution method says nothing about license terms; check the LICENSE file in each repository before reuse.

Limitations & Caveats

The README notes that running the tests can raise a RuntimeError related to asyncio event loops and recommends using Python >= 3.10 to avoid it. The primary datasets and inference results are large and require Git LFS.
