llm-verified-with-monte-carlo-tree-search  by namin

LLM code synthesis via Monte Carlo Tree Search

created 1 year ago
277 stars

Top 94.5% on sourcepulse

GitHubView on GitHub
Project Summary

This project addresses the challenge of synthesizing verified code by integrating Large Language Models (LLMs) with Monte Carlo Tree Search (MCTS) and formal verification. It enables weaker LLMs to compete with stronger ones by guiding generation with a verifier, targeting researchers and developers in program synthesis and formal methods.

How It Works

The core approach uses MCTS to explore the vast space of potential program generations. At each step of the search, a formal verifier (Dafny, Coq, Lean, Scala, or Rust) is invoked to check the partial program's correctness. This feedback loop prunes incorrect branches early, ensuring that only valid or promising code paths are pursued, leading to provably correct program synthesis.

Quick Start & Requirements

  • Install: Clone with --recurse-submodules, create a conda environment (conda create --name llm-verified python=3.10), activate it, and run pip install -r requirements.txt. Hugging Face authentication is required.
  • Prerequisites: GPU access (tested on NVIDIA A100s), Python 3.10, Conda, Hugging Face CLI, and potentially specific verifier installations (Dafny, Coq, Lean). Docker/Singularity images are available.
  • Links: Dockerfile, Docker Hub

Highlighted Details

  • Leverages MCTS for guided program synthesis.
  • Supports multiple formal verification backends: Dafny, Coq, Lean, Scala, Rust.
  • Achieved an Outstanding Paper Award at the Math-AI workshop co-located with NeurIPS 2024.
  • Includes trainers for PPO and DPO reinforcement learning methods.

Maintenance & Community

The project is associated with authors from institutions like Google DeepMind and the University of Cambridge. Further community engagement details are not explicitly provided in the README.

Licensing & Compatibility

The repository does not explicitly state a license in the provided README text.

Limitations & Caveats

The project is described as a prototype. Setup requires significant technical expertise, including managing LLM dependencies, Hugging Face authentication, and potentially complex installations for formal verifiers like Coq or Lean. GPU resources are mandatory.

Health Check
Last commit

4 months ago

Responsiveness

1 day

Pull Requests (30d)
0
Issues (30d)
0
Star History
4 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen Chip Huyen(Author of AI Engineering, Designing Machine Learning Systems), Jeff Hammerbacher Jeff Hammerbacher(Cofounder of Cloudera), and
10 more.

open-r1 by huggingface

0.2%
25k
SDK for reproducing DeepSeek-R1
created 6 months ago
updated 3 days ago
Feedback? Help us improve.