LLMDebugger by FloridSleeves

LLM debugger that refines generated programs using runtime execution information

Created 1 year ago

574 stars

Top 56.2% on SourcePulse

1 Expert Loves This Project

Gabriel Almeida

Cofounder of Langflow

Project Summary

LDB (Large Language Model Debugger) is a debugging framework that lets large language models (LLMs) refine the programs they generate by feeding runtime execution information back to the model. It is aimed at researchers and developers working on LLM-based code generation and translation who want more accurate and reliable LLM-produced code.

How It Works

LDB mimics human debugging by segmenting programs into basic blocks and tracking intermediate variable values after each block's execution. This approach allows LLMs to focus on smaller, verifiable code units, identify errors step-by-step, and efficiently correct them. The framework supports various LLM backends, including OpenAI models and open-source models like StarCoder and CodeLlama via vLLM.
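The idea of tracking intermediate variable values can be illustrated with a minimal Python sketch. It is not LDB's implementation: the helper name trace_line_states and the sample function buggy_sum_of_squares are hypothetical, and the sketch records state per line with the standard sys.settrace hook rather than per basic block.

```python
import sys


def trace_line_states(func, *args):
    """Run func(*args) and snapshot its local variables before each line runs.

    Hypothetical helper for illustration only; not part of LDB.
    """
    states = []  # entries: (line offset from the def line, copy of locals)
    code = func.__code__

    def tracer(frame, event, arg):
        if frame.f_code is not code:
            return None  # do not trace frames of other functions
        if event == "line":
            # Fired just before a line executes, so the snapshot shows the
            # state left behind by the previously executed line.
            offset = frame.f_lineno - code.co_firstlineno
            states.append((offset, dict(frame.f_locals)))
        elif event == "return":
            states.append(("return", dict(frame.f_locals)))
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        result = func(*args)
    finally:
        sys.settrace(previous)  # always restore the earlier trace hook
    return result, states


def buggy_sum_of_squares(n):
    total = 0
    for i in range(n):  # bug: should iterate over range(1, n + 1)
        total += i * i
    return total


if __name__ == "__main__":
    result, states = trace_line_states(buggy_sum_of_squares, 3)
    print("result:", result)
    for where, local_vars in states:
        print(where, local_vars)
```

A trace like this is the kind of evidence a model could check step by step: the first recorded value of i is 0 rather than 1, which points at the loop bounds instead of the accumulation line.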

Quick Start & Requirements

Installation: Use Conda to create an environment (conda create -n ldb python=3.10) and activate it (conda activate ldb), then install dependencies (python -m pip install -r requirements.txt).
Prerequisites: OpenAI API key for OpenAI models. For StarCoder/CodeLlama, an OpenAI-compatible server setup using vLLM is recommended.
Usage:
- Generate program seeds: ./run_simple.sh [dataset] [model] [output_dir]
- Debug programs: ./run_ldb.sh [dataset] [model] [seed] [output_dir]
Resources: Refer to the README for detailed instructions on setting up vLLM backends and example usage.

Highlighted Details

Reports 98.2% accuracy with GPT-4o when debugging starts from seed programs generated by Reflexion.
Supports debugging at 'line', 'block', or 'function' levels.
Provides direct APIs (PyGenerator.ldb_debug, PyGenerator.ldb_generate) for programmatic integration.
Compatible with datasets like HumanEval, MBPP, and TransCoder.
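To make the 'line', 'block', and 'function' levels concrete, here is a rough Python sketch of how one function could be cut into units at each granularity. Everything in it is an assumption for illustration: split_units and _collect_blocks are hypothetical names, only if/for/while are treated as block boundaries, and the source is assumed to hold a single simple function. The project itself adapts staticfg for control-flow analysis, which this sketch does not reproduce.

```python
import ast

# Statements that open a new control-flow region in this simplified sketch.
BRANCHING = (ast.If, ast.For, ast.While)


def _collect_blocks(stmts, source, lines, out):
    """Append runs of consecutive simple statements to out as blocks."""
    current = []
    for stmt in stmts:
        if isinstance(stmt, BRANCHING):
            if current:
                out.append("\n".join(current))
                current = []
            # The branching header line forms a block of its own.
            out.append(lines[stmt.lineno - 1].strip())
            _collect_blocks(stmt.body, source, lines, out)
            _collect_blocks(stmt.orelse, source, lines, out)
        else:
            current.append(ast.get_source_segment(source, stmt))
    if current:
        out.append("\n".join(current))


def split_units(source, level):
    """Split a single-function source string into debugging units.

    Hypothetical helper; level is one of 'line', 'block', 'function'.
    """
    func = ast.parse(source).body[0]
    lines = source.splitlines()
    if level == "function":
        return [source.strip()]
    if level == "line":
        first_body_line = func.body[0].lineno - 1
        return [ln.strip() for ln in lines[first_body_line:] if ln.strip()]
    if level == "block":
        out = []
        _collect_blocks(func.body, source, lines, out)
        return out
    raise ValueError(f"unknown level: {level!r}")


SAMPLE = '''def mean_of_positives(xs):
    total = 0
    count = 0
    for x in xs:
        if x > 0:
            total += x
            count += 1
    return total / count
'''

if __name__ == "__main__":
    for level in ("line", "block", "function"):
        units = split_units(SAMPLE, level)
        print(level, len(units))
```

Coarser units mean fewer checkpoints for the model to verify, while finer units localize an error more precisely at the cost of a longer trace.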

Maintenance & Community

The project is associated with the ACL 2024 paper "Debug like a Human: A Large Language Model Debugger via Verifying Runtime Execution Step by Step." It adapts code from Reflexion and staticfg. Users can post issues for questions or bugs.

Licensing & Compatibility

The repository does not explicitly state a license in the provided README. Users should verify licensing for commercial use or integration into closed-source projects.

Limitations & Caveats

The README does not specify any explicit limitations or caveats regarding supported platforms, known bugs, or alpha status. Users should be aware of the dependency on external LLM providers or self-hosted vLLM servers.

Health Check

Last Commit

1 year ago

Responsiveness

Inactive

Star History

5 stars in the last 30 days