moatless-tools by aorwall

CLI tool for LLM-based code editing experiments on large codebases

Created 2 years ago

618 stars

Top 53.4% on SourcePulse

2 Experts Love This Project

Jiayi Pan

Author of SWE-Gym; MTS at xAI

Paul Gauthier

Founder of Aider

Project Summary

This project provides tools for using Large Language Models (LLMs) to edit code in large codebases, focusing on building robust tooling for prompt context and response handling rather than relying solely on agent reasoning. It's targeted at researchers and developers experimenting with LLM-powered software engineering tasks, offering a framework for evaluating LLM performance on code modification benchmarks.

How It Works

The core approach involves creating a structured environment for LLMs to interact with code. It leverages a CodeIndex for semantic code search and a FileContext to manage code changes. The AgenticLoop orchestrates the interaction between an LLM (CodingAgent) and the codebase, iterating through problem-solving steps. This method aims to improve LLM accuracy by providing precise context and handling LLM outputs effectively, particularly for complex code editing tasks.
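The interaction described above can be sketched in a few lines. This is a toy illustration under stated assumptions, not the project's API: `ToyCodeIndex`, `ToyFileContext`, `run_loop`, and `scripted_agent` are hypothetical names, keyword overlap stands in for semantic search, and a scripted function stands in for the LLM-backed agent.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCodeIndex:
    """Hypothetical stand-in for a code index (keyword overlap, not embeddings)."""
    files: dict[str, str]

    def search(self, query: str, top_k: int = 1) -> list[str]:
        terms = set(query.lower().split())
        scored = [
            (sum(term in text.lower() for term in terms), path)
            for path, text in self.files.items()
        ]
        # Highest score first; ties broken by path for determinism.
        ranked = sorted(scored, key=lambda item: (-item[0], item[1]))
        return [path for score, path in ranked[:top_k] if score > 0]


@dataclass
class ToyFileContext:
    """Hypothetical stand-in for a file context: tracks files in view and edits."""
    files: dict[str, str]
    in_view: list[str] = field(default_factory=list)

    def add(self, path: str) -> None:
        if path not in self.in_view:
            self.in_view.append(path)

    def apply_edit(self, path: str, old: str, new: str) -> bool:
        # Reject edits to files outside the context or with no matching text.
        if path not in self.in_view or old not in self.files[path]:
            return False
        self.files[path] = self.files[path].replace(old, new, 1)
        return True


def run_loop(task, index, context, agent, max_steps=5) -> bool:
    """Ask the agent for an action, execute it, and feed back the observation."""
    observation = task
    for _ in range(max_steps):
        action = agent(observation, context)
        if action["type"] == "search":
            hits = index.search(action["query"])
            for path in hits:
                context.add(path)
            observation = f"found: {hits}"
        elif action["type"] == "edit":
            ok = context.apply_edit(action["path"], action["old"], action["new"])
            observation = "edit applied" if ok else "edit rejected"
        elif action["type"] == "finish":
            return True
    return False  # step budget exhausted


def scripted_agent(observation, context):
    """Scripted replacement for an LLM: search, fix the bug, then finish."""
    if not context.in_view:
        return {"type": "search", "query": "add numbers"}
    if "a - b" in context.files["calc.py"]:
        return {"type": "edit", "path": "calc.py", "old": "a - b", "new": "a + b"}
    return {"type": "finish"}


files = {
    "calc.py": "def add(a, b):\n    return a - b\n",
    "util.py": "def noop():\n    pass\n",
}
solved = run_loop(
    "add() returns the wrong result",
    ToyCodeIndex(files),
    ToyFileContext(files),
    scripted_agent,
)
```

The point of the design is that the tooling, not the model, decides which files are in view and whether an edit is acceptable, so a malformed or out-of-context edit is rejected and reported back instead of being applied.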

Quick Start & Requirements

Installation: pip install moatless (base), pip install "moatless[streamlit]" (with UI), pip install "moatless[api]" (with API server), or pip install "moatless[all]" (all features). Alternatively, install from source using Poetry.
Prerequisites: API keys for LLM providers (OpenAI, Anthropic, etc.) and for Voyage AI. API keys and directory paths are configured through environment variables, either in a .env file or exported directly. Claude 3.5 Sonnet additionally needs a specific litellm fork (see Limitations & Caveats).
Resources: Setup involves cloning the repo and installing dependencies. Running evaluations requires LLM API access and potentially a testbed environment.
Docs: https://github.com/aorwall/moatless-tools
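Before running anything, it can help to check that the required keys are present. The sketch below is an assumption-laden example: the variable names `VOYAGE_API_KEY`, `OPENAI_API_KEY`, and `ANTHROPIC_API_KEY` follow common provider conventions and are not confirmed by this summary, and `missing_settings` is a hypothetical helper. Check the project README for the names it actually reads, including any directory settings.

```python
import os

# Assumed names, following common provider conventions (not confirmed here).
REQUIRED_KEYS = ("VOYAGE_API_KEY",)
PROVIDER_KEYS = ("OPENAI_API_KEY", "ANTHROPIC_API_KEY")


def missing_settings(env) -> list[str]:
    """Return human-readable problems found in a mapping of environment variables."""
    problems = []
    for name in REQUIRED_KEYS:
        if not env.get(name):
            problems.append(f"{name} is not set")
    # At least one LLM provider key is needed; which one depends on the model used.
    if not any(env.get(name) for name in PROVIDER_KEYS):
        problems.append("no LLM provider key set (expected one of: "
                        + ", ".join(PROVIDER_KEYS) + ")")
    return problems


if __name__ == "__main__":
    for problem in missing_settings(os.environ):
        print(problem)
```

Passing the mapping in as an argument, instead of reading `os.environ` inside the function, keeps the check easy to test with a plain dictionary.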

Highlighted Details

Evaluates LLMs on the SWE-bench benchmark, reporting solve rates of up to 39% for Claude 3.5 Sonnet and 50% for DeepSeek Reasoner.
Supports various LLMs including Claude, GPT, DeepSeek, and Gemini, with configurable response formats and message history types.
Includes a SvelteKit-based UI for visualizing code editing trajectories.
Provides example Python code for integrating CodingAgent and AgenticLoop with specific LLMs.
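For readers unfamiliar with the metric, a solve rate is the share of benchmark instances an agent resolves. The sketch below shows only the arithmetic: `solve_rate` is a hypothetical helper, and the instance IDs and outcomes are made up for illustration and are not results from this project.

```python
def solve_rate(results: dict[str, bool]) -> float:
    """Percentage of instances marked as resolved."""
    if not results:
        raise ValueError("no evaluation results")
    return 100.0 * sum(results.values()) / len(results)


# Made-up outcomes for four hypothetical instances: two resolved, two not.
example = {
    "instance-1": True,
    "instance-2": False,
    "instance-3": True,
    "instance-4": False,
}
rate = solve_rate(example)  # 50.0
```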

Maintenance & Community

This appears to be a hobby project with a single primary contributor. Community engagement channels are not explicitly mentioned.

Licensing & Compatibility

The README does not explicitly state a license. The licensing terms should be clarified before commercial use or integration into closed-source projects.

Limitations & Caveats

The README notes that the current version of litellm lacks support for computer use tools required by Claude 3.5 Sonnet, necessitating a specific dependency fork. Some models may not have been extensively tested.

Health Check

Last Commit: 4 months ago
Responsiveness: 1 day
Star History: 3 stars in the last 30 days