supercharger by catid

CLI tool for LLM-powered code generation and unit testing

created 2 years ago
351 stars

Top 80.4% on sourcepulse

Project Summary

Supercharger aims to automate software development by leveraging locally-hosted Large Language Models (LLMs) to generate code and unit tests. It targets developers and researchers who want to accelerate coding, offering a framework for distributed LLM inference and automated code validation.

How It Works

Supercharger uses a distributed architecture in which a load balancer manages multiple worker nodes, each running LLMs optimized for code generation and testing. Using prompt engineering tailored for code, the system generates multiple code/test pairs and iteratively executes them until a passing pair is found. An AI evaluator scores the code and tests, and a virtual machine sandbox isolates execution of candidate code.
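
The core loop can be pictured roughly as below. This is a minimal sketch under stated assumptions, not Supercharger's actual code: every name here (find_passing_pair, generate_code, generate_tests, run_in_sandbox, score_pair, max_attempts) is invented for illustration, and whether the real system stops at the first passing pair or keeps the best-scored one is a detail the summary does not specify.

```python
# Minimal sketch of a generate-and-test loop (illustrative only).
# Every callable below is a hypothetical stand-in for a component described
# above: the LLM workers, the sandbox runner, and the AI evaluator.

from typing import Callable, Optional, Tuple

def find_passing_pair(
    task: str,
    generate_code: Callable[[str], str],
    generate_tests: Callable[[str, str], str],
    run_in_sandbox: Callable[[str, str], bool],
    score_pair: Callable[[str, str], float],
    max_attempts: int = 10,
) -> Optional[Tuple[float, str, str]]:
    """Generate code/test candidates until a pair passes, keeping the best-scored one."""
    best: Optional[Tuple[float, str, str]] = None
    for _ in range(max_attempts):
        code = generate_code(task)            # LLM worker: candidate implementation
        tests = generate_tests(task, code)    # LLM worker: matching unit tests
        if not run_in_sandbox(code, tests):   # run inside an isolated sandbox
            continue                          # discard pairs that fail their own tests
        quality = score_pair(code, tests)     # AI evaluator rates the passing pair
        if best is None or quality > best[0]:
            best = (quality, code, tests)
    return best
```

In the actual project, the generation calls would be dispatched through the load balancer to GPU worker nodes rather than run in-process as shown here.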

Quick Start & Requirements

  • Install: Clone the repository, set up a Conda environment (conda create -n supercharger python=3.10), activate it (conda activate supercharger), and run ./update.sh.
  • Prerequisites: Docker, Python 3.10, Conda, passwordless SSH access between nodes.
  • Hardware: Designed for clusters of Linux servers, each with multiple GPUs (e.g., two RTX 3090 or 4090 GPUs) for model parallelism. Tested with the Baize-30B model using 8-bit quantization.
  • Resources: Requires significant GPU resources and a distributed setup.
  • Docs: https://docs.google.com/spreadsheets/d/1TYBNr_UPJ7wCzJThuk5ysje7K1x-_62JhBeXDbmrjA8/edit?usp=sharing

Highlighted Details

  • Generates multiple code and unit test combinations, executing them until a valid pair passes.
  • Utilizes an AI to score code and test quality.
  • Implements thorough code cleaning to remove LLM artifacts (a rough sketch of the idea follows this list).
  • Executes candidate code within a virtual machine for safety.
  • Supports distributed inference across multiple nodes via a load balancer.
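
As an illustration of the code-cleaning bullet above: LLM responses often wrap code in markdown fences and surround it with prose, so a cleaner can extract just the code before execution. This is a hypothetical sketch of that general idea, not Supercharger's actual cleaner; clean_llm_output and the regex are invented for illustration.

```python
import re

# Hypothetical sketch of cleaning LLM output before execution: keep only the
# contents of a fenced code block, if one is present. Not Supercharger's code.

_FENCE_RE = re.compile(r"```[a-zA-Z]*\n(.*?)```", re.DOTALL)

def clean_llm_output(raw: str) -> str:
    """Return the code portion of an LLM response.

    If the response contains a fenced code block, keep only its contents;
    otherwise return the whole response with surrounding whitespace trimmed.
    """
    match = _FENCE_RE.search(raw)
    return (match.group(1) if match else raw).strip()

# Example: clean_llm_output("Here you go:\n```python\nprint('hi')\n```")
# returns "print('hi')".
```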

Maintenance & Community

  • The project is maintained by catid (catid.io).
  • No specific community links (Discord/Slack) or roadmap are provided in the README.

Licensing & Compatibility

  • The README does not explicitly state a license; verify the licensing terms before any use, especially commercial use.

Limitations & Caveats

  • The setup requires a distributed environment with multiple GPUs and specific hardware configurations.
  • The launch_cluster.sh script may leave zombie processes, requiring manual cleanup via ./kill_gpu_users.sh.
  • The project is described as having future work items, suggesting it may not be feature-complete.

Health Check

  • Last commit: 2 years ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 0

Star History

  • 1 star in the last 90 days

Explore Similar Projects

Starred by Stas Bekman (Author of Machine Learning Engineering Open Book; Research Engineer at Snowflake) and Zhiqiang Xie (Author of SGLang).

veScale by volcengine

Top 0.1% on sourcepulse
839 stars
PyTorch-native framework for LLM training
created 1 year ago
updated 3 weeks ago
Starred by Stas Bekman (Author of Machine Learning Engineering Open Book; Research Engineer at Snowflake) and Travis Fischer (Founder of Agentic).

lingua by facebookresearch

Top 0.1% on sourcepulse
5k stars
LLM research codebase for training and inference
created 9 months ago
updated 2 weeks ago