turbopilot by ravenscroftj

Self-hosted code completion engine (deprecated)

created 2 years ago
3,822 stars

Top 13.0% on sourcepulse

Project Summary

TurboPilot was an open-source, self-hosted code completion engine designed to run large language models locally on CPU, targeting developers seeking an alternative to cloud-based AI coding assistants. It aimed to provide efficient, private code suggestions by leveraging quantized models and the llama.cpp library.

How It Works

TurboPilot utilizes the llama.cpp library to run quantized versions of large language models, such as Salesforce Codegen, WizardCoder, and Starcoder, on consumer hardware. This approach allows for local inference, reducing reliance on external servers and enhancing privacy. The project supports various model formats and quantization levels, enabling users with limited RAM (as low as 4GB) to run capable models, while also offering GPU offloading for enhanced performance.
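For illustration, a minimal local setup might look like the sketch below. The Hugging Face download URL is hypothetical (the project's MODELS.md lists the real ones); the -m/-f flags and model filename come from the quick-start examples in the next section.

    # Fetch a 4-bit quantized model file (illustrative URL, not an official path)
    curl -L -o ./models/santacoder-q4_0.bin \
      https://huggingface.co/<user>/<repo>/resolve/main/santacoder-q4_0.bin

    # Start the server on CPU; llama.cpp handles inference over the quantized weights
    ./turbopilot -m starcoder -f ./models/santacoder-q4_0.bin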

Quick Start & Requirements

  • Install/Run: Download binaries or run via Docker. Example: ./turbopilot -m starcoder -f ./models/santacoder-q4_0.bin or docker run --rm -it -v ./models:/models -e THREADS=6 -e MODEL_TYPE=starcoder -e MODEL="/models/santacoder-q4_0.bin" -p 18080:18080 ghcr.io/ravenscroftj/turbopilot:latest.
  • Prerequisites: Models must be downloaded separately (e.g., from Hugging Face). CUDA 11/12 is required for GPU acceleration via the dedicated Docker images.
  • Resources: Can run on 4GB RAM for smaller models; GPU recommended for larger models and better performance.
  • Docs: MODELS.md for model catalog.

Highlighted Details

  • Supports multiple state-of-the-art local code completion models including WizardCoder, Starcoder, and Santacoder.
  • Offers CUDA inference support via Docker for GPU acceleration.
  • API is broadly compatible with OpenAI's format and usable with the vscode-fauxpilot plugin (see the request sketch after this list).
  • Refactored source code for easier extension and model integration.
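As a sketch of that API compatibility, a completion request might look like the following. The endpoint path and JSON fields are assumptions modeled on the OpenAI/fauxpilot completions format, and port 18080 is the default from the Docker example above; check the actual routes exposed by your build before relying on them.

    # Hypothetical OpenAI-style completion request against a local TurboPilot server
    curl -s http://localhost:18080/v1/engines/codegen/completions \
      -H "Content-Type: application/json" \
      -d '{"prompt": "def fibonacci(n):", "max_tokens": 64}'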

Maintenance & Community

TurboPilot is deprecated and archived as of September 30, 2023. The author recommends exploring more mature alternatives.

Licensing & Compatibility

The project's licensing is not explicitly stated in the README, but it relies on GGML and llama.cpp, which are typically under permissive licenses. Compatibility for commercial use or closed-source linking would require verification of the specific model licenses and the project's own licensing.

Limitations & Caveats

The project is explicitly marked as deprecated and archived. It was considered a proof-of-concept with potentially slow autocompletion and only supports one GPU device at a time.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 11 stars in the last 90 days

Explore Similar Projects

Starred by Andrej Karpathy (founder of Eureka Labs; formerly at Tesla and OpenAI; author of CS 231n), Nat Friedman (former CEO of GitHub), and 32 more.

llama.cpp by ggml-org

C/C++ library for local LLM inference

created 2 years ago
updated 14 hours ago
84k stars

Top 0.4% on sourcepulse