llm-engine  by scaleapi

Open-source engine for fine-tuning and serving LLMs

created 2 years ago
808 stars

Top 44.6% on sourcepulse

Project Summary

Scale LLM Engine provides an open-source solution for fine-tuning and serving large language models (LLMs). It targets developers and ML engineers seeking to customize and deploy models like LLaMA, MPT, and Falcon, offering both a hosted API via Scale and self-hosted deployment on Kubernetes. The engine aims to simplify LLM operations, reduce costs, and improve inference performance.

How It Works

LLM Engine offers a Python library and CLI for interacting with LLMs, abstracting away infrastructure complexity. It supports deploying Hugging Face models with a single command and provides optimized inference with features such as streaming responses and dynamic batching. For self-hosting, it uses Helm charts to deploy on Kubernetes, where models can be fine-tuned on custom data and scaled to zero when idle to save cost.
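
As a rough illustration of the dynamic batching idea mentioned above, the following is a minimal sketch and not LLM Engine's actual implementation. Requests that arrive close together are grouped so the model can process them in one pass. The function name `batch_requests` and the parameters `max_batch_size` and `max_wait_s` are hypothetical and chosen only for this example.

```python
from typing import List, Tuple


# Hypothetical illustration of dynamic batching; not taken from LLM Engine's code.
def batch_requests(
    arrivals: List[Tuple[float, str]],
    max_batch_size: int = 4,
    max_wait_s: float = 0.05,
) -> List[List[str]]:
    """Group (arrival_time_seconds, prompt) pairs into batches.

    A batch is closed when it is full, or when the next request arrives
    more than max_wait_s after the first request in the current batch.
    """
    batches: List[List[str]] = []
    current: List[str] = []
    window_start = 0.0
    for arrival_time, prompt in sorted(arrivals):
        batch_is_full = len(current) >= max_batch_size
        window_expired = arrival_time - window_start > max_wait_s
        if current and (batch_is_full or window_expired):
            batches.append(current)
            current = []
        if not current:
            # The first request of a new batch starts a new waiting window.
            window_start = arrival_time
        current.append(prompt)
    if current:
        batches.append(current)
    return batches
```

With arrivals at 0.00 s, 0.01 s, 0.02 s, and 0.20 s and `max_batch_size=2`, the first two prompts share a batch, the third starts a new one, and the fourth lands in its own batch because it arrives after the waiting window has expired. A larger window tends to raise throughput at the cost of added latency for the first request in each batch.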

Quick Start & Requirements

  • Install via pip: pip install scale-llm-engine
  • Requires an API key from Scale Spellbook, set as the SCALE_API_KEY environment variable.
  • Example usage provided via Python client.
  • Documentation: https://docs.scale.com/llm-engine/
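
The steps above can be sketched as follows. This is a minimal example, assuming the `scale-llm-engine` package is installed (it is imported as `llmengine`) and that `SCALE_API_KEY` is set. The model name `falcon-7b-instruct` and the sampling values are illustrative assumptions, and the helper names `build_completion_kwargs` and `complete` are hypothetical; check the linked documentation for the models and parameters currently supported.

```python
import os


def build_completion_kwargs(prompt: str, model: str = "falcon-7b-instruct") -> dict:
    """Collect keyword arguments for a completion call (values are illustrative)."""
    return {
        "model": model,  # assumed model name; see the docs for the supported list
        "prompt": prompt,
        "max_new_tokens": 100,
        "temperature": 0.2,
    }


def complete(prompt: str) -> str:
    """Send one prompt to the hosted API and return the generated text."""
    if not os.environ.get("SCALE_API_KEY"):
        raise RuntimeError("Set the SCALE_API_KEY environment variable first.")
    # Imported lazily so this module loads even without the package installed.
    from llmengine import Completion

    response = Completion.create(**build_completion_kwargs(prompt))
    return response.output.text
```

Calling `complete("List 3 quirky names for a pancake restaurant.")` would then return the model's text, provided the key is valid and the model is available to the account.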

Highlighted Details

  • Supports popular open-source models (LLaMA, MPT, Falcon) and any Hugging Face model.
  • Offers fine-tuning capabilities on user-provided data.
  • Features optimized inference with streaming and dynamic batching for higher throughput and lower latency.
  • Scales models to zero when inactive to reduce cost, with fast cold-start times when traffic resumes.
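
To make the scale-to-zero behavior concrete, here is a toy replica-count rule. It is a sketch under stated assumptions, not LLM Engine's actual autoscaling policy: the function `desired_replicas` and all of its parameters and default values are hypothetical.

```python
# Hypothetical autoscaling rule for illustration; not LLM Engine's actual policy.
def desired_replicas(
    pending_requests: int,
    idle_seconds: float,
    idle_timeout_s: float = 300.0,
    requests_per_replica: int = 8,
    max_replicas: int = 4,
) -> int:
    """Return how many model replicas should be running."""
    if pending_requests == 0:
        # With no traffic, keep one warm replica until the idle timeout
        # passes, then release all replicas (scale to zero).
        return 0 if idle_seconds >= idle_timeout_s else 1
    # Ceiling division: enough replicas to cover the pending requests.
    needed = -(-pending_requests // requests_per_replica)
    return min(max_replicas, needed)
```

Under a rule like this, the first request after a scale-down has to wait for a replica to start, which is why cold-start time matters when models are scaled to zero.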

Maintenance & Community

  • Developed by Scale AI.
  • Links to documentation and blog posts are provided. No direct community links (Discord/Slack) are mentioned in the README.

Licensing & Compatibility

  • The README does not explicitly state a license.

Limitations & Caveats

Kubernetes installation documentation for self-hosted deployments is still under development; the current documentation focuses primarily on Scale's hosted infrastructure.

Health Check

  • Last commit: 2 weeks ago
  • Responsiveness: Inactive
  • Pull requests (30d): 2
  • Issues (30d): 0
  • Star history: 10 stars in the last 90 days
