llm-engine by scaleapi

Open-source engine for fine-tuning and serving LLMs

Created 2 years ago

820 stars

Top 43.3% on SourcePulse

Project Summary

Scale LLM Engine provides an open-source solution for fine-tuning and serving large language models (LLMs). It targets developers and ML engineers seeking to customize and deploy models like LLaMA, MPT, and Falcon, offering both a hosted API via Scale and self-hosted deployment on Kubernetes. The engine aims to simplify LLM operations, reduce costs, and improve inference performance.

How It Works

LLM Engine provides a Python library and CLI for working with LLMs that abstract away infrastructure details. It can deploy Hugging Face models with a single command and offers optimized inference with streaming responses and dynamic batching. For self-hosted use, it ships Helm charts for Kubernetes deployments, supports fine-tuning on custom data, and scales idle models down to zero to save costs.
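
Dynamic batching, in general, means holding incoming requests briefly so that several can share one forward pass on the GPU. The following is a toy illustration of that general idea in plain Python, not LLM Engine's actual server code; the names `collect_batch`, `fake_model_forward`, `max_batch_size`, and `max_wait_s` are hypothetical and chosen only for this example.

```python
import queue
import time
from typing import Any, List


def collect_batch(q: "queue.Queue[Any]", max_batch_size: int = 8, max_wait_s: float = 0.01) -> List[Any]:
    """Toy dynamic batcher (hypothetical, not LLM Engine's implementation).

    Blocks until one request arrives, then keeps pulling requests until the
    batch is full or max_wait_s has elapsed since the first one arrived.
    """
    batch = [q.get()]  # wait indefinitely for at least one request
    deadline = time.monotonic() + max_wait_s
    while len(batch) < max_batch_size:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break  # waiting window closed; ship what we have
        try:
            batch.append(q.get(timeout=remaining))
        except queue.Empty:
            break  # no more requests arrived within the window
    return batch


def fake_model_forward(prompts: List[str]) -> List[str]:
    # Hypothetical stand-in for a single batched forward pass on the GPU.
    return [p.upper() for p in prompts]
```

With three queued prompts `"a"`, `"b"`, `"c"` and `max_batch_size=2`, the first call returns `['a', 'b']` and the second returns `['c']` after waiting at most `max_wait_s`. A larger `max_wait_s` fills batches more often (higher throughput) at the cost of extra latency for the first request in each batch.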

Quick Start & Requirements

Install via pip: pip install scale-llm-engine
Requires an API key from Scale Spellbook, set as the SCALE_API_KEY environment variable.
The README includes an example of calling a model through the Python client.
Documentation: https://docs.scale.com/llm-engine/
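
A minimal sketch of the quick start, assuming the scale-llm-engine package is installed and SCALE_API_KEY is set. The `Completion.create` call (with `model`, `prompt`, `max_new_tokens`, `temperature`, and `stream`) and the `response.output.text` field follow the example in the project's README; the helper names `require_api_key` and `complete` are hypothetical, and the `falcon-7b-instruct` model name comes from that README example and may no longer be offered.

```python
import os


def require_api_key(env_var: str = "SCALE_API_KEY") -> str:
    """Hypothetical helper: return the Scale API key, failing loudly if unset."""
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first.")
    return key


def complete(prompt: str, model: str = "falcon-7b-instruct", stream: bool = False) -> str:
    """Hypothetical wrapper around the llmengine client (pip install scale-llm-engine)."""
    require_api_key()  # fail before any network call if the key is missing
    # Imported lazily so this module still loads when the client isn't installed.
    from llmengine import Completion

    if not stream:
        response = Completion.create(
            model=model,
            prompt=prompt,
            max_new_tokens=100,
            temperature=0.2,
        )
        return response.output.text

    # With stream=True the client yields partial responses as tokens arrive.
    pieces = []
    for chunk in Completion.create(
        model=model, prompt=prompt, max_new_tokens=100, temperature=0.2, stream=True
    ):
        if chunk.output:
            print(chunk.output.text, end="", flush=True)
            pieces.append(chunk.output.text)
    return "".join(pieces)
```

For example, `complete("List three names for a pancake restaurant.")` returns the generated text, while passing `stream=True` prints tokens as they arrive and returns the joined result. Checking the key up front gives a clear error instead of a failed request.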

Highlighted Details

Supports popular open-source models (LLaMA, MPT, Falcon) and any Hugging Face model.
Offers fine-tuning capabilities on user-provided data.
Features optimized inference with streaming and dynamic batching for higher throughput and lower latency.
Scales models to zero when inactive to avoid paying for idle GPUs, and scales them back up quickly (fast cold starts) when requests resume.
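
Fine-tuning starts with training data. The project's documentation, as we understand it, expects a CSV file with two columns, `prompt` and `response`; treat that format as an assumption and confirm it against the current docs. The helper below (`to_finetune_csv` is a hypothetical name) serializes pairs into that shape using only the standard library.

```python
import csv
import io
from typing import Iterable, Tuple


def to_finetune_csv(pairs: Iterable[Tuple[str, str]]) -> str:
    """Hypothetical helper: serialize (prompt, response) pairs as CSV text.

    Assumes the prompt/response two-column layout described in the docs.
    """
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["prompt", "response"])  # header row
    for prompt, response in pairs:
        if not prompt or not response:
            raise ValueError("prompt and response must both be non-empty")
        writer.writerow([prompt, response])  # csv handles commas/quotes
    return buf.getvalue()
```

The docs describe passing such a file to `FineTune.create` through its `training_file` argument; check them for which file locations are accepted. Using the `csv` module rather than string concatenation ensures that prompts containing commas, quotes, or newlines are quoted correctly.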

Maintenance & Community

Developed by Scale AI.
Links to documentation and blog posts are provided. No direct community links (Discord/Slack) are mentioned in the README.

Licensing & Compatibility

The README does not explicitly state a license.

Limitations & Caveats

Kubernetes installation documentation for self-hosted deployments is still in progress; the current documentation focuses primarily on Scale's hosted infrastructure.

Health Check

Last Commit: 1 day ago

Responsiveness: 1 week

Star History: 2 stars in the last 30 days