llm-engine by scaleapi

Open-source engine for fine-tuning and serving LLMs

Created 2 years ago
820 stars

Top 43.3% on SourcePulse

Project Summary

Scale LLM Engine provides an open-source solution for fine-tuning and serving large language models (LLMs). It targets developers and ML engineers seeking to customize and deploy models like LLaMA, MPT, and Falcon, offering both a hosted API via Scale and self-hosted deployment on Kubernetes. The engine aims to simplify LLM operations, reduce costs, and improve inference performance.

How It Works

LLM Engine offers a Python library and CLI for interacting with LLMs while abstracting away infrastructure complexity. It can deploy Hugging Face models with a single command and provides optimized inference with streaming responses and dynamic batching. For self-hosting, it uses Helm charts to deploy on Kubernetes. It also supports fine-tuning on custom data and scales idle models to zero to save costs.
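Dynamic batching groups requests that arrive close together so the model can serve them in one pass instead of one at a time. The following is a minimal, request-level sketch of the idea under stated assumptions. It is not LLM Engine's implementation (production LLM servers often batch at the token level instead), and the class and parameter names (`DynamicBatcher`, `max_batch_size`, `max_wait_s`) are hypothetical.

```python
import time
from collections import deque
from typing import Callable, Deque, List, Tuple


class DynamicBatcher:
    """Hypothetical request-level batcher (not LLM Engine's code).

    A batch is released when it reaches max_batch_size, or when the oldest
    queued request has waited at least max_wait_s seconds.
    """

    def __init__(self, max_batch_size: int, max_wait_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        if max_batch_size < 1:
            raise ValueError("max_batch_size must be >= 1")
        self.max_batch_size = max_batch_size
        self.max_wait_s = max_wait_s
        self.clock = clock  # injectable so the timing logic can be tested
        self._queue: Deque[Tuple[float, str]] = deque()  # (enqueue time, prompt)

    def submit(self, prompt: str) -> None:
        """Queue a prompt, recording when it arrived."""
        self._queue.append((self.clock(), prompt))

    def ready(self) -> bool:
        """True if the queue is full enough or the oldest request waited long enough."""
        if not self._queue:
            return False
        if len(self._queue) >= self.max_batch_size:
            return True
        oldest_enqueued, _ = self._queue[0]
        return self.clock() - oldest_enqueued >= self.max_wait_s

    def next_batch(self) -> List[str]:
        """Pop up to max_batch_size prompts if a batch is ready, else return []."""
        if not self.ready():
            return []
        batch: List[str] = []
        while self._queue and len(batch) < self.max_batch_size:
            batch.append(self._queue.popleft()[1])
        return batch
```

The two knobs trade latency for throughput: a larger `max_batch_size` improves GPU utilization, while `max_wait_s` caps how long a lone request can sit waiting for company.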

Quick Start & Requirements

  • Install via pip: pip install scale-llm-engine
  • Requires an API key from Scale Spellbook, set as the SCALE_API_KEY environment variable.
  • The README includes example usage with the Python client.
  • Documentation: https://docs.scale.com/llm-engine/
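A minimal sketch of the quick-start flow, assuming the `llmengine` package's `Completion.create` interface (with `model`, `prompt`, `max_new_tokens`, and `temperature`, returning a response whose text is at `response.output.text`). The helper names `require_api_key` and `complete` are hypothetical, and `llama-2-7b` is only an example model name; check the documentation for the models currently offered.

```python
import os
from typing import Mapping, Optional


def require_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Hypothetical helper: fail fast if SCALE_API_KEY is missing or blank."""
    env = os.environ if env is None else env
    key = env.get("SCALE_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "SCALE_API_KEY is not set; create an API key in Scale Spellbook and export it"
        )
    return key


def complete(prompt: str, model: str = "llama-2-7b") -> str:
    """Hypothetical helper: send one prompt to a hosted model via the llmengine client."""
    require_api_key()
    # Imported lazily so this module loads even before `pip install scale-llm-engine`.
    from llmengine import Completion

    response = Completion.create(
        model=model,  # example model name (assumption); see the docs for available models
        prompt=prompt,
        max_new_tokens=100,
        temperature=0.2,
    )
    return response.output.text
```

Checking the key up front turns a missing environment variable into an explicit error before any network call is attempted.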

Highlighted Details

  • Supports popular open-source models (LLaMA, MPT, Falcon) and any Hugging Face model.
  • Offers fine-tuning capabilities on user-provided data.
  • Features optimized inference with streaming and dynamic batching for higher throughput and lower latency.
  • Scales models to zero when inactive, with fast cold-start times when requests resume.
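Scale-to-zero trades idle cost for a cold start: once a model has been idle long enough, all replicas are released, and the next request must wait for one to start. The following is a minimal sketch of such a policy under stated assumptions. It is not LLM Engine's autoscaler, and `ScalePolicy`, `desired_replicas`, and their fields are hypothetical names.

```python
import math
from dataclasses import dataclass


@dataclass
class ScalePolicy:
    """Hypothetical autoscaling settings (not LLM Engine's configuration)."""
    idle_timeout_s: float       # release all replicas after this long without requests
    requests_per_replica: int   # target concurrent requests handled per replica
    max_replicas: int           # upper bound on replicas


def desired_replicas(policy: ScalePolicy, in_flight: int,
                     seconds_since_last_request: float) -> int:
    """Return the replica count a simple scale-to-zero autoscaler would request."""
    if in_flight == 0 and seconds_since_last_request >= policy.idle_timeout_s:
        return 0  # idle long enough: scale to zero and stop paying for GPUs
    # Traffic is recent or in flight: keep at least one replica warm,
    # and add replicas in proportion to concurrent load, up to the cap.
    needed = math.ceil(in_flight / policy.requests_per_replica)
    return min(policy.max_replicas, max(1, needed))
```

Going from 0 to 1 replica is the cold start; the faster it is, the shorter `idle_timeout_s` can be set without hurting latency for the first request after a quiet period.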

Maintenance & Community

  • Developed by Scale AI.
  • Links to documentation and blog posts are provided. No direct community links (Discord/Slack) are mentioned in the README.

Licensing & Compatibility

  • The README does not explicitly state a license.

Limitations & Caveats

Kubernetes installation documentation for self-hosted deployments is still being written; current documentation focuses primarily on Scale's hosted infrastructure.

Health Check

  • Last commit: 1 day ago
  • Responsiveness: 1 week
  • Pull requests (30d): 6
  • Issues (30d): 0
  • Star history: 2 stars in the last 30 days

Explore Similar Projects

Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), Elie Bursztein (Cybersecurity Lead at Google DeepMind), and 8 more.

h2o-llmstudio by h2oai

0.1%
5k
LLM Studio: framework for LLM fine-tuning via GUI or CLI
Created 2 years ago
Updated 3 weeks ago
Starred by Jeff Hammerbacher (Cofounder of Cloudera), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 3 more.

serve by pytorch

0.1%
4k
Serve, optimize, and scale PyTorch models in production
Created 6 years ago
Updated 5 months ago