llm-engine by scaleapi

Open-source engine for fine-tuning and serving LLMs

Created 2 years ago
811 stars

Top 43.6% on SourcePulse

Project Summary

Scale LLM Engine provides an open-source solution for fine-tuning and serving large language models (LLMs). It targets developers and ML engineers seeking to customize and deploy models like LLaMA, MPT, and Falcon, offering both a hosted API via Scale and self-hosted deployment on Kubernetes. The engine aims to simplify LLM operations, reduce costs, and improve inference performance.

How It Works

LLM Engine offers a Python library and CLI for interacting with LLMs that abstract away infrastructure complexity. It can deploy Hugging Face models with a single command and provides optimized inference with streaming responses and dynamic batching. For self-hosting, it ships Helm charts for Kubernetes deployments. It also supports fine-tuning on custom data and scales idle models down to zero to save costs.
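Scaling an idle model to zero trades idle GPU cost for a cold start on the next request. The sketch below illustrates that trade-off as a simple replica-count rule. It is not llm-engine's autoscaler: the class name `ScaleToZeroPolicy`, the idle timeout, the per-replica concurrency and the replica cap are all hypothetical values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class ScaleToZeroPolicy:
    """Hypothetical autoscaling rule, not llm-engine's implementation."""

    idle_timeout_s: float = 300.0   # assumed idle window before releasing all replicas
    requests_per_replica: int = 4   # assumed concurrency a single replica can absorb
    max_replicas: int = 8           # assumed upper bound on replicas

    def desired_replicas(self, in_flight: int, queued: int,
                         seconds_since_last_request: float) -> int:
        pending = in_flight + queued
        if pending == 0 and seconds_since_last_request >= self.idle_timeout_s:
            return 0  # idle long enough: scale to zero and stop paying for GPUs
        # Ceiling division: enough replicas to cover pending work.
        needed = -(-pending // self.requests_per_replica)
        # Keep at least one warm replica while traffic is recent, capped at max_replicas.
        return max(1, min(self.max_replicas, needed))
```

With these defaults, a model with no traffic for ten minutes drops to zero replicas, and the first new request (`queued=1`) asks for one replica again, which is the moment the cold-start cost is paid.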

Quick Start & Requirements

  • Install via pip: pip install scale-llm-engine
  • Requires an API key from Scale Spellbook, set as the SCALE_API_KEY environment variable.
  • The README includes an example of querying a model through the Python client.
  • Documentation: https://docs.scale.com/llm-engine/
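A minimal sketch of calling the hosted API from Python. `Completion.create` with `model`, `prompt`, `max_new_tokens` and `temperature` follows the README example, where `falcon-7b-instruct` is the model used (it may no longer be offered). The `stream=True` path and the shape of streamed chunks (an `.output` that may be `None`, carrying `.text`) are assumptions based on the client documentation, so check the current reference. The helper names `require_api_key`, `join_stream` and `complete` are hypothetical.

```python
import os
from typing import Iterable, Mapping


def require_api_key(env: Mapping[str, str] = os.environ) -> str:
    """Return SCALE_API_KEY, failing early with a clear message if it is unset."""
    key = env.get("SCALE_API_KEY", "")
    if not key:
        raise RuntimeError("Set the SCALE_API_KEY environment variable first.")
    return key


def join_stream(chunks: Iterable) -> str:
    """Concatenate streamed chunks, skipping chunks whose .output is None (assumed shape)."""
    return "".join(c.output.text for c in chunks if c.output is not None)


def complete(prompt: str, model: str = "falcon-7b-instruct", stream: bool = False) -> str:
    require_api_key()
    # Imported lazily so the helpers above work without scale-llm-engine installed.
    from llmengine import Completion

    response = Completion.create(
        model=model,  # model name from the README example; availability may change
        prompt=prompt,
        max_new_tokens=100,
        temperature=0.2,
        stream=stream,  # assumption: streaming returns an iterator of chunks
    )
    return join_stream(response) if stream else response.output.text
```

Usage: after `export SCALE_API_KEY=...`, `complete("List 3 quirky names for a pancake restaurant.")` returns the generated text, and passing `stream=True` assembles the same kind of result from streamed chunks.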

Highlighted Details

  • Supports popular open-source models (LLaMA, MPT, Falcon) and any Hugging Face model.
  • Offers fine-tuning capabilities on user-provided data.
  • Features optimized inference with streaming and dynamic batching for higher throughput and lower latency.
  • Scales models to zero when inactive to avoid paying for idle GPUs, and is designed for fast cold starts when requests return.
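Dynamic batching raises throughput by grouping requests that arrive close together into one forward pass, at the cost of a small bounded wait. The sketch below shows request-level dynamic batching with a size cap and a wait deadline. It is a generic illustration, not llm-engine's code; production LLM servers often go further with token-level continuous batching. The function name `next_batch` and its default limits are hypothetical.

```python
import queue
import time
from typing import List, TypeVar

T = TypeVar("T")


def next_batch(q: "queue.Queue[T]", max_batch_size: int = 8,
               max_wait_s: float = 0.01) -> List[T]:
    """Block for the first request, then gather more until the batch is full or the wait expires."""
    batch: List[T] = [q.get()]  # blocks until at least one request is available
    deadline = time.monotonic() + max_wait_s
    while len(batch) < max_batch_size:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break  # waited long enough; ship a partial batch to bound latency
        try:
            batch.append(q.get(timeout=remaining))
        except queue.Empty:
            break  # no more requests arrived before the deadline
    return batch
```

`max_batch_size` bounds memory per forward pass, while `max_wait_s` bounds the extra latency a request can incur while waiting for companions.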

Maintenance & Community

  • Developed by Scale AI.
  • Links to documentation and blog posts are provided. No direct community links (Discord/Slack) are mentioned in the README.

Licensing & Compatibility

  • The README does not explicitly state a license.

Limitations & Caveats

Kubernetes installation documentation for self-hosted deployments is still in progress; the current documentation focuses mainly on Scale's hosted infrastructure.

Health Check

  • Last commit: 1 day ago
  • Responsiveness: 1 week
  • Pull requests (30d): 10
  • Issues (30d): 0
  • Star history: 2 stars in the last 30 days

Explore Similar Projects

Starred by Didier Lopes (Founder of OpenBB), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 5 more.

mlx-lm by ml-explore

26.1%
2k
Python package for LLM text generation and fine-tuning on Apple silicon
Created 6 months ago
Updated 22 hours ago
Starred by Jeff Hammerbacher (Cofounder of Cloudera), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 3 more.

serve by pytorch

0.1%
4k
Serve, optimize, and scale PyTorch models in production
Created 6 years ago
Updated 1 month ago