llm-numbers by ray-project

LLM developer's reference for key numbers

Created 2 years ago
4,257 stars

Top 11.5% on SourcePulse

View on GitHub
Project Summary

This repository provides a curated list of essential numerical benchmarks and cost-performance ratios for Large Language Model (LLM) development. It aims to equip LLM developers with the data needed for efficient back-of-the-envelope calculations, guiding decisions on model selection, prompt engineering, and infrastructure choices to optimize cost and performance.

How It Works

The project compiles key figures related to LLM usage, such as token-to-word ratios, cost differentials between various OpenAI models (GPT-4 vs. GPT-3.5 Turbo, embeddings vs. generation), and self-hosting versus API costs. It also details GPU memory requirements for inference and provides estimates for training and fine-tuning costs, offering practical insights into the economics of LLM deployment.
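
The figures are meant to plug straight into quick estimates. As one illustration, here is a minimal sketch of our own (not code from the repository) that approximates serving memory from the parameter count, assuming 16-bit weights at 2 bytes per parameter plus an assumed overhead factor for activations and the KV cache:

```python
def serving_memory_gb(params_billions: float,
                      bytes_per_param: int = 2,   # fp16/bf16 weights
                      overhead: float = 1.2) -> float:
    """Rough GPU memory (GB) needed to serve a model: weights plus a
    crude allowance for activations and KV cache. The overhead factor
    is an assumption for this sketch, not a figure from the repo."""
    return params_billions * bytes_per_param * overhead

# A 13B-parameter model in fp16 needs 13 * 2 = 26 GB for weights
# alone, ~31 GB with overhead; more than a 24 GB A10G, so it calls
# for an A100-class GPU or sharding across several cards.
print(f"{serving_memory_gb(13):.0f} GB")
```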

Quick Start & Requirements

This repository is a documentation resource, not a software package. No installation or execution is required.

Highlighted Details

  • GPT-4 is approximately 50x more expensive than GPT-3.5 Turbo for inference (see the cost sketch after this list).
  • Using vector stores for lookups is roughly 5x cheaper than querying GPT-3.5 Turbo.
  • Self-hosting embeddings can be ~10x cheaper than using OpenAI's embedding API.
  • Serving a fine-tuned model on OpenAI costs 6x more than serving a base model.
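
To see the arithmetic behind a headline ratio like the ~50x figure, here is a minimal sketch using placeholder per-1K-token prices in the style of OpenAI's May 2023 price list. These are assumptions for illustration, not current pricing, and the exact multiplier shifts with the context-window tier and the prompt/completion mix (which is why this particular mix lands below 50x):

```python
# Placeholder per-1K-token prices (assumed, in the style of May 2023).
GPT_4_PROMPT, GPT_4_COMPLETION = 0.03, 0.06   # USD per 1K tokens (8K tier)
GPT_35_TURBO = 0.002                          # USD per 1K tokens (flat rate)

def request_cost(prompt_toks: int, completion_toks: int,
                 prompt_price: float, completion_price: float) -> float:
    """Cost of one request in USD, given per-1K-token prices."""
    return (prompt_toks * prompt_price
            + completion_toks * completion_price) / 1000

# One request with a 1,000-token prompt and a 1,000-token completion:
gpt4 = request_cost(1000, 1000, GPT_4_PROMPT, GPT_4_COMPLETION)
gpt35 = request_cost(1000, 1000, GPT_35_TURBO, GPT_35_TURBO)
print(f"GPT-4: ${gpt4:.3f}, GPT-3.5 Turbo: ${gpt35:.3f}, "
      f"ratio: {gpt4 / gpt35:.1f}x")
```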

Maintenance & Community

Last updated May 17, 2023. Contributions are welcome via GitHub issues or pull requests. The project is associated with Anyscale and the Ray ecosystem. Community discussion happens on the Ray Slack (#LLM channel) and the Ray Discuss forum.

Licensing & Compatibility

The repository content is not explicitly licensed. The associated Ray project is Apache 2.0 licensed.

Limitations & Caveats

The numbers reflect OpenAI's published pricing as of the snapshot date (May 2023) and are subject to change. Some figures, such as self-hosted embedding costs, are noted as sensitive to load and batch size. The cost to train a 13B-parameter model is a highly idealized estimate.
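
For context on where such a training estimate comes from: a common rule of thumb is that training compute is roughly 6 * parameters * training tokens in FLOPs; dividing by an assumed effective cluster throughput and multiplying by a GPU-hour price yields an idealized cost. A minimal sketch with assumed A100 throughput, utilization, and rental price (placeholders, not the repo's exact inputs):

```python
def training_cost_usd(params: float, tokens: float,
                      peak_flops: float = 312e12,    # A100 bf16 peak, FLOP/s
                      utilization: float = 0.45,     # assumed effective MFU
                      usd_per_gpu_hour: float = 4.0  # assumed on-demand price
                      ) -> float:
    """Idealized training cost from the ~6*N*D FLOPs rule of thumb.
    Real runs cost more: failures, restarts, and imperfect scaling."""
    total_flops = 6 * params * tokens
    gpu_hours = total_flops / (peak_flops * utilization) / 3600
    return gpu_hours * usd_per_gpu_hour

# 13B parameters on 1.4T tokens lands in the high six figures under
# these assumptions, consistent with a "roughly $1M" idealized estimate.
print(f"~${training_cost_usd(13e9, 1.4e12):,.0f}")
```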

Health Check

  • Last commit: 1 year ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0

Star History

8 stars in the last 30 days

Explore Similar Projects

Starred by Chip Huyen (author of "AI Engineering" and "Designing Machine Learning Systems").

MobiLlama by mbzuai-oryx

0% · 660 stars
Small language model for edge devices
Created 1 year ago
Updated 4 months ago
Starred by Chip Huyen (author of "AI Engineering", "Designing Machine Learning Systems"), Elie Bursztein (Cybersecurity Lead at Google DeepMind), and 9 more.

RouteLLM by lm-sys

0.3% · 4k stars
Framework for LLM routing and cost reduction (research paper)
Created 1 year ago
Updated 1 year ago
Starred by Jason Knight (Director AI Compilers at NVIDIA; cofounder of OctoML), Omar Sanseviero (DevRel at Google DeepMind), and 11 more.

mistral.rs by EricLBuehler

0.3% · 6k stars
LLM inference engine for blazing-fast performance
Created 1 year ago
Updated 22 hours ago