llm-numbers by ray-project

LLM developer's reference for key numbers

Created 2 years ago
4,257 stars

Top 11.5% on SourcePulse

View on GitHub
Project Summary

This repository provides a curated list of essential numerical benchmarks and cost-performance ratios for Large Language Model (LLM) development. It aims to equip LLM developers with the data needed for efficient back-of-the-envelope calculations, guiding decisions on model selection, prompt engineering, and infrastructure choices to optimize cost and performance.

How It Works

The project compiles key figures related to LLM usage, such as token-to-word ratios, cost differentials between various OpenAI models (GPT-4 vs. GPT-3.5 Turbo, embeddings vs. generation), and self-hosting versus API costs. It also details GPU memory requirements for inference and provides estimates for training and fine-tuning costs, offering practical insights into the economics of LLM deployment.
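
The figures are meant to plug straight into quick estimates. As one illustration, here is a minimal sketch of our own (not code from the repository) that approximates serving memory from the parameter count, assuming 16-bit weights at 2 bytes per parameter plus an assumed overhead factor for activations and the KV cache:

```python
def serving_memory_gb(params_billions: float,
                      bytes_per_param: int = 2,   # fp16/bf16 weights
                      overhead: float = 1.2) -> float:
    """Rough GPU memory (GB) needed to serve a model: weights plus a
    crude allowance for activations and KV cache. The overhead factor
    is an assumption for this sketch, not a figure from the repo."""
    return params_billions * bytes_per_param * overhead

# A 13B-parameter model in fp16 needs 13 * 2 = 26 GB for weights
# alone, ~31 GB with overhead; more than a 24 GB A10G, so it calls
# for an A100-class GPU or sharding across several cards.
print(f"{serving_memory_gb(13):.0f} GB")
```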

Quick Start & Requirements

This repository is a documentation resource, not a software package. No installation or execution is required.

Highlighted Details

  • GPT-4 is approximately 50x more expensive than GPT-3.5 Turbo for inference (see the cost sketch after this list).
  • Using vector stores for lookups is roughly 5x cheaper than querying GPT-3.5 Turbo.
  • Self-hosting embeddings can be ~10x cheaper than using OpenAI's embedding API.
  • Serving a fine-tuned model on OpenAI costs 6x more than serving a base model.
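
To see the arithmetic behind a headline ratio like the ~50x figure, here is a minimal sketch using placeholder per-1K-token prices in the style of OpenAI's May 2023 price list. These are assumptions for illustration, not current pricing, and the exact multiplier shifts with the context-window tier and the prompt/completion mix (which is why this particular mix lands below 50x):

```python
# Placeholder per-1K-token prices (assumed, in the style of May 2023).
GPT_4_PROMPT, GPT_4_COMPLETION = 0.03, 0.06   # USD per 1K tokens (8K tier)
GPT_35_TURBO = 0.002                          # USD per 1K tokens (flat rate)

def request_cost(prompt_toks: int, completion_toks: int,
                 prompt_price: float, completion_price: float) -> float:
    """Cost of one request in USD, given per-1K-token prices."""
    return (prompt_toks * prompt_price
            + completion_toks * completion_price) / 1000

# One request with a 1,000-token prompt and a 1,000-token completion:
gpt4 = request_cost(1000, 1000, GPT_4_PROMPT, GPT_4_COMPLETION)
gpt35 = request_cost(1000, 1000, GPT_35_TURBO, GPT_35_TURBO)
print(f"GPT-4: ${gpt4:.3f}, GPT-3.5 Turbo: ${gpt35:.3f}, "
      f"ratio: {gpt4 / gpt35:.1f}x")
```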

Maintenance & Community

Last updated May 17, 2023. Contributions are welcome via GitHub issues or pull requests. The project is associated with Anyscale and the Ray ecosystem. Community discussion happens on the Ray Slack (#LLM channel) and the Ray Discuss forum.

Licensing & Compatibility

The repository content is not explicitly licensed. The associated Ray project is Apache 2.0 licensed.

Limitations & Caveats

The numbers reflect OpenAI's published pricing as of the snapshot date (May 2023) and are subject to change. Some figures, such as self-hosted embedding costs, are noted as sensitive to load and batch size. The cost to train a 13B-parameter model is a highly idealized estimate.
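
For context on where such a training estimate comes from: a common rule of thumb is that training compute is roughly 6 * parameters * training tokens in FLOPs; dividing by an assumed effective cluster throughput and multiplying by a GPU-hour price yields an idealized cost. A minimal sketch with assumed A100 throughput, utilization, and rental price (placeholders, not the repo's exact inputs):

```python
def training_cost_usd(params: float, tokens: float,
                      peak_flops: float = 312e12,    # A100 bf16 peak, FLOP/s
                      utilization: float = 0.45,     # assumed effective MFU
                      usd_per_gpu_hour: float = 4.0  # assumed on-demand price
                      ) -> float:
    """Idealized training cost from the ~6*N*D FLOPs rule of thumb.
    Real runs cost more: failures, restarts, and imperfect scaling."""
    total_flops = 6 * params * tokens
    gpu_hours = total_flops / (peak_flops * utilization) / 3600
    return gpu_hours * usd_per_gpu_hour

# 13B parameters on 1.4T tokens lands in the high six figures under
# these assumptions, consistent with a "roughly $1M" idealized estimate.
print(f"~${training_cost_usd(13e9, 1.4e12):,.0f}")
```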

Health Check

  • Last commit: 1 year ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0

Star History

8 stars in the last 30 days

Explore Similar Projects

Starred by Chip Huyen (author of "AI Engineering" and "Designing Machine Learning Systems").

MobiLlama by mbzuai-oryx

0% · 660 stars
Small language model for edge devices
Created 1 year ago
Updated 4 months ago
Starred by Chip Huyen (author of "AI Engineering", "Designing Machine Learning Systems"), Elie Bursztein (Cybersecurity Lead at Google DeepMind), and 9 more.

RouteLLM by lm-sys

0.3% · 4k stars
Framework for LLM routing and cost reduction (research paper)
Created 1 year ago
Updated 1 year ago
Starred by Jason Knight (Director AI Compilers at NVIDIA; cofounder of OctoML), Omar Sanseviero (DevRel at Google DeepMind), and 11 more.

mistral.rs by EricLBuehler

0.3% · 6k stars
LLM inference engine for blazing-fast performance
Created 1 year ago
Updated 22 hours ago