FastChat by lm-sys

Open platform for training, serving, and evaluating LLM-based chatbots

created 2 years ago
38,917 stars

Top 0.8% on sourcepulse

View on GitHub
Project Summary

FastChat provides an open platform for training, serving, and evaluating large language model (LLM) based chatbots. It is the engine behind Chatbot Arena, a popular platform for comparing LLM performance, and offers tools for researchers and developers to deploy and benchmark their own models.

How It Works

FastChat serves LLMs through a distributed architecture: a controller that keeps a registry of model workers and routes requests, model workers that host the models, and front-end servers (a Gradio web UI and an OpenAI-compatible REST API server). This design scales to multiple models served behind a single endpoint, and the OpenAI-compatible RESTful API makes integration with existing client tooling straightforward. Several inference backends and quantization methods are supported for efficient deployment.
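
The three components run as separate processes. A minimal sketch of bringing the stack up, assuming a local single-GPU setup and the module entry points documented in the FastChat README (the Vicuna model path, port, and sleep intervals are illustrative placeholders to adapt):

    import subprocess
    import time

    # Controller: keeps a registry of model workers and routes requests to them.
    procs = [subprocess.Popen(["python3", "-m", "fastchat.serve.controller"])]
    time.sleep(5)  # let the controller come up before workers try to register

    # Model worker: hosts one model; start several (with different models/ports)
    # to serve multiple models behind the same controller.
    procs.append(subprocess.Popen([
        "python3", "-m", "fastchat.serve.model_worker",
        "--model-path", "lmsys/vicuna-7b-v1.5",
    ]))
    time.sleep(60)  # the first run downloads and loads the weights, which takes a while

    # OpenAI-compatible REST API server: exposes the registered models to clients.
    procs.append(subprocess.Popen([
        "python3", "-m", "fastchat.serve.openai_api_server",
        "--host", "localhost", "--port", "8000",
    ]))

In practice each command is usually run in its own terminal or under a process manager; subprocess is used here only to keep the example self-contained.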

Quick Start & Requirements

  • Install: pip3 install "fschat[model_worker,webui]" or from source.
  • Prerequisites: Python 3.x, PyTorch. GPU with CUDA is recommended for performance. Specific models may require transformers>=4.31 for 16K context.
  • Resources: Vicuna-7B requires ~14GB GPU VRAM; Vicuna-13B requires ~28GB. 8-bit quantization reduces memory by ~50%.
  • Docs: FastChat, Demo, Chatbot Arena

Highlighted Details

  • Powers Chatbot Arena, serving 10M+ requests and collecting 1.5M+ human votes for LLM Elo rankings.
  • Supports a wide range of LLMs including Vicuna, Llama 2, Falcon, Mistral, and API-based models (OpenAI, Anthropic, Gemini).
  • Offers OpenAI-compatible RESTful APIs for easy integration (see the client sketch after this list).
  • Includes MT-Bench, a multi-turn benchmark for evaluating chat models, and the LMSYS-Chat-1M conversation dataset.
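
Because the API mirrors OpenAI's, the standard openai Python client can talk to a FastChat deployment by pointing it at the local endpoint. A sketch, assuming the serving stack is running on localhost:8000 with a Vicuna worker registered under the name vicuna-7b-v1.5 (adjust both to your deployment):

    from openai import OpenAI

    # FastChat's OpenAI-compatible server does not check API keys by default,
    # but the client still requires a non-empty string.
    client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

    response = client.chat.completions.create(
        model="vicuna-7b-v1.5",  # must match a model name registered with the controller
        messages=[{"role": "user", "content": "Explain what FastChat does in one sentence."}],
        temperature=0.7,
    )
    print(response.choices[0].message.content)

The same server also exposes completion and embedding routes compatible with most OpenAI-client tooling.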

Maintenance & Community

  • Actively developed by LMSYS Org.
  • Community support via Discord.
  • X handle: @lmsysorg

Licensing & Compatibility

  • Code is released under the Apache 2.0 license. Model weights (e.g., Vicuna) remain subject to the licenses of their base models (e.g., the Llama 2 Community License).
  • Commercial use of model weights depends on their respective licenses.

Limitations & Caveats

  • 8-bit quantization may slightly degrade output quality; CPU offloading is Linux-only and requires bitsandbytes (see the sketch after this list).
  • Throughput and latency vary significantly with hardware and the chosen inference backend; the optional vLLM worker integration provides higher throughput than the default Hugging Face Transformers worker.
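
As a concrete illustration of the trade-off in the first bullet, 8-bit loading is enabled with a flag on the model worker (or the single-process CLI). The flag names below follow the FastChat README; verify them against your installed version, and treat the model path as a placeholder:

    import subprocess

    # Model worker with 8-bit quantization: roughly halves GPU memory at a small
    # quality cost. Add "--cpu-offloading" (Linux only, requires bitsandbytes)
    # to spill weights that do not fit on the GPU into CPU RAM.
    subprocess.run([
        "python3", "-m", "fastchat.serve.model_worker",
        "--model-path", "lmsys/vicuna-7b-v1.5",
        "--load-8bit",
    ])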

Health Check

  • Last commit: 2 months ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 1

Star History

  • 568 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen (author of AI Engineering and Designing Machine Learning Systems), Philipp Schmid (DevRel at Google DeepMind), and 2 more.

LightLLM by ModelTC

  • Top 0.7%, 3k stars
  • Python framework for LLM inference and serving
  • Created 2 years ago, updated 11 hours ago

Starred by Patrick von Platen (core contributor to Hugging Face Transformers and Diffusers), Michael Han (cofounder of Unsloth), and 1 more.

ktransformers by kvcache-ai

  • Top 0.4%, 15k stars
  • Framework for LLM inference optimization experimentation
  • Created 1 year ago, updated 2 days ago