FastChat provides an open platform for training, serving, and evaluating large language model (LLM) based chatbots. It is the engine behind Chatbot Arena, a popular platform for comparing LLM performance, and offers tools for researchers and developers to deploy and benchmark their own models.
How It Works
FastChat employs a distributed architecture for serving LLMs: a controller tracks and routes requests to one or more model workers, while a web server provides the user-facing interface. This design scales to serving multiple models at once, and an OpenAI-compatible RESTful API server can be run against the same workers for seamless integration. Various inference backends and quantization methods are supported for efficient deployment.
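A minimal sketch of that layout, using FastChat's documented serving commands (the lmsys/vicuna-7b-v1.5 model path is illustrative, and host/port values are defaults you can change); run each command in its own process:

# 1. Controller: keeps a registry of model workers and routes requests to them
python3 -m fastchat.serve.controller
# 2. One worker per model; each worker registers itself with the controller
python3 -m fastchat.serve.model_worker --model-path lmsys/vicuna-7b-v1.5
# 3. User-facing Gradio web UI
python3 -m fastchat.serve.gradio_web_server
# 4. Optional: OpenAI-compatible REST API server in front of the same workers
python3 -m fastchat.serve.openai_api_server --host localhost --port 8000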
Quick Start & Requirements
- Install: pip3 install "fschat[model_worker,webui]", or install from source (see the quick-start sketch after this list).
- Prerequisites: Python 3.x and PyTorch; a GPU with CUDA is recommended for performance. Models with 16K context windows require transformers>=4.31.
- Resources: Vicuna-7B requires ~14GB GPU VRAM; Vicuna-13B requires ~28GB. 8-bit quantization reduces memory by ~50%.
- Docs: FastChat, Demo, Chatbot Arena
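A single-machine quick-start sketch, assuming the pip install above and the illustrative model path lmsys/vicuna-7b-v1.5 (any supported model path should work; weights are downloaded on first run):

pip3 install "fschat[model_worker,webui]"
# Interactive chat in the terminal on a GPU
python3 -m fastchat.serve.cli --model-path lmsys/vicuna-7b-v1.5
# Or run on CPU only (much slower)
python3 -m fastchat.serve.cli --model-path lmsys/vicuna-7b-v1.5 --device cpu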
Highlighted Details
- Powers Chatbot Arena, serving 10M+ requests and collecting 1.5M+ human votes for LLM Elo rankings.
- Supports a wide range of LLMs including Vicuna, Llama 2, Falcon, Mistral, and API-based models (OpenAI, Anthropic, Gemini).
- Offers OpenAI-compatible RESTful APIs for easy integration (example request after this list).
- Includes MT-Bench for multi-turn evaluation and LMSYS-Chat-1M dataset.
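With the OpenAI-compatible API server running (see the serving sketch above), any OpenAI-style client can target it. An example request, assuming the server listens on localhost:8000 and a Vicuna worker is registered under the name vicuna-7b-v1.5:

curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "vicuna-7b-v1.5", "messages": [{"role": "user", "content": "Hello!"}]}'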
Maintenance & Community
- Actively developed by LMSYS Org.
- Community support via Discord.
- X handle: @lmsysorg
Licensing & Compatibility
- Code is typically under Apache 2.0. Model weights (e.g., Vicuna) are subject to their base model licenses (e.g., Llama 2 license).
- Commercial use of model weights depends on their respective licenses.
Limitations & Caveats
- 8-bit quantization may slightly degrade model quality. CPU offloading is Linux-only and requires bitsandbytes (see the flag sketch below).
- Performance can vary significantly based on hardware and chosen inference backend (e.g., vLLM integration for higher throughput).
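A sketch of the memory-saving flags mentioned above, shown on the CLI entry point (the same flags apply to the model worker); --cpu-offloading requires --load-8bit, Linux, and an installed bitsandbytes, and the model path is illustrative:

# 8-bit quantization: roughly halves GPU memory use at a small quality cost
python3 -m fastchat.serve.cli --model-path lmsys/vicuna-13b-v1.5 --load-8bit
# Additionally offload weights that do not fit in GPU memory to CPU RAM (Linux only)
python3 -m fastchat.serve.cli --model-path lmsys/vicuna-13b-v1.5 --load-8bit --cpu-offloading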