FastChat by lm-sys

Open platform for training, serving, and evaluating LLM-based chatbots

Created 2 years ago
39,112 stars

Top 0.8% on SourcePulse

Project Summary

FastChat provides an open platform for training, serving, and evaluating large language model (LLM) based chatbots. It is the engine behind Chatbot Arena, a popular platform for comparing LLM performance, and offers tools for researchers and developers to deploy and benchmark their own models.

How It Works

FastChat employs a distributed architecture for serving LLMs, comprising a controller, model workers, and a web server. This design allows for scalable deployment of multiple models and provides OpenAI-compatible RESTful APIs for seamless integration. It supports various inference backends and quantization methods for efficient deployment.
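
As a rough illustration, once the controller, a model worker, and the OpenAI-compatible API server are running, the standard openai Python client can talk to the stack. The sketch below assumes the API server listens on http://localhost:8000/v1 and that the worker serves vicuna-7b-v1.5; both are illustrative defaults, not guaranteed by this summary.

```python
# Minimal sketch: query a locally running FastChat stack through its
# OpenAI-compatible REST API using the standard `openai` client.
# Assumptions (illustrative, not guaranteed here): controller, model worker,
# and API server are already running; the API server is on port 8000; the
# worker serves "vicuna-7b-v1.5".
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # FastChat's OpenAI-compatible endpoint
    api_key="EMPTY",                      # local servers typically ignore the key
)

response = client.chat.completions.create(
    model="vicuna-7b-v1.5",
    messages=[{"role": "user", "content": "Summarize what FastChat does in one sentence."}],
    temperature=0.7,
)
print(response.choices[0].message.content)
```

Because the server mirrors the OpenAI request/response schema, existing OpenAI-based clients generally only need their base URL changed to point at FastChat.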

Quick Start & Requirements

  • Install: pip3 install "fschat[model_worker,webui]" or from source (a minimal local-inference sketch follows this list).
  • Prerequisites: Python 3.x, PyTorch. GPU with CUDA is recommended for performance. Specific models may require transformers>=4.31 for 16K context.
  • Resources: Vicuna-7B requires ~14GB GPU VRAM; Vicuna-13B requires ~28GB. 8-bit quantization reduces memory by ~50%.
  • Docs: FastChat, Demo, Chatbot Arena
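
A minimal local-inference sketch, assuming a GPU with enough VRAM (see the resource notes above) and that fastchat.model.load_model and get_conversation_template behave as in the project's Hugging Face API example; exact keyword arguments may differ across versions.

```python
# Minimal sketch of direct (non-server) inference with FastChat's helpers.
# Assumptions: helper names follow the project's Hugging Face API example;
# "lmsys/vicuna-7b-v1.5" and the generation settings are illustrative, and
# exact signatures may vary between releases.
import torch
from fastchat.model import load_model, get_conversation_template

model_path = "lmsys/vicuna-7b-v1.5"
model, tokenizer = load_model(model_path, device="cuda", num_gpus=1)

# Build a prompt in the model's expected conversation format.
conv = get_conversation_template(model_path)
conv.append_message(conv.roles[0], "What is FastChat?")
conv.append_message(conv.roles[1], None)
prompt = conv.get_prompt()

inputs = tokenizer([prompt], return_tensors="pt").to("cuda")
with torch.inference_mode():
    output_ids = model.generate(
        **inputs, max_new_tokens=128, do_sample=True, temperature=0.7
    )
# Decode only the newly generated tokens.
print(tokenizer.decode(output_ids[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```

Passing load_8bit=True to load_model (where the installed version supports it) is the programmatic counterpart of the ~50% memory reduction from 8-bit quantization noted above.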

Highlighted Details

  • Powers Chatbot Arena, serving 10M+ requests and collecting 1.5M+ human votes for LLM Elo rankings.
  • Supports a wide range of LLMs including Vicuna, Llama 2, Falcon, Mistral, and API-based models (OpenAI, Anthropic, Gemini).
  • Offers OpenAI-compatible RESTful APIs for easy integration.
  • Includes MT-Bench for multi-turn evaluation and the LMSYS-Chat-1M dataset (a loading sketch follows this list).
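
As a rough illustration of working with the released data, the sketch below streams LMSYS-Chat-1M from the Hugging Face Hub; it assumes the dataset lives at lmsys/lmsys-chat-1m, that its access terms have been accepted, and that a Hugging Face token is configured.

```python
# Minimal sketch: peek at a few LMSYS-Chat-1M conversations.
# Assumptions: the dataset is hosted at "lmsys/lmsys-chat-1m" on the
# Hugging Face Hub, access terms have been accepted, and `huggingface-cli
# login` (or HF_TOKEN) has been set up.
from datasets import load_dataset

dataset = load_dataset("lmsys/lmsys-chat-1m", split="train", streaming=True)

for i, example in enumerate(dataset):
    print(example)   # each record holds one multi-turn conversation
    if i >= 2:       # look at the first three records only
        break
```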

Maintenance & Community

  • Actively developed by LMSYS Org.
  • Community support via Discord.
  • X handle: @lmsysorg

Licensing & Compatibility

  • Code is released under the Apache 2.0 license. Model weights (e.g., Vicuna) are subject to their base model licenses (e.g., the Llama 2 license).
  • Commercial use of model weights depends on their respective licenses.

Limitations & Caveats

  • 8-bit quantization may slightly degrade model quality. CPU offloading is Linux-only and requires bitsandbytes.
  • Performance can vary significantly based on hardware and chosen inference backend (e.g., vLLM integration for higher throughput).

Health Check

  • Last Commit: 3 months ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 3
  • Issues (30d): 3
  • Star History: 161 stars in the last 30 days
