FastChat by lm-sys

Open platform for training, serving, and evaluating LLM-based chatbots

Created 3 years ago
39,444 stars

Top 0.8% on SourcePulse

Project Summary

FastChat provides an open platform for training, serving, and evaluating large language model (LLM) based chatbots. It is the engine behind Chatbot Arena, a popular platform for comparing LLM performance, and offers tools for researchers and developers to deploy and benchmark their own models.

How It Works

FastChat employs a distributed architecture for serving LLMs, comprising a controller, model workers, and a web server. This design allows for scalable deployment of multiple models and provides OpenAI-compatible RESTful APIs for seamless integration. It supports various inference backends and quantization methods for efficient deployment.
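
Because the API is OpenAI-compatible, any client that speaks the chat completions schema can talk to a FastChat deployment. A minimal sketch of the request and response shapes, assuming a local server on port 8000 and an illustrative model name (substitute whatever model your worker actually serves):

```python
import json

# Illustrative local endpoint; FastChat's API server mirrors the OpenAI schema.
API_URL = "http://localhost:8000/v1/chat/completions"  # assumed deployment

def build_request(model: str, user_message: str) -> str:
    """Build a chat-completion request body in the OpenAI schema."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }
    return json.dumps(payload)

def extract_reply(response: dict) -> str:
    """Pull the assistant text out of an OpenAI-schema response."""
    return response["choices"][0]["message"]["content"]

# A response shaped like what the server returns.
sample_response = {
    "choices": [{"message": {"role": "assistant", "content": "Hello!"}}]
}
body = build_request("vicuna-7b-v1.5", "Say hello")
print(extract_reply(sample_response))  # Hello!
```

The same payload works against the hosted OpenAI API, which is the point of the compatibility layer: existing client code needs only a different base URL.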

Quick Start & Requirements

  • Install: pip3 install "fschat[model_worker,webui]" or from source.
  • Prerequisites: Python 3.x, PyTorch. GPU with CUDA is recommended for performance. Specific models may require transformers>=4.31 for 16K context.
  • Resources: Vicuna-7B requires ~14GB GPU VRAM; Vicuna-13B requires ~28GB. 8-bit quantization reduces memory by ~50%.
  • Docs: FastChat, Demo, Chatbot Arena
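
The VRAM figures above follow roughly from weight storage alone. A back-of-envelope sketch of the arithmetic (activations and KV cache add overhead on top, so treat these as lower bounds):

```python
# Weights dominate LLM memory use: (parameter count) x (bytes per parameter).
def weight_vram_gb(n_params_billion: float, bytes_per_param: float) -> float:
    """Estimate GPU memory needed for model weights alone, in GB."""
    return n_params_billion * bytes_per_param  # 1e9 params x bytes / 1e9 B/GB

fp16 = weight_vram_gb(7, 2)   # Vicuna-7B in fp16 -> ~14 GB
int8 = weight_vram_gb(7, 1)   # 8-bit quantization halves it -> ~7 GB
print(fp16, int8)  # 14 7
```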

Highlighted Details

  • Powers Chatbot Arena, serving 10M+ requests and collecting 1.5M+ human votes for LLM Elo rankings.
  • Supports a wide range of LLMs including Vicuna, Llama 2, Falcon, Mistral, and API-based models (OpenAI, Anthropic, Gemini).
  • Offers OpenAI-compatible RESTful APIs for easy integration.
  • Includes MT-Bench for multi-turn evaluation and LMSYS-Chat-1M dataset.
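
Arena-style rankings turn pairwise human votes into a leaderboard. A minimal sketch of the classic Elo update (the textbook formula, not necessarily the exact rating method Chatbot Arena uses):

```python
# Standard Elo: each vote shifts ratings toward the observed outcome,
# weighted by how surprising that outcome was given the current ratings.
def elo_update(r_a: float, r_b: float, a_won: bool, k: float = 32.0):
    """Return updated (rating_a, rating_b) after one pairwise vote."""
    expected_a = 1 / (1 + 10 ** ((r_b - r_a) / 400))  # win probability for A
    score_a = 1.0 if a_won else 0.0
    delta = k * (score_a - expected_a)
    return r_a + delta, r_b - delta

# Two equally rated models; A wins the vote and gains k/2 = 16 points.
print(elo_update(1000, 1000, True))  # (1016.0, 984.0)
```

Zero-sum by construction: points gained by the winner equal points lost by the loser, so the rating pool stays constant as votes accumulate.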

Maintenance & Community

  • Actively developed by LMSYS Org.
  • Community support via Discord.
  • X handle: @lmsysorg

Licensing & Compatibility

  • Code is released under the Apache 2.0 license. Model weights (e.g., Vicuna) are subject to their base model licenses (e.g., the Llama 2 license).
  • Commercial use of model weights depends on their respective licenses.

Limitations & Caveats

  • 8-bit quantization may slightly degrade model quality. CPU offloading is Linux-only and requires bitsandbytes.
  • Performance can vary significantly based on hardware and chosen inference backend (e.g., vLLM integration for higher throughput).

Health Check

  • Last commit: 10 months ago
  • Responsiveness: 1 week
  • Pull requests (30d): 33
  • Issues (30d): 10
  • Star history: 114 stars in the last 30 days

Explore Similar Projects

JittorLLMs by Jittor: low-resource LLM inference library

  • 2k stars; created 3 years ago; updated 1 year ago
  • Starred by Chip Huyen (author of "AI Engineering" and "Designing Machine Learning Systems")

AstrBot by AstrBotDevs: LLM chatbot/framework for multiple platforms

  • 30k stars; created 3 years ago; updated 2 hours ago
  • Starred by Chip Huyen (author of "AI Engineering" and "Designing Machine Learning Systems") and Yaowei Zheng (author of LLaMA-Factory)