Fast-LLM by ServiceNow

Accelerating LLM training with PyTorch and Triton

Created 1 year ago
251 stars

Top 99.8% on SourcePulse

View on GitHub
Project Summary

Fast-LLM accelerates large language model training, targeting AI teams seeking enhanced speed, scalability, and flexibility. Developed by ServiceNow Research, it offers significant cost and time savings by optimizing training processes for models of all sizes, distinguishing itself from similarly named projects through its focus on LLM training efficiency.

How It Works

Fast-LLM is built on PyTorch and Triton, pairing hand-tuned GPU kernels with careful memory management for high performance. Its core approach combines 3D parallelism (data, tensor, and pipeline) with sequence-length parallelism, backed by ZeRO-1/2/3 sharding and mixed-precision training. Together, these techniques deliver high throughput and efficient scaling across multi-GPU and multi-node environments, reducing both training time and resource consumption.
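
As one concrete illustration of why ZeRO-style sharding matters, the generic sketch below (illustrative only, not Fast-LLM's internals) estimates per-GPU optimizer-state memory with and without ZeRO-1, which partitions the fp32 Adam moments across the data-parallel group:

    # Generic illustration, not Fast-LLM code: under ZeRO-1, each
    # data-parallel rank keeps the full model for forward/backward but
    # holds only 1/world_size of the fp32 Adam moments.
    import torch

    world_size = 8                                        # assumed DP group size
    params = [torch.randn(4096, 4096) for _ in range(4)]  # toy stand-in model

    numel = sum(p.numel() for p in params)
    full_state_bytes = 2 * numel * 4    # exp_avg + exp_avg_sq, fp32
    per_rank_bytes = full_state_bytes // world_size

    print(f"optimizer state, unsharded: {full_state_bytes / 2**20:.0f} MiB")
    print(f"per rank under ZeRO-1:      {per_rank_bytes / 2**20:.0f} MiB")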

Quick Start & Requirements

  • Installation: Pre-built Docker images are available; alternatively, install from source via pip: pip install --no-cache-dir -e "git+https://github.com/ServiceNow/Fast-LLM.git#egg=llm[CORE,OPTIONAL,DEV]".
  • Prerequisites: A Slurm or Kubernetes cluster with multiple DGX nodes (e.g., 4 nodes with 8 A100/H100 GPUs each). CUDA 12.1+, PyTorch, Triton, and Apex are required dependencies. Kubernetes deployments additionally need KubeFlow and locked-memory limits set to unlimited.
  • Setup: Configuration is managed via YAML files; example configurations for multi-node setups are provided (a hypothetical sketch follows this list).
  • Resources: Documentation and practical tutorials are in progress.
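
For orientation, a multi-node training config might look roughly like the sketch below. All key names are hypothetical placeholders, not Fast-LLM's actual schema; consult the provided example configurations for the real format. The values are illustrative, loosely mirroring the reference cluster above:

    # Hypothetical YAML sketch; key names are NOT Fast-LLM's real schema.
    model:
      base: mistral-7b
    training:
      batch_size: 32
      sequence_length: 8192
      precision: bf16        # mixed precision (assumed setting)
    distributed:
      nodes: 4
      gpus_per_node: 8
      tensor_parallel: 2     # illustrative parallelism split
      pipeline_parallel: 1
      zero_stage: 3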

Highlighted Details

  • Achieves high throughput: an expected 9,800 tokens/s per H100 for Mistral-7B training (batch size 32, sequence length 8k) on a 4-node cluster (a worked estimate follows this list).
  • Supports advanced parallelism techniques (3D and sequence-length) and ZeRO-3 for efficient distributed training.
  • Features an efficient dropless Mixture-of-Experts (MoE) implementation with state-of-the-art performance (see the routing sketch after this list).
  • Offers seamless integration with Hugging Face Transformers and a user-friendly YAML configuration system.
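
To put the throughput figure in context: on the 4-node reference cluster (32 H100s total), 9,800 tokens/s per GPU works out to roughly 9,800 × 32 ≈ 313,600 tokens/s in aggregate, i.e. about 27 billion tokens per day.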
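
"Dropless" here means routing works on variable-size per-expert token groups rather than a fixed capacity, so no token is ever discarded. A minimal top-1 routing sketch in plain PyTorch (illustrative only; Fast-LLM's fused, kernel-level implementation differs) could look like:

    # Illustrative dropless top-1 MoE routing, not Fast-LLM's implementation.
    import torch

    tokens = torch.randn(16, 32)                  # [num_tokens, hidden]
    num_experts = 4
    router = torch.nn.Linear(32, num_experts)
    experts = torch.nn.ModuleList(
        torch.nn.Linear(32, 32) for _ in range(num_experts)
    )

    probs = router(tokens).softmax(dim=-1)
    top_p, top_e = probs.max(dim=-1)              # top-1 expert per token

    out = torch.empty_like(tokens)
    for e in range(num_experts):
        mask = top_e == e                         # variable-size group:
        if mask.any():                            # nothing is dropped or padded
            out[mask] = top_p[mask].unsqueeze(-1) * experts[e](tokens[mask])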

Maintenance & Community

Developed transparently on GitHub by ServiceNow Research, the project welcomes contributions and collaboration. A public roadmap and issue tracking are maintained.

Licensing & Compatibility

Licensed under the Apache 2.0 License, Fast-LLM permits broad use, modification, and distribution, including for commercial purposes, without copyleft restrictions.

Limitations & Caveats

Certain features, such as customizable language model architectures, data loaders, loss functions, and optimizers, are noted as "in progress." Practical tutorials are also under development.

Health Check

  • Last Commit: 1 day ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 28
  • Issues (30d): 4
  • Star History: 28 stars in the last 30 days

Explore Similar Projects

Starred by Tri Dao (Chief Scientist at Together AI), Stas Bekman (Author of "Machine Learning Engineering Open Book"; Research Engineer at Snowflake), and 1 more.

oslo by tunib-ai

0% · 309 stars
Framework for large-scale transformer optimization
Created 3 years ago
Updated 3 years ago
Starred by Jeff Hammerbacher (Cofounder of Cloudera) and Stas Bekman (Author of "Machine Learning Engineering Open Book"; Research Engineer at Snowflake).

InternEvo by InternLM

0.2% · 407 stars
Lightweight training framework for model pre-training
Created 1 year ago
Updated 1 month ago
Starred by Jeff Hammerbacher (Cofounder of Cloudera), Stas Bekman (Author of "Machine Learning Engineering Open Book"; Research Engineer at Snowflake), and 25 more.

gpt-neox by EleutherAI

0.1% · 7k stars
Framework for training large-scale autoregressive language models
Created 4 years ago
Updated 2 weeks ago
Starred by Tobi Lutke (Cofounder of Shopify), Li Jiang (Coauthor of AutoGen; Engineer at Microsoft), and 27 more.

ColossalAI by hpcaitech

0.0% · 41k stars
AI system for large-scale parallel training
Created 4 years ago
Updated 1 day ago