Fast-LLM by ServiceNow

Accelerating LLM training with PyTorch and Triton

Created 1 year ago
251 stars

Top 99.8% on SourcePulse

View on GitHub
Project Summary

Fast-LLM accelerates large language model training, targeting AI teams seeking enhanced speed, scalability, and flexibility. Developed by ServiceNow Research, it offers significant cost and time savings by optimizing training processes for models of all sizes, distinguishing itself from similarly named projects through its focus on LLM training efficiency.

How It Works

Fast-LLM is built on PyTorch and Triton, pairing hand-tuned GPU kernels with careful memory management for high performance. Its core approach combines 3D parallelism (data, tensor, and pipeline) with sequence-length parallelism, backed by ZeRO-1/2/3 sharding and mixed-precision training. Together, these techniques deliver high throughput and efficient scaling across multi-GPU and multi-node environments, reducing both training time and resource consumption.
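
As one concrete illustration of why ZeRO-style sharding matters, the generic sketch below (illustrative only, not Fast-LLM's internals) estimates per-GPU optimizer-state memory with and without ZeRO-1, which partitions the fp32 Adam moments across the data-parallel group:

    # Generic illustration, not Fast-LLM code: under ZeRO-1, each
    # data-parallel rank keeps the full model for forward/backward but
    # holds only 1/world_size of the fp32 Adam moments.
    import torch

    world_size = 8                                        # assumed DP group size
    params = [torch.randn(4096, 4096) for _ in range(4)]  # toy stand-in model

    numel = sum(p.numel() for p in params)
    full_state_bytes = 2 * numel * 4    # exp_avg + exp_avg_sq, fp32
    per_rank_bytes = full_state_bytes // world_size

    print(f"optimizer state, unsharded: {full_state_bytes / 2**20:.0f} MiB")
    print(f"per rank under ZeRO-1:      {per_rank_bytes / 2**20:.0f} MiB")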

Quick Start & Requirements

  • Installation: Pre-built Docker images are available; alternatively, install from source via pip: pip install --no-cache-dir -e "git+https://github.com/ServiceNow/Fast-LLM.git#egg=llm[CORE,OPTIONAL,DEV]".
  • Prerequisites: A Slurm or Kubernetes cluster with multiple DGX nodes (e.g., 4 nodes with 8 A100/H100 GPUs each). CUDA 12.1+, PyTorch, Triton, and Apex are required dependencies. Kubernetes deployments additionally need KubeFlow and locked-memory limits set to unlimited.
  • Setup: Configuration is managed via YAML files; example configurations for multi-node setups are provided (a hypothetical sketch follows this list).
  • Resources: Documentation and practical tutorials are in progress.
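
For orientation, a multi-node training config might look roughly like the sketch below. All key names are hypothetical placeholders, not Fast-LLM's actual schema; consult the provided example configurations for the real format. The values are illustrative, loosely mirroring the reference cluster above:

    # Hypothetical YAML sketch; key names are NOT Fast-LLM's real schema.
    model:
      base: mistral-7b
    training:
      batch_size: 32
      sequence_length: 8192
      precision: bf16        # mixed precision (assumed setting)
    distributed:
      nodes: 4
      gpus_per_node: 8
      tensor_parallel: 2     # illustrative parallelism split
      pipeline_parallel: 1
      zero_stage: 3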

Highlighted Details

  • Achieves high throughput: an expected 9,800 tokens/s per H100 for Mistral-7B training (batch size 32, sequence length 8k) on a 4-node cluster (a worked estimate follows this list).
  • Supports advanced parallelism techniques (3D and sequence-length) and ZeRO-3 for efficient distributed training.
  • Features an efficient dropless Mixture-of-Experts (MoE) implementation with state-of-the-art performance (see the routing sketch after this list).
  • Offers seamless integration with Hugging Face Transformers and a user-friendly YAML configuration system.
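
To put the throughput figure in context: on the 4-node reference cluster (32 H100s total), 9,800 tokens/s per GPU works out to roughly 9,800 × 32 ≈ 313,600 tokens/s in aggregate, i.e. about 27 billion tokens per day.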
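
"Dropless" here means routing works on variable-size per-expert token groups rather than a fixed capacity, so no token is ever discarded. A minimal top-1 routing sketch in plain PyTorch (illustrative only; Fast-LLM's fused, kernel-level implementation differs) could look like:

    # Illustrative dropless top-1 MoE routing, not Fast-LLM's implementation.
    import torch

    tokens = torch.randn(16, 32)                  # [num_tokens, hidden]
    num_experts = 4
    router = torch.nn.Linear(32, num_experts)
    experts = torch.nn.ModuleList(
        torch.nn.Linear(32, 32) for _ in range(num_experts)
    )

    probs = router(tokens).softmax(dim=-1)
    top_p, top_e = probs.max(dim=-1)              # top-1 expert per token

    out = torch.empty_like(tokens)
    for e in range(num_experts):
        mask = top_e == e                         # variable-size group:
        if mask.any():                            # nothing is dropped or padded
            out[mask] = top_p[mask].unsqueeze(-1) * experts[e](tokens[mask])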

Maintenance & Community

Developed transparently on GitHub by ServiceNow Research, the project welcomes contributions and collaboration. A public roadmap and issue tracking are maintained.

Licensing & Compatibility

Licensed under the Apache 2.0 License, Fast-LLM permits broad use, modification, and distribution, including for commercial purposes, without copyleft restrictions.

Limitations & Caveats

Certain features, such as customizable language model architectures, data loaders, loss functions, and optimizers, are noted as "in progress." Practical tutorials are also under development.

Health Check

  • Last Commit: 1 day ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 28
  • Issues (30d): 4
  • Star History: 28 stars in the last 30 days

Explore Similar Projects

Starred by Tri Dao (Chief Scientist at Together AI), Stas Bekman (Author of "Machine Learning Engineering Open Book"; Research Engineer at Snowflake), and 1 more.

oslo by tunib-ai

0% · 309 stars
Framework for large-scale transformer optimization
Created 3 years ago
Updated 3 years ago
Starred by Jeff Hammerbacher (Cofounder of Cloudera) and Stas Bekman (Author of "Machine Learning Engineering Open Book"; Research Engineer at Snowflake).

InternEvo by InternLM

0.2% · 407 stars
Lightweight training framework for model pre-training
Created 1 year ago
Updated 1 month ago
Starred by Jeff Hammerbacher (Cofounder of Cloudera), Stas Bekman (Author of "Machine Learning Engineering Open Book"; Research Engineer at Snowflake), and 25 more.

gpt-neox by EleutherAI

0.1% · 7k stars
Framework for training large-scale autoregressive language models
Created 4 years ago
Updated 2 weeks ago
Starred by Tobi Lutke (Cofounder of Shopify), Li Jiang (Coauthor of AutoGen; Engineer at Microsoft), and 27 more.

ColossalAI by hpcaitech

0.0% · 41k stars
AI system for large-scale parallel training
Created 4 years ago
Updated 1 day ago