llm-foundry by mosaicml

LLM training code for Databricks foundation models

Created 2 years ago
4,321 stars

Top 11.3% on SourcePulse

View on GitHub
Project Summary

LLM Foundry provides a comprehensive toolkit for training, fine-tuning, evaluating, and deploying Large Language Models (LLMs) using the Composer library and the MosaicML platform. It's designed for researchers and engineers looking to rapidly experiment with LLM techniques, offering support for models ranging from 125M to 70B parameters, including the state-of-the-art DBRX and MPT series.

How It Works

The codebase is structured around modular scripts for data preparation, training, inference, and evaluation. It leverages Composer for efficient distributed training and integrates features like Flash Attention and ALiBi for performance and extended context lengths. The architecture supports customization through a registry system, allowing users to register new models, loggers, and callbacks without forking the repository.
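
The registry is the main extension point. Below is a minimal sketch, assuming the catalogue-style llmfoundry.registry interface described in the repository's README and Composer's Callback API; the "echo_batch" name and EchoBatchCallback class are hypothetical and purely illustrative.

    # Sketch: registering a custom Composer callback with llm-foundry's registry
    # so it can be referenced from a training config without forking the repo.
    # Assumes llmfoundry.registry exposes a catalogue-style `callbacks` registry.
    from composer.core import Callback, State
    from composer.loggers import Logger

    from llmfoundry import registry


    @registry.callbacks.register("echo_batch")  # hypothetical registry key
    class EchoBatchCallback(Callback):
        """Toy callback that prints the batch index at the end of every batch."""

        def batch_end(self, state: State, logger: Logger) -> None:
            print(f"finished batch {int(state.timestamp.batch)}")

Once registered, a component can be referenced in a training config by its registry name rather than by import path.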

Quick Start & Requirements

  • Installation: Recommended via Docker. For local setup: git clone the repo, cd llm-foundry, and pip install -e ".[gpu]".
  • Prerequisites: PyTorch (tested with 2.6.0), CUDA 12.4 (for NVIDIA A100/H100), CMake, and packaging. Experimental support for AMD GPUs (ROCm 5.4.2) and Intel Gaudi (habana_alpha branch). A quick environment check appears after this list.
  • Resources: Requires significant GPU resources for training LLMs. Docker images are available on Docker Hub.
  • Docs: TUTORIAL.md provides detailed workflows and FAQs.
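
A short Python check can confirm the prerequisites above; the expected values simply mirror the tested stack listed in this section and are not enforced anywhere.

    # Quick environment check mirroring the listed prerequisites
    # (PyTorch ~2.6.0 built against CUDA 12.4, NVIDIA A100/H100).
    import torch

    print("torch version:", torch.__version__)          # tested with 2.6.0
    print("CUDA available:", torch.cuda.is_available())
    print("CUDA toolkit:", torch.version.cuda)           # tested with 12.4
    if torch.cuda.is_available():
        print("GPU:", torch.cuda.get_device_name(0))     # e.g. A100 or H100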

Highlighted Details

  • Supports training and inference for dense models up to 70B parameters, as well as DBRX (a mixture-of-experts model with 132B total and 36B active parameters) and the MPT series.
  • Features like Flash Attention and ALiBi improve efficiency and enable context-length extrapolation (see the loading sketch after this list).
  • Includes benchmarking scripts for both training throughput and inference latency.
  • Offers a registry system for extending functionality without forking.
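
As an illustration of the attention and ALiBi options, MPT checkpoints on the Hugging Face Hub expose them through the model config. The sketch below follows the MPT-7B model card; exact field names and supported values may differ across releases, so treat it as an assumption rather than a guaranteed API.

    # Sketch: loading an MPT checkpoint with Flash Attention enabled and an
    # ALiBi-extended context window (field names per the MPT-7B model card).
    import transformers

    name = "mosaicml/mpt-7b"
    config = transformers.AutoConfig.from_pretrained(name, trust_remote_code=True)
    config.attn_config["attn_impl"] = "flash"  # use Flash Attention kernels
    config.max_seq_len = 4096                  # ALiBi allows extrapolating past the 2048-token training length

    model = transformers.AutoModelForCausalLM.from_pretrained(
        name,
        config=config,
        torch_dtype="auto",
        trust_remote_code=True,
    )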

Maintenance & Community

The project is actively maintained by the MosaicML team at Databricks. Community support is available via GitHub issues; contact demo@mosaicml.com for MosaicML platform inquiries.

Licensing & Compatibility

The llm-foundry code is released under the Apache-2.0 license, and DBRX weights are distributed under the Databricks Open Model License, both permitting research and commercial use. Some MPT models carry commercial-use restrictions (e.g., MPT-30B-Chat, MPT-7B-8k-Chat).

Limitations & Caveats

Experimental support for AMD GPUs may require package version adjustments. Intel Gaudi support is also experimental and requires a specific branch. The README notes that non-Docker setups are not recommended.

Health Check

  • Last Commit: 2 weeks ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 2
  • Issues (30d): 0
  • Star History: 25 stars in the last 30 days

Explore Similar Projects

Starred by Junyang Lin (Core Maintainer at Alibaba Qwen), Hanlin Tang (CTO Neural Networks at Databricks; Cofounder of MosaicML), and 5 more.

dbrx by databricks
0% · 3k stars
Large language model for research/commercial use
Created 1 year ago · Updated 1 year ago

Starred by Théophile Gervet (Cofounder of Genesis AI), Jason Knight (Director AI Compilers at NVIDIA; Cofounder of OctoML), and 6 more.

lingua by facebookresearch
0.1% · 5k stars
LLM research codebase for training and inference
Created 11 months ago · Updated 2 months ago