llm-foundry by mosaicml

LLM training code for Databricks foundation models

Created 2 years ago
4,321 stars

Top 11.3% on SourcePulse

View on GitHub
Project Summary

LLM Foundry provides a comprehensive toolkit for training, fine-tuning, evaluating, and deploying Large Language Models (LLMs) using the Composer library and the MosaicML platform. It's designed for researchers and engineers looking to rapidly experiment with LLM techniques, offering support for models ranging from 125M to 70B parameters, including the state-of-the-art DBRX and MPT series.

How It Works

The codebase is structured around modular scripts for data preparation, training, inference, and evaluation. It leverages Composer for efficient distributed training and integrates features like Flash Attention and ALiBi for performance and extended context lengths. The architecture supports customization through a registry system, allowing users to register new models, loggers, and callbacks without forking the repository.
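
The registry is the main extension point. Below is a minimal sketch, assuming the catalogue-style llmfoundry.registry interface described in the repository's README and Composer's Callback API; the "echo_batch" name and EchoBatchCallback class are hypothetical and purely illustrative.

    # Sketch: registering a custom Composer callback with llm-foundry's registry
    # so it can be referenced from a training config without forking the repo.
    # Assumes llmfoundry.registry exposes a catalogue-style `callbacks` registry.
    from composer.core import Callback, State
    from composer.loggers import Logger

    from llmfoundry import registry


    @registry.callbacks.register("echo_batch")  # hypothetical registry key
    class EchoBatchCallback(Callback):
        """Toy callback that prints the batch index at the end of every batch."""

        def batch_end(self, state: State, logger: Logger) -> None:
            print(f"finished batch {int(state.timestamp.batch)}")

Once registered, a component can be referenced in a training config by its registry name rather than by import path.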

Quick Start & Requirements

  • Installation: Recommended via Docker. For local setup: git clone the repo, cd llm-foundry, and pip install -e ".[gpu]".
  • Prerequisites: PyTorch (tested with 2.6.0), CUDA 12.4 (for NVIDIA A100/H100), CMake, and packaging. Experimental support for AMD GPUs (ROCm 5.4.2) and Intel Gaudi (habana_alpha branch). A quick environment check appears after this list.
  • Resources: Requires significant GPU resources for training LLMs. Docker images are available on Docker Hub.
  • Docs: TUTORIAL.md provides detailed workflows and FAQs.
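
A short Python check can confirm the prerequisites above; the expected values simply mirror the tested stack listed in this section and are not enforced anywhere.

    # Quick environment check mirroring the listed prerequisites
    # (PyTorch ~2.6.0 built against CUDA 12.4, NVIDIA A100/H100).
    import torch

    print("torch version:", torch.__version__)          # tested with 2.6.0
    print("CUDA available:", torch.cuda.is_available())
    print("CUDA toolkit:", torch.version.cuda)           # tested with 12.4
    if torch.cuda.is_available():
        print("GPU:", torch.cuda.get_device_name(0))     # e.g. A100 or H100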

Highlighted Details

  • Supports training and inference for dense models up to 70B parameters, as well as DBRX (a mixture-of-experts model with 132B total and 36B active parameters) and the MPT series.
  • Features like Flash Attention and ALiBi improve efficiency and enable context-length extrapolation (see the loading sketch after this list).
  • Includes benchmarking scripts for both training throughput and inference latency.
  • Offers a registry system for extending functionality without forking.
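
As an illustration of the attention and ALiBi options, MPT checkpoints on the Hugging Face Hub expose them through the model config. The sketch below follows the MPT-7B model card; exact field names and supported values may differ across releases, so treat it as an assumption rather than a guaranteed API.

    # Sketch: loading an MPT checkpoint with Flash Attention enabled and an
    # ALiBi-extended context window (field names per the MPT-7B model card).
    import transformers

    name = "mosaicml/mpt-7b"
    config = transformers.AutoConfig.from_pretrained(name, trust_remote_code=True)
    config.attn_config["attn_impl"] = "flash"  # use Flash Attention kernels
    config.max_seq_len = 4096                  # ALiBi allows extrapolating past the 2048-token training length

    model = transformers.AutoModelForCausalLM.from_pretrained(
        name,
        config=config,
        torch_dtype="auto",
        trust_remote_code=True,
    )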

Maintenance & Community

The project is actively maintained by the MosaicML team at Databricks. Community support is available via GitHub issues; contact demo@mosaicml.com for MosaicML platform inquiries.

Licensing & Compatibility

The llm-foundry code is released under the Apache-2.0 license, and DBRX weights are distributed under the Databricks Open Model License, both permitting research and commercial use. Some MPT models carry commercial-use restrictions (e.g., MPT-30B-Chat, MPT-7B-8k-Chat).

Limitations & Caveats

Experimental support for AMD GPUs may require package version adjustments. Intel Gaudi support is also experimental and requires a specific branch. The README notes that non-Docker setups are not recommended.

Health Check

  • Last Commit: 2 weeks ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 2
  • Issues (30d): 0
  • Star History: 25 stars in the last 30 days

Explore Similar Projects

Starred by Junyang Lin (Core Maintainer at Alibaba Qwen), Hanlin Tang (CTO Neural Networks at Databricks; Cofounder of MosaicML), and 5 more.

dbrx by databricks
0% · 3k stars
Large language model for research/commercial use
Created 1 year ago · Updated 1 year ago

Starred by Théophile Gervet (Cofounder of Genesis AI), Jason Knight (Director AI Compilers at NVIDIA; Cofounder of OctoML), and 6 more.

lingua by facebookresearch
0.1% · 5k stars
LLM research codebase for training and inference
Created 11 months ago · Updated 2 months ago