llm-foundry by mosaicml

LLM training code for Databricks foundation models

created 2 years ago
4,297 stars

Top 11.6% on sourcepulse

Project Summary

LLM Foundry provides a comprehensive toolkit for training, fine-tuning, evaluating, and deploying Large Language Models (LLMs) using the Composer library and the MosaicML platform. It is designed for researchers and engineers who want to experiment rapidly with LLM techniques, and it supports models ranging from 125M to 70B parameters, including Databricks' DBRX and the MPT series.

How It Works

The codebase is structured around modular scripts for data preparation, training, inference, and evaluation. It leverages Composer for efficient distributed training and integrates features like Flash Attention and ALiBi for performance and extended context lengths. The architecture supports customization through a registry system, allowing users to register new models, loggers, and callbacks without forking the repository.
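
The registry described above can be exercised with a small Composer callback. The following is a minimal sketch only: the `llmfoundry.registry` module path, the `callbacks` registry name, and the decorator form of `register` are assumptions based on the catalogue-style registry pattern, so the repository's own registry documentation is authoritative.

```python
# Hedged sketch: registering a custom Composer callback without forking the repo.
# Assumes llmfoundry.registry exposes a catalogue-style `callbacks` registry.
from composer.core import Callback, State
from composer.loggers import Logger

from llmfoundry import registry  # assumed module path


@registry.callbacks.register('token_counter')  # name later referenced from a training YAML
class TokenCounter(Callback):
    """Logs the running token count at the end of every training batch."""

    def batch_end(self, state: State, logger: Logger) -> None:
        # Composer's Timestamp tracks tokens when the dataloader reports them.
        logger.log_metrics({'counters/tokens_seen': state.timestamp.token.value})
```

A training config could then refer to the callback by its registered name (for example under a `callbacks:` section), which is the point of the registry: new components plug in by name rather than by patching the training scripts.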

Quick Start & Requirements

  • Installation: Recommended via Docker. For local setup: git clone the repo, cd llm-foundry, and pip install -e ".[gpu]".
  • Prerequisites: PyTorch (tested with 2.6.0), CUDA 12.4 (for NVIDIA A100/H100), CMake, and packaging; a quick environment check is sketched after this list. Experimental support exists for AMD GPUs (ROCm 5.4.2) and Intel Gaudi (habana_alpha branch).
  • Resources: Requires significant GPU resources for training LLMs. Docker images are available on Docker Hub.
  • Docs: TUTORIAL.md provides detailed workflows and FAQs.
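
Before installing the "[gpu]" extra, it can be worth confirming that the environment actually matches the prerequisites above. A minimal check, assuming nothing beyond a standard PyTorch install:

```python
# Quick sanity check against the stated prerequisites
# (PyTorch tested with 2.6.0, CUDA 12.4 on NVIDIA A100/H100).
import torch

print('torch version:', torch.__version__)         # expect 2.6.x
print('built with CUDA:', torch.version.cuda)      # expect 12.4
print('GPU available:', torch.cuda.is_available())
if torch.cuda.is_available():
    print('device:', torch.cuda.get_device_name(0))  # e.g. A100 or H100
```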

Highlighted Details

  • Supports training and inference for dense models up to 70B parameters, plus the DBRX mixture-of-experts model (132B total parameters, 36B active) and the MPT series.
  • Features like Flash Attention and ALiBi improve efficiency and context-length extrapolation (see the sketch after this list).
  • Includes benchmarking scripts for both training throughput and inference latency.
  • Offers a registry system for extending functionality without forking.
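
The Flash Attention and ALiBi bullet can be made concrete with the MPT checkpoints on Hugging Face. This is a hedged sketch: the `mosaicml/mpt-7b` checkpoint name, the `attn_config` keys, and the `max_seq_len` override follow the MPT model cards and may differ for other models in the series.

```python
# Hedged sketch: enabling Flash Attention and a longer context on an MPT checkpoint.
# The attn_config keys below come from the MPT model cards and are assumptions here.
import torch
import transformers

name = 'mosaicml/mpt-7b'
config = transformers.AutoConfig.from_pretrained(name, trust_remote_code=True)
config.attn_config['attn_impl'] = 'flash'  # 'torch' and 'triton' were the other documented options
config.max_seq_len = 4096                  # ALiBi lets the model extrapolate past its 2048-token training length

model = transformers.AutoModelForCausalLM.from_pretrained(
    name,
    config=config,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)
```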

Maintenance & Community

The project is actively maintained by Databricks Mosaic. Community support is available via GitHub issues. Contact demo@mosaicml.com for MosaicML platform inquiries.

Licensing & Compatibility

The code is Apache-2.0 licensed; model weights carry their own licenses (DBRX under the Databricks Open Model License), most of which permit both research and commercial use. Some MPT chat variants (e.g., MPT-30B-Chat, MPT-7B-8k-Chat) restrict commercial use.

Limitations & Caveats

Experimental support for AMD GPUs may require package version adjustments. Intel Gaudi support is also experimental and requires a specific branch. The README notes that non-Docker setups are not recommended.

Health Check

  • Last commit: 22 hours ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 15
  • Issues (30d): 0

Star History

76 stars in the last 90 days

Explore Similar Projects

Starred by Ross Taylor (Cofounder of General Reasoning; Creator of Papers with Code), Daniel Han (Cofounder of Unsloth), and 4 more.

open-instruct by allenai
0.2% · 3k stars
Training codebase for instruction-following language models
created 2 years ago · updated 13 hours ago

Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), Omar Sanseviero (DevRel at Google DeepMind), and 5 more.

TensorRT-LLM by NVIDIA
0.6% · 11k stars
LLM inference optimization SDK for NVIDIA GPUs
created 1 year ago · updated 17 hours ago

Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), Tim J. Baek (Founder of Open WebUI), and 2 more.

llmware by llmware-ai
0.2% · 14k stars
Framework for enterprise RAG pipelines using small, specialized models
created 1 year ago · updated 1 week ago

Starred by Andrej Karpathy (Founder of Eureka Labs; Formerly at Tesla, OpenAI; Author of CS 231n), Nat Friedman (Former CEO of GitHub), and 32 more.

llama.cpp by ggml-org
0.4% · 84k stars
C/C++ library for local LLM inference
created 2 years ago · updated 13 hours ago