levanter by stanford-crfm

Framework for training foundation models with JAX

created 3 years ago
628 stars

Top 53.5% on sourcepulse

View on GitHub
Project Summary

Levanter is a JAX-based framework for training large foundation models, prioritizing legibility, scalability, and reproducibility. It targets researchers and engineers building and experimenting with LLMs, offering a high-performance, deterministic training environment.

How It Works

Levanter builds on JAX for JIT compilation, automatic vectorization, and high-performance execution on accelerators. On top of JAX, it uses the named-tensor library Haliax to keep deep learning code composable and legible: tensor axes are referenced by name rather than by position, abstracting away error-prone index bookkeeping. The same named-axis machinery underpins distributed training across GPUs and TPUs, including Fully Sharded Data Parallelism (FSDP) and tensor parallelism.
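
As a flavor of the named-tensor style, here is a minimal sketch using Haliax's Axis and Linear APIs (the axis names and sizes are illustrative, not taken from Levanter's configs):

    import jax
    import haliax as hax

    # Axes are declared once, pairing a name with a size.
    Batch = hax.Axis("batch", 32)
    Embed = hax.Axis("embed", 64)
    Hidden = hax.Axis("hidden", 128)

    # NamedArrays carry their axis names, so code selects axes by name, not position.
    x = hax.ones((Batch, Embed))

    # A linear layer maps Embed -> Hidden with no reshapes or transposes.
    linear = hax.nn.Linear.init(In=Embed, Out=Hidden, key=jax.random.PRNGKey(0))
    y = linear(x)  # axes: (batch, hidden)

Because axes are named, the same model code can be resharded across hardware just by mapping axis names onto a device mesh.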

Quick Start & Requirements

  • Install: pip install levanter or pip install -e . after cloning the repository.
  • Prerequisites: JAX installed with the appropriate backend for your platform (TPU or GPU); CUDA support is in progress.
  • Example: python -m levanter.main.train_lm --config_path config/gpt2_nano.yaml
  • Docs: levanter.readthedocs.io, haliax.readthedocs.io

Highlighted Details

  • Supports distributed training on TPUs and GPUs with FSDP and tensor parallelism (see the sharding sketch after this list).
  • Compatible with the Hugging Face ecosystem for model and tokenizer import/export via SafeTensors.
  • Offers bitwise deterministic training on TPUs for reproducibility.
  • Includes the Sophia optimizer for potential 2x speedup over Adam.
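
For the FSDP item above: the core idea is that every parameter lives sharded across all devices, with shards gathered on demand. Levanter expresses this through Haliax axis mappings; the snippet below is a generic JAX sketch of the same idea using jax.sharding, not Levanter's own API:

    import jax
    import jax.numpy as jnp
    from jax.experimental import mesh_utils
    from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

    # Arrange all available devices into a 1-D mesh with one named axis.
    devices = mesh_utils.create_device_mesh((jax.device_count(),))
    mesh = Mesh(devices, axis_names=("data",))

    # A stand-in weight matrix: FSDP stores one row-shard per device.
    params = jnp.zeros((1024, 1024))
    params = jax.device_put(params, NamedSharding(mesh, P("data", None)))

    # Inside jax.jit, XLA inserts the all-gathers needed wherever the full matrix is used.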

Maintenance & Community

  • Developed by Stanford's Center for Research on Foundation Models (CRFM).
  • Community channel: #levanter on the unofficial JAX LLM Discord.

Licensing & Compatibility

  • Licensed under the Apache License, Version 2.0. Permissive for commercial use and closed-source linking.

Limitations & Caveats

GPU support is still in progress. Resuming training on a different number of hosts currently breaks reproducibility.

Health Check

  • Last commit: 1 day ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 96
  • Issues (30d): 2
  • Star History: 65 stars in the last 90 days

Explore Similar Projects

Starred by Stas Bekman (Author of Machine Learning Engineering Open Book; Research Engineer at Snowflake) and Zhiqiang Xie (Author of SGLang).

veScale by volcengine

Top 0.1% on sourcepulse · 839 stars
PyTorch-native framework for LLM training
created 1 year ago · updated 3 weeks ago
Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), Alex Cheema (Cofounder of EXO Labs), and 1 more.

recurrent-pretraining by seal-rg

Top 0.1% on sourcepulse · 806 stars
Pretraining code for depth-recurrent language model research
created 5 months ago · updated 2 weeks ago
Starred by Andrej Karpathy (Founder of Eureka Labs; Formerly at Tesla, OpenAI; Author of CS 231n), Jiayi Pan (Author of SWE-Gym; AI Researcher at UC Berkeley), and 5 more.

Liger-Kernel by linkedin

Top 0.6% on sourcepulse · 5k stars
Triton kernels for efficient LLM training
created 1 year ago · updated 2 days ago
Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), Jeff Hammerbacher (Cofounder of Cloudera), and 10 more.

open-r1 by huggingface

Top 0.2% on sourcepulse · 25k stars
SDK for reproducing DeepSeek-R1
created 6 months ago · updated 3 days ago