levanter  by marin-community

Framework for training foundation models with JAX

Created 3 years ago
663 stars

Top 50.6% on SourcePulse

Project Summary

Levanter is a JAX-based framework for training large foundation models, prioritizing legibility, scalability, and reproducibility. It targets researchers and engineers building and experimenting with LLMs, offering a high-performance, deterministic training environment.

How It Works

Levanter builds on JAX for high-performance automatic vectorization and JIT compilation. It uses the named tensor library Haliax to keep deep learning code composable and readable, abstracting away complex tensor manipulations. This approach facilitates distributed training across GPUs and TPUs, supporting techniques like Fully Sharded Data Parallelism (FSDP) and tensor parallelism.
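To make the sharding idea concrete, here is a minimal sketch in plain JAX. It does not use Levanter or Haliax APIs. The mesh axis name "data", the array sizes, and the forward function are illustrative assumptions, and the code also runs on a single CPU device.

```python
import jax
import jax.numpy as jnp
import numpy as np
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

# Build a 1-D device mesh. The axis name "data" is an arbitrary choice
# made for this sketch, not a Levanter convention.
devices = np.array(jax.devices())
mesh = Mesh(devices, axis_names=("data",))
n = len(devices)

# Sizes are chosen to be divisible by the device count so they shard evenly.
batch, embed, hidden = 4 * n, 8 * n, 16

# FSDP-style placement: each device holds a slice of the weight matrix
# along its first axis instead of a full replica.
w = jax.device_put(jnp.ones((embed, hidden)), NamedSharding(mesh, P("data", None)))

# Data-parallel placement: the batch axis is split across the same mesh axis.
x = jax.device_put(jnp.ones((batch, embed)), NamedSharding(mesh, P("data", None)))


@jax.jit
def forward(x, w):
    # Under jit, the compiler inserts whatever communication the
    # sharded inputs require to compute the matmul.
    return jnp.tanh(x @ w)


y = forward(x, w)
```

The result has the same values as the unsharded computation; only the placement of the arrays across devices differs.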

Quick Start & Requirements

  • Install: pip install levanter, or clone the repository and run pip install -e . from its root.
  • Prerequisites: JAX installed and configured for your platform (GPU/TPU). CUDA support is in progress.
  • Example: python -m levanter.main.train_lm --config_path config/gpt2_nano.yaml
  • Docs: levanter.readthedocs.io, haliax.readthedocs.io

Highlighted Details

  • Supports distributed training on TPUs and GPUs with FSDP and tensor parallelism.
  • Compatible with Hugging Face ecosystem for model and tokenizer import/export via SafeTensors.
  • Offers bitwise deterministic training on TPUs for reproducibility.
  • Includes the Sophia optimizer, which may train up to 2x faster than Adam.
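One ingredient of reproducible training in JAX is that random numbers are a pure function of an explicit key. The sketch below shows that property only and is not Levanter's actual mechanism. The helper step_noise, the seed, and the step numbers are hypothetical.

```python
import jax
import jax.numpy as jnp


def step_noise(seed: int, step: int, shape=(4,)) -> jax.Array:
    """Return noise that depends only on (seed, step), not on call history."""
    base_key = jax.random.PRNGKey(seed)
    # fold_in derives a per-step key deterministically from the base key.
    step_key = jax.random.fold_in(base_key, step)
    return jax.random.normal(step_key, shape)


# A fresh run and a run "resumed" at step 100 see identical randomness.
first = step_noise(seed=0, step=100)
resumed = step_noise(seed=0, step=100)

# A different step gets a different key, hence different noise.
next_step = step_noise(seed=0, step=101)
```

Because the key is derived from the seed and the step counter rather than from hidden global state, restarting from a checkpoint at a given step can reproduce the same random draws.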

Maintenance & Community

  • Developed by Stanford's Center for Research on Foundation Models (CRFM).
  • Community channel: #levanter on the unofficial JAX LLM Discord.

Licensing & Compatibility

  • Licensed under the Apache License, Version 2.0, a permissive license that allows commercial use and linking from closed-source code.

Limitations & Caveats

GPU support is still in progress. Resuming training on a different number of hosts currently breaks reproducibility.

Health Check

  • Last commit: 17 hours ago
  • Responsiveness: 1 day
  • Pull requests (30d): 35
  • Issues (30d): 3
  • Star history: 20 stars in the last 30 days
