levanter by stanford-crfm

Framework for training foundation models with JAX

created 3 years ago
628 stars

Top 53.5% on sourcepulse

View on GitHub
Project Summary

Levanter is a JAX-based framework for training large foundation models, prioritizing legibility, scalability, and reproducibility. It targets researchers and engineers building and experimenting with LLMs, offering a high-performance, deterministic training environment.

How It Works

Levanter builds on JAX for JIT compilation, automatic vectorization, and high-performance execution on accelerators. On top of JAX, it uses the named-tensor library Haliax to keep deep learning code composable and legible: tensor axes are referenced by name rather than by position, abstracting away error-prone index bookkeeping. The same named-axis machinery underpins distributed training across GPUs and TPUs, including Fully Sharded Data Parallelism (FSDP) and tensor parallelism.
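
As a flavor of the named-tensor style, here is a minimal sketch using Haliax's Axis and Linear APIs (the axis names and sizes are illustrative, not taken from Levanter's configs):

    import jax
    import haliax as hax

    # Axes are declared once, pairing a name with a size.
    Batch = hax.Axis("batch", 32)
    Embed = hax.Axis("embed", 64)
    Hidden = hax.Axis("hidden", 128)

    # NamedArrays carry their axis names, so code selects axes by name, not position.
    x = hax.ones((Batch, Embed))

    # A linear layer maps Embed -> Hidden with no reshapes or transposes.
    linear = hax.nn.Linear.init(In=Embed, Out=Hidden, key=jax.random.PRNGKey(0))
    y = linear(x)  # axes: (batch, hidden)

Because axes are named, the same model code can be resharded across hardware just by mapping axis names onto a device mesh.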

Quick Start & Requirements

  • Install: pip install levanter or pip install -e . after cloning the repository.
  • Prerequisites: JAX installed with the appropriate backend for your platform (TPU or GPU); CUDA support is in progress.
  • Example: python -m levanter.main.train_lm --config_path config/gpt2_nano.yaml
  • Docs: levanter.readthedocs.io, haliax.readthedocs.io

Highlighted Details

  • Supports distributed training on TPUs and GPUs with FSDP and tensor parallelism (see the sharding sketch after this list).
  • Compatible with the Hugging Face ecosystem for model and tokenizer import/export via SafeTensors.
  • Offers bitwise deterministic training on TPUs for reproducibility.
  • Includes the Sophia optimizer for potential 2x speedup over Adam.
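
For the FSDP item above: the core idea is that every parameter lives sharded across all devices, with shards gathered on demand. Levanter expresses this through Haliax axis mappings; the snippet below is a generic JAX sketch of the same idea using jax.sharding, not Levanter's own API:

    import jax
    import jax.numpy as jnp
    from jax.experimental import mesh_utils
    from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

    # Arrange all available devices into a 1-D mesh with one named axis.
    devices = mesh_utils.create_device_mesh((jax.device_count(),))
    mesh = Mesh(devices, axis_names=("data",))

    # A stand-in weight matrix: FSDP stores one row-shard per device.
    params = jnp.zeros((1024, 1024))
    params = jax.device_put(params, NamedSharding(mesh, P("data", None)))

    # Inside jax.jit, XLA inserts the all-gathers needed wherever the full matrix is used.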

Maintenance & Community

  • Developed by Stanford's Center for Research on Foundation Models (CRFM).
  • Community channel: #levanter on the unofficial JAX LLM Discord.

Licensing & Compatibility

  • Licensed under the Apache License, Version 2.0. Permissive for commercial use and closed-source linking.

Limitations & Caveats

GPU support is still in progress. Resuming training on a different number of hosts currently breaks reproducibility.

Health Check

  • Last commit: 1 day ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 96
  • Issues (30d): 2
  • Star History: 65 stars in the last 90 days

Explore Similar Projects

Starred by Stas Bekman (Author of Machine Learning Engineering Open Book; Research Engineer at Snowflake) and Zhiqiang Xie (Author of SGLang).

veScale by volcengine

Top 0.1% on sourcepulse · 839 stars
PyTorch-native framework for LLM training
created 1 year ago · updated 3 weeks ago
Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), Alex Cheema (Cofounder of EXO Labs), and 1 more.

recurrent-pretraining by seal-rg

Top 0.1% on sourcepulse · 806 stars
Pretraining code for depth-recurrent language model research
created 5 months ago · updated 2 weeks ago
Starred by Andrej Karpathy (Founder of Eureka Labs; Formerly at Tesla, OpenAI; Author of CS 231n), Jiayi Pan (Author of SWE-Gym; AI Researcher at UC Berkeley), and 5 more.

Liger-Kernel by linkedin

Top 0.6% on sourcepulse · 5k stars
Triton kernels for efficient LLM training
created 1 year ago · updated 2 days ago
Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), Jeff Hammerbacher (Cofounder of Cloudera), and 10 more.

open-r1 by huggingface

Top 0.2% on sourcepulse · 25k stars
SDK for reproducing DeepSeek-R1
created 6 months ago · updated 3 days ago