EasyLM by young-geng

LLM training/finetuning framework in JAX/Flax

Created 2 years ago
2,499 stars

Top 18.6% on SourcePulse

View on GitHub
Project Summary

EasyLM provides a streamlined, JAX/Flax-based framework for pre-training, fine-tuning, evaluating, and serving large language models (LLMs). It targets researchers and practitioners needing to scale LLM training across hundreds of accelerators, leveraging JAX's pjit for efficient model and data sharding.

How It Works

EasyLM uses JAX's pjit to shard model weights and training data across multiple accelerators (TPUs/GPUs), enabling the training of models that exceed single-device memory. This approach scales seamlessly from single-host, multi-accelerator setups to multi-host Google Cloud TPU Pods, hiding much of the complexity of distributed training.
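The sketch below illustrates the pjit-style sharding pattern described above. It is generic JAX, not EasyLM's API, and assumes eight local accelerators (e.g. a TPU v3-8) and a recent JAX 0.4.x release in which pjit accepts in_shardings/out_shardings.

```python
# Generic pjit sharding sketch (not EasyLM's API). Assumes 8 local devices
# and a recent JAX 0.4.x release where pjit accepts in_shardings/out_shardings.
import jax
import jax.numpy as jnp
from jax.experimental import mesh_utils
from jax.experimental.pjit import pjit
from jax.sharding import Mesh, PartitionSpec as P

# Arrange the 8 devices as a 2 x 4 (data, model) mesh.
mesh = Mesh(mesh_utils.create_device_mesh((2, 4)), axis_names=("data", "model"))

def layer(x, w):
    return x @ w

# Shard the batch over the "data" axis and the weight matrix over the
# "model" axis; pjit inserts the required collectives automatically.
sharded_layer = pjit(
    layer,
    in_shardings=(P("data", None), P(None, "model")),
    out_shardings=P("data", "model"),
)

with mesh:
    x = jnp.ones((16, 512))    # activations, split along the data axis
    w = jnp.ones((512, 2048))  # weights, split along the model axis
    y = sharded_layer(x, w)    # result sharded over both mesh axes
```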

Quick Start & Requirements

  • Install via Anaconda for GPU hosts (conda env create -f scripts/gpu_environment.yml) or a setup script for Cloud TPU hosts (./scripts/tpu_vm_setup.sh).
  • Requires Python and JAX/Flax. GPU installation depends on specific CUDA versions, which the provided environment file manages. A quick device-visibility check is sketched after this list.
  • Documentation is available in the docs directory.
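
After installation, a minimal check (plain JAX, not an EasyLM command) confirms that the expected accelerators are visible to the host:

```python
# Plain JAX sanity check after installation; not an EasyLM command.
import jax

print(jax.devices())        # e.g. CUDA or TPU devices on this host
print(jax.device_count())   # total number of accelerators JAX can see
```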

Highlighted Details

  • Supports LLaMA, LLaMA 2, and LLaMA 3 models.
  • Built upon Hugging Face's transformers and datasets (see the loading example after this list).
  • Enables training of models like OpenLLaMA and Koala.
  • Scales training to hundreds of TPU/GPU accelerators.
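
As an illustration of the Hugging Face side of the stack, the snippet below loads an OpenLLaMA checkpoint (a model family trained with EasyLM) through transformers. The model name and generation settings are examples, not part of EasyLM.

```python
# Example only: load an OpenLLaMA checkpoint from the Hugging Face Hub with
# transformers (PyTorch backend); not an EasyLM command.
from transformers import LlamaForCausalLM, LlamaTokenizer

model_name = "openlm-research/open_llama_3b"  # example checkpoint
tokenizer = LlamaTokenizer.from_pretrained(model_name)
model = LlamaForCausalLM.from_pretrained(model_name)

inputs = tokenizer("JAX makes large-scale training", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```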

Maintenance & Community

  • An unofficial Discord server is available for discussions on JAX-based LLM frameworks, including EasyLM.
  • The project is primarily authored by Xinyang Geng.

Licensing & Compatibility

  • The repository itself does not explicitly state a license in the README. However, it references LLaMA, which has specific usage terms, and OpenLLaMA, which is permissively licensed for commercial use. Compatibility with commercial or closed-source projects depends on the underlying model licenses used.

Limitations & Caveats

The README does not specify a license for the EasyLM codebase itself, which may create ambiguity for commercial use. The framework's primary focus on JAX/Flax means users unfamiliar with this ecosystem may face a steeper learning curve.

Health Check

  • Last Commit: 1 year ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 7 stars in the last 30 days

Explore Similar Projects

Starred by Jeff Hammerbacher (Cofounder of Cloudera), Edward Sun (Research Scientist at Meta Superintelligence Lab), and 1 more.

jaxformer by salesforce
0.7% · 301 stars · JAX library for LLM training on TPUs
Created 3 years ago · Updated 1 year ago
Starred by Andrej Karpathy (Founder of Eureka Labs; Formerly at Tesla, OpenAI; Author of CS 231n), Travis Fischer (Founder of Agentic), and 6 more.

picotron by huggingface
4.8% · 2k stars · Minimalist distributed training framework for educational use
Created 1 year ago · Updated 3 weeks ago
Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), Shizhe Diao (Author of LMFlow; Research Scientist at NVIDIA), and 17 more.

open_llama by openlm-research
0.1% · 8k stars · Open-source reproduction of LLaMA models
Created 2 years ago · Updated 2 years ago
Starred by Andrej Karpathy (Founder of Eureka Labs; Formerly at Tesla, OpenAI; Author of CS 231n), George Hotz (Author of tinygrad; Founder of the tiny corp, comma.ai), and 20 more.

TinyLlama by jzhang38
0.1% · 9k stars · Tiny pretraining project for a 1.1B Llama model
Created 2 years ago · Updated 1 year ago
Starred by Junyang Lin (Core Maintainer at Alibaba Qwen), Vincent Weisser (Cofounder of Prime Intellect), and 25 more.

alpaca-lora by tloen
0.0% · 19k stars · LoRA fine-tuning for LLaMA
Created 2 years ago · Updated 1 year ago