EasyLM by young-geng

LLM training/finetuning framework in JAX/Flax

created 2 years ago
2,491 stars

Top 19.2% on sourcepulse

Project Summary

EasyLM provides a streamlined, JAX/Flax-based framework for pre-training, fine-tuning, evaluating, and serving large language models (LLMs). It targets researchers and practitioners needing to scale LLM training across hundreds of accelerators, leveraging JAX's pjit for efficient model and data sharding.

How It Works

EasyLM utilizes JAX's pjit to distribute model weights and training data across multiple accelerators (TPUs/GPUs), enabling the training of models that exceed single-device memory. The same code scales from a single host with multiple accelerators to multi-host Google Cloud TPU Pods, hiding most of the complexity of distributed training.
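
As an illustration of that mechanism, the sketch below (not taken from EasyLM; the device mesh, axis name, and array shapes are assumptions) shows how JAX's sharding APIs place a batch across a device mesh so that a single jitted step runs on every accelerator. EasyLM applies the same idea to shard the model weights themselves, not just the data.

```python
# Minimal sketch of JAX sharding across a device mesh (assumed example,
# not EasyLM's actual training code).
import numpy as np
import jax
import jax.numpy as jnp
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

# Build a 1-D mesh over whatever accelerators this host can see.
devices = np.array(jax.devices())
mesh = Mesh(devices, axis_names=("data",))

# Toy setup: one weight matrix replicated on every device,
# the batch split along its leading dimension across the "data" axis.
weights = jax.device_put(jnp.ones((512, 512)),
                         NamedSharding(mesh, P()))
batch = jax.device_put(jnp.ones((8 * len(devices), 512)),
                       NamedSharding(mesh, P("data")))

@jax.jit
def forward(w, x):
    # jit propagates the input shardings, so this matmul runs
    # data-parallel across the mesh without explicit communication code.
    return x @ w

out = forward(weights, batch)
print(out.sharding)  # sharded along the "data" axis of the mesh
```

In a full training setup the weights would also be partitioned over a separate model axis of the mesh, which is the pattern pjit-based frameworks like EasyLM rely on to fit models larger than a single device's memory.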

Quick Start & Requirements

  • Install via Anaconda for GPU hosts (conda env create -f scripts/gpu_environment.yml) or a setup script for Cloud TPU hosts (./scripts/tpu_vm_setup.sh).
  • Requires Python and JAX/Flax. The GPU installation requires specific CUDA versions, which are managed by the provided environment file (see the sanity check below).
  • Documentation is available in the docs directory.
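
After either installation path, a quick sanity check (assuming a working JAX install) confirms that the expected accelerators are visible before launching a run:

```python
# Post-install sanity check: list the accelerators JAX can see on this host.
import jax

print(jax.devices())       # CUDA or TPU devices; CPU-only output means the accelerated install failed
print(jax.device_count())  # total device count visible to this process
```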

Highlighted Details

  • Supports LLaMA, LLaMA 2, and LLaMA 3 models.
  • Built upon Hugging Face's transformers and datasets.
  • Enables training of models such as OpenLLaMA and Koala (see the usage sketch after this list).
  • Scales training to hundreds of TPU/GPU accelerators.
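
For context on the Hugging Face integration, the hedged sketch below shows how an EasyLM-trained model such as OpenLLaMA is typically consumed through transformers after checkpoint conversion; the Hub model id and generation settings are assumptions, not taken from this repository.

```python
# Hedged example: loading an EasyLM-trained OpenLLaMA checkpoint through
# Hugging Face transformers. The model id below is an assumption.
import torch
from transformers import LlamaForCausalLM, LlamaTokenizer

model_id = "openlm-research/open_llama_3b"  # assumed Hub id
tokenizer = LlamaTokenizer.from_pretrained(model_id)
model = LlamaForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16)

prompt = "Q: What is a large language model?\nA:"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```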

Maintenance & Community

  • An unofficial Discord server is available for discussions on JAX-based LLM frameworks, including EasyLM.
  • The project is primarily authored by Xinyang Geng.

Licensing & Compatibility

  • The repository itself does not explicitly state a license in the README. However, it references LLaMA, which has specific usage terms, and OpenLLaMA, which is permissively licensed for commercial use. Compatibility with commercial or closed-source projects depends on the underlying model licenses used.

Limitations & Caveats

The README does not specify a license for the EasyLM codebase itself, which may create ambiguity for commercial use. The framework's primary focus on JAX/Flax means users unfamiliar with this ecosystem may face a steeper learning curve.

Health Check

  • Last commit: 11 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0

Star History

  • 26 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen (author of AI Engineering and Designing Machine Learning Systems), Jeff Hammerbacher (cofounder of Cloudera), and 3 more.

LLaMA-Adapter by OpenGVLab

  • Efficient fine-tuning for instruction-following LLaMA models
  • 6k stars
  • Created 2 years ago, updated 1 year ago