lightron  by lwj2015

A lightweight, educational LLM distributed training framework

Created 3 weeks ago


514 stars

Top 60.9% on SourcePulse

Project Summary

Lightron is a lightweight, educational, and modern distributed training framework for Large Language Models (LLMs). It aims to bridge the gap between minimal research implementations and production-ready features, offering a clean and efficient platform for LLM development and study. The framework benefits researchers and students by providing access to advanced techniques in a streamlined package.

How It Works

Lightron employs a modern architectural design incorporating RMSNorm, SwiGLU activation, and Rotary Position Embeddings (RoPE). For efficiency, it leverages PyTorch's native scaled_dot_product_attention, which can dispatch to FlashAttention-2-style fused kernels when the hardware and inputs allow. The framework provides first-class support for PyTorch Fully Sharded Data Parallel (FSDP), enabling robust distributed training. The core codebase is kept under 1000 lines and uses type-hinted, dataclass-based configuration for clarity and maintainability.
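The RMSNorm component mentioned above can be sketched in a few lines of pure Python. This is an illustrative sketch of the technique, not Lightron's actual implementation:

```python
import math

def rms_norm(x, gamma, eps=1e-6):
    # RMSNorm: scale each vector by the reciprocal of its root-mean-square.
    # Unlike LayerNorm, there is no mean subtraction and no bias term,
    # which makes it cheaper while performing comparably in practice.
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [g * v / rms for g, v in zip(gamma, x)]

out = rms_norm([3.0, 4.0], [1.0, 1.0])
# RMS of [3, 4] is sqrt((9 + 16) / 2) ≈ 3.5355, so each element
# is divided by roughly that value before the gamma scaling.
```

In a real model, `gamma` is a learned per-dimension scale and the operation runs on tensors rather than lists.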

Quick Start & Requirements

Installation involves cloning the repository, navigating into the directory, and installing dependencies via pip install -r requirements.txt. The example command launches distributed training on 4 GPUs: torchrun --nproc_per_node=4 examples/train_llama.py. Specific hardware requirements (such as CUDA versions) are not documented, but the example implies a multi-GPU setup is intended. The official GitHub repository serves as the primary resource: https://github.com/lwj2015/lightron.
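The quick-start steps above, collected into one script. This assumes a machine with 4 GPUs and a PyTorch install that provides torchrun:

```shell
# Clone the repository and enter it
git clone https://github.com/lwj2015/lightron.git
cd lightron

# Install dependencies
pip install -r requirements.txt

# Launch distributed training across 4 GPUs on this node
torchrun --nproc_per_node=4 examples/train_llama.py
```

Adjust --nproc_per_node to match the number of GPUs available on your machine.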

Highlighted Details

  • Modern LLM architecture components: RMSNorm, SwiGLU, Rotary Embeddings (RoPE).
  • Efficiency through native PyTorch scaled_dot_product_attention (can dispatch to FlashAttention-2-style kernels).
  • First-class support for PyTorch FSDP for distributed training.
  • Compatibility with Llama-3 architectures.
  • Clean, type-hinted, dataclass-based configuration with a core codebase under 1000 lines.
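A hypothetical sketch of what a type-hinted, dataclass-based configuration can look like; the field names and defaults below are illustrative (loosely Llama-3-flavored), not Lightron's actual API:

```python
from dataclasses import dataclass

@dataclass
class ModelConfig:
    # Every field is typed, so typos and wrong-type values surface
    # immediately instead of silently misconfiguring a training run.
    dim: int = 4096
    n_layers: int = 32
    n_heads: int = 32
    vocab_size: int = 128256      # Llama-3 tokenizer vocabulary size
    rope_theta: float = 500000.0  # RoPE base frequency
    norm_eps: float = 1e-5        # RMSNorm epsilon

# Override only what differs from the defaults, e.g. for a small test model:
cfg = ModelConfig(dim=512, n_layers=8)
```

Compared with nested dicts or YAML-only configs, dataclasses give editors and type checkers full visibility into every option.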

Maintenance & Community

No specific details regarding maintainers, community channels (like Discord or Slack), or project roadmap were provided in the README snippet.

Licensing & Compatibility

The license type and any compatibility notes for commercial use or closed-source linking are not specified in the provided README content.

Limitations & Caveats

The README snippet does not detail any specific limitations, known bugs, alpha status, or unsupported platforms. The focus appears to be on core functionality for research and study.

Health Check

  • Last Commit: 2 weeks ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 1
  • Issues (30d): 0
  • Star History: 738 stars in the last 26 days

Explore Similar Projects

Starred by Andrej Karpathy (Founder of Eureka Labs; formerly at Tesla, OpenAI; author of CS 231n), Travis Fischer (Founder of Agentic), and 6 more.

picotron by huggingface

0.4%
2k
Minimalist distributed training framework for educational use
Created 1 year ago
Updated 4 months ago
Starred by Chip Huyen (Author of "AI Engineering" and "Designing Machine Learning Systems"), David Cournapeau (Author of scikit-learn), and 1 more.

TorchLeet by Exorust

0.6%
2k
PyTorch interview practice platform
Created 1 year ago
Updated 5 months ago
Starred by Théophile Gervet (Cofounder of Genesis AI), Jason Knight (Director of AI Compilers at NVIDIA; Cofounder of OctoML), and 7 more.

lingua by facebookresearch

0.0%
5k
LLM research codebase for training and inference
Created 1 year ago
Updated 5 months ago
Starred by Jeff Hammerbacher (Cofounder of Cloudera), Stas Bekman (Author of "Machine Learning Engineering Open Book"; Research Engineer at Snowflake), and 25 more.

gpt-neox by EleutherAI

0.1%
7k
Framework for training large-scale autoregressive language models
Created 5 years ago
Updated 1 month ago
Starred by Andrej Karpathy (Founder of Eureka Labs; formerly at Tesla, OpenAI; author of CS 231n), Stefan van der Walt (Core contributor to the scientific Python ecosystem), and 12 more.

litgpt by Lightning-AI

0.1%
13k
LLM SDK for pretraining, finetuning, and deploying 20+ high-performance LLMs
Created 2 years ago
Updated 3 days ago