llm_from_scratch by vivekkalyanarangan30

Building Large Language Models from scratch with PyTorch

Created 2 months ago
730 stars

Top 47.3% on SourcePulse

View on GitHub
Project Summary

This repository offers a comprehensive, hands-on curriculum for building Large Language Models (LLMs) from scratch using PyTorch. It targets engineers and researchers seeking a deep understanding of LLM architecture, training, and fine-tuning processes, providing a structured path from foundational concepts to advanced techniques like Reinforcement Learning from Human Feedback (RLHF).

How It Works

The project is structured as a modular, step-by-step curriculum covering nine parts. It begins with core Transformer architecture components (attention, embeddings, LayerNorm) and progresses through training a basic LLM, modernizing the architecture with techniques like RMSNorm and RoPE, scaling strategies, Mixture-of-Experts (MoE), supervised fine-tuning (SFT), and finally, advanced alignment methods including PPO and GRPO for RLHF. This pedagogical approach emphasizes building components manually before integrating them, facilitating a thorough grasp of internal mechanics.
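
As a flavor of what the early parts build, a single causal self-attention head in PyTorch can be sketched roughly as follows. This is an illustrative reconstruction, not the repository's code; the class and parameter names are assumptions.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class CausalSelfAttentionHead(nn.Module):
        """One attention head with a causal mask (illustrative sketch)."""

        def __init__(self, d_model: int, d_head: int, max_len: int = 1024):
            super().__init__()
            self.q_proj = nn.Linear(d_model, d_head, bias=False)
            self.k_proj = nn.Linear(d_model, d_head, bias=False)
            self.v_proj = nn.Linear(d_model, d_head, bias=False)
            # Lower-triangular mask: each position attends only to itself and earlier positions.
            self.register_buffer("mask", torch.tril(torch.ones(max_len, max_len)))

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # x: (batch, seq_len, d_model)
            B, T, _ = x.shape
            q, k, v = self.q_proj(x), self.k_proj(x), self.v_proj(x)
            scores = q @ k.transpose(-2, -1) / (q.size(-1) ** 0.5)         # (B, T, T)
            scores = scores.masked_fill(self.mask[:T, :T] == 0, float("-inf"))
            return F.softmax(scores, dim=-1) @ v                           # (B, T, d_head)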

Quick Start & Requirements

  • Installation:
    conda create -n llm_from_scratch python=3.11
    conda activate llm_from_scratch
    pip install -r requirements.txt
    
  • Prerequisites: Python 3.11 and PyTorch; a CUDA-capable GPU is implied for acceleration. Mixed-precision training is also covered (see the sketch after this list).
  • Links: No specific demo or quick-start links provided beyond the setup commands.
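
The mixed-precision training mentioned above typically means PyTorch automatic mixed precision (torch.cuda.amp). A minimal, self-contained sketch of such a loop is shown below; the dummy model, dummy objective, and synthetic data are placeholders standing in for the repository's actual components.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    # Dummy model and synthetic data, purely to make the sketch runnable; requires a CUDA GPU.
    vocab_size, d_model = 1000, 64
    model = nn.Sequential(nn.Embedding(vocab_size, d_model), nn.Linear(d_model, vocab_size)).cuda()
    optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
    scaler = torch.cuda.amp.GradScaler()              # scales the loss to avoid fp16 gradient underflow

    for step in range(10):
        tokens = torch.randint(0, vocab_size, (8, 32), device="cuda")
        optimizer.zero_grad(set_to_none=True)
        with torch.cuda.amp.autocast():               # forward pass runs in mixed precision
            logits = model(tokens)
            # Dummy objective: predict each token from its own embedding.
            loss = F.cross_entropy(logits.view(-1, vocab_size), tokens.view(-1))
        scaler.scale(loss).backward()                 # backward pass on the scaled loss
        scaler.step(optimizer)                        # unscales gradients, then optimizer step
        scaler.update()                               # adjust the loss scale for the next step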

Highlighted Details

  • Covers foundational Transformer blocks (self-attention, multi-head attention, feed-forward networks) implemented from first principles.
  • Explores modern architectural improvements: RMSNorm, Rotary Positional Embeddings (RoPE), SwiGLU activations, and KV caching for efficient inference (an RMSNorm sketch follows this list).
  • Details advanced training and alignment techniques: Mixture-of-Experts (MoE), Supervised Fine-Tuning (SFT), PPO, and GRPO for RLHF.
  • Includes practical aspects like byte-level and BPE tokenization, gradient accumulation, mixed precision, and learning rate scheduling.
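
For a concrete sense of one of these pieces: RMSNorm normalizes by the root-mean-square of the features with a learned gain, skipping LayerNorm's mean-centering and bias. The sketch below is illustrative and not taken from the repository.

    import torch
    import torch.nn as nn

    class RMSNorm(nn.Module):
        """Root-mean-square normalization: scale by 1/RMS(x) and a learned gain,
        with no mean-centering and no bias term (unlike LayerNorm)."""

        def __init__(self, dim: int, eps: float = 1e-6):
            super().__init__()
            self.eps = eps
            self.weight = nn.Parameter(torch.ones(dim))

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # Normalize over the last (feature) dimension.
            rms = torch.sqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
            return self.weight * (x / rms)

    x = torch.randn(2, 8, 512)
    print(RMSNorm(512)(x).shape)      # torch.Size([2, 8, 512])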

Maintenance & Community

No specific information regarding maintainers, community channels (like Discord/Slack), or project roadmap is present in the provided README.

Licensing & Compatibility

The README does not specify a software license. Therefore, licensing terms and compatibility for commercial or closed-source use are undetermined.

Limitations & Caveats

This project is presented as a curriculum for learning and understanding LLM construction ("from scratch"), rather than as a production-ready framework. The absence of a specified license is a significant adoption blocker for many use cases. Hardware requirements beyond optional GPU acceleration (CUDA) are not explicitly stated.

Health Check

  • Last Commit: 1 week ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 1
  • Issues (30d): 2
  • Star History: 148 stars in the last 30 days

Explore Similar Projects

Starred by Aravind Srinivas (Cofounder of Perplexity), François Chollet (Author of Keras; Cofounder of Ndea, ARC Prize), and 43 more.

spaCy by explosion

NLP library for production applications
0.1% · 33k stars · Created 11 years ago · Updated 1 week ago