llm_from_scratch by vivekkalyanarangan30

Building Large Language Models from scratch with PyTorch

Created 2 months ago
730 stars

Top 47.3% on SourcePulse

View on GitHub
Project Summary

This repository offers a comprehensive, hands-on curriculum for building Large Language Models (LLMs) from scratch using PyTorch. It targets engineers and researchers seeking a deep understanding of LLM architecture, training, and fine-tuning processes, providing a structured path from foundational concepts to advanced techniques like Reinforcement Learning from Human Feedback (RLHF).

How It Works

The project is structured as a modular, step-by-step curriculum covering nine parts. It begins with core Transformer architecture components (attention, embeddings, LayerNorm) and progresses through training a basic LLM, modernizing the architecture with techniques like RMSNorm and RoPE, scaling strategies, Mixture-of-Experts (MoE), supervised fine-tuning (SFT), and finally, advanced alignment methods including PPO and GRPO for RLHF. This pedagogical approach emphasizes building components manually before integrating them, facilitating a thorough grasp of internal mechanics.
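
As a flavor of what the early parts build, a single causal self-attention head in PyTorch can be sketched roughly as follows. This is an illustrative reconstruction, not the repository's code; the class and parameter names are assumptions.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class CausalSelfAttentionHead(nn.Module):
        """One attention head with a causal mask (illustrative sketch)."""

        def __init__(self, d_model: int, d_head: int, max_len: int = 1024):
            super().__init__()
            self.q_proj = nn.Linear(d_model, d_head, bias=False)
            self.k_proj = nn.Linear(d_model, d_head, bias=False)
            self.v_proj = nn.Linear(d_model, d_head, bias=False)
            # Lower-triangular mask: each position attends only to itself and earlier positions.
            self.register_buffer("mask", torch.tril(torch.ones(max_len, max_len)))

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # x: (batch, seq_len, d_model)
            B, T, _ = x.shape
            q, k, v = self.q_proj(x), self.k_proj(x), self.v_proj(x)
            scores = q @ k.transpose(-2, -1) / (q.size(-1) ** 0.5)         # (B, T, T)
            scores = scores.masked_fill(self.mask[:T, :T] == 0, float("-inf"))
            return F.softmax(scores, dim=-1) @ v                           # (B, T, d_head)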

Quick Start & Requirements

  • Installation:
    conda create -n llm_from_scratch python=3.11
    conda activate llm_from_scratch
    pip install -r requirements.txt
    
  • Prerequisites: Python 3.11 and PyTorch; a CUDA-capable GPU is implied for acceleration. Mixed-precision training is also covered (see the sketch after this list).
  • Links: No specific demo or quick-start links provided beyond the setup commands.
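
The mixed-precision training mentioned above typically means PyTorch automatic mixed precision (torch.cuda.amp). A minimal, self-contained sketch of such a loop is shown below; the dummy model, dummy objective, and synthetic data are placeholders standing in for the repository's actual components.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    # Dummy model and synthetic data, purely to make the sketch runnable; requires a CUDA GPU.
    vocab_size, d_model = 1000, 64
    model = nn.Sequential(nn.Embedding(vocab_size, d_model), nn.Linear(d_model, vocab_size)).cuda()
    optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
    scaler = torch.cuda.amp.GradScaler()              # scales the loss to avoid fp16 gradient underflow

    for step in range(10):
        tokens = torch.randint(0, vocab_size, (8, 32), device="cuda")
        optimizer.zero_grad(set_to_none=True)
        with torch.cuda.amp.autocast():               # forward pass runs in mixed precision
            logits = model(tokens)
            # Dummy objective: predict each token from its own embedding.
            loss = F.cross_entropy(logits.view(-1, vocab_size), tokens.view(-1))
        scaler.scale(loss).backward()                 # backward pass on the scaled loss
        scaler.step(optimizer)                        # unscales gradients, then optimizer step
        scaler.update()                               # adjust the loss scale for the next step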

Highlighted Details

  • Covers foundational Transformer blocks (self-attention, multi-head attention, feed-forward networks) implemented from first principles.
  • Explores modern architectural improvements: RMSNorm, Rotary Positional Embeddings (RoPE), SwiGLU activations, and KV caching for efficient inference (an RMSNorm sketch follows this list).
  • Details advanced training and alignment techniques: Mixture-of-Experts (MoE), Supervised Fine-Tuning (SFT), PPO, and GRPO for RLHF.
  • Includes practical aspects like byte-level and BPE tokenization, gradient accumulation, mixed precision, and learning rate scheduling.
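
For a concrete sense of one of these pieces: RMSNorm normalizes by the root-mean-square of the features with a learned gain, skipping LayerNorm's mean-centering and bias. The sketch below is illustrative and not taken from the repository.

    import torch
    import torch.nn as nn

    class RMSNorm(nn.Module):
        """Root-mean-square normalization: scale by 1/RMS(x) and a learned gain,
        with no mean-centering and no bias term (unlike LayerNorm)."""

        def __init__(self, dim: int, eps: float = 1e-6):
            super().__init__()
            self.eps = eps
            self.weight = nn.Parameter(torch.ones(dim))

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # Normalize over the last (feature) dimension.
            rms = torch.sqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
            return self.weight * (x / rms)

    x = torch.randn(2, 8, 512)
    print(RMSNorm(512)(x).shape)      # torch.Size([2, 8, 512])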

Maintenance & Community

No specific information regarding maintainers, community channels (like Discord/Slack), or project roadmap is present in the provided README.

Licensing & Compatibility

The README does not specify a software license. Therefore, licensing terms and compatibility for commercial or closed-source use are undetermined.

Limitations & Caveats

This project is presented as a curriculum for learning and understanding LLM construction ("from scratch"), rather than as a production-ready framework. The absence of a specified license is a significant adoption blocker for many use cases. Hardware requirements beyond optional GPU acceleration (CUDA) are not explicitly stated.

Health Check

  • Last Commit: 1 week ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 1
  • Issues (30d): 2
  • Star History: 148 stars in the last 30 days

Explore Similar Projects

Starred by Aravind Srinivas (Cofounder of Perplexity), François Chollet (Author of Keras; Cofounder of Ndea, ARC Prize), and 43 more.

spaCy by explosion

NLP library for production applications
0.1% · 33k stars · Created 11 years ago · Updated 1 week ago