Building-a-Small-LLM-from-Scratch by KaihuaTang

Tutorial for building LLMs from scratch using PyTorch

Created 11 months ago

377 stars

Top 75.5% on SourcePulse

Project Summary

This repository provides a tutorial series for building a small large language model (LLM) from scratch in PyTorch. It targets readers with basic Python, PyTorch, and deep learning knowledge, and aims to demystify LLM components and training without relying on existing LLM libraries. The benefit is a hands-on understanding of LLM architecture and implementation.

How It Works

The tutorial breaks down LLM construction into fundamental components, implemented purely in PyTorch. It covers core elements like attention mechanisms, feed-forward networks, normalization layers, and tokenizers. The approach emphasizes building from first principles, allowing learners to grasp the underlying mechanics and practical challenges encountered in industry projects.
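As a rough illustration of how the components named above fit together, the following is a minimal sketch of a single pre-norm transformer block in plain PyTorch. It is not taken from the tutorial: the class names `TinyCausalSelfAttention` and `TinyBlock`, the single-head design, and the choice of `LayerNorm` and `GELU` are all assumptions made for this example.

```python
import math

import torch
import torch.nn as nn


class TinyCausalSelfAttention(nn.Module):
    """Single-head causal self-attention (hypothetical name, illustrative only)."""

    def __init__(self, d_model: int):
        super().__init__()
        # One projection produces queries, keys, and values together.
        self.qkv = nn.Linear(d_model, 3 * d_model, bias=False)
        self.out = nn.Linear(d_model, d_model, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x has shape (batch, seq_len, d_model).
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # Scaled dot-product scores, shape (batch, seq_len, seq_len).
        scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
        # Mask out future positions so token i only attends to tokens <= i.
        seq_len = x.size(1)
        future = torch.triu(
            torch.ones(seq_len, seq_len, dtype=torch.bool, device=x.device),
            diagonal=1,
        )
        scores = scores.masked_fill(future, float("-inf"))
        weights = torch.softmax(scores, dim=-1)
        return self.out(weights @ v)


class TinyBlock(nn.Module):
    """Pre-norm block: attention and feed-forward, each with a residual connection."""

    def __init__(self, d_model: int, d_ff: int):
        super().__init__()
        self.norm1 = nn.LayerNorm(d_model)
        self.attn = TinyCausalSelfAttention(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, d_ff),
            nn.GELU(),
            nn.Linear(d_ff, d_model),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x + self.attn(self.norm1(x))
        x = x + self.ffn(self.norm2(x))
        return x


if __name__ == "__main__":
    block = TinyBlock(d_model=16, d_ff=32)
    tokens = torch.randn(2, 5, 16)  # (batch, seq_len, d_model)
    print(block(tokens).shape)
```

The block keeps the input shape, so several of them can be stacked between an embedding layer and an output projection. A tokenizer, which the tutorial also covers, would sit in front of this and is omitted here.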

Quick Start & Requirements

Install: Python, PyTorch, NumPy.
Prerequisites: Basic deep learning background.
Resources: Designed for local execution on personal machines.
Links: Zhihu link, Bilibili link, Substack (under construction).

Highlighted Details

Focuses on building a small LLM from scratch, without external LLM libraries.
Covers practical industry challenges such as ONNX conversion from dynamic to static shapes, inference speed optimization, and FP16 numerical stability.
Explores specific model architectures and techniques such as DeepSeekV3's attention optimizations and LoRA fine-tuning.
Planned chapters include text pre-training, dialogue fine-tuning, and potentially multimodal models and multi-GPU deployment.
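To give a sense of what LoRA fine-tuning involves, here is a minimal sketch of a low-rank adapter wrapped around a frozen `nn.Linear`. It is an illustration under stated assumptions, not the tutorial's implementation: the class name `LoRALinear`, the default `rank` and `alpha` values, and the initialization scale are all hypothetical choices.

```python
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """Frozen linear layer plus a trainable low-rank update (hypothetical name)."""

    def __init__(self, base: nn.Linear, rank: int = 4, alpha: float = 8.0):
        super().__init__()
        self.base = base
        # Freeze the original weights; only the adapter matrices are trained.
        for param in self.base.parameters():
            param.requires_grad = False
        # A is small random, B is zero, so the adapter starts as a no-op.
        self.lora_a = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Low-rank path: project down to `rank` dimensions, then back up.
        update = (x @ self.lora_a.T) @ self.lora_b.T
        return self.base(x) + self.scale * update


if __name__ == "__main__":
    layer = LoRALinear(nn.Linear(8, 6), rank=4)
    trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
    total = sum(p.numel() for p in layer.parameters())
    print(f"trainable parameters: {trainable} of {total}")
```

Because `lora_b` starts at zero, the wrapped layer initially produces the same output as the base layer, and training only moves the two small adapter matrices.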

Maintenance & Community

The project is authored by Kaihua Tang, with Huaizheng Zhang also listed as an author. Community engagement channels are under construction (Bilibili, Substack).

Licensing & Compatibility

Content (text, images, code) is available for non-profit personal use and sharing. Commercial use, including paid courses or content platforms, requires explicit author approval.

Limitations & Caveats

The tutorial is a work in progress, with several planned chapters marked as "to be updated." Some advanced topics like multimodal networks and tensor parallelism are tentative.

