llama3-from-scratch-zh by wdndev

A Chinese-language implementation of Llama 3 from scratch

created 1 year ago
920 stars

Top 40.4% on sourcepulse

Project Summary

This repository provides a step-by-step implementation of the Llama 3 model from scratch, focusing on explaining the underlying tensor and matrix operations. It is designed for researchers and engineers who want to understand the internal workings of large language models by dissecting and rebuilding the model one component at a time. The project loads Llama 3 model weights and traces the data flow through the transformer layers.
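
As a rough illustration of the starting point, the sketch below (not the repository's exact code) loads a Llama 3 checkpoint and its configuration with PyTorch and inspects a few weight tensors. The file names follow Meta's Llama 3 8B distribution and may need adjusting for the reduced checkpoint shipped with this repository.

```python
# Minimal sketch: load the Llama 3 checkpoint and config, inspect a few tensors.
import json
import torch

# Paths assume Meta's Llama 3 8B download layout; adjust to your local setup.
weights = torch.load("Meta-Llama-3-8B/consolidated.00.pth", map_location="cpu")
with open("Meta-Llama-3-8B/params.json") as f:
    config = json.load(f)

print(config)                                  # dim, n_layers, n_heads, n_kv_heads, ...
print(list(weights.keys())[:5])                # e.g. tok_embeddings.weight, layers.0.attention.wq.weight
print(weights["tok_embeddings.weight"].shape)  # (vocab_size, dim)
```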

How It Works

The project meticulously reconstructs the Llama 3 architecture, including tokenization, embedding, RMS normalization, multi-head attention (with RoPE positional encoding), feed-forward networks, and the final output layer. It loads pre-trained weights from Meta's Llama 3 model and demonstrates how to perform forward passes by manually implementing each mathematical operation using PyTorch. The implementation highlights key Llama 3 features like grouped-query attention and SwiGLU activation.
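
The sketch below illustrates two of those building blocks, RMS normalization and the SwiGLU feed-forward layer, written directly as tensor operations in PyTorch. Function and variable names are illustrative rather than taken from the repository; the dimensions match the Llama 3 8B configuration.

```python
# A minimal sketch of RMSNorm and the SwiGLU feed-forward block (illustrative names).
import torch

def rms_norm(x: torch.Tensor, weight: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
    # Normalize by the root-mean-square over the last dimension, then apply a learned scale.
    return x * torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + eps) * weight

def swiglu_ffn(x: torch.Tensor, w1: torch.Tensor, w2: torch.Tensor, w3: torch.Tensor) -> torch.Tensor:
    # SwiGLU: silu(x @ W1^T) gated by (x @ W3^T), then projected back down with W2.
    gated = torch.nn.functional.silu(torch.matmul(x, w1.T)) * torch.matmul(x, w3.T)
    return torch.matmul(gated, w2.T)

x = torch.randn(8, 4096)                       # (seq_len, dim) for the 8B model
scale = torch.ones(4096)                       # RMSNorm weight, e.g. layers.N.ffn_norm.weight
w1 = torch.randn(14336, 4096)                  # up projection
w2 = torch.randn(4096, 14336)                  # down projection
w3 = torch.randn(14336, 4096)                  # gate projection
out = swiglu_ffn(rms_norm(x, scale), w1, w2, w3)  # (8, 4096)
```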

Quick Start & Requirements

Highlighted Details

  • Detailed breakdown of the RoPE (Rotary Positional Embeddings) implementation (see the sketch after this list).
  • Step-by-step matrix operations for attention and feed-forward layers.
  • Visualization of attention scores and positional encoding.
  • Demonstrates how to load and process Llama 3 weights.
  • Explains weight sharing in grouped-query attention.

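As referenced in the first bullet, here is a minimal sketch of RoPE applied to one attention head's query vectors using the complex-number formulation; the base frequency of 500000 matches Llama 3, but the names and shapes are illustrative rather than the repository's code.

```python
# Minimal sketch: rotate one head's query vectors with RoPE via complex multiplication.
import torch

seq_len, head_dim = 8, 128
q = torch.randn(seq_len, head_dim)

# Per-pair rotation frequencies: theta_i = 500000^(-2i/head_dim); Llama 3 uses base 500000.
freqs = 1.0 / (500000 ** (torch.arange(0, head_dim, 2).float() / head_dim))
angles = torch.outer(torch.arange(seq_len).float(), freqs)   # (seq_len, head_dim/2)
rotations = torch.polar(torch.ones_like(angles), angles)     # complex e^{i*angle}

# View each query as head_dim/2 complex numbers, rotate, and convert back to real pairs.
q_complex = torch.view_as_complex(q.float().reshape(seq_len, -1, 2))
q_rotated = torch.view_as_real(q_complex * rotations).reshape(seq_len, head_dim)
print(q_rotated.shape)  # (8, 128)
```
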
Maintenance & Community

The repository is a Chinese translation and adaptation of naklecha's English-language llama3-from-scratch project. The primary contributor is wdndev. Community interaction happens mainly through GitHub issues.

Licensing & Compatibility

The repository itself does not explicitly state a license. The upstream llama3-from-scratch project is distributed under a permissive (MIT) license, but use in commercial or closed-source projects also depends on Meta's Llama 3 Community License, which governs the model weights.

Limitations & Caveats

To keep memory requirements low, the project ships a reduced Llama 3 8B checkpoint containing only the first two transformer layers, so its outputs will not match full-model inference. The focus is purely educational, demonstrating the mechanics rather than providing a functional, full-scale model. The original Llama 3 8B weights are roughly 15 GB, while the reduced checkpoint is about 2.7 GB.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0

Star History

77 stars in the last 90 days
