llama3-from-scratch-zh by wdndev

Chinese Llama3 implementation from scratch

Created 1 year ago
963 stars

Top 38.2% on SourcePulse

Project Summary

This repository provides a step-by-step implementation of the Llama 3 model from scratch, focusing on explaining the underlying tensor and matrix operations. It's designed for researchers and engineers who want to understand the internal workings of large language models by dissecting and rebuilding a core component. The project allows users to load Llama 3 model weights and trace the data flow through its transformer layers.

How It Works

The project meticulously reconstructs the Llama 3 architecture, including tokenization, embedding, RMS normalization, multi-head attention (with RoPE positional encoding), feed-forward networks, and the final output layer. It loads pre-trained weights from Meta's Llama 3 model and demonstrates how to perform forward passes by manually implementing each mathematical operation using PyTorch. The implementation highlights key Llama 3 features like grouped-query attention and SwiGLU activation.
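As a rough illustration of the kind of manual tensor work the walkthrough performs, here is a minimal PyTorch sketch of RMS normalization and a SwiGLU feed-forward block. Shapes, names, and the eps value are illustrative assumptions, not the repository's exact code.

```python
# Minimal sketch (not the repository's exact code) of two Llama 3 building
# blocks: RMS normalization and the SwiGLU feed-forward layer.
import torch

def rms_norm(x: torch.Tensor, weight: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
    # Normalize by the root-mean-square over the last dimension, then scale.
    rms = torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + eps)
    return x * rms * weight

def swiglu_ffn(x, w_gate, w_up, w_down):
    # SwiGLU: silu(x @ W_gate) elementwise-multiplied by (x @ W_up), projected back down.
    return (torch.nn.functional.silu(x @ w_gate.T) * (x @ w_up.T)) @ w_down.T

# Toy shapes: 4 tokens, embedding dim 8, hidden dim 16 (illustrative only).
x = torch.randn(4, 8)
norm_w = torch.ones(8)
w_gate, w_up = torch.randn(16, 8), torch.randn(16, 8)
w_down = torch.randn(8, 16)
out = swiglu_ffn(rms_norm(x, norm_w), w_gate, w_up, w_down)
print(out.shape)  # torch.Size([4, 8])
```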

Quick Start & Requirements

Highlighted Details

  • Detailed breakdown of the RoPE (Rotary Positional Embeddings) implementation (see the sketch after this list).
  • Step-by-step matrix operations for attention and feed-forward layers.
  • Visualization of attention scores and positional encoding.
  • Demonstrates how to load and process Llama 3 weights.
  • Explains weight sharing in grouped-query attention.
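The sketch below, referenced in the RoPE bullet above, shows one common way to apply rotary embeddings in PyTorch using complex-number views. Variable names and the base of 10000 are assumptions about the tutorial's style rather than its exact code.

```python
# Hedged sketch of RoPE: rotate adjacent dimension pairs of a query/key vector
# by a position-dependent angle, implemented via PyTorch complex views.
import torch

def apply_rope(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    # x: (seq_len, head_dim) query or key vectors for one attention head.
    seq_len, head_dim = x.shape
    # One rotation frequency per pair of dimensions.
    freqs = 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))
    angles = torch.outer(torch.arange(seq_len).float(), freqs)      # (seq_len, head_dim/2)
    rotations = torch.polar(torch.ones_like(angles), angles)        # complex e^{i*theta}
    # Pair up adjacent dims, rotate in the complex plane, and unpack.
    x_complex = torch.view_as_complex(x.float().reshape(seq_len, -1, 2))
    x_rotated = torch.view_as_real(x_complex * rotations).reshape(seq_len, head_dim)
    return x_rotated.type_as(x)

q = torch.randn(10, 128)    # 10 tokens, head_dim 128 per attention head
print(apply_rope(q).shape)  # torch.Size([10, 128])
```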

Maintenance & Community

The repository is a Chinese translation and adaptation of naklecha's llama3-from-scratch project. The primary contributor is wdndev. Community interaction is primarily through GitHub issues.

Licensing & Compatibility

The repository itself does not explicitly state a license. The upstream llama3-from-scratch project it adapts is released under a permissive license (MIT), but compatibility with commercial or closed-source projects depends on that upstream license and on Meta's license terms for the Llama 3 model weights.

Limitations & Caveats

The project ships a modified Llama 3 8B checkpoint containing only the first two transformer layers to reduce memory requirements, so its outputs do not correspond to full-model inference. The focus is purely educational: it demonstrates the mechanics rather than providing a functional, full-scale model. The original Llama 3 8B weights are roughly 15GB; the trimmed version is ~2.7GB.
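For illustration only, here is a sketch of how a full checkpoint could be trimmed to its first two transformer layers. The file name and the "layers.N." key prefix follow Meta's published checkpoint layout but should be treated as assumptions here, not as the repository's actual script.

```python
# Illustrative only: shrink a full Llama 3 8B checkpoint to its first two
# transformer layers. File name and key prefix are assumptions based on
# Meta's released checkpoint layout.
import torch

state_dict = torch.load("consolidated.00.pth", map_location="cpu")  # full ~15GB checkpoint

def keep_key(key: str, num_layers: int = 2) -> bool:
    # Keep embeddings, norms, and the output head; drop layers >= num_layers.
    if not key.startswith("layers."):
        return True
    layer_idx = int(key.split(".")[1])
    return layer_idx < num_layers

small = {k: v for k, v in state_dict.items() if keep_key(k)}
torch.save(small, "consolidated_2layers.pth")  # much smaller checkpoint (~2.7GB)
```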

Health Check

  • Last Commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 20 stars in the last 30 days
