llama3-from-scratch-zh by wdndev

A Chinese-language implementation of Llama 3 from scratch

created 1 year ago
920 stars

Top 40.4% on sourcepulse

Project Summary

This repository provides a step-by-step implementation of the Llama 3 model from scratch, focusing on explaining the underlying tensor and matrix operations. It is designed for researchers and engineers who want to understand the internal workings of large language models by dissecting and rebuilding the model one component at a time. The project loads Llama 3 model weights and traces the data flow through the transformer layers.
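
As a rough illustration of the starting point, the sketch below (not the repository's exact code) loads a Llama 3 checkpoint and its configuration with PyTorch and inspects a few weight tensors. The file names follow Meta's Llama 3 8B distribution and may need adjusting for the reduced checkpoint shipped with this repository.

```python
# Minimal sketch: load the Llama 3 checkpoint and config, inspect a few tensors.
import json
import torch

# Paths assume Meta's Llama 3 8B download layout; adjust to your local setup.
weights = torch.load("Meta-Llama-3-8B/consolidated.00.pth", map_location="cpu")
with open("Meta-Llama-3-8B/params.json") as f:
    config = json.load(f)

print(config)                                  # dim, n_layers, n_heads, n_kv_heads, ...
print(list(weights.keys())[:5])                # e.g. tok_embeddings.weight, layers.0.attention.wq.weight
print(weights["tok_embeddings.weight"].shape)  # (vocab_size, dim)
```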

How It Works

The project meticulously reconstructs the Llama 3 architecture, including tokenization, embedding, RMS normalization, multi-head attention (with RoPE positional encoding), feed-forward networks, and the final output layer. It loads pre-trained weights from Meta's Llama 3 model and demonstrates how to perform forward passes by manually implementing each mathematical operation using PyTorch. The implementation highlights key Llama 3 features like grouped-query attention and SwiGLU activation.
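
The sketch below illustrates two of those building blocks, RMS normalization and the SwiGLU feed-forward layer, written directly as tensor operations in PyTorch. Function and variable names are illustrative rather than taken from the repository; the dimensions match the Llama 3 8B configuration.

```python
# A minimal sketch of RMSNorm and the SwiGLU feed-forward block (illustrative names).
import torch

def rms_norm(x: torch.Tensor, weight: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
    # Normalize by the root-mean-square over the last dimension, then apply a learned scale.
    return x * torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + eps) * weight

def swiglu_ffn(x: torch.Tensor, w1: torch.Tensor, w2: torch.Tensor, w3: torch.Tensor) -> torch.Tensor:
    # SwiGLU: silu(x @ W1^T) gated by (x @ W3^T), then projected back down with W2.
    gated = torch.nn.functional.silu(torch.matmul(x, w1.T)) * torch.matmul(x, w3.T)
    return torch.matmul(gated, w2.T)

x = torch.randn(8, 4096)                       # (seq_len, dim) for the 8B model
scale = torch.ones(4096)                       # RMSNorm weight, e.g. layers.N.ffn_norm.weight
w1 = torch.randn(14336, 4096)                  # up projection
w2 = torch.randn(4096, 14336)                  # down projection
w3 = torch.randn(14336, 4096)                  # gate projection
out = swiglu_ffn(rms_norm(x, scale), w1, w2, w3)  # (8, 4096)
```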

Quick Start & Requirements

Highlighted Details

  • Detailed breakdown of the RoPE (Rotary Positional Embeddings) implementation (see the sketch after this list).
  • Step-by-step matrix operations for attention and feed-forward layers.
  • Visualization of attention scores and positional encoding.
  • Demonstrates how to load and process Llama 3 weights.
  • Explains weight sharing in grouped-query attention.

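As referenced in the first bullet, here is a minimal sketch of RoPE applied to one attention head's query vectors using the complex-number formulation; the base frequency of 500000 matches Llama 3, but the names and shapes are illustrative rather than the repository's code.

```python
# Minimal sketch: rotate one head's query vectors with RoPE via complex multiplication.
import torch

seq_len, head_dim = 8, 128
q = torch.randn(seq_len, head_dim)

# Per-pair rotation frequencies: theta_i = 500000^(-2i/head_dim); Llama 3 uses base 500000.
freqs = 1.0 / (500000 ** (torch.arange(0, head_dim, 2).float() / head_dim))
angles = torch.outer(torch.arange(seq_len).float(), freqs)   # (seq_len, head_dim/2)
rotations = torch.polar(torch.ones_like(angles), angles)     # complex e^{i*angle}

# View each query as head_dim/2 complex numbers, rotate, and convert back to real pairs.
q_complex = torch.view_as_complex(q.float().reshape(seq_len, -1, 2))
q_rotated = torch.view_as_real(q_complex * rotations).reshape(seq_len, head_dim)
print(q_rotated.shape)  # (8, 128)
```
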
Maintenance & Community

The repository is a Chinese translation and adaptation of naklecha's English-language llama3-from-scratch project. The primary contributor is wdndev. Community interaction happens mainly through GitHub issues.

Licensing & Compatibility

The repository itself does not explicitly state a license. The upstream llama3-from-scratch project is distributed under a permissive (MIT) license, but use in commercial or closed-source projects also depends on Meta's Llama 3 Community License, which governs the model weights.

Limitations & Caveats

To keep memory requirements low, the project ships a reduced Llama 3 8B checkpoint containing only the first two transformer layers, so its outputs will not match full-model inference. The focus is purely educational, demonstrating the mechanics rather than providing a functional, full-scale model. The original Llama 3 8B weights are roughly 15 GB, while the reduced checkpoint is about 2.7 GB.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0

Star History

77 stars in the last 90 days
