Llama2 port to Rust's Burn framework
This project provides a port of Meta's Llama2 large language model to the Rust-based Burn deep learning framework. It enables Rust developers to leverage Llama2's capabilities by converting and loading model weights, facilitating inference and experimentation within the Rust ecosystem.
How It Works
The project utilizes Python scripts to load the original Llama2 model weights and tokenizer, then dumps them into a format suitable for conversion. Rust binaries then take these dumped weights, convert them into Burn's internal model format, and provide functionalities for testing inference and generating text samples. This two-stage process (Python for initial loading/dumping, Rust for conversion/inference) bridges the gap between PyTorch-based models and the Burn framework.
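The dump-then-convert bridge described above can be sketched in miniature. The snippet below is a hypothetical illustration of the idea, not the project's actual dump format: stage 1 (the Python side) writes each tensor as a shape header plus raw little-endian f32 data, and stage 2 (which the real project performs in Rust) reads those bytes back into the target framework's tensor type. All names here (`dump_tensor`, `load_tensor`, the file layout) are invented for the example.

```python
import os
import struct
import tempfile

def dump_tensor(path, shape, values):
    """Stage 1 (Python side): write a shape header, then raw f32 data."""
    with open(path, "wb") as f:
        f.write(struct.pack("<I", len(shape)))           # number of dims
        f.write(struct.pack(f"<{len(shape)}I", *shape))  # each dim size
        f.write(struct.pack(f"<{len(values)}f", *values))

def load_tensor(path):
    """Stage 2 (what a converter would do): read shape, then data."""
    with open(path, "rb") as f:
        (ndim,) = struct.unpack("<I", f.read(4))
        shape = struct.unpack(f"<{ndim}I", f.read(4 * ndim))
        n = 1
        for d in shape:
            n *= d
        values = list(struct.unpack(f"<{n}f", f.read(4 * n)))
    return shape, values

path = os.path.join(tempfile.gettempdir(), "llama2_dump_example.bin")
dump_tensor(path, (2, 3), [0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
shape, values = load_tensor(path)
```

A fixed, framework-neutral on-disk format like this is what lets the PyTorch-loading side and the Burn-loading side evolve independently; each only has to agree on the byte layout.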
Quick Start & Requirements
Requirements:
- python3
- Rust binaries run via cargo run
- tokenizer.model file
- TORCH_CUDA_VERSION environment variable set (e.g., export TORCH_CUDA_VERSION=cu113)

Highlighted Details
Maintenance & Community
Licensing & Compatibility
See the LICENSE file in the repository (likely MIT or Apache 2.0, as is typical for Rust projects, but check the file to confirm).

Limitations & Caveats
Weight conversion and loading are CPU-bound and can be resource-intensive, requiring significant RAM. The project appears to be a direct port, and performance benchmarks or advanced features may not be fully optimized or documented.