minimal-llama by zphang

Code for running and fine-tuning LLaMA models

created 2 years ago
458 stars

Top 67.0% on sourcepulse

View on GitHub
1 Expert Loves This Project
Project Summary

This repository provides a minimal set of tools for running and fine-tuning LLaMA models, targeting researchers and practitioners who need to adapt large language models. It includes scripts for data preparation and memory-efficient fine-tuning, aiming to simplify customizing LLaMA for specific tasks.

How It Works

The project focuses on efficient fine-tuning strategies, including 8-bit quantization and Parameter-Efficient Fine-Tuning (PEFT) methods like LoRA. It also introduces a naive implementation of pipeline parallelism to enable training on larger models that exceed single-GPU memory capacity. Data is pre-tokenized into fixed-length chunks for consistent processing.
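
The repo pins specific Transformers and PEFT versions, so exact APIs differ slightly, but the core setup looks roughly like the sketch below: load the base model in 8-bit and wrap it with a LoRA adapter. This is a minimal sketch, not the repo's code; the paths, target modules, and lora_alpha value are illustrative assumptions.

    # Minimal sketch (not the repo's exact code): 8-bit base model + LoRA adapter via PEFT.
    from transformers import AutoModelForCausalLM
    from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

    model = AutoModelForCausalLM.from_pretrained(
        "path/to/llama-hf",      # placeholder: LLaMA weights converted to HF format
        load_in_8bit=True,       # bitsandbytes int8 weights to cut memory
        device_map="auto",
    )
    # Older PEFT releases name this helper prepare_model_for_int8_training.
    model = prepare_model_for_kbit_training(model)

    lora_config = LoraConfig(
        r=8,                                  # matches --lora_rank 8 in the quick start
        lora_alpha=16,                        # assumed scaling factor
        target_modules=["q_proj", "v_proj"],  # assumed attention projections to adapt
        lora_dropout=0.05,
        bias="none",
        task_type="CAUSAL_LM",
    )
    model = get_peft_model(model, lora_config)
    model.print_trainable_parameters()  # only the small LoRA matrices require gradients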

Quick Start & Requirements

  • Tokenization: python tokenize_dataset.py --tokenizer_path <path> --jsonl_path <path> --save_path <path> --max_seq_length 512 (a packing sketch follows this list)
  • PEFT Fine-tuning: python finetune_peft.py --model_path <path> --dataset_path <path> --peft_mode lora --lora_rank 8 --per_device_train_batch_size 2 --gradient_accumulation_steps 1 --max_steps 2500 --learning_rate 2e-4 --fp16 --logging_steps 10 --output_dir <path>
  • Prerequisites: Requires specific Transformers and PEFT library versions (links provided in the README), LLaMA model weights in Hugging Face (HF) format, and Python. 8-bit fine-tuning uses roughly 20 GB of VRAM at max_seq_length=512 with a per-device batch size of 2.
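
As referenced above, the tokenization step packs raw text into fixed-length token blocks before training. The following is a minimal sketch of that packing idea, not the repo's tokenize_dataset.py; the JSONL field name, tokenizer path, and output handling are assumptions.

    # Sketch: concatenate tokenized JSONL records and split into fixed-length chunks.
    import json
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("path/to/llama-hf")  # placeholder path
    max_seq_length = 512

    token_ids = []
    with open("data.jsonl") as f:
        for line in f:
            text = json.loads(line)["text"]              # assumes a "text" field per record
            token_ids.extend(tokenizer(text)["input_ids"])

    # Drop the ragged tail so every chunk has exactly max_seq_length tokens.
    chunks = [
        token_ids[i : i + max_seq_length]
        for i in range(0, len(token_ids) - max_seq_length + 1, max_seq_length)
    ]
    print(f"{len(chunks)} chunks of {max_seq_length} tokens")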

Highlighted Details

  • PEFT fine-tuning with 8-bit quantization reduces memory usage significantly.
  • Naive pipeline parallelism is implemented for training larger models across multiple GPUs.
  • Supports LoRA for parameter-efficient fine-tuning (see the adapter-loading sketch after this list).
  • Includes a script for tokenizing datasets into fixed-length sequences.
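
After fine-tuning, the saved LoRA weights can be attached back onto the base model for generation. Below is a hedged sketch using PEFT's standard PeftModel.from_pretrained; paths are placeholders, and the repo's script may save checkpoints in a different layout.

    # Sketch: load a saved LoRA adapter onto the 8-bit base model and generate.
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import PeftModel

    base = AutoModelForCausalLM.from_pretrained(
        "path/to/llama-hf", load_in_8bit=True, device_map="auto"
    )
    model = PeftModel.from_pretrained(base, "path/to/peft-output-dir")  # --output_dir from fine-tuning
    tokenizer = AutoTokenizer.from_pretrained("path/to/llama-hf")

    inputs = tokenizer("The LLaMA model is", return_tensors="pt").to(base.device)
    output_ids = model.generate(**inputs, max_new_tokens=50)
    print(tokenizer.decode(output_ids[0], skip_special_tokens=True))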

Maintenance & Community

The repository is a personal project with feedback welcomed. Specific contributors, sponsorships, or community channels are not detailed in the README.

Licensing & Compatibility

The README does not explicitly state a license. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The code is described as "fairly quickly thrown together" and may contain bugs. PEFT fine-tuning combined with pipeline parallelism is flagged as "buggy, don't use this yet." Hyperparameter tuning advice is minimal, and the impact of max_seq_length on performance is unknown.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 0 stars in the last 90 days

Explore Similar Projects

Starred by Ying Sheng (author of SGLang) and Stas Bekman (author of the Machine Learning Engineering Open Book; Research Engineer at Snowflake).

llm-analysis by cli99

CLI tool for LLM latency/memory analysis during training/inference

created 2 years ago
updated 3 months ago
441 stars

Top 0.2% on sourcepulse