minimal-llama by zphang

Code for running and fine-tuning LLaMA models

created 2 years ago
458 stars

Top 67.0% on sourcepulse

View on GitHub
1 Expert Loves This Project
Project Summary

This repository provides a minimal set of tools for running and fine-tuning LLaMA models, targeting researchers and practitioners who need to adapt large language models. It includes scripts for data preparation and memory-efficient fine-tuning, aiming to simplify customizing LLaMA for specific tasks.

How It Works

The project focuses on efficient fine-tuning strategies, including 8-bit quantization and Parameter-Efficient Fine-Tuning (PEFT) methods like LoRA. It also introduces a naive implementation of pipeline parallelism to enable training on larger models that exceed single-GPU memory capacity. Data is pre-tokenized into fixed-length chunks for consistent processing.
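
The repo pins specific Transformers and PEFT versions, so exact APIs differ slightly, but the core setup looks roughly like the sketch below: load the base model in 8-bit and wrap it with a LoRA adapter. This is a minimal sketch, not the repo's code; the paths, target modules, and lora_alpha value are illustrative assumptions.

    # Minimal sketch (not the repo's exact code): 8-bit base model + LoRA adapter via PEFT.
    from transformers import AutoModelForCausalLM
    from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

    model = AutoModelForCausalLM.from_pretrained(
        "path/to/llama-hf",      # placeholder: LLaMA weights converted to HF format
        load_in_8bit=True,       # bitsandbytes int8 weights to cut memory
        device_map="auto",
    )
    # Older PEFT releases name this helper prepare_model_for_int8_training.
    model = prepare_model_for_kbit_training(model)

    lora_config = LoraConfig(
        r=8,                                  # matches --lora_rank 8 in the quick start
        lora_alpha=16,                        # assumed scaling factor
        target_modules=["q_proj", "v_proj"],  # assumed attention projections to adapt
        lora_dropout=0.05,
        bias="none",
        task_type="CAUSAL_LM",
    )
    model = get_peft_model(model, lora_config)
    model.print_trainable_parameters()  # only the small LoRA matrices require gradients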

Quick Start & Requirements

  • Tokenization: python tokenize_dataset.py --tokenizer_path <path> --jsonl_path <path> --save_path <path> --max_seq_length 512 (a packing sketch follows this list)
  • PEFT Fine-tuning: python finetune_peft.py --model_path <path> --dataset_path <path> --peft_mode lora --lora_rank 8 --per_device_train_batch_size 2 --gradient_accumulation_steps 1 --max_steps 2500 --learning_rate 2e-4 --fp16 --logging_steps 10 --output_dir <path>
  • Prerequisites: Requires specific Transformers and PEFT library versions (links provided in the README), LLaMA model weights in Hugging Face (HF) format, and Python. 8-bit fine-tuning uses roughly 20 GB of VRAM at max_seq_length=512 with a per-device batch size of 2.
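
As referenced above, the tokenization step packs raw text into fixed-length token blocks before training. The following is a minimal sketch of that packing idea, not the repo's tokenize_dataset.py; the JSONL field name, tokenizer path, and output handling are assumptions.

    # Sketch: concatenate tokenized JSONL records and split into fixed-length chunks.
    import json
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("path/to/llama-hf")  # placeholder path
    max_seq_length = 512

    token_ids = []
    with open("data.jsonl") as f:
        for line in f:
            text = json.loads(line)["text"]              # assumes a "text" field per record
            token_ids.extend(tokenizer(text)["input_ids"])

    # Drop the ragged tail so every chunk has exactly max_seq_length tokens.
    chunks = [
        token_ids[i : i + max_seq_length]
        for i in range(0, len(token_ids) - max_seq_length + 1, max_seq_length)
    ]
    print(f"{len(chunks)} chunks of {max_seq_length} tokens")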

Highlighted Details

  • PEFT fine-tuning with 8-bit quantization reduces memory usage significantly.
  • Naive pipeline parallelism is implemented for training larger models across multiple GPUs.
  • Supports LoRA for parameter-efficient fine-tuning (see the adapter-loading sketch after this list).
  • Includes a script for tokenizing datasets into fixed-length sequences.
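
After fine-tuning, the saved LoRA weights can be attached back onto the base model for generation. Below is a hedged sketch using PEFT's standard PeftModel.from_pretrained; paths are placeholders, and the repo's script may save checkpoints in a different layout.

    # Sketch: load a saved LoRA adapter onto the 8-bit base model and generate.
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import PeftModel

    base = AutoModelForCausalLM.from_pretrained(
        "path/to/llama-hf", load_in_8bit=True, device_map="auto"
    )
    model = PeftModel.from_pretrained(base, "path/to/peft-output-dir")  # --output_dir from fine-tuning
    tokenizer = AutoTokenizer.from_pretrained("path/to/llama-hf")

    inputs = tokenizer("The LLaMA model is", return_tensors="pt").to(base.device)
    output_ids = model.generate(**inputs, max_new_tokens=50)
    print(tokenizer.decode(output_ids[0], skip_special_tokens=True))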

Maintenance & Community

The repository is a personal project with feedback welcomed. Specific contributors, sponsorships, or community channels are not detailed in the README.

Licensing & Compatibility

The README does not explicitly state a license. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The code is described as "fairly quickly thrown together" and may contain bugs. PEFT fine-tuning combined with pipeline parallelism is flagged as "buggy, don't use this yet." Hyperparameter tuning advice is minimal, and the impact of max_seq_length on performance is unknown.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 0 stars in the last 90 days

Explore Similar Projects

Starred by Ying Sheng (author of SGLang) and Stas Bekman (author of the Machine Learning Engineering Open Book; Research Engineer at Snowflake).

llm-analysis by cli99

CLI tool for LLM latency/memory analysis during training/inference

created 2 years ago
updated 3 months ago
441 stars

Top 0.2% on sourcepulse