minimal-llama by zphang

Code for running and fine-tuning LLaMA models

Created 2 years ago
460 stars

Top 65.8% on SourcePulse

Project Summary

This repository provides a minimal set of tools for running and fine-tuning LLaMA models, targeting researchers and practitioners who need to adapt large language models. It offers methods for data preparation and efficient fine-tuning techniques, aiming to simplify the process of customizing LLaMA for specific tasks.

How It Works

The project focuses on efficient fine-tuning strategies, including 8-bit quantization and Parameter-Efficient Fine-Tuning (PEFT) methods like LoRA. It also introduces a naive implementation of pipeline parallelism to enable training on larger models that exceed single-GPU memory capacity. Data is pre-tokenized into fixed-length chunks for consistent processing.
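The fixed-length pre-tokenization step can be sketched as follows. `chunk_tokens` is a hypothetical helper, not code from the repository; it assumes the dataset has already been tokenized into a flat stream of token ids:

```python
def chunk_tokens(token_ids, max_seq_length=512, pad_id=0):
    """Split a flat stream of token ids into fixed-length training chunks,
    padding the final chunk so every sequence has the same length."""
    chunks = []
    for start in range(0, len(token_ids), max_seq_length):
        chunk = token_ids[start:start + max_seq_length]
        if len(chunk) < max_seq_length:
            # Pad the trailing partial chunk up to max_seq_length.
            chunk = chunk + [pad_id] * (max_seq_length - len(chunk))
        chunks.append(chunk)
    return chunks
```

Pre-chunking like this keeps every training batch the same shape, which avoids dynamic padding logic in the training loop.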

Quick Start & Requirements

  • Tokenization: python tokenize_dataset.py --tokenizer_path <path> --jsonl_path <path> --save_path <path> --max_seq_length 512
  • PEFT Fine-tuning: python finetune_peft.py --model_path <path> --dataset_path <path> --peft_mode lora --lora_rank 8 --per_device_train_batch_size 2 --gradient_accumulation_steps 1 --max_steps 2500 --learning_rate 2e-4 --fp16 --logging_steps 10 --output_dir <path>
  • Prerequisites: Requires specific Transformers and PEFT library versions (links in the README), LLaMA model weights in Hugging Face format, and Python. 8-bit fine-tuning uses roughly 20GB of VRAM at max_seq_length=512 with a per-device batch size of 2.
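The PEFT fine-tuning command above roughly corresponds to the following model setup. This is a minimal sketch, assuming recent `transformers` and `peft` APIs rather than the specific versions the repository pins, and `"<model_path>"` is a placeholder for the HF-format weights:

```python
import torch
from transformers import LlamaForCausalLM
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# Load LLaMA weights (HF format) in 8-bit to reduce memory usage.
model = LlamaForCausalLM.from_pretrained(
    "<model_path>",
    load_in_8bit=True,
    torch_dtype=torch.float16,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

# LoRA adapter configuration, mirroring --peft_mode lora --lora_rank 8.
# The target_modules choice here is an assumption, not taken from the repo.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # Only the LoRA weights are trainable.
```

Because only the low-rank adapter weights receive gradients, optimizer state stays small, which is what keeps the 8-bit run within roughly 20GB of VRAM.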

Highlighted Details

  • PEFT fine-tuning with 8-bit quantization cuts memory usage to roughly 20GB of VRAM at max_seq_length=512 with batch size 2.
  • Naive pipeline parallelism is implemented for training larger models across multiple GPUs.
  • Supports LoRA for parameter-efficient fine-tuning.
  • Includes a script for tokenizing datasets into fixed-length sequences.
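The naive pipeline parallelism mentioned above amounts to assigning contiguous blocks of transformer layers to different GPUs. A hypothetical stage-assignment helper (not the repository's actual code) might look like:

```python
def partition_layers(num_layers, num_stages):
    """Assign contiguous transformer layer indices to pipeline stages,
    spreading any remainder over the earliest stages."""
    per_stage, remainder = divmod(num_layers, num_stages)
    assignment, start = [], 0
    for stage in range(num_stages):
        count = per_stage + (1 if stage < remainder else 0)
        assignment.append(list(range(start, start + count)))
        start += count
    return assignment
```

During the forward pass, hidden states are moved to the next stage's device after its block of layers runs, so only one stage's layers must fit in a single GPU's memory at a time; the "naive" part is that stages run sequentially rather than with micro-batch overlap.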

Maintenance & Community

The repository is a personal project with feedback welcomed. Specific contributors, sponsorships, or community channels are not detailed in the README.

Licensing & Compatibility

The README does not explicitly state a license. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The code is described as "fairly quickly thrown together" and may contain bugs. PEFT fine-tuning combined with pipeline parallelism is explicitly flagged as "buggy, don't use this yet." Hyperparameter tuning advice is minimal, and the impact of max_seq_length on performance is unknown.

Health Check

  • Last Commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 0 stars in the last 30 days

Explore Similar Projects

Starred by Tobi Lutke (Cofounder of Shopify), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 6 more.

  • xTuring by stochasticai: SDK for fine-tuning and customizing open-source LLMs. 3k stars; created 2 years ago, updated 1 day ago. Starred by Junyang Lin (Core Maintainer at Alibaba Qwen), Vincent Weisser (Cofounder of Prime Intellect), and 25 more.
  • alpaca-lora by tloen: LoRA fine-tuning for LLaMA. 19k stars; created 2 years ago, updated 1 year ago.