Code for running and fine-tuning LLaMA models
This repository provides a minimal set of tools for running and fine-tuning LLaMA models, targeting researchers and practitioners who need to adapt large language models. It offers utilities for data preparation and efficient fine-tuning, aiming to simplify the process of customizing LLaMA for specific tasks.
How It Works
The project focuses on efficient fine-tuning strategies, including 8-bit quantization and Parameter-Efficient Fine-Tuning (PEFT) methods like LoRA. It also introduces a naive implementation of pipeline parallelism to enable training on larger models that exceed single-GPU memory capacity. Data is pre-tokenized into fixed-length chunks for consistent processing.
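As a rough illustration of how these pieces fit together, the sketch below loads a LLaMA checkpoint in 8-bit and attaches a LoRA adapter using the Hugging Face transformers and peft libraries. The path, target modules, and LoRA hyperparameters are illustrative assumptions, not the repository's actual code, and exact function names can vary by library version.

```python
# Minimal sketch (assumptions noted below), not the repository's exact code:
# load LLaMA weights in 8-bit and wrap them with a LoRA adapter.
from transformers import LlamaForCausalLM
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_path = "path/to/llama-7b"  # assumption: a converted Hugging Face checkpoint

model = LlamaForCausalLM.from_pretrained(
    model_path,
    load_in_8bit=True,   # 8-bit quantization via bitsandbytes
    device_map="auto",
)
# Older peft releases name this prepare_model_for_int8_training.
model = prepare_model_for_kbit_training(model)

lora_config = LoraConfig(
    r=8,                                   # matches --lora_rank 8 in the example command
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],   # assumption: adapt attention projections only
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the small LoRA matrices are trainable
```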
Quick Start & Requirements
Pre-tokenize a JSONL dataset into fixed-length chunks:
python tokenize_dataset.py --tokenizer_path <path> --jsonl_path <path> --save_path <path> --max_seq_length 512
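Conceptually, this step tokenizes the corpus, concatenates the token ids, and splits them into fixed-length blocks. The following is a minimal sketch of that idea, not the repository's actual tokenize_dataset.py; the "text" field, file paths, and save format are assumptions.

```python
# Sketch of pre-tokenizing a JSONL corpus into fixed-length chunks.
# Assumes each JSONL line has a "text" field and saves a simple tensor file.
import json
import torch
from transformers import LlamaTokenizer

max_seq_length = 512
tokenizer = LlamaTokenizer.from_pretrained("path/to/tokenizer")  # assumed path

all_ids = []
with open("data.jsonl") as f:
    for line in f:
        text = json.loads(line)["text"]
        all_ids.extend(tokenizer(text)["input_ids"])

# Drop the ragged tail and reshape into (num_chunks, max_seq_length).
num_chunks = len(all_ids) // max_seq_length
chunks = torch.tensor(all_ids[: num_chunks * max_seq_length]).view(num_chunks, max_seq_length)
torch.save(chunks, "tokenized_dataset.pt")
```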
Fine-tune with LoRA via PEFT:
python finetune_peft.py --model_path <path> --dataset_path <path> --peft_mode lora --lora_rank 8 --per_device_train_batch_size 2 --gradient_accumulation_steps 1 --max_steps 2500 --learning_rate 2e-4 --fp16 --logging_steps 10 --output_dir <path>
The example configuration uses max_seq_length=512 and a per-device batch size of 2.
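For orientation, the command-line flags above correspond roughly to a standard Hugging Face TrainingArguments setup. This mapping is an assumption about what the flags control, not the script's actual internals; the output directory is a placeholder.

```python
# Rough mapping of the finetune_peft.py flags onto Hugging Face TrainingArguments.
# Illustrative only; finetune_peft.py may configure training differently.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="outputs/llama-lora",   # --output_dir (placeholder path)
    per_device_train_batch_size=2,     # --per_device_train_batch_size 2
    gradient_accumulation_steps=1,     # --gradient_accumulation_steps 1
    max_steps=2500,                    # --max_steps 2500
    learning_rate=2e-4,                # --learning_rate 2e-4
    fp16=True,                         # --fp16
    logging_steps=10,                  # --logging_steps 10
)
```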
Highlighted Details
- 8-bit model loading to reduce GPU memory requirements
- LoRA fine-tuning via the PEFT library (lora_rank=8 in the example)
- A naive, experimental pipeline-parallel path for models that exceed single-GPU memory
- Pre-tokenized, fixed-length training chunks (max_seq_length=512 in the example)
Maintenance & Community
The repository is a personal project with feedback welcomed. Specific contributors, sponsorships, or community channels are not detailed in the README, and the project appears inactive, with its last update roughly a year ago.
Licensing & Compatibility
The README does not explicitly state a license. Compatibility for commercial use or closed-source linking is not specified.
Limitations & Caveats
The code is described as "fairly quickly thrown together" and may contain bugs. The PEFT fine-tuning with pipeline parallelism is noted as "buggy, don't use this yet." Hyperparameter tuning advice is minimal, and the impact of max_sequence_length on performance is unknown.