Fine-tuning script for Mistral-7B
Top 49.0% on sourcepulse
This repository provides code for full fine-tuning of the Mistral-7B language model, targeting users with multi-GPU setups (e.g., 3090s, A100s, H100s). It enables customization of Mistral-7B for specific downstream tasks by training on custom datasets.
How It Works
The project utilizes PyTorch and FSDP (Fully Sharded Data Parallel) for distributed training across multiple GPUs. It performs full-parameter fine-tuning, meaning all model weights are updated, which offers potentially deeper adaptation than parameter-efficient methods. The train.py script orchestrates the training process, accepting configuration for the dataset, learning rate, and FSDP backward prefetching.
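As a rough orientation, the sketch below shows what an FSDP full-parameter fine-tuning loop of this kind can look like. It is a minimal sketch under assumptions: the checkpoint name, the learning rate, and the get_dataloader helper are illustrative, not the actual contents of train.py.

from functools import partial

import torch
import torch.distributed as dist
from torch.distributed.fsdp import BackwardPrefetch
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from torch.distributed.fsdp.wrap import transformer_auto_wrap_policy
from transformers import AutoModelForCausalLM
from transformers.models.mistral.modeling_mistral import MistralDecoderLayer

def main():
    # One process per GPU, launched via torchrun as in the Quick Start below.
    dist.init_process_group("nccl")
    local_rank = dist.get_rank() % torch.cuda.device_count()
    torch.cuda.set_device(local_rank)

    # Assumed base checkpoint; full fine-tuning updates every parameter.
    model = AutoModelForCausalLM.from_pretrained(
        "mistralai/Mistral-7B-v0.1", torch_dtype=torch.bfloat16
    )

    # Shard each decoder layer separately so parameters, gradients, and
    # optimizer state are distributed across all GPUs.
    wrap_policy = partial(
        transformer_auto_wrap_policy, transformer_layer_cls={MistralDecoderLayer}
    )
    model = FSDP(
        model,
        auto_wrap_policy=wrap_policy,
        device_id=local_rank,
        backward_prefetch=BackwardPrefetch.BACKWARD_PRE,  # or BACKWARD_POST
    )

    optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

    model.train()
    for batch in get_dataloader():  # hypothetical helper yielding tokenized batches on the GPU
        loss = model(**batch).loss
        loss.backward()
        model.clip_grad_norm_(1.0)  # FSDP-aware gradient clipping
        optimizer.step()
        optimizer.zero_grad()

if __name__ == "__main__":
    main()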
Quick Start & Requirements
python -m venv env && source env/bin/activate && pip install -r requirements.txt
export HF_TOKEN="[insert token here]"
torchrun --nnodes=1 --nproc-per-node=<REPLACE_WITH_NUMBER_OF_GPUS> train.py
Place your training and validation data in the data folder as train.jsonl and validation.jsonl.
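The README does not document the exact JSONL schema train.py expects, so the "text" field below is an assumption; the snippet only illustrates the file layout (one JSON object per line in data/train.jsonl and data/validation.jsonl).

import json
import os

os.makedirs("data", exist_ok=True)

# Illustrative records; the real field names depend on how train.py loads data.
examples = [
    {"text": "### Instruction:\nSummarize FSDP.\n### Response:\nFSDP shards model state across GPUs."},
    {"text": "### Instruction:\nName the model being tuned.\n### Response:\nMistral-7B."},
]

with open("data/train.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")
# Repeat with held-out examples for data/validation.jsonl.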
Highlighted Details
Configurable FSDP backward prefetch policies (BACKWARD_PRE, BACKWARD_POST) for potential memory optimization; see the sketch below for how the two options map to PyTorch's enum.
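A small sketch of selecting between the two policies. The PREFETCH_POLICIES mapping and the flag value are hypothetical; the actual flag name and parsing in train.py may differ.

from torch.distributed.fsdp import BackwardPrefetch

# Prefetch the next shard's parameters before the current gradient computation
# (more communication/computation overlap, higher peak memory) vs. after it
# (less overlap, lower peak memory).
PREFETCH_POLICIES = {
    "BACKWARD_PRE": BackwardPrefetch.BACKWARD_PRE,
    "BACKWARD_POST": BackwardPrefetch.BACKWARD_POST,
}

backward_prefetch = PREFETCH_POLICIES["BACKWARD_POST"]
# Pass as FSDP(model, backward_prefetch=backward_prefetch, ...)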
Maintenance & Community
No specific information on maintainers, community channels, or roadmap is provided in the README.
Licensing & Compatibility
The repository's license is not specified in the README. Compatibility for commercial use or closed-source linking is therefore undetermined.
Limitations & Caveats
The project is explicitly for full fine-tuning and does not support parameter-efficient methods like QLoRA. The optimal number of epochs and the necessity of adjusting gradient clipping or weight decay may vary depending on the dataset and hardware configuration.
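For reference, gradient clipping is typically exposed as a max-norm threshold and weight decay as an AdamW parameter. The values below are illustrative defaults, not the repository's settings, and the toy model stands in for the FSDP-wrapped Mistral-7B.

import torch
import torch.nn as nn

# Toy stand-in model; in practice these knobs apply to the full fine-tuning run.
model = nn.Linear(16, 16)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5, weight_decay=0.01)

loss = model(torch.randn(4, 16)).pow(2).mean()
loss.backward()
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)  # gradient clipping threshold
optimizer.step()
optimizer.zero_grad()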