fine-tune-mistral by abacaj

Fine-tuning script for Mistral-7B

created 1 year ago
716 stars

Top 49.0% on sourcepulse

Project Summary

This repository provides code for full fine-tuning of the Mistral-7B language model, targeting users with multi-GPU setups (e.g., 3090s, A100s, H100s). It lets you adapt Mistral-7B to specific downstream tasks by training on custom datasets.

How It Works

The project uses PyTorch with FSDP (Fully Sharded Data Parallel) for distributed training across multiple GPUs. It performs full-parameter fine-tuning, meaning all model weights are updated, which can yield deeper adaptation than parameter-efficient methods. The train.py script orchestrates training and accepts configuration for the data, learning rate, and FSDP backward prefetching.
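The sketch below illustrates the general pattern only, not the repository's actual train.py: the model is loaded, wrapped in FSDP so parameters, gradients, and optimizer state are sharded across GPUs, and every weight is updated. The model id and learning rate are illustrative assumptions.

    # Minimal FSDP fine-tuning sketch (illustrative; not the repository's actual
    # train.py). The model id and learning rate below are assumptions.
    import torch
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
    from transformers import AutoModelForCausalLM

    torch.distributed.init_process_group("nccl")  # launched via torchrun
    local_rank = torch.distributed.get_rank() % torch.cuda.device_count()
    torch.cuda.set_device(local_rank)

    model = AutoModelForCausalLM.from_pretrained(
        "mistralai/Mistral-7B-v0.1", torch_dtype=torch.bfloat16
    )
    # FSDP shards parameters, gradients, and optimizer state across all GPUs.
    model = FSDP(model, device_id=local_rank)
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

    # One step of full-parameter fine-tuning: every weight receives gradients.
    # batch = next(iter(train_loader))   # {"input_ids": ..., "labels": ...}
    # loss = model(**batch).loss
    # loss.backward()
    # optimizer.step()
    # optimizer.zero_grad()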

Quick Start & Requirements

  • Install dependencies: python -m venv env && source env/bin/activate && pip install -r requirements.txt
  • Set Hugging Face token: export HF_TOKEN="[insert token here]"
  • Run training: torchrun --nnodes=1 --nproc-per-node=<REPLACE_WITH_NUMBER_OF_GPUS> train.py
  • Prerequisites: Python environment, Hugging Face token, multi-GPU setup (e.g., NVIDIA GPUs with CUDA).
  • Data: Place custom data in the data folder as train.jsonl and validation.jsonl (an illustrative record format is sketched after this list).
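The exact JSONL schema depends on the repository's data-loading code; the snippet below only illustrates producing an instruction-style data/train.jsonl file, and the field names are assumptions.

    # Illustrative data preparation only; the field names ("instruction",
    # "response") are assumptions -- check the repository's dataset code for
    # the schema train.py actually expects.
    import json
    import os

    records = [
        {"instruction": "Summarize the following text ...", "response": "..."},
        {"instruction": "Translate to French: Hello", "response": "Bonjour"},
    ]

    os.makedirs("data", exist_ok=True)
    with open("data/train.jsonl", "w") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")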

Highlighted Details

  • Supports full fine-tuning of Mistral-7B, not QLoRA or other PEFT methods.
  • Recommends >1k training samples for effective fine-tuning.
  • Suggests evaluating model performance on a separate validation set to monitor improvement and prevent overfitting.
  • Offers FSDP backward prefetching options (BACKWARD_PRE, BACKWARD_POST) for potential memory optimization; see the sketch after this list.
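For reference, the two options correspond to PyTorch's BackwardPrefetch enum. How train.py exposes the choice is not shown here; the snippet below is only a generic sketch of the PyTorch API.

    # Generic PyTorch FSDP usage of the two prefetch policies (not the
    # repository's exact wiring).
    from torch.distributed.fsdp import BackwardPrefetch, FullyShardedDataParallel as FSDP

    # BACKWARD_PRE: prefetch the next parameter shards before the current
    # gradient computation (more compute/communication overlap, higher peak memory).
    # BACKWARD_POST: prefetch after the current gradient computation
    # (lower peak memory, less overlap).
    # sharded_model = FSDP(model, backward_prefetch=BackwardPrefetch.BACKWARD_PRE)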

Maintenance & Community

No specific information on maintainers, community channels, or roadmap is provided in the README.

Licensing & Compatibility

The repository's license is not specified in the README. Compatibility for commercial use or closed-source linking is therefore undetermined.

Limitations & Caveats

The project is explicitly for full fine-tuning and does not support parameter-efficient methods such as QLoRA. The optimal number of epochs, and whether gradient clipping or weight decay need adjusting, will vary with the dataset and hardware configuration.
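As a generic illustration of those two knobs (not the repository's actual values), weight decay is set on the optimizer and gradients are clipped before each optimizer step; FSDP-wrapped models typically use the module's own clip_grad_norm_ method instead.

    # Generic PyTorch sketch; values are illustrative, not the project's defaults.
    import torch

    model = torch.nn.Linear(16, 16)  # stand-in for the fine-tuned model
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5, weight_decay=0.1)

    loss = model(torch.randn(4, 16)).pow(2).mean()
    loss.backward()

    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)  # gradient clipping
    optimizer.step()
    optimizer.zero_grad()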

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 9 stars in the last 90 days

Explore Similar Projects

Starred by Stas Bekman (Author of Machine Learning Engineering Open Book; Research Engineer at Snowflake).

fms-fsdp by foundation-model-stack

Efficiently train foundation models with PyTorch

0.4% · 258 stars · created 1 year ago · updated 1 week ago