fine-tune-mistral by abacaj

Fine-tuning script for Mistral-7B

created 1 year ago
716 stars

Top 49.0% on sourcepulse

Project Summary

This repository provides code for full fine-tuning of the Mistral-7B language model, targeting users with multi-GPU setups (e.g., 3090s, A100s, H100s). It lets you adapt Mistral-7B to specific downstream tasks by training on custom datasets.

How It Works

The project uses PyTorch with FSDP (Fully Sharded Data Parallel) for distributed training across multiple GPUs. It performs full-parameter fine-tuning, meaning all model weights are updated, which can yield deeper adaptation than parameter-efficient methods. The train.py script orchestrates training and accepts configuration for the data, learning rate, and FSDP backward prefetching.
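The sketch below illustrates the general pattern only, not the repository's actual train.py: the model is loaded, wrapped in FSDP so parameters, gradients, and optimizer state are sharded across GPUs, and every weight is updated. The model id and learning rate are illustrative assumptions.

    # Minimal FSDP fine-tuning sketch (illustrative; not the repository's actual
    # train.py). The model id and learning rate below are assumptions.
    import torch
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
    from transformers import AutoModelForCausalLM

    torch.distributed.init_process_group("nccl")  # launched via torchrun
    local_rank = torch.distributed.get_rank() % torch.cuda.device_count()
    torch.cuda.set_device(local_rank)

    model = AutoModelForCausalLM.from_pretrained(
        "mistralai/Mistral-7B-v0.1", torch_dtype=torch.bfloat16
    )
    # FSDP shards parameters, gradients, and optimizer state across all GPUs.
    model = FSDP(model, device_id=local_rank)
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

    # One step of full-parameter fine-tuning: every weight receives gradients.
    # batch = next(iter(train_loader))   # {"input_ids": ..., "labels": ...}
    # loss = model(**batch).loss
    # loss.backward()
    # optimizer.step()
    # optimizer.zero_grad()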

Quick Start & Requirements

  • Install dependencies: python -m venv env && source env/bin/activate && pip install -r requirements.txt
  • Set Hugging Face token: export HF_TOKEN="[insert token here]"
  • Run training: torchrun --nnodes=1 --nproc-per-node=<REPLACE_WITH_NUMBER_OF_GPUS> train.py
  • Prerequisites: Python environment, Hugging Face token, multi-GPU setup (e.g., NVIDIA GPUs with CUDA).
  • Data: Place custom data in the data folder as train.jsonl and validation.jsonl (an illustrative record format is sketched after this list).
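The exact JSONL schema depends on the repository's data-loading code; the snippet below only illustrates producing an instruction-style data/train.jsonl file, and the field names are assumptions.

    # Illustrative data preparation only; the field names ("instruction",
    # "response") are assumptions -- check the repository's dataset code for
    # the schema train.py actually expects.
    import json
    import os

    records = [
        {"instruction": "Summarize the following text ...", "response": "..."},
        {"instruction": "Translate to French: Hello", "response": "Bonjour"},
    ]

    os.makedirs("data", exist_ok=True)
    with open("data/train.jsonl", "w") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")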

Highlighted Details

  • Supports full fine-tuning of Mistral-7B, not QLoRA or other PEFT methods.
  • Recommends >1k training samples for effective fine-tuning.
  • Suggests evaluating model performance on a separate validation set to monitor improvement and prevent overfitting.
  • Offers FSDP backward prefetching options (BACKWARD_PRE, BACKWARD_POST) for potential memory optimization; see the sketch after this list.
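For reference, the two options correspond to PyTorch's BackwardPrefetch enum. How train.py exposes the choice is not shown here; the snippet below is only a generic sketch of the PyTorch API.

    # Generic PyTorch FSDP usage of the two prefetch policies (not the
    # repository's exact wiring).
    from torch.distributed.fsdp import BackwardPrefetch, FullyShardedDataParallel as FSDP

    # BACKWARD_PRE: prefetch the next parameter shards before the current
    # gradient computation (more compute/communication overlap, higher peak memory).
    # BACKWARD_POST: prefetch after the current gradient computation
    # (lower peak memory, less overlap).
    # sharded_model = FSDP(model, backward_prefetch=BackwardPrefetch.BACKWARD_PRE)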

Maintenance & Community

No specific information on maintainers, community channels, or roadmap is provided in the README.

Licensing & Compatibility

The repository's license is not specified in the README. Compatibility for commercial use or closed-source linking is therefore undetermined.

Limitations & Caveats

The project is explicitly for full fine-tuning and does not support parameter-efficient methods such as QLoRA. The optimal number of epochs, and whether gradient clipping or weight decay need adjusting, will vary with the dataset and hardware configuration.
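As a generic illustration of those two knobs (not the repository's actual values), weight decay is set on the optimizer and gradients are clipped before each optimizer step; FSDP-wrapped models typically use the module's own clip_grad_norm_ method instead.

    # Generic PyTorch sketch; values are illustrative, not the project's defaults.
    import torch

    model = torch.nn.Linear(16, 16)  # stand-in for the fine-tuned model
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5, weight_decay=0.1)

    loss = model(torch.randn(4, 16)).pow(2).mean()
    loss.backward()

    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)  # gradient clipping
    optimizer.step()
    optimizer.zero_grad()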

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 9 stars in the last 90 days

Explore Similar Projects

Starred by Stas Bekman (Author of Machine Learning Engineering Open Book; Research Engineer at Snowflake).

fms-fsdp by foundation-model-stack

Efficiently train foundation models with PyTorch

0.4% · 258 stars · created 1 year ago · updated 1 week ago