fine-tune-mistral by abacaj

Fine-tuning script for Mistral-7B

Created 1 year ago
717 stars

Top 48.0% on SourcePulse

View on GitHub
Project Summary

This repository provides code for full fine-tuning of the Mistral-7B language model, targeting users with multi-GPU setups (e.g., 3090s, A100s, H100s). It enables customization of Mistral-7B for specific downstream tasks by training on custom datasets.

How It Works

The project utilizes PyTorch and FSDP (Fully Sharded Data Parallel) for distributed training across multiple GPUs. It focuses on full parameter fine-tuning, meaning all model weights are updated, offering potentially deeper adaptation than parameter-efficient methods. The train.py script orchestrates the training process, accepting configurations for data, learning rate, and FSDP backward prefetching.
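
As an illustration of this setup, here is a minimal sketch of full-parameter FSDP fine-tuning in PyTorch; the model id, optimizer, and learning rate are assumptions for illustration and are not taken from the repository's train.py.

    # Minimal FSDP full fine-tuning sketch (illustrative; not the repo's actual train.py).
    import os
    import torch
    import torch.distributed as dist
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
    from transformers import AutoModelForCausalLM

    # torchrun sets LOCAL_RANK for each spawned process.
    dist.init_process_group("nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Assumed model id; the repository targets Mistral-7B.
    model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")

    # Shard all parameters across GPUs; every weight stays trainable (full fine-tuning).
    model = FSDP(model, device_id=local_rank)
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

Launched via torchrun (as in the Quick Start below), each process holds only a shard of the parameters, gradients, and optimizer state, which is what makes full fine-tuning of a 7B model feasible across several GPUs.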

Quick Start & Requirements

  • Install dependencies: python -m venv env && source env/bin/activate && pip install -r requirements.txt
  • Set Hugging Face token: export HF_TOKEN="[insert token here]"
  • Run training: torchrun --nnodes=1 --nproc-per-node=<REPLACE_WITH_NUMBER_OF_GPUS> train.py
  • Prerequisites: Python environment, Hugging Face token, multi-GPU setup (e.g., NVIDIA GPUs with CUDA).
  • Data: Place custom data in the data folder as train.jsonl and validation.jsonl (a format sketch follows this list).
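
This summary does not document the JSON schema that train.py expects in those files, so the record layout below (a single "text" field per line) is a hypothetical placeholder, not the repository's actual format:

    # Hypothetical data-prep sketch; the field names expected by train.py
    # are not documented here, so "text" is an assumption.
    import json

    samples = ["First training example ...", "Second training example ..."]
    with open("data/train.jsonl", "w") as f:
        for s in samples:
            f.write(json.dumps({"text": s}) + "\n")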

Highlighted Details

  • Supports full fine-tuning of Mistral-7B, not QLoRA or other PEFT methods.
  • Recommends >1k training samples for effective fine-tuning.
  • Suggests evaluating model performance on a separate validation set to monitor improvement and prevent overfitting.
  • Offers FSDP backward prefetching options (BACKWARD_PRE, BACKWARD_POST) for potential memory optimization; see the sketch after this list.
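
For context, the following sketch shows how the two prefetch policies are selected when wrapping a model with FSDP in PyTorch; whether train.py exposes this as a command-line flag or hard-codes it is not stated in this summary.

    # Illustrative FSDP backward-prefetch selection (assumes the process group
    # has already been initialized, e.g. by torchrun).
    import torch.nn as nn
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP, BackwardPrefetch

    def wrap_with_prefetch(model: nn.Module, policy: str) -> FSDP:
        # BACKWARD_PRE prefetches the next parameter all-gather before the current
        # gradient computation (more overlap, higher peak memory); BACKWARD_POST
        # issues it afterwards (lower peak memory, less overlap).
        prefetch = (BackwardPrefetch.BACKWARD_PRE if policy == "BACKWARD_PRE"
                    else BackwardPrefetch.BACKWARD_POST)
        return FSDP(model, backward_prefetch=prefetch)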

Maintenance & Community

No specific information on maintainers, community channels, or roadmap is provided in the README.

Licensing & Compatibility

The repository's license is not specified in the README. Compatibility for commercial use or closed-source linking is therefore undetermined.

Limitations & Caveats

The project is explicitly for full fine-tuning and does not support parameter-efficient methods like QLoRA. The optimal number of epochs and the necessity of adjusting gradient clipping or weight decay may vary depending on the dataset and hardware configuration.

Health Check

  • Last Commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 0 stars in the last 30 days

Explore Similar Projects

Starred by Luis Capelo (Cofounder of Lightning AI), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 7 more.

SkyThought by NovaSky-AI

Top 0.1% on SourcePulse
3k stars
Training recipes for Sky-T1 family of models
Created 8 months ago
Updated 2 months ago
Starred by Tobi Lutke (Cofounder of Shopify), Yineng Zhang (Inference Lead at SGLang; Research Scientist at Together AI), and 26 more.

axolotl by axolotl-ai-cloud

Top 0.5% on SourcePulse
10k stars
CLI tool for streamlined post-training of AI models
Created 2 years ago
Updated 15 hours ago