# LLM fine-tuning guide using Modal and Axolotl
This repository provides a guide and tooling for fine-tuning large language models (LLMs) like Llama, Mistral, and CodeLlama with the `axolotl` library on Modal's serverless GPU infrastructure. It targets developers and researchers aiming for efficient, scalable LLM fine-tuning without managing the underlying hardware.
## How It Works
The project leverages `axolotl` for its comprehensive LLM fine-tuning capabilities, including support for DeepSpeed ZeRO, LoRA adapters, and Flash Attention. Modal provides the serverless execution environment, abstracting away Docker image management and GPU provisioning, so users can scale training jobs across multiple GPUs and deploy inference endpoints easily.
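At its core, a Modal training job is a decorated Python function that declares its container image, GPU, and secrets in one place. The sketch below is a minimal illustration of that pattern, not the repository's actual `src/train.py`; the image contents and the `axolotl` invocation are assumptions.

```python
import modal

app = modal.App("axolotl-finetune-sketch")

# Hypothetical image: the real project pins a purpose-built axolotl image.
image = modal.Image.debian_slim().pip_install("axolotl")

@app.function(
    image=image,
    gpu="A100-80GB",  # GPU provisioning is just a decorator argument
    secrets=[modal.Secret.from_name("my-huggingface-secret")],
    timeout=24 * 60 * 60,  # allow a long-running training job
)
def train(config_path: str) -> None:
    # Hand off to axolotl's CLI entrypoint inside the container.
    import subprocess

    subprocess.run(["python", "-m", "axolotl.cli.train", config_path], check=True)
```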
## Quick Start & Requirements

Install the Modal client and authenticate:

```bash
pip install modal
python3 -m modal setup
```

You need a Modal account (authenticated via `python3 -m modal setup`), a Hugging Face API token (stored as the secret `my-huggingface-secret`), and optionally Weights & Biases credentials. Then launch a fine-tuning run:

```bash
modal run --detach src.train --config=config/mistral-memorize.yml --data=data/sqlqa.subsample.jsonl
```
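Before kicking off a detached run, it can be worth a quick look at the local dataset. The JSONL schema depends on the chosen config, so this sketch only prints each record's keys rather than assuming field names.

```python
import json

# Peek at the first few records of the training data; the schema is
# determined by the axolotl config, so only the keys are inspected here.
with open("data/sqlqa.subsample.jsonl") as f:
    for _, line in zip(range(3), f):
        print(sorted(json.loads(line).keys()))
```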
## Highlighted Details
- Integrates `axolotl` with Modal for serverless LLM fine-tuning.
- Scales to multi-GPU training via an environment variable (e.g. `GPU_CONFIG=a100-80gb:4`); see the sketch after this list.
- Deploys inference endpoints with `modal deploy src.inference`.
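The `GPU_CONFIG` override could be wired up roughly as follows. The variable name comes from the example above, but the default value and the decorator plumbing are assumptions rather than the repository's exact code.

```python
import os

import modal

app = modal.App("gpu-config-sketch")

# "a100-80gb:4" requests four 80 GB A100s; fall back to one if unset.
GPU_CONFIG = os.environ.get("GPU_CONFIG", "a100-80gb:1")

@app.function(gpu=GPU_CONFIG)
def check_gpus() -> None:
    import subprocess

    # nvidia-smi lists the GPUs the container actually received.
    subprocess.run(["nvidia-smi"], check=True)
```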
## Maintenance & Community
## Licensing & Compatibility
Uses `axolotl` and models from Hugging Face, which have their own licenses. Users must comply with the terms of service for Modal, Hugging Face models, and Weights & Biases.

## Limitations & Caveats
- The workflow is tied to `axolotl`'s CLI-centric approach.
- Training on very small datasets can trigger a `ZeroDivisionError`; a quick pre-flight check is sketched below.
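A defensive check before launching might look like this; the minimum below is an illustrative guess, since no hard threshold is documented.

```python
# Count examples before launching a run; tiny datasets have been associated
# with ZeroDivisionError failures during fine-tuning.
MIN_EXAMPLES = 64  # hypothetical floor, not a documented minimum

with open("data/sqlqa.subsample.jsonl") as f:
    n = sum(1 for _ in f)

if n < MIN_EXAMPLES:
    raise SystemExit(f"Only {n} examples; the dataset may be too small.")
```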