llm-finetuning by modal-labs

LLM fine-tuning guide using Modal and Axolotl

Created 2 years ago
621 stars

Top 53.1% on SourcePulse

View on GitHub

Project Summary

This repository provides a guide and tooling for fine-tuning large language models (LLMs) like Llama, Mistral, and CodeLlama using the axolotl library on Modal's serverless GPU infrastructure. It targets developers and researchers aiming for efficient, scalable LLM fine-tuning without managing underlying hardware.

How It Works

The project leverages axolotl for its comprehensive LLM fine-tuning capabilities, including support for DeepSpeed ZeRO, LoRA adapters, and Flash Attention. Modal provides a serverless execution environment, abstracting away Docker image management and GPU provisioning. This allows users to scale training jobs across multiple GPUs and deploy inference endpoints easily.
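
As a rough illustration of how these pieces fit together, the sketch below shows a Modal app wrapping an axolotl training run. It is not the repository's actual src/train.py: the app name, image contents, GPU default, timeout, and config handling are assumptions (the real project ships its config and data into the container and pins a tested axolotl image), but the Modal primitives used (modal.App, modal.Image, modal.Secret, @app.function) are the ones the project builds on.

    import os
    import subprocess

    import modal

    app = modal.App("axolotl-finetune-sketch")  # hypothetical app name

    # The real project uses a pinned axolotl container image; a plain pip install
    # is shown here only to keep the sketch self-contained.
    image = modal.Image.debian_slim(python_version="3.10").pip_install("axolotl")

    # Multi-GPU scaling via an environment variable, e.g. GPU_CONFIG=a100-80gb:4.
    GPU_CONFIG = os.environ.get("GPU_CONFIG", "a100-80gb:2")


    @app.function(
        image=image,
        gpu=GPU_CONFIG,
        secrets=[modal.Secret.from_name("my-huggingface-secret")],
        timeout=4 * 60 * 60,  # fine-tuning jobs run long
    )
    def train(config_path: str):
        # axolotl is normally launched through accelerate; DeepSpeed ZeRO, LoRA,
        # and Flash Attention behavior all come from the YAML config itself.
        subprocess.run(
            ["accelerate", "launch", "-m", "axolotl.cli.train", config_path],
            check=True,
        )


    @app.local_entrypoint()
    def main(config: str = "config/mistral-memorize.yml"):
        # In the real project the config and dataset are copied into the remote
        # environment; here the path is assumed to already exist there.
        train.remote(config)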

Quick Start & Requirements

  • Install: pip install modal
  • Authentication: Requires a Modal account and token (python3 -m modal setup), a Hugging Face API token stored as a Modal secret named my-huggingface-secret, and optionally Weights & Biases credentials.
  • Model access: Accept the license terms for gated Hugging Face models (e.g., Llama 3) before fine-tuning them.
  • Launch: modal run --detach src.train --config=config/mistral-memorize.yml --data=data/sqlqa.subsample.jsonl (a quick format check for the --data file is sketched after this list)
  • Docs: Modal Docs, Axolotl Config
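
The --data flag points at a JSONL file, i.e. one JSON object per line. A small pre-flight check like the following (not part of the repository, just a convenience sketch) can catch malformed lines before a GPU run is launched:

    import json
    from pathlib import Path


    def check_jsonl(path: str) -> int:
        """Count records in a JSONL file, raising on the first malformed line."""
        count = 0
        for lineno, line in enumerate(Path(path).read_text().splitlines(), start=1):
            if not line.strip():
                continue  # tolerate blank lines
            try:
                json.loads(line)
            except json.JSONDecodeError as exc:
                raise ValueError(f"{path}:{lineno} is not valid JSON: {exc}") from exc
            count += 1
        return count


    if __name__ == "__main__":
        print(check_jsonl("data/sqlqa.subsample.jsonl"), "records look well-formed")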

Highlighted Details

  • Integrates axolotl with Modal for serverless LLM fine-tuning.
  • Supports state-of-the-art optimizations: DeepSpeed ZeRO, LoRA, Flash Attention.
  • Enables easy multi-GPU training configuration via environment variables (e.g., GPU_CONFIG=a100-80gb:4).
  • Provides serverless inference deployment with modal deploy src.inference (a client-side call is sketched after this list).
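
After modal deploy src.inference, the deployed function can be invoked from any Python environment with the Modal client installed. A minimal sketch, assuming a recent Modal client; the app and function names below are placeholders, and the real ones are defined in the repo's src/inference.py:

    import modal

    # Placeholder names: look up the deployed app/function names in src/inference.py
    # or the Modal dashboard before using this.
    infer = modal.Function.from_name("example-axolotl-inference", "completion")

    # .remote() executes on Modal's serverless GPUs; nothing runs locally.
    print(infer.remote("Generate a SQL query that counts users by signup month."))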

Maintenance & Community

  • Developed by Modal Labs.
  • Links to Modal documentation and community resources are available.

Licensing & Compatibility

  • The repository itself is likely under a permissive license (e.g., MIT, Apache 2.0), but it relies on axolotl and models from Hugging Face, which have their own licenses. Users must comply with the terms of service for Modal, Hugging Face models, and Weights & Biases.

Limitations & Caveats

  • Configuration is managed through axolotl-style YAML files plus project-specific wrappers, so the workflow differs in places from running the axolotl CLI directly.
  • CUDA out-of-memory (OOM) errors can occur when GPU memory is insufficient for the configured batch size or sequence length (a config-shrinking sketch follows this list).
  • Training on very small datasets may lead to ZeroDivisionError.
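
When OOM errors show up, the usual levers are the micro_batch_size, gradient_accumulation_steps, and sequence_len settings in the axolotl config. A hedged sketch of deriving a lower-memory variant of the example config (the key names are standard axolotl options; the output filename is made up):

    import yaml  # PyYAML

    with open("config/mistral-memorize.yml") as f:
        cfg = yaml.safe_load(f)

    # Trade per-step memory for more accumulation steps so the effective batch
    # size stays roughly constant, and cap the sequence length.
    cfg["micro_batch_size"] = max(1, cfg.get("micro_batch_size", 2) // 2)
    cfg["gradient_accumulation_steps"] = cfg.get("gradient_accumulation_steps", 1) * 2
    cfg["sequence_len"] = min(cfg.get("sequence_len", 4096), 2048)

    with open("config/mistral-memorize.low-mem.yml", "w") as f:  # hypothetical output path
        yaml.safe_dump(cfg, f)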

Health Check

  • Last Commit: 4 months ago
  • Responsiveness: 1+ week
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 7 stars in the last 30 days

Explore Similar Projects

Starred by Tobi Lutke (Cofounder of Shopify), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 6 more.

xTuring by stochasticai

0.0%
3k
SDK for fine-tuning and customizing open-source LLMs
Created 2 years ago
Updated 1 day ago
Starred by Casper Hansen (Author of AutoAWQ), Yineng Zhang (Inference Lead at SGLang; Research Scientist at Together AI), and 5 more.

xtuner by InternLM

0.5%
5k
LLM fine-tuning toolkit for research
Created 2 years ago
Updated 1 day ago
Starred by Andrej Karpathy (Founder of Eureka Labs; formerly at Tesla, OpenAI; author of CS 231n), Pawel Garbacki (Cofounder of Fireworks AI), and 8 more.

lit-llama by Lightning-AI

0.1%
6k
LLaMA implementation for pretraining, finetuning, and inference
Created 2 years ago
Updated 2 months ago
Starred by Junyang Lin (Core Maintainer at Alibaba Qwen), Vincent Weisser (Cofounder of Prime Intellect), and 25 more.

alpaca-lora by tloen

0.0%
19k
LoRA fine-tuning for LLaMA
Created 2 years ago
Updated 1 year ago