# LLM fine-tuning guide using Modal and Axolotl
This repository provides a guide and tooling for fine-tuning large language models (LLMs) like Llama, Mistral, and CodeLlama with the `axolotl` library on Modal's serverless GPU infrastructure. It targets developers and researchers aiming for efficient, scalable LLM fine-tuning without managing the underlying hardware.
## How It Works
The project leverages `axolotl` for its comprehensive LLM fine-tuning capabilities, including support for DeepSpeed ZeRO, LoRA adapters, and Flash Attention. Modal provides the serverless execution environment, abstracting away Docker image management and GPU provisioning, so users can scale training jobs across multiple GPUs and deploy inference endpoints easily.
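At its core, a Modal training job is a decorated Python function that declares its container image, GPU, and secrets in one place. The sketch below is a minimal illustration of that pattern, not the repository's actual `src/train.py`; the image contents and the `axolotl` invocation are assumptions.

```python
import modal

app = modal.App("axolotl-finetune-sketch")

# Hypothetical image: the real project pins a purpose-built axolotl image.
image = modal.Image.debian_slim().pip_install("axolotl")

@app.function(
    image=image,
    gpu="A100-80GB",  # GPU provisioning is just a decorator argument
    secrets=[modal.Secret.from_name("my-huggingface-secret")],
    timeout=24 * 60 * 60,  # allow a long-running training job
)
def train(config_path: str) -> None:
    # Hand off to axolotl's CLI entrypoint inside the container.
    import subprocess

    subprocess.run(["python", "-m", "axolotl.cli.train", config_path], check=True)
```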
## Quick Start & Requirements

Install the Modal client and authenticate:

```bash
pip install modal
python3 -m modal setup
```

You need a Modal account (authenticated via `python3 -m modal setup`), a Hugging Face API token (stored as the secret `my-huggingface-secret`), and optionally Weights & Biases credentials. Then launch a fine-tuning run:

```bash
modal run --detach src.train --config=config/mistral-memorize.yml --data=data/sqlqa.subsample.jsonl
```
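Before kicking off a detached run, it can be worth a quick look at the local dataset. The JSONL schema depends on the chosen config, so this sketch only prints each record's keys rather than assuming field names.

```python
import json

# Peek at the first few records of the training data; the schema is
# determined by the axolotl config, so only the keys are inspected here.
with open("data/sqlqa.subsample.jsonl") as f:
    for _, line in zip(range(3), f):
        print(sorted(json.loads(line).keys()))
```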
## Highlighted Details
- Integrates `axolotl` with Modal for serverless LLM fine-tuning.
- Scales to multi-GPU training via an environment variable (e.g. `GPU_CONFIG=a100-80gb:4`); see the sketch after this list.
- Deploys inference endpoints with `modal deploy src.inference`.
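The `GPU_CONFIG` override could be wired up roughly as follows. The variable name comes from the example above, but the default value and the decorator plumbing are assumptions rather than the repository's exact code.

```python
import os

import modal

app = modal.App("gpu-config-sketch")

# "a100-80gb:4" requests four 80 GB A100s; fall back to one if unset.
GPU_CONFIG = os.environ.get("GPU_CONFIG", "a100-80gb:1")

@app.function(gpu=GPU_CONFIG)
def check_gpus() -> None:
    import subprocess

    # nvidia-smi lists the GPUs the container actually received.
    subprocess.run(["nvidia-smi"], check=True)
```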
## Maintenance & Community
## Licensing & Compatibility
Uses `axolotl` and models from Hugging Face, which have their own licenses. Users must comply with the terms of service for Modal, Hugging Face models, and Weights & Biases.

## Limitations & Caveats
- The workflow is tied to `axolotl`'s CLI-centric approach.
- Training on very small datasets can trigger a `ZeroDivisionError`; a quick pre-flight check is sketched below.
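A defensive check before launching might look like this; the minimum below is an illustrative guess, since no hard threshold is documented.

```python
# Count examples before launching a run; tiny datasets have been associated
# with ZeroDivisionError failures during fine-tuning.
MIN_EXAMPLES = 64  # hypothetical floor, not a documented minimum

with open("data/sqlqa.subsample.jsonl") as f:
    n = sum(1 for _ in f)

if n < MIN_EXAMPLES:
    raise SystemExit(f"Only {n} examples; the dataset may be too small.")
```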