CLI tool for finetuning Falcon LLMs
This project enables 4-bit finetuning of Falcon language models on consumer-grade GPUs, making large model customization accessible to a wider audience. It targets researchers and developers looking to adapt large language models for specific tasks without requiring extensive hardware resources. The primary benefit is the ability to finetune powerful models like Falcon-40B on a single A100 40GB GPU.
How It Works
Falcontune implements LoRA (Low-Rank Adaptation) on top of Falcon models compressed with the GPTQ quantization method. Because the base weights stay in 4-bit form, finetuning requires a custom backward pass through the quantized layers. A Triton backend keeps this fast: generating a 50-token output on an A100 40GB takes roughly 10 seconds.
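The pairing of a frozen quantized matmul with trainable low-rank factors can be sketched as follows. This is a minimal PyTorch illustration, not the project's actual Triton kernels: the int8 tensor stands in for packed 4-bit GPTQ weights, and QuantMatMul, LoRAQuantLinear, and all shapes are hypothetical.

import torch
import torch.nn as nn

class QuantMatMul(torch.autograd.Function):
    # Matmul against frozen quantized weights with an explicit backward.
    # falcontune uses a Triton kernel here; this sketch dequantizes to
    # float on the fly so the custom-backward pattern is visible.
    @staticmethod
    def forward(ctx, x, qweight, scales):
        w = qweight.to(x.dtype) * scales  # stand-in for 4-bit dequantization
        ctx.save_for_backward(qweight, scales)
        return x @ w.t()

    @staticmethod
    def backward(ctx, grad_out):
        qweight, scales = ctx.saved_tensors
        w = qweight.to(grad_out.dtype) * scales
        # Gradient flows only to the activations; the quantized base
        # weights stay frozen, so no weight gradient is computed.
        return grad_out @ w, None, None

class LoRAQuantLinear(nn.Module):
    # Frozen quantized linear layer plus a trainable low-rank update:
    # y = dequant(Wq) x + (alpha / r) * B A x
    def __init__(self, in_features, out_features, r=8, alpha=16):
        super().__init__()
        # int8 stands in for packed 4-bit GPTQ weights (hypothetical layout).
        self.register_buffer("qweight", torch.randint(
            -8, 8, (out_features, in_features), dtype=torch.int8))
        self.register_buffer("scales", torch.full((out_features, 1), 0.01))
        self.lora_A = nn.Parameter(torch.randn(r, in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(out_features, r))
        self.scaling = alpha / r

    def forward(self, x):
        base = QuantMatMul.apply(x, self.qweight, self.scales)
        return base + (x @ self.lora_A.t() @ self.lora_B.t()) * self.scaling

layer = LoRAQuantLinear(64, 64)
loss = layer(torch.randn(2, 64)).sum()
loss.backward()  # gradients land only on lora_A and lora_B

Only the low-rank factors receive gradients, which is why the memory footprint stays small enough for a single GPU.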
Quick Start & Requirements
Install dependencies with pip install -r requirements.txt, then run python setup.py install. For CUDA support, run python setup_cuda.py install.
Download the quantized Falcon-40B weights and the cleaned Alpaca dataset:
wget https://huggingface.co/TheBloke/falcon-40b-instruct-GPTQ/resolve/main/gptq_model-4bit--1g.safetensors
wget https://github.com/gururise/AlpacaDataCleaned/raw/main/alpaca_data_cleaned.json
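With the weights and dataset in place, finetuning is launched through the falcontune CLI. The invocation below follows the pattern of the upstream README's Alpaca example; the exact flag set may vary between versions, so treat it as illustrative and confirm against falcontune --help.

falcontune finetune \
    --model=falcon-40b-instruct-4bit \
    --weights=./gptq_model-4bit--1g.safetensors \
    --dataset=./alpaca_data_cleaned.json \
    --data_type=alpaca \
    --lora_out_dir=./falcon-40b-instruct-4bit-alpaca/ \
    --mbatch_size=1 \
    --batch_size=2 \
    --epochs=3 \
    --lr=3e-4 \
    --lora_r=8 \
    --lora_alpha=16 \
    --backend=triton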
Highlighted Details
Maintenance & Community
The project acknowledges the GPTQ codebase, alpaca_lora_4bit, and PEFT as foundations, and provides contact information for custom solutions. The repository was last updated about a year ago and is marked inactive.
Licensing & Compatibility
The README does not explicitly state a license. Compatibility for commercial use or closed-source linking is not specified.
Limitations & Caveats
The project explicitly states that an A100 40GB GPU is required for finetuning Falcon-40B, which may be a significant barrier for users without access to such hardware. The absence of a specified license leaves commercial and closed-source usage rights unclear.