falcontune by rmihaylov

CLI tool for finetuning Falcon LLMs

Created 2 years ago
465 stars

Top 65.3% on SourcePulse

View on GitHub
Project Summary

This project enables 4-bit finetuning of Falcon language models on consumer-grade GPUs, making large model customization accessible to a wider audience. It targets researchers and developers looking to adapt large language models for specific tasks without requiring extensive hardware resources. The primary benefit is the ability to finetune powerful models like Falcon-40B on a single A100 40GB GPU.

How It Works

Falcontune implements LoRA (Low-Rank Adaptation) on top of Falcon models compressed with the GPTQ quantization method. Because the base weights are stored in 4-bit form, finetuning requires a custom backward pass through the quantized model. A Triton backend provides the performance-critical kernels; on an A100 40GB, generating a 50-token output takes roughly 10 seconds.
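The general pattern can be sketched with the PEFT library: the base weights stay frozen while small low-rank adapter matrices are the only trainable parameters. This is a minimal illustration of the LoRA setup, not falcontune's actual code (which applies it on top of GPTQ 4-bit weights with custom kernels); the checkpoint name, target module, and hyperparameters below are assumptions.

```python
# Minimal LoRA setup sketch (illustrative, not falcontune's implementation).
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model_name = "tiiuae/falcon-7b"  # assumption: any Falcon checkpoint; falcontune targets GPTQ 4-bit weights
model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True)

lora_config = LoraConfig(
    r=8,                                 # rank of the low-rank update
    lora_alpha=16,                       # scaling factor for the update
    target_modules=["query_key_value"],  # Falcon's fused attention projection
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the LoRA adapters are trainable
```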

Quick Start & Requirements

  • Install: pip install -r requirements.txt followed by python setup.py install. For CUDA support, run python setup_cuda.py install.
  • Prerequisites: An A100 40GB GPU is stated as required for finetuning.
  • Model Download: Requires downloading model weights, e.g., wget https://huggingface.co/TheBloke/falcon-40b-instruct-GPTQ/resolve/main/gptq_model-4bit--1g.safetensors.
  • Dataset Download: Requires a dataset, e.g., wget https://github.com/gururise/AlpacaDataCleaned/raw/main/alpaca_data_cleaned.json (a quick loading check is sketched after this list).
  • Demo: A Google Colab notebook is linked in the README.
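The downloaded Alpaca-cleaned JSON is a plain list of instruction/input/output records and can be sanity-checked with a few lines of Python (illustrative only, not part of falcontune):

```python
# Inspect the downloaded Alpaca-style dataset.
import json

with open("alpaca_data_cleaned.json") as f:
    records = json.load(f)

print(f"{len(records)} training examples")
print(records[0])  # e.g. {"instruction": "...", "input": "...", "output": "..."}
```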

Highlighted Details

  • Enables 4-bit finetuning of Falcon models.
  • Leverages GPTQ for model compression and LoRA for efficient adaptation.
  • Custom backward pass implementation for quantized models (a conceptual sketch follows this list).
  • Triton backend for fast inference and finetuning.
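To make the "custom backward pass" point concrete, here is a toy torch.autograd.Function for a frozen quantized linear layer: the forward dequantizes the weights to compute the matmul, and the backward only returns a gradient for the activations, since the quantized base weights receive no updates (the LoRA adapters carry the trainable parameters). This is a conceptual sketch under assumed names (QuantizedLinearFn, dequantize), not falcontune's Triton/CUDA kernels.

```python
# Conceptual sketch of a backward pass for a frozen quantized linear layer.
import torch

class QuantizedLinearFn(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, qweight, dequantize):
        w = dequantize(qweight)        # reconstruct float weights from quantized storage
        ctx.save_for_backward(qweight)
        ctx.dequantize = dequantize
        return x @ w.t()

    @staticmethod
    def backward(ctx, grad_out):
        (qweight,) = ctx.saved_tensors
        w = ctx.dequantize(qweight)    # dequantize again for the backward pass
        grad_x = grad_out @ w          # gradient w.r.t. the activations only
        return grad_x, None, None      # no gradient for the frozen quantized weights
```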

Maintenance & Community

The project acknowledges contributions from the GPTQ codebase, alpaca_lora_4bit, and PEFT. Contact information for custom solutions is provided.

Licensing & Compatibility

The README does not explicitly state a license. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The project explicitly states that an A100 40GB GPU is required for finetuning, a significant barrier for users without access to such hardware. The absence of a specified license leaves usage rights unclear, and the last commit was two years ago, so the project appears to be unmaintained.

Health Check

  • Last Commit: 2 years ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 0 stars in the last 30 days

Explore Similar Projects

Starred by Wing Lian (Founder of Axolotl AI) and Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems").

airllm by lyogavin
Top 0.1% on SourcePulse · 6k stars
Inference optimization for LLMs on low-resource hardware
Created 2 years ago · Updated 2 weeks ago
Starred by Tobi Lutke (Cofounder of Shopify), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 22 more.

qlora by artidoro
Top 0.1% on SourcePulse · 11k stars
Finetuning tool for quantized LLMs
Created 2 years ago · Updated 1 year ago
Starred by Junyang Lin (Core Maintainer at Alibaba Qwen), Vincent Weisser (Cofounder of Prime Intellect), and 25 more.

alpaca-lora by tloen
Top 0.0% on SourcePulse · 19k stars
LoRA fine-tuning for LLaMA
Created 2 years ago · Updated 1 year ago
Starred by Tobi Lutke (Cofounder of Shopify), Andrej Karpathy (Founder of Eureka Labs; formerly at Tesla, OpenAI; author of CS 231n), and 36 more.

unsloth by unslothai
Top 0.6% on SourcePulse · 46k stars
Finetuning tool for LLMs, targeting speed and memory efficiency
Created 1 year ago · Updated 14 hours ago