falcontune by rmihaylov

CLI tool for finetuning Falcon LLMs

created 2 years ago
465 stars

Top 66.2% on sourcepulse

Project Summary

This project enables 4-bit finetuning of Falcon language models on a single GPU, making customization of large models accessible without a multi-GPU cluster. It targets researchers and developers who want to adapt large language models to specific tasks without extensive hardware resources. The headline capability is finetuning a model as large as Falcon-40B on a single A100 40GB GPU.

How It Works

Falcontune implements LoRA (Low-Rank Adaptation) on top of LLMs compressed with the GPTQ quantization method. Because the base weights are stored quantized, finetuning requires a custom backward pass through the quantized layers. A Triton backend supplies the performance-critical kernels: on an A100 40GB, generating a 50-token output takes roughly 10 seconds.
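
As a rough illustration of the idea (a minimal sketch, not falcontune's actual code), LoRA keeps the quantized base weights frozen and learns a small low-rank update beside them. Here base_layer is a stand-in for a GPTQ-packed layer, and only the adapter factors lora_A and lora_B are trained:

    import torch
    import torch.nn as nn

    class LoRAOverQuantLinear(nn.Module):
        """LoRA adapter wrapped around a frozen quantized linear layer (sketch)."""
        def __init__(self, base_layer, in_features, out_features, r=8, alpha=16):
            super().__init__()
            self.base = base_layer              # stand-in for a GPTQ 4-bit layer
            for p in self.base.parameters():
                p.requires_grad = False         # base weights stay frozen
            # Trainable low-rank factors: delta_W = (alpha / r) * B @ A
            self.lora_A = nn.Parameter(torch.randn(r, in_features) * 0.01)
            self.lora_B = nn.Parameter(torch.zeros(out_features, r))
            self.scaling = alpha / r

        def forward(self, x):
            # Forward still runs through the quantized weights; only
            # lora_A and lora_B accumulate gradients during finetuning.
            return self.base(x) + (x @ self.lora_A.T @ self.lora_B.T) * self.scaling

Gradients must still flow through self.base(x) to reach earlier layers, which is why the quantized layers need a backward pass of their own.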

Quick Start & Requirements

  • Install: pip install -r requirements.txt followed by python setup.py install. For CUDA support, run python setup_cuda.py install.
  • Prerequisites: An A100 40GB GPU is stated as required for finetuning.
  • Model Download: Requires downloading model weights, e.g., wget https://huggingface.co/TheBloke/falcon-40b-instruct-GPTQ/resolve/main/gptq_model-4bit--1g.safetensors.
  • Dataset Download: Requires a dataset, e.g., wget https://github.com/gururise/AlpacaDataCleaned/raw/main/alpaca_data_cleaned.json. Both downloads are used in the example run after this list.
  • Demo: A Google Colab notebook is linked in the README.
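
Putting the steps above together, a finetuning run looks roughly like the command below. The flag names are reproduced from the project README as of writing and may have changed since, so verify them against falcontune finetune --help:

    falcontune finetune \
        --model=falcon-40b-instruct-4bit \
        --weights=./gptq_model-4bit--1g.safetensors \
        --dataset=./alpaca_data_cleaned.json \
        --data_type=alpaca \
        --lora_out_dir=./falcon-40b-instruct-4bit-alpaca/ \
        --mbatch_size=1 \
        --batch_size=2 \
        --epochs=3 \
        --lr=3e-4 \
        --cutoff_len=256 \
        --lora_r=8 \
        --lora_alpha=16 \
        --backend=triton

The trained LoRA weights land in --lora_out_dir and can then be loaded at generation time (see the README's falcontune generate examples).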

Highlighted Details

  • Enables 4-bit finetuning of Falcon models.
  • Leverages GPTQ for model compression and LoRA for efficient adaptation.
  • Custom backward pass implementation for quantized models (sketched after this list).
  • Triton backend for fast inference and finetuning.
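
The custom backward pass in the list above can be pictured as an autograd function that dequantizes the packed weights on the fly in both directions. The following is a simplified sketch under assumed tensor layouts, not the project's fused Triton kernels; dequantize is a hypothetical helper standing in for real GPTQ unpacking:

    import torch

    def dequantize(qweight, zeros, scales):
        # Hypothetical helper: treats qweight as an (in, out) tensor of
        # integer codes with one zero point and scale per output column.
        # Real GPTQ storage packs 4-bit codes and quantizes in row groups.
        return (qweight.float() - zeros) * scales

    class QuantMatMul(torch.autograd.Function):
        @staticmethod
        def forward(ctx, x, qweight, zeros, scales):
            ctx.save_for_backward(qweight, zeros, scales)
            return x @ dequantize(qweight, zeros, scales)

        @staticmethod
        def backward(ctx, grad_out):
            qweight, zeros, scales = ctx.saved_tensors
            W = dequantize(qweight, zeros, scales)
            # Only the activations need a gradient; the quantized weights
            # are frozen (LoRA trains separate adapters), so no gradient
            # is returned for qweight, zeros, or scales.
            return grad_out @ W.T, None, None, None

In a fused Triton backend, dequantization happens inside the matmul kernel itself, so the full float weight matrix never needs to be materialized.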

Maintenance & Community

The project acknowledges contributions from the GPTQ codebase, alpaca_lora_4bit, and PEFT. Contact information for custom solutions is provided.

Licensing & Compatibility

The README does not explicitly state a license. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The project explicitly states that an A100 40GB GPU is required for finetuning, which may be a significant barrier for users without access to such hardware. The absence of a specified license raises concerns about usage rights.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0

Star History
1 star in the last 90 days

Explore Similar Projects

Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), Jaret Burkett (Founder of Ostris), and 1 more.

nunchaku by nunchaku-tech

2.1%
3k
High-performance 4-bit diffusion model inference engine
created 8 months ago
updated 17 hours ago
Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), Jeremy Howard (Cofounder of fast.ai), and 4 more.

llm-awq by mit-han-lab

0.4%
3k
Weight quantization research paper for LLM compression/acceleration
created 2 years ago
updated 2 weeks ago
Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), Georgios Konstantopoulos (CTO, General Partner at Paradigm), and 2 more.

GPTQ-for-LLaMa by qwopqwop200

0.0%
3k
4-bit quantization for LLaMA models using GPTQ
created 2 years ago
updated 1 year ago
Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), Omar Sanseviero (DevRel at Google DeepMind), and 6 more.

AutoGPTQ by AutoGPTQ

0.1%
5k
LLM quantization package using GPTQ algorithm
created 2 years ago
updated 3 months ago
Starred by Tobi Lutke (Cofounder of Shopify), Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), and 10 more.

qlora by artidoro

0.2%
11k
Finetuning tool for quantized LLMs
created 2 years ago
updated 1 year ago