llmtools by kuleshov-group

SDK for finetuning LLMs on consumer GPUs

Created 2 years ago
731 stars

Top 47.3% on SourcePulse

Project Summary

LLMTools provides a Python library for finetuning and running Large Language Models (LLMs) on consumer-grade GPUs with significantly reduced memory requirements. It targets researchers and developers needing to adapt LLMs in low-resource environments, enabling efficient finetuning using novel quantization techniques.

How It Works

LLMTools leverages the ModuLoRA algorithm, which integrates low-precision LoRA finetuning with modular quantizers. This approach allows for finetuning LLMs at 2-bit, 3-bit, and 4-bit precision, a significant advancement over previous methods. The library offers a modular architecture supporting various LLMs, quantizers (like QuIP# and OPTQ), and optimization algorithms, facilitating easy experimentation and integration with the HuggingFace ecosystem.
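The core idea can be illustrated with a minimal toy sketch (my own NumPy illustration of the ModuLoRA pattern, not llmtools code): the base weight stays frozen in low precision, and only the small low-rank LoRA factors A and B are trained in full precision, with the forward pass summing both paths.

```python
import numpy as np

def quantize_2bit(w, scale):
    """Toy 2-bit quantizer: round weights to the 4 levels {-2,-1,0,1} * scale."""
    return np.clip(np.round(w / scale), -2, 1).astype(np.int8)

def dequantize(q, scale):
    return q.astype(np.float32) * scale

def modulora_forward(x, q, scale, A, B, lora_alpha=16.0):
    """y = x @ dequant(q)^T + (x @ A^T) @ B^T * (alpha / r)."""
    r = A.shape[0]
    base = x @ dequantize(q, scale).T          # frozen low-precision path
    lora = (x @ A.T) @ B.T * (lora_alpha / r)  # trainable low-rank update
    return base + lora

rng = np.random.default_rng(0)
d_in, d_out, r = 8, 4, 2
W = rng.normal(size=(d_out, d_in)).astype(np.float32)
scale = 0.5
q = quantize_2bit(W, scale)                    # base weight, stored in 2 bits

A = rng.normal(scale=0.01, size=(r, d_in)).astype(np.float32)
B = np.zeros((d_out, r), dtype=np.float32)     # B starts at zero, as in LoRA

x = rng.normal(size=(1, d_in)).astype(np.float32)
y = modulora_forward(x, q, scale, A, B)
print(y.shape)  # (1, 4)
```

Because the quantizer appears only inside `dequantize`, it can be swapped out (e.g., for QuIP# or OPTQ codebooks) without touching the LoRA machinery, which is what makes the design modular.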

Quick Start & Requirements

  • Installation: Clone the repository with submodules (git clone --recursive), set up a conda environment (Python 3.9.18, PyTorch 2.1.1 with CUDA 12.1), install dependencies from requirements.txt, and then run python setup.py install for both quiptools and llmtools.
  • Hardware: Requires an NVIDIA GPU (Pascal architecture or newer). Memory requirements vary with model size and quantization level (e.g., a 7B model at 2-bit needs about 3GB; a 65B model at 4-bit needs about 40GB).
  • Resources: Pre-quantized model weights are available on HuggingFace Hub.
  • Documentation: Examples and blog posts are linked in the README.
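As a back-of-envelope check on the memory figures above (my own arithmetic, counting raw weight storage only; activations, KV cache, and quantizer metadata account for the gap to the quoted totals):

```python
# Raw storage for quantized weights: params * bits / 8 bytes, converted to GiB.
def quantized_weight_gb(n_params: float, bits: int) -> float:
    return n_params * bits / 8 / 1024**3

print(f"7B at 2-bit:  {quantized_weight_gb(7e9, 2):.2f} GB")   # ~1.63 GB of weights (~3 GB total quoted)
print(f"65B at 4-bit: {quantized_weight_gb(65e9, 4):.2f} GB")  # ~30.27 GB of weights (~40 GB total quoted)
```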

Highlighted Details

  • Enables finetuning of 2-bit LLMs using the ModuLoRA and QuIP# integration.
  • Achieves performance comparable to or exceeding 4-bit and 8-bit finetuning methods on benchmarks like SAMSum.
  • Supports Naive Pipeline Parallelism (NPP) and Distributed Data Parallel (DDP) for multi-GPU training.
  • Provides a HuggingFace-like API for loading, generating, and finetuning quantized models.
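A pseudocode sketch of what such a HuggingFace-style workflow typically looks like (all names below are illustrative assumptions, not llmtools' actual identifiers; see the README for the real API):

```
# pseudocode -- names are illustrative, not llmtools' real identifiers
model = load_quantized_model("llama-7b-2bit")     # pre-quantized weights from HF Hub
tokenizer = load_tokenizer("llama-7b-2bit")

# generation, HuggingFace-style
ids = tokenizer("The capital of France is")
print(tokenizer.decode(model.generate(ids)))

# finetuning: attach low-rank LoRA adapters and run a standard training loop
lora_model = attach_lora(model, rank=8)
train(lora_model, dataset="samsum")
```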

Maintenance & Community

This is a research project from Cornell University. Feedback can be sent to Junjie Oscar Yin and Volodymyr Kuleshov. The project cites foundational work from Relax-ML Lab, HuggingFace PEFT, and LLAMA/OPT/BLOOM models.

Licensing & Compatibility

The repository's license is not explicitly stated in the README, and the project builds on other codebases that may carry their own licenses. Suitability for commercial use or closed-source linking is not specified.

Limitations & Caveats

This is experimental work in progress. Out-of-the-box support for additional LLMs and quantizers is still under development. The README does not specify the exact license.

Health Check

Last Commit: 1 year ago
Responsiveness: Inactive
Pull Requests (30d): 0
Issues (30d): 0
Star History: 3 stars in the last 30 days

Explore Similar Projects

Starred by Lysandre Debut (Chief Open-Source Officer at Hugging Face), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 4 more.

AQLM by Vahe1994

Top 0.4% · 1k stars
PyTorch code for LLM compression via Additive Quantization (AQLM)
Created 1 year ago · Updated 1 month ago
Starred by Wing Lian (Founder of Axolotl AI) and Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems").

airllm by lyogavin

Top 0.1% · 6k stars
Inference optimization for LLMs on low-resource hardware
Created 2 years ago · Updated 2 weeks ago
Starred by Tobi Lutke (Cofounder of Shopify), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 22 more.

qlora by artidoro

Top 0.1% · 11k stars
Finetuning tool for quantized LLMs
Created 2 years ago · Updated 1 year ago