llmtools by kuleshov-group

SDK for finetuning LLMs on consumer GPUs

created 2 years ago
727 stars

Top 48.5% on sourcepulse

Project Summary

LLMTools provides a Python library for finetuning and running Large Language Models (LLMs) on consumer-grade GPUs with significantly reduced memory requirements. It targets researchers and developers needing to adapt LLMs in low-resource environments, enabling efficient finetuning using novel quantization techniques.

How It Works

LLMTools leverages the ModuLoRA algorithm, which integrates low-precision LoRA finetuning with modular quantizers. This approach enables finetuning LLMs at 2-bit, 3-bit, and 4-bit precision; 2-bit LoRA finetuning in particular goes beyond the 4-bit and 8-bit precision supported by earlier methods. The library offers a modular architecture supporting various LLMs, quantizers (like QuIP# and OPTQ), and optimization algorithms, facilitating easy experimentation and integration with the HuggingFace ecosystem.
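The core idea can be illustrated with a toy, dependency-free sketch: the base weights are frozen in a low-precision (here, 2-bit uniform) representation, while a small low-rank adapter (the LoRA matrices A and B) is trained in full precision on top. All function names below are illustrative, not the llmtools API.

```python
# Toy sketch of the ModuLoRA idea: frozen quantized base weights + a
# trainable low-rank (LoRA) update. Illustrative only, not the llmtools API.

def quantize_2bit(w, scale):
    """Map a float weight to one of 4 codes (2-bit uniform quantization)."""
    return max(0, min(3, round(w / scale) + 2))  # clamp code to {0,1,2,3}

def dequantize_2bit(q, scale):
    """Recover an approximate float weight from a 2-bit code."""
    return (q - 2) * scale

def lora_forward(x, w_codes, scale, A, B, alpha, r):
    """y = Q(W) @ x + (alpha / r) * B @ (A @ x), for one output unit.

    Only A and B would be trained; w_codes stay frozen at low precision.
    """
    base = sum(dequantize_2bit(q, scale) * xi for q, xi in zip(w_codes, x))
    h = [sum(a * xi for a, xi in zip(row, x)) for row in A]  # A @ x (r dims)
    update = sum(b * hj for b, hj in zip(B, h))              # B @ h (scalar)
    return base + (alpha / r) * update
```

The adapter path stays in full precision, which is why memory savings come almost entirely from the quantized base weights.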

Quick Start & Requirements

  • Installation: Clone the repository with submodules (git clone --recursive), set up a conda environment (Python 3.9.18, PyTorch 2.1.1 with CUDA 12.1), install dependencies from requirements.txt, and then run python setup.py install for both quiptools and llmtools.
  • Hardware: Requires an NVIDIA GPU (Pascal architecture or newer). Memory requirements vary by model size and quantization (e.g., 7B 2-bit requires 3GB, 65B 4-bit requires 40GB).
  • Resources: Pre-quantized model weights are available on HuggingFace Hub.
  • Documentation: Examples and blog posts are linked in the README.
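The memory figures above can be sanity-checked with back-of-the-envelope arithmetic: raw weight storage is roughly parameters × bits / 8 bytes, and the README's budgets add headroom for activations and other overhead. A minimal sketch:

```python
def quantized_weight_gb(n_params_billion, bits):
    """Approximate raw weight storage (GB) at `bits` per parameter."""
    return n_params_billion * 1e9 * bits / 8 / 1e9

# 7B at 2-bit: ~1.75 GB of raw weights (README budget: ~3 GB with overhead)
# 65B at 4-bit: ~32.5 GB of raw weights (README budget: ~40 GB with overhead)
```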

Highlighted Details

  • Enables finetuning of 2-bit LLMs using the ModuLoRA and QuIP# integration.
  • Achieves performance comparable to or exceeding 4-bit and 8-bit finetuning methods on benchmarks like SAMSum.
  • Supports Naive Pipeline Parallelism (NPP) and Data Distributed Parallel (DDP) for multi-GPU training.
  • Provides a HuggingFace-like API for loading, generating, and finetuning quantized models.
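Naive pipeline parallelism, mentioned above, simply partitions the model's layers into contiguous stages, one per GPU, and runs a batch through the stages in order. The following dependency-free sketch simulates that control flow with plain Python functions standing in for layers; it is an illustration, not llmtools' implementation.

```python
# Toy illustration of naive pipeline parallelism (NPP): layers are split
# into contiguous stages (one per "device") and executed sequentially.

def split_into_stages(layers, n_stages):
    """Partition a list of layer functions into contiguous stages."""
    per = -(-len(layers) // n_stages)  # ceiling division
    return [layers[i:i + per] for i in range(0, len(layers), per)]

def npp_forward(x, stages):
    """Run the input through each stage in order, one stage at a time."""
    for stage in stages:
        for layer in stage:
            x = layer(x)
    return x
```

NPP keeps only one stage busy at a time, so it saves memory per GPU without speeding up a single forward pass; DDP, by contrast, replicates the model and parallelizes over data.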

Maintenance & Community

This is a research project from Cornell University. Feedback can be sent to Junjie Oscar Yin and Volodymyr Kuleshov. The project cites foundational work from Relax-ML Lab, HuggingFace PEFT, and LLAMA/OPT/BLOOM models.

Licensing & Compatibility

The repository's license is not explicitly stated in the README. LLMTools also builds on other projects (such as HuggingFace PEFT and the LLAMA/OPT/BLOOM model families) that carry their own licenses. Compatibility with commercial use or closed-source linking is not specified.

Limitations & Caveats

This is experimental, work-in-progress research code. Out-of-the-box support for additional LLMs and quantizers is still under development, and the lack of a stated license may complicate adoption.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star history: 8 stars in the last 90 days
