llmtools by kuleshov-group

SDK for finetuning LLMs on consumer GPUs

Created 2 years ago
731 stars

Top 47.3% on SourcePulse

Project Summary

LLMTools provides a Python library for finetuning and running Large Language Models (LLMs) on consumer-grade GPUs with significantly reduced memory requirements. It targets researchers and developers needing to adapt LLMs in low-resource environments, enabling efficient finetuning using novel quantization techniques.

How It Works

LLMTools leverages the ModuLoRA algorithm, which integrates low-precision LoRA finetuning with modular quantizers. This approach allows for finetuning LLMs at 2-bit, 3-bit, and 4-bit precision, a significant advancement over previous methods. The library offers a modular architecture supporting various LLMs, quantizers (like QuIP# and OPTQ), and optimization algorithms, facilitating easy experimentation and integration with the HuggingFace ecosystem.
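The core idea can be illustrated with a minimal toy sketch (my own NumPy illustration of the ModuLoRA pattern, not llmtools code): the base weight stays frozen in low precision, and only the small low-rank LoRA factors A and B are trained in full precision, with the forward pass summing both paths.

```python
import numpy as np

def quantize_2bit(w, scale):
    """Toy 2-bit quantizer: round weights to the 4 levels {-2,-1,0,1} * scale."""
    return np.clip(np.round(w / scale), -2, 1).astype(np.int8)

def dequantize(q, scale):
    return q.astype(np.float32) * scale

def modulora_forward(x, q, scale, A, B, lora_alpha=16.0):
    """y = x @ dequant(q)^T + (x @ A^T) @ B^T * (alpha / r)."""
    r = A.shape[0]
    base = x @ dequantize(q, scale).T          # frozen low-precision path
    lora = (x @ A.T) @ B.T * (lora_alpha / r)  # trainable low-rank update
    return base + lora

rng = np.random.default_rng(0)
d_in, d_out, r = 8, 4, 2
W = rng.normal(size=(d_out, d_in)).astype(np.float32)
scale = 0.5
q = quantize_2bit(W, scale)                    # base weight, stored in 2 bits

A = rng.normal(scale=0.01, size=(r, d_in)).astype(np.float32)
B = np.zeros((d_out, r), dtype=np.float32)     # B starts at zero, as in LoRA

x = rng.normal(size=(1, d_in)).astype(np.float32)
y = modulora_forward(x, q, scale, A, B)
print(y.shape)  # (1, 4)
```

Because the quantizer appears only inside `dequantize`, it can be swapped out (e.g., for QuIP# or OPTQ codebooks) without touching the LoRA machinery, which is what makes the design modular.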

Quick Start & Requirements

  • Installation: Clone the repository with submodules (git clone --recursive), set up a conda environment (Python 3.9.18, PyTorch 2.1.1 with CUDA 12.1), install dependencies from requirements.txt, and then run python setup.py install for both quiptools and llmtools.
  • Hardware: Requires an NVIDIA GPU (Pascal architecture or newer). Memory requirements vary with model size and quantization level (e.g., a 7B model at 2-bit needs about 3GB; a 65B model at 4-bit needs about 40GB).
  • Resources: Pre-quantized model weights are available on HuggingFace Hub.
  • Documentation: Examples and blog posts are linked in the README.
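As a back-of-envelope check on the memory figures above (my own arithmetic, counting raw weight storage only; activations, KV cache, and quantizer metadata account for the gap to the quoted totals):

```python
# Raw storage for quantized weights: params * bits / 8 bytes, converted to GiB.
def quantized_weight_gb(n_params: float, bits: int) -> float:
    return n_params * bits / 8 / 1024**3

print(f"7B at 2-bit:  {quantized_weight_gb(7e9, 2):.2f} GB")   # ~1.63 GB of weights (~3 GB total quoted)
print(f"65B at 4-bit: {quantized_weight_gb(65e9, 4):.2f} GB")  # ~30.27 GB of weights (~40 GB total quoted)
```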

Highlighted Details

  • Enables finetuning of 2-bit LLMs using the ModuLoRA and QuIP# integration.
  • Achieves performance comparable to or exceeding 4-bit and 8-bit finetuning methods on benchmarks like SAMSum.
  • Supports Naive Pipeline Parallelism (NPP) and Distributed Data Parallel (DDP) for multi-GPU training.
  • Provides a HuggingFace-like API for loading, generating, and finetuning quantized models.
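A pseudocode sketch of what such a HuggingFace-style workflow typically looks like (all names below are illustrative assumptions, not llmtools' actual identifiers; see the README for the real API):

```
# pseudocode -- names are illustrative, not llmtools' real identifiers
model = load_quantized_model("llama-7b-2bit")     # pre-quantized weights from HF Hub
tokenizer = load_tokenizer("llama-7b-2bit")

# generation, HuggingFace-style
ids = tokenizer("The capital of France is")
print(tokenizer.decode(model.generate(ids)))

# finetuning: attach low-rank LoRA adapters and run a standard training loop
lora_model = attach_lora(model, rank=8)
train(lora_model, dataset="samsum")
```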

Maintenance & Community

This is a research project from Cornell University. Feedback can be sent to Junjie Oscar Yin and Volodymyr Kuleshov. The project cites foundational work from Relax-ML Lab, HuggingFace PEFT, and LLAMA/OPT/BLOOM models.

Licensing & Compatibility

The repository's license is not explicitly stated in the README, and the project builds on other codebases that may carry their own licenses. Suitability for commercial use or closed-source linking is not specified.

Limitations & Caveats

This is experimental work in progress. Out-of-the-box support for additional LLMs and quantizers is still under development. The README does not specify the exact license.

Health Check

Last Commit: 1 year ago
Responsiveness: Inactive
Pull Requests (30d): 0
Issues (30d): 0
Star History: 3 stars in the last 30 days

Explore Similar Projects

Starred by Lysandre Debut (Chief Open-Source Officer at Hugging Face), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 4 more.

AQLM by Vahe1994

Top 0.4% · 1k stars
PyTorch code for LLM compression via Additive Quantization (AQLM)
Created 1 year ago · Updated 1 month ago
Starred by Wing Lian (Founder of Axolotl AI) and Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems").

airllm by lyogavin

Top 0.1% · 6k stars
Inference optimization for LLMs on low-resource hardware
Created 2 years ago · Updated 2 weeks ago
Starred by Tobi Lutke (Cofounder of Shopify), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 22 more.

qlora by artidoro

Top 0.1% · 11k stars
Finetuning tool for quantized LLMs
Created 2 years ago · Updated 1 year ago