BLoRA by sabetAI

Inference optimization for batched LoRA adapters

created 2 years ago
344 stars

Top 81.6% on sourcepulse

View on GitHub
1 Expert Loves This Project
Project Summary

This repository provides a method for batching multiple LoRA (Low-Rank Adaptation) adapters so they can serve inference simultaneously against a single base model. It targets users of large language models who want to run several specialized LoRAs without loading a separate copy of the base model for each one, maximizing GPU utilization and inference throughput.

How It Works

BLoRA exploits the additive nature of LoRA updates, which are applied to specific layers of a base model. By routing each sequence in a batch through its own LoRA adapter, it runs inference for several adapter configurations in a single forward pass over one copy of the base weights. Because each adapter contributes only a small number of parameters, many adapters fit in VRAM alongside the base model, avoiding the cost of loading multiple model instances.
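
A minimal sketch of the batched-LoRA idea in PyTorch (illustrative only; the shapes, names, and einsum routing below are assumptions, not BLoRA's actual code):

    import torch

    # Toy dimensions: 4 prompts in the batch, 3 distinct adapters, rank-8 LoRA.
    batch, d_in, d_out, rank, n_adapters = 4, 16, 16, 8, 3

    W = torch.randn(d_in, d_out)                     # frozen base weight, shared by every row
    A = torch.randn(n_adapters, d_in, rank) * 0.01   # stacked LoRA A matrices, one per adapter
    B = torch.randn(n_adapters, rank, d_out) * 0.01  # stacked LoRA B matrices, one per adapter

    x = torch.randn(batch, d_in)                     # hidden states, one prompt per row
    adapter_ids = torch.tensor([0, 1, 2, 0])         # which adapter each row should use

    base = x @ W                                     # single shared base projection
    # Gather each row's adapter and add its low-rank update: out_b = x_b W + x_b A_{id_b} B_{id_b}
    delta = torch.einsum("bi,bir,bro->bo", x, A[adapter_ids], B[adapter_ids])
    out = base + delta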

Quick Start & Requirements

  • Install via pip install -r requirements.txt after cloning the repository.
  • Requires Hugging Face Transformers and PEFT.
  • Example usage demonstrates loading a Llama base model and injecting LoRA checkpoints (see the sketch after this list).
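
A minimal sketch of that flow using standard Transformers and PEFT calls; the model and adapter identifiers are placeholders, and BLoRA's own batch-routing helpers are not reproduced here:

    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import PeftModel

    # Placeholder identifiers; substitute the base model and LoRA checkpoints you actually use.
    base_id = "meta-llama/Llama-2-7b-hf"
    tokenizer = AutoTokenizer.from_pretrained(base_id)
    model = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto")

    # Attach the first adapter, then load additional adapters under distinct names.
    model = PeftModel.from_pretrained(model, "your-org/lora-chat", adapter_name="chat")
    model.load_adapter("your-org/lora-sql", adapter_name="sql")
    model.load_adapter("your-org/lora-summarize", adapter_name="summarize")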

Highlighted Details

  • Enables simultaneous inference across multiple LoRA adapters on a single base model.
  • Maximizes GPU utilization by batching inference requests.
  • LoRA adapter weights are small, so many adapters can be held in VRAM alongside a single copy of the base model.
  • Demonstrates a "hacky" method for side-loading per-sequence LoRA batch IDs into the model for parallel processing (see the sketch after this list).
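
One plausible shape of that side-loading mechanism, assuming PEFT-style LoRA layers; the attribute and function names here are hypothetical and not taken from BLoRA's code:

    import torch

    def set_batch_lora_ids(model, adapter_ids):
        """Stash a per-row adapter index on every LoRA-bearing module so a patched
        forward pass can route row i through adapter adapter_ids[i]."""
        for module in model.modules():
            if hasattr(module, "lora_A"):            # PEFT LoRA layers expose lora_A / lora_B
                module.batch_lora_ids = adapter_ids  # hypothetical attribute, read by a patched forward

    # Rows 0 and 2 of the next batch use adapter 0, row 1 uses adapter 1.
    # set_batch_lora_ids(model, torch.tensor([0, 1, 0]))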

Maintenance & Community

  • Project appears to be a personal or small-team effort, with acknowledgments to @yacineMTB for review.
  • No explicit community channels (Discord/Slack) or roadmap are mentioned in the README.

Licensing & Compatibility

  • The README does not state a license; check the repository for a LICENSE file before assuming reuse terms, since an unlicensed repository defaults to all rights reserved rather than a permissive license.
  • Compatibility for commercial use or closed-source linking is not detailed.

Limitations & Caveats

The batch-preparation method relies on a "hacky" side-loading of LoRA identifiers into the model, so it may be fragile across Transformers/PEFT versions or break with future releases. The README does not specify supported base models beyond Llama or report performance benchmarks.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 4 stars in the last 90 days

Explore Similar Projects

Starred by Stas Bekman (Author of Machine Learning Engineering Open Book; Research Engineer at Snowflake).

HALOs by ContextualAI

0.2%
873
Library for aligning LLMs using human-aware loss functions
created 1 year ago
updated 2 weeks ago
Starred by Jeff Hammerbacher (Cofounder of Cloudera), Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), and 3 more.

punica by punica-ai

0%
1k
LoRA serving system (research paper) for multi-tenant LLM inference
created 1 year ago
updated 1 year ago
Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), Omar Sanseviero (DevRel at Google DeepMind), and 2 more.

S-LoRA by S-LoRA

0.1%
2k
System for scalable LoRA adapter serving
created 1 year ago
updated 1 year ago
Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), Jeff Hammerbacher (Cofounder of Cloudera), and 2 more.

lorax by predibase

0.4%
3k
Multi-LoRA inference server for serving 1000s of fine-tuned LLMs
created 1 year ago
updated 2 months ago
Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), Patrick von Platen (Core Contributor to Hugging Face Transformers and Diffusers), and 6 more.

LoRA by microsoft

0.3%
12k
PyTorch library for low-rank adaptation (LoRA) of LLMs
created 4 years ago
updated 7 months ago
Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), Ying Sheng (Author of SGLang), and 9 more.

alpaca-lora by tloen

0.0%
19k
LoRA fine-tuning for LLaMA
created 2 years ago
updated 1 year ago