BLoRA by sabetAI

Inference optimization for batched LoRA adapters

Created 2 years ago

348 stars

Top 79.9% on SourcePulse

View on GitHub

2 Experts Love This Project

Jeremy Howard

Cofounder of fast.ai

Wing Lian

Founder of Axolotl AI

Project Summary

This repository provides a method for batching multiple LoRA (Low-Rank Adaptation) adapters for simultaneous inference with a single base model. It targets users of large language models who want to leverage multiple specialized LoRAs without the overhead of loading separate models, thereby maximizing GPU utilization and inference throughput.

How It Works

BLoRA leverages the additive nature of LoRA operations, which are applied to specific layers of a base model. By broadcasting and applying multiple LoRA adapters concurrently within a single batch, it allows for parallel inference across different adapter configurations. This approach is advantageous as it avoids the need to load multiple model instances, keeping trainable parameters small and manageable within VRAM.

Quick Start & Requirements

Install via pip install -r requirements.txt after cloning the repository.
Requires Hugging Face Transformers and PEFT.
Example usage demonstrates loading a Llama base model and injecting LoRA checkpoints.

Highlighted Details

Enables simultaneous inference across multiple LoRA adapters on a single base model.
Maximizes GPU utilization by batching inference requests.
LoRA adapters are loaded and managed efficiently within VRAM.
Demonstrates a "hacky" method for side-loading LoRA batch IDs into the model for parallel processing.

Maintenance & Community

Project appears to be a personal or small-team effort, with acknowledgments to @yacineMTB for review.
No explicit community channels (Discord/Slack) or roadmap are mentioned in the README.

Licensing & Compatibility

The README does not explicitly state a license. The repository's default license is likely MIT unless otherwise specified.
Compatibility for commercial use or closed-source linking is not detailed.

Limitations & Caveats

The method for preparing batches involves a "hacky" side-loading of LoRA identifiers, which may indicate potential instability or future breaking changes. The README does not specify supported base models beyond Llama or detail performance benchmarks.

Health Check

Last Commit

2 years ago

Responsiveness

Inactive

Pull Requests (30d)

Issues (30d)

Star History

1 stars in the last 30 days