Black-Box-Tuning by txsun1997

Gradient-free tuning method for Language-Model-as-a-Service

created 3 years ago
270 stars

Top 95.9% on sourcepulse

Project Summary

This repository provides Black-Box Tuning (BBT) and BBTv2, gradient-free methods for few-shot learning with large language models (LLMs). By optimizing soft prompt tokens without backpropagation, they enable efficient LLM adaptation in the Language-Model-as-a-Service (LMaaS) setting while achieving performance comparable to full model tuning on few-shot tasks. The target audience is researchers and practitioners who need to adapt LLMs without direct gradient access, for example when a model is only reachable through an inference API.

How It Works

BBT optimizes a sequence of continuous (soft) prompt tokens prepended to the LLM input, treating the LLM as a black box that is accessed only through inference API calls. Instead of searching the full prompt embedding space, optimization runs in a low-dimensional subspace that is mapped into the prompt embedding space by a fixed random projection and explored with a derivative-free optimizer (CMA-ES). BBTv2 extends this with a divide-and-conquer strategy that injects and optimizes prompts at every layer of the LLM, improving performance across a range of tasks. Both methods require only forward passes and typically reach strong few-shot performance within a budget of a few thousand API calls.
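
The following is a minimal sketch of that loop, using the cma package the repository installs. It is illustrative only: query_loss stands in for the black-box inference API (the repository scores actual few-shot inputs with the served model), and the projection scale, dimensions, and CMA-ES settings here are assumptions rather than the repository's exact hyperparameters.

    # Sketch of BBT-style gradient-free soft prompt tuning (not the repository's code).
    import numpy as np
    import cma

    EMBED_DIM = 1024        # hidden size of the backbone (e.g. roberta-large)
    N_PROMPT_TOKENS = 50    # number of soft prompt tokens prepended to the input
    INTRINSIC_DIM = 500     # low-dimensional subspace that is actually optimized

    # Fixed random projection from the intrinsic subspace to the full prompt space
    # (the 1/INTRINSIC_DIM scale is illustrative, not the paper's exact choice).
    rng = np.random.default_rng(42)
    A = rng.normal(0.0, 1.0 / INTRINSIC_DIM,
                   size=(INTRINSIC_DIM, N_PROMPT_TOKENS * EMBED_DIM))

    def query_loss(prompt_embedding):
        """Placeholder for the inference API: prepend the (N_PROMPT_TOKENS x EMBED_DIM)
        soft prompt to the few-shot inputs and return a task loss. Dummy objective here."""
        return float(np.sum(prompt_embedding ** 2))

    def objective(z):
        prompt = (z @ A).reshape(N_PROMPT_TOKENS, EMBED_DIM)
        return query_loss(prompt)

    # CMA-ES over the low-dimensional vector z: only forward passes, no gradients.
    es = cma.CMAEvolutionStrategy(INTRINSIC_DIM * [0.0], 1.0,
                                  {"popsize": 20, "maxfevals": 200, "seed": 42})
    while not es.stop():
        candidates = es.ask()                       # sample a population of z vectors
        es.tell(candidates, [objective(z) for z in candidates])

    best_prompt = (es.result.xbest @ A).reshape(N_PROMPT_TOKENS, EMBED_DIM)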

Quick Start & Requirements

  • Install: Clone the repository and set up a conda environment:
    conda create --name bbt python=3.8
    conda activate bbt
    pip install transformers==4.1.1 fastNLP==0.6.0 datasets cma sklearn  # on recent pip, "sklearn" must be installed as "scikit-learn"
    git clone https://github.com/txsun1997/Black-Box-Tuning
    cd Black-Box-Tuning
    
  • Prerequisites: Python 3.8, CUDA (tested with 11.4), NVIDIA GPU (tested on 3090).
  • Running BBT: bash run.sh
  • Running BBTv2: python deepbbt.py --model_name "roberta-large" --task_name "agnews" ...
  • Inference Optimization: Requires onnxruntime-gpu==1.10.0. Export the model to ONNX and run with --inference_framework 'ort' (see the example after this list).
  • Links: ICML 2022 paper (BBT), EMNLP 2022 paper (BBTv2)
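
Putting the pieces above together, a BBTv2 run served through ONNX Runtime might look like the command below. This is a sketch based only on the flags shown in this section: whether deepbbt.py itself accepts --inference_framework is an assumption, and the task and optimization arguments elided with "..." above still need to be supplied.

    python deepbbt.py --model_name "roberta-large" --task_name "agnews" --inference_framework 'ort' ...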

Highlighted Details

  • Supports BERT, BART, T5, GPT-2, and RoBERTa models.
  • Achieves ~2x speedup with ONNX Runtime optimization.
  • Offers parallel evaluation for improved efficiency.
  • Demonstrates competitive performance on various language understanding datasets.

Maintenance & Community

The project was last updated in October 2022 with the release of BBTv2. Key contributors are listed in the papers. No active community channels (Discord/Slack) are mentioned.

Licensing & Compatibility

The repository does not explicitly state a license. The code uses libraries like transformers and fastNLP, which have their own licenses. Users should verify compatibility for commercial or closed-source use.

Limitations & Caveats

The project's last update was in late 2022, and there's no indication of ongoing maintenance or support for newer LLM architectures or techniques. Installation of onnxruntime-gpu can be challenging due to environment-specific issues.

Health Check

  • Last commit: 2 years ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0

Star History

  • 1 star in the last 90 days
