transformer-heads by center-for-humans-and-machines

Toolkit for attaching, training, saving, and loading new heads for transformer models

created 1 year ago
284 stars

Top 93.1% on sourcepulse

View on GitHub
Project Summary

This library provides a toolkit for attaching, training, saving, and loading custom "heads" onto pre-trained transformer models. It enables researchers and practitioners to easily adapt large language models for new tasks, such as linear probing for interpretability, fine-tuning for classification or regression, and multi-task learning, thereby enhancing model versatility and facilitating efficient experimentation.

How It Works

The core approach involves defining HeadConfig objects that specify the desired head's properties, including its attachment layer, input/output dimensions, activation function, loss function, and target data column. The load_headed function then seamlessly integrates these heads by replacing or augmenting the transformer's original output layer. This modular design allows for flexible experimentation with various downstream tasks and training strategies, including efficient methods like QLoRA.
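
The snippet below is a minimal sketch of that flow for GPT-2. HeadConfig and load_headed are the names described above, but the specific keyword arguments (layer_hook, in_size, num_outputs, output_activation, loss_fct, target) are illustrative assumptions based on the listed properties; consult the project's notebooks for the exact API.

    from transformers import GPT2LMHeadModel
    from transformer_heads import HeadConfig, load_headed

    # One regression head attached near the final hidden layer of GPT-2.
    # Field names mirror the properties listed above (attachment layer,
    # dimensions, activation, loss, target column) but are assumptions.
    head_config = HeadConfig(
        name="sentiment_score",       # identifier for the new head
        layer_hook=-1,                # attach after the last transformer block
        in_size=768,                  # GPT-2 hidden size
        num_outputs=1,                # scalar regression output
        output_activation="linear",
        loss_fct="mse",
        target="score",               # dataset column holding the labels
    )

    # load_headed wraps the pre-trained model and attaches the head.
    model = load_headed(GPT2LMHeadModel, "gpt2", head_configs=[head_config])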

Quick Start & Requirements

  • Install from PyPI: pip install transformer-heads
  • Or clone the repository and install in editable mode: pip install -e .
  • Requires Python and the Hugging Face Transformers library.
  • Notebooks demonstrate usage with GPT-2 and Llama models.
  • Official documentation and examples are available via provided links.

Highlighted Details

  • Supports attaching multiple heads simultaneously for multi-task learning (see the sketch after this list).
  • Integrates with Hugging Face's Trainer for simplified training workflows.
  • Offers QLoRA support for reduced memory overhead and efficient fine-tuning.
  • Includes notebooks for linear probing, classification, regression, and joint multi-task learning.
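
As a hedged illustration of the multi-head setup, the sketch below attaches a classification head and a regression head to the same backbone. The field names follow the same assumptions as the earlier snippet, and the note about Hugging Face's Trainer restates the project description rather than tested code.

    from transformers import GPT2LMHeadModel
    from transformer_heads import HeadConfig, load_headed

    # Two heads trained jointly: 4-way classification on the last layer and
    # scalar regression probing an earlier layer. Field names are the same
    # assumptions as in the earlier sketch.
    head_configs = [
        HeadConfig(
            name="topic",
            layer_hook=-1,
            in_size=768,
            num_outputs=4,
            output_activation="linear",
            loss_fct="cross_entropy",
            target="topic_label",
        ),
        HeadConfig(
            name="quality",
            layer_hook=-4,            # probe an earlier layer for comparison
            in_size=768,
            num_outputs=1,
            output_activation="linear",
            loss_fct="mse",
            target="quality_score",
        ),
    ]

    model = load_headed(GPT2LMHeadModel, "gpt2", head_configs=head_configs)
    # Per the project description, this headed model can be passed to
    # transformers.Trainer, with the heads' losses combined during training.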

Maintenance & Community

  • Developed by the Center for Humans and Machines.
  • Links to documentation, getting started guides, and Reddit discussions are provided.

Licensing & Compatibility

  • The library is released under an unspecified license. Further clarification on licensing terms is recommended for commercial use or integration into closed-source projects.

Limitations & Caveats

The README does not explicitly state a license, which may be a concern for commercial adoption. Support for custom model architectures assumes an attribute structure similar to Hugging Face's LlamaForCausalLM, so non-standard models may require modification before heads can be attached.

Health Check

  • Last commit: 5 months ago
  • Responsiveness: 1 day
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 9 stars in the last 90 days

Explore Similar Projects

Starred by Ying Sheng (author of SGLang), Chip Huyen (author of AI Engineering, Designing Machine Learning Systems), and 3 more.

adapters by adapter-hub

  • Unified library for parameter-efficient transfer learning in NLP
  • Top 0.1% on sourcepulse
  • 3k stars
  • Created 5 years ago, updated 2 months ago