concept-erasure  by EleutherAI

Concept erasure for neural representations

Created 3 years ago
251 stars

Top 99.8% on SourcePulse

Project Summary

Summary

EleutherAI/concept-erasure provides LEAst-squares Concept Erasure (LEACE), a method for removing specified concepts from neural representations. It is aimed at machine learning practitioners who want to improve model fairness (for example, by removing information about protected attributes) or to study interpretability by observing how model behavior changes once a concept is removed. LEACE offers provable guarantees against linear classifiers while minimizing damage to the original representation, which preserves its utility for downstream tasks.

How It Works

LEACE uses a closed-form solution derived from a least-squares objective. After erasure, no linear classifier can detect the targeted concept in the modified representation, which is the property that fairness and interpretability work relies on. Because the objective also minimizes the change made to the representation, downstream tasks are affected as little as possible. The library exposes two classes: LeaceFitter, which supports incremental updates using O(d^2) memory, and LeaceEraser, a compact representation of the erasure function that needs O(dk) memory. Here d is the dimension of the representation and k is the dimension of the concept (for a categorical concept, the number of classes).
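The closed form can be sketched in plain NumPy. The formula used here, r(x) = x - W^+ P W (x - mean), where W is the whitening matrix of the representation covariance and P is the orthogonal projection onto the column space of W times the cross-covariance, comes from the LEACE paper rather than from this summary. The names fit_leace and cross_cov and the toy data are assumptions made for illustration; this is not the library's implementation.

```python
import numpy as np


def cross_cov(X: np.ndarray, Z: np.ndarray) -> np.ndarray:
    """Empirical cross-covariance between the columns of X and Z."""
    return (X - X.mean(axis=0)).T @ (Z - Z.mean(axis=0)) / X.shape[0]


def fit_leace(X: np.ndarray, Z: np.ndarray):
    """Build an erasure function from X (n, d) and concept labels Z (n, k)."""
    mu_x = X.mean(axis=0)
    sigma_xx = cross_cov(X, X)  # (d, d) covariance of the representation
    sigma_xz = cross_cov(X, Z)  # (d, k) cross-covariance with the concept

    # Whitening matrix W = sigma_xx^(-1/2) and its pseudoinverse,
    # restricted to directions with non-negligible variance.
    evals, evecs = np.linalg.eigh(sigma_xx)
    keep = evals > 1e-10 * evals.max()
    evals, evecs = evals[keep], evecs[:, keep]
    W = (evecs / np.sqrt(evals)) @ evecs.T
    W_pinv = (evecs * np.sqrt(evals)) @ evecs.T

    # Orthogonal projection onto the column space of W @ sigma_xz.
    U, s, _ = np.linalg.svd(W @ sigma_xz, full_matrices=False)
    U = U[:, s > 1e-10 * s.max()]
    P = U @ U.T

    A = W_pinv @ P @ W  # the part of (x - mu_x) that gets removed

    def erase(x: np.ndarray) -> np.ndarray:
        return x - (x - mu_x) @ A.T

    return erase


# Toy data (hypothetical): a binary concept leaks into feature 0.
rng = np.random.default_rng(0)
n, d = 2000, 8
z = rng.integers(0, 2, size=n)
Z = np.eye(2)[z]  # one-hot labels, shape (n, 2)
X = rng.normal(size=(n, d))
X[:, 0] += 2.0 * z

erase = fit_leace(X, Z)
X_erased = erase(X)

print("max |cross-cov| before:", np.abs(cross_cov(X, Z)).max())
print("max |cross-cov| after: ", np.abs(cross_cov(X_erased, Z)).max())
```

On the data it was fitted on, the erased representation should have a cross-covariance with the concept that is zero up to floating-point error. The paper shows this condition is equivalent to no linear classifier beating a constant prediction.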

Quick Start & Requirements

  • Installation: pip install concept-erasure
  • Prerequisites: Python 3.10+. Examples utilize PyTorch and scikit-learn.
  • Links: Paper link available in the repository (URL not provided in README).

Highlighted Details

  • Provable Concept Removal: Guarantees erasure against linear classifiers, offering strong theoretical backing.
  • Minimal Representation Damage: The least-squares optimization objective specifically aims to preserve the representation's utility for other tasks.
  • Flexible Usage: Supports both batch fitting on static datasets and incremental updates via LeaceFitter.update() for streaming data scenarios.
  • Model Scrubbing: Includes specialized implementations for concept scrubbing within LLaMA and GPT-NeoX models, facilitating direct application on transformer architectures.
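The streaming case under Flexible Usage amounts to keeping running statistics instead of the full dataset. The sketch below shows one way to do that in O(d^2 + dk) memory. The class RunningCovariance and its methods are hypothetical and written for illustration only; they are not the signature or internals of LeaceFitter, and the plain running sums used here are simpler, and less numerically robust, than what a production implementation would likely use.

```python
import numpy as np


class RunningCovariance:
    """Hypothetical helper: accumulates means and covariances batch by batch."""

    def __init__(self, d: int, k: int):
        self.n = 0
        self.sum_x = np.zeros(d)
        self.sum_z = np.zeros(k)
        self.sum_xx = np.zeros((d, d))  # the O(d^2) term
        self.sum_xz = np.zeros((d, k))  # the O(dk) term

    def update(self, X: np.ndarray, Z: np.ndarray) -> "RunningCovariance":
        """Fold one batch of representations X (b, d) and labels Z (b, k) in."""
        self.n += X.shape[0]
        self.sum_x += X.sum(axis=0)
        self.sum_z += Z.sum(axis=0)
        self.sum_xx += X.T @ X
        self.sum_xz += X.T @ Z
        return self

    def covariances(self):
        """Return the mean of X, cov(X, X) and cov(X, Z) seen so far."""
        mu_x = self.sum_x / self.n
        mu_z = self.sum_z / self.n
        sigma_xx = self.sum_xx / self.n - np.outer(mu_x, mu_x)
        sigma_xz = self.sum_xz / self.n - np.outer(mu_x, mu_z)
        return mu_x, sigma_xx, sigma_xz


# Toy stream (hypothetical data): four batches of 250 rows each.
rng = np.random.default_rng(1)
X_all = rng.normal(size=(1000, 6))
Z_all = np.eye(3)[rng.integers(0, 3, size=1000)]

stats = RunningCovariance(d=6, k=3)
for X_batch, Z_batch in zip(np.split(X_all, 4), np.split(Z_all, 4)):
    stats.update(X_batch, Z_batch)

mu_x, sigma_xx, sigma_xz = stats.covariances()

# Reference values computed from the full dataset in one pass.
Xc = X_all - X_all.mean(axis=0)
Zc = Z_all - Z_all.mean(axis=0)
full_xx = Xc.T @ Xc / len(X_all)
full_xz = Xc.T @ Zc / len(X_all)
```

The accumulated statistics agree with the full-batch values, so an eraser can be derived at any point in the stream without revisiting earlier batches.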

Maintenance & Community

No specific details on maintainers, community channels (e.g., Discord/Slack), or roadmap were found in the provided README.

Licensing & Compatibility

The repository's license is not specified in the README, which requires clarification for commercial or integration use.

Limitations & Caveats

The concept scrubbing implementation is described as "messy" and subject to refactoring. Provable guarantees are limited to linear classifiers. Tagged datasets for experiments are pending upload.

Health Check

  • Last Commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 3 stars in the last 30 days
