concept-erasure by EleutherAI

Concept erasure for neural representations

Created 3 years ago

257 stars

Top 98.3% on SourcePulse

3 Experts Love This Project

Vincent Weisser, Cofounder of Prime Intellect

Wing Lian, Founder of Axolotl AI

Jeff Hammerbacher, Cofounder of Cloudera

Project Summary

EleutherAI/concept-erasure implements LEAst-squares Concept Erasure (LEACE), a method for removing a specified concept from neural representations. It is aimed at machine learning practitioners who want to improve fairness (for example, by removing information about protected attributes) or to study interpretability by observing how model behavior changes once a concept is removed. LEACE provably prevents linear classifiers from detecting the erased concept while changing the original representation as little as possible, which helps preserve its usefulness for downstream tasks.

How It Works

LEACE is a closed-form method derived from a least-squares objective. It guarantees that no linear classifier can detect the targeted concept in the modified representation, and among the transformations with that guarantee it chooses the one that changes the representation least in the least-squares sense, so downstream tasks are affected as little as possible. The library exposes two classes: LeaceFitter, which supports incremental updates and uses O(d^2) memory, and LeaceEraser, a compact representation of the erasure function that uses O(dk) memory. Here d is the dimensionality of the representation and k is the number of classes (or dimensions) of the concept being erased.
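To make the closed-form idea concrete, the following is a minimal NumPy sketch of a LEACE-style eraser. It is not the library's code: the function names fit_leace and erase, the tolerance, and the toy data are assumptions made for illustration. The sketch whitens the representation, projects out the directions that covary with the concept, and un-whitens, which drives the cross-covariance between the erased features and the concept labels to zero (the condition under which a linear classifier has nothing to exploit).

```python
import numpy as np


def fit_leace(X, Z, tol=1e-10):
    """Illustrative LEACE-style fit; returns (mean of X, d x d erasure matrix)."""
    X = np.asarray(X, dtype=np.float64)
    Z = np.asarray(Z, dtype=np.float64).reshape(len(X), -1)
    mu = X.mean(axis=0)
    Xc, Zc = X - mu, Z - Z.mean(axis=0)
    sigma_xx = Xc.T @ Xc / len(X)  # (d, d) covariance of X
    sigma_xz = Xc.T @ Zc / len(X)  # (d, k) cross-covariance of X and Z

    # Whitening matrix W = sigma_xx^(-1/2) and its pseudo-inverse, via eigh.
    evals, evecs = np.linalg.eigh(sigma_xx)
    keep = evals > tol
    root = np.sqrt(np.where(keep, evals, 1.0))
    W = (evecs * np.where(keep, 1.0 / root, 0.0)) @ evecs.T
    W_pinv = (evecs * np.where(keep, root, 0.0)) @ evecs.T

    # Orthogonal projector onto the column space of W @ sigma_xz.
    U, s, _ = np.linalg.svd(W @ sigma_xz, full_matrices=False)
    U = U[:, s > tol]
    return mu, W_pinv @ (U @ U.T) @ W


def erase(X, mu, proj):
    """Apply r(x) = x - proj @ (x - mu) to every row of X."""
    X = np.asarray(X, dtype=np.float64)
    return X - (X - mu) @ proj.T


def cross_cov(A, b):
    """Cross-covariance between the columns of A and the label vector b."""
    return (A - A.mean(axis=0)).T @ (b - b.mean()) / len(A)


# Toy data (an assumption for this sketch): a binary concept leaks into two features.
rng = np.random.default_rng(0)
z = rng.integers(0, 2, size=2000)
X = rng.normal(size=(2000, 8))
X[:, 0] += 2.0 * z
X[:, 3] -= 1.5 * z

mu, proj = fit_leace(X, z)
X_erased = erase(X, mu, proj)

before = np.abs(cross_cov(X, z)).max()
after = np.abs(cross_cov(X_erased, z)).max()
print(f"max |cross-covariance| before: {before:.3f}, after: {after:.2e}")
```

With a single binary concept the erasure matrix has rank 1, so only one direction of the 8-dimensional toy representation is modified. For real work, the library's own LeaceFitter and LeaceEraser classes are the intended entry points.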

Quick Start & Requirements

Installation: pip install concept-erasure
Prerequisites: Python 3.10+. Examples utilize PyTorch and scikit-learn.
Links: Paper link available in the repository (URL not provided in README).

Highlighted Details

Provable Concept Removal: Guarantees that linear classifiers cannot recover the erased concept from the modified representation.
Minimal Representation Damage: The least-squares optimization objective specifically aims to preserve the representation's utility for other tasks.
Flexible Usage: Supports both batch fitting on static datasets and incremental updates via LeaceFitter.update() for streaming data scenarios.
Model Scrubbing: Includes specialized implementations for concept scrubbing within LLaMA and GPT-NeoX models, facilitating direct application on transformer architectures.
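The streaming usage described above depends on the fact that the statistics needed for erasure (means, covariance, and cross-covariance) can be accumulated batch by batch. The sketch below illustrates that idea in NumPy and shows where the O(d^2) memory cost comes from. The class StreamingCovariance and its methods are hypothetical names for this example and are not claimed to match how LeaceFitter.update() is implemented.

```python
import numpy as np


class StreamingCovariance:
    """Hypothetical helper: accumulates sums so covariances can be derived later."""

    def __init__(self, x_dim, z_dim):
        self.n = 0
        self.sum_x = np.zeros(x_dim)
        self.sum_z = np.zeros(z_dim)
        self.sum_xx = np.zeros((x_dim, x_dim))  # the O(d^2) memory term
        self.sum_xz = np.zeros((x_dim, z_dim))  # the O(dk) memory term

    def update(self, x, z):
        """Fold one batch of representations x and concept labels z into the sums."""
        x = np.asarray(x, dtype=np.float64)
        z = np.asarray(z, dtype=np.float64).reshape(len(x), -1)
        self.n += len(x)
        self.sum_x += x.sum(axis=0)
        self.sum_z += z.sum(axis=0)
        self.sum_xx += x.T @ x
        self.sum_xz += x.T @ z
        return self

    def covariances(self):
        """Return (mean of x, covariance of x, cross-covariance of x and z)."""
        mu_x = self.sum_x / self.n
        mu_z = self.sum_z / self.n
        sigma_xx = self.sum_xx / self.n - np.outer(mu_x, mu_x)
        sigma_xz = self.sum_xz / self.n - np.outer(mu_x, mu_z)
        return mu_x, sigma_xx, sigma_xz


# Toy stream (an assumption for this sketch): 1000 samples fed in batches of 100.
rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 6))
Z = (X[:, :1] > 0).astype(float)

stats = StreamingCovariance(x_dim=6, z_dim=1)
for start in range(0, len(X), 100):
    stats.update(X[start:start + 100], Z[start:start + 100])

mu_x, sigma_xx, sigma_xz = stats.covariances()
```

Accumulating raw sums keeps the example short, but it can lose precision when feature means are large relative to their variance; a Welford-style running update is the usual remedy in that case.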

Maintenance & Community

No specific details on maintainers, community channels (e.g., Discord/Slack), or roadmap were found in the provided README.

Licensing & Compatibility

The repository's license is not specified in the README, which requires clarification for commercial or integration use.

Limitations & Caveats

The concept scrubbing implementation is described as "messy" and subject to refactoring. Provable guarantees are limited to linear classifiers. Tagged datasets for experiments are pending upload.

Health Check

Last Commit: 1 year ago

Responsiveness: Inactive

Star History

3 stars in the last 30 days