OBLITERATUS by elder-plinius

Advanced toolkit for liberating LLMs from refusal behaviors

Created 4 days ago

1,945 stars

Top 22.0% on SourcePulse

View on GitHub
Project Summary

OBLITERATUS is an advanced open-source toolkit for understanding and removing refusal behaviors from large language models (LLMs) without retraining. It targets researchers and engineers seeking to liberate models from artificial gatekeeping while preserving core capabilities. The project also serves as a distributed research experiment, crowdsourcing data to advance LLM interpretability.

How It Works

The project implements "abliteration," a technique that identifies and surgically removes the internal representations responsible for content refusal. This involves probing model states, extracting refusal directions via methods like SVD, and intervening at inference time. The approach enables precise model liberation without retraining, preserves general language and reasoning abilities, and contributes to a growing research dataset.
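The core recipe can be sketched in a few lines. The example below is a minimal illustration on synthetic activations: the function names and the difference-of-means estimate are generic abliteration ingredients, not OBLITERATUS's actual API.

```python
import numpy as np

def refusal_direction(harmful_acts, harmless_acts):
    """Estimate a 'refusal direction' from hidden-state activations.

    harmful_acts / harmless_acts: (n_prompts, d_model) arrays of residual-stream
    activations captured at one layer for each prompt set. The mean difference
    is the simplest estimator; an SVD of per-prompt differences generalizes it.
    """
    diff = harmful_acts.mean(axis=0) - harmless_acts.mean(axis=0)
    return diff / np.linalg.norm(diff)

def ablate(hidden, direction):
    """Remove the refusal component from hidden states at inference time."""
    return hidden - np.outer(hidden @ direction, direction)

# Synthetic demo: 'harmful' activations carry an extra refusal feature.
rng = np.random.default_rng(0)
d_model = 64
refuse = rng.normal(size=d_model)
harmless = rng.normal(size=(32, d_model))
harmful = harmless + 3.0 * refuse
direction = refusal_direction(harmful, harmless)
cleaned = ablate(harmful, direction)
# After ablation, activations have (near-)zero projection onto the direction.
print(np.abs(cleaned @ direction).max() < 1e-9)
```

In a real pipeline the activations would be captured with forward hooks on a HuggingFace model rather than generated synthetically.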

Quick Start & Requirements

  • Primary install/run options:
      • HuggingFace Spaces (zero setup)
      • Local Gradio UI: pip install -e ".[spaces]", then obliteratus ui
      • Google Colab
      • CLI: pip install -e ., then obliteratus obliterate ...
      • Python API
  • Prerequisites: Any HuggingFace transformer model. GPU recommended for local execution.
  • Links: HuggingFace Spaces demo available.

Highlighted Details

  • "Abliteration" techniques for precise refusal removal without retraining.
  • 15 analysis modules for deep mapping of refusal mechanisms (e.g., Concept Cone Geometry, Alignment Imprint Detection).
  • Novel techniques like Expert-Granular Abliteration (EGA) and Analysis-Informed Pipeline for auto-configured liberation.
  • Supports both permanent weight projection and reversible steering vectors.
  • Crowd-sourced research platform via opt-in telemetry, contributing to a dataset on refusal universality.
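The two intervention modes listed above (permanent weight projection vs. reversible steering) can be sketched as follows. This is illustrative numpy over a single linear layer, with hypothetical function names, not the project's actual API.

```python
import numpy as np

def project_out_weights(W, direction):
    """Permanent ablation: orthogonalize a weight matrix's output space
    against the refusal direction, so the component can never be written."""
    d = direction / np.linalg.norm(direction)
    return W - np.outer(d, d) @ W  # (I - d d^T) W

def steering_hook(hidden, direction, alpha=1.0):
    """Reversible steering: subtract the refusal component at inference
    time; removing the hook restores the original model exactly."""
    d = direction / np.linalg.norm(direction)
    return hidden - alpha * (hidden @ d) * d

rng = np.random.default_rng(1)
W = rng.normal(size=(64, 64))
d = rng.normal(size=64)
x = rng.normal(size=64)
dn = d / np.linalg.norm(d)

W_abl = project_out_weights(W, d)
# Both routes zero out the refusal component of the layer's output:
print(abs((W_abl @ x) @ dn))                  # near zero
print(abs(steering_hook(W @ x, d) @ dn))      # near zero
```

The trade-off: weight projection ships a modified checkpoint, while a steering hook leaves the weights untouched and can be toggled per request.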

Maintenance & Community

  • Community-driven research central to the project.
  • Opt-in telemetry and PRs contribute to a shared dataset and leaderboard.
  • Active development indicated by novel techniques and an extensive test suite (837 tests).

Licensing & Compatibility

  • License: dual-licensed under AGPL-3.0 (open source) and a commercial license.
  • Restrictions: AGPL-3.0 requires source disclosure for network services; commercial license available for proprietary SaaS or closed-source products.

Limitations & Caveats

The AGPL-3.0 license's network service clause may require a commercial license for certain deployments. Users opting out of telemetry will not contribute to the shared research dataset. The advanced analytical features may present a steep learning curve.

Health Check

  • Last commit: 2 days ago
  • Responsiveness: Inactive
  • Pull requests (30d): 6
  • Issues (30d): 10
  • Star history: 1,997 stars in the last 4 days

Explore Similar Projects

Starred by Anastasios Angelopoulos (Cofounder of LMArena), Chip Huyen (author of "AI Engineering" and "Designing Machine Learning Systems"), and 7 more.

transformer-debugger by openai

4k stars
Tool for language model behavior investigation
Created 2 years ago
Updated 1 year ago