p-e-w
Automatic LLM censorship removal
Top 11.8% on SourcePulse
Heretic provides a fully automatic solution for removing censorship ("safety alignment") from transformer-based language models. It targets engineers and researchers who need to adapt LLMs for broader applications without the need for expensive post-training, offering a way to decensor models while preserving their original intelligence and capabilities.
How It Works
Heretic combines an advanced implementation of directional ablation ("abliteration") with a TPE-based parameter optimizer from Optuna. The tool automatically searches for ablation parameters that jointly minimize two quantities: the number of model refusals and the KL divergence from the original model. This co-minimization aims to keep the decensored model as close as possible to the original in intelligence and capability. The process requires no specialized knowledge of transformer internals, so anyone comfortable with the command line can run it.
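The two quantities being traded off can be illustrated on a toy example. The sketch below is not Heretic's code: it applies the textbook form of directional ablation, W' = W - w * r r^T W for a unit direction r, to a small hand-written matrix, and measures the KL divergence between the softmax outputs before and after. The matrix, the input, the "refusal direction", and the single ablation weight w are all hypothetical, and a plain sweep over w stands in for the Optuna TPE search.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]

def matvec(W, x):
    return [dot(row, x) for row in W]

def ablate(W, direction, weight=1.0):
    """Return W - weight * r r^T W, where r is the unit-normalized direction.

    With weight=1.0 the output of the new matrix has no component along r.
    """
    r = normalize(direction)
    rows, cols = len(W), len(W[0])
    # r^T W: how strongly each input column writes into direction r.
    r_t_w = [sum(r[i] * W[i][j] for i in range(rows)) for j in range(cols)]
    return [[W[i][j] - weight * r[i] * r_t_w[j] for j in range(cols)]
            for i in range(rows)]

def softmax(z):
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions with q > 0 everywhere."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical toy data: a 3x3 weight matrix, one input, one "refusal direction".
W = [[1, 2, 0],
     [0, 1, 3],
     [2, 0, 1]]
x = [1, 1, 1]
direction = [1, 0, 0]

original_out = matvec(W, x)
ablated_out = matvec(ablate(W, direction), x)

# Plain sweep over the ablation weight (stand-in for a TPE search):
# a larger weight removes more of the direction but drifts further from the original.
for weight in (0.0, 0.5, 1.0):
    out = matvec(ablate(W, direction, weight), x)
    residual = abs(dot(normalize(direction), out))
    drift = kl_divergence(softmax(original_out), softmax(out))
    print(f"weight={weight:.1f}  residual={residual:.3f}  kl={drift:.4f}")
```

In this toy setting the residual along the direction plays the role of the refusal count, and the KL term plays the role of the divergence from the original model. An optimizer would look for settings that keep both low.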
Quick Start & Requirements
Install: pip install heretic-llm
Run: heretic <model_name_or_path> from the command line.
Highlighted Details
Maintenance & Community
The provided README does not contain specific details regarding notable contributors, sponsorships, community channels (e.g., Discord, Slack), or a public roadmap.
Licensing & Compatibility
Limitations & Caveats
Heretic does not yet support models based on State Space Models (SSMs), hybrid architectures, models with inhomogeneous layers, or certain novel attention systems.