kitft: Explain LLM activations with natural language autoencoders
Summary
This repository provides Natural Language Autoencoders (NLA), an open-source library for generating unsupervised explanations of LLM activations. By mapping activation vectors to natural language and back, NLA offers researchers and engineers a tool for understanding internal LLM mechanisms and the semantic content captured by model activations.
How It Works
An NLA has two parts. The Activation Verbalizer (AV) maps an activation vector to text, and the Activation Reconstructor (AR) maps that text back to a vector. The AV injects the activation vector as a token embedding into a prompt and autoregressively generates a description. The AR, a truncated LM, recovers the vector from the description. Activation vectors are L2-normalized, so the round-trip Mean Squared Error (MSE) between the original activation and its reconstruction measures directional agreement, and this serves as the measure of explanation quality.
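The round-trip metric and the embedding-injection step can be sketched in a few lines of NumPy. This is a minimal illustration, not the repository's code. The helpers inject_activation and round_trip_mse are hypothetical, and the placeholder position and toy dimensions are assumptions. The source also does not say whether the MSE is averaged per element or summed per vector; the two differ only by a constant factor of d.

```python
import numpy as np


def l2_normalize(v: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Scale vectors along the last axis to unit L2 norm (eps avoids division by zero)."""
    norms = np.linalg.norm(v, axis=-1, keepdims=True)
    return v / np.maximum(norms, eps)


def inject_activation(prompt_embeds: np.ndarray, placeholder_pos: int,
                      activation: np.ndarray) -> np.ndarray:
    """Hypothetical AV input step: copy the prompt's token embeddings
    (shape [seq_len, d_model]) and overwrite the placeholder row with the
    activation vector. Any rescaling the real AV applies is not modeled."""
    if activation.shape != prompt_embeds.shape[-1:]:
        raise ValueError("activation must have shape (d_model,)")
    out = prompt_embeds.copy()
    out[placeholder_pos] = activation
    return out


def round_trip_mse(original: np.ndarray, reconstructed: np.ndarray) -> float:
    """MSE over all elements after L2-normalizing both inputs.

    For unit vectors a, b in R^d: mean((a - b)**2) = (2 - 2*cos(a, b)) / d,
    so the score depends only on direction, not on magnitude."""
    a = l2_normalize(original)
    b = l2_normalize(reconstructed)
    return float(np.mean((a - b) ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(4, 16))          # 4 toy "activations", d_model = 16
    print(round_trip_mse(x, 3.0 * x))     # same direction, different scale: ~0.0
    print(round_trip_mse(x, -x))          # opposite direction: (2 + 2) / 16 ~ 0.25
    prompt = np.zeros((5, 16))            # toy prompt embeddings, 5 tokens
    injected = inject_activation(prompt, 2, l2_normalize(x[0]))
    print(np.allclose(injected[2], l2_normalize(x[0])))  # True
```

Both vectors are normalized before comparison. A reconstruction that points in the right direction but has the wrong magnitude therefore still scores near zero, which is what "directional agreement" means here.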
Quick Start & Requirements
For inference, install the dependencies:
torch transformers safetensors httpx orjson pyyaml numpy "sglang[all]>=0.5.6"
Then launch an SGLang server in the background:
python -m sglang.launch_server --model-path <model_path> --port 30000 --disable-radix-cache &
Once the server is up, run inference:
python nla_inference.py <model_path> --sglang-url http://localhost:30000 --parquet path/to/activations.parquet
Training requires substantial GPU resources (e.g., multiple H100s) and involves data generation, SFT, and RL stages, detailed in configs/TRAINING_NOTES.md. Inference documentation is in docs/inference.md.
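The quick-start launches the SGLang server in the background with a trailing & and then calls the inference script. That call can fail if the model is still loading. The minimal sketch below waits until the server answers, using only the Python standard library. Several parts of it are assumptions rather than part of the NLA repository: the /health path, the timeouts, and the helper names wait_for_server and http_ok.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_ok(url: str, timeout: float = 2.0) -> bool:
    """Return True if GET url answers with HTTP 200, False on any connection or HTTP error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False


def wait_for_server(base_url: str = "http://localhost:30000",
                    timeout_s: float = 600.0, poll_s: float = 5.0,
                    probe: Callable[[str], bool] = http_ok) -> bool:
    """Poll base_url + '/health' (assumed endpoint) until probe succeeds.

    Returns True once the server responds, or False after timeout_s seconds.
    The probe parameter lets callers swap in a different readiness check."""
    deadline = time.monotonic() + timeout_s
    url = base_url.rstrip("/") + "/health"
    while True:
        if probe(url):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)
```

For example, call wait_for_server() after starting the server, and run nla_inference.py only once it returns True. Polling keeps the launch script simple and avoids guessing how long model loading takes on a given GPU.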
Highlighted Details
…input_embeds.
Maintenance & Community
The project lists multiple authors in its academic citation. No specific community channels (e.g., Discord, Slack) or roadmap links are provided in the README.
Licensing & Compatibility
The core library is licensed under Apache-2.0, which permits commercial use. Released checkpoints inherit the licenses of their base models (Gemma, Llama-3.3), which may impose additional restrictions. Users must consult the base models' NOTICE files.
Limitations & Caveats
Reproducing the checkpoints demands significant compute (e.g., 8x H100s for RL). Inference depends on a specific SGLang setup (e.g., launching the server with --disable-radix-cache). Users must verify the base LLMs' license terms before commercial deployment.