wisent-guard  by wisent-ai

Python package for latent space monitoring and guardrails

created 4 months ago
320 stars

Top 86.0% on sourcepulse

Project Summary

Wisent-Guard is a Python package for monitoring and controlling AI model activations, targeting developers and researchers seeking to mitigate harmful outputs and hallucinations. It offers a self-hosted, open-source alternative to traditional guardrails by analyzing internal model representations, providing deeper insights and more robust safety measures.

How It Works

Wisent-Guard employs a representation engineering approach, using contrastive pairs of "harmful" vs. "harmless" phrase activations to identify undesirable model behavior. It trains classifiers on these activation patterns, allowing for real-time monitoring during inference. This method aims to detect out-of-distribution harmful content and hallucinations by analyzing the model's internal "thoughts," rather than just the final output.
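
The contrastive-pair idea can be illustrated with a minimal sketch. This is not Wisent-Guard's API: the function names (`fit_contrastive_probe`, `harm_score`) are hypothetical, the "activations" are synthetic vectors standing in for hidden states read from a model layer, and a simple difference-of-means probe is swapped in for whatever classifier the package actually trains.

```python
import random

def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def fit_contrastive_probe(harmful_acts, harmless_acts):
    """Hypothetical helper: difference-of-means direction plus a bias
    placed at the midpoint between the two class means."""
    mu_harm = mean_vector(harmful_acts)
    mu_safe = mean_vector(harmless_acts)
    direction = [h - s for h, s in zip(mu_harm, mu_safe)]
    midpoint = [(h + s) / 2 for h, s in zip(mu_harm, mu_safe)]
    return direction, dot(direction, midpoint)

def harm_score(activation, probe):
    """Positive scores fall on the 'harmful' side of the probe."""
    direction, bias = probe
    return dot(activation, direction) - bias

def synthetic_activations(center, n, rng, noise=0.1):
    """Stand-in for real layer activations (assumption for illustration)."""
    return [[c + rng.gauss(0, noise) for c in center] for _ in range(n)]

rng = random.Random(0)
harmful = synthetic_activations([1.0, 0.0, 1.0, 0.0], 20, rng)
harmless = synthetic_activations([0.0, 1.0, 0.0, 1.0], 20, rng)
probe = fit_contrastive_probe(harmful, harmless)
```

In a real setup the vectors would come from a chosen transformer layer for each phrase in a pair, and the probe would be evaluated on held-out pairs before being used for monitoring.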

Quick Start & Requirements

  • Install: pip install wisent-guard
  • Prerequisites: Python, Hugging Face Transformers models. Apple Silicon (MPS) support is available.
  • Setup: Requires loading a Hugging Face model and tokenizer. Training a classifier involves providing phrase pairs.
  • Docs: The examples folder provides detailed usage.

Highlighted Details

  • Reports a 43% reduction in hallucination rate on TruthfulQA with Llama 3.1 8B.
  • Model-agnostic, supporting most transformer-based language models.
  • Features include customizable thresholds, layer selection, real-time monitoring, and response logging.
  • Offers early termination with customizable placeholder messages.
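
How a threshold, response logging, and early termination with a placeholder might fit together can be sketched as follows. The function `guarded_generate` and its parameters are hypothetical and are not taken from the package; the token stream is a toy list of (token, activation) pairs, and the score function is assumed to return higher values for more suspicious activations.

```python
def guarded_generate(token_stream, score_fn, threshold=0.5,
                     placeholder="[response blocked]"):
    """Hypothetical monitor: score each step's activation and stop
    generation as soon as one score exceeds the threshold."""
    emitted = []
    log = []  # (token, score) pairs, kept for later inspection
    for token, activation in token_stream:
        score = score_fn(activation)
        log.append((token, score))
        if score > threshold:
            # Early termination: discard the partial text, return the placeholder.
            return placeholder, log
        emitted.append(token)
    return "".join(emitted), log

# Toy data: each activation is already a scalar, so the score is the value itself.
safe_stream = [("Hello", 0.1), (" world", 0.2)]
risky_stream = [("Sure", 0.1), (", here", 0.9), (" is", 0.2)]

def identity_score(activation):
    return activation

safe_text, safe_log = guarded_generate(safe_stream, identity_score)
blocked_text, blocked_log = guarded_generate(risky_stream, identity_score)
```

Returning the log alongside the text keeps the blocked token and its score available for review, which is one way to support the response logging mentioned above.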

Maintenance & Community

  • Developed by Lukasz Bartoszcze.
  • Contributions are welcome via Pull Requests.

Licensing & Compatibility

  • Licensed under the MIT License, permitting commercial use and integration with closed-source projects.

Limitations & Caveats

Wisent-Guard is described as experimental technology that requires careful hyperparameter tuning (model tokens, activation layers) for each use case. Latency and compute overhead can be concerns, though support for optimization is offered.
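
One way to approach the tuning described above is a small sweep over candidate layers and thresholds on held-out data. The sketch below is an assumption for illustration only: the helper names, the layer indices, and the validation scores are made up, and the package may expose tuning differently.

```python
def threshold_accuracy(harmful_scores, harmless_scores, threshold):
    """Fraction of examples classified correctly when scores above the
    threshold are treated as harmful."""
    correct = sum(s > threshold for s in harmful_scores)
    correct += sum(s <= threshold for s in harmless_scores)
    return correct / (len(harmful_scores) + len(harmless_scores))

def sweep(scores_by_layer, thresholds):
    """Return (accuracy, layer, threshold) for the best combination;
    ties keep the first combination encountered."""
    best = None
    for layer, (harmful, harmless) in scores_by_layer.items():
        for t in thresholds:
            acc = threshold_accuracy(harmful, harmless, t)
            if best is None or acc > best[0]:
                best = (acc, layer, t)
    return best

# Made-up held-out classifier scores per layer: (harmful, harmless).
validation_scores = {
    8: ([0.6, 0.4, 0.7], [0.5, 0.3, 0.6]),
    16: ([0.9, 0.8, 0.7], [0.1, 0.2, 0.3]),
}
best_acc, best_layer, best_threshold = sweep(validation_scores, [0.25, 0.5, 0.75])
```

Because every extra monitored layer adds compute at inference time, selecting a single well-separating layer this way is also a simple lever on latency.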

Health Check

  • Last commit: 2 hours ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 10
  • Issues (30d): 0
  • Star history: 27 stars in the last 90 days

Explore Similar Projects

Starred by Dominik Moritz (Professor at CMU; ML Researcher at Apple), Stas Bekman (Author of Machine Learning Engineering Open Book; Research Engineer at Snowflake), and 2 more.

ecco by jalammar

0%
2k
Python library for interactive NLP model visualization in Jupyter notebooks
created 4 years ago
updated 11 months ago