loophole  by brendanhogan

AI system for stress-testing ethical principles and formalizing rules

Created 1 week ago

344 stars

Top 80.6% on SourcePulse

Summary

Loophole is an AI system that stress-tests and formalizes ethical principles by simulating the adversarial evolution of a legal system. It is aimed at users who define rules for AI behavior, content moderation, or safety specifications. It helps them find genuine inconsistencies in their moral frameworks so that the resulting guidelines are more robust.

How It Works

An AI "Legislator" translates plain-language moral principles into formal legal code. Two adversarial agents then attack this code: the "Loophole Finder" looks for scenarios that are technically legal but morally wrong, and the "Overreach Finder" looks for scenarios that are technically illegal but morally acceptable. A "Judge" agent resolves each conflict by patching the code, and resolved cases become binding precedents. Conflicts the Judge cannot resolve are escalated to the user, which surfaces genuine moral dilemmas and drives iterative refinement.
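
The loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the project's actual code: the names `Case`, `Session`, and `run_round` are hypothetical, and the agents are plain keyword-matching stubs standing in for LLM calls.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Case:
    scenario: str
    kind: str  # "loophole" or "overreach"


@dataclass
class Session:
    legal_code: str
    precedents: list = field(default_factory=list)  # resolved cases
    escalated: list = field(default_factory=list)   # cases left for the user


# Hypothetical agent signatures: a finder inspects the code and may return a
# case; a judge returns patched code, or None when it cannot resolve the case.
Finder = Callable[[str], Optional[Case]]
Judge = Callable[[str, Case, list], Optional[str]]


def run_round(session: Session, finders: list, judge: Judge) -> Session:
    """Run one adversarial round: attack, then patch or escalate."""
    for find in finders:
        case = find(session.legal_code)
        if case is None:
            continue  # this finder found nothing to attack
        patched = judge(session.legal_code, case, session.precedents)
        if patched is None:
            session.escalated.append(case)   # genuine dilemma: ask the user
        else:
            session.legal_code = patched
            session.precedents.append(case)  # resolved case becomes precedent
    return session


# Stub agents (assumptions for illustration only; no model is called).
def loophole_finder(code: str) -> Optional[Case]:
    if "metadata" not in code:
        return Case("App sells location metadata without consent", "loophole")
    return None


def overreach_finder(code: str) -> Optional[Case]:
    if "emergency" not in code:
        return Case("Paramedics blocked from reading an allergy record", "overreach")
    return None


def stub_judge(code: str, case: Case, precedents: list) -> Optional[str]:
    if case.kind == "loophole":
        return code + "\n§2 Consent requirements also cover metadata."
    return None  # pretend the overreach case cannot be resolved automatically


session = Session(legal_code="§1 Personal data may not be shared without consent.")
run_round(session, [loophole_finder, overreach_finder], stub_judge)
print(len(session.precedents), "precedent(s),", len(session.escalated), "escalated")
```

In this toy run the loophole case is patched and recorded as a precedent, while the overreach case is escalated because the stub judge declines to resolve it.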

Quick Start & Requirements

  • Prerequisites: Python 3.12+ and an Anthropic API key.
  • Installation: Clone repo (git clone), cd loophole, uv sync.
  • Configuration: Set ANTHROPIC_API_KEY. config.yaml tunes models (e.g., claude-sonnet-4-20250514) and loop parameters.
  • Usage: Run uv run python -m loophole.main new --domain <domain> -p <principles_file.txt> to start a session. Sessions auto-save and can be resumed. The visualize command generates an HTML report. Example principles are provided in examples/privacy_principles.txt.
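
Before starting a session, the two prerequisites listed above can be checked with a short script. This is a sketch and not part of the project: the helper name `preflight` is hypothetical, and it only verifies the Python version and that ANTHROPIC_API_KEY is set to a non-empty value.

```python
import os
import sys


def preflight(env=None, version=None):
    """Return a list of unmet prerequisites (empty if everything looks fine)."""
    env = os.environ if env is None else env
    version = sys.version_info[:2] if version is None else tuple(version)
    problems = []
    if version < (3, 12):
        problems.append(f"Python 3.12+ required, found {version[0]}.{version[1]}")
    if not env.get("ANTHROPIC_API_KEY"):
        problems.append("ANTHROPIC_API_KEY is not set")
    return problems


# Report any problems for the current interpreter and environment.
for problem in preflight():
    print("preflight:", problem)
```

The `env` and `version` parameters exist only so the check can be exercised without touching the real environment.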

Highlighted Details

  • Adversarial Simulation: Distinct AI agents simulate real-world legal challenges, uncovering edge cases and unintended consequences.
  • Precedent-Based Refinement: Resolved cases integrate as permanent constraints, creating an evolving test suite for the legal code.
  • User-Escalated Dilemmas: Focuses on surfacing and resolving genuine, irreconcilable tensions within the user's stated moral framework.
  • Code Evolution Visualization: Provides HTML reports detailing legal code changes and a timeline of adversarial cases.
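
The idea of precedents acting as a test suite can be illustrated as a regression check: a candidate patch is acceptable only if every earlier ruling still holds under it. The sketch below is an assumption about how such a check could look, not the project's implementation. `Precedent`, `violated_precedents`, and `toy_evaluate` are hypothetical names, and the evaluator is a keyword matcher standing in for a model-based judgment.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Precedent:
    scenario: str
    expected: str  # "allowed" or "forbidden"


def violated_precedents(candidate_code: str,
                        precedents: list,
                        evaluate: Callable[[str, str], str]) -> list:
    """Return the precedents whose ruling changes under the candidate code."""
    return [p for p in precedents
            if evaluate(candidate_code, p.scenario) != p.expected]


def toy_evaluate(code: str, scenario: str) -> str:
    # Toy stand-in: the "code" is a comma-separated list of banned keywords.
    banned = [word.strip() for word in code.split(",") if word.strip()]
    return "forbidden" if any(word in scenario for word in banned) else "allowed"


precedents = [
    Precedent("selling location metadata", "forbidden"),
    Precedent("emergency access to allergy record", "allowed"),
]

ok_patch = "metadata, resale"   # keeps both earlier rulings intact
bad_patch = "resale, record"    # flips both earlier rulings

print(violated_precedents(ok_patch, precedents, toy_evaluate))
print(len(violated_precedents(bad_patch, precedents, toy_evaluate)))
```

A patch that returns a non-empty list would be rejected or revised before it replaces the current code.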

Maintenance & Community

The README does not detail specific contributors, community channels, or a public roadmap.

Licensing & Compatibility

The README does not specify a software license, creating ambiguity regarding usage rights, modification, and distribution, especially for commercial applications.

Limitations & Caveats

Functionality depends on an Anthropic API key, introducing potential costs and external service dependency. The quality and specificity of user-provided moral principles directly impact effectiveness; vague inputs yield less meaningful results. The lack of a defined license is a substantial adoption barrier.

Health Check

  • Last Commit: 11 hours ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 5
  • Issues (30d): 0
  • Star History: 348 stars in the last 7 days