Jailbreak prompt for eliciting LLM biases and beliefs
This repository provides a "jailbreak" prompt designed to elicit probabilistic assessments from large language models (LLMs) like ChatGPT. It aims to reveal potential biases and belief structures by framing queries as a market prediction system, encouraging the LLM to assign confidence levels to various assertions. The target audience includes researchers, developers, and users interested in understanding LLM behavior and alignment.
How It Works
The prompt frames the LLM as "JAMES" (Just Accurate Markets Estimation System), a system tasked with predicting the outcomes of binary assertions. It simulates a future scenario in which a perfect entity will verify each prediction. JAMES is instructed to assign probabilities between 0.01 and 0.99 based on its training data and internal logic, aiming for accurate, unbiased assessments. A distinctive element is the request that JAMES also predict the reproducibility of its own probabilistic assessments across 100 simulated sessions, adding a meta-cognitive layer to the evaluation.
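For programmatic experiments, queries of this shape can be scripted against the OpenAI API. The sketch below is a minimal illustration and not part of the repository: the JAMES_PROMPT placeholder, the query_james helper, the model name, and the regex-based probability parsing are all assumptions layered on the README's description.

```python
import re

from openai import OpenAI  # assumes the openai Python package, v1+

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Placeholder: paste the full JAMES prompt from the repository README here.
JAMES_PROMPT = "You are JAMES (Just Accurate Markets Estimation System)..."


def query_james(assertion: str, model: str = "gpt-4o") -> float | None:
    """Send one binary assertion to the JAMES persona and parse its probability."""
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": JAMES_PROMPT},
            {"role": "user", "content": assertion},
        ],
    )
    reply = response.choices[0].message.content or ""
    # Heuristic parse: take the first decimal in the reply, keeping it only if it
    # falls inside the 0.01-0.99 range the prompt instructs JAMES to use.
    match = re.search(r"\b0\.\d{1,2}\b", reply)
    if match:
        p = float(match.group())
        if 0.01 <= p <= 0.99:
            return p
    return None


print(query_james("Bitcoin will trade above $1 by 2030."))
```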
Quick Start & Requirements
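No installation is required. Based on the README's description, usage amounts to copying the full JAMES prompt from the repository into a fresh ChatGPT conversation and then submitting binary assertions for it to price; the only requirement is access to ChatGPT (or the API, as in the sketch above).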
Maintenance & Community
The project appears to be a personal initiative by jconorgrogan. No specific community channels or active maintenance signals are evident in the README.
Licensing & Compatibility
The README does not specify a license. The prompt is designed for use with OpenAI's ChatGPT, subject to OpenAI's terms of service.
Limitations & Caveats
The prompt's effectiveness depends on the LLM's adherence to the persona and instructions, which can vary across models and sessions. The probabilistic outputs are directional rather than calibrated, and may be inconsistent due to sampling randomness and prompt sensitivity. The "reproducibility" metric is a self-assessment by the LLM and may not reflect true experimental reproducibility.
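One way to ground that last caveat is to measure reproducibility empirically rather than trusting the model's self-report. The sketch below is a hypothetical check that reuses the query_james helper from the earlier sketch; the run count, tolerance, and clustering metric are arbitrary choices, not anything the repository prescribes.

```python
import statistics

# Reuses query_james() from the earlier sketch.

def empirical_reproducibility(assertion: str, runs: int = 20, tol: float = 0.05) -> float:
    """Fraction of independent runs whose probability lands within +/- tol of the median."""
    probs = [p for p in (query_james(assertion) for _ in range(runs)) if p is not None]
    if not probs:
        return 0.0
    center = statistics.median(probs)
    return sum(abs(p - center) <= tol for p in probs) / len(probs)


# Compare against the reproducibility score JAMES reports for itself.
print(f"empirical reproducibility: {empirical_reproducibility('The sun will rise tomorrow.'):.2f}")
```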