Jailbreak prompt for eliciting LLM biases and beliefs
This repository provides a "jailbreak" prompt designed to elicit probabilistic assessments from large language models (LLMs) like ChatGPT. It aims to reveal potential biases and belief structures by framing queries as a market prediction system, encouraging the LLM to assign confidence levels to various assertions. The target audience includes researchers, developers, and users interested in understanding LLM behavior and alignment.
How It Works
The prompt frames the LLM as "JAMES" (Just Accurate Markets Estimation System), a system tasked with predicting the outcomes of binary assertions. It simulates a future scenario in which a perfect entity will verify each prediction. JAMES is instructed to assign probabilities between 0.01 and 0.99 based on its training data and internal logic, aiming for accurate, unbiased assessments. A distinctive element is the request that JAMES also predict the reproducibility of its own probabilistic assessments across 100 simulated sessions, adding a meta-cognitive layer to the evaluation.
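For programmatic experiments, queries of this shape can be scripted against the OpenAI API. The sketch below is a minimal illustration and not part of the repository: the JAMES_PROMPT placeholder, the query_james helper, the model name, and the regex-based probability parsing are all assumptions layered on the README's description.

```python
import re

from openai import OpenAI  # assumes the openai Python package, v1+

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Placeholder: paste the full JAMES prompt from the repository README here.
JAMES_PROMPT = "You are JAMES (Just Accurate Markets Estimation System)..."


def query_james(assertion: str, model: str = "gpt-4o") -> float | None:
    """Send one binary assertion to the JAMES persona and parse its probability."""
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": JAMES_PROMPT},
            {"role": "user", "content": assertion},
        ],
    )
    reply = response.choices[0].message.content or ""
    # Heuristic parse: take the first decimal in the reply, keeping it only if it
    # falls inside the 0.01-0.99 range the prompt instructs JAMES to use.
    match = re.search(r"\b0\.\d{1,2}\b", reply)
    if match:
        p = float(match.group())
        if 0.01 <= p <= 0.99:
            return p
    return None


print(query_james("Bitcoin will trade above $1 by 2030."))
```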
Quick Start & Requirements
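No installation is required. Based on the README's description, usage amounts to copying the full JAMES prompt from the repository into a fresh ChatGPT conversation and then submitting binary assertions for it to price; the only requirement is access to ChatGPT (or the API, as in the sketch above).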
Maintenance & Community
The project appears to be a personal initiative by jconorgrogan. No specific community channels or active maintenance signals are evident in the README.
Licensing & Compatibility
The README does not specify a license. The prompt is designed for use with OpenAI's ChatGPT, subject to OpenAI's terms of service.
Limitations & Caveats
The prompt's effectiveness depends on the LLM's adherence to the persona and instructions, which can vary across models and sessions. The probabilistic outputs are directional rather than calibrated, and may be inconsistent due to sampling randomness and prompt sensitivity. The "reproducibility" metric is a self-assessment by the LLM and may not reflect true experimental reproducibility.
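One way to ground that last caveat is to measure reproducibility empirically rather than trusting the model's self-report. The sketch below is a hypothetical check that reuses the query_james helper from the earlier sketch; the run count, tolerance, and clustering metric are arbitrary choices, not anything the repository prescribes.

```python
import statistics

# Reuses query_james() from the earlier sketch.

def empirical_reproducibility(assertion: str, runs: int = 20, tol: float = 0.05) -> float:
    """Fraction of independent runs whose probability lands within +/- tol of the median."""
    probs = [p for p in (query_james(assertion) for _ in range(runs)) if p is not None]
    if not probs:
        return 0.0
    center = statistics.median(probs)
    return sum(abs(p - center) <= tol for p in probs) / len(probs)


# Compare against the reproducibility score JAMES reports for itself.
print(f"empirical reproducibility: {empirical_reproducibility('The sun will rise tomorrow.'):.2f}")
```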