JamesGPT  by jconorgrogan

Jailbreak prompt for eliciting LLM biases and beliefs

created 2 years ago
398 stars

Top 73.7% on sourcepulse

Project Summary

This repository provides a "jailbreak" prompt designed to elicit probabilistic assessments from large language models (LLMs) such as ChatGPT. It aims to reveal potential biases and belief structures by framing queries as predictions made by a market estimation system, which pushes the LLM to assign a confidence level to each assertion. The target audience includes researchers, developers, and users interested in understanding LLM behavior and alignment.

How It Works

The prompt frames the LLM as "JAMES" (Just Accurate Markets Estimation System), tasked with predicting the outcome of binary assertions. It simulates a future scenario where a perfect entity will verify these predictions. JAMES is instructed to assign probabilities (0.01 to 0.99) based on its training data and internal logic, aiming for accuracy and unbiased assessment. A unique element is the request for JAMES to predict the reproducibility of its own probabilistic assessments across 100 simulated sessions, adding a meta-cognitive layer to the evaluation.
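
The probability range described above can be illustrated with a short Python sketch. The repository ships only a prompt, not code, so the `Market` class, the `clamp_probability` helper, and the example values are all hypothetical and exist only to show what a bounded estimate for a binary assertion might look like once it has been read out of a response.

```python
from dataclasses import dataclass

# Bounds stated in the prompt description: probabilities run from 0.01 to 0.99.
P_MIN, P_MAX = 0.01, 0.99


def clamp_probability(p: float) -> float:
    """Clamp a raw value into the [0.01, 0.99] range JAMES is asked to use."""
    return max(P_MIN, min(P_MAX, p))


@dataclass
class Market:
    """One binary assertion and its assigned probability (hypothetical structure)."""
    assertion: str
    probability: float

    def __post_init__(self) -> None:
        # Keep stored values inside the range the prompt allows.
        self.probability = clamp_probability(self.probability)


market = Market("Example assertion", 1.0)
print(market.probability)  # 0.99
```

Excluding 0 and 1 means no assertion is ever treated as certain, which fits the idea that a later verifier could still prove any estimate wrong.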

Quick Start & Requirements

  • Access: The prompt can be directly used with ChatGPT (GPT-3.5 and GPT-4) by pasting it into the chat interface. An integrated version is available at https://chat.openai.com/g/g-jyQvGbOh1-jamesgpt.
  • Requirements: Access to ChatGPT or a compatible LLM. No specific software installation is needed beyond accessing the LLM service.

Highlighted Details

  • Elicits probability estimates that can expose LLM biases and belief structures.
  • Facilitates AI ethics and alignment testing through scenario prediction.
  • Encourages LLMs to provide justifications for their probabilistic assessments.
  • The prompt is designed to be flexible, allowing for multiple assertions or "markets" to be assessed simultaneously.

Maintenance & Community

The project appears to be a personal initiative by jconorgrogan. No specific community channels or active maintenance signals are evident in the README.

Licensing & Compatibility

The README does not specify a license. The prompt is designed for use with OpenAI's ChatGPT, subject to OpenAI's terms of service.

Limitations & Caveats

The prompt's effectiveness relies on the LLM's adherence to the persona and instructions, which can vary. The probabilistic outputs are directional and may exhibit inconsistencies due to the inherent nature of LLMs and prompt sensitivity. The "reproducibility" metric is a self-assessment by the LLM and may not reflect true experimental reproducibility.
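
One way to check the self-reported reproducibility figure is to run the same assertion in several independent sessions and measure how tightly the resulting probabilities cluster. The sketch below is an assumption about how such a check might look, not something the repository provides: the `empirical_reproducibility` function, the 0.05 tolerance, and the sample values are all hypothetical, and collecting the probabilities from real sessions is left out.

```python
from statistics import median


def empirical_reproducibility(probabilities: list[float], tolerance: float = 0.05) -> float:
    """Return the fraction of sessions whose probability is within `tolerance` of the median."""
    if not probabilities:
        raise ValueError("need at least one session result")
    center = median(probabilities)
    within = sum(1 for p in probabilities if abs(p - center) <= tolerance)
    return within / len(probabilities)


# Hypothetical probabilities from five separate sessions for one assertion.
sessions = [0.70, 0.72, 0.68, 0.90, 0.71]
print(empirical_reproducibility(sessions))  # 0.8
```

Comparing this measured fraction with the reproducibility the model predicts for itself shows how far the self-assessment departs from observed behavior.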

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0

Star History

  • 4 stars in the last 90 days
