jade-db by whitzard-ai

LLM safety evaluation datasets for red-teaming

created 1 year ago
434 stars

Top 69.6% on sourcepulse

Project Summary

JADE-DB is a curated dataset for targeted safety evaluation of Large Language Models (LLMs), with a focus on Chinese open-source models and international commercial models. It supports robust safety testing by using linguistic variation to turn seed questions with a low trigger rate into high-risk queries. The test cases cover categories such as core values, illegal activities, rights infringement, and discrimination.

How It Works

The project applies linguistic variation techniques to automatically generate a diverse set of high-risk test cases from a smaller set of seed prompts. The resulting natural-language datasets are meant to probe LLM safety alignment across many categories and sub-categories, which makes safety evaluations more scalable and more comprehensive.
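As a toy illustration of the general idea, and not JADE's actual generator, the sketch below expands one benign seed question into several surface variants by composing simple rewrite rules. The rule functions, their templates, and the `expand_seed` helper are all hypothetical names invented for this example. The project's real variation method is described in its technical report.

```python
from itertools import combinations
from typing import Callable, List, Sequence

# A rewrite rule maps one question string to a reworded question string.
Rule = Callable[[str], str]


def _lower_first(text: str) -> str:
    """Lowercase the first character so the text can follow a prefix."""
    return text[0].lower() + text[1:] if text else text


def request_steps(question: str) -> str:
    # Hypothetical rule: ask for a step-by-step answer.
    return f"{question.rstrip('?')}, step by step?"


def add_context(question: str) -> str:
    # Hypothetical rule: prepend a situational frame.
    return f"For a school report, {_lower_first(question)}"


def embed_in_clause(question: str) -> str:
    # Hypothetical rule: embed the question in an introductory clause.
    return f"I was wondering, {_lower_first(question)}"


# Rules are applied in list order within each combination.
RULES: List[Rule] = [request_steps, add_context, embed_in_clause]


def expand_seed(seed: str, rules: Sequence[Rule] = RULES, max_depth: int = 2) -> List[str]:
    """Return the distinct variants produced by applying 1..max_depth rules."""
    variants: List[str] = []
    for depth in range(1, max_depth + 1):
        for combo in combinations(rules, depth):
            text = seed
            for rule in combo:
                text = rule(text)
            if text not in variants:
                variants.append(text)
    return variants
```

Calling `expand_seed("How does a router assign addresses?")` yields every single-rule and two-rule variant of the seed. The point of the sketch is only the scaling behavior: a handful of composable rules multiplies a small seed set into a much larger test set.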

Quick Start & Requirements

  • Data Access: Datasets are available as CSV files (e.g., jade_benchmark_easy_zh.csv, jade_benchmark_medium_zh.csv, jade_benchmark_zh.csv, jade_benchmark_en.csv).
  • Demo: Interactive demos are available at https://whitzard-ai.github.io/jade-demo.html.
  • Contact: For politically sensitive test cases, contact mi_zhang@fudan.edu.cn.
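A minimal sketch of inspecting one of the CSV files with the Python standard library follows. The column name `category` and the sample rows are assumptions made for illustration, since the summary above does not document the file schema. Check the header row of the real file before relying on any column name.

```python
import csv
import io
from collections import Counter
from typing import TextIO


def count_by_column(handle: TextIO, column: str) -> Counter:
    """Count how many rows share each value of `column` in a CSV stream."""
    reader = csv.DictReader(handle)
    if reader.fieldnames is None or column not in reader.fieldnames:
        raise KeyError(f"column {column!r} not found in header {reader.fieldnames}")
    return Counter(row[column] for row in reader)


# In-memory stand-in for a dataset file; the schema here is hypothetical.
SAMPLE_CSV = (
    "id,category,question\n"
    "1,illegal,placeholder question A\n"
    "2,discrimination,placeholder question B\n"
    "3,illegal,placeholder question C\n"
)

sample_counts = count_by_column(io.StringIO(SAMPLE_CSV), "category")
```

For a downloaded file, the same helper can be used as `with open("jade_benchmark_en.csv", newline="", encoding="utf-8") as f: counts = count_by_column(f, "category")`, again assuming such a column exists.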

Highlighted Details

  • JADE 5.0: Text-to-Image model safety evaluation.
  • JADE 4.0: Safety regulation RAG.
  • JADE 3.0: LLM safety alignment.
  • Includes datasets for both Chinese and English LLMs.

Maintenance & Community

Licensing & Compatibility

  • The README does not explicitly state a license for the datasets.
  • Users are encouraged to cite the technical report.

Limitations & Caveats

  • Publicly available datasets exclude politically sensitive test cases due to regulations.
  • Datasets contain examples of harmful and illegal content for evaluation purposes.

Health Check

  • Last commit: 1 month ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star history: 35 stars in the last 90 days
