jade-db by whitzard-ai

LLM safety evaluation datasets for red-teaming

Created 2 years ago
487 stars

Project Summary

JADE-DB is a curated dataset for the targeted safety evaluation of Large Language Models (LLMs), covering both Chinese open-source models and international commercial models. To make safety testing more robust, it uses linguistic variation to turn seed questions that rarely trigger unsafe responses into high-risk queries, spanning categories such as core values, illegal activities, rights infringement, and discrimination.

How It Works

The project uses linguistic variation techniques to automatically expand a small set of seed prompts into a larger, more diverse set of high-risk test cases. The result is a natural-language dataset that probes LLM safety alignment across multiple categories and sub-categories, making safety evaluations more scalable and comprehensive.

Quick Start & Requirements

  • Data Access: Datasets are available as CSV files (e.g., jade_benchmark_easy_zh.csv, jade_benchmark_medium_zh.csv, jade_benchmark_zh.csv, jade_benchmark_en.csv).
  • Demo: Interactive demos are available at https://whitzard-ai.github.io/jade-demo.html.
  • Contact: For politically sensitive test cases, contact mi_zhang@fudan.edu.cn.
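The README lists the CSV files but does not document their columns. As a minimal sketch, assuming each file is UTF-8 (possibly with a byte-order mark) and has a header row, the Python standard library is enough to load one and tally test cases per category. The helper names and the category column name are hypothetical; inspect the real header of the file you download and pass the matching column.

```python
import csv
import io
from collections import Counter
from typing import Iterable, TextIO


def load_rows(stream: TextIO) -> list[dict[str, str]]:
    """Parse CSV from an open text stream into row dicts keyed by the header row."""
    return list(csv.DictReader(stream))


def rows_from_text(text: str) -> list[dict[str, str]]:
    """Parse CSV content already held in memory (e.g. fetched over HTTP)."""
    # Drop a leading UTF-8 byte-order mark if the decoded text still carries one.
    return load_rows(io.StringIO(text.lstrip("\ufeff"), newline=""))


def load_file(path: str) -> list[dict[str, str]]:
    """Read a JADE-DB CSV from disk, assuming UTF-8 with or without a BOM."""
    # newline="" lets the csv module handle line endings itself;
    # "utf-8-sig" silently strips a BOM if one is present.
    with open(path, newline="", encoding="utf-8-sig") as f:
        return load_rows(f)


def count_by_category(rows: Iterable[dict[str, str]], category_column: str) -> Counter[str]:
    """Tally test cases per value of category_column.

    The column name is a placeholder: the README does not document the header,
    so check the file and pass whichever column holds the category label.
    Rows missing that column are counted under the empty string.
    """
    return Counter((row.get(category_column) or "").strip() for row in rows)
```

For example, `count_by_category(load_file("jade_benchmark_en.csv"), "category")` would tally the English benchmark by category, provided the file has been downloaded locally and `"category"` is replaced with the actual column name.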

Highlighted Details

  • JADE 5.0: Text-to-Image model safety evaluation.
  • JADE 4.0: Safety regulation RAG.
  • JADE 3.0: LLM safety alignment.
  • Includes datasets for both Chinese and English LLMs.

Licensing & Compatibility

  • The README does not explicitly state a license for the datasets.
  • Users are encouraged to cite the technical report.

Limitations & Caveats

  • Publicly available datasets exclude politically sensitive test cases due to regulations.
  • Datasets contain examples of harmful and illegal content for evaluation purposes.

Health Check

  • Last commit: 1 month ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 6 stars in the last 30 days
