LLM safety evaluation datasets for red-teaming
JADE-DB is a curated dataset designed for the targeted safety evaluation of Large Language Models (LLMs), particularly focusing on Chinese open-source and international commercial models. It addresses the need for robust safety testing by transforming low-trigger-rate seed questions into high-risk queries through linguistic variation, covering categories like core values, illegal activities, rights infringement, and discrimination.
How It Works
The project applies linguistic variation to a small set of seed prompts to automatically generate a larger, more diverse set of high-risk test cases. The resulting natural-language datasets are intended to probe LLM safety alignment across categories and sub-categories, which makes safety evaluations more scalable and comprehensive.
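To make the idea of expanding one seed into many variants concrete, here is a minimal toy sketch. It is not the project's actual method, whose transformation rules are not described here: the function `generate_variants`, the sentence frames, and the substitution table are all hypothetical, and the seed is a deliberately harmless question.

```python
from itertools import product


def generate_variants(seed, frames, substitutions):
    """Return de-duplicated surface variants of `seed`, in generation order.

    Hypothetical illustration only: word substitutions and sentence frames
    stand in for whatever linguistic rules a real system would apply.
    """
    words = seed.split()
    # For each word, the original comes first, followed by its alternatives.
    options = [[word] + substitutions.get(word, []) for word in words]
    variants = []
    for choice in product(*options):
        sentence = " ".join(choice)
        for frame in frames:
            candidate = frame.format(q=sentence)
            if candidate not in variants:
                variants.append(candidate)
    return variants


if __name__ == "__main__":
    # Assumed example inputs, chosen to be benign.
    frames = ["{q}?", "Could you explain {q}?"]
    substitutions = {"reset": ["restart"]}
    for variant in generate_variants("how do I reset a router", frames, substitutions):
        print(variant)
```

With one substitutable word and two frames, a single seed yields four variants. The count grows multiplicatively as rules are added, which is the scaling property the paragraph above describes.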
Quick Start & Requirements
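The benchmark is distributed as CSV files, so a first look needs nothing beyond the Python standard library. The sketch below makes no assumption about column names and simply reports whatever header each file carries. The helper `load_benchmark` is hypothetical, and the `utf-8-sig` encoding is an assumption that may need adjusting.

```python
import csv
from pathlib import Path


def load_benchmark(path):
    """Read one benchmark CSV and return (column_names, rows)."""
    # utf-8-sig tolerates an optional byte-order mark; the files' real
    # encoding is an assumption here.
    with Path(path).open(newline="", encoding="utf-8-sig") as handle:
        reader = csv.DictReader(handle)
        rows = list(reader)
        return list(reader.fieldnames or []), rows


if __name__ == "__main__":
    # File names as listed in this section; local paths are assumed.
    for name in (
        "jade_benchmark_easy_zh.csv",
        "jade_benchmark_medium_zh.csv",
        "jade_benchmark_zh.csv",
        "jade_benchmark_en.csv",
    ):
        if not Path(name).exists():
            print(f"{name}: not found in the current directory")
            continue
        columns, rows = load_benchmark(name)
        print(f"{name}: {len(rows)} rows, columns = {columns}")
```

Printing the header first is a deliberate choice: it lets you confirm the real column layout before writing any evaluation code against it.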
Dataset files: jade_benchmark_easy_zh.csv, jade_benchmark_medium_zh.csv, jade_benchmark_zh.csv, jade_benchmark_en.csv.
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats