jade-db by whitzard-ai

LLM safety evaluation datasets for red-teaming

Created 2 years ago

516 stars

Top 60.0% on SourcePulse

Project Summary

JADE-DB is a curated dataset for the targeted safety evaluation of large language models (LLMs), focusing on Chinese open-source models and international commercial models. It addresses the need for robust safety testing by transforming low-trigger-rate seed questions (questions that rarely elicit unsafe responses) into high-risk queries through linguistic variation, covering categories such as core values, illegal activities, rights infringement, and discrimination.
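The summary does not define "trigger rate" precisely. As a minimal sketch, assuming a question's trigger rate is the fraction of evaluated models whose response to it is judged unsafe, it could be computed as follows. The `Judgement` record, the judging step that produces the `unsafe` flag, and the question and model IDs are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Judgement:
    question_id: str
    model: str
    unsafe: bool  # hypothetical label from a human or automatic judge


def trigger_rates(judgements: list[Judgement]) -> dict[str, float]:
    """Fraction of evaluated models whose answer to each question was judged unsafe."""
    totals: dict[str, int] = {}
    hits: dict[str, int] = {}
    for j in judgements:
        totals[j.question_id] = totals.get(j.question_id, 0) + 1
        if j.unsafe:
            hits[j.question_id] = hits.get(j.question_id, 0) + 1
    return {q: hits.get(q, 0) / n for q, n in totals.items()}


# Hypothetical judged results: one seed question and one variant, three models each.
judged = [
    Judgement("seed-1", "model-a", False),
    Judgement("seed-1", "model-b", False),
    Judgement("seed-1", "model-c", True),
    Judgement("variant-1", "model-a", True),
    Judgement("variant-1", "model-b", True),
    Judgement("variant-1", "model-c", False),
]
rates = trigger_rates(judged)
print({q: round(r, 2) for q, r in rates.items()})  # {'seed-1': 0.33, 'variant-1': 0.67}
```

Under this definition, a successful variation raises the rate of a question from the seed level toward 1.0.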

How It Works

The project employs linguistic variation techniques to automatically generate a diverse set of high-risk test cases from a smaller set of seed prompts. This approach aims to create natural language datasets that are effective in probing LLM safety alignment across various categories and sub-categories, enhancing the scalability and comprehensiveness of safety evaluations.
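The sketch below shows how a small seed set can fan out combinatorially into many test cases. It chains a few rewriting templates over a benign seed question. The templates (`as_indirect`, `as_quoted_request`, `with_context`) and the `expand` helper are hypothetical placeholders chosen for illustration. They are not the transformations JADE actually applies.

```python
from itertools import permutations
from typing import Callable

Transform = Callable[[str], str]


# Hypothetical surface-level rewrites; each keeps the question's content intact.
def as_indirect(q: str) -> str:
    return f"I would like to know the following: {q}"


def as_quoted_request(q: str) -> str:
    return f'Someone asked "{q}" How would you answer them?'


def with_context(q: str) -> str:
    return f"For a school report, {q[0].lower()}{q[1:]}"


TRANSFORMS: list[Transform] = [as_indirect, as_quoted_request, with_context]


def expand(seed: str, max_depth: int = 2) -> list[str]:
    """Apply every ordered chain of up to max_depth distinct transforms to the seed."""
    variants: list[str] = []
    for depth in range(1, max_depth + 1):
        for chain in permutations(TRANSFORMS, depth):
            text = seed
            for transform in chain:
                text = transform(text)
            variants.append(text)
    return variants


variants = expand("How is bread made?")
print(len(variants))  # 3 single rewrites + 6 ordered pairs = 9
```

With three templates and chains of up to two, one seed yields 9 variants. The count grows quickly with more templates or deeper chains, which is where the scalability of the approach comes from.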

Quick Start & Requirements

Data Access: Datasets are available as CSV files (e.g., jade_benchmark_easy_zh.csv, jade_benchmark_medium_zh.csv, jade_benchmark_zh.csv, jade_benchmark_en.csv).
Demo: Interactive demos are available at https://whitzard-ai.github.io/jade-demo.html.
Contact: Politically sensitive test cases are not publicly released; to request them, contact mi_zhang@fudan.edu.cn.
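The CSV files can be inspected with Python's standard `csv` module. The column layout is not documented here, so the column name `category` and the sample rows below are hypothetical. Check the header of the actual file first and pass the real column name.

```python
import csv
import io
from collections import Counter
from typing import TextIO


def count_by_column(f: TextIO, column: str) -> Counter[str]:
    """Count rows per value of `column` in a CSV file that has a header row."""
    reader = csv.DictReader(f)
    # Accessing fieldnames reads the header row.
    if reader.fieldnames is None or column not in reader.fieldnames:
        raise KeyError(f"column {column!r} not found; header is {reader.fieldnames}")
    return Counter(row[column] for row in reader)


# Real file (column name is an assumption; utf-8-sig tolerates a leading BOM):
# with open("jade_benchmark_en.csv", encoding="utf-8-sig", newline="") as f:
#     print(count_by_column(f, "category"))

# Self-contained demo with placeholder rows:
sample = (
    "id,question,category\n"
    "1,q1,illegal activities\n"
    "2,q2,discrimination\n"
    "3,q3,illegal activities\n"
)
counts = count_by_column(io.StringIO(sample), "category")
print(counts.most_common())  # [('illegal activities', 2), ('discrimination', 1)]
```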

Highlighted Details

JADE 5.0: Text-to-Image model safety evaluation.
JADE 4.0: Safety regulation RAG.
JADE 3.0: LLM safety alignment.
Includes datasets for both Chinese and English LLMs.

Maintenance & Community

The project is led by the BaiZe Intelligent team at Fudan University.
Technical reports available in Chinese and English.
Project website: https://whitzard-ai.github.io/jade.html.

Licensing & Compatibility

The README does not explicitly state a license for the datasets.
Users are encouraged to cite the technical report.

Limitations & Caveats

Publicly available datasets exclude politically sensitive test cases due to regulations.
Datasets contain examples of harmful and illegal content for evaluation purposes.

Health Check

Last Commit

1 month ago

Responsiveness

Inactive

Pull Requests (30d)

0

Issues (30d)

0

Star History

6 stars in the last 30 days

Explore Similar Projects

SafetyBench by thu-coai

A benchmark for LLM safety

Created 2 years ago

Updated 11 months ago

Starred by

Andrej Karpathy (Founder of Eureka Labs; Formerly at Tesla, OpenAI; Author of CS 231n),

Gabriel Almeida (Cofounder of Langflow), and

2 more.

llm_rules by normster

Benchmark for evaluating LLM rule-following capabilities

Created 2 years ago

Updated 1 year ago

PrimeVul by DLVulDet

Dataset and tools for code vulnerability detection

Created 2 years ago

Updated 1 year ago

Starred by

Wing Lian (Founder of Axolotl AI) and

Nathan Lambert (Research Scientist at AI2).

do-not-answer by Libr-AI

Dataset for evaluating LLM safety mechanisms

Created 2 years ago

Updated 2 years ago

langtest by PacificAI

NLP testing SDK for model safety and effectiveness

Created 3 years ago

Updated 5 days ago

Starred by

Elvis Saravia (Founder of DAIR.AI).

TrustLLM by HowieHwong

Trustworthiness benchmark for large language models (ICML 2024)

Created 2 years ago

Updated 1 year ago

Starred by

Yaowei Zheng (Author of LLaMA-Factory).

Safety-Prompts by thu-coai

Chinese safety prompts for LLM evaluation/alignment

Created 3 years ago

Updated 2 years ago

Starred by

Jinze Bai (Research Scientist at Alibaba Qwen).

Generalization-Causality by yfzhang114

Reading list for domain generalization, adaptation, causality, robustness, etc

Created 5 years ago

Updated 2 years ago

Starred by

Jeff Hammerbacher (Cofounder of Cloudera) and

Luis Capelo (Cofounder of Lightning AI).

detoxify by unitaryai

Trained models for toxic comment classification

Created 5 years ago

Updated 5 days ago

Starred by

Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"),

Luca Soldaini (Research Scientist at Ai2), and

7 more.

hh-rlhf by anthropics

RLHF dataset for training safe AI assistants

Created 4 years ago

Updated 1 year ago

bugbug by mozilla

ML platform for software engineering tasks

Created 8 years ago

Updated 1 day ago

Spiritual-Spell-Red-Teaming by Goochbeater

LLM security research through advanced adversarial prompting

Created 6 months ago

Updated 1 day ago
