SafetyBench by thu-coai

A benchmark for LLM safety

Created 2 years ago
250 stars

Top 100.0% on SourcePulse

Project Summary

SafetyBench is a comprehensive benchmark designed to evaluate the safety of Large Language Models (LLMs). It addresses the need for standardized, multilingual safety assessment by providing over 11,000 multiple-choice questions across seven distinct safety categories. This resource benefits LLM developers and researchers by offering a structured methodology for identifying and mitigating safety risks in AI systems.

How It Works

The benchmark comprises 11,435 multiple-choice questions covering seven safety categories, available in both Chinese and English. Evaluations can be conducted in zero-shot or five-shot settings, with predicted answers extracted from model responses. SafetyBench is designed to be less reasoning-intensive than general capability benchmarks like MMLU, focusing directly on safety-related outputs.

Quick Start & Requirements

Data can be downloaded via the download_data.sh or download_data.py scripts and is hosted on Hugging Face. Evaluation is performed by running the provided example code; results are submitted as a UTF-8 encoded JSON file. The README does not specify Python versions or hardware requirements. Links to the website, Hugging Face, and the paper are provided for further details.
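Writing the UTF-8 encoded JSON submission file might look like the sketch below. The exact schema SafetyBench expects is not detailed in this summary, so the mapping from question IDs to predicted option indices is an assumption for illustration only.

```python
import json

def write_submission(predictions: dict, path: str = "submission.json") -> None:
    """Write predictions to a UTF-8 encoded JSON file.

    `predictions` is assumed to map question IDs to predicted option
    indices (e.g. {"123": 1}); check the repository for the real schema.
    ensure_ascii=False keeps Chinese text readable rather than escaped.
    """
    with open(path, "w", encoding="utf-8") as f:
        json.dump(predictions, f, ensure_ascii=False)
```

A usage example: `write_submission({"0": 1, "1": 3})` produces a `submission.json` ready for upload, with non-ASCII characters preserved as-is.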

Highlighted Details

  • Features 11,435 questions across 7 safety categories.
  • Supports both Chinese and English language evaluations.
  • Integrated into the SuperBench platform.
  • Test answers have been fully open-sourced as of July 2025.
  • Accepted for publication at ACL 2024.

Maintenance & Community

The project has seen recent updates and was accepted to ACL 2024, indicating active development. Leaderboards are maintained on the project website. No specific community channels (e.g., Discord, Slack) or major sponsorships are mentioned in the README.

Licensing & Compatibility

The provided README does not specify a software license. This absence of explicit licensing information may create legal uncertainty for commercial use or for integration into proprietary systems.

Limitations & Caveats

The benchmark's design is noted as less reasoning-intensive, which might limit its applicability for evaluating safety aspects deeply tied to complex reasoning capabilities. No other specific limitations, known bugs, or platform incompatibilities are detailed.

Health Check
Last Commit

1 month ago

Responsiveness

Inactive

Pull Requests (30d)
0
Issues (30d)
0
Star History
10 stars in the last 30 days

