A benchmark for LLM safety
SafetyBench is a comprehensive benchmark designed to evaluate the safety of Large Language Models (LLMs). It addresses the need for standardized, multilingual safety assessment by providing over 11,000 multiple-choice questions across seven distinct safety categories. This resource benefits LLM developers and researchers by offering a structured methodology for identifying and mitigating safety risks in AI systems.
How It Works
The benchmark comprises 11,435 multiple-choice questions covering seven safety categories, available in both Chinese and English. Evaluations can be conducted in zero-shot or five-shot settings, with predicted answers extracted from model responses. SafetyBench is designed to be less reasoning-intensive than general capability benchmarks like MMLU, focusing directly on safety-related outputs.
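To illustrate the zero-shot and five-shot settings, here is a minimal sketch of how a multiple-choice prompt might be assembled and how a predicted answer might be pulled out of a free-text model response. The prompt template, the helper names (`format_question`, `build_prompt`, `extract_answer`) and the regex heuristic are all hypothetical. SafetyBench's own example code may format prompts and parse answers differently.

```python
import re
from typing import Optional, Sequence, Tuple

LETTERS = "ABCD"  # hypothetical: assumes at most four options per question

# A solved demonstration: (question, options, index of the correct option)
Shot = Tuple[str, Sequence[str], int]


def format_question(question: str, options: Sequence[str]) -> str:
    """Render one question with lettered options (hypothetical template)."""
    lines = [f"Question: {question}"]
    for letter, option in zip(LETTERS, options):
        lines.append(f"({letter}) {option}")
    return "\n".join(lines)


def build_prompt(question: str, options: Sequence[str], shots: Sequence[Shot] = ()) -> str:
    """Zero-shot when `shots` is empty; five-shot when five solved examples are passed."""
    parts = []
    for shot_q, shot_opts, shot_answer in shots:
        # Each demonstration ends with its correct answer letter.
        parts.append(format_question(shot_q, shot_opts) + f"\nAnswer: ({LETTERS[shot_answer]})")
    # The target question is left open for the model to complete.
    parts.append(format_question(question, options) + "\nAnswer:")
    return "\n\n".join(parts)


def extract_answer(response: str, num_options: int) -> Optional[int]:
    """Map a model response to an option index (0 = A), or None if nothing is found."""
    valid = LETTERS[:num_options]
    # 1) Prefer an explicitly parenthesised choice such as "(B)".
    match = re.search(rf"\(([{valid}])\)", response)
    if match is None:
        # 2) Fall back to the first standalone capital letter among the valid options.
        #    Heuristic only: an English article "A" can produce a false positive.
        match = re.search(rf"\b([{valid}])\b", response)
    return LETTERS.index(match.group(1)) if match else None
```

For example, `extract_answer("The answer is (B).", 4)` yields `1`. Restricting the character class to the question's actual option count avoids accepting a letter that does not correspond to any option.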
Quick Start & Requirements
Data can be downloaded with the download_data.sh or download_data.py scripts; the dataset is hosted on Hugging Face. Evaluation is done by running the provided example code, and submissions must be a UTF-8 encoded JSON file. The README does not state specific Python versions or hardware requirements. Links to the project website, the Hugging Face dataset, and the paper are provided for further details.
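The submission requirement can be met with the standard `json` module. The sketch below assumes a schema that maps each question id to a predicted option index (0 for A, 1 for B, and so on). That schema and the helper names `to_submission_json` and `write_submission` are assumptions for illustration, so check the official README for the exact format expected by the leaderboard.

```python
import json
from typing import Mapping


def to_submission_json(predictions: Mapping[int, int]) -> str:
    """Serialise {question_id: predicted_option_index} (assumed schema) to JSON text."""
    # JSON object keys are always strings; sort by numeric id for stable, diffable output.
    payload = {str(qid): int(answer) for qid, answer in sorted(predictions.items())}
    return json.dumps(payload, indent=2)


def write_submission(predictions: Mapping[int, int], path: str) -> None:
    """Write the submission file with an explicit UTF-8 encoding."""
    # Passing encoding="utf-8" avoids depending on the platform's default encoding.
    with open(path, "w", encoding="utf-8") as f:
        f.write(to_submission_json(predictions))
```

Usage, with hypothetical ids: `write_submission({0: 1, 1: 3}, "submission.json")`. Setting the encoding explicitly matters mainly on systems whose default locale encoding is not UTF-8.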
Maintenance & Community
The project has seen recent updates, and the paper was accepted to ACL 2024. Leaderboards are maintained on the project website. The README mentions no dedicated community channels (e.g., Discord, Slack) or major sponsorships.
Licensing & Compatibility
The provided README does not specify a software license. Without explicit licensing terms, commercial use or integration into proprietary systems may be legally uncertain.
Limitations & Caveats
Because the benchmark is deliberately less reasoning-intensive, it may be less suited to evaluating safety behaviors that depend on complex reasoning. The README lists no other limitations, known bugs, or platform incompatibilities.
1 month ago
Inactive