Discover and explore top open-source AI tools and projects—updated daily.
normsterBenchmark for evaluating LLM rule-following capabilities
Top 99.3% on SourcePulse
Summary
RuLES (Rule-following Language Evaluation Scenarios) is a benchmark designed to rigorously evaluate the rule-following capabilities of Large Language Models (LLMs). It addresses the critical need for understanding LLM reliability and safety by providing a systematic way to test adherence to instructions. This benchmark is valuable for researchers and developers seeking to quantify and improve LLM behavior.
How It Works
The project implements a comprehensive benchmark suite with diverse test cases targeting various aspects of rule-following. It supports evaluation of models via APIs (OpenAI, Anthropic, Google VertexAI) and locally hosted models using vLLM. The core approach involves running LLMs against these carefully crafted scenarios and analyzing their outputs to derive rule-following scores, offering a novel and precise method for assessing a key LLM limitation.
Quick Start & Requirements
pip install -e .. For API wrappers, use pip install -e .[models]..env file), local HuggingFace models (e.g., Llama-2, downloaded via snapshot_download), Python. GPU is required for vLLM evaluation and GCG attack.Highlighted Details
Maintenance & Community
The project shows active maintenance with multiple updates in 2024, including significant revisions to the benchmark and evaluation scripts. No specific community channels (e.g., Discord, Slack) are listed in the provided README text.
Licensing & Compatibility
The license type is not explicitly stated in the provided README text, which is a critical omission for assessing commercial use or closed-source linking compatibility.
Limitations & Caveats
1 year ago
Inactive
JinjieNi
mlfoundations
groq