Legal reasoning benchmark for evaluating LLMs
LegalBench is an open-science initiative to create and maintain a comprehensive benchmark for evaluating the legal reasoning capabilities of large language models. It targets researchers and practitioners in AI and law, aiming to drive innovation in legal NLP and assess the safety and reliability of LLMs in legal contexts.
How It Works
LegalBench comprises 162 distinct tasks, each with an associated dataset of input-output pairs designed to test a specific legal reasoning skill. Tasks are contributed through a crowd-sourcing effort involving legal professionals and academics, which ensures coverage of diverse legal domains, text types, and reasoning challenges. LLMs are evaluated on each task by measuring how accurately their generated outputs match the gold answers.
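A minimal sketch of that evaluation loop is shown below. It assumes the Hugging Face distribution of the benchmark (`nguha/legalbench`), the `abercrombie` task, and `text`/`answer` column names; actual column names, splits, and scoring conventions vary by task, and `my_model` is a hypothetical stand-in for the LLM under evaluation.

```python
# Sketch: score one LegalBench task by exact-match accuracy.
# Assumptions: dataset id "nguha/legalbench", task config "abercrombie",
# and "text"/"answer" columns -- adjust these to the task you select.
from datasets import load_dataset


def my_model(prompt: str) -> str:
    """Hypothetical stand-in for the LLM being evaluated."""
    raise NotImplementedError


task = load_dataset("nguha/legalbench", "abercrombie", split="test")

correct = 0
for example in task:
    prediction = my_model(example["text"])
    # Compare the model's output to the gold label, ignoring case/whitespace.
    correct += int(prediction.strip().lower() == example["answer"].strip().lower())

print(f"accuracy: {correct / len(task):.3f}")
```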
Quick Start & Requirements
Highlighted Details
Maintenance & Community
The project is an ongoing effort with active community involvement. Contact information for questions and contributions, along with links to related projects and research papers, is provided.
Licensing & Compatibility
LegalBench is a collection of datasets with varying licenses, and users must adhere to the specific license set by each dataset's creator. A notebook is provided to help select tasks based on their license information.
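The kind of license-based filtering that notebook supports might look like the sketch below. This is only an illustration: the metadata file name, column names, and license strings are assumptions, not the repository's actual layout.

```python
# Hypothetical sketch: select only tasks whose datasets carry permissive licenses.
# "task_metadata.csv" and its "task"/"license" columns are assumed for illustration.
import pandas as pd

tasks = pd.read_csv("task_metadata.csv")
permissive = tasks[tasks["license"].isin(["CC BY 4.0", "MIT"])]
print(permissive["task"].tolist())
```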
Limitations & Caveats
The benchmark's composition is subject to the ongoing crowd-sourcing effort, meaning its scope and coverage will evolve. Users must manage the licensing complexities of the individual datasets within the benchmark.