legalbench  by HazyResearch

Legal reasoning benchmark for evaluating LLMs

created 2 years ago
459 stars

Top 66.9% on sourcepulse

Project Summary

LegalBench is an open-science initiative to create and maintain a comprehensive benchmark for evaluating the legal reasoning capabilities of large language models. It targets researchers and practitioners in AI and law, aiming to drive innovation in legal NLP and assess the safety and reliability of LLMs in legal contexts.

How It Works

LegalBench comprises 162 distinct tasks, each with an associated dataset of input-output pairs designed to test specific legal reasoning skills. Tasks are sourced through a crowd-sourcing effort involving legal professionals and academics, ensuring coverage of diverse legal domains, text types, and reasoning challenges. LLMs are evaluated by measuring their accuracy in generating the correct output for given legal inputs.
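
The accuracy measurement described above can be sketched roughly as follows. This is a minimal illustration, not LegalBench's own evaluation code: the `normalize` and `accuracy` helpers, the toy input-output pairs, and the `toy_predict` stand-in for a model are all hypothetical, and exact-match scoring after light normalization is an assumed scoring rule.

```python
from typing import Callable, Iterable, Tuple


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting differences are ignored."""
    return " ".join(text.strip().lower().split())


def accuracy(pairs: Iterable[Tuple[str, str]], predict: Callable[[str], str]) -> float:
    """Fraction of inputs whose normalized prediction equals the normalized expected output."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("at least one input-output pair is required")
    correct = sum(normalize(predict(x)) == normalize(y) for x, y in pairs)
    return correct / len(pairs)


# Hypothetical input-output pairs, made up purely for illustration.
toy_pairs = [
    ("Clause 1: ... Does this clause mention termination? Answer Yes or No.", "Yes"),
    ("Clause 2: ... Does this clause mention termination? Answer Yes or No.", "No"),
    ("Clause 3: ... Does this clause mention termination? Answer Yes or No.", "Yes"),
]


def toy_predict(prompt: str) -> str:
    """Stand-in for an LLM call (assumption): always answers 'yes' with stray whitespace."""
    return "  yes "


score = accuracy(toy_pairs, toy_predict)
print(f"accuracy = {score:.3f}")
```

In practice `toy_predict` would be replaced by a call to the model under evaluation, and a task may call for a different metric than plain exact-match accuracy.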

Quick Start & Requirements

Highlighted Details

  • 162 tasks from 40 contributors, covering a wide spectrum of legal reasoning.
  • Tasks are designed to reflect real-world legal processes and academic assessments.
  • Encourages community contributions to expand the benchmark's scope and relevance.
  • Aims to inspire new algorithmic innovations through the unique challenges posed by legal text.

Maintenance & Community

The project is an ongoing effort with active community involvement. The repository provides contact information for questions and contributions, along with links to related projects and research papers.

Licensing & Compatibility

LegalBench is a collection of datasets under varying licenses. Users must follow the license set by each dataset's creator. A notebook is provided to help select tasks based on license information.
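
Selecting tasks by license can be pictured as a simple filter over per-task license metadata. The sketch below is an assumption about the general idea, not the contents of the provided notebook: the `select_tasks` helper, the task names, and the license assignments are all hypothetical.

```python
from typing import Dict, List, Set


def select_tasks(task_licenses: Dict[str, str], allowed: Set[str]) -> List[str]:
    """Return the sorted names of tasks whose license is in the allowed set (case-insensitive)."""
    allowed_norm = {lic.strip().lower() for lic in allowed}
    return sorted(
        name
        for name, lic in task_licenses.items()
        if lic.strip().lower() in allowed_norm
    )


# Hypothetical task-to-license mapping, for illustration only.
example_licenses = {
    "task_a": "CC BY 4.0",
    "task_b": "CC BY-NC-SA 4.0",
    "task_c": "MIT",
    "task_d": "cc by 4.0",
}

selected = select_tasks(example_licenses, {"CC BY 4.0", "MIT"})
print(selected)  # ['task_a', 'task_c', 'task_d']
```

Matching is done on normalized license strings here; real metadata may need a more careful mapping between license names and their variants.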

Limitations & Caveats

The benchmark's composition is subject to the ongoing crowd-sourcing effort, meaning its scope and coverage will evolve. Users must manage the licensing complexities of the individual datasets within the benchmark.

Health Check

  • Last commit: 11 months ago
  • Responsiveness: 1 day
  • Pull requests (30d): 0
  • Issues (30d): 1

Star History

42 stars in the last 90 days
