legal-ml-datasets by neelguha

Datasets for legal machine learning tasks

Created 5 years ago

439 stars

Top 67.3% on SourcePulse

1 Expert Loves This Project

Shengjia Zhao

Chief Scientist at Meta Superintelligence Lab

Project Summary

This repository is a curated index of datasets and tasks for machine learning in the legal domain. It consolidates resources for researchers and practitioners working on legal NLP, giving them one place to discover and access data for tasks such as argument mining, statutory reasoning, and contract review.

How It Works

The project functions as a living document, aggregating links and descriptions of various legal datasets. It categorizes these resources by the specific legal NLP tasks they support, such as case summarization, legal judgment prediction, and contract understanding. The collection emphasizes datasets annotated by legal experts, highlighting their utility for training and evaluating models on nuanced legal reasoning.
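As a rough illustration of the organization described above, the following Python sketch models an index whose entries are pointers to datasets grouped by task. It is not the repository's actual format: the `DatasetEntry` fields, the `group_by_task` helper, and every dataset name and URL are hypothetical placeholders.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetEntry:
    """One row of a hypothetical index: a pointer to a dataset, not the data itself."""
    name: str
    task: str
    url: str
    expert_annotated: bool


def group_by_task(entries):
    """Map each task label to the names of the entries that support it."""
    grouped = defaultdict(list)
    for entry in entries:
        grouped[entry.task].append(entry.name)
    return dict(grouped)


# Placeholder entries; names and URLs are invented for illustration only.
INDEX = [
    DatasetEntry("example-contracts", "contract understanding",
                 "https://example.org/contracts", True),
    DatasetEntry("example-summaries", "case summarization",
                 "https://example.org/summaries", False),
    DatasetEntry("example-clauses", "contract understanding",
                 "https://example.org/clauses", True),
]

# Group the index by task, and separately list the expert-annotated entries.
BY_TASK = group_by_task(INDEX)
EXPERT_ONLY = [entry.name for entry in INDEX if entry.expert_annotated]
```

Keeping the task label and the expert-annotation flag on each entry means both views (by task, and expert-annotated only) can be derived from the same flat list.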

Quick Start & Requirements

This repository is a collection of pointers to external datasets. No installation is required to browse the collection. Links to official dataset pages, documentation, and associated research papers are provided for each entry.

Highlighted Details

Comprehensive coverage of legal NLP tasks, from statutory reasoning to contract analysis.
Inclusion of datasets with expert-level annotations, crucial for high-stakes legal applications.
Emphasis on multilingual legal datasets, addressing the need for cross-lingual legal AI.
Links to benchmarks and evaluation results for various legal NLP models.

Maintenance & Community

The repository is maintained by Neel Guha, who welcomes community contributions via pull request or direct contact. The project is actively updated, reflecting ongoing research in legal AI.

Licensing & Compatibility

Licensing varies by individual dataset. Users must consult the specific license for each dataset they intend to use. Compatibility for commercial use or closed-source linking depends on the terms of each dataset's license.
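One way to keep track of per-dataset licensing is a small lookup that a user fills in after reading each dataset's terms. The sketch below assumes such a hand-maintained table. The dataset names, the `APPROVED_FOR_COMMERCIAL` set, and the `commercial_use_status` helper are all hypothetical and are not provided by the repository.

```python
# Hypothetical license labels, recorded by hand after reading each dataset's terms.
DATASET_LICENSES = {
    "example-contracts": "CC-BY-4.0",
    "example-summaries": "CC-BY-NC-4.0",
    "example-clauses": None,  # license not yet checked
}

# Assumption: licenses this hypothetical project has already reviewed and accepted
# for commercial use. Anything outside the set is flagged for a manual look.
APPROVED_FOR_COMMERCIAL = {"CC-BY-4.0", "CC0-1.0", "MIT"}


def commercial_use_status(name):
    """Return 'allowed', 'needs review', or 'unknown' for a dataset name."""
    license_id = DATASET_LICENSES.get(name)
    if license_id is None:
        return "unknown"
    if license_id in APPROVED_FOR_COMMERCIAL:
        return "allowed"
    return "needs review"


STATUSES = {name: commercial_use_status(name) for name in DATASET_LICENSES}
```

Datasets with no recorded license are reported as unknown and not assumed usable, which matches the requirement to consult each dataset's own license before use.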

Limitations & Caveats

This repository is an index and does not host the datasets themselves. Users are responsible for accessing and managing each dataset according to its respective terms and availability. Some datasets may have restricted access or specific usage requirements.

Health Check

Last Commit: 2 months ago

Responsiveness: Inactive

Star History

4 stars in the last 30 days