Datasets for legal machine learning tasks
This repository serves as a curated index of datasets and tasks for machine learning applications within the legal domain. It aims to consolidate resources for researchers and practitioners working on legal NLP, offering a centralized point for discovering and accessing relevant data for tasks like argument mining, statutory reasoning, and contract review.
How It Works
The project functions as a living document, aggregating links and descriptions of various legal datasets. It categorizes these resources by the specific legal NLP tasks they support, such as case summarization, legal judgment prediction, and contract understanding. The collection emphasizes datasets annotated by legal experts, highlighting their utility for training and evaluating models on nuanced legal reasoning.
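As a rough illustration of the task-based grouping described above, the index could be modeled as a list of entries that is inverted into a task-to-datasets mapping. This is only a sketch: the `DatasetEntry` type, the `group_by_task` helper, and the placeholder entries are assumptions made for illustration and do not come from the repository, which is a human-readable document rather than a data file.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class DatasetEntry:
    """One row of the index: a pointer to an external dataset, not the data itself."""
    name: str
    url: str
    tasks: Tuple[str, ...]
    expert_annotated: bool


def group_by_task(entries: List[DatasetEntry]) -> Dict[str, List[str]]:
    """Map each task name to the names of the datasets that support it."""
    index = defaultdict(list)
    for entry in entries:
        for task in entry.tasks:
            index[task].append(entry.name)
    return dict(index)


# Placeholder entries for illustration only; these are not real datasets.
ENTRIES = [
    DatasetEntry("placeholder-a", "https://example.com/a",
                 ("case summarization",), True),
    DatasetEntry("placeholder-b", "https://example.com/b",
                 ("legal judgment prediction", "case summarization"), False),
    DatasetEntry("placeholder-c", "https://example.com/c",
                 ("contract understanding",), True),
]

TASK_INDEX = group_by_task(ENTRIES)
# Names of the entries flagged as annotated by legal experts.
EXPERT_ANNOTATED = [e.name for e in ENTRIES if e.expert_annotated]
```

A dataset that supports several tasks appears under each of them, which matches how a reader would browse the collection by task.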
Quick Start & Requirements
This repository is a collection of pointers to external datasets. No installation is required to browse the collection. Links to official dataset pages, documentation, and associated research papers are provided for each entry.
Maintenance & Community
The repository is maintained by Neel Guha and encourages community contributions via pull requests or direct contact. The project is actively updated, reflecting ongoing research in legal AI.
Licensing & Compatibility
Licensing varies by dataset. Users must consult the specific license of each dataset they intend to use. Whether a dataset may be used commercially or in closed-source products depends on the terms of that dataset's license.
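Because licenses differ per dataset, one possible workflow is to record the license found on each dataset's official page and compare it against a list your own project has approved. The sketch below assumes a hypothetical `split_by_license` helper and uses placeholder dataset names and license labels. It makes no claim about what any real license permits.

```python
def split_by_license(dataset_licenses, approved):
    """Separate datasets whose license is on the caller's approved list
    from those that still need manual review."""
    cleared, needs_review = [], []
    for name, license_id in dataset_licenses.items():
        # A missing or unrecognized license is never treated as cleared.
        if license_id is not None and license_id in approved:
            cleared.append(name)
        else:
            needs_review.append(name)
    return sorted(cleared), sorted(needs_review)


# Placeholder names and license labels, for illustration only.
LICENSES = {
    "placeholder-a": "license-x",
    "placeholder-b": "license-y",
    "placeholder-c": None,  # license not yet looked up
}

CLEARED, NEEDS_REVIEW = split_by_license(LICENSES, approved={"license-x"})
```

Treating an unknown license as "needs review" rather than "cleared" is a deliberately conservative choice.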
Limitations & Caveats
This repository is an index and does not host the datasets themselves. Users are responsible for accessing and managing each dataset according to its respective terms and availability. Some datasets may have restricted access or specific usage requirements.