Collection of question answering datasets for NLP tasks
This repository is a curated index of large-scale question answering (QA) datasets for researchers and developers in Natural Language Processing (NLP). It gathers links and descriptions for datasets covering extractive, abstractive, and multi-hop QA in one place, so that QA models can be developed and benchmarked against them.
How It Works
The datasets in the collection were built in different ways: crowd-sourced questions written over text passages (SQuAD, QuAC, CoQA), questions paired with or generated from knowledge bases (FreebaseQA, CFQ), and questions collected from web sources (WebQuestions, TriviaQA). This variety allows QA systems to be trained and evaluated on different data distributions and levels of reasoning complexity.
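To make the extractive case concrete, here is a minimal sketch assuming the SQuAD v1.1 JSON layout, in which each answer stores its text together with a character offset (`answer_start`) into the passage. The record is hand-written for illustration and is not taken from any dataset; `span_matches` is a hypothetical helper name.

```python
# A hand-written record in the SQuAD v1.1 paragraph layout (illustrative only).
record = {
    "context": "The Eiffel Tower is located in Paris.",
    "qas": [
        {
            "id": "example-1",
            "question": "Where is the Eiffel Tower located?",
            "answers": [{"text": "Paris", "answer_start": 31}],
        }
    ],
}


def span_matches(context: str, answer: dict) -> bool:
    """Return True if the answer text really sits at answer_start in the context."""
    start = answer["answer_start"]
    end = start + len(answer["text"])
    return context[start:end] == answer["text"]


for qa in record["qas"]:
    for answer in qa["answers"]:
        # In extractive QA the gold answer is a literal span of the passage.
        print(qa["id"], span_matches(record["context"], answer))
```

A check like this is a cheap sanity test after downloading or re-tokenizing an extractive dataset, since shifted offsets silently corrupt training targets. Abstractive and multi-hop datasets generally do not follow this span convention.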
Quick Start & Requirements
Highlighted Details
Maintenance & Community
This repository acts as a curated index rather than an actively maintained project. The datasets themselves are maintained by their respective creators.
Licensing & Compatibility
Dataset licenses vary by source. Users must consult the individual dataset licenses for terms of use, redistribution, and commercial application.
Limitations & Caveats
The repository itself does not provide tools for data processing or model training; it is purely a collection of links and descriptions. Users are responsible for downloading, managing, and processing the data according to each dataset's specific license and format.
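Since processing is left to the user, a common first step is flattening a dataset's nested file into one row per question. The sketch below assumes the SQuAD v1.1 layout (`data`, then `paragraphs`, then `qas`); other datasets in the collection use different formats and would need their own loaders. `flatten_squad` and the inline `SAMPLE` are hypothetical and written for illustration only.

```python
import json

# Tiny inline stand-in for a downloaded file; real files are far larger.
SAMPLE = """
{"version": "1.1", "data": [{"title": "Example", "paragraphs": [
  {"context": "Water boils at 100 degrees Celsius at sea level.",
   "qas": [{"id": "q1", "question": "At what temperature does water boil at sea level?",
            "answers": [{"text": "100 degrees Celsius", "answer_start": 15}]}]}
]}]}
"""


def flatten_squad(dataset: dict) -> list[dict]:
    """Turn the nested article/paragraph/question structure into flat rows."""
    rows = []
    for article in dataset["data"]:
        for paragraph in article["paragraphs"]:
            for qa in paragraph["qas"]:
                rows.append({
                    "id": qa["id"],
                    "title": article["title"],
                    "context": paragraph["context"],
                    "question": qa["question"],
                    # Keep every reference answer; some questions have several.
                    "answers": [a["text"] for a in qa["answers"]],
                })
    return rows


rows = flatten_squad(json.loads(SAMPLE))
print(rows[0]["question"], "->", rows[0]["answers"])
```

With a real download, `json.loads(SAMPLE)` would be replaced by `json.load` on the opened file, subject to that dataset's license terms.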