This repository curates and organizes links to Korean language datasets, primarily aimed at researchers and developers building end-to-end NLP models. It serves as a centralized resource to simplify data acquisition and exploration for various Korean NLP tasks, from morphological analysis to machine translation and sentiment analysis.
How It Works
The project compiles links to a wide array of Korean text and speech datasets and categorizes them by task (e.g., named entity recognition, question answering, summarization). For each dataset it records the provider, documentation, license, redistribution terms, data volume, and language, so users can quickly identify and download relevant data for their NLP projects.
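As a rough illustration of the per-dataset details described above, one catalog entry could be modeled as a small record and filtered by task. This is only a sketch: the `DatasetEntry` class, its field names, the `by_task` helper, and the sample entries (including the example.com URLs) are hypothetical and are not taken from the repository itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetEntry:
    """Hypothetical record mirroring the details the catalog lists per dataset."""
    name: str
    task: str
    provider: str
    license: str
    redistribution: str  # one of "rd", "no", "unk"
    volume: str
    language: str
    url: str


def by_task(entries, task):
    """Return the entries whose task matches, ignoring case and outer whitespace."""
    wanted = task.strip().lower()
    return [e for e in entries if e.task.strip().lower() == wanted]


# Placeholder entries for illustration only; these are not real datasets.
entries = [
    DatasetEntry("example-ner-corpus", "named entity recognition", "Example Lab",
                 "example license", "rd", "10k sentences", "ko",
                 "https://example.com/ner"),
    DatasetEntry("example-qa-set", "question answering", "Example Org",
                 "example license", "unk", "5k questions", "ko",
                 "https://example.com/qa"),
]

ner_entries = by_task(entries, "Named Entity Recognition")
for entry in ner_entries:
    print(entry.name, entry.url)
```

Keeping the entries as immutable records makes it straightforward to add further filters (for example by provider or language) without changing the data itself.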
Quick Start & Requirements
Highlighted Details
Maintenance & Community
The repository has seen contributions and revisions, with significant updates noted in August 2020 and a move to the main repo in October 2020. It appears to be a community-driven effort to consolidate Korean language resources.
Licensing & Compatibility
Dataset licenses vary, including "rd" (Redistribution possible with or without modification), "no" (Redistribution not possible), and "unk" (Unknown). Users must check the specific license for each dataset to ensure compatibility with their intended use, especially for commercial applications.
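The three redistribution codes above lend themselves to a simple lookup. The sketch below is a hypothetical helper, not part of the repository: the names `REDISTRIBUTION_CODES`, `describe_redistribution`, and `is_redistributable` are assumptions, and treating "unk" or any unrecognized code as non-redistributable is a conservative choice made here, not a rule stated by the project.

```python
# Code descriptions as given in the licensing notes above.
REDISTRIBUTION_CODES = {
    "rd": "Redistribution possible with or without modification",
    "no": "Redistribution not possible",
    "unk": "Unknown",
}


def describe_redistribution(code: str) -> str:
    """Return the description for a code, or a fallback for unrecognized codes."""
    return REDISTRIBUTION_CODES.get(code.strip().lower(), "Unrecognized code")


def is_redistributable(code: str) -> bool:
    """Assumption: only an explicit "rd" counts; "unk" and anything else do not."""
    return code.strip().lower() == "rd"


for code in ("rd", "no", "unk"):
    print(code, "->", describe_redistribution(code), "|", is_redistributable(code))
```

A check like this only narrows the list of candidates. The specific license of each dataset still has to be read before any commercial use.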
Limitations & Caveats
Some datasets may have specific usage restrictions or require a formal application process, as indicated by terms like "academic use only" or the need for user registration and approval. The availability and format of data can also vary, with some links potentially leading to external sites requiring further steps.