Curated list of NLP datasets/libraries for Bahasa Indonesia
This repository is a curated index of datasets, libraries, and pre-trained models for Natural Language Processing (NLP) tasks in Bahasa Indonesia. It targets researchers, developers, and practitioners working with Indonesian language data, and aims to make relevant resources easier to find and use.
How It Works
The project functions as a directory, aggregating links to various GitHub repositories, Hugging Face datasets, academic papers, and other online resources. It categorizes these links by NLP task (e.g., Named Entity Recognition, Sentiment Analysis, Text Summarization) and resource type (e.g., Corpus, Pre-trained Models, Usable Libraries), providing a structured overview of the Indonesian NLP ecosystem.
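The two-level categorization described above can be pictured as a small data structure. The following is a minimal sketch only: the repository itself is a list of links, not code, and the `Resource` class, the `build_index` helper, and every entry name and `example.com` URL below are hypothetical placeholders rather than actual items from the list.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    """One link in the directory (hypothetical schema, not taken from the repository)."""
    name: str
    url: str
    task: str  # NLP task, e.g. "Sentiment Analysis"
    kind: str  # resource type, e.g. "Corpus"


# Placeholder entries for illustration only; none of these are real resources.
ENTRIES = [
    Resource("example-ner-corpus", "https://example.com/ner-corpus",
             "Named Entity Recognition", "Corpus"),
    Resource("example-sentiment-model", "https://example.com/sentiment-model",
             "Sentiment Analysis", "Pre-trained Models"),
    Resource("example-summarizer", "https://example.com/summarizer",
             "Text Summarization", "Usable Libraries"),
    Resource("example-sentiment-corpus", "https://example.com/sentiment-corpus",
             "Sentiment Analysis", "Corpus"),
]


def build_index(entries):
    """Group entries first by NLP task, then by resource type."""
    index = defaultdict(lambda: defaultdict(list))
    for entry in entries:
        index[entry.task][entry.kind].append(entry)
    return index


index = build_index(ENTRIES)

# Look up every corpus filed under sentiment analysis.
corpora = [r.name for r in index["Sentiment Analysis"]["Corpus"]]
print(corpora)
```

Keying on the task first mirrors how a reader would typically browse such a list: pick the task, then choose between corpora, models, and libraries for it.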
Quick Start & Requirements
This repository is a collection of links, so there is nothing to install or run. To use a resource, follow its link and download the dataset or install the library from there.
Highlighted Details
Maintenance & Community
The last recorded update is dated March 15, 2022. The repository lists several GitHub contributors and links to related "Awesome" lists for Indonesian NLP, indicating community interest.
Licensing & Compatibility
Licensing varies by linked resource, so users must consult each individual repository for its specific license terms. Whether a resource can be used commercially depends on the license of that particular dataset or library.
Limitations & Caveats
The repository is a curated list, not an integrated toolkit; users must manage the dependencies and integration of each resource themselves. Because the last update was in March 2022, resources released after that date may not be listed.