NLP_bahasa_resources by louisowen6

Curated list of NLP datasets/libraries for Bahasa Indonesia

Created 6 years ago

575 stars

Top 55.5% on SourcePulse

Project Summary

This repository serves as a comprehensive, curated index of datasets, libraries, and pre-trained models for Natural Language Processing (NLP) tasks specifically in Bahasa Indonesia. It targets researchers, developers, and practitioners working with Indonesian language data, aiming to streamline the discovery and utilization of relevant resources.

How It Works

The project functions as a directory, aggregating links to various GitHub repositories, Hugging Face datasets, academic papers, and other online resources. It categorizes these links by NLP task (e.g., Named Entity Recognition, Sentiment Analysis, Text Summarization) and resource type (e.g., Corpus, Pre-trained Models, Usable Libraries), providing a structured overview of the Indonesian NLP ecosystem.
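The grouping described above can be pictured as a small index keyed by resource type. The following is only an illustrative sketch, not how the repository is actually stored (it is a list of links, not code): the `Resource` class and `group_by_type` helper are hypothetical names, and the three entries are the resources named on this page, filed under the resource types the page itself uses for them.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    """One entry in a link directory (hypothetical structure)."""
    name: str
    resource_type: str  # e.g. "Corpus", "Pre-trained Models", "Usable Libraries"


def group_by_type(resources):
    """Return a dict mapping each resource type to the names filed under it."""
    index = defaultdict(list)
    for resource in resources:
        index[resource.resource_type].append(resource.name)
    return dict(index)


# Entries taken from the resources named on this page; no URLs are assumed.
catalog = [
    Resource("Indo-BERT", "Pre-trained Models"),
    Resource("Sastrawi", "Usable Libraries"),
    Resource("Pujangga", "Usable Libraries"),
]

index = group_by_type(catalog)
for resource_type, names in index.items():
    print(f"{resource_type}: {', '.join(names)}")
```

The same helper could be keyed by NLP task instead of resource type by adding a task field to each entry.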

Quick Start & Requirements

This repository is a collection of links, so there is nothing to install or run. To use a resource, follow its link and download the dataset or install the library from the linked source.

Highlighted Details

Extensive coverage across numerous NLP tasks relevant to Bahasa Indonesia.
Includes links to both raw datasets and pre-trained models like Indo-BERT.
Features pointers to usable libraries and APIs for Indonesian NLP, such as Pujangga and Sastrawi.
Provides resources for data scraping (Twitter) and spelling correction.

Maintenance & Community

The most recent update noted in the repository is dated March 15, 2022. The repository credits several GitHub contributors and links to related "Awesome" lists for Indonesian NLP, which suggests some community interest.

Licensing & Compatibility

Licensing varies by linked resource, so users must consult each repository for its specific terms. Whether a resource can be used commercially depends on the license of that particular dataset or library.

Limitations & Caveats

The repository is a curated list, not an integrated toolkit, so users must manage the dependencies and integration of each resource themselves. Because the last update was in March 2022, newer resources may not be included.

Explore Similar Projects

awesome-hungarian-nlp by oroszgy

Portuguese-NLP by ajdavidl

awesome-japanese-nlp-resources by taishi-i

awsome-vietnamese-nlp by vndee

German-NLP by adbar

Awesome-Indonesia-NLP by irfnrdh

NLP-Resources by jia-zh

indicnlp_catalog by AI4Bharat

awesome-bangla by banglakit

The-NLP-Pandect by ivan-bilan

nlp by makcedward

awesome-nlp by keon