NLP_bahasa_resources  by louisowen6

Curated list of NLP datasets/libraries for Bahasa Indonesia

created 5 years ago
539 stars

Top 59.7% on sourcepulse

Project Summary

This repository serves as a comprehensive, curated index of datasets, libraries, and pre-trained models for Natural Language Processing (NLP) tasks specifically in Bahasa Indonesia. It targets researchers, developers, and practitioners working with Indonesian language data, aiming to streamline the discovery and utilization of relevant resources.

How It Works

The project functions as a directory, aggregating links to various GitHub repositories, Hugging Face datasets, academic papers, and other online resources. It categorizes these links by NLP task (e.g., Named Entity Recognition, Sentiment Analysis, Text Summarization) and resource type (e.g., Corpus, Pre-trained Models, Usable Libraries), providing a structured overview of the Indonesian NLP ecosystem.
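The two-way organization described above (by NLP task and by resource type) can be modeled as a small index keyed on (task, resource type). The sketch below is illustrative only: the `Resource` class, the `build_index` and `lookup` helpers, and the example entries and URLs are hypothetical placeholders. They are not taken from the repository, which is a plain list of links rather than a queryable database.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    # Hypothetical record for one linked resource.
    name: str
    url: str   # placeholder; real entries point to GitHub, Hugging Face, papers, etc.
    task: str  # e.g. "Named Entity Recognition", "Sentiment Analysis"
    kind: str  # e.g. "Corpus", "Pre-trained Models", "Usable Libraries"


def build_index(resources):
    """Group resources by (task, kind), mirroring the list's sections."""
    index = defaultdict(list)
    for r in resources:
        index[(r.task, r.kind)].append(r)
    return index


def lookup(index, task, kind=None):
    """Return resources for a task, optionally restricted to one resource type."""
    return [
        r
        for (t, k), items in index.items()
        if t == task and (kind is None or k == kind)
        for r in items
    ]


# Hypothetical placeholder entries, not real resources from the list.
entries = [
    Resource("example-ner-corpus", "https://example.org/ner", "Named Entity Recognition", "Corpus"),
    Resource("example-sentiment-model", "https://example.org/sa", "Sentiment Analysis", "Pre-trained Models"),
    Resource("example-ner-model", "https://example.org/ner-model", "Named Entity Recognition", "Pre-trained Models"),
]
idx = build_index(entries)
ner_names = [r.name for r in lookup(idx, "Named Entity Recognition")]
```

Keying on the (task, kind) pair lets a reader ask either "everything for NER" or "only pre-trained models for NER" without duplicating entries across sections.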

Quick Start & Requirements

This repository is a collection of links and does not have a direct installation or execution command. Users need to navigate to the linked resources to download datasets or install libraries.

Highlighted Details

  • Extensive coverage across numerous NLP tasks relevant to Bahasa Indonesia.
  • Includes links to both raw datasets and pre-trained models like Indo-BERT.
  • Features pointers to usable libraries and APIs for Indonesian NLP, such as Pujangga and Sastrawi.
  • Provides resources for data scraping (Twitter) and spelling correction.
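The summary does not describe how the linked spelling-correction resources work. As a rough illustration of one common approach, the sketch below picks the closest word from a vocabulary by string similarity, using Python's standard `difflib.get_close_matches`. This is an assumed technique, not necessarily what the linked tools do. The tiny `VOCAB` list, the `correct` helper and the `0.75` cutoff are all hypothetical; a practical corrector would rely on a large Indonesian word list.

```python
import difflib

# Hypothetical toy vocabulary; a real corrector would use a large Indonesian lexicon.
VOCAB = ["saya", "makan", "nasi", "goreng", "pergi", "sekolah"]


def correct(word, vocab=VOCAB, cutoff=0.75):
    """Return the closest vocabulary word, or the input unchanged if nothing is close enough."""
    w = word.lower()
    if w in vocab:
        return w
    # get_close_matches ranks candidates by SequenceMatcher similarity ratio.
    matches = difflib.get_close_matches(w, vocab, n=1, cutoff=cutoff)
    return matches[0] if matches else word
```

For example, `correct("sekolha")` returns `"sekolah"`, while a string with no close match, such as `"xyz"`, is returned unchanged.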

Maintenance & Community

The most recent update noted in the list is dated March 15, 2022. The repository credits several GitHub contributors and links to related "Awesome" lists for Indonesian NLP, indicating community interest.

Licensing & Compatibility

Licensing varies across the linked resources, so users must consult each individual repository for its license terms. Whether a resource can be used commercially depends on the license of that specific dataset or library.

Limitations & Caveats

The repository is a curated list, not an integrated toolkit; users must manage the dependencies and integration of each resource themselves. Because the list was last updated in March 2022, newer resources may be missing.

Health Check
Last commit

2 years ago

Responsiveness

1+ week

Pull Requests (30d)
0
Issues (30d)
0
Star History
23 stars in the last 90 days
