Awesome-Indonesia-NLP by irfnrdh

NLP resource list for Bahasa Indonesia

created 5 years ago
270 stars

Top 95.9% on sourcepulse

Project Summary

This repository serves as a comprehensive, curated collection of resources for Natural Language Processing (NLP) specifically for the Indonesian language. It targets researchers, developers, and students working with Bahasa Indonesia, providing a centralized hub for datasets, academic papers, software, and tutorials to accelerate Indonesian NLP development.

How It Works

The project functions as a meta-resource, aggregating links and information from various sources. It categorizes resources by NLP task (e.g., summarization, parsing, sentiment analysis) and resource type (datasets, papers, software). This structured approach allows users to quickly find relevant materials without extensive searching across disparate platforms.
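The two-axis categorization (by NLP task and by resource type) can be modeled as a small catalog that is filtered on either axis. The Python sketch below is hypothetical and is not part of the repository, which is a plain curated list. The names `Resource`, `CATALOG`, and `find` are illustrative. The entries reuse resources named in this summary, and their task labels are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    name: str
    kind: str  # resource type, e.g. "dataset", "paper", "software"
    task: str  # NLP task the resource serves, e.g. "stemming", "speech"


# Hypothetical catalog: names come from this summary, task labels are assumed.
CATALOG = [
    Resource("Sastrawi", "software", "stemming"),
    Resource("Kateglo", "software", "dictionary"),
    Resource("TITML-IDN", "dataset", "speech"),
]


def find(catalog, kind=None, task=None):
    """Return entries matching the given type and/or task; None means 'any'."""
    return [
        r for r in catalog
        if (kind is None or r.kind == kind) and (task is None or r.task == task)
    ]


print([r.name for r in find(CATALOG, kind="software")])  # ['Sastrawi', 'Kateglo']
print([r.name for r in find(CATALOG, task="speech")])    # ['TITML-IDN']
```

The field is named `kind` rather than `type` so that it does not shadow Python's built-in. Passing neither filter returns the whole catalog.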

Quick Start & Requirements

  • Installation: No direct installation is required as this is a curated list of resources. Users will need to follow individual links to access datasets, papers, or software.
  • Prerequisites: Access to the internet is required. Specific software or libraries mentioned (e.g., Sastrawi, spaCy) will have their own installation and dependency requirements.
  • Links:

Highlighted Details

  • Extensive collection of Indonesian NLP datasets, including news articles, sentiment data, NER-tagged data, and speech corpora.
  • Curated list of academic papers and theses on Indonesian NLP topics, spanning various techniques and applications.
  • Links to relevant software and libraries, such as Sastrawi for stemming and Kateglo for Indonesian dictionaries.
  • Categorized list of NLP courses, tutorials, and code samples for practical learning.

Maintenance & Community

The repository is community-driven, with contributions encouraged via pull requests. It includes a FAQ section to address common queries.

Licensing & Compatibility

The repository itself is licensed under the MIT License, allowing for broad use and modification. However, users must adhere to the specific licenses of the individual resources linked within the repository, which may vary.

Limitations & Caveats

This is a curated list and not a functional library; users must individually acquire and integrate the resources. Some linked datasets or software may have specific usage restrictions (e.g., academic/non-commercial use for the TITML-IDN speech corpus). The project's scope is limited to Indonesian NLP, and its maintenance depends on community contributions.

Health Check
Last commit

5 years ago

Responsiveness

Inactive

Pull Requests (30d)
0
Issues (30d)
0
Star History
2 stars in the last 90 days
