NLP resource list for Hungarian
This repository is a curated list of open-source Natural Language Processing (NLP) resources specifically for the Hungarian language. It serves as a comprehensive catalog for researchers, developers, and students working with Hungarian text data, aiming to centralize tools, models, datasets, and learning materials.
How It Works
The list is organized into logical categories, covering the entire NLP pipeline from basic text processing (tokenization, morphology) to advanced tasks like named entity recognition, sentiment analysis, and machine translation. It highlights resources with features like ease of installation, commercial-friendly licenses, and availability of pre-trained models, providing quick indicators for adoption suitability.
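One way to picture how such indicators could support a quick adoption check is the minimal sketch below. It is not the repository's actual data format: the `Resource` record, the `shortlist` helper, and every entry in `CATALOG` are hypothetical placeholders, and the flag values are illustrative rather than facts about any real tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    """Hypothetical catalog entry carrying the three indicators named above."""
    name: str
    category: str        # e.g. "tokenization", "morphology", "ner"
    easy_install: bool   # installable with a single command
    commercial_ok: bool  # license permits commercial use
    pretrained: bool     # ships with pre-trained models


# Placeholder entries: names and flags are invented for illustration only.
CATALOG = [
    Resource("example-tokenizer", "tokenization", True, True, False),
    Resource("example-morph", "morphology", False, True, False),
    Resource("example-ner", "ner", True, False, True),
    Resource("example-lm", "language model", True, True, True),
]


def shortlist(catalog, *, category=None, easy_install=False,
              commercial_ok=False, pretrained=False):
    """Return the names of entries that carry every requested indicator."""
    names = []
    for res in catalog:
        if category is not None and res.category != category:
            continue
        if easy_install and not res.easy_install:
            continue
        if commercial_ok and not res.commercial_ok:
            continue
        if pretrained and not res.pretrained:
            continue
        names.append(res.name)
    return names


if __name__ == "__main__":
    # Entries that are both easy to install and commercially usable.
    print(shortlist(CATALOG, easy_install=True, commercial_ok=True))
```

A flag left at its default of `False` means "no requirement", so each additional keyword narrows the result rather than excluding entries that happen to lack an unrequested feature.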
Quick Start & Requirements
Installation via pip (e.g., pip install huntoken, pip install huspacy).
Highlighted Details
Morphological analysis and POS tagging tools such as emMorph, hunmorph, and hunpos.
Pre-trained language models such as huBERT, PULI-GPTrio, and SambaLingo-Hungarian-Base.
Maintenance & Community
Links to community channels (e.g., HuNLP Slack) and relevant academic groups (e.g., BME, RIL-MTA) are provided.
Licensing & Compatibility
Limitations & Caveats
The list is a curated collection, and the quality, maintenance status, and ease of use can vary significantly between individual resources. Users should independently verify the suitability and current state of each tool or dataset.