awsome-vietnamese-nlp  by vndee

NLP resources for Vietnamese

created 5 years ago
270 stars

Top 95.9% on sourcepulse

Project Summary

This repository is a curated collection of resources for Vietnamese Natural Language Processing (NLP), targeting researchers and developers working with the Vietnamese language. It provides a comprehensive overview of pre-trained models, datasets, and toolkits, aiming to accelerate development and research in Vietnamese NLP.

How It Works

The collection is organized into categories such as Large Language Models, Corpus, Text Processing Toolkits, Pre-trained Language Models, Sentiment Analysis, Named Entity Recognition, and Speech Processing. It lists various models and datasets, often with links to their respective repositories or papers, and includes benchmark results for sentiment analysis and named entity recognition tasks.
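Because the list is grouped by category with links per entry, it can be indexed programmatically. The following is a minimal sketch, assuming the README uses the common "awesome list" layout of Markdown headings followed by bulleted links; the actual layout may differ (for example, tables). `SAMPLE_README`, its placeholder URLs, and `index_by_category` are hypothetical and used only for illustration.

```python
import re

# Hypothetical README excerpt. Entry names come from the summary above;
# the URLs and descriptions are placeholders, not the real links.
SAMPLE_README = """\
## Large Language Models
- [PhoGPT](https://example.com/phogpt) - placeholder description
- [SeaLLM](https://example.com/seallm) - placeholder description

## Text Processing Toolkits
- [underthesea](https://example.com/underthesea) - placeholder description
"""

# A heading of level 2 or deeper starts a new category.
HEADING = re.compile(r"^#{2,}\s+(.*\S)\s*$")
# A bullet whose first element is a Markdown link: - [name](url)
LINK_ITEM = re.compile(r"^\s*[-*]\s+\[([^\]]+)\]\(([^)]+)\)")


def index_by_category(markdown: str) -> dict[str, list[tuple[str, str]]]:
    """Map each category heading to the (name, url) pairs listed under it."""
    index: dict[str, list[tuple[str, str]]] = {}
    current = None
    for line in markdown.splitlines():
        heading = HEADING.match(line)
        if heading:
            current = heading.group(1)
            index.setdefault(current, [])
            continue
        item = LINK_ITEM.match(line)
        # Ignore links that appear before the first heading.
        if item and current is not None:
            index[current].append((item.group(1), item.group(2)))
    return index


for category, entries in index_by_category(SAMPLE_README).items():
    print(category, [name for name, _ in entries])
```

An index like this makes it easier to filter the list by task before following the individual project links.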

Quick Start & Requirements

This is a curated list, not a runnable project. To use any of the listed resources, users must refer to the individual project links provided within the README for installation and usage instructions. Requirements vary significantly per resource, ranging from standard Python environments to specific deep learning frameworks and hardware (e.g., GPUs for LLMs).

Highlighted Details

  • Features a wide array of Vietnamese-specific LLMs like PhoGPT, SeaLLM, and VinaLLaMA.
  • Includes extensive datasets for various NLP tasks, from general text corpora like VN News Corpus to specialized sentiment analysis and NER datasets.
  • Lists multiple Vietnamese NLP toolkits (e.g., VnCoreNLP, pyvi, underthesea) for tasks like tokenization, POS tagging, and dependency parsing.
  • Provides benchmark results for Sentiment Analysis and Named Entity Recognition tasks, comparing various models and approaches.
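To illustrate why tokenization toolkits matter for Vietnamese: words are often made of several whitespace-separated syllables, so splitting on spaces alone does not yield words. The sketch below uses greedy longest matching against a toy lexicon. This is a swapped-in textbook technique chosen for illustration, not how VnCoreNLP, pyvi, or underthesea are implemented; `LEXICON`, `MAX_WORD_SYLLABLES`, and `segment` are hypothetical names.

```python
import unicodedata

# Hypothetical toy lexicon of multi-syllable words. Real toolkits rely on
# large dictionaries and trained models instead.
LEXICON = {
    unicodedata.normalize("NFC", word)
    for word in ("hà nội", "thủ đô", "việt nam")
}
MAX_WORD_SYLLABLES = 3  # assumed upper bound for this toy example


def segment(sentence: str) -> list[str]:
    """Greedy longest-match segmentation over whitespace-separated syllables.

    Syllables of a multi-syllable word are joined with underscores.
    """
    syllables = unicodedata.normalize("NFC", sentence).split()
    words = []
    i = 0
    while i < len(syllables):
        # Try the longest candidate first, then shrink down to two syllables.
        for size in range(min(MAX_WORD_SYLLABLES, len(syllables) - i), 1, -1):
            candidate = " ".join(syllables[i:i + size])
            if candidate.lower() in LEXICON:
                words.append("_".join(syllables[i:i + size]))
                i += size
                break
        else:
            # No multi-syllable match: the single syllable is its own word.
            words.append(syllables[i])
            i += 1
    return words


print(segment("Hà Nội là thủ đô của Việt Nam"))
```

With this lexicon, the eight syllables of the sample sentence collapse into five words. For real text, one of the listed toolkits is the appropriate choice, since a fixed lexicon cannot resolve ambiguous or unseen words.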

Maintenance & Community

The project is community-driven and accepts contributions via pull requests or issues. The README does not name specific maintainers or community channels.

Licensing & Compatibility

Licensing varies by individual resource. Users must consult the license of each specific model, dataset, or toolkit. Compatibility for commercial use or closed-source linking depends entirely on the licenses of the individual components.

Limitations & Caveats

This is a directory of resources, not a unified framework. Users need to integrate and manage individual components themselves. Some listed resources may be outdated or have limited community support.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0

Star History

13 stars in the last 90 days
