Portuguese-NLP by ajdavidl

NLP resources and tools focused on Portuguese

Created 4 years ago

363 stars

Top 77.2% on SourcePulse

Project Summary

This repository is a community-driven catalog of Natural Language Processing (NLP) resources for the Portuguese language. It consolidates datasets, lexicons, pre-trained models, and tools in one place for researchers and developers working on Portuguese NLP tasks.

How It Works

The project is a curated list organized into categories such as Datasets, Lexicons, Models, Frameworks, and Tools. Each entry links out to Hugging Face, a GitHub repository, or another relevant source, so readers can find and reach Portuguese NLP assets from one page. The list favors breadth, covering everything from foundational datasets to state-of-the-art language models.
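Because the list is organized as categories of outbound links, it can be indexed mechanically. The following is a minimal sketch, not the repository's own tooling: it assumes the README uses Markdown `#` headings for categories and inline `[label](url)` links for entries, and the function name `index_links` and the sample entries are hypothetical placeholders.

```python
import re
from collections import defaultdict
from urllib.parse import urlparse

# Matches Markdown inline links of the form [label](http...).
LINK_RE = re.compile(r"\[([^\]]+)\]\((https?://[^)\s]+)\)")
# Matches ATX headings such as "## Datasets".
HEADING_RE = re.compile(r"^#{1,6}\s+(.*\S)\s*$")


def index_links(markdown_text):
    """Group (label, url, host) tuples under the nearest preceding heading."""
    index = defaultdict(list)
    category = "Uncategorized"  # used for links that appear before any heading
    for line in markdown_text.splitlines():
        heading = HEADING_RE.match(line)
        if heading:
            category = heading.group(1)
            continue
        for label, url in LINK_RE.findall(line):
            index[category].append((label, url, urlparse(url).netloc))
    return dict(index)


# Hypothetical excerpt: the entries and URLs are placeholders, not real listings.
SAMPLE = """\
## Datasets
- [Example corpus](https://huggingface.co/datasets/example/corpus) - news text
## Models
- [Example BERT](https://github.com/example/bert-pt)
- [Example T5](https://huggingface.co/example/t5-pt)
"""

if __name__ == "__main__":
    for category, links in index_links(SAMPLE).items():
        # Show which host each entry in the category points to.
        print(category, [host for _, _, host in links])
```

Grouping by host makes it easy to see which entries live on Hugging Face and which on GitHub, which in turn tells you where to look for each resource's installation notes.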

Quick Start & Requirements

This repository is a list of resources, not a runnable software package. To use a listed resource, follow the installation and usage instructions for that specific tool or dataset, typically found on its GitHub or Hugging Face page.

Highlighted Details

Extensive collection of over 150 datasets, covering diverse domains like legal texts, news, social media, and literature.
A wide range of pre-trained models, including BERT, GPT, RoBERTa, and T5 variants specifically for Brazilian and European Portuguese.
Includes lexicons, word embeddings, and metrics tailored for Portuguese NLP tasks.
Features a leaderboard for evaluating Portuguese LLMs.

Maintenance & Community

The project is community-driven; contributors likely include researchers and institutions working on Portuguese NLP. The README does not list maintainers or community channels (e.g., Discord, Slack).

Licensing & Compatibility

Licensing varies by resource because the repository is a curated list of external resources. Users must check the license of each dataset, model, or tool against their intended use, especially for commercial applications.

Limitations & Caveats

As a curated list, the repository itself does not provide direct functionality. Users are responsible for navigating to and managing each individual resource. The quality and maintenance status of listed resources may vary, requiring user due diligence.
