Portuguese-NLP by ajdavidl

NLP resources and tools focused on Portuguese

created 3 years ago
286 stars

Top 92.5% on sourcepulse

Project Summary

This repository serves as a comprehensive, community-driven catalog of Natural Language Processing (NLP) resources specifically curated for the Portuguese language. It aims to consolidate datasets, lexicons, pre-trained models, and tools, providing a centralized hub for researchers and developers working with Portuguese NLP tasks.

How It Works

The project is a curated list organized into categories such as Datasets, Lexicons, Models, Frameworks, and Tools. It links to Hugging Face, GitHub repositories, and other sources, making a wide range of Portuguese NLP assets easier to discover and access. The emphasis is on breadth, covering everything from foundational datasets to state-of-the-art language models.

Quick Start & Requirements

This repository is a list of resources, not a runnable software package. To utilize the listed resources, users will need to follow the installation and usage instructions specific to each individual tool or dataset, typically found on their respective GitHub or Hugging Face pages.

Highlighted Details

  • Extensive collection of over 150 datasets, covering diverse domains like legal texts, news, social media, and literature.
  • A wide range of pre-trained models, including BERT, GPT, RoBERTa, and T5 variants specifically for Brazilian and European Portuguese.
  • Includes lexicons, word embeddings, and metrics tailored for Portuguese NLP tasks.
  • Features a leaderboard for evaluating Portuguese LLMs.

Maintenance & Community

The project is presented as community-driven, but the README does not name specific maintainers or contributing institutions, and it does not link to community channels (e.g., Discord, Slack).

Licensing & Compatibility

Licensing varies because this is a curated list of external resources. Users must check the license of each dataset, model, or tool to confirm it permits their intended use, especially for commercial applications.

Limitations & Caveats

As a curated list, the repository itself does not provide direct functionality. Users are responsible for navigating to and managing each individual resource. The quality and maintenance status of listed resources may vary, requiring user due diligence.

Health Check

Last commit: 1 month ago
Responsiveness: Inactive
Pull Requests (30d): 0
Issues (30d): 0
Star History: 10 stars in the last 90 days
