Persian language model based on Google's BERT architecture
Top 75.8% on sourcepulse
ParsBERT is a Transformer-based language model specifically designed for Persian natural language understanding tasks. It offers pre-trained models for sentiment analysis, text classification, and named entity recognition, outperforming existing Persian NLP models and multilingual alternatives.
How It Works
ParsBERT is built on Google's BERT architecture and pre-trained on a large Persian corpus of more than 3.9 million documents. The training pipeline included extensive pre-processing, such as POS tagging and WordPiece segmentation, to handle the particulars of Persian orthography, notably the zero-width non-joiner (ZWNJ) character. This preparation is what gives the model its robust performance across downstream NLP tasks.
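To make the ZWNJ point concrete, the small sketch below tokenizes a Persian word written with a zero-width non-joiner and prints the resulting WordPiece tokens. The checkpoint name is the one listed in the Quick Start section; the example sentence and the act of inspecting the split are illustrative, not a documented test from the project.

```python
from transformers import AutoTokenizer

# Checkpoint name taken from the Quick Start section; the exact token split shown by
# this script is illustrative, not a documented result of the project.
tokenizer = AutoTokenizer.from_pretrained("HooshvareLab/bert-fa-zwnj-base")

# "می‌خواهم" ("I want") is written with a zero-width non-joiner (U+200C) between its parts.
word = "می\u200cخواهم"
print(tokenizer.tokenize(word))  # inspect how the WordPiece tokenizer segments the ZWNJ-joined form
```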
Quick Start & Requirements
ParsBERT is distributed through the Hugging Face `transformers` library. The v3.0 model is published as `HooshvareLab/bert-fa-zwnj-base` and can be loaded with `AutoTokenizer` and `AutoModel`, as shown in the sketch below.
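A minimal loading sketch, assuming `transformers` and PyTorch are installed (`pip install transformers torch`); the sample sentence is an arbitrary illustration:

```python
from transformers import AutoModel, AutoTokenizer

model_name = "HooshvareLab/bert-fa-zwnj-base"  # ParsBERT v3.0 checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)

# Encode a short Persian sentence ("The Persian language is beautiful") and get contextual embeddings.
inputs = tokenizer("زبان فارسی زیباست", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (batch_size, sequence_length, hidden_size)
```

Task-specific fine-tuned checkpoints (sentiment analysis, text classification, NER) follow the same loading pattern with the corresponding `AutoModelFor...` class.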
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
The latest release (v3.0) dates to 2021, so more recent Persian NLP developments and research are not incorporated. Benchmarks are reported, but hardware requirements for fine-tuning or running the larger models are not documented.
Last commit: 2 years ago · Status: Inactive