BERT model for financial NLP tasks
FinBERT is an open-source, BERT-based pre-trained language model designed for the Chinese financial domain. By leveraging a large corpus of financial data, it aims to improve performance on financial Natural Language Processing (NLP) tasks such as text classification, named entity recognition, and sentiment analysis. The target audience is NLP engineers and researchers working in financial technology.
How It Works
FinBERT uses the standard BERT architecture (Base and Large) but is pre-trained on a 3-billion-token corpus of financial news, research reports, company announcements, and financial encyclopedia entries. Key improvements over general-purpose BERT models include Financial Whole Word Masking (FWWM), which masks entire financial terms rather than individual sub-word units, and the addition of supervised tasks such as report industry classification and financial entity recognition during pre-training. This domain-specific training allows FinBERT to capture nuanced financial language and concepts more effectively.
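The whole-word-masking idea can be illustrated with a minimal sketch. This is not FinBERT's actual pre-training code; it assumes WordPiece-style tokens where a `##` prefix marks a continuation piece, and the token strings and function names are illustrative.

```python
def group_whole_words(tokens):
    """Group sub-token indices so each group spans one whole word."""
    groups = []
    for i, tok in enumerate(tokens):
        if tok.startswith("##") and groups:
            groups[-1].append(i)  # continuation piece joins the previous word
        else:
            groups.append([i])    # a new word starts here
    return groups

def mask_whole_words(tokens, word_ids):
    """Replace every sub-token of the selected words with [MASK]."""
    groups = group_whole_words(tokens)
    masked = list(tokens)
    for w in word_ids:
        for i in groups[w]:
            masked[i] = "[MASK]"
    return masked

# "volatility" is split into two pieces; whole word masking hides both
# together, whereas plain sub-word masking could mask only one piece.
tokens = ["market", "vol", "##atility", "rose"]
print(mask_whole_words(tokens, word_ids=[1]))
# → ['market', '[MASK]', '[MASK]', 'rose']
```

FWWM applies the same principle to multi-token financial terms, so the model must predict the entire term from context rather than completing it from its remaining pieces.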
Quick Start & Requirements
Highlighted Details
Maintenance & Community
The project is developed by the 熵简科技 AI Lab (contact: liyu@entropyreduce.com). FinBERT 2.0 and 3.0 are planned as future releases.
Licensing & Compatibility
The README does not explicitly state a license. Compatibility for commercial use or closed-source linking is not specified.
Limitations & Caveats
The project targets Chinese financial text only. The README does not specify a license, which may hinder commercial adoption, and model weights are distributed via direct download links rather than a package manager.