NLP benchmark for Chinese financial domain tasks
This repository provides resources for advancing Natural Language Processing (NLP) in the Chinese financial domain. It addresses the need for larger, more diverse datasets and standardized benchmarks by offering a large-scale corpus, a pre-trained language model, and an evaluation benchmark for researchers and practitioners aiming to improve financial NLP model performance.
How It Works
The project introduces BBT-FinCorpus, a 300GB corpus of Chinese financial text from diverse sources like company announcements, research reports, financial news, and social media. It also presents BBT-FinT5, a 1-billion parameter knowledge-enhanced pre-trained language model built on the T5 architecture, utilizing a novel triplet masking approach for improved entity knowledge understanding. Finally, it offers CFLEB, the first benchmark for Chinese financial NLP, comprising six tasks (summarization, QA, classification, relation extraction, sentiment analysis, negative news detection) to facilitate standardized model evaluation.
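The triplet masking idea can be illustrated with a small sketch. This is our interpretation of the approach, not the project's actual code: rather than corrupting random spans as in standard T5 pre-training, the tokens belonging to a knowledge triplet (head entity, relation, tail entity) found in the text are replaced with T5 sentinel tokens, so the model must reconstruct entity knowledge. The `triplet_mask` function, the example sentence, and the triplet are all illustrative assumptions.

```python
# Illustrative sketch of knowledge-triplet masking in the T5 span-corruption
# format (an interpretation of the described approach, not the project's code).

def triplet_mask(tokens, triplet):
    """Replace each token of a knowledge triplet with a T5 sentinel token.

    Returns the corrupted input sequence and the T5-style target sequence,
    where each sentinel in the input is followed in the target by the
    original token it replaced.
    """
    corrupted, targets = [], []
    sentinel_id = 0
    triplet_set = set(triplet)
    for tok in tokens:
        if tok in triplet_set:
            sentinel = f"<extra_id_{sentinel_id}>"
            corrupted.append(sentinel)           # hide the triplet token
            targets.extend([sentinel, tok])      # model must recover it
            sentinel_id += 1
        else:
            corrupted.append(tok)                # ordinary tokens pass through
    return corrupted, targets

# Hypothetical example: mask the triplet (Tencent, released, report).
tokens = "Tencent released its quarterly report yesterday".split()
inp, tgt = triplet_mask(tokens, ("Tencent", "released", "report"))
# inp → ['<extra_id_0>', '<extra_id_1>', 'its', 'quarterly', '<extra_id_2>', 'yesterday']
# tgt → ['<extra_id_0>', 'Tencent', '<extra_id_1>', 'released', '<extra_id_2>', 'report']
```

The target format mirrors T5's denoising objective, so a standard sequence-to-sequence training loop can consume these pairs unchanged.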
Quick Start & Requirements
Model checkpoints are available in the Model folder of the GitHub repository.
Highlighted Details
Maintenance & Community
The project is associated with supersymmetry-technologies. Further community or roadmap details are not explicitly provided in the README.
Licensing & Compatibility
The README does not specify a license. Users should verify licensing for BBT-FinCorpus and BBT-FinT5 models before commercial use.
Limitations & Caveats
The project is research-oriented, and access to the full corpus and models requires an explicit request. The README mentions an earlier version of the benchmark (FinCUGE) with different tasks, now superseded by CFLEB.
The repository was last updated about two years ago and appears inactive.