BBT-FinCUGE-Applications by supersymmetry-technologies

NLP benchmark for Chinese financial domain tasks

Created 3 years ago

279 stars

Top 93.2% on SourcePulse

Project Summary

This repository provides resources for Natural Language Processing (NLP) in the Chinese financial domain. It targets two gaps in the field: the shortage of large, diverse domain corpora and the lack of a standardized benchmark. To address them, it releases a corpus, a pre-trained model, and an evaluation benchmark for researchers and practitioners working to improve financial NLP models.

How It Works

The project has three components:
BBT-FinCorpus: a 300GB corpus of Chinese financial text drawn from sources such as company announcements, research reports, financial news, and social media.
BBT-FinT5: a 1-billion-parameter, knowledge-enhanced pre-trained language model built on the T5 architecture. It uses a triplet masking approach designed to improve the model's understanding of entity knowledge.
CFLEB: a benchmark for Chinese financial NLP, described by the authors as the first of its kind. It comprises six tasks (summarization, question answering, classification, relation extraction, sentiment analysis, and negative news detection) to support standardized model evaluation.
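The summary names triplet masking but does not describe how it works. The Python sketch below is a rough illustration only. It assumes that the sentence contains one element of a knowledge triple (head entity, relation, or tail entity), that this element is replaced by a T5 sentinel token, and that the model must regenerate it in the standard T5 span-corruption input/target format. The function `triplet_mask`, the sentinel handling, and the example sentence are all hypothetical. The project's actual pre-training procedure may differ.

```python
import random
from typing import Optional, Tuple

# T5 sentinel tokens: the input marks the masked span, the target restores it.
FIRST_SENTINEL = "<extra_id_0>"
CLOSING_SENTINEL = "<extra_id_1>"


def triplet_mask(
    sentence: str,
    triple: Tuple[str, str, str],
    rng: random.Random,
) -> Optional[Tuple[str, str]]:
    """Hypothetical helper: mask one (head, relation, tail) element found in the sentence.

    Returns an (input_text, target_text) pair in T5 span-corruption format,
    or None when no element of the triple occurs in the sentence.
    """
    # Only elements that literally appear in the sentence can be masked.
    candidates = [element for element in triple if element and element in sentence]
    if not candidates:
        return None
    masked = rng.choice(candidates)
    # Replace the first occurrence of the chosen element with the sentinel.
    input_text = sentence.replace(masked, FIRST_SENTINEL, 1)
    # Target: the sentinel, the removed span, then a closing sentinel.
    target_text = f"{FIRST_SENTINEL} {masked} {CLOSING_SENTINEL}"
    return input_text, target_text


if __name__ == "__main__":
    rng = random.Random(0)
    # Made-up sentence and triple for illustration (not from BBT-FinCorpus).
    sentence = "甲公司的控股股东是乙集团。"
    triple = ("甲公司", "控股股东", "乙集团")
    print(triplet_mask(sentence, triple, rng))
```

This sketch masks a knowledge-bearing span rather than a random one. That choice reflects the stated goal of the approach: pushing the model to recover entity and relation knowledge.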

Quick Start & Requirements

BBT-FinCorpus: Request access by emailing model@ssymmetry.com with the subject "BBT-FinCorpus-{base or large}". State your identity, affiliation, and intended usage.
BBT-FinT5 Models: Available in the Model folder of the GitHub repository.
CFLEB Benchmark: Datasets for each task are detailed within the README.
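To make the access-request step above concrete, the Python sketch below drafts the email with the standard library's `email.message.EmailMessage`. The summary specifies only the recipient address, the subject pattern, and the three pieces of information to include. The helper name `build_access_request` and the body layout are assumptions. Actually sending the message (for example with `smtplib`) is left out.

```python
from email.message import EmailMessage

# Address and subject pattern come from the project summary; the body layout is an assumption.
REQUEST_ADDRESS = "model@ssymmetry.com"


def build_access_request(size: str, identity: str, affiliation: str, usage: str) -> EmailMessage:
    """Hypothetical helper that drafts a BBT-FinCorpus access-request email."""
    if size not in ("base", "large"):
        raise ValueError("size must be 'base' or 'large'")
    msg = EmailMessage()
    msg["To"] = REQUEST_ADDRESS
    msg["Subject"] = f"BBT-FinCorpus-{size}"
    # The summary asks requesters to state identity, affiliation, and intended usage.
    msg.set_content(
        f"Identity: {identity}\n"
        f"Affiliation: {affiliation}\n"
        f"Intended usage: {usage}\n"
    )
    return msg


if __name__ == "__main__":
    draft = build_access_request(
        "base", "graduate student", "Example University", "academic research on financial NLP"
    )
    print(draft)
```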

Highlighted Details

The authors describe BBT-FinCorpus as the largest open-source Chinese financial corpus to date.
BBT-FinT5 is trained with DeepSpeed for acceleration and uses bfloat16 (BF16) precision for training stability.
CFLEB includes multiple leaderboards for comprehensive model evaluation across understanding and generation tasks.
The project aims to bridge the gap between academic research and practical NLP applications in finance.
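The summary says only that BBT-FinT5 training uses DeepSpeed with BF16. The Python sketch below shows what a minimal DeepSpeed configuration enabling BF16 might look like. The `bf16` and `zero_optimization` keys are standard DeepSpeed config fields. The batch sizes, ZeRO stage, and gradient clipping value are hypothetical placeholders, not the project's actual settings.

```python
import json

# Placeholder DeepSpeed settings: the summary states only that DeepSpeed and BF16
# were used; batch sizes, ZeRO stage, and clipping below are hypothetical.
ds_config = {
    "train_micro_batch_size_per_gpu": 8,
    "gradient_accumulation_steps": 4,
    "gradient_clipping": 1.0,
    # bfloat16 mixed precision; "fp16" must not be enabled at the same time.
    "bf16": {"enabled": True},
    # ZeRO stage 2 partitions optimizer states and gradients across GPUs.
    "zero_optimization": {"stage": 2},
}

config_json = json.dumps(ds_config, indent=2)

if __name__ == "__main__":
    # Save this as a JSON file and pass it to the DeepSpeed launcher or trainer integration.
    print(config_json)
```

The README does not explain why BF16 helps stability. As general background, BF16 keeps FP32's 8-bit exponent, so its dynamic range matches FP32. This avoids the overflow and underflow that force FP16 training to rely on loss scaling, and it is a likely reason for the stability claim.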

Maintenance & Community

The project is associated with supersymmetry-technologies. Further community or roadmap details are not explicitly provided in the README.

Licensing & Compatibility

The README does not specify a license. Users should verify licensing for BBT-FinCorpus and BBT-FinT5 models before commercial use.

Limitations & Caveats

The project is research-oriented. The full corpus must be requested explicitly, and model availability should be checked against the repository's Model folder. The README also mentions an earlier version of the benchmark (FinCUGE) with different tasks, now superseded by CFLEB.


Explore Similar Projects

FinQwen by Tongyi-EconML

FinLLMs by adlnlp

awesome-chinese-ner by taishan1994

bertNER by yumath

NER-BERT-pytorch by lemonhu

ditto by megagonlabs

PIXIU by The-FinAI

FinBERT by valuesimplex

FinBERT by yya518

Chinese-XLNet by ymcui

nlp_chinese_corpus by brightmart

text_classification by brightmart