BBT-FinCUGE-Applications by supersymmetry-technologies

NLP benchmark for Chinese financial domain tasks

created 3 years ago
271 stars

Project Summary

This repository provides resources for advancing Natural Language Processing (NLP) in the Chinese financial domain. It addresses the need for larger, more diverse datasets and standardized benchmarks, offering a comprehensive solution for researchers and practitioners aiming to improve financial NLP model performance.

How It Works

The project introduces BBT-FinCorpus, a 300GB corpus of Chinese financial text drawn from diverse sources such as company announcements, research reports, financial news, and social media. It also presents BBT-FinT5, a 1-billion-parameter knowledge-enhanced pre-trained language model built on the T5 architecture, which uses a novel triplet masking approach to improve its understanding of entity knowledge. Finally, it offers CFLEB, described as the first benchmark for Chinese financial NLP. CFLEB comprises six tasks (summarization, question answering, classification, relation extraction, sentiment analysis, and negative news detection) to support standardized model evaluation.
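
To make the idea of triplet masking more concrete, here is a minimal sketch that turns a (head, relation, tail) knowledge triplet plus a sentence mentioning it into one T5-style training pair: one element of the triplet is replaced by a sentinel token and the target asks the model to reconstruct it. The linearization format ("head | relation | tail ; sentence"), the function name `mask_triplet`, and the use of T5's `<extra_id_0>`/`<extra_id_1>` sentinels are assumptions for illustration only, not the project's actual preprocessing code.

```python
import random
from typing import Optional, Tuple

# (head entity, relation, tail entity)
Triplet = Tuple[str, str, str]

SENTINEL_0 = "<extra_id_0>"  # T5's first span-corruption sentinel
SENTINEL_1 = "<extra_id_1>"  # closes the target span, as in T5 pre-training


def mask_triplet(
    sentence: str,
    triplet: Triplet,
    position: Optional[int] = None,
    rng: Optional[random.Random] = None,
) -> Tuple[str, str]:
    """Return a (source, target) text pair for one triplet-masking example.

    Hypothetical format: the triplet is linearized as "head | relation | tail",
    one element is replaced by a sentinel, and the result is prepended to the
    sentence. The target reconstructs the masked element.
    """
    if position is None:
        # Pick head (0), relation (1) or tail (2) uniformly at random.
        position = (rng or random.Random()).randrange(len(triplet))
    if not 0 <= position < len(triplet):
        raise ValueError("position must be 0 (head), 1 (relation) or 2 (tail)")
    parts = list(triplet)
    answer = parts[position]
    parts[position] = SENTINEL_0
    source = " | ".join(parts) + " ; " + sentence
    target = f"{SENTINEL_0} {answer} {SENTINEL_1}"
    return source, target


if __name__ == "__main__":
    # Hypothetical placeholder entities, not real data from BBT-FinCorpus.
    src, tgt = mask_triplet(
        "CompanyA disclosed that its controlling shareholder is CompanyB.",
        ("CompanyA", "controlling shareholder", "CompanyB"),
        position=2,
    )
    print(src)  # CompanyA | controlling shareholder | <extra_id_0> ; CompanyA disclosed ...
    print(tgt)  # <extra_id_0> CompanyB <extra_id_1>
```

Masking a whole triplet element (rather than random subword spans) forces the model to recall the entity or relation from context, which is the intuition behind knowledge-enhanced masking; whether the original method also corrupts the sentence itself is not stated here.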

Quick Start & Requirements

  • BBT-FinCorpus: Request access by emailing model@ssymmetry.com with the subject "BBT-FinCorpus-{base or large}" and stating your identity, affiliation, and intended use.
  • BBT-FinT5 Models: Available in the Model folder of the GitHub repository.
  • CFLEB Benchmark: The datasets for each task are described in the README.
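
The corpus access request above can be drafted programmatically with Python's standard `email` package. This is a minimal sketch: the contact address and subject pattern come from the README, while the function name `build_corpus_request` and the body layout (one line each for identity, affiliation, and intended use) are hypothetical choices; sending the message (for example with `smtplib`) is left out.

```python
from email.message import EmailMessage

CONTACT = "model@ssymmetry.com"  # address given in the README


def build_corpus_request(
    variant: str, identity: str, affiliation: str, usage: str, sender: str
) -> EmailMessage:
    """Draft a BBT-FinCorpus access request (hypothetical body layout)."""
    if variant not in ("base", "large"):
        raise ValueError("variant must be 'base' or 'large'")
    msg = EmailMessage()
    msg["To"] = CONTACT
    msg["From"] = sender
    # Subject pattern from the README: "BBT-FinCorpus-{base or large}".
    msg["Subject"] = f"BBT-FinCorpus-{variant}"
    msg.set_content(
        f"Identity: {identity}\n"
        f"Affiliation: {affiliation}\n"
        f"Intended use: {usage}\n"
    )
    return msg


if __name__ == "__main__":
    draft = build_corpus_request(
        "large", "PhD student", "Example University", "Academic research", "me@example.edu"
    )
    print(draft["Subject"])  # BBT-FinCorpus-large
```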

Highlighted Details

  • According to the project, BBT-FinCorpus is the largest open-source Chinese financial corpus released to date.
  • BBT-FinT5 was trained with DeepSpeed to accelerate training and with bfloat16 (BF16) precision for numerical stability.
  • CFLEB includes multiple leaderboards for comprehensive model evaluation across understanding and generation tasks.
  • The project aims to bridge the gap between academic research and practical NLP applications in finance.
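
For readers unfamiliar with the DeepSpeed and BF16 setup mentioned above, the sketch below builds a DeepSpeed configuration dictionary that enables bfloat16 training and serializes it to the JSON form DeepSpeed reads. The keys (`train_micro_batch_size_per_gpu`, `gradient_accumulation_steps`, `bf16`, `zero_optimization`, `gradient_clipping`) are standard DeepSpeed config fields, but every value here, including the ZeRO stage and the helper name `make_deepspeed_config`, is a hypothetical placeholder and not the configuration actually used to train BBT-FinT5.

```python
import json


def make_deepspeed_config(
    micro_batch_per_gpu: int, grad_accum_steps: int, zero_stage: int = 2
) -> dict:
    """Return an illustrative DeepSpeed config with BF16 enabled.

    All values are hypothetical placeholders, not the BBT-FinT5 settings.
    """
    if zero_stage not in (0, 1, 2, 3):
        raise ValueError("ZeRO stage must be 0, 1, 2 or 3")
    return {
        "train_micro_batch_size_per_gpu": micro_batch_per_gpu,
        "gradient_accumulation_steps": grad_accum_steps,
        # bfloat16 keeps FP32's exponent range, so loss scaling is not needed.
        "bf16": {"enabled": True},
        "zero_optimization": {"stage": zero_stage},
        "gradient_clipping": 1.0,
    }


if __name__ == "__main__":
    config = make_deepspeed_config(micro_batch_per_gpu=4, grad_accum_steps=8)
    # This JSON could be saved as the config file passed to a DeepSpeed run.
    print(json.dumps(config, indent=2))
```

BF16 is often preferred over FP16 for large-model pre-training because its wider exponent range makes overflow and underflow less likely, which matches the stability motivation given above.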

Maintenance & Community

The project is maintained by supersymmetry-technologies. The README does not describe community channels or a roadmap.

Licensing & Compatibility

The README does not specify a license. Users should verify licensing for BBT-FinCorpus and BBT-FinT5 models before commercial use.

Limitations & Caveats

The project is research-oriented, and access to the full corpus and models requires an explicit request. The README also mentions an earlier version of the benchmark (FinCUGE) with a different task set, which CFLEB now supersedes.

Health Check
Last commit

2 years ago

Responsiveness

Inactive

Pull Requests (30d)
0
Issues (30d)
0
Star History
7 stars in the last 90 days
