FinGLM by MetaGLM

FinGLM: Open-source project for AI in finance

created 1 year ago
2,032 stars

Top 22.3% on sourcepulse

Project Summary

FinGLM is an open-source, public-benefit project that aims to advance "AI + Finance" by building a long-term, openly developed financial large language model. Its focus is a conversational system that delivers expert-level analysis of company annual reports, and it targets developers and researchers working on financial data analysis and LLM applications.

How It Works

The project converts company annual reports from PDF to TXT, splits the extracted data into categories (basic information, financial data, and comprehensive information), computes derived metrics such as cost rates and growth rates, and stores the results in a SQL, MongoDB, or Elasticsearch database. Models are then fine-tuned with strategies such as P-Tuning v2 or LoRA. At question time, the pipeline takes the user's input, generates a prompt, queries the database, and synthesizes an answer, using the LLM for both information extraction and response generation.
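The flow above (derive metrics, store them, then answer a question by extracting entities, querying the database, and prompting the model) can be sketched in plain Python. This is a minimal illustration, not FinGLM's code: the `ReportRow` schema, the `financials` table, the cost-rate definition (operating cost divided by operating revenue), and the keyword-based `parse_question` are hypothetical stand-ins. SQLite substitutes for the SQL/MongoDB/Elasticsearch back ends, and `llm` is any callable that maps a prompt to text.

```python
import re
import sqlite3
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ReportRow:
    """Hypothetical record of raw figures extracted from one annual report."""
    company: str
    year: int
    revenue: float         # operating revenue
    operating_cost: float  # operating cost


def growth_rate(current: float, previous: float) -> Optional[float]:
    """Year-over-year growth: (current - previous) / previous."""
    return None if previous == 0 else (current - previous) / previous


def cost_rate(cost: float, revenue: float) -> Optional[float]:
    """Assumed definition: operating cost as a share of operating revenue."""
    return None if revenue == 0 else cost / revenue


def build_db(rows: list[ReportRow]) -> sqlite3.Connection:
    """Compute derived metrics and store everything in an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE financials (company TEXT, year INTEGER, revenue REAL, "
        "operating_cost REAL, cost_rate REAL, revenue_growth REAL)"
    )
    by_key = {(r.company, r.year): r for r in rows}
    for r in rows:
        prev = by_key.get((r.company, r.year - 1))
        growth = growth_rate(r.revenue, prev.revenue) if prev is not None else None
        conn.execute(
            "INSERT INTO financials VALUES (?, ?, ?, ?, ?, ?)",
            (r.company, r.year, r.revenue, r.operating_cost,
             cost_rate(r.operating_cost, r.revenue), growth),
        )
    conn.commit()
    return conn


# Whitelist mapping question phrases to columns; longer phrases come first
# so "revenue growth" is matched before "revenue".
METRICS = {"revenue growth": "revenue_growth", "cost rate": "cost_rate", "revenue": "revenue"}


def parse_question(question: str, companies: list[str]):
    """Toy keyword stand-in for the LLM information-extraction step."""
    q = question.lower()
    company = next((c for c in companies if c.lower() in q), None)
    year = re.search(r"\b(20\d{2})\b", q)
    column = next((col for phrase, col in METRICS.items() if phrase in q), None)
    return company, int(year.group(1)) if year else None, column


def answer(question: str, conn: sqlite3.Connection, llm: Callable[[str], str]) -> str:
    """Question -> extraction -> database query -> prompt -> LLM answer."""
    companies = [c for (c,) in conn.execute("SELECT DISTINCT company FROM financials")]
    company, year, column = parse_question(question, companies)
    if company is None or year is None or column is None:
        # Nothing to look up: fall back to a plain prompt.
        return llm(f"Answer from general knowledge: {question}")
    # `column` comes from the METRICS whitelist, so interpolating it is safe.
    row = conn.execute(
        f"SELECT {column} FROM financials WHERE company = ? AND year = ?",
        (company, year),
    ).fetchone()
    value = row[0] if row else None
    prompt = (
        f"Question: {question}\n"
        f"Database result: {column}={value}\n"
        "Answer concisely using only the database result."
    )
    return llm(prompt)


if __name__ == "__main__":
    conn = build_db([ReportRow("ACME", 2020, 100.0, 60.0),
                     ReportRow("ACME", 2021, 120.0, 78.0)])
    echo_llm = lambda prompt: prompt  # replace with a real model call
    print(answer("What was ACME's revenue growth in 2021?", conn, echo_llm))
```

With the echo stand-in, the printed prompt contains `revenue_growth=0.2`. In a real setup, `llm` would wrap a fine-tuned model, and the extraction step would typically be LLM-driven rather than keyword-based.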

Quick Start & Requirements

  • Dataset Download:
    • PDFs (69GB): git clone http://www.modelscope.cn/datasets/modelscope/chatglm_llm_fintech_raw_dataset.git (requires Git LFS)
    • TXTs: wget https://sail-moe.oss-cn-hangzhou.aliyuncs.com/open_data/hackathon_chatglm_fintech/alltxt.zip
    • HTMLs: wget https://sail-moe.oss-cn-hangzhou.aliyuncs.com/open_data/hackathon_chatglm_fintech/allhtml.zip
  • ModelScope SDK: pip3 install "modelscope==1.7.2rc0"
  • Datasets Library: pip3 install datasets==2.13.0
  • Prerequisites: Python, Git LFS, ModelScope SDK, Datasets library.
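If `wget` is unavailable, the two zip archives can also be fetched with the Python standard library. This is a minimal sketch: `fetch_and_extract` is a hypothetical helper, not part of FinGLM or the ModelScope SDK, and it does not cover the 69GB PDF repository, which still needs `git clone` with Git LFS.

```python
import urllib.parse
import urllib.request
import zipfile
from pathlib import Path

# Archive URLs copied from the Quick Start list.
TXT_URL = "https://sail-moe.oss-cn-hangzhou.aliyuncs.com/open_data/hackathon_chatglm_fintech/alltxt.zip"
HTML_URL = "https://sail-moe.oss-cn-hangzhou.aliyuncs.com/open_data/hackathon_chatglm_fintech/allhtml.zip"


def fetch_and_extract(url: str, dest_dir: Path) -> list[Path]:
    """Download a zip archive (once) into dest_dir and extract it into a subfolder.

    Returns the sorted list of extracted files.
    """
    dest_dir.mkdir(parents=True, exist_ok=True)
    # Name the local archive after the last path segment of the URL.
    archive = dest_dir / Path(urllib.parse.urlparse(url).path).name
    if not archive.exists():  # avoid re-downloading a large archive
        urllib.request.urlretrieve(url, archive)
    target = dest_dir / archive.stem  # e.g. alltxt.zip -> dest_dir/alltxt
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(target)
    return sorted(p for p in target.rglob("*") if p.is_file())
```

For example, `fetch_and_extract(TXT_URL, Path("data"))` would place the TXT files under `data/alltxt/`.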

Highlighted Details

  • Provides 70GB+ of annual report data (2019-2021) and 10,000+ manually annotated Q&A pairs.
  • Includes code and models from multiple winning teams of the SMP 2023 ChatGLM Financial Large Model Challenge.
  • Offers comprehensive learning tutorials covering data preprocessing, database usage, GLM utilization, prompt engineering, and model fine-tuning.
  • A dedicated fund of ¥100,000 and compute resources are available for project development.

Maintenance & Community

The project is maintained by a collective of teams and individuals. New participants are encouraged to apply to join, the community communicates through regular channels, and learning and contribution resources are readily available.

Licensing & Compatibility

The project itself appears to be open for research and non-commercial use. However, users must adhere to the specific licenses of underlying models, such as ChatGLM-6B, when using them for commercial purposes.

Limitations & Caveats

The project is primarily intended for research and non-commercial use. Commercial application is not recommended without careful adherence to the licensing terms of individual components, particularly the base LLMs.

Health Check

Last commit: 1 year ago

Responsiveness: 1 week

Pull Requests (30d): 0

Issues (30d): 0

Star History: 79 stars in the last 90 days
