FinNLP by AI4Finance-Foundation

SDK for internet-scale financial data

Created 2 years ago
1,266 stars

Top 31.3% on SourcePulse

Project Summary

FinNLP is a toolkit for accessing and processing internet-scale financial data, aimed at researchers and developers who apply Large Language Models (LLMs) and Natural Language Processing (NLP) to financial markets. It offers pipelines for data acquisition and for LLM training and fine-tuning, with the goal of democratizing access to financial data.

How It Works

The project focuses on data ingestion from diverse sources, including financial news (Finnhub, Sina, Eastmoney), social media (Stocktwits, Reddit, Weibo), and company announcements (SEC, Juchao). It uses a modular design with a dedicated class for each data source, so users can download and process data from Python scripts. Proxy support is built in to reduce IP blocking during scraping.
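The one-class-per-source design with proxy rotation described above can be sketched as follows. This is a minimal illustration only: `SourceConfig`, `NewsSource`, `StaticNewsSource`, and their methods are hypothetical names chosen for this sketch and are not FinNLP's actual classes or API. The stand-in source returns canned records rather than calling any real service.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from itertools import cycle
from typing import Dict, Iterator, List, Optional


@dataclass
class SourceConfig:
    """Hypothetical per-source settings (not FinNLP's real config schema)."""
    token: Optional[str] = None  # API key for services that need one
    proxies: List[str] = field(default_factory=list)  # proxy URLs to rotate


class NewsSource(ABC):
    """Hypothetical base class: each data source gets its own subclass."""

    def __init__(self, config: SourceConfig) -> None:
        self.config = config
        # Rotate through the proxies forever; None means a direct connection.
        self._proxy_pool: Iterator[Optional[str]] = cycle(config.proxies or [None])

    def next_proxy(self) -> Optional[str]:
        """Return the proxy to use for the next request."""
        return next(self._proxy_pool)

    @abstractmethod
    def fetch(self, symbol: str, start: str, end: str) -> List[Dict[str, str]]:
        """Return records for `symbol` between `start` and `end`."""


class StaticNewsSource(NewsSource):
    """Stand-in source that returns canned data instead of calling an API."""

    def fetch(self, symbol: str, start: str, end: str) -> List[Dict[str, str]]:
        proxy = self.next_proxy()
        return [{
            "symbol": symbol,
            "date": start,
            "headline": "placeholder headline",
            "via_proxy": str(proxy),
        }]
```

A real subclass would replace the body of `fetch` with an HTTP request routed through the proxy returned by `next_proxy()`. Keeping rotation in the base class means every source shares the same blocking mitigation.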

Quick Start & Requirements

  • Install: pip install finnlp
  • Prerequisites: Python 3.7+, API keys for certain services (e.g., Finnhub). Proxy configuration may be required for extensive scraping.
  • Resources: Requires internet access and potentially significant disk space for downloaded data.
  • Docs: https://github.com/AI4Finance-Foundation/FinNLP
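
Before running any download script, it can help to confirm that the prerequisites listed above are met. The sketch below checks the Python version and looks for an API key in the environment. The variable name `FINNHUB_API_KEY` and the function `check_prerequisites` are assumptions made for this example; FinNLP itself may expect the key to be passed in a different way.

```python
import os
import sys
from typing import List, Mapping, Optional, Tuple

MIN_PYTHON = (3, 7)              # minimum version stated in the prerequisites
API_KEY_VAR = "FINNHUB_API_KEY"  # hypothetical environment variable name


def check_prerequisites(
    env: Optional[Mapping[str, str]] = None,
    version: Optional[Tuple[int, int]] = None,
) -> List[str]:
    """Return a list of human-readable problems; an empty list means ready."""
    env = os.environ if env is None else env
    version = tuple(sys.version_info[:2]) if version is None else version

    problems: List[str] = []
    if version < MIN_PYTHON:
        problems.append(
            f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]}+ is required, "
            f"found {version[0]}.{version[1]}"
        )
    if not env.get(API_KEY_VAR):
        problems.append(f"environment variable {API_KEY_VAR} is not set")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("WARNING:", problem)
```

Reading the key from the environment keeps credentials out of scripts that may be shared or committed.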

Highlighted Details

  • Supports data collection from both US and Chinese financial markets.
  • Integrates with various LLM frameworks and models, such as ChatGPT, GPT-4, LLaMA, and FinBERT.
  • Provides access to structured datasets like AShare (news) and stocknet-dataset (tweets).
  • Includes functionality for sentiment analysis and company announcement parsing.
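
To give a sense of what sentiment analysis on downloaded headlines involves, here is a deliberately simple lexicon-based scorer. It is a toy baseline written for illustration and is not the method FinNLP uses, which according to the list above relies on models such as FinBERT or LLMs. The word lists and the function name `lexicon_sentiment` are invented for this sketch.

```python
import re

# Tiny illustrative word lists; a real lexicon would be far larger.
POSITIVE = frozenset({"beat", "gain", "growth", "surge", "upgrade"})
NEGATIVE = frozenset({"miss", "loss", "decline", "plunge", "downgrade"})


def lexicon_sentiment(text: str) -> float:
    """Score text in [-1.0, 1.0] as (positive - negative) / matched words.

    Returns 0.0 when no lexicon word appears in the text.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    pos = sum(token in POSITIVE for token in tokens)
    neg = sum(token in NEGATIVE for token in tokens)
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total


if __name__ == "__main__":
    for headline in (
        "Shares surge after earnings beat",
        "Analysts downgrade stock after revenue miss",
        "Company holds annual meeting",
    ):
        print(f"{lexicon_sentiment(headline):+.1f}  {headline}")
```

A scorer like this ignores negation and context, which is one reason model-based approaches are preferred for financial text.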

Maintenance & Community

The project is part of the AI4Finance Foundation. Further community engagement details are not explicitly provided in the README.

Licensing & Compatibility

  • License: MIT License for the code, accompanied by a disclaimer that the project is shared for academic and educational purposes.
  • Compatibility: The MIT License generally permits commercial use and linking with closed-source projects.

Limitations & Caveats

The project is intended primarily for academic and research use and explicitly states that it is not financial advice. Some data sources require specific configuration (e.g., Weibo cookies), and others are marked "Soon," meaning full support is not yet available. Proxy usage is recommended but is not guaranteed to prevent all IP blocks.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 15 stars in the last 30 days
