Data-Copilot by zwq2018

LLM-based system for autonomous data workflows

Created 2 years ago

1,517 stars

Top 27.1% on SourcePulse

1 Expert Loves This Project

Yaowei Zheng

Author of LLaMA-Factory

Project Summary

Data-Copilot is an LLM-based system designed to autonomously manage, process, analyze, predict, and visualize data for users, with a particular focus on Chinese financial markets. It aims to bridge the gap between large datasets and human understanding by turning raw data into informative results and interactive interfaces.

How It Works

Data-Copilot uses LLMs (GPT-3.5, Azure-GPT-3.5, Qwen-72b-Chat) to interpret user requests and to design, dispatch, and execute workflows autonomously. It acts as a "designer" that creates interface tools and as a "dispatcher" that invokes those tools, sequentially or in parallel, to fetch, process, and visualize data from heterogeneous sources such as Chinese stock, fund, economic, and financial data. Generating and executing workflows autonomously is intended to reduce manual intervention in complex data analysis tasks.
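The dispatcher idea can be illustrated with a minimal Python sketch. This is not the project's code: the tool names (`fetch_prices`, `mean`), the workflow format (a list of stages, each a list of steps), and the `$name` reference convention are all assumptions made for illustration, and the price data is canned rather than fetched from Tushare. Stages run one after another; steps inside a stage run in parallel.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List

# Hypothetical interface tools; the real system would call data APIs here.
def fetch_prices(symbol: str) -> List[float]:
    """Stand-in for a data-acquisition tool (returns canned prices)."""
    canned = {"A": [10.0, 11.0, 12.0], "B": [20.0, 19.0, 21.0]}
    return canned[symbol]

def mean(values: List[float]) -> float:
    """Stand-in for a data-processing tool."""
    return sum(values) / len(values)

TOOLS: Dict[str, Callable[..., Any]] = {"fetch_prices": fetch_prices, "mean": mean}

def run_step(step: Dict[str, Any], earlier: Dict[str, Any]) -> Any:
    """Replace "$name" arguments with earlier outputs, then call the tool."""
    args = [
        earlier[a[1:]] if isinstance(a, str) and a.startswith("$") else a
        for a in step["args"]
    ]
    return TOOLS[step["tool"]](*args)

def dispatch(workflow: List[List[Dict[str, Any]]]) -> Dict[str, Any]:
    """Run stages sequentially; run the steps within a stage in parallel."""
    results: Dict[str, Any] = {}
    with ThreadPoolExecutor() as pool:
        for stage in workflow:
            snapshot = dict(results)  # steps only see outputs of earlier stages
            futures = {
                step["out"]: pool.submit(run_step, step, snapshot) for step in stage
            }
            for name, future in futures.items():
                results[name] = future.result()
    return results

# A two-stage workflow of the kind an LLM "designer" might emit (assumed format).
workflow = [
    [
        {"tool": "fetch_prices", "args": ["A"], "out": "a"},
        {"tool": "fetch_prices", "args": ["B"], "out": "b"},
    ],
    [
        {"tool": "mean", "args": ["$a"], "out": "mean_a"},
        {"tool": "mean", "args": ["$b"], "out": "mean_b"},
    ],
]
results = dispatch(workflow)
print(results["mean_a"], results["mean_b"])
```

In this sketch the two fetches have no dependency on each other, so they share a stage; the averages depend on the fetched data, so they go in a later stage.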

Quick Start & Requirements

Install: pip install -r requirements.txt
Run: python main.py (for core processing) or python app.py (for Gradio demo).
Prerequisites: API credentials for the LLM service (an OpenAI API key, or the Azure equivalent with api-base and engine) and a Tushare token.
Demo: Available on Hugging Face Space.
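Before running either script, it can help to confirm that both credentials are present. The following is a small pre-flight sketch, not part of the project: the environment variable names `OPENAI_API_KEY` and `TUSHARE_TOKEN` are assumptions, and the project may expect the keys to be supplied some other way (for example, directly in the script or through the demo UI).

```python
import os
from typing import List, Mapping, Sequence

# Hypothetical variable names; adjust to however you supply your keys.
REQUIRED: Sequence[str] = ("OPENAI_API_KEY", "TUSHARE_TOKEN")

def missing_credentials(
    env: Mapping[str, str], required: Sequence[str] = REQUIRED
) -> List[str]:
    """Return the required names that are unset or blank in env."""
    return [name for name in required if not env.get(name, "").strip()]

if __name__ == "__main__":
    missing = missing_credentials(os.environ)
    if missing:
        print("Missing credentials: " + ", ".join(missing))
    else:
        print("All credentials found.")
```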

Highlighted Details

Supports Chinese stock, fund, economic, and financial data.
Autonomous workflow design and execution for data processing and visualization.
Can generate versatile interface tools through self-request and iterative refinement.
Outputs results as text summaries, images, and tables.

Maintenance & Community

Project associated with authors from Zhejiang University.
Contact email provided for questions.
Acknowledgements include ChatGPT, Tushare, and Qwen.

Licensing & Compatibility

No explicit license is mentioned in the README.
Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

Because of GPT-3.5's 4k input token limit, the system currently accesses only Chinese financial data. Support for foreign financial markets is planned.
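One way to see why a small context window narrows data coverage: every tool description placed in the prompt consumes part of the token budget. The sketch below is illustrative only and does not reflect how Data-Copilot builds its prompts. `rough_token_count` is a crude assumed estimate (about one token per four characters), not a real tokenizer, and `select_tools` is a hypothetical helper.

```python
from typing import Callable, List

def rough_token_count(text: str) -> int:
    """Crude estimate: about one token per 4 characters (not a real tokenizer)."""
    return (len(text) + 3) // 4

def select_tools(
    descriptions: List[str],
    budget: int,
    count: Callable[[str], int] = rough_token_count,
) -> List[str]:
    """Keep tool descriptions, in order, until the next one would exceed budget."""
    kept: List[str] = []
    used = 0
    for description in descriptions:
        cost = count(description)
        if used + cost > budget:
            break
        kept.append(description)
        used += cost
    return kept

# Three 40-character descriptions cost about 10 tokens each under this estimate.
descriptions = ["a" * 40, "b" * 40, "c" * 40]
kept = select_tools(descriptions, budget=25)
print(len(kept))
```

With a budget of 25 estimated tokens, only the first two descriptions fit; covering more data sources would need a larger window or shorter descriptions.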

Health Check

Last Commit

1 year ago

Responsiveness

Inactive

Star History

5 stars in the last 30 days