Data-Copilot by zwq2018

LLM-based system for autonomous data workflows

Created 2 years ago
1,508 stars

Top 27.5% on SourcePulse

Project Summary

Data-Copilot is an LLM-based system that autonomously manages, processes, analyzes, predicts, and visualizes data for users, with a focus on Chinese financial markets. It aims to bridge the gap between large datasets and human understanding by turning raw data into informative results and interactive interfaces.

How It Works

Data-Copilot uses LLMs (GPT-3.5, Azure-GPT-3.5, Qwen-72b-Chat) to interpret user requests and to design, dispatch, and execute workflows autonomously. It plays two roles: as a "designer" it creates interface tools, and as a "dispatcher" it invokes those tools, sequentially or in parallel, to fetch, process, and visualize data from heterogeneous sources such as Chinese stock, fund, economic, and financial data. Generating and executing workflows this way is intended to reduce manual intervention in complex data analysis tasks.
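The dispatcher idea can be illustrated with a minimal sketch. This is not Data-Copilot's actual code: the tool names (fetch_prices, pct_change), the workflow format (a list of stages, each a list of steps), and the "$name" reference convention are all assumptions made for illustration. Steps within a stage run in parallel; stages run one after another.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List

# Hypothetical interface tools. In a real system these would call a data
# provider; here they use hard-coded stand-in data so the sketch is runnable.
def fetch_prices(symbol: str) -> List[float]:
    sample = {"AAA": [100.0, 110.0, 121.0], "BBB": [200.0, 190.0, 199.5]}
    return sample[symbol]

def pct_change(prices: List[float]) -> List[float]:
    # Percentage change between consecutive prices, rounded to 2 decimals.
    return [round((b - a) / a * 100, 2) for a, b in zip(prices, prices[1:])]

TOOLS: Dict[str, Callable[..., Any]] = {
    "fetch_prices": fetch_prices,
    "pct_change": pct_change,
}

def run_step(step: Dict[str, Any], results: Dict[str, Any]) -> Any:
    # Arguments written as "$name" are replaced by an earlier step's output.
    args = [
        results[a[1:]] if isinstance(a, str) and a.startswith("$") else a
        for a in step["args"]
    ]
    return TOOLS[step["tool"]](*args)

def dispatch(workflow: List[List[Dict[str, Any]]]) -> Dict[str, Any]:
    results: Dict[str, Any] = {}
    for stage in workflow:  # stages run sequentially
        with ThreadPoolExecutor() as pool:  # steps in a stage run in parallel
            futures = {s["out"]: pool.submit(run_step, s, results) for s in stage}
        # The pool has finished here; collect outputs for later stages.
        for name, future in futures.items():
            results[name] = future.result()
    return results

# An assumed workflow of the kind an LLM "designer" might emit.
workflow = [
    [{"tool": "fetch_prices", "args": ["AAA"], "out": "aaa"},
     {"tool": "fetch_prices", "args": ["BBB"], "out": "bbb"}],
    [{"tool": "pct_change", "args": ["$aaa"], "out": "aaa_ret"},
     {"tool": "pct_change", "args": ["$bbb"], "out": "bbb_ret"}],
]
results = dispatch(workflow)
```

In this sketch the two fetches have no dependency on each other, so they share a stage; the percentage-change steps depend on the fetched prices, so they sit in a later stage.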

Quick Start & Requirements

  • Install: pip install -r requirements.txt
  • Run: python main.py (for core processing) or python app.py (for Gradio demo).
  • Prerequisites: An OpenAI API key (or the Azure equivalent, with api-base and engine) and a Tushare token.
  • Demo: Available on Hugging Face Space.
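Before running either script, it can help to confirm that both credentials are present. The sketch below is an assumption-laden illustration: this summary does not say how Data-Copilot expects keys to be supplied, so the environment variable names OPENAI_API_KEY and TUSHARE_TOKEN are hypothetical, as is the idea of reading them from the environment at all.

```python
import os
from typing import List, Mapping

# Assumed variable names; check the project's README for the real mechanism.
REQUIRED_VARS = ("OPENAI_API_KEY", "TUSHARE_TOKEN")

def missing_credentials(env: Mapping[str, str] = os.environ) -> List[str]:
    # Return the names of required variables that are unset or empty.
    return [name for name in REQUIRED_VARS if not env.get(name)]

if __name__ == "__main__":
    missing = missing_credentials()
    if missing:
        print("Missing credentials: " + ", ".join(missing))
    else:
        print("All required credentials are set.")
```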

Highlighted Details

  • Supports Chinese stock, fund, economic, and financial data.
  • Autonomous workflow design and execution for data processing and visualization.
  • Can generate versatile interface tools through self-request and iterative refinement.
  • Outputs results as text summaries, images, and tables.

Maintenance & Community

  • Project associated with authors from Zhejiang University.
  • Contact email provided for questions.
  • Acknowledgements include ChatGPT, Tushare, and Qwen.

Licensing & Compatibility

  • No explicit license is mentioned in the README.
  • Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

Data-Copilot currently supports only Chinese financial data, a restriction attributed to the 4k input token limit of GPT-3.5. Support for foreign financial markets is planned.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 4 stars in the last 30 days
