DeepAnalyze by ruc-datalab

Autonomous data science powered by an agentic LLM

Created 4 months ago

3,702 stars

Top 12.9% on SourcePulse

Project Summary

Summary DeepAnalyze presents itself as the first agentic Large Language Model (LLM) designed for autonomous data science. It aims to automate the entire data science pipeline, from data preparation and analysis to modeling, visualization, and report generation, enabling open-ended data research across diverse data formats without human intervention. This project targets users seeking an automated data analysis assistant capable of producing analyst-grade research reports.

How It Works The core of DeepAnalyze is its agentic LLM architecture, which autonomously executes complex data science tasks. It supports a broad spectrum of data sources, including structured (Databases, CSV, Excel), semi-structured (JSON, XML, YAML), and unstructured (TXT, Markdown) data. This approach allows for end-to-end data processing and deep research, culminating in comprehensive reports, thereby streamlining the data science workflow.

Quick Start & Requirements To deploy locally, users must first create a Python 3.12 environment (e.g., using conda create -n deepanalyze python=3.12 -y). After activating the environment (conda activate deepanalyze), install core dependencies via pip install -r requirements.txt, ensuring torch==2.6.0, transformers==4.53.2, and vllm==0.8.5 are met. For training custom models, additional pip install -e . commands are required within specific subdirectories (deepanalyze/ms-swift/ and deepanalyze/SkyRL/). The demo interface can be launched by navigating to demo/chat, running npm install, and then executing bash start.sh. Interaction is available via a web browser at http://localhost:4000. An OpenAI-style API can be started using python demo/backend.py.

Highlighted Details

End-to-End Automation: Capable of autonomously handling the entire data science lifecycle, including preparation, analysis, modeling, visualization, and report generation.
Versatile Data Handling: Supports deep research and analysis across structured, semi-structured, and unstructured data formats.
Fully Open-Source: The project provides open access to its model, code, training data, and demo, facilitating deployment and extension.
API Access: Offers an OpenAI-style API for programmatic integration.

Maintenance & Community The project welcomes contributions, with useful issues and pull requests being incorporated into the contributor list. For inquiries, users can contact zhangshaolei98@ruc.edu.cn. No specific community channels (e.g., Discord, Slack) or roadmap links are provided in the README.

Licensing & Compatibility The provided README does not specify a software license. This absence of explicit licensing information is a significant blocker for determining commercial use, derivative works, and overall compatibility with other projects.

Limitations & Caveats The user interface for the demo is noted as an initial version, with an invitation for further development. A critical limitation for adoption is the absence of a stated software license, preventing clear understanding of usage rights and restrictions.

DeepAnalyze by ruc-datalab

Explore Similar Projects

awesome-data-agents by HKUSTDial

Auto-Analyst-Streamlit by FireBird-Technologies

bagofwords by bagofwords1

verl-tool by TIGER-AI-Lab

Auto-Analyst by FireBird-Technologies

instill-core by instill-ai

spiceai by spiceai

distilabel by argilla-io

fire-enrich by firecrawl

AI-Research-SKILLs by Orchestra-Research

atomic-agents by BrainBlend-AI

ai-data-science-team by business-science