DataHorse by DeDolphins

Data science tool for conversational data analysis using LLMs

Created 1 year ago

261 stars

Top 97.5% on SourcePulse

Project Summary

DataHorse is an open-source Python library and tool that democratizes data science by enabling users to interact with, analyze, and visualize data, as well as build machine learning models, using plain English commands. It is designed for business users and individuals without technical expertise, allowing them to derive insights and make data-driven decisions quickly and easily.

How It Works

DataHorse leverages Large Language Models (LLMs) to interpret natural language instructions and translate them into executable data manipulation, analysis, and machine learning operations. This conversational approach abstracts away complex syntax and coding requirements, making data science accessible to a broader audience. The library supports data modification, visualization, model training, and testing through a simple, chat-like interface.
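The translate-then-execute flow described above can be illustrated with a minimal sketch. This is not DataHorse's implementation: `stub_llm`, `chat`, and the toy `ROWS` table are hypothetical names introduced here, and the stub returns canned Python expressions where a real system would call a model.

```python
from statistics import mean

# Toy table standing in for a loaded dataset (one dict per row).
ROWS = [
    {"species": "setosa", "petal_length": 1.4},
    {"species": "setosa", "petal_length": 1.6},
    {"species": "virginica", "petal_length": 5.5},
]


def stub_llm(instruction: str) -> str:
    """Hypothetical stand-in for an LLM call: maps an instruction to a Python expression."""
    text = instruction.lower()
    if "average" in text and "petal" in text:
        return "mean(r['petal_length'] for r in rows)"
    if "how many" in text:
        return "len(rows)"
    raise ValueError(f"stub cannot handle: {instruction!r}")


def chat(rows, instruction: str):
    """Translate a plain-English instruction into code, then run it against the data."""
    code = stub_llm(instruction)
    # Expose only the names the generated expression needs. Evaluating
    # model-written code is risky; a real tool would need proper sandboxing.
    allowed = {"__builtins__": {}, "mean": mean, "len": len}
    return eval(code, allowed, {"rows": rows})


if __name__ == "__main__":
    print(chat(ROWS, "How many rows are there?"))
    print(chat(ROWS, "What is the average petal length?"))
```

The point of the sketch is the division of labor: the model only produces code, and a separate execution step applies that code to the user's data, so the user never writes the syntax themselves.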

Quick Start & Requirements

Primary install: pip install datahorse
Requirements: Python. Because the library calls LLMs, full functionality likely requires an API key or a local model setup, although the README does not spell this out.
Demo: A Google Colab notebook is available for setup and usage examples.
WebUI: Clone the repository, install dependencies with pip install -r requirements.txt, then run streamlit run app.py.

Highlighted Details

Conversational data analysis and ML model building in plain English.
Supports data modification, visualization, model training, and testing.
Includes an optional WebUI powered by Streamlit.
Offers reproducibility via a seed parameter and caching via cache_req=True.
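To make the last point concrete, here is a small sketch of how a seed and a request cache can work together in general. Only the names seed and cache_req come from the summary above; `ask`, `stub_generate`, and `stats` are hypothetical, and nothing here is claimed to match how DataHorse implements these options internally.

```python
import hashlib
import random

_cache = {}                 # maps request key -> cached answer
stats = {"model_calls": 0}  # counts how often the (stub) model is invoked


def stub_generate(prompt: str, seed=None) -> str:
    """Hypothetical stand-in for a sampled LLM response."""
    rng = random.Random(seed)  # same seed -> same pseudo-random choice
    return f"{prompt} -> variant {rng.randint(0, 9999)}"


def ask(prompt: str, seed=None, cache_req: bool = False) -> str:
    """Return an answer, reusing a stored one when caching is enabled."""
    key = hashlib.sha256(f"{seed}|{prompt}".encode()).hexdigest()
    if cache_req and key in _cache:
        return _cache[key]  # cache hit: no model call
    stats["model_calls"] += 1
    answer = stub_generate(prompt, seed)
    if cache_req:
        _cache[key] = answer
    return answer


if __name__ == "__main__":
    first = ask("summarize sales", seed=42, cache_req=True)
    second = ask("summarize sales", seed=42, cache_req=True)
    print(first == second, stats["model_calls"])
```

A fixed seed makes repeated runs produce the same output, while the cache avoids repeating an identical request at all, which matters when each model call costs time or money.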

Maintenance & Community

The project encourages contributions and provides a contributing guide. Users can follow the project on LinkedIn.

Licensing & Compatibility

The README does not explicitly state the license type.

Limitations & Caveats

The README does not list specific limitations, unsupported platforms, or known bugs. Because the library relies on LLMs, users may face model-access costs or setup steps, such as obtaining API keys, that the README does not fully explain.

Explore Similar Projects

datadm by approximatelabs

chatbi by chatbi

openchatbi by zhongyu09

viz-gpt by ObservedObserver

BambooAI by pgalko

VMind by VisActor

datavisualization_langgraph by DhruvAtreja

sketch by approximatelabs

DataAgent by spring-ai-alibaba

DeepBI by DeepInsight-AI

Rath by Kanaries

data-formulator by microsoft