DataHorse by DeDolphins

Data science tool for conversational data analysis using LLMs

created 11 months ago
258 stars

Top 98.6% on sourcepulse

Project Summary

DataHorse is an open-source Python library and tool that lets users interact with, analyze, and visualize data, and build machine learning models, using plain English commands. It is designed for business users and others without technical expertise who want to derive insights and make data-driven decisions quickly.

How It Works

DataHorse leverages Large Language Models (LLMs) to interpret natural language instructions and translate them into executable data manipulation, analysis, and machine learning operations. This conversational approach abstracts away complex syntax and coding requirements, making data science accessible to a broader audience. The library supports data modification, visualization, model training, and testing through a simple, chat-like interface.
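The translate-then-execute flow described above can be illustrated with a toy sketch. This is not DataHorse's implementation: fake_llm, chat, and ROWS are hypothetical names, and a keyword lookup stands in for a real LLM call so the example runs without any API key.

```python
from statistics import mean

# Toy in-memory "table": a list of row dictionaries (illustrative data only).
ROWS = [
    {"species": "setosa", "petal_length": 1.4},
    {"species": "setosa", "petal_length": 1.6},
    {"species": "virginica", "petal_length": 5.5},
]


def fake_llm(instruction: str) -> str:
    """Hypothetical stand-in for an LLM: maps an instruction to Python source.

    A real system would send the instruction, plus a description of the
    data, to a language model and receive generated code back.
    """
    text = instruction.lower()
    if "average" in text and "petal" in text:
        return "result = mean(row['petal_length'] for row in rows)"
    if "how many" in text:
        return "result = len(rows)"
    raise ValueError(f"instruction not understood: {instruction!r}")


def chat(rows, instruction):
    """Translate a plain-English instruction to code, run it, return the result."""
    code = fake_llm(instruction)
    scope = {"rows": rows, "mean": mean}
    # Executing generated code is unsafe without sandboxing; acceptable here
    # only because fake_llm returns fixed, known strings.
    exec(code, scope)
    return scope["result"]
```

Calling chat(ROWS, "How many rows are there?") runs the generated len(rows) statement against the toy table. The point of the sketch is the separation between interpreting the request and executing the resulting operation, which is the part a chat-like interface hides from the user.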

Quick Start & Requirements

  • Primary install: pip install datahorse
  • Requirements: Python. Because the library relies on LLMs, full functionality likely requires an API key or a local model setup, although the README does not detail this.
  • Demo: A Google Colab notebook is available for setup and usage examples.
  • WebUI: Clone the repository, install dependencies with pip install -r requirements.txt, then run streamlit run app.py.

Highlighted Details

  • Conversational data analysis and ML model building in plain English.
  • Supports data modification, visualization, model training, and testing.
  • Includes an optional WebUI powered by Streamlit.
  • Offers reproducibility via a seed parameter and caching via cache_req=True.
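To make the last point concrete, here is a minimal sketch of how a seed and a request cache can interact in an LLM client. Only the names seed and cache_req come from the summary above; CachedClient, ask, and the fake model call are hypothetical, and how DataHorse actually applies these options is not documented here.

```python
import hashlib
import random


class CachedClient:
    """Hypothetical client showing seeded replies and optional request caching."""

    def __init__(self, seed=None, cache_req=False):
        self.seed = seed
        self.cache_req = cache_req
        self._cache = {}
        self.model_calls = 0  # counts how often the (fake) model is invoked

    def _call_model(self, prompt):
        """Stand-in for an LLM call: the same seed and prompt give the same reply."""
        self.model_calls += 1
        rng_seed = None if self.seed is None else f"{self.seed}:{prompt}"
        rng = random.Random(rng_seed)
        return f"answer-{rng.randint(0, 9999)}"

    def ask(self, prompt):
        """Return a reply, reusing a cached one when cache_req is enabled."""
        key = hashlib.sha256(f"{self.seed}|{prompt}".encode()).hexdigest()
        if self.cache_req and key in self._cache:
            return self._cache[key]
        reply = self._call_model(prompt)
        if self.cache_req:
            self._cache[key] = reply
        return reply
```

In this sketch the seed makes separate runs agree with each other, while the cache avoids repeating an identical request within one run. The two options address different concerns, which may be why the summary lists them separately.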

Maintenance & Community

The project encourages contributions and provides a contributing guide. Users can follow the project on LinkedIn.

Licensing & Compatibility

The README does not explicitly state the license type.

Limitations & Caveats

The README does not detail specific limitations, unsupported platforms, or known bugs. Because the library relies on LLMs, users may face costs or setup steps for model access or API usage that the README does not fully explain.

Health Check

  • Last commit: 9 months ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 3 stars in the last 90 days
