datadm by approximatelabs

Private data assistant for conversational data manipulation

created 2 years ago
386 stars

Project Summary

DataDM offers a private, conversational interface for data analysis, enabling users to load, clean, transform, and visualize data without writing code. It targets users who prioritize data privacy and seek an intuitive, AI-powered data manipulation tool.

How It Works

DataDM leverages a persistent Jupyter kernel backend for executing data manipulation code generated by a large language model (LLM). Users interact via natural language, and the LLM translates these requests into Python code executed within the kernel. This approach allows for complex data operations and visualizations through a conversational interface while maintaining a stateful session for iterative analysis.
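The following is a minimal sketch of the stateful-session idea only; it is not DataDM's code. A plain Python dictionary passed to `exec` stands in for the Jupyter kernel, and `fake_llm` is a hypothetical placeholder that returns canned code instead of calling GPT or StarChat. The point it illustrates is that each turn runs in the same namespace, so later requests can build on earlier ones.

```python
# Hypothetical stand-in for the LLM: maps a natural-language request to Python code.
# In DataDM this role is played by an OpenAI GPT model or StarChat; here it is a lookup table.
CANNED_CODE = {
    "load the data": (
        "rows = [{'city': 'Oslo', 'temp': 4}, "
        "{'city': 'Lima', 'temp': 19}, "
        "{'city': 'Oslo', 'temp': None}]"
    ),
    "drop missing temperatures": "rows = [r for r in rows if r['temp'] is not None]",
    "average temperature": "result = sum(r['temp'] for r in rows) / len(rows)",
}


def fake_llm(request: str) -> str:
    """Return the code a model might generate for the request (hypothetical)."""
    return CANNED_CODE[request]


class Session:
    """Persistent namespace: variables defined in one turn remain visible in the next.

    A real deployment would send the code to a long-lived Jupyter kernel instead
    of calling exec() on a local dict; this class only mimics the persistence.
    """

    def __init__(self) -> None:
        self.namespace: dict = {}

    def run(self, request: str) -> None:
        code = fake_llm(request)       # natural language -> Python source
        exec(code, self.namespace)     # execute against the shared, persistent state


session = Session()
for turn in ["load the data", "drop missing temperatures", "average temperature"]:
    session.run(turn)

print(session.namespace["result"])  # 11.5: mean of 4 and 19 after the None row is dropped
```

Because every turn writes into the same namespace, the third request can use `rows` as cleaned by the second; a real kernel provides the same property while also returning rich outputs such as plots.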

Quick Start & Requirements

  • Docker: docker run -p 7860:7860 -it ghcr.io/approximatelabs/datadm:latest (for OpenAI models) or docker run --gpus all -p 7860:7860 -it ghcr.io/approximatelabs/datadm:latest-cuda (for local StarChat models).
  • Local Install: pip install datadm or pip install "datadm[cuda]".
  • Prerequisites: A CUDA-enabled GPU with at least 24 GB of GPU memory is needed to run the StarChat model locally. An OpenAI API key is required for the OpenAI models.
  • Resources: Local mode requires significant GPU memory.
  • Links: Demo

Highlighted Details

  • Full local execution option for enhanced data privacy.
  • Supports natural language chat, visualizations, and direct data downloads.
  • Integrates with OpenAI's GPT models or the locally runnable StarChat model.
  • Features persistent Jupyter kernel for stateful data manipulation.

Maintenance & Community

  • Community support via Discord.
  • Contributions are welcome via pull requests and issues.

Licensing & Compatibility

  • The repository does not explicitly state a license in the provided README.

Limitations & Caveats

LLMs are prone to hallucination, requiring users to verify generated results. GGML-based CPU-only mode and rollback functionality are listed as "Work in Progress." Support for additional data sources like SQL and S3 is planned but not yet implemented.

Health Check

  • Last commit: 10 months ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 2 stars in the last 90 days