ezdata by xuwei95

LLM-powered data processing and task scheduling system

Created 2 years ago
265 stars

Top 96.5% on SourcePulse

Summary

ezdata is a data processing, analysis, and task scheduling system built with Python and Vue.js, targeting users who need to manage diverse data sources and automate complex data pipelines. It offers a low-code environment enhanced by Large Language Model (LLM) capabilities for interactive data exploration, conversational Q&A, and automated insight generation, aiming to simplify and accelerate data operations.

How It Works

The system features a Python backend and a Vue 3 frontend. It abstracts various data sources, including files, relational databases, NoSQL, time-series, and graph databases, into a unified data model. This abstraction allows consistent querying across sources and the generation of data query APIs.

A key differentiator is its LLM integration, which enables conversational data analysis, RAG-based knowledge retrieval from data sources, and AI-driven generation of conclusions, tables, and reports.

Data integration is handled through a visual, low-code pipeline approach that can be extended with custom code transformations and a distributed Pandas engine for terabyte-scale datasets. Task scheduling supports both single tasks and complex Directed Acyclic Graph (DAG) workflows, offering built-in templates (Python, Shell, Data Integration), distributed worker execution, task retries, and monitoring.
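
The idea of a unified data model can be illustrated with a minimal Python sketch, assuming each source adapter exposes its rows as plain dictionaries so that one query function works over every source. The names DataSource, CsvSource, SqliteSource, and query are hypothetical and are not ezdata's actual API; CSV text and an in-memory SQLite table stand in for the much wider set of sources the project supports.

```python
import csv
import io
import sqlite3
from abc import ABC, abstractmethod
from typing import Any, Callable, Dict, Iterator, List

Record = Dict[str, Any]


class DataSource(ABC):
    """Hypothetical adapter interface: every source yields plain dict records."""

    @abstractmethod
    def records(self) -> Iterator[Record]:
        ...


class CsvSource(DataSource):
    def __init__(self, text: str) -> None:
        self.text = text

    def records(self) -> Iterator[Record]:
        # csv.DictReader yields one dict per row (all values are strings)
        yield from csv.DictReader(io.StringIO(self.text))


class SqliteSource(DataSource):
    def __init__(self, conn: sqlite3.Connection, table: str) -> None:
        self.conn = conn
        self.table = table

    def records(self) -> Iterator[Record]:
        # Identifiers cannot be bound as parameters; the table name is assumed trusted
        cur = self.conn.execute(f"SELECT * FROM {self.table}")
        columns = [d[0] for d in cur.description]
        for row in cur:
            yield dict(zip(columns, row))


def query(source: DataSource, where: Callable[[Record], bool], fields: List[str]) -> List[Record]:
    """One query path for every source: filter records, then project fields."""
    return [{f: r[f] for f in fields} for r in source.records() if where(r)]


# Usage: the same query runs unchanged against two different backends.
csv_src = CsvSource("name,city\nada,London\nlin,Beijing\n")
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, city TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [("ada", "London"), ("lin", "Beijing")])
db_src = SqliteSource(conn, "users")

for src in (csv_src, db_src):
    print(query(src, lambda r: r["city"] == "Beijing", ["name"]))  # [{'name': 'lin'}]
```

Keeping the query logic outside the adapters means a new source type only has to implement records(); a real system would also push filters down to the database instead of scanning every row in Python.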

Quick Start & Requirements

  • Installation: pip install -r requirements.txt -i https://pypi.doubanio.com/simple (the -i option points pip at the Douban PyPI mirror; omit it to use the default index).
  • Services: Start the web API service with python web_api.py and the task scheduling service with python scheduler_api.py.
  • Workers: Launch Celery workers using celery -A tasks worker -P eventlet (Windows) or celery -A tasks worker (Linux). The Flower monitoring tool can be started with celery -A tasks flower.
  • Prerequisites: Python and Celery. On Windows, Eventlet is recommended as the worker's concurrency pool (hence the -P eventlet option).
  • Links: Official Website: http://www.ezdata.cloud, GitHub Repository: https://github.com/xuwei95/ezdata. The README mentions an online demo but does not link to it.
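
The Quick Start commands differ only in the worker pool option on Windows. As a sketch, assuming a hypothetical helper script placed next to web_api.py and scheduler_api.py, that platform rule can be encoded as follows; worker_command, flower_command, and SERVICES are illustrative names, not part of ezdata, and the script only prints the commands rather than starting them.

```python
import platform
import shlex
from typing import Dict, List, Optional

# The two long-running services from the Quick Start
SERVICES: Dict[str, List[str]] = {
    "web_api": ["python", "web_api.py"],
    "scheduler": ["python", "scheduler_api.py"],
}


def worker_command(app: str = "tasks", system: Optional[str] = None) -> List[str]:
    """Build the Celery worker command for the given OS (defaults to the current one)."""
    system = system or platform.system()  # e.g. "Windows", "Linux", "Darwin"
    cmd = ["celery", "-A", app, "worker"]
    if system == "Windows":
        # Celery's default prefork pool is problematic on Windows,
        # so the Quick Start switches to the eventlet pool there.
        cmd += ["-P", "eventlet"]
    return cmd


def flower_command(app: str = "tasks") -> List[str]:
    """Command for the optional Flower monitoring UI."""
    return ["celery", "-A", app, "flower"]


if __name__ == "__main__":
    for name, argv in SERVICES.items():
        print(f"{name}: {shlex.join(argv)}")
    print("worker:", shlex.join(worker_command()))
    print("flower:", shlex.join(flower_command()))
```

Each argument list could be passed to subprocess.Popen to start the services as separate processes, since the API, scheduler, and workers all run until stopped.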

Highlighted Details

  • Supports a wide range of data sources: files, relational DBs, NoSQL, time-series, and graph DBs.
  • Integrates LLMs for RAG-based knowledge Q&A and conversational, interactive data analysis.
  • Features low-code data integration pipelines with distributed Pandas support for large-scale data processing.
  • Provides robust DAG workflow scheduling with distributed worker capabilities and operational monitoring.
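
The dependency ordering and retry behavior of DAG scheduling can be sketched with the standard library's graphlib.TopologicalSorter (Python 3.9+). This is a single-process illustration under stated assumptions: run_dag and its max_retries parameter are hypothetical names, tasks run sequentially in-process, and ezdata itself dispatches work to distributed Celery workers instead.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, Iterable, List


def run_dag(
    tasks: Dict[str, Callable[[], None]],
    deps: Dict[str, Iterable[str]],
    max_retries: int = 2,
) -> List[str]:
    """Run tasks in dependency order, retrying each failure up to max_retries times.

    Returns task names in completion order. If a task still fails after all
    retries, RuntimeError is raised and its downstream tasks never start.
    """
    # Every task becomes a node, even if it has no upstream dependencies.
    graph = {name: set(deps.get(name, ())) for name in tasks}
    unknown = set().union(*graph.values()) - set(tasks)
    if unknown:
        raise ValueError(f"unknown dependencies: {sorted(unknown)}")

    completed: List[str] = []
    # static_order() yields each node after all of its predecessors
    # (and raises graphlib.CycleError if the graph contains a cycle).
    for name in TopologicalSorter(graph).static_order():
        for attempt in range(1, max_retries + 2):
            try:
                tasks[name]()
                break
            except Exception as exc:
                if attempt > max_retries:
                    raise RuntimeError(f"task {name!r} failed after {attempt} attempts") from exc
        completed.append(name)
    return completed


# Usage: an extract -> transform -> load chain where extract fails once.
attempts = {"extract": 0}


def flaky_extract() -> None:
    attempts["extract"] += 1
    if attempts["extract"] == 1:  # fail only on the first try
        raise OSError("transient source error")


log: List[str] = []
order = run_dag(
    {
        "extract": flaky_extract,
        "transform": lambda: log.append("transform"),
        "load": lambda: log.append("load"),
    },
    deps={"transform": ["extract"], "load": ["transform"]},
)
print(order)  # ['extract', 'transform', 'load']
```

For parallel execution, TopologicalSorter's prepare(), get_ready(), and done() methods let a dispatcher hand out every task whose dependencies have finished, which is closer to how a pool of workers would consume a DAG.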

Maintenance & Community

  • The primary development hub appears to be the xuwei95/ezdata GitHub repository.
  • Official project website: http://www.ezdata.cloud.

Licensing & Compatibility

  • The license type for this project is not specified in the provided README.

Limitations & Caveats

  • The system architecture requires setting up and managing multiple distributed components (API, scheduler, Celery workers), which may introduce operational complexity.
  • Specific hardware requirements, performance benchmarks, or detailed setup guides for large-scale deployments are not provided.
  • The absence of explicit licensing information makes it difficult to assess compatibility for commercial use or integration into closed-source projects.

Health Check

  • Last Commit: 1 week ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 10 stars in the last 30 days

Explore Similar Projects

data-juicer by datajuicer

Data-Juicer: Data processing system for foundation models
Top 0.5% on SourcePulse, 6k stars
Created 2 years ago, updated 2 days ago
Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), Alexander Wettig (Coauthor of SWE-bench, SWE-agent), and 5 more.

llama_index by run-llama

Data framework for building LLM-powered agents
Top 0.3% on SourcePulse, 46k stars
Created 3 years ago, updated 3 days ago
Starred by Andrej Karpathy (Founder of Eureka Labs; Formerly at Tesla, OpenAI; Author of CS 231n), Anton Troynikov (Cofounder of Chroma), and 47 more.