ezdata by xuwei95

LLM-powered data processing and task scheduling system

Created 2 years ago
253 stars

Top 99.4% on SourcePulse

Summary

ezdata is a data processing, analysis, and task scheduling system built with Python and Vue.js, targeting users who need to manage diverse data sources and automate complex data pipelines. It offers a low-code environment enhanced by Large Language Model (LLM) capabilities for interactive data exploration, conversational Q&A, and automated insight generation, aiming to simplify and accelerate data operations.

How It Works

The system pairs a Python backend with a Vue3 frontend. It abstracts diverse data sources (files, relational databases, NoSQL stores, time-series databases, and graph databases) into a unified data model, which allows consistent querying and the generation of data query API interfaces. A key differentiator is its LLM integration, which enables conversational data analysis, RAG-based knowledge retrieval from data sources, and AI-driven generation of conclusions, tables, and reports. Data integration uses a visual, low-code pipeline approach that can be extended with custom code transformations and with a distributed Pandas engine for terabyte-scale datasets. Task scheduling supports both single tasks and complex Directed Acyclic Graph (DAG) workflows, with built-in templates (Python, Shell, Data Integration), distributed worker execution, task retries, and monitoring.
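To make the idea of a unified data model concrete, here is a minimal sketch of how two unlike sources could sit behind one query interface. The names DataSource, CsvSource, SqliteSource, rows, and query are hypothetical and chosen for illustration; they are not taken from the ezdata codebase, which may structure this quite differently.

```python
import csv
import io
import sqlite3
from abc import ABC, abstractmethod
from typing import Any, Dict, Iterator, List

Row = Dict[str, Any]


class DataSource(ABC):
    """Hypothetical unified interface (an assumption, not ezdata's real class)."""

    @abstractmethod
    def rows(self) -> Iterator[Row]:
        """Yield every record as a plain dict."""

    def query(self, **equals: Any) -> List[Row]:
        """Return rows whose fields equal the given values, compared as strings."""
        return [
            row for row in self.rows()
            if all(str(row.get(key)) == str(value) for key, value in equals.items())
        ]


class CsvSource(DataSource):
    """File-like source: parses CSV text held in memory."""

    def __init__(self, text: str) -> None:
        self._text = text

    def rows(self) -> Iterator[Row]:
        yield from csv.DictReader(io.StringIO(self._text))


class SqliteSource(DataSource):
    """Relational source: reads one table from a SQLite connection."""

    def __init__(self, conn: sqlite3.Connection, table: str) -> None:
        self._conn = conn
        self._table = table  # illustration only: the table name is trusted input

    def rows(self) -> Iterator[Row]:
        cursor = self._conn.execute(f"SELECT * FROM {self._table}")
        columns = [col[0] for col in cursor.description]
        for record in cursor:
            yield dict(zip(columns, record))


csv_source = CsvSource("id,city\n1,Paris\n2,Lima\n")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE places (id INTEGER, city TEXT)")
conn.executemany("INSERT INTO places VALUES (?, ?)", [(1, "Paris"), (3, "Oslo")])
sqlite_source = SqliteSource(conn, "places")

# The same call works against either backend.
for source in (csv_source, sqlite_source):
    print(source.query(city="Paris"))
```

The point of the design is that callers depend only on query, so adding a new backend means implementing rows and nothing else. A production system would push filters down to the database instead of filtering in Python.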

Quick Start & Requirements

  • Installation: pip install -r requirements.txt -i https://pypi.doubanio.com/simple (the -i option points pip at the Douban PyPI mirror; omit it to use the default index).
  • Services: Start the web API service with python web_api.py and the task scheduling service with python scheduler_api.py.
  • Workers: Launch Celery workers using celery -A tasks worker -P eventlet (Windows) or celery -A tasks worker (Linux). The Flower monitoring tool can be started with celery -A tasks flower.
  • Prerequisites: Python, Celery. Eventlet is recommended for Windows worker concurrency.
  • Links: Official website: http://www.ezdata.cloud. GitHub repository: https://github.com/xuwei95/ezdata. The README mentions an online demo, but no direct link to it is given here.
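The startup commands listed above can be collected in a small helper that picks the right Celery worker invocation per platform. This is a sketch only: the function names worker_command and service_commands are hypothetical, and using sys.executable in place of a bare python is an assumption made so the same interpreter is reused. The script only builds and prints the commands; it does not start anything.

```python
import sys
from typing import List


def worker_command(platform: str = sys.platform) -> List[str]:
    """Build the Celery worker command, adding the eventlet pool on Windows."""
    cmd = ["celery", "-A", "tasks", "worker"]
    if platform.startswith("win"):
        cmd += ["-P", "eventlet"]
    return cmd


def service_commands(platform: str = sys.platform) -> List[List[str]]:
    """Return the three processes named in the quick start: API, scheduler, worker."""
    return [
        [sys.executable, "web_api.py"],
        [sys.executable, "scheduler_api.py"],
        worker_command(platform),
    ]


if __name__ == "__main__":
    for cmd in service_commands():
        print(" ".join(cmd))
```

Each returned list could be passed to subprocess.Popen from the repository root, assuming the dependencies from requirements.txt are installed.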

Highlighted Details

  • Supports a wide range of data sources: files, relational DBs, NoSQL, time-series, and graph DBs.
  • Integrates LLMs for RAG-based knowledge Q&A and conversational, interactive data analysis.
  • Features low-code data integration pipelines with distributed Pandas support for large-scale data processing.
  • Provides robust DAG workflow scheduling with distributed worker capabilities and operational monitoring.
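As a rough illustration of DAG scheduling with task retries, the sketch below orders tasks so that every dependency runs first and retries a failing task a fixed number of times. It runs sequentially in one process using the standard library's graphlib (Python 3.9+), whereas ezdata is described as dispatching work to distributed workers. The function run_dag and its parameters are hypothetical and not part of ezdata.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Set


def run_dag(
    tasks: Dict[str, Callable[[], None]],
    dependencies: Dict[str, Set[str]],
    max_retries: int = 2,
) -> List[str]:
    """Run tasks in dependency order; retry each task up to max_retries times."""
    # Map every task to its prerequisites so tasks without dependencies are included.
    graph = {name: dependencies.get(name, set()) for name in tasks}
    completed: List[str] = []
    for name in TopologicalSorter(graph).static_order():
        for attempt in range(max_retries + 1):
            try:
                tasks[name]()
                break
            except Exception:
                if attempt == max_retries:
                    raise  # retries exhausted: surface the failure
        completed.append(name)
    return completed


attempts = {"transform": 0}
log: List[str] = []


def extract() -> None:
    log.append("extract")


def transform() -> None:
    # Fails on the first call to simulate a transient error.
    attempts["transform"] += 1
    if attempts["transform"] == 1:
        raise RuntimeError("transient failure")
    log.append("transform")


def load() -> None:
    log.append("load")


order = run_dag(
    {"load": load, "extract": extract, "transform": transform},
    {"transform": {"extract"}, "load": {"transform"}},
)
print(order)
```

TopologicalSorter raises CycleError if the dependencies contain a cycle, which is the property that makes the workflow a valid DAG.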

Maintenance & Community

  • The primary development hub appears to be the xuwei95/ezdata GitHub repository.
  • Official project website: http://www.ezdata.cloud.

Licensing & Compatibility

  • The license type for this project is not specified in the provided README.

Limitations & Caveats

  • The system architecture requires setting up and managing multiple distributed components (API, scheduler, Celery workers), which may introduce operational complexity.
  • Specific hardware requirements, performance benchmarks, or detailed setup guides for large-scale deployments are not provided.
  • The absence of explicit licensing information makes it difficult to assess compatibility for commercial use or integration into closed-source projects.

Health Check

  • Last commit: 2 months ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 5 stars in the last 30 days
