datalinkx  by spitfireuptown

Data synchronization system for heterogeneous data sources

created 1 year ago
284 stars

Top 93.1% on sourcepulse

Project Summary

DatalinkX is an open-source data synchronization and flow system designed for managing and automating data movement between heterogeneous data sources. It targets organizations with inter-departmental data collaboration needs, aiming to centralize sync tasks, consolidate logs, and improve operational efficiency.

How It Works

DatalinkX leverages Apache Flink (v1.10.3) and SeaTunnel (v2.3.8) as its core distributed data processing engines, enabling high-performance, stream-based data synchronization. It supports incremental and full data loads, with features for task management, cascading configurations, and log collection. The system offers a web-based UI for configuring data sources and sync tasks, integrating with Xxl-job for scheduled task triggering. It also supports intermediate transformation operators, including SQL and large language model (LLM) operators via Ollama.
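The difference between a full load and an incremental load can be illustrated with a small, self-contained sketch. This is not DatalinkX's implementation (which runs on Flink and SeaTunnel); it assumes a hypothetical `orders` table with an `updated_at` column used as a watermark, and uses in-memory SQLite for both source and target so it can run anywhere.

```python
import sqlite3

def sync(src, dst, watermark=None):
    """Copy rows from src.orders to dst.orders.

    watermark=None  -> full load: the target is cleared and reloaded.
    watermark=value -> incremental load: only rows with updated_at > value.
    Returns the watermark to persist for the next run.
    """
    if watermark is None:
        dst.execute("DELETE FROM orders")
        rows = src.execute(
            "SELECT id, amount, updated_at FROM orders"
        ).fetchall()
    else:
        rows = src.execute(
            "SELECT id, amount, updated_at FROM orders WHERE updated_at > ?",
            (watermark,),
        ).fetchall()
    # Upsert so that changed source rows overwrite their target copies.
    dst.executemany(
        "INSERT OR REPLACE INTO orders (id, amount, updated_at) VALUES (?, ?, ?)",
        rows,
    )
    dst.commit()
    marks = [r[2] for r in rows]
    return max(marks) if marks else watermark

def make_db():
    # Hypothetical schema, chosen only for this illustration.
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL, updated_at INTEGER)"
    )
    return db

src, dst = make_db(), make_db()
src.executemany("INSERT INTO orders VALUES (?, ?, ?)", [(1, 10.0, 100), (2, 20.0, 110)])
src.commit()
mark = sync(src, dst)  # first run: full load

src.execute("INSERT INTO orders VALUES (3, 30.0, 120)")
src.execute("UPDATE orders SET amount = 15.0, updated_at = 130 WHERE id = 1")
src.commit()
mark = sync(src, dst, mark)  # later run: incremental load
```

The incremental run transfers only the new row and the updated row, then advances the watermark. A watermark of this kind does not capture deletes, which is one reason change data capture (listed among the Pro features) is a separate capability.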

Quick Start & Requirements

  • Installation: Docker Compose is recommended for launching the core components (docker compose -p datalinkx up -d). Manual setup of dependencies such as MySQL, Redis, Flink, and Xxl-job is also documented.
  • Prerequisites: Java 8+, Maven, Node.js, MySQL (for Xxl-job), Redis (v5.0+), Flink (v1.10.3), SeaTunnel (v2.3.8), Ollama (for LLM operators), and Docker.
  • Setup: Manual setup of middleware and Flink cluster can take 1-2 hours.
  • Documentation: Detailed documentation is provided by the project.
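Before attempting the manual setup, it can help to confirm that the command-line prerequisites are installed. The following is a minimal sketch, not part of DatalinkX; the command names in `REQUIRED_TOOLS` are assumptions about how each prerequisite appears on PATH, and the check does not verify versions.

```python
import shutil

# Assumed executable names for Java, Maven, Node.js, and Docker.
REQUIRED_TOOLS = ["java", "mvn", "node", "docker"]

def check_tools(tools=REQUIRED_TOOLS):
    """Return {tool: True/False} depending on whether it is found on PATH."""
    return {tool: shutil.which(tool) is not None for tool in tools}

if __name__ == "__main__":
    for tool, found in check_tools().items():
        print(f"{tool:8s} {'ok' if found else 'MISSING'}")
```

Services such as MySQL, Redis, and Flink are usually reached over the network rather than found on PATH, so they would need a separate connectivity check.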

Highlighted Details

  • Supports synchronization between HTTP, Oracle, MySQL, Elasticsearch, and Redis.
  • Integrates LLM operators via Ollama for data transformation.
  • Offers both batch and stream processing capabilities.
  • Provides a web UI for task management and configuration.
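To give a sense of what an LLM transformation operator might do, here is a hedged sketch that enriches one record by calling Ollama's local REST endpoint (`/api/generate` on the default port 11434). It is an illustration only, not DatalinkX's operator code: the function names, the `llm_output` field, and the `llama3` model name are assumptions, and the `opener` parameter exists so the HTTP call can be swapped out when no Ollama server is running.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

def build_request(record, instruction, model="llama3"):
    """Build a non-streaming generate request for a single record."""
    prompt = f"{instruction}\n\nRecord:\n{json.dumps(record, ensure_ascii=False)}"
    body = json.dumps(
        {"model": model, "prompt": prompt, "stream": False}
    ).encode("utf-8")
    return urllib.request.Request(
        OLLAMA_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def transform(record, instruction, field="llm_output", model="llama3",
              opener=urllib.request.urlopen):
    """Return a copy of record with the model's answer stored under `field`."""
    with opener(build_request(record, instruction, model)) as resp:
        answer = json.loads(resp.read())["response"]
    return {**record, field: answer.strip()}
```

With an Ollama server running and the model pulled, `transform({"id": 1, "text": "great product"}, "Classify the sentiment as positive or negative.")` would return the original record plus the model's reply. In a real pipeline, per-record LLM calls are slow relative to SQL operators, so batching or sampling is worth considering.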

Maintenance & Community

The project has 284 stars on GitHub and is also hosted on Gitee. The README does not explicitly provide community links, but it lists multiple Git hosting platforms.

Licensing & Compatibility

The README does not explicitly state a license. The presence of "DatalinkX Pro" with additional features suggests a potential dual-licensing or commercial offering, which may have implications for commercial use or closed-source linking.

Limitations & Caveats

The open-source version lacks features found in the "Pro" version, such as ClickHouse support, MySQL CDC, an alarm center, and UI enhancements. The project relies on older versions of Flink (1.10.3) and SeaTunnel (2.3.8), which may limit compatibility with newer ecosystem components and may lack recent performance optimizations and security patches.

Health Check

  • Last commit: 2 days ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 29 stars in the last 90 days
