seatunnel by apache

High-performance multimodal data integration

Created 8 years ago
8,821 stars

Top 5.8% on SourcePulse

Project Summary

Apache SeaTunnel is a multimodal, high-performance, distributed data integration platform built to synchronize very large volumes of data every day. It targets data engineers and developers who work with diverse data sources and complex synchronization scenarios, and it emphasizes efficient resource utilization and data quality monitoring.

How It Works

SeaTunnel uses a distributed snapshot algorithm to keep data consistent, and it can run on several execution engines: its native Zeta Engine, Apache Spark, or Apache Flink. For multi-table and whole-database synchronization it multiplexes JDBC connections, which avoids opening too many connections to the source, and it parses database logs once for many tables, which avoids repeated log reads in CDC scenarios. Together these aim at high throughput and low latency. The platform supports batch-stream integration and offers over 100 connectors covering data sources, sinks, and transformations.
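To make the snapshot idea concrete, here is a minimal single-process sketch of checkpoint-and-restore, the core mechanism that snapshot-based consistency relies on. It is a toy illustration only: it is not distributed, it is not SeaTunnel's implementation, and every name in it (`Snapshot`, `SimulatedCrash`, `run_job`, `checkpoint_every`, `crash_at`) is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    """State captured at a checkpoint: source read position plus sink contents."""
    offset: int = 0
    sink_rows: tuple = ()


class SimulatedCrash(Exception):
    """Carries the last completed snapshot so the caller can restore from it."""

    def __init__(self, snapshot):
        super().__init__("simulated crash")
        self.snapshot = snapshot


def run_job(records, checkpoint_every, snapshot=None, crash_at=None):
    """Copy records into a sink, taking a snapshot every `checkpoint_every` records.

    If `snapshot` is given, resume from it. If `crash_at` is given, fail just
    before reading the record at that offset (hypothetical failure injection).
    """
    last = snapshot or Snapshot()
    offset, sink = last.offset, list(last.sink_rows)
    while offset < len(records):
        if crash_at is not None and offset == crash_at:
            # Work done since the last snapshot is lost with the crash.
            raise SimulatedCrash(last)
        sink.append(records[offset])
        offset += 1
        if offset % checkpoint_every == 0:
            # Source position and sink state are captured together.
            last = Snapshot(offset, tuple(sink))
    return sink


records = list(range(10))
try:
    run_job(records, checkpoint_every=4, crash_at=6)
except SimulatedCrash as crash:
    # Restart from the last snapshot instead of from the beginning.
    recovered = run_job(records, checkpoint_every=4, snapshot=crash.snapshot)
```

Because the source offset and the sink contents are saved together, the restarted run neither skips nor duplicates records. A real distributed engine has to coordinate that same pairing across many parallel tasks.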

Quick Start & Requirements

  • Download SeaTunnel from the Official Website.
  • Choose an execution engine (Zeta Engine, Spark, or Flink) before running jobs.
  • Refer to Installation Guide for detailed setup.

Highlighted Details

  • Supports integration of video, images, and binary files alongside structured and unstructured text data.
  • Offers over 100 connectors and is actively expanding its ecosystem.
  • Provides two job development methods: coding and visual management via the SeaTunnel Web Project.
  • Used by companies like Weibo, Tencent Cloud, and Sina.

Maintenance & Community

  • Active community with a Slack channel available: SeaTunnel Slack.
  • Contributions are welcomed via GitHub Repository.
  • Contact via mailing list: dev@seatunnel.apache.org.

Licensing & Compatibility

  • Licensed under the Apache 2.0 License, permitting commercial use.

Limitations & Caveats

  • Multimodal data is supported, but detailed instructions for integrating video, image, and binary files are kept in separate documentation.

Health Check

  • Last commit: 10 hours ago
  • Responsiveness: Inactive
  • Pull requests (30d): 63
  • Issues (30d): 47
  • Star history: 50 stars in the last 30 days
