connect by redpanda-data

Operationally mundane stream processing engine

Created 9 years ago
8,545 stars

Top 6.0% on SourcePulse

Project Summary

Fancy stream processing made operationally mundane. Redpanda Connect reduces the operational complexity of stream processing with a high-performance, resilient engine for connecting diverse data sources and sinks. It targets engineers and power users who need to build, deploy, and monitor data pipelines with minimal friction. Declarative configuration and built-in delivery guarantees cut setup and maintenance overhead.

How It Works

Pipelines are declared in a configuration file as input connectors, a sequence of processing stages (mapping, enrichment, transformation, filtering), and output connectors. Messages are processed and acknowledged through an in-process transaction model that needs no disk-persisted state. When connected to at-least-once sources and sinks, this preserves at-least-once delivery even through crashes or other server faults, which simplifies operational management.
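The acknowledgement flow described above can be sketched in a few lines of Python. This is a conceptual illustration only, not Redpanda Connect's code: InMemorySource, FlakySink, and run_pipeline are hypothetical names, and the in-memory queue stands in for a real at-least-once source that redelivers unacknowledged messages.

```python
from collections import deque
from typing import Callable, Iterable, List, Optional


class InMemorySource:
    """Hypothetical at-least-once source: a message stays pending until acked."""

    def __init__(self, messages: Iterable[str]) -> None:
        self._pending = deque(messages)

    def read(self) -> Optional[str]:
        """Return the oldest unacknowledged message, or None when drained."""
        return self._pending[0] if self._pending else None

    def ack(self) -> None:
        """Acknowledge the message returned by the last read()."""
        self._pending.popleft()


class FlakySink:
    """Hypothetical sink that fails on the given call numbers (1-based)."""

    def __init__(self, fail_on_calls: Iterable[int]) -> None:
        self.delivered: List[str] = []
        self.calls = 0
        self._fail_on = set(fail_on_calls)

    def write(self, msg: str) -> None:
        self.calls += 1
        if self.calls in self._fail_on:
            raise IOError("simulated sink outage")
        self.delivered.append(msg)


def run_pipeline(
    source: InMemorySource,
    processors: List[Callable[[str], str]],
    write: Callable[[str], None],
) -> None:
    """Process each message and ack it only after the sink accepted it."""
    while (msg := source.read()) is not None:
        out = msg
        for step in processors:
            out = step(out)
        try:
            write(out)
        except IOError:
            # Delivery failed: skip the ack so the same message is read again.
            # A real engine would back off here; this sketch retries at once.
            continue
        source.ack()
```

Because the acknowledgement is sent only after a successful write, the pipeline itself keeps no state on disk: anything still in flight when the process dies is simply redelivered by the source. The trade-off is that a crash between the write and the ack can produce duplicates, which is what "at-least-once" means.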

Quick Start & Requirements

  • Install: Via curl script for Linux binaries, Homebrew (brew install redpanda-data/tap/redpanda), or Docker (docker pull docker.redpanda.com/redpandadata/connect).
  • Run: Execute rpk connect run ./config.yaml, or run the Docker image with the same configuration file.
  • Prerequisites: Go toolchain for building from source. Docker is recommended for simplified deployment.
  • Docs: Comprehensive documentation available at the official site. Plugin development APIs are also public.
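As a starting point for the ./config.yaml referenced above, the following Python sketch writes a minimal configuration and builds the run command from the Quick Start. The component choices (stdin input, a mapping processor that upper-cases each message, stdout output) are assumptions picked for illustration, and write_config and run_command are hypothetical helper names; verify field names against the official documentation for your version.

```python
from pathlib import Path
from typing import List

# Minimal pipeline: read lines from stdin, upper-case them, print to stdout.
# The three top-level sections mirror the input / processing / output model.
MINIMAL_CONFIG = """\
input:
  stdin: {}

pipeline:
  processors:
    - mapping: root = content().uppercase()

output:
  stdout: {}
"""


def write_config(path: str = "config.yaml") -> Path:
    """Write the illustrative configuration to `path` and return it."""
    target = Path(path)
    target.write_text(MINIMAL_CONFIG, encoding="utf-8")
    return target


def run_command(config_path: str) -> List[str]:
    """Build the Quick Start command line; does not execute anything."""
    return ["rpk", "connect", "run", str(config_path)]
```

The returned list can be passed to subprocess.run once rpk is installed; the sketch deliberately stops short of launching anything.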

Highlighted Details

  • Broad Connectivity: Supports a vast array of sources and sinks including AWS, Azure, GCP, Kafka, NATS, MQTT, AMQP, Redis, SQL databases, Elasticsearch, Cassandra, and HTTP.
  • Delivery Guarantees: Provides at-least-once delivery without disk-persisted state when paired with at-least-once sources and sinks, simplifying pipeline design and reliability.
  • Observability: Provides /ping and /ready health check endpoints, exposes metrics (Statsd, Prometheus, JSON HTTP), and supports OpenTelemetry tracing.
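The /ping and /ready endpoints listed above can be polled from a deployment script. A minimal Python sketch follows, using only the standard library; probe and health are hypothetical helper names, and the base URL (for example http://localhost:4195) is an assumption that depends on how the HTTP server is configured in your deployment.

```python
import urllib.error
import urllib.request
from typing import Dict


def probe(base_url: str, path: str, timeout: float = 2.0) -> bool:
    """Return True when GET base_url + path answers with a 2xx status."""
    url = base_url.rstrip("/") + path
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError):
        # Covers HTTP error statuses, refused connections, and timeouts.
        return False


def health(base_url: str) -> Dict[str, bool]:
    """Check both endpoints: /ping for liveness, /ready for readiness."""
    return {
        "alive": probe(base_url, "/ping"),
        "ready": probe(base_url, "/ready"),
    }
```

Calling health("http://localhost:4195") returns a dictionary with two booleans. Treating any non-2xx response as "not ready" keeps the check conservative, which suits orchestrator-style liveness and readiness probes.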

Maintenance & Community

Contributions are welcome; the repository defines tasks for formatting (task fmt) and linting (task lint). The project is hosted on GitHub. The README does not list specific community channels or contributor details.

Licensing & Compatibility

The license type is not explicitly stated in the provided README text.

Limitations & Caveats

Building with optional external dependencies (e.g., zmq4) requires the x_benthos_extra build tag. Integration tests are skipped by the standard test run (task test) and must be triggered manually.

Health Check

  • Last Commit: 2 days ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 46
  • Issues (30d): 4
  • Star History: 28 stars in the last 30 days
