bytewax  by bytewax

Python framework for stateful stream processing

Created 3 years ago
1,930 stars

Top 22.4% on SourcePulse

Project Summary

Bytewax is a Python-first framework for stateful stream processing, designed to simplify building complex event-driven applications and online machine learning systems. It targets Python developers who want to build scalable, distributed dataflow pipelines with familiar tools, and it offers an alternative to JVM-based frameworks such as Apache Flink and Apache Spark.

How It Works

Bytewax uses a dataflow computational model: users define pipelines from Python operators and connectors. Because the interface is plain Python, pipelines can call directly into the rest of the Python ecosystem. The framework manages distributed state, provides fault tolerance, and supports event-time windowing for advanced analytics. A Rust-based engine underpins its performance, while the waxctl CLI tool handles deployment and management across various infrastructures, including Kubernetes.
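The dataflow model described above can be illustrated with a minimal, library-free sketch. This is not Bytewax's API: the names `map_step`, `stateful_step`, and `running_total` are hypothetical and chosen only for illustration. It shows the general idea of chaining a stateless operator into a keyed stateful operator, where each key carries its own state between events.

```python
from typing import Callable, Dict, Iterable, Iterator, Optional, Tuple

# Hypothetical event shape for this sketch: (key, value).
Event = Tuple[str, float]


def map_step(fn: Callable[[Event], Event], stream: Iterable[Event]) -> Iterator[Event]:
    """Stateless operator: transform each event independently."""
    for item in stream:
        yield fn(item)


def stateful_step(
    update: Callable[[Optional[float], float], Tuple[float, float]],
    stream: Iterable[Event],
    state: Optional[Dict[str, float]] = None,
) -> Iterator[Event]:
    """Keyed stateful operator: `update` receives the previous state for
    the event's key (None on first sight) and returns (new_state, output)."""
    state = {} if state is None else state
    for key, value in stream:
        state[key], out = update(state.get(key), value)
        yield key, out


def running_total(total: Optional[float], value: float) -> Tuple[float, float]:
    """Keep a per-key running sum and emit it after every event."""
    new_total = (total or 0.0) + value
    return new_total, new_total


events = [("a", 1.0), ("b", 2.0), ("a", 3.0)]
doubled = map_step(lambda kv: (kv[0], kv[1] * 2), events)
results = list(stateful_step(running_total, doubled))
print(results)  # [('a', 2.0), ('b', 4.0), ('a', 8.0)]
```

In a real stream processor the per-key `state` dictionary is what the framework would partition across workers and snapshot for recovery; here it is an ordinary in-memory dict.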

Quick Start & Requirements

  • Install Bytewax via pip: pip install bytewax
  • Install waxctl for deployment management.
  • Requires Python.
  • Official documentation and examples are available.

Highlighted Details

  • Python-first API for leveraging existing libraries and tooling.
  • Stateful stream processing with automatic state recovery and fault tolerance.
  • Scalable from local development to multi-node, distributed deployments.
  • Rich connector ecosystem for various data sources and sinks.
  • Flexible dataflow API with stateless, stateful, windowing, and join operators.
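
As a rough illustration of what event-time windowing means, the following plain-Python sketch assigns events to tumbling windows by the timestamp carried in each event rather than by arrival order. It is a conceptual example only and does not use Bytewax's windowing operators; the function name `tumbling_window_counts` and its parameters `length` and `align_to` are assumptions made for this sketch.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone
from typing import Dict, Iterable, Tuple


def tumbling_window_counts(
    events: Iterable[Tuple[str, datetime]],
    length: timedelta,
    align_to: datetime,
) -> Dict[Tuple[str, datetime], int]:
    """Count events per (key, window start) using each event's own timestamp."""
    counts: Dict[Tuple[str, datetime], int] = defaultdict(int)
    for key, ts in events:
        # timedelta // timedelta yields an int: the index of the window.
        index = (ts - align_to) // length
        window_start = align_to + index * length
        counts[(key, window_start)] += 1
    return dict(counts)


start = datetime(2024, 1, 1, tzinfo=timezone.utc)
minute = timedelta(minutes=1)
events = [
    ("a", start + timedelta(seconds=5)),
    ("a", start + timedelta(seconds=70)),
    ("a", start + timedelta(seconds=20)),  # arrives late, still lands in window 0
    ("b", start + timedelta(seconds=61)),
]
window_counts = tumbling_window_counts(events, minute, start)
```

Because the window is derived from the timestamp, the out-of-order event for key "a" is still counted in the first window. A production system additionally needs a policy for how long to wait for late events before closing a window, which this sketch omits.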

Maintenance & Community

  • Active community on Slack for support and discussion.
  • Contributions are welcomed via GitHub issues and a contribution guide.
  • Follows a Code of Conduct.

Licensing & Compatibility

  • Licensed under the Apache-2.0 license.
  • Permissive license suitable for commercial use and integration with closed-source applications.

Limitations & Caveats

The README mentions a "commercially licensed Platform" for scaling, implying potential limitations or additional costs for advanced enterprise features beyond the open-source offering. Specific details on these commercial offerings are not elaborated within the provided text.

Health Check

  • Last commit: 9 months ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 28 stars in the last 30 days
