pathway by pathwaycom

Python ETL framework for stream processing, real-time analytics, LLM pipelines, and RAG

created 2 years ago
29,416 stars

Top 1.3% on sourcepulse

Project Summary

Pathway is a Python ETL framework designed for stream processing, real-time analytics, and LLM/RAG pipelines, offering a unified approach to batch and streaming data. Built on an incremental computation engine, it targets developers and researchers who want to build robust, scalable data-processing applications that integrate seamlessly with Python ML libraries.

How It Works

Pathway pipelines are written in Python but executed by a high-performance Rust engine built on Differential Dataflow principles. This architecture enables efficient multithreading, multiprocessing, and distributed computation while keeping the entire pipeline in memory for low-latency processing. Its core advantage is incremental computation: when input data changes, only the affected results are recomputed, which the project credits for significant performance gains over traditional batch and micro-batch systems.
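The incremental idea can be illustrated with a toy aggregation. This is a pure-Python sketch, not Pathway's API or its Rust engine: updates are assumed to arrive as hypothetical (key, value, diff) triples, where diff=+1 inserts a row and diff=-1 retracts it, loosely following the multiplicity-based updates used in Differential Dataflow. Each batch touches only the keys it mentions instead of rescanning every row.

```python
from collections import defaultdict


class IncrementalSum:
    """Toy per-key SUM that is updated from changes only (hypothetical, not Pathway's API)."""

    def __init__(self):
        self.totals = defaultdict(int)  # key -> current sum

    def apply(self, updates):
        """Apply (key, value, diff) triples; diff=+1 inserts a row, diff=-1 retracts it."""
        changed = {}
        for key, value, diff in updates:
            self.totals[key] += value * diff
            changed[key] = self.totals[key]
        return changed  # only the outputs affected by this batch


agg = IncrementalSum()
print(agg.apply([("a", 10, +1), ("b", 5, +1)]))  # {'a': 10, 'b': 5}
# Updating row a from 10 to 7 is expressed as a retraction plus an insertion.
print(agg.apply([("a", 10, -1), ("a", 7, +1)]))  # {'a': 7}
```

A batch job would recompute every group on each run; here the work per update is proportional to the size of the change, and group "b" is not touched by the second batch.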

Quick Start & Requirements

Highlighted Details

  • Unified engine for batch and streaming data processing.
  • Extensive connectors, including an Airbyte connector for over 300 sources.
  • Supports stateful transformations (joins, windowing) and custom Python functions.
  • Provides persistence for pipeline state recovery.
  • Offers LLM helpers, an in-memory vector index, and integrations with LlamaIndex/LangChain for RAG applications.
  • Claims to outperform Flink, Spark, and Kafka Streams in benchmarks.
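Stateful transformations such as windowing keep per-key state between events. The sketch below is a hypothetical, minimal tumbling-window counter in plain Python; it does not use Pathway's windowing API, and the class name, the 60-unit window size, and the sensor events are illustrative assumptions.

```python
from collections import defaultdict


def window_start(timestamp: int, size: int) -> int:
    """Tumbling windows are fixed, non-overlapping intervals [start, start + size)."""
    return timestamp - (timestamp % size)


class TumblingCount:
    """Hypothetical stateful operator: counts events per (key, window) as they arrive."""

    def __init__(self, size: int):
        self.size = size
        self.state = defaultdict(int)  # (key, window_start) -> count so far

    def on_event(self, key: str, timestamp: int):
        slot = (key, window_start(timestamp, self.size))
        self.state[slot] += 1
        # Emit the updated result for the affected window only.
        return slot, self.state[slot]


op = TumblingCount(size=60)
# The last event (t=30) arrives late, after the window starting at 60 has opened.
for key, ts in [("sensor-1", 5), ("sensor-1", 59), ("sensor-1", 61), ("sensor-1", 30)]:
    print(op.on_event(key, ts))
```

The late event at t=30 revises the count of the window starting at 0 after a later window has already produced output; propagating that kind of revision downstream is what an incremental engine has to handle.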

Maintenance & Community

  • Active community on Discord for support and engagement.
  • Contribution guidelines encourage open-sourcing complementary libraries under MIT/Apache 2.0.
  • Contact email: contact@pathway.com

Licensing & Compatibility

  • BSL 1.1 License: Allows unlimited non-commercial use and most commercial use free of charge.
  • Code converts to Apache 2.0 after 4 years.
  • Complementary public repositories are MIT licensed.
  • Compatible with commercial use under the BSL terms.

Limitations & Caveats

  • The free version offers "at least once" consistency; "exactly once" consistency is an enterprise feature.
  • Full distributed computing and cloud deployment capabilities are part of the "Pathway for Enterprise" offering.
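The difference between the two consistency levels shows up when a pipeline restarts from persisted state and replays input received after the last checkpoint. This is a generic, hypothetical illustration of the delivery guarantees, not a description of how Pathway implements persistence or its enterprise "exactly once" mode; the event list, the checkpoint offset, the crash point, and both sink classes are assumptions made for the example.

```python
# Hypothetical input: (event_id, value) pairs that the source can re-read from any offset.
events = [("e1", 3), ("e2", 4), ("e3", 5)]


def run(events, start, stop, sink):
    """Process events[start:stop], writing each doubled value to the sink."""
    for event_id, value in events[start:stop]:
        sink.write(event_id, value * 2)


class AppendSink:
    """Appends every write, so replayed events show up as duplicates (at least once)."""

    def __init__(self):
        self.rows = []

    def write(self, event_id, result):
        self.rows.append((event_id, result))


class IdempotentSink:
    """Keeps the first write per event ID, so replayed writes are ignored."""

    def __init__(self):
        self.rows = {}

    def write(self, event_id, result):
        self.rows.setdefault(event_id, result)


for sink in (AppendSink(), IdempotentSink()):
    checkpoint = 1  # hypothetical persisted offset: only e1 is known to be done
    run(events, 0, 2, sink)  # e1 and e2 are written, then the process "crashes"
    run(events, checkpoint, len(events), sink)  # restart replays from the checkpoint
    print(type(sink).__name__, sink.rows)
```

With the append-only sink, e2 is written twice, which is acceptable under "at least once" but not under "exactly once". Deduplicating by event ID yields effectively-once output only if IDs are unique and results are deterministic; many streaming systems instead commit state and output atomically.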

Health Check

  • Last commit: 1 day ago
  • Responsiveness: 1 day
  • Pull requests (30d): 0
  • Issues (30d): 6
  • Star history: 6,639 stars in the last 90 days

Explore Similar Projects

Starred by Jeff Hammerbacher (Cofounder of Cloudera) and Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems).

bytewax by bytewax

0.3%
2k
Python framework for stateful stream processing
created 3 years ago
updated 4 months ago
Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems) and Jeff Hammerbacher (Cofounder of Cloudera).

towhee by towhee-io

0.2%
3k
Framework for neural data processing pipelines
created 4 years ago
updated 9 months ago