Distributed query engine for time series data
Quokka is a Python-native, push-based distributed query engine designed for high-performance time-series analytics and complex event processing on large datasets. It targets data engineers and researchers working with time-series data. For specific workloads, particularly those involving windowing, joins, and custom stateful computations, it offers significant speedups over traditional engines like Spark.
How It Works
Quokka uses a push-based execution model: data partitions are processed serially as they become available, so shuffles and I/O can be pipelined with computation. It integrates several high-performance libraries: DuckDB and Polars for relational algebra kernels, Ray for distributed task scheduling, Arrow for efficient data interchange, and Redis for lineage logging. This architecture supports complex time-series operations such as asof/range joins and pattern recognition, and the Python-native implementation simplifies extensibility and UDF integration.
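The push-based idea can be illustrated with a minimal, single-process sketch. This is not Quokka's API: the class names (`Filter`, `SumByKey`), the `push`/`finish` methods, and the sample data are all hypothetical, chosen only to show how a source can drive execution by pushing each partition downstream as soon as it is available, while a stateful operator updates its result incrementally.

```python
from collections import defaultdict


class Filter:
    """Stateless operator (hypothetical): forwards matching rows as soon as a batch arrives."""

    def __init__(self, predicate, downstream):
        self.predicate = predicate
        self.downstream = downstream

    def push(self, batch):
        kept = [row for row in batch if self.predicate(row)]
        if kept:
            self.downstream.push(kept)

    def finish(self):
        self.downstream.finish()


class SumByKey:
    """Stateful operator (hypothetical): keeps a running sum per key across batches."""

    def __init__(self, key, value):
        self.key = key
        self.value = value
        self.totals = defaultdict(float)
        self.done = False

    def push(self, batch):
        for row in batch:
            self.totals[row[self.key]] += row[self.value]

    def finish(self):
        self.done = True


def run_source(partitions, downstream):
    """The source drives execution: each partition is pushed as it becomes available."""
    for batch in partitions:
        downstream.push(batch)
    downstream.finish()


# Two partitions standing in for data that arrives over time (made-up values).
partitions = [
    [{"symbol": "A", "price": 10.0}, {"symbol": "B", "price": 5.0}],
    [{"symbol": "A", "price": 12.0}, {"symbol": "B", "price": -1.0}],
]

sink = SumByKey("symbol", "price")
pipeline = Filter(lambda row: row["price"] > 0, sink)
run_source(partitions, pipeline)

result = dict(sink.totals)
print(result)
```

In a pull-based design the sink would request rows from its upstream operator instead. Here the producer decides when work happens, which is what lets downstream operators start before all input has been read. A distributed engine additionally has to handle scheduling, shuffles, and fault tolerance, none of which this sketch attempts.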
Quick Start & Requirements
pip3 install pyquokka
Limitations & Caveats
Quokka is not a direct replacement for SparkSQL: it does not yet parse SQL directly, though SQL support is on the roadmap. The project encourages users to contact the developers before deploying Quokka for critical use cases.