quokka  by marsupialtail

Distributed query engine for time series data

Created 3 years ago
1,183 stars

Top 32.9% on SourcePulse

Project Summary

Quokka is a Python-native, push-based distributed query engine designed for high-performance time-series analytics and complex event processing on large datasets. It targets data engineers and researchers working with time-series data. The project reports significant speedups over traditional engines such as Spark on specific workloads, particularly those involving windowing, joins, and custom stateful computations.

How It Works

Quokka leverages a push-based execution model, allowing data partitions to be processed serially as they become available, enabling pipelining of shuffles and I/O for performance gains. It integrates multiple high-performance libraries: DuckDB and Polars for relational algebra kernels, Ray for distributed task scheduling, Arrow for efficient data interchange, and Redis for lineage logging. This architecture allows for complex time-series operations like asof/range joins and pattern recognition, while its Python-native implementation simplifies extensibility and UDF integration.
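
The push-based model described above can be illustrated with a minimal sketch in plain Python. This is not Quokka's API: the `PushOperator` class, the `run_pipeline` helper, and the trade records are all hypothetical names and data invented for illustration. The point is only that each partition is pushed through the operator chain as soon as it arrives, rather than a downstream operator waiting for the whole input.

```python
from typing import Callable, Iterable, List, Optional

Partition = List[dict]  # hypothetical: a partition is a list of row dicts


class PushOperator:
    """Hypothetical operator: transforms a partition and pushes it downstream."""

    def __init__(self, fn: Callable[[Partition], Partition],
                 downstream: Optional["PushOperator"] = None):
        self.fn = fn
        self.downstream = downstream
        self.received: List[Partition] = []  # filled only by a terminal operator

    def push(self, partition: Partition) -> None:
        out = self.fn(partition)
        if self.downstream is not None:
            self.downstream.push(out)   # forward immediately, no global barrier
        else:
            self.received.append(out)   # terminal operator collects results


def run_pipeline(source: Iterable[Partition], head: PushOperator) -> None:
    # The source drives execution: each partition is pushed as it becomes available.
    for partition in source:
        head.push(partition)


# Pipeline: keep trades with size > 100, then project to (symbol, notional).
sink = PushOperator(lambda p: p)
project = PushOperator(
    lambda p: [{"symbol": r["symbol"], "notional": r["price"] * r["size"]} for r in p],
    downstream=sink,
)
filt = PushOperator(lambda p: [r for r in p if r["size"] > 100], downstream=project)

partitions = [
    [{"symbol": "A", "price": 10.0, "size": 200},
     {"symbol": "B", "price": 5.0, "size": 50}],
    [{"symbol": "A", "price": 11.0, "size": 300}],
]
run_pipeline(partitions, filt)
results = sink.received
```

In this sketch the first partition reaches the sink before the second one is even read, which is the property that lets a real engine overlap I/O, shuffles, and compute.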

Highlighted Details

  • Tick-level backtesting demonstrated on SIP trade streams.
  • Vector embedding analytics with support for formats like Lance.
  • Claims to be several times faster than SparkSQL on TPC-H queries.
  • Supports complex time-series operations like windowing, asof/range joins, and pattern recognition.
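
For readers unfamiliar with as-of joins, the following is a minimal single-machine sketch of the semantics using only the Python standard library. It is not Quokka's implementation or API: the `asof_join` function, the field names (`ts`, `symbol`, `bid`), and the sample data are assumptions made for illustration. It implements a backward as-of join, attaching to each trade the most recent quote for the same symbol at or before the trade's timestamp.

```python
from bisect import bisect_right
from collections import defaultdict


def asof_join(trades, quotes):
    """Backward as-of join: attach to each trade the latest quote at or before it."""
    # Index quotes per symbol as parallel, time-sorted lists of timestamps and rows.
    times = defaultdict(list)
    rows = defaultdict(list)
    for q in sorted(quotes, key=lambda q: q["ts"]):
        times[q["symbol"]].append(q["ts"])
        rows[q["symbol"]].append(q)

    joined = []
    for t in trades:
        sym_times = times.get(t["symbol"], [])
        # Count of quotes with ts <= trade ts; the match, if any, is at index i - 1.
        i = bisect_right(sym_times, t["ts"])
        bid = rows[t["symbol"]][i - 1]["bid"] if i > 0 else None
        joined.append({**t, "bid": bid})
    return joined


quotes = [
    {"symbol": "A", "ts": 1, "bid": 9.9},
    {"symbol": "A", "ts": 5, "bid": 10.1},
    {"symbol": "B", "ts": 2, "bid": 4.9},
]
trades = [
    {"symbol": "A", "ts": 3, "price": 10.0},
    {"symbol": "A", "ts": 5, "price": 10.2},
    {"symbol": "B", "ts": 1, "price": 5.0},
]
joined = asof_join(trades, quotes)
```

The trade for symbol B has no earlier quote, so its `bid` is `None`. A distributed engine would additionally need to partition both inputs by key and time, which this sketch does not attempt.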

Maintenance & Community

  • The README describes the project as under active development and acknowledges contributors.
  • Discord channel available for questions and discussion.
  • Encourages users to reach out and raise GitHub issues.

Licensing & Compatibility

  • License not explicitly stated in the README. Compatibility for commercial use or closed-source linking is therefore undetermined.

Limitations & Caveats

Quokka is not a direct replacement for SparkSQL, as it does not yet parse SQL directly, though this is on the roadmap. The project encourages users to engage with the developers before deploying for critical use cases.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 4 stars in the last 30 days
