xorq by xorq-labs

Executable memory system for tabular data agents

Created 2 years ago

531 stars

Top 58.8% on SourcePulse

3 Experts Love This Project

Jeff Hammerbacher

Cofounder of Cloudera

Sam Bhagwat

Cofounder of Mastra, Gatsby

Wes McKinney

Author of Pandas

Project Summary

Xorq addresses the problem of ephemeral artifacts generated by AI agents, turning ad-hoc scripts, intermediate states, and raw data into durable, composable, and executable pipelines. It provides a Git-native catalog for tabular data work, so agents and humans can discover, reproduce, and reuse computational artifacts. This reduces technical debt, improves collaboration, and gives data-driven workflows a verifiable lineage.

How It Works

Xorq expresses work declaratively as Ibis dataframe expressions, which compile to multiple execution engines. The catalog is a Git repository that stores build artifacts and their metadata, with git-annex managing large files. Python environments are made reproducible with uv. DataFusion provides embedded computation, and Apache Arrow is the native data interchange format, so pipelines execute statelessly, much like Unix pipes. Together these provide provenance, reproducibility, and portability for agent-generated work.
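The Unix-pipe analogy can be sketched in plain Python. This is an illustration only, not Xorq's API: lists of dicts stand in for Arrow RecordBatches, and the names `read_batches`, `filter_rows`, and `add_column` are hypothetical. Each stage is a generator that consumes batches and yields batches, keeping no state between them.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Row = Dict[str, float]
Batch = List[Row]  # stand-in for an Arrow RecordBatch (assumption for this sketch)


def read_batches(rows: List[Row], batch_size: int) -> Iterator[Batch]:
    """Yield fixed-size batches from an in-memory list of rows."""
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]


def filter_rows(batches: Iterable[Batch], predicate: Callable[[Row], bool]) -> Iterator[Batch]:
    """Keep rows matching the predicate; nothing is remembered across batches."""
    for batch in batches:
        kept = [row for row in batch if predicate(row)]
        if kept:  # skip batches that became empty
            yield kept


def add_column(batches: Iterable[Batch], name: str, fn: Callable[[Row], float]) -> Iterator[Batch]:
    """Append a derived column to every row of every batch."""
    for batch in batches:
        yield [{**row, name: fn(row)} for row in batch]


rows = [{"price": float(p)} for p in (5, 12, 30, 8, 50)]

# Stages compose like `read | filter | add_column`; no work happens until iteration.
pipeline = add_column(
    filter_rows(read_batches(rows, batch_size=2), lambda r: r["price"] >= 10),
    "price_with_tax",
    lambda r: round(r["price"] * 1.2, 2),
)

result = [row for batch in pipeline for row in batch]
```

Because every stage only sees one batch at a time, the whole dataset never needs to be held in memory by an intermediate step, which is the property the pipe comparison points at.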

Quick Start & Requirements

Install via pip (pip install xorq[examples]) or through the Claude Code plugin. Documentation and project details are available at docs.xorq.dev and www.xorq.dev.

Highlighted Details

Git-Native Catalog: Artifacts are version-controlled as Git commits, enabling discovery and management through standard Git operations and file system access, eliminating the need for a dedicated service.
Multi-Engine Support: Declarative Ibis expressions execute against a range of backends, including embedded engines (DataFusion, DuckDB, SQLite, pandas) and data warehouses (Snowflake, Databricks, Trino, Postgres).
Arrow-Native Data Flow: Pipelines stream Apache Arrow RecordBatches, enabling efficient, stateless data transformations.
Scikit-learn Integration: Translates scikit-learn Pipeline objects into Xorq's deferred expression format so they can be managed alongside other artifacts.
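The idea of a catalog that is discoverable through ordinary file system access, with no dedicated service, can be sketched as follows. This is a minimal illustration under stated assumptions, not Xorq's actual layout or API: the directory structure, the `metadata.json` file name, the content-hash IDs, and the functions `artifact_id`, `register`, and `discover` are all hypothetical. In a real Git-native catalog the directory would additionally be committed to a Git repository.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def artifact_id(definition: dict) -> str:
    """Derive a stable ID from a canonical JSON form of the definition."""
    canonical = json.dumps(definition, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


def register(catalog: Path, definition: dict) -> Path:
    """Write the definition to <catalog>/<id>/metadata.json and return the entry directory."""
    entry = catalog / artifact_id(definition)
    entry.mkdir(parents=True, exist_ok=True)
    (entry / "metadata.json").write_text(json.dumps(definition, sort_keys=True, indent=2))
    return entry


def discover(catalog: Path, engine: str) -> list:
    """Find entries for an engine by scanning files; no catalog service is involved."""
    found = []
    for meta in sorted(catalog.glob("*/metadata.json")):
        if json.loads(meta.read_text())["engine"] == engine:
            found.append(meta.parent.name)
    return found


with tempfile.TemporaryDirectory() as tmp:
    catalog = Path(tmp)
    a = register(catalog, {"name": "daily_sales", "engine": "duckdb"})
    # Same content with a different key order maps to the same entry.
    b = register(catalog, {"engine": "duckdb", "name": "daily_sales"})
    c = register(catalog, {"name": "churn_features", "engine": "datafusion"})

    same_entry = a == b
    duckdb_entries = discover(catalog, "duckdb")
    entry_count = len(list(catalog.iterdir()))
```

Deriving the ID from the content is a design choice made for this sketch: identical definitions collapse into one entry, so re-registering the same work is idempotent.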

Maintenance & Community

The README does not specify maintainers, community channels (e.g., Discord or Slack), or a public roadmap.

Licensing & Compatibility

Xorq is distributed under the permissive MIT license, which allows commercial use and integration into closed-source projects.

Limitations & Caveats

The project is pre-1.0, so breaking changes are possible; users should expect to consult migration guides when upgrading.

Health Check

Last Commit

1 day ago

Responsiveness

Inactive

Star History

12 stars in the last 30 days