xorq by xorq-labs

Executable memory system for tabular data agents

Created 2 years ago
510 stars

Top 60.8% on SourcePulse

Project Summary

Xorq tackles the problem of managing the ephemeral artifacts that AI agents generate, turning ad-hoc scripts, intermediate states, and raw data into durable, composable, and executable pipelines. It provides a Git-native catalog for tabular data work, so that agents and humans can discover, reproduce, and reuse computational artifacts. The approach aims to reduce technical debt, ease collaboration, and give data-driven workflows a verifiable lineage.

How It Works

Xorq's core idea is a declarative approach: dataframe expressions are written with Ibis and compiled for multiple execution engines. The catalog is itself a Git repository that stores build artifacts and their metadata, with git-annex handling large files. Python environments are pinned with uv so that builds execute reproducibly. DataFusion provides embedded computation, and Apache Arrow is the native data interchange format, which lets pipelines run statelessly, much like Unix pipes. Together, these choices give agent-generated work provenance, reproducibility, and portability.
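
To make the idea of a catalog that can be browsed with plain file-system access more concrete, here is a minimal standard-library sketch. It assumes a hypothetical layout in which each build lives in a directory named by a content hash of its expression definition plus its environment lock, with a metadata.json file beside it. The names build_id, write_build, and discover_builds, and the directory layout itself, are illustrative assumptions, not Xorq's actual on-disk format or API.

```python
from __future__ import annotations

import hashlib
import json
import tempfile
from pathlib import Path


def build_id(expr_definition: str, env_lock: str) -> str:
    """Hypothetical content hash: same expression + same environment lock -> same ID."""
    digest = hashlib.sha256()
    digest.update(expr_definition.encode("utf-8"))
    digest.update(b"\0")  # separator so ("ab", "c") and ("a", "bc") hash differently
    digest.update(env_lock.encode("utf-8"))
    return digest.hexdigest()[:12]


def write_build(catalog: Path, name: str, expr_definition: str, env_lock: str) -> Path:
    """Store one build under catalog/builds/<id>/ (assumed layout, not Xorq's)."""
    bid = build_id(expr_definition, env_lock)
    build_dir = catalog / "builds" / bid
    build_dir.mkdir(parents=True, exist_ok=True)
    (build_dir / "expr.txt").write_text(expr_definition, encoding="utf-8")
    metadata = {"name": name, "build_id": bid}
    (build_dir / "metadata.json").write_text(
        json.dumps(metadata, sort_keys=True), encoding="utf-8"
    )
    return build_dir


def discover_builds(catalog: Path) -> dict[str, str]:
    """Find builds by walking the file system: map build name -> build ID."""
    found = {}
    for meta_path in sorted(catalog.glob("builds/*/metadata.json")):
        meta = json.loads(meta_path.read_text(encoding="utf-8"))
        found[meta["name"]] = meta["build_id"]
    return found


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        catalog = Path(tmp)
        # In a Git-backed catalog, the new build directory would then be committed with git.
        write_build(catalog, "daily_sales", "sales.group_by('day')", "uv.lock contents")
        print(discover_builds(catalog))
```

Hashing the expression together with the environment lock means that the same computation under the same dependencies always lands at the same path, which is what makes a build reproducible and deduplicable in this sketch.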

Quick Start & Requirements

Xorq can be installed with pip (pip install "xorq[examples]"; quoting the extra keeps shells such as zsh from interpreting the brackets) or through the Claude Code plugin. Documentation is available at docs.xorq.dev, and project details at www.xorq.dev.

Highlighted Details

  • Git-Native Catalog: Artifacts are version-controlled as Git commits, enabling discovery and management through standard Git operations and file system access, eliminating the need for a dedicated service.
  • Multi-Engine Support: Declarative Ibis expressions execute against diverse backends, including embedded engines (DataFusion, DuckDB, SQLite, pandas) and data warehouses (Snowflake, Databricks, Trino, Postgres).
  • Arrow-Native Data Flow: Pipelines stream Apache Arrow RecordBatches, enabling efficient, stateless data transformations.
  • Scikit-learn Integration: Translates scikit-learn Pipeline objects into Xorq's deferred expression format, so they can be managed alongside other catalog artifacts.
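
The stateless, Unix-pipe-like data flow described above can be illustrated without any Arrow dependency. The standard-library sketch below assumes that batches are plain lists of row dictionaries standing in for Arrow RecordBatches; each stage consumes an iterator of batches and lazily yields new ones, keeping no state between batches. The stage helpers (read_batches, filter_stage, map_stage, pipe) are hypothetical and are not part of Xorq's API; real pipelines would operate on pyarrow RecordBatch objects instead.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Row = Dict[str, object]
Batch = List[Row]  # stand-in for one Arrow RecordBatch (assumption for this sketch)
Stage = Callable[[Iterable[Batch]], Iterator[Batch]]


def read_batches(rows: List[Row], batch_size: int) -> Iterator[Batch]:
    """Source stage: split rows into fixed-size batches."""
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]


def filter_stage(predicate: Callable[[Row], bool]) -> Stage:
    """Return a stateless stage that keeps only rows matching `predicate`."""
    def stage(batches: Iterable[Batch]) -> Iterator[Batch]:
        for batch in batches:
            kept = [row for row in batch if predicate(row)]
            if kept:  # drop batches that end up empty
                yield kept
    return stage


def map_stage(fn: Callable[[Row], Row]) -> Stage:
    """Return a stateless stage that transforms each row independently."""
    def stage(batches: Iterable[Batch]) -> Iterator[Batch]:
        for batch in batches:
            yield [fn(row) for row in batch]
    return stage


def pipe(source: Iterable[Batch], *stages: Stage) -> Iterator[Batch]:
    """Chain stages like `a | b | c`: each stage lazily consumes the previous one."""
    stream: Iterator[Batch] = iter(source)
    for stage in stages:
        stream = stage(stream)
    return stream


if __name__ == "__main__":
    rows = [{"id": i, "amount": i * 10} for i in range(6)]
    result = pipe(
        read_batches(rows, batch_size=2),
        filter_stage(lambda r: r["amount"] >= 20),
        map_stage(lambda r: {**r, "doubled": r["amount"] * 2}),
    )
    for batch in result:  # batches are produced one at a time, on demand
        print(batch)
```

Because no stage holds state across batches, memory use is bounded by the batch size rather than the dataset size, and stages can be recombined freely, which is the property the Unix-pipe comparison points at.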

Maintenance & Community

The README does not name maintainers, community channels (e.g., Discord or Slack), or a public roadmap.

Licensing & Compatibility

Xorq is distributed under the permissive MIT license, which allows commercial use and integration into closed-source projects, provided the copyright and license notice is retained.

Limitations & Caveats

The project is still pre-1.0, so breaking changes between releases are possible; expect to consult the migration guides when upgrading.
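
Since pre-1.0 releases may break compatibility, one defensive habit is to record the version a pipeline was last validated against and compare it with the installed version at startup. The sketch below uses only the standard library (importlib.metadata). The helper names, the TESTED_AGAINST value, and the rule that any minor-version bump before 1.0 may be breaking are assumptions for illustration, not a policy published by the Xorq project.

```python
import re
from importlib.metadata import PackageNotFoundError, version
from typing import Optional, Tuple

_VERSION_RE = re.compile(r"^(\d+)(?:\.(\d+))?")


def parse_major_minor(ver: str) -> Tuple[int, int]:
    """Return (major, minor) from strings like '0.3.1', '0.10rc2', or '1'."""
    match = _VERSION_RE.match(ver)
    if match is None:
        raise ValueError(f"unrecognised version string: {ver!r}")
    return int(match.group(1)), int(match.group(2) or 0)


def same_compat_series(installed: str, tested: str) -> bool:
    """Assumed policy: before 1.0 any minor bump may break; from 1.0 on, only a major bump."""
    inst, ref = parse_major_minor(installed), parse_major_minor(tested)
    if ref[0] == 0:
        return inst == ref
    return inst[0] == ref[0]


def installed_version(dist_name: str = "xorq") -> Optional[str]:
    """Installed distribution version, or None if the package is not installed."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    TESTED_AGAINST = "0.3.0"  # hypothetical: the version your pipelines were validated on
    current = installed_version()
    if current is None:
        print("xorq is not installed")
    elif not same_compat_series(current, TESTED_AGAINST):
        print(f"xorq {current} differs from tested {TESTED_AGAINST}; check the migration guide")
    else:
        print(f"xorq {current} matches the tested release series")
```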

Health Check
Last Commit: 14 hours ago
Responsiveness: Inactive
Pull Requests (30d): 107
Issues (30d): 18
Star History: 5 stars in the last 30 days

Explore Similar Projects

Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), Yaowei Zheng (Author of LLaMA-Factory), and 1 more.

Kiln by Kiln-AI

0.2%
5k
AI prototyping and dataset collaboration tool
Created 1 year ago
Updated 20 hours ago