pixeltable  by pixeltable

AI data infrastructure for multimodal apps using a declarative, incremental approach

created 2 years ago
684 stars

Top 50.6% on sourcepulse

Project Summary

Pixeltable provides a declarative data infrastructure for multimodal AI applications, addressing the complexity of stitching together disparate tools for data ingestion, transformation, indexing, and orchestration. It targets AI engineers and researchers building production-ready multimodal applications, offering a unified framework to simplify data plumbing and accelerate development.

How It Works

Pixeltable operates as a database, storing metadata and computed results persistently. Users define data processing and AI workflows declaratively using computed columns on tables. The engine automatically handles data ingestion (referencing files in place), transformation via Python UDFs or built-in operations, AI model integration for inference, and vector index creation for semantic search. Its core advantage lies in incremental computation, ensuring only necessary recomputations occur when data or code changes, alongside automatic versioning and lineage tracking.
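The incremental behavior of computed columns can be illustrated with a small toy model. The sketch below is not Pixeltable's API: the ToyTable class, its methods, and the n_words column are hypothetical names invented for this illustration, and everything is held in memory rather than persisted. It only shows the idea that a declared column is backfilled once for existing rows and then evaluated only for newly inserted rows.

```python
from typing import Any, Callable, Dict, List


class ToyTable:
    """Toy in-memory table with computed columns (hypothetical, not Pixeltable's API)."""

    def __init__(self) -> None:
        self.rows: List[Dict[str, Any]] = []
        # Maps a computed column name to a function of the row dict.
        self.computed: Dict[str, Callable[[Dict[str, Any]], Any]] = {}
        # Counts how many times any computed-column function has run.
        self.calls = 0

    def add_computed_column(self, name: str, fn: Callable[[Dict[str, Any]], Any]) -> None:
        """Declare a column and backfill it for rows that already exist."""
        self.computed[name] = fn
        for row in self.rows:
            row[name] = self._run(fn, row)

    def insert(self, new_rows: List[Dict[str, Any]]) -> None:
        """Insert rows, computing derived values only for the new rows."""
        for values in new_rows:
            row = dict(values)
            for name, fn in self.computed.items():
                row[name] = self._run(fn, row)
            self.rows.append(row)

    def _run(self, fn: Callable[[Dict[str, Any]], Any], row: Dict[str, Any]) -> Any:
        self.calls += 1
        return fn(row)


table = ToyTable()
table.insert([{"text": "a cat"}, {"text": "two dogs"}])

# Declaring the column backfills the two existing rows (two function calls).
table.add_computed_column("n_words", lambda r: len(r["text"].split()))

# Inserting one more row triggers exactly one additional call;
# the earlier rows are not recomputed.
table.insert([{"text": "three small birds"}])
```

In this toy, the call counter makes the incremental property observable: work scales with the number of new rows, not with the table size. A real system would additionally persist results and track versions, which this sketch omits.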

Quick Start & Requirements

  • Install via pip: pip install pixeltable
  • Requires Python 3.8+
  • Supports Linux, macOS, and Windows.
  • See the project's Installation and Quick Start documentation.

Highlighted Details

  • Unified multimodal interface for images, video, audio, and documents.
  • Declarative computed columns for automatic processing and AI model integration.
  • Built-in vector search and similarity indexing.
  • Supports Python UDFs and agentic workflows with LLM tool calling.
  • Persistent storage with automatic versioning and lineage tracking.

Maintenance & Community

  • Active development with a public roadmap for cloud infrastructure and deployment.
  • Community support available via Discord.
  • Contributions are welcome; see the project's Contributing Guide.

Licensing & Compatibility

  • Licensed under the Apache 2.0 License.
  • Permissive license suitable for commercial use and integration into closed-source projects.

Limitations & Caveats

The project is actively under development, with a roadmap indicating future cloud features. While it supports various AI integrations, specific model compatibility or performance tuning for niche use cases may require custom UDFs.

Health Check

  • Last commit: 1 day ago
  • Responsiveness: 1 day
  • Pull requests (30d): 58
  • Issues (30d): 4
  • Star history: 512 stars in the last 90 days
