indexify by tensorlakeai

Engine for data-intensive generative AI apps

created 2 years ago
1,046 stars

Top 36.6% on sourcepulse

Project Summary

Indexify is an open-source serving engine designed to simplify the creation and deployment of durable, multi-stage data processing workflows for generative AI applications. It targets developers building data-intensive applications, enabling them to orchestrate complex data pipelines involving tasks like document extraction, embedding, and retrieval across distributed compute resources.

How It Works

Indexify defines workflows as directed acyclic graphs (DAGs) of functions. Each function, decorated with @tensorlake_function, is a unit of compute that can be deployed and scaled independently. The engine supports dynamic routing, so data can be sent to specialized compute functions based on conditional logic. It also retries failed functions automatically and load-balances them, which supports durability and efficient resource use across heterogeneous hardware (CPU/GPU).
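
To make the graph model concrete, here is a minimal pure-Python sketch of a graph of functions with a conditional routing step. It does not use the tensorlake package: `step`, `MiniGraph`, `add_edge`, `route`, and `run` are hypothetical stand-ins invented for illustration, and the real @tensorlake_function decorator and graph API may look quite different.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List


def step(fn: Callable) -> Callable:
    """Hypothetical stand-in for a decorator such as @tensorlake_function."""
    fn.is_step = True  # mark the function as a unit of compute
    return fn


class MiniGraph:
    """Toy in-process graph runner (illustrative only, not Indexify's engine)."""

    def __init__(self, start: Callable) -> None:
        self.start = start
        self.edges: Dict[Callable, List[Callable]] = defaultdict(list)
        self.routers: Dict[Callable, Callable[[Any], Callable]] = {}

    def add_edge(self, src: Callable, dst: Callable) -> None:
        # Static edge: the output of src is always passed to dst.
        self.edges[src].append(dst)

    def route(self, src: Callable, chooser: Callable[[Any], Callable]) -> None:
        # Dynamic routing: chooser inspects src's output and picks the next step.
        self.routers[src] = chooser

    def run(self, value: Any) -> List[Any]:
        results = []
        pending = [(self.start, value)]
        while pending:
            fn, arg = pending.pop(0)
            out = fn(arg)
            if fn in self.routers:
                targets = [self.routers[fn](out)]
            else:
                targets = self.edges.get(fn, [])
            if not targets:
                results.append(out)  # no successors: this is a final output
            for target in targets:
                pending.append((target, out))
        return results


@step
def load(doc: str) -> str:
    return doc.strip()


@step
def handle_short(text: str) -> str:
    return f"short:{text}"


@step
def handle_long(text: str) -> str:
    return f"long:{len(text)} chars"


@step
def finish(text: str) -> str:
    return text.upper()


graph = MiniGraph(start=load)
# Route on a condition, then join both branches into a shared final step.
graph.route(load, lambda text: handle_short if len(text) < 10 else handle_long)
graph.add_edge(handle_short, finish)
graph.add_edge(handle_long, finish)
```

Calling `graph.run("  hi  ")` sends the input through `load`, `handle_short`, and `finish`, while a longer string takes the `handle_long` branch instead. In a real deployment each step would run on its own compute rather than in one process.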

Quick Start & Requirements

  • Install via pip: pip install indexify tensorlake
  • Requires Python 3.7+
  • Local testing requires only the tensorlake package.
  • Deployment involves running the indexify-server and indexify-cli executor components.
  • Official documentation and examples are available.

Highlighted Details

  • Supports multi-cloud and multi-region deployment for compute resources.
  • Enables distributed processing and fine-grained resource allocation (CPU/GPU).
  • Functions are durable and automatically retried on failure.
  • Workflows can be exposed as HTTP APIs or Python Remote APIs.
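
The automatic retry behavior listed above can be pictured with a small wrapper. This is a generic retry-with-backoff sketch, not Indexify's implementation: `with_retries`, `max_attempts`, `base_delay`, and `flaky_extract` are hypothetical names, and how the engine actually schedules retries is not described in the summary.

```python
import functools
import time


def with_retries(max_attempts: int = 3, base_delay: float = 0.0):
    """Hypothetical decorator: re-run a function when it raises an exception."""

    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return fn(*args, **kwargs)
                except Exception:
                    if attempt == max_attempts:
                        raise  # out of attempts: surface the last error
                    # Exponential backoff between attempts (0 s by default).
                    time.sleep(base_delay * 2 ** (attempt - 1))

        return wrapper

    return decorator


calls = {"count": 0}


@with_retries(max_attempts=3)
def flaky_extract(doc: str) -> str:
    # Simulated transient failure: the first two calls raise, the third succeeds.
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("transient failure")
    return doc.upper()


result = flaky_extract("page one")
```

Here the caller only sees the successful third attempt; the two failed attempts are absorbed by the wrapper.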

Maintenance & Community

Indexify is the core engine powering Tensorlake's Serverless Workflow Engine. Community channels and roadmap details are not explicitly provided in the README.

Licensing & Compatibility

The project is licensed under the Apache License 2.0, which permits commercial use and linking with closed-source applications.

Limitations & Caveats

The README describes the project as the "Open-Source core compute engine," which suggests that Tensorlake's commercial offering may differ or include additional features. According to the roadmap, cyclic graph support and ephemeral graphs are still under development.

Health Check

  • Last commit: 1 day ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 81
  • Issues (30d): 42
  • Star History: 58 stars in the last 90 days

Explore Similar Projects

Starred by Jeff Hammerbacher (Cofounder of Cloudera) and Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems).

bytewax by bytewax

0.3%
2k
Python framework for stateful stream processing
created 3 years ago
updated 4 months ago