flyte by flyteorg

Workflow orchestrator for data, ML, and analytics pipelines

created 5 years ago
6,385 stars

Top 8.1% on sourcepulse

Project Summary

Flyte is an open-source orchestration platform for building production-grade data and ML pipelines, aimed at data engineers and ML practitioners. It runs on Kubernetes, which it uses for distributed processing and resource management, and it emphasizes scalability, reproducibility, and integration with existing stacks.

How It Works

Flyte uses a type engine to enforce strongly typed interfaces, so data is validated at each workflow step. Workflows can be written in Python, through SDKs for Java, Scala, and JavaScript, or in other languages via raw containers. Executions are immutable, which supports reproducibility. Dynamic workflows, branching, and map tasks allow flexible and parallel execution. Data lineage tracking and visualization tools are built in.
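
The idea of validating data at each step boundary can be illustrated in plain Python. The following is a minimal sketch, not Flyte's type engine: `typed_step`, `parse`, and `double` are hypothetical names, and the check only handles plain classes such as `int` and `str`.

```python
import functools
from typing import Callable, get_type_hints


def typed_step(fn: Callable) -> Callable:
    """Hypothetical decorator: check keyword inputs and the result against fn's annotations."""
    hints = get_type_hints(fn)

    @functools.wraps(fn)
    def wrapper(**kwargs):
        # Validate every annotated keyword input before the step runs.
        for name, value in kwargs.items():
            expected = hints.get(name)
            if expected is not None and not isinstance(value, expected):
                raise TypeError(
                    f"{fn.__name__}: input {name!r} expected "
                    f"{expected.__name__}, got {type(value).__name__}"
                )
        result = fn(**kwargs)
        # Validate the output before it is handed to the next step.
        expected_out = hints.get("return")
        if expected_out is not None and not isinstance(result, expected_out):
            raise TypeError(
                f"{fn.__name__}: output expected "
                f"{expected_out.__name__}, got {type(result).__name__}"
            )
        return result

    return wrapper


@typed_step
def parse(raw: str) -> int:
    return int(raw)


@typed_step
def double(x: int) -> int:
    return 2 * x


print(double(x=parse(raw="21")))  # 42

try:
    double(x="21")  # a str where an int is declared
except TypeError as err:
    print(err)  # double: input 'x' expected int, got str
```

The point of the sketch is that a mismatch is reported at the boundary of the step that received the bad value, rather than surfacing later in the pipeline.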

Quick Start & Requirements

  • Install SDK: pip install flytekit
  • Run locally: pyflyte run <workflow_file.py> <workflow_name>
  • Demo cluster: flytectl demo start
  • Production deployment: Refer to the Deployment guide.
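
A workflow file for the commands above might look like the following. This is a minimal sketch: the file name `example.py` and the functions `square`, `add_one`, and `wf` are hypothetical, and the `ImportError` fallback is an addition for this sketch only, so that the file also runs as plain Python where flytekit is not installed.

```python
# example.py (hypothetical file name)
try:
    from flytekit import task, workflow
except ImportError:
    # Assumption for this sketch only: without flytekit, fall back to
    # no-op decorators so the functions behave as ordinary Python.
    def task(fn):
        return fn

    def workflow(fn):
        return fn


@task
def square(x: int) -> int:
    """A task: a typed unit of work."""
    return x * x


@task
def add_one(x: int) -> int:
    return x + 1


@workflow
def wf(x: int) -> int:
    """A workflow: wires task outputs to task inputs using keyword arguments."""
    squared = square(x=x)
    return add_one(x=squared)


if __name__ == "__main__":
    # Workflows are called with keyword arguments when run locally.
    print(wf(x=3))  # 10
```

With flytekit installed, running `pyflyte run example.py wf --x 3` should execute the same workflow locally.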

Highlighted Details

  • Supports multi-language development (Python, Java, Scala, JavaScript) via SDKs or raw containers.
  • Features strong typing, immutability, data lineage, and dynamic workflows.
  • Enables parallel execution with map tasks and fine-grained control over resource allocation (e.g., GPU acceleration, spot instances).
  • Provides caching, checkpointing, timeouts, and failure recovery at the task level.
  • Integrates with cloud storage through the FlyteFile and FlyteDirectory types and provides structured dataset handling.

Maintenance & Community

Flyte is used by companies such as LinkedIn and Spotify. The community engages through monthly syncs, a Slack channel, a newsletter, and YouTube content. Contributions are welcome in the form of bug reports, documentation improvements, and code submissions.

Licensing & Compatibility

Flyte is available under the Apache License 2.0, permitting commercial use and integration with closed-source projects.

Limitations & Caveats

While Flyte supports multi-language development, the primary SDK and documentation focus heavily on Python. Production deployment requires Kubernetes expertise.

Health Check

  • Last commit: 1 day ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 25
  • Issues (30d): 31

Star History

204 stars in the last 90 days
