hamilton  by apache

Python library for data transformation DAGs

Created 2 years ago
2,300 stars

Top 19.8% on SourcePulse

Project Summary

Apache Hamilton is a Python library for defining, visualizing, and executing data transformations as Directed Acyclic Graphs (DAGs). It targets data scientists and engineers who want more modular, testable, and maintainable data pipelines, from ETL to ML workflows and LLM applications. Its core benefit is portable, expressive, and self-documenting dataflows that can run across different Python environments.

How It Works

Hamilton models each data transformation as a Python function: the function name identifies a node in the graph, and the function's parameters name the nodes it depends on. The library constructs the DAG automatically from these functions, which keeps the code readable and modular. Function modifiers (decorators) help keep large DAGs DRY and reduce their complexity, while built-in data and schema validation (@check_output, SchemaValidator) improves robustness. Because the DAG definition is separate from its execution, the same transformation code is easier to share between collaborators and to move from development to production.
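The idea of deriving a DAG from function signatures can be illustrated with a minimal sketch that uses only the standard library. This is not Hamilton's implementation or API: the `execute` helper and the node functions below are hypothetical names chosen for illustration, and the sketch omits cycle detection, type checking, and visualization.

```python
import inspect
from typing import Any, Callable, Dict, Optional


# Each function is a node; its parameter names are the nodes it depends on.
def spend_per_signup(spend: float, signups: float) -> float:
    return spend / signups


def spend_doubled(spend: float) -> float:
    return spend * 2


def report(spend_per_signup: float, spend_doubled: float) -> str:
    return f"{spend_per_signup:.2f} per signup, {spend_doubled:.0f} doubled"


def execute(
    target: str,
    nodes: Dict[str, Callable[..., Any]],
    inputs: Dict[str, Any],
    cache: Optional[Dict[str, Any]] = None,
) -> Any:
    """Hypothetical helper: resolve `target` by recursively resolving its parameters."""
    cache = {} if cache is None else cache
    if target in cache:
        return cache[target]      # already computed in this run
    if target in inputs:
        return inputs[target]     # supplied by the caller
    if target not in nodes:
        raise KeyError(f"no function or input named {target!r}")
    func = nodes[target]
    # Parameter names double as the names of upstream nodes.
    kwargs = {
        name: execute(name, nodes, inputs, cache)
        for name in inspect.signature(func).parameters
    }
    cache[target] = func(**kwargs)
    return cache[target]


NODES = {f.__name__: f for f in (spend_per_signup, spend_doubled, report)}
result = execute("report", NODES, {"spend": 100.0, "signups": 8.0})
print(result)
```

Only the requested output and its upstream nodes are computed, which is why the definition of the functions can stay independent of how and where they are executed.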

Quick Start & Requirements

  • Install with pip install "sf-hamilton[visualization]".
  • For visualization, Graphviz must be installed separately on your system.
  • Hamilton supports Python 3.8+.
  • The Hamilton UI requires pip install "sf-hamilton[ui,sdk]".
  • Try Hamilton in the browser: www.tryhamilton.dev

Highlighted Details

  • Automatically visualizes, catalogs, and monitors DAG execution via the Hamilton UI.
  • Supports data and schema validation for outputs using decorators and adapters.
  • Functions as a framework for structuring data transformations, comparable to dbt for SQL.
  • Designed for extensibility with a plugin architecture.
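Decorator-based output validation can be pictured with a small plain-Python sketch. The `check_range` decorator below is a hypothetical stand-in written for illustration; it is not Hamilton's `@check_output`, whose actual parameters and behavior should be taken from the Hamilton documentation.

```python
import functools
from typing import Any, Callable


def check_range(low: float, high: float) -> Callable[[Callable[..., float]], Callable[..., float]]:
    """Hypothetical decorator: fail if the wrapped function's output leaves [low, high]."""

    def decorator(func: Callable[..., float]) -> Callable[..., float]:
        @functools.wraps(func)  # keeps the name and signature visible to introspection
        def wrapper(*args: Any, **kwargs: Any) -> float:
            value = func(*args, **kwargs)
            if not (low <= value <= high):
                raise ValueError(
                    f"{func.__name__} returned {value}, expected a value in [{low}, {high}]"
                )
            return value

        return wrapper

    return decorator


@check_range(0.0, 1.0)
def conversion_rate(signups: float, visits: float) -> float:
    return signups / visits


print(conversion_rate(5.0, 10.0))
```

Attaching the check to the function keeps the validation rule next to the transformation it guards, so the rule travels with the code wherever the function is reused.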

Maintenance & Community

  • Originated at Stitch Fix, now supported by DAGWorks Inc.
  • Active community via Slack.
  • Numerous contributors and notable users across various industries (e.g., UK Government Digital Services, IBM, Adobe).

Licensing & Compatibility

  • BSD 3-Clause Clear License.
  • Permissive license suitable for commercial use and integration with closed-source projects.

Limitations & Caveats

Hamilton is not an orchestrator or a feature store, but rather a framework for defining data transformation logic. For complex control flow like loops or conditional logic (e.g., for LLM agents), the sister library Burr is recommended.

Health Check

  • Last commit: 1 week ago
  • Responsiveness: 1 day
  • Pull requests (30d): 7
  • Issues (30d): 6
  • Star history: 30 stars in the last 30 days
