hamilton by apache

Python library for data transformation DAGs

Created 2 years ago

2,361 stars

Top 19.1% on SourcePulse

8 Experts Love This Project

Cofounder of Cloudera

Jesse Clark

Cofounder of Marqo

and 4 more!

Project Summary

Apache Hamilton is a Python library for defining, visualizing, and executing data transformation Directed Acyclic Graphs (DAGs). It targets data scientists and engineers who want more modular, testable, and maintainable data pipelines, from ETL to ML workflows and LLM applications. Its core benefit is portable, expressive, self-documenting dataflows that can run wherever Python runs.

How It Works

Hamilton models each data transformation as a Python function: the function's name identifies the node it produces, and its parameter names declare the upstream nodes it depends on. The library builds the DAG automatically from these functions, which keeps the code readable and modular. Function modifiers (decorators) keep code DRY and reduce complexity in large DAGs, while built-in data and schema validation (@check_output, SchemaValidator) improves robustness. Because the DAG definition is separate from its execution, teams can collaborate more easily and move from development to production with fewer changes.
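To make the "parameters define dependencies" idea concrete, here is a minimal standard-library sketch. It is not Hamilton's implementation or API: build_graph, execute, and the two example transforms are hypothetical names invented for illustration, and the sketch omits cycle detection, type checking, and everything else a real library provides.

```python
import inspect
from typing import Any, Callable, Dict, Optional


def build_graph(*functions: Callable[..., Any]) -> Dict[str, Callable[..., Any]]:
    """Hypothetical helper: map each node name (the function's name) to the function."""
    return {fn.__name__: fn for fn in functions}


def execute(
    graph: Dict[str, Callable[..., Any]],
    target: str,
    inputs: Dict[str, Any],
    cache: Optional[Dict[str, Any]] = None,
) -> Any:
    """Compute `target` by treating each parameter name as an upstream node."""
    cache = {} if cache is None else cache
    if target in cache:   # already computed during this run
        return cache[target]
    if target in inputs:  # supplied by the caller, not computed
        return inputs[target]
    fn = graph[target]    # raises KeyError if the node is unknown
    kwargs = {
        name: execute(graph, name, inputs, cache)
        for name in inspect.signature(fn).parameters
    }
    cache[target] = fn(**kwargs)
    return cache[target]


# Hypothetical transforms: the parameter names are the dependency declarations.
def spend_per_signup(spend: float, signups: int) -> float:
    return spend / signups


def spend_is_high(spend_per_signup: float, threshold: float) -> bool:
    return spend_per_signup > threshold


graph = build_graph(spend_per_signup, spend_is_high)
result = execute(
    graph,
    "spend_is_high",
    {"spend": 100.0, "signups": 4, "threshold": 20.0},
)
print(result)
```

Because each transform is a plain function, it can be unit-tested on its own by calling it with ordinary arguments, which is the testability benefit the summary describes.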

Quick Start & Requirements

Install with pip install "sf-hamilton[visualization]".
For visualization, Graphviz must be installed separately on your system.
Hamilton supports Python 3.8+.
The Hamilton UI requires pip install "sf-hamilton[ui,sdk]".
Try Hamilton in the browser: www.tryhamilton.dev

Highlighted Details

Automatically visualizes, catalogs, and monitors DAG execution via the Hamilton UI.
Supports data and schema validation for outputs using decorators and adapters.
Serves as a framework for structuring Python data transformations, much as dbt does for SQL.
Designed for extensibility with a plugin architecture.
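The output-validation item above can be pictured as a decorator that inspects a function's return value before passing it downstream. The sketch below is a toy, assuming a simple numeric range check: check_range and conversion_rate are hypothetical names, and this is not the signature or behavior of Hamilton's @check_output.

```python
import functools
from typing import Any, Callable


def check_range(low: float, high: float) -> Callable[[Callable[..., float]], Callable[..., float]]:
    """Hypothetical decorator: raise if the wrapped result falls outside [low, high]."""

    def decorator(fn: Callable[..., float]) -> Callable[..., float]:
        @functools.wraps(fn)  # keeps the original name and signature visible to inspect
        def wrapper(*args: Any, **kwargs: Any) -> float:
            result = fn(*args, **kwargs)
            if not (low <= result <= high):
                raise ValueError(
                    f"{fn.__name__} returned {result}, expected a value in [{low}, {high}]"
                )
            return result

        return wrapper

    return decorator


@check_range(0.0, 1.0)
def conversion_rate(signups: int, visits: int) -> float:
    return signups / visits


ok = conversion_rate(signups=5, visits=50)  # within range, returned unchanged
```

Keeping the check next to the function definition means the validation rule travels with the transformation and documents the expected output where the logic lives.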

Maintenance & Community

Originated at Stitch Fix, now supported by DAGWorks Inc.
Active community via Slack.
Numerous contributors and notable users across various industries (e.g., UK Government Digital Services, IBM, Adobe).

Licensing & Compatibility

BSD 3-Clause Clear License.
Permissive license suitable for commercial use and integration with closed-source projects.

Limitations & Caveats

Hamilton is not an orchestrator or a feature store, but rather a framework for defining data transformation logic. For complex control flow like loops or conditional logic (e.g., for LLM agents), the sister library Burr is recommended.

Health Check

Last commit: 2 days ago
Responsiveness: 1 day
Star history: 38 stars in the last 30 days