hamilton  by apache

Python library for data transformation DAGs

created 2 years ago
2,213 stars

Top 20.9% on sourcepulse

Project Summary

Apache Hamilton is a Python library for defining, visualizing, and executing data transformation Directed Acyclic Graphs (DAGs). It targets data scientists and engineers who want more modular, testable, and maintainable data pipelines, from ETL to ML workflows and LLM applications. Its core benefit is dataflows that are portable, expressive, and self-documenting, and that can run wherever Python runs.

How It Works

Hamilton models data transformations as plain Python functions: each function defines a node in the DAG, and its parameter names declare the nodes it depends on. The library constructs the DAG automatically from these functions, which keeps the code readable and modular. Function modifiers (decorators) keep code DRY and reduce complexity in large DAGs, while built-in data and schema validation (@check_output, SchemaValidator) improves robustness. Because the DAG definition is separate from its execution, the same functions can move from development to production with fewer changes, which also eases collaboration.
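The idea that function parameters define dependencies can be illustrated with a minimal standard-library sketch. This is not Hamilton's implementation or API: the helpers build_dag and execute, and the example functions raw_spend, total_spend, and average_spend, are hypothetical names chosen for illustration only.

```python
import inspect
from typing import Any, Callable, Dict, Optional


# Each function is a node; its parameter names are the nodes it depends on.
def raw_spend() -> list:
    return [10.0, 20.0, 30.0]


def total_spend(raw_spend: list) -> float:
    return sum(raw_spend)


def average_spend(raw_spend: list, total_spend: float) -> float:
    return total_spend / len(raw_spend)


def build_dag(*funcs: Callable) -> Dict[str, Callable]:
    """Index functions by name so parameters can be resolved to nodes."""
    return {func.__name__: func for func in funcs}


def execute(
    dag: Dict[str, Callable],
    target: str,
    inputs: Optional[Dict[str, Any]] = None,
    cache: Optional[Dict[str, Any]] = None,
) -> Any:
    """Compute `target` by recursively computing the nodes it depends on."""
    inputs = {} if inputs is None else inputs
    cache = {} if cache is None else cache
    if target in inputs:  # externally supplied values override nodes
        return inputs[target]
    if target not in cache:  # compute each node at most once
        func = dag[target]
        params = inspect.signature(func).parameters
        kwargs = {name: execute(dag, name, inputs, cache) for name in params}
        cache[target] = func(**kwargs)
    return cache[target]


dag = build_dag(raw_spend, total_spend, average_spend)
result = execute(dag, "average_spend")
overridden = execute(dag, "average_spend", inputs={"raw_spend": [1.0, 3.0]})
```

Here the functions never call each other directly. The graph is implied by their signatures alone, which is why each function can be unit tested in isolation by passing plain values.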

Quick Start & Requirements

  • Install with pip install "sf-hamilton[visualization]".
  • For visualization, Graphviz must be installed separately on your system.
  • Hamilton supports Python 3.8+.
  • The Hamilton UI requires pip install "sf-hamilton[ui,sdk]".
  • Try Hamilton in the browser: www.tryhamilton.dev

Highlighted Details

  • Automatically visualizes, catalogs, and monitors DAG execution via the Hamilton UI.
  • Supports data and schema validation for outputs using decorators and adapters.
  • Acts as a framework for structuring data transformations, comparable to what dbt provides for SQL.
  • Designed for extensibility with a plugin architecture.
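The decorator-based output validation mentioned in the list above can be sketched in plain Python. The check_range decorator below is a hypothetical stand-in written for this illustration. It is not Hamilton's @check_output and does not reproduce its signature or behavior.

```python
import functools
from typing import Callable


def check_range(low: float, high: float) -> Callable:
    """Hypothetical validator: reject outputs outside [low, high]."""

    def decorator(func: Callable) -> Callable:
        @functools.wraps(func)  # keep the original name and signature visible
        def wrapper(*args, **kwargs):
            result = func(*args, **kwargs)
            if not (low <= result <= high):
                raise ValueError(
                    f"{func.__name__} returned {result}, "
                    f"expected a value in [{low}, {high}]"
                )
            return result

        return wrapper

    return decorator


@check_range(0.0, 1.0)
def conversion_rate(conversions: int, visits: int) -> float:
    return conversions / visits


valid = conversion_rate(5, 10)
```

Attaching the check to the function keeps the validation rule next to the transformation it guards, so the constraint is documented where the value is produced.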

Maintenance & Community

  • Originated at Stitch Fix, now supported by DAGWorks Inc.
  • Active community via Slack.
  • Numerous contributors and notable users across various industries (e.g., UK Government Digital Services, IBM, Adobe).

Licensing & Compatibility

  • BSD 3-Clause Clear License.
  • Permissive license suitable for commercial use and integration with closed-source projects.

Limitations & Caveats

Hamilton is not an orchestrator or a feature store, but rather a framework for defining data transformation logic. For complex control flow like loops or conditional logic (e.g., for LLM agents), the sister library Burr is recommended.

Health Check

  • Last commit: 6 days ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 34
  • Issues (30d): 11
  • Star History: 104 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), Alexander Wettig (Author of SWE-bench, SWE-agent), and 2 more.

data-juicer by modelscope

0.7%
5k
Data-Juicer: Data processing system for foundation models
created 2 years ago
updated 1 day ago
Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems) and Jeff Hammerbacher (Cofounder of Cloudera).

towhee by towhee-io

0.2%
3k
Framework for neural data processing pipelines
created 4 years ago
updated 9 months ago