tango by allenai

Experiment orchestration framework for AI research

Created 4 years ago

565 stars

Top 56.9% on SourcePulse

3 Experts Love This Project

Lysandre Debut

Chief Open-Source Officer at Hugging Face

Jeff Hammerbacher

Cofounder of Cloudera

Omar Sanseviero

DevRel at Google DeepMind

Project Summary

AI2 Tango is an open-source Python library designed to streamline machine learning research by organizing experiments into discrete, cacheable, and reusable steps. It targets researchers and engineers working on complex, iterative projects, offering a structured alternative to ad-hoc file management and version tracking.

How It Works

Tango structures a research workflow as a directed acyclic graph (DAG) of "steps." A step is either a Python function decorated with @step() or a subclass of the Step class. Tango caches each step's output under a unique ID derived from the step's inputs and metadata (its fully qualified name and version), so a step whose inputs haven't changed is not recomputed, which speeds up iterative development. Unlike some other workflow engines, Tango intentionally excludes source code hashes from cache keys: editing a step's code does not invalidate its cached results unless you manually update the step's VERSION class variable. This keeps cache invalidation explicit and under the researcher's control.
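The idea of a cache key built from inputs and metadata can be sketched in plain Python. This is an illustration only, not Tango's actual implementation: the helper unique_id, the hashing scheme, and the step name my_project.steps.tokenize are all assumptions made for the example.

```python
import hashlib
import json


def unique_id(qualified_name: str, version: str, inputs: dict) -> str:
    """Hypothetical helper: derive a cache key from a step's name, version, and inputs.

    The source code of the step is deliberately NOT part of the key.
    """
    # Sort keys so that argument order does not change the resulting hash.
    payload = json.dumps(
        {"name": qualified_name, "version": version, "inputs": inputs},
        sort_keys=True,
    )
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    return f"{qualified_name}-{version}-{digest[:16]}"


# Hypothetical step name and inputs, used only for illustration.
name = "my_project.steps.tokenize"
base = unique_id(name, "001", {"text": "hello world", "lowercase": True})
same = unique_id(name, "001", {"lowercase": True, "text": "hello world"})
new_input = unique_id(name, "001", {"text": "hello world", "lowercase": False})
new_version = unique_id(name, "002", {"text": "hello world", "lowercase": True})

print(base == same)         # identical inputs and metadata give the same key
print(base == new_input)    # a changed input gives a different key
print(base == new_version)  # a bumped version gives a different key
```

Under these assumptions, only a change to the inputs, the name, or the version produces a new key, which mirrors the behavior described above.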

Quick Start & Requirements

Install via pip: pip install ai2-tango or pip install 'ai2-tango[all]' for all integrations.
Conda installation: conda install tango -c conda-forge or conda install tango-all -c conda-forge.
Requires Python 3.8+.
Official documentation: https://allenai.github.io/tango/

Highlighted Details

Organizes experiments into cacheable, reusable steps.
Caching based on inputs and step metadata, not source code hashes.
Supports various integrations (e.g., PyTorch, Hugging Face Datasets, Weights & Biases) via optional installs.
Offers Docker images for reproducible environments.

Maintenance & Community

Developed and maintained by the AllenNLP team at the Allen Institute for Artificial Intelligence (AI2).

Licensing & Compatibility

Licensed under Apache 2.0, permitting commercial use and integration with closed-source projects.

Limitations & Caveats

Although Tango is designed for research, its README notes that tools such as Metaflow, Airflow, or Redun may be better suited to production workflows. Because cache keys ignore source code, changing a step's code does not invalidate its cached results on its own: you must update the step's VERSION manually, or stale outputs will be reused.
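The stale-cache caveat can be reproduced with a toy in-memory cache. This is a minimal sketch under stated assumptions, not Tango's API: run_cached, the scale functions, and the version strings are hypothetical names chosen for the example.

```python
from typing import Any, Callable, Dict, Tuple

# Toy cache keyed by (step name, version, arguments); the function body is not part of the key.
_cache: Dict[Tuple[str, str, Tuple[Any, ...]], Any] = {}


def run_cached(name: str, version: str, fn: Callable[..., Any], *args: Any) -> Any:
    """Hypothetical runner: call fn only if no result is cached for this key."""
    key = (name, version, args)
    if key not in _cache:
        _cache[key] = fn(*args)
    return _cache[key]


def scale_original(x: int) -> int:
    return x * 2


def scale_edited(x: int) -> int:
    # Same logical step after a code change.
    return x * 3


first = run_cached("scale", "001", scale_original, 10)  # computed: 20
stale = run_cached("scale", "001", scale_edited, 10)    # cache hit: still 20
fresh = run_cached("scale", "002", scale_edited, 10)    # version bumped: 30

print(first, stale, fresh)
```

The second call returns the old result even though the code changed, because nothing in the key changed. Only the version bump in the third call forces a recomputation.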

Health Check

Last Commit

1 year ago

Responsiveness

Inactive

Star History

3 stars in the last 30 days