tango by allenai

Experiment orchestration framework for AI research

created 3 years ago
562 stars

Top 58.1% on sourcepulse

Project Summary

AI2 Tango is an open-source Python library designed to streamline machine learning research by organizing experiments into discrete, cacheable, and reusable steps. It targets researchers and engineers working on complex, iterative projects, offering a structured alternative to ad-hoc file management and version tracking.

How It Works

Tango structures a research workflow as a directed acyclic graph (DAG) of "steps." A step is either a Python function decorated with @step() or a subclass of Step. Tango caches each step's output under a unique ID derived from the step's inputs and its metadata (fully qualified name and version), so a step whose inputs have not changed is not recomputed, which speeds up iterative development. Unlike workflow engines that hash a step's source code, Tango deliberately leaves source code out of the cache key: editing a step does not invalidate its cached output until you manually change the step's VERSION class variable. The trade-off is intended to keep cache invalidation transparent and under the user's control.
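The idea of a cache key built from a step's name, version, and inputs can be sketched in plain Python. This is an illustration only, not Tango's actual implementation: the helper step_unique_id, the hashing scheme, and the example step names are all assumptions made for the sketch.

```python
import hashlib
import json


def step_unique_id(qualified_name: str, version: str, inputs: dict) -> str:
    """Hypothetical helper: derive an ID from name, version, and inputs only.

    The step's source code is deliberately not part of the hash.
    """
    payload = json.dumps(
        {"name": qualified_name, "version": version, "inputs": inputs},
        sort_keys=True,  # make the ID independent of argument order
    )
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    short_name = qualified_name.rsplit(".", 1)[-1]
    return f"{short_name}-{digest[:16]}"


# An upstream step's ID can itself be an input, which chains steps into a DAG.
data_id = step_unique_id("my_project.steps.LoadData", "001", {"path": "train.csv"})

id_a = step_unique_id("my_project.steps.Train", "001", {"data": data_id, "lr": 0.01})
id_b = step_unique_id("my_project.steps.Train", "001", {"lr": 0.01, "data": data_id})
id_c = step_unique_id("my_project.steps.Train", "002", {"data": data_id, "lr": 0.01})
id_d = step_unique_id("my_project.steps.Train", "001", {"data": data_id, "lr": 0.1})

print(id_a == id_b)  # same name, version, and inputs: cache hit
print(id_a == id_c)  # version bumped: new ID, so the step would be recomputed
print(id_a == id_d)  # input changed: new ID
```

In this sketch, changing the upstream LoadData inputs would change data_id and therefore every downstream ID, which is how a change propagates through the graph.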

Quick Start & Requirements

  • Install via pip: pip install ai2-tango or pip install 'ai2-tango[all]' for all integrations.
  • Conda installation: conda install tango -c conda-forge or conda install tango-all -c conda-forge.
  • Requires Python 3.8+.
  • Official documentation: https://allenai.github.io/tango/
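After installing, a quick check like the following can confirm that the interpreter meets the Python 3.8+ requirement and that the package is importable. It is a minimal sketch: the function names are hypothetical, and it assumes the installed package is imported under the module name tango.

```python
import importlib.util
import sys

MIN_PYTHON = (3, 8)  # minimum version stated in the requirements above


def python_is_supported(version_info=sys.version_info) -> bool:
    """Return True if the given interpreter version is 3.8 or later."""
    return tuple(version_info[:2]) >= MIN_PYTHON


def tango_is_installed() -> bool:
    """Return True if a top-level module named `tango` can be found (assumed name)."""
    return importlib.util.find_spec("tango") is not None


print("Python version supported:", python_is_supported())
print("tango importable:", tango_is_installed())
```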

Highlighted Details

  • Organizes experiments into cacheable, reusable steps.
  • Caching based on inputs and step metadata, not source code hashes.
  • Supports various integrations (e.g., PyTorch, Datasets, Weights & Biases) via optional installs.
  • Offers Docker images for reproducible environments.

Maintenance & Community

Developed and maintained by the AllenNLP team at the Allen Institute for Artificial Intelligence (AI2).

Licensing & Compatibility

Licensed under Apache 2.0, permitting commercial use and integration with closed-source projects.

Limitations & Caveats

Tango is designed for research; the README notes that tools like Metaflow, Airflow, or Redun may be better suited to production workflows. Because source code is not part of the cache key, a code change does not invalidate cached results on its own: the step's VERSION must be updated manually, or stale outputs will be reused.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 day
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 11 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), Jeff Hammerbacher (Cofounder of Cloudera), and 3 more.

AutoPR by irgolic

0.1%
1k
AI-powered workflows for codebase automation
created 2 years ago
updated 1 year ago
Starred by John Yang (Author of SWE-bench, SWE-agent), Lysandre Debut (Chief Open-Source Officer at Hugging Face), and 3 more.

cleanrl by vwxyzjn

0.5%
8k
RL algorithms implementation with research-friendly features
created 6 years ago
updated 3 weeks ago