determined by determined-ai

Open-source ML platform for simplifying training, tuning, tracking, and resource management

Created 5 years ago

3,214 stars

Top 14.7% on SourcePulse

9 Experts Love This Project

jn2clark (Cofounder of Marqo)

Vincent Weisser (Cofounder of Prime Intellect)

Jeff Hammerbacher (Cofounder of Cloudera)

Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems")

and 5 more!

Project Summary

Determined AI is an open-source platform designed to streamline deep learning workflows for researchers and engineers. It simplifies distributed training, hyperparameter tuning, experiment tracking, and resource management, aiming to accelerate model development and reduce cloud costs.

How It Works

Determined operates through a Python library, a command-line interface (CLI), and a web UI. The Python library lets users integrate PyTorch or TensorFlow code by organizing it within Trial classes such as PyTorchTrial or TFKerasTrial, or by using the lower-level Core API. This abstraction handles distributed execution and hyperparameter search, enabling faster training and automated tuning. The CLI manages cluster deployment (local, AWS, GCP) and experiment execution, while the web UI visualizes training progress, resource utilization, and model artifacts.
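
The Trial-class approach inverts control: the user supplies the model-specific steps and the platform owns the loop around them, which is what allows a platform to layer distribution, checkpointing, and search on top. The pure-Python sketch below illustrates only that pattern; `ToyTrial`, `run_trial`, `MeanEstimator`, and their method names are hypothetical and are not Determined's API (the Determined documentation describes the actual PyTorchTrial interface).

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class ToyTrial(ABC):
    """Hypothetical user-facing interface: the user fills in model-specific steps."""

    def __init__(self, hparams: Dict[str, float]) -> None:
        self.hparams = hparams

    @abstractmethod
    def train_batch(self, batch: List[float]) -> Dict[str, float]:
        """Run one optimization step on a batch and return training metrics."""

    @abstractmethod
    def evaluate(self) -> Dict[str, float]:
        """Return validation metrics for the current model state."""


def run_trial(trial: ToyTrial, batches: List[List[float]], epochs: int) -> Dict[str, float]:
    """Hypothetical harness that owns the training loop.

    Because the loop lives here rather than in user code, a platform could add
    checkpointing, distribution, or early stopping without touching the trial.
    """
    for _ in range(epochs):
        for batch in batches:
            trial.train_batch(batch)
    return trial.evaluate()


class MeanEstimator(ToyTrial):
    """Toy model: learns a scalar w minimizing the mean squared error to the data."""

    def __init__(self, hparams: Dict[str, float]) -> None:
        super().__init__(hparams)
        self.w = 0.0

    def train_batch(self, batch: List[float]) -> Dict[str, float]:
        # d/dw of mean((w - x)^2) is 2 * mean(w - x); take one gradient step.
        grad = 2.0 * sum(self.w - x for x in batch) / len(batch)
        self.w -= self.hparams["lr"] * grad
        return {"loss": sum((self.w - x) ** 2 for x in batch) / len(batch)}

    def evaluate(self) -> Dict[str, float]:
        return {"w": self.w}


if __name__ == "__main__":
    metrics = run_trial(MeanEstimator({"lr": 0.1}), [[1.0, 3.0], [2.0, 2.0]], epochs=200)
    print(f"learned w = {metrics['w']:.3f}")  # approaches 2.0, the data mean
```

The user class never sees the loop, so changing how batches are scheduled or where state is saved happens entirely in the harness.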

Quick Start & Requirements

Install CLI: pip install determined
Deployment: Use det deploy local cluster-up for local setup or det deploy aws up / det deploy gcp up for cloud.
Prerequisites: Python, Docker (for local deployment). Cloud deployment requires cloud provider credentials.
Resources: Cloud GPU instances are recommended for training.
Links: Examples, Documentation, Quick Start Guide

Highlighted Details

Supports distributed training across multiple GPUs and nodes.
Integrates advanced hyperparameter search algorithms like Adaptive SHA.
Provides robust experiment tracking, including code snapshots and configuration history.
Offers performance profiling and resource management for cost optimization.
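
Adaptive SHA builds on the successive halving algorithm (SHA): evaluate many configurations on a small budget, keep the best fraction, and give the survivors more budget. The following pure-Python sketch is plain synchronous successive halving, not Determined's adaptive searcher; `successive_halving`, the toy objective `toy_loss`, and the reduction factor `eta=3` are hypothetical choices for illustration.

```python
import random
from typing import Callable, Dict, List, Tuple

Config = Dict[str, float]


def successive_halving(
    configs: List[Config],
    evaluate: Callable[[Config, int], float],
    min_budget: int = 1,
    eta: int = 3,
) -> Tuple[Config, float]:
    """Synchronous successive halving; a smaller loss is better.

    Each rung evaluates every surviving config at the current budget, keeps
    the best 1/eta of them (at least one), and multiplies the budget by eta.
    Returns the last survivor and its loss at the final budget.
    """
    if not configs:
        raise ValueError("need at least one configuration")
    if eta < 2:
        raise ValueError("eta must be at least 2")
    survivors = list(configs)
    budget = min_budget
    while True:
        # Score each survivor; sort by loss only, so dicts are never compared.
        scored = sorted(((evaluate(c, budget), c) for c in survivors), key=lambda t: t[0])
        if len(scored) == 1:
            loss, best = scored[0]
            return best, loss
        keep = max(1, len(scored) // eta)
        survivors = [c for _, c in scored[:keep]]
        budget *= eta


def toy_loss(config: Config, budget: int) -> float:
    """Hypothetical objective: best lr is 0.1; a larger budget lowers the loss floor."""
    return (config["lr"] - 0.1) ** 2 + 1.0 / budget


if __name__ == "__main__":
    rng = random.Random(0)
    candidates = [{"lr": rng.uniform(0.0, 1.0)} for _ in range(27)]
    best, loss = successive_halving(candidates, toy_loss)
    print(f"best lr = {best['lr']:.4f}, loss = {loss:.4f}")
```

With 27 candidates and `eta=3`, the rungs evaluate 27, 9, 3, and 1 configurations at budgets 1, 3, 9, and 27, so most of the compute goes to the few most promising settings.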

Maintenance & Community

Active community via Slack.

Health Check

Last Commit: 11 months ago

Responsiveness: 1 day

Pull Requests (30d): 2

Issues (30d): 1

Star History

5 stars in the last 30 days

Explore Similar Projects

Starred by Vincent Weisser (Cofounder of Prime Intellect), Wing Lian (Founder of Axolotl AI), and 1 more.

varuna by microsoft

Tool for efficient large DNN model training on commodity hardware

Created 4 years ago

Updated 1 year ago

Starred by Alex Chen (Cofounder of Nexa AI) and Jeff Hammerbacher (Cofounder of Cloudera).

optimum-benchmark by huggingface

Benchmarking utility for Transformers, Diffusers, and other models

Created 2 years ago

Updated 5 months ago

Starred by Vincent Weisser (Cofounder of Prime Intellect).

openai_lab by kengz

RL experimentation framework for OpenAI Gym

Created 9 years ago

Updated 8 years ago

Starred by Zhiqiang Xie (Coauthor of SGLang).

dynolog by facebookincubator

Telemetry daemon for performance monitoring and tracing of heterogeneous CPU-GPU systems

Created 3 years ago

Updated 1 day ago

Starred by Vincent Weisser (Cofounder of Prime Intellect) and Alexander Borzunov (Research Scientist at OpenAI).

efficient-dl-systems by mryab

Course materials for efficient deep learning systems

Created 4 years ago

Updated 3 days ago

cube-studio by data-infra

Unified cloud-native AI platform for end-to-end ML workflows

Created 1 year ago

Updated 2 weeks ago

Starred by Yaowei Zheng (Author of LLaMA-Factory).

SwanLab by SwanHubX

AI training tracking and visualization tool

Created 2 years ago

Updated 1 week ago

Starred by Aravind Srinivas (Cofounder of Perplexity), Eiso Kant (Cofounder of Poolside AI), and 20 more.

composer by mosaicml

DL framework for training at scale, optimized for large-scale clusters

Created 4 years ago

Updated 3 months ago

Starred by Aravind Srinivas (Cofounder of Perplexity), Luis Capelo (Cofounder of Lightning AI), and 13 more.

higgsfield by higgsfield-ai

ML framework for large model training and GPU orchestration

Created 7 years ago

Updated 1 year ago

Starred by Jiaming Song (Chief Scientist at Luma AI), Jiayi Pan (Author of SWE-Gym; MTS at xAI), and 6 more.

clearml by clearml

MLOps suite for experiment tracking, automation, and data management

Created 6 years ago

Updated 1 day ago

Starred by Aravind Srinivas (Cofounder of Perplexity), Georgios Konstantopoulos (CTO, General Partner at Paradigm), and 14 more.

wandb by wandb

AI developer platform for model training, fine-tuning, and management

Created 9 years ago

Updated 19 hours ago

Starred by Peter Salanki (Cofounder of CoreWeave), Travis Fischer (Founder of Agentic), and 8 more.

serving by tensorflow

Serving system for machine learning models in production

Created 10 years ago

Updated 2 months ago
