Discover and explore top open-source AI tools and projects—updated daily.
clearml: MLOps orchestration for distributed AI workloads
Top 93.3% on SourcePulse
ClearML Agent provides a distributed scheduler and orchestration solution for ML/DL/GenAI workloads, simplifying MLOps and LLMOps. It targets ML engineers and researchers seeking to manage experiments across diverse compute resources, offering automated execution, resource utilization optimization, and simplified cluster management with minimal DevOps overhead.
How It Works
The ClearML Agent functions as a job scheduler that monitors specified queues, retrieves experiments, and manages their execution. It automates the creation of isolated execution environments using virtual environments or Docker containers, clones the relevant code, installs dependencies (including automatic PyTorch version selection based on CUDA), executes the task, and streams logs and progress back to the ClearML Server UI. This approach enables a "fire-and-forget" execution model with flexible resource allocation across bare-metal, Kubernetes, and HPC environments.
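The poll-and-execute loop described above can be sketched in plain Python. This is a conceptual toy, not ClearML code: the job format, the `run_job` step, and the in-process queue are all stand-ins for what the real agent does (cloning code, building an isolated environment, and streaming logs back to the server).

```python
import queue
import subprocess
import tempfile

def run_job(job):
    """Execute one queued job in an isolated working directory.

    Stand-in for the agent's real work: clone the code, install
    dependencies, run the task, and report results to the server.
    """
    with tempfile.TemporaryDirectory() as workdir:
        # The real agent clones the repo and builds a virtualenv or
        # Docker container here; we just run a command in isolation.
        result = subprocess.run(
            job["command"], shell=True, cwd=workdir,
            capture_output=True, text=True,
        )
        return {"id": job["id"], "returncode": result.returncode,
                "stdout": result.stdout}

def worker(jobs):
    """Drain a queue of jobs, fire-and-forget style."""
    results = []
    while True:
        try:
            job = jobs.get_nowait()   # the agent polls its assigned queues
        except queue.Empty:
            break
        results.append(run_job(job))  # logs would stream to the server UI
    return results

jobs = queue.Queue()
jobs.put({"id": 1, "command": "echo training step done"})
jobs.put({"id": 2, "command": "echo evaluation done"})
print(worker(jobs))
```

The real agent adds what this sketch omits: environment caching, automatic dependency resolution (including CUDA-aware PyTorch selection), and continuous polling rather than draining the queue once.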
Quick Start & Requirements
Install with pip install clearml-agent. Docker-related examples are in the docker folder, Kubernetes integration details can be found via the clearml-helm-charts repository, and example automation scripts are located in the ClearML example/automation folder.
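Putting the install step together with the agent's documented daemon mode, a typical bring-up looks roughly like this (the queue name is illustrative; consult the ClearML Agent README for the authoritative flags):

```shell
# Install the agent (assumes Python and pip are available)
pip install clearml-agent

# One-time setup: write ClearML Server credentials to the local config
clearml-agent init

# Start a worker that pulls jobs from the "default" queue
clearml-agent daemon --queue default

# Or run each task inside a Docker container instead of a virtualenv
clearml-agent daemon --queue default --docker
```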
Maintenance & Community
The project solicits community support through GitHub stars, but the README does not list specific contributors, sponsors, or community channels such as Discord or Slack.
Licensing & Compatibility
The project is licensed under the Apache License, Version 2.0. This permissive license generally allows for commercial use and integration with closed-source projects.
Limitations & Caveats
The "Services Mode" currently supports CPU-only configurations, which limits its use for GPU-intensive background services. Full orchestration functionality also depends on a running, properly configured ClearML Server instance.