clusterdata by alibaba

Production cluster traces for data center research

Created 8 years ago

2,107 stars

Top 20.5% on SourcePulse


2 Experts Love This Project

Wei-Lin Chiang

Cofounder of LMArena

Simon Mo

Core Maintainer of vLLM

Project Summary

Summary:

Alibaba's Cluster Trace Program offers researchers and engineers real-world production cluster data from Alibaba's data centers. It addresses the need for realistic datasets to understand the characteristics and workloads of modern internet data centers, enabling research into workload characterization, resource management, and scheduling algorithms. Because the data comes from large-scale operational environments, it gives researchers a realistic basis for validating new ideas and improving cluster efficiency.

How It Works:

The project releases multiple trace datasets, each capturing a different aspect and scale of Alibaba's production infrastructure over a specific period. These include machine configurations, workload types (online services, batch jobs, AI/ML, microservices), resource utilization, and microarchitectural metrics. Versions like cluster-trace-v2018 include DAG information for batch workloads, while GPU traces focus on AI/ML workloads. The resulting production-level detail is useful for simulating realistic scenarios and for developing and testing scheduling strategies.
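As an illustration of how DAG information for batch workloads might be used once a trace is loaded, the sketch below rebuilds a dependency graph and derives a valid execution order. It assumes that each task name encodes its dependencies as underscore-separated task numbers (for example, a hypothetical `R2_1` meaning task 2 depends on task 1). That naming convention and the helper names `parse_task_name`, `build_dag`, and `execution_order` are assumptions made for this example; check the schema in the relevant trace subdirectory before relying on them.

```python
import re
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def parse_task_name(name):
    """Return (task_id, [dependency_ids]), or None if the name encodes no DAG node.

    ASSUMPTION: names look like <letters><id>[_<dep>]*, e.g. "J3_1_2".
    """
    match = re.fullmatch(r"[A-Za-z]+(\d+)((?:_\d+)*)", name)
    if match is None:
        return None
    task_id = int(match.group(1))
    deps = [int(d) for d in match.group(2).split("_") if d]
    return task_id, deps


def build_dag(task_names):
    """Map each task id to the set of task ids it depends on."""
    dag = {}
    for name in task_names:
        parsed = parse_task_name(name)
        if parsed is None:
            continue  # skip names that do not follow the assumed pattern
        task_id, deps = parsed
        dag[task_id] = set(deps)
    return dag


def execution_order(dag):
    """Return one topological order in which dependencies come first."""
    return list(TopologicalSorter(dag).static_order())


# Synthetic task names for one hypothetical job (not taken from any trace).
job_tasks = ["M1", "R2_1", "J3_1_2"]
dag = build_dag(job_tasks)
order = execution_order(dag)
```

Keeping the parsing step separate from the graph step means only `parse_task_name` needs to change if the actual schema encodes dependencies differently.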

Quick Start & Requirements:

Access to trace datasets requires completing a short online survey. Specific subdirectories contain data, schemas, and processing/visualization scripts tailored to each trace version. Users will likely need standard data processing tools (e.g., Python libraries) to utilize the data.
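A minimal sketch of the kind of processing this implies, using only the Python standard library: it reads a headerless CSV and computes mean CPU utilization per machine. The column names (`machine_id`, `time_stamp`, `cpu_util_percent`), the function name, and the sample rows are hypothetical and chosen only for illustration; the real column layout should be taken from the schema shipped with each trace version.

```python
import csv
import io
from collections import defaultdict

# ASSUMPTION: hypothetical column layout for a headerless usage file.
COLUMNS = ["machine_id", "time_stamp", "cpu_util_percent"]


def mean_cpu_by_machine(csv_file, columns=COLUMNS):
    """Return {machine_id: mean CPU utilization}, skipping rows with no reading."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    reader = csv.DictReader(csv_file, fieldnames=columns)
    for row in reader:
        value = row["cpu_util_percent"]
        if value in (None, ""):
            continue  # missing measurement: ignore the row
        totals[row["machine_id"]] += float(value)
        counts[row["machine_id"]] += 1
    return {machine: totals[machine] / counts[machine] for machine in totals}


# Synthetic rows standing in for a real file; replace with open(path, newline="").
sample = io.StringIO("m_1,0,20\nm_1,10,40\nm_2,0,\nm_2,10,50\n")
result = mean_cpu_by_machine(sample)
```

Streaming rows through `csv.DictReader` keeps memory use flat, which matters if a trace file is too large to load at once; a dataframe library is a reasonable alternative when the file fits in memory.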

Highlighted Details:

Offers diverse trace datasets (2017-2025) covering general clusters, GPU AI/ML, microservices, and microarchitecture performance.
Supports research into workload characterization, efficient workload assignment, and inter-scheduler collaboration in collocated environments.
Used in numerous academic publications at top-tier conferences (USENIX NSDI, ATC, ACM SoCC).
Includes specialized datasets like AMTrace (v2022) for microarchitecture analysis and v2025 for GPU-disaggregated Deep Learning Recommendation Models.

Maintenance & Community:

Questions and discussions are handled through GitHub issues. Users are invited to report publications that use the traces. Alibaba plans to release new traces periodically. The team can be reached by email (alibaba-clusterdata) or by contacting maintainers directly.

Licensing & Compatibility:

Datasets are provided strictly for "research or study purpose." The specific open-source license is not explicitly stated, and this usage restriction may limit commercial applications or integration into closed-source systems.

Limitations & Caveats:

The data reflects Alibaba's specific production environment, potentially limiting generalizability. The primary limitation is the explicit restriction to research and study purposes. Datasets are point-in-time snapshots, not continuous streams.

Health Check

Last Commit

1 month ago

Responsiveness

Inactive

Star History

26 stars in the last 30 days