clusterdata by alibaba

Production cluster traces for data center research

Created 8 years ago
1,875 stars

Top 23.1% on SourcePulse

Project Summary

Summary:

Alibaba's Cluster Trace Program gives researchers and engineers real-world production cluster data from Alibaba's data centers. It addresses the shortage of realistic datasets for understanding the characteristics and workloads of modern internet data centers, supporting research into workload characterization, resource management, and scheduling algorithms. Because the data comes from large-scale operational environments, it offers a realistic basis for validating new ideas and improving cluster efficiency.

How It Works:

The project releases multiple trace datasets, each capturing a different aspect and scale of Alibaba's production infrastructure over a specific period. Depending on the version, they include machine configurations, workload types (online services, batch jobs, AI/ML, microservices), resource utilization, and microarchitectural metrics. For example, cluster-trace-v2018 includes DAG (task dependency) information for batch workloads, while the GPU traces focus on AI/ML workloads. This granular, production-level detail makes the traces well suited to simulating realistic scenarios and developing robust scheduling strategies.
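
To illustrate how the batch DAG information might be consumed, here is a minimal sketch that recovers dependency edges from task names and orders one job's tasks topologically using Kahn's algorithm. It assumes the v2018 batch_task naming convention in which a name such as R3_1_2 denotes task 3 depending on tasks 1 and 2, and it treats names that do not fit that pattern as independent tasks; confirm the convention against the schema documentation of the version you download. The helper names parse_task_name and topological_order, and the example task list, are hypothetical.

```python
import re
from collections import defaultdict, deque

# Assumed naming convention (v2018 batch_task): a type letter, then the task's
# own ID, then the IDs of the tasks it depends on, joined by underscores,
# e.g. "M1" (no dependencies) or "R3_1_2" (task 3 depends on tasks 1 and 2).
DAG_TASK = re.compile(r"^[A-Za-z]+(\d+(?:_\d+)*)$")


def parse_task_name(name):
    """Return (task_id, [dependency_ids]), or None for names outside the pattern."""
    match = DAG_TASK.match(name)
    if match is None:
        return None
    ids = [int(part) for part in match.group(1).split("_")]
    return ids[0], ids[1:]


def topological_order(task_names):
    """Order one job's DAG tasks so every task follows its dependencies."""
    deps = {}
    for name in task_names:
        parsed = parse_task_name(name)
        if parsed is not None:
            task_id, parents = parsed
            deps[task_id] = parents

    children = defaultdict(list)
    indegree = {task_id: 0 for task_id in deps}
    for task_id, parents in deps.items():
        for parent in parents:
            if parent in deps:  # skip references to tasks absent from the input
                children[parent].append(task_id)
                indegree[task_id] += 1

    # Kahn's algorithm: repeatedly emit tasks whose dependencies are all done.
    ready = deque(sorted(t for t, d in indegree.items() if d == 0))
    order = []
    while ready:
        current = ready.popleft()
        order.append(current)
        for child in sorted(children[current]):
            indegree[child] -= 1
            if indegree[child] == 0:
                ready.append(child)

    if len(order) != len(deps):
        raise ValueError("dependency cycle detected")
    return order


if __name__ == "__main__":
    job = ["M1", "M2", "R3_1_2", "J4_3", "task_abc"]  # hypothetical job
    print(topological_order(job))  # [1, 2, 3, 4]
```

Kahn's algorithm was chosen because it also detects malformed inputs: if any task never reaches zero remaining dependencies, the job contains a cycle and the function raises instead of returning a partial order.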

Quick Start & Requirements:

Access to the trace datasets requires completing a short online survey. Each trace version has its own subdirectory containing data, schemas, and processing/visualization scripts. Users will likely need standard data processing tools (e.g., Python libraries) to work with the data.
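
As a minimal sketch of that kind of processing, the following uses only Python's standard library to compute each machine's mean CPU utilization from a usage file. It assumes a headerless CSV whose first and third columns are machine_id and cpu_util_percent, as in the v2018 machine_usage table; the column indices, the function name mean_cpu_by_machine, and the sample rows are illustrative assumptions, so check the schema for your trace version before relying on them.

```python
import csv
import io
from collections import defaultdict

# Assumed layout (v2018 machine_usage): no header row, column 0 = machine_id,
# column 2 = cpu_util_percent. Adjust these indices to the actual schema.
MACHINE_COL = 0
CPU_COL = 2


def mean_cpu_by_machine(lines):
    """Average CPU utilization per machine, skipping rows with a missing value."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in csv.reader(lines):
        if len(row) <= CPU_COL or row[CPU_COL] == "":
            continue
        totals[row[MACHINE_COL]] += float(row[CPU_COL])
        counts[row[MACHINE_COL]] += 1
    return {machine: totals[machine] / counts[machine] for machine in totals}


# Made-up rows for illustration only; they are not taken from the real trace.
SAMPLE = """m_1,0,40,60
m_1,10,60,61
m_2,0,20,30
m_2,10,,31
"""

if __name__ == "__main__":
    print(mean_cpu_by_machine(io.StringIO(SAMPLE)))  # {'m_1': 50.0, 'm_2': 20.0}
```

For a real file, pass `open(path, newline="")` instead of the `StringIO` object. Because `csv.reader` yields one row at a time, only the per-machine running sums are kept in memory, not the whole file.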

Highlighted Details:

  • Offers diverse trace datasets (2017-2025) covering general clusters, GPU AI/ML workloads, microservices, and microarchitecture performance.
  • Supports research into workload characterization, efficient workload assignment, and inter-scheduler collaboration in co-located environments.
  • Used in numerous academic publications at top-tier conferences (USENIX NSDI, ATC, ACM SoCC), reflecting significant research impact.
  • Includes specialized datasets such as AMTrace (v2022) for microarchitecture analysis and v2025 for GPU-disaggregated deep learning recommendation models (DLRMs).

Maintenance & Community:

Questions and discussions are welcome via GitHub issues, and users are invited to report publications that use the traces. Alibaba plans to release new traces periodically. The maintainers can be reached by email (alibaba-clusterdata) or contacted directly.

Licensing & Compatibility:

Datasets are provided strictly for "research or study purpose." No specific open-source license is stated, and this usage restriction may limit commercial use or integration into closed-source systems.

Limitations & Caveats:

The data reflects Alibaba's specific production environment, which may limit how well findings generalize to other clusters. The primary limitation is the explicit restriction to research and study purposes. The datasets are point-in-time snapshots, not continuous streams.

Health Check

Last Commit: 3 weeks ago
Responsiveness: Inactive
Pull Requests (30d): 3
Issues (30d): 0
Star History: 34 stars in the last 30 days

Explore Similar Projects

Starred by Shengjia Zhao (Chief Scientist at Meta Superintelligence Lab), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 14 more.

BIG-bench by google

0.1%
3k
Collaborative benchmark for probing and extrapolating LLM capabilities
Created 4 years ago
Updated 1 year ago
Starred by Aravind Srinivas (Cofounder of Perplexity), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 16 more.

text-to-text-transfer-transformer by google-research

0.1%
6k
Unified text-to-text transformer for NLP research
Created 6 years ago
Updated 6 months ago