alibaba: Production cluster traces for data center research
Summary:
Alibaba's Cluster Trace Program offers researchers and engineers real-world production cluster data from Alibaba's data centers. It addresses the need for realistic datasets to understand modern internet data center characteristics and workloads, enabling research into workload characterization, resource management, and scheduling algorithms. The benefit lies in providing a foundation for validating new ideas and improving cluster efficiency with data derived from large-scale, operational environments.
How It Works:
The project releases several trace datasets, each capturing a different aspect and scale of Alibaba's production infrastructure over a specific period. The traces cover machine configurations, workload types (online services, batch jobs, AI/ML, microservices), resource utilization, and microarchitectural metrics. Versions such as cluster-trace-v2018 include DAG information for batch workloads, while the GPU traces focus on AI/ML workloads. Because the data is recorded at production scale and fine granularity, it can drive realistic simulations and be used to evaluate scheduling strategies.
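To illustrate how DAG information in a batch trace might be consumed, here is a minimal sketch. It assumes a naming convention in which each task name encodes its own number followed by the numbers of the tasks it depends on (for example, `R2_1` meaning task 2 depends on task 1); this summary does not confirm that convention, so check it against the schema of the trace version in use. The function names `parse_task_name` and `execution_order` are hypothetical.

```python
import re
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Assumed convention (verify against the trace schema): a letter prefix,
# the task's own number, then "_"-separated numbers of its parent tasks.
_TASK_NAME = re.compile(r"^[A-Za-z]+(\d+)((?:_\d+)*)$")


def parse_task_name(name):
    """Return (task_id, [parent_ids]), or None if the name has no DAG info."""
    match = _TASK_NAME.match(name)
    if match is None:
        return None
    task_id = int(match.group(1))
    parents = [int(p) for p in match.group(2).split("_") if p]
    return task_id, parents


def execution_order(task_names):
    """Order the tasks of one job so every task comes after its parents."""
    graph = {}
    for name in task_names:
        parsed = parse_task_name(name)
        if parsed is None:
            continue  # name does not follow the assumed pattern; skip it
        task_id, parents = parsed
        graph[task_id] = set(parents)
    return list(TopologicalSorter(graph).static_order())


# Illustrative task names for a single job (not taken from a real trace).
order = execution_order(["M1", "R2_1", "R3_1", "J4_2_3"])
```

With these four names, task 1 is ordered first and task 4 last; the relative order of tasks 2 and 3 is not determined by the dependencies.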
Quick Start & Requirements:
Access to trace datasets requires completing a short online survey. Specific subdirectories contain data, schemas, and processing/visualization scripts tailored to each trace version. Users will likely need standard data processing tools (e.g., Python libraries) to utilize the data.
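As a rough illustration of the kind of processing involved, the sketch below averages CPU utilization per machine using only the Python standard library. It assumes the trace is a headerless CSV file; the column names in `COLUMNS` and the function `mean_cpu_by_machine` are hypothetical placeholders, and the real column order should be taken from the schema in the relevant subdirectory.

```python
import csv
import io
from collections import defaultdict

# Hypothetical column order; replace with the schema shipped with the trace.
COLUMNS = ["machine_id", "time_stamp", "cpu_util_percent", "mem_util_percent"]


def mean_cpu_by_machine(lines, columns=COLUMNS):
    """Average the CPU utilization column per machine, skipping empty cells."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in csv.DictReader(lines, fieldnames=columns):
        value = row["cpu_util_percent"]
        if value in (None, ""):
            continue  # missing measurement
        totals[row["machine_id"]] += float(value)
        counts[row["machine_id"]] += 1
    return {machine: totals[machine] / counts[machine] for machine in totals}


# Tiny in-memory stand-in for a trace file (made-up values).
sample = io.StringIO("m_1,0,20,50\nm_1,10,40,55\nm_2,0,,60\nm_2,10,80,65\n")
result = mean_cpu_by_machine(sample)
```

For a real file, passing an open file handle (opened with `newline=""`) in place of `sample` streams the rows without loading the whole trace into memory.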
Maintenance & Community:
Questions and discussion are handled through GitHub issues. Users are invited to report publications that use the traces. Alibaba plans to release new traces periodically. The maintainers can be reached by email (alibaba-clusterdata) or contacted directly.
Licensing & Compatibility:
Datasets are provided strictly for "research or study purpose." No specific open-source license is stated, and this usage restriction may rule out commercial applications or integration into closed-source systems.
Limitations & Caveats:
The data reflects Alibaba's specific production environment, so findings may not generalize to other clusters. Use is explicitly restricted to research and study purposes. Each dataset is a point-in-time snapshot covering a fixed period, not a continuous stream.