Public Azure traces for cloud workload research
This repository publishes Microsoft Azure traces for the research community, covering Virtual Machine (VM), Azure Functions, and Large Language Model (LLM) inference workloads. The datasets support workload analysis, resource management, and system optimization research in cloud computing and distributed systems.
How It Works
The project releases sanitized, real-world traces collected from Azure's infrastructure. The datasets include VM utilization, function invocations, blob accesses, LLM input and output token counts, and benchmark noise data. The traces are provided as-is, accompanied by Jupyter notebooks for comparative analysis and by links to the research papers that describe their methodology and use.
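As an illustration of the kind of analysis such traces support, the following is a minimal sketch that summarizes an LLM inference trace held in CSV form. The column names (`timestamp_s`, `input_tokens`, `output_tokens`) and the sample rows are assumptions made up for this example; they are not taken from the repository, so the header of the actual trace file should be checked before adapting it.

```python
import csv
import io
from collections import Counter
from statistics import mean

# Synthetic rows for illustration only; NOT taken from the Azure traces.
# The column names are hypothetical and should be adjusted to match the
# header of whichever trace file is actually being analyzed.
SAMPLE_CSV = """timestamp_s,input_tokens,output_tokens
0.5,120,30
12.0,800,200
61.2,64,16
75.9,1000,250
"""


def summarize(csv_text):
    """Return request count, mean token counts, and requests per minute."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    input_tokens = [int(r["input_tokens"]) for r in rows]
    output_tokens = [int(r["output_tokens"]) for r in rows]
    # Bucket each request into the minute in which it arrived.
    per_minute = Counter(int(float(r["timestamp_s"]) // 60) for r in rows)
    return {
        "requests": len(rows),
        "mean_input_tokens": mean(input_tokens),
        "mean_output_tokens": mean(output_tokens),
        "requests_per_minute": dict(per_minute),
    }


summary = summarize(SAMPLE_CSV)
print(summary)
```

To run the same summary on a downloaded trace, the text of the file can be passed to `summarize` in place of `SAMPLE_CSV`, after renaming the columns in the function to match the real header.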
Quick Start & Requirements
Highlighted Details
Maintenance & Community
This project is a collaboration between Azure and Microsoft Research. Issues and questions should be directed to the mailing list provided in the repository.
Licensing & Compatibility
The repository does not explicitly state a license. Traces are provided for research and academic use, with specific citation requirements for associated papers. Commercial use or integration into closed-source systems may require explicit permission or adherence to Microsoft's terms of service.
Limitations & Caveats
Traces are sanitized subsets of actual workloads and may not represent the entirety of Azure's operations. The repository does not fully document how consistent the data formats are across datasets, or what biases sanitization may have introduced.
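Because format consistency across datasets is not guaranteed, it can help to compare file headers before combining traces. The sketch below reports, for each file, the columns it lacks relative to the union of all columns seen. The file names and column names are hypothetical placeholders, not the repository's actual schema.

```python
import csv
import io


def header_of(csv_text):
    """Return the first row of a CSV document as a list of column names."""
    return next(csv.reader(io.StringIO(csv_text)))


def compare_headers(files):
    """Map each file name to the columns it lacks relative to all files."""
    headers = {name: header_of(text) for name, text in files.items()}
    all_columns = set().union(*headers.values())
    return {
        name: sorted(all_columns - set(cols))
        for name, cols in headers.items()
    }


# Synthetic stand-ins for trace files; names and columns are hypothetical.
FILES = {
    "trace_a.csv": "timestamp,vm_id,cpu_avg\n0,1,0.5\n",
    "trace_b.csv": "timestamp,vm_id,cpu_avg,cpu_max\n0,1,0.5,0.9\n",
}

missing = compare_headers(FILES)
print(missing)
```

A non-empty list for any file signals that its schema differs from the others and that a join or concatenation would need explicit handling for the missing columns.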