AzurePublicDataset  by Azure

Public Azure traces for cloud workload research

created 8 years ago
964 stars

Top 39.0% on sourcepulse

Project Summary

This repository provides Microsoft Azure's public traces for the research community, focusing on Virtual Machines (VMs), Azure Functions, and Large Language Model (LLM) inference workloads. It offers detailed datasets for workload analysis, resource management, and system optimization, benefiting researchers in cloud computing and distributed systems.

How It Works

The project releases sanitized, real-world traces collected from Azure's infrastructure. These datasets include VM utilization, function invocations, blob accesses, LLM input/output tokens, and benchmark noise data. The traces are provided as-is, with accompanying Jupyter notebooks for comparative analysis and links to research papers detailing their use and methodology.

Quick Start & Requirements

  • Data is available via direct download links provided within the repository's documentation, often associated with specific research papers.
  • No installation is required; the data can be analyzed with standard data science tools.
  • Ample storage for the datasets and sufficient compute for analysis are recommended.
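Because the traces can be large, a streaming pass is often safer than loading a whole file into memory. The following is a minimal sketch using only the Python standard library. It assumes a headerless, comma-separated file; the column positions, the helper name `mean_by_key`, and the sample values are all hypothetical and should be checked against the schema documented for the specific trace.

```python
import csv
import io
from collections import defaultdict


def mean_by_key(rows, key_col, value_col):
    """Stream rows (lists of strings) and return {key: mean of value column}.

    Only running totals are kept, so memory use grows with the number of
    distinct keys, not with the number of rows.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        key = row[key_col]
        totals[key] += float(row[value_col])
        counts[key] += 1
    return {key: totals[key] / counts[key] for key in totals}


# Tiny in-memory stand-in for a trace file. The layout
# (timestamp, id, reading) and the values are made up for illustration.
sample = io.StringIO("0,vm-a,10.0\n300,vm-a,30.0\n0,vm-b,5.0\n")
averages = mean_by_key(csv.reader(sample), key_col=1, value_col=2)
print(averages)
```

For a real download, the same function can be fed `csv.reader(open(path, newline=""))`, or `csv.reader(gzip.open(path, "rt", newline=""))` if the file happens to be gzip-compressed.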

Highlighted Details

  • Includes VM traces from 2017 and 2019, covering roughly 2 million to 2.6 million VMs and more than 1 billion utilization readings.
  • Features Azure Functions invocation and blob access traces from 2019-2021.
  • Provides LLM inference traces from 2023-2024, including input/output tokens.
  • Offers VM benchmark noise data collected over 483 days (May 2023 - Sep 2024).

Maintenance & Community

This project is a collaboration between Azure and Microsoft Research. Users are encouraged to send issues and questions to the mailing list given in the repository.

Licensing & Compatibility

The repository does not explicitly state a license. Traces are provided for research and academic use, with specific citation requirements for associated papers. Commercial use or integration into closed-source systems may require explicit permission or adherence to Microsoft's terms of service.

Limitations & Caveats

Traces are sanitized subsets of actual workloads and may not represent the entirety of Azure's operations. Data format consistency across datasets and potential biases introduced by sanitization are not fully documented.
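Since format consistency is not guaranteed, a quick structural check before analysis can catch malformed rows early. The sketch below counts how many fields each row has; more than one distinct count suggests truncated or irregular rows. The helper name `field_count_histogram` and the sample data are hypothetical, and the check assumes comma-separated input.

```python
import csv
import io
from collections import Counter


def field_count_histogram(lines):
    """Return a Counter mapping number-of-fields to number-of-rows."""
    return Counter(len(row) for row in csv.reader(lines))


# Made-up sample: two well-formed rows and one short row.
sample = io.StringIO("a,1,2\nb,3,4\nc,5\n")
hist = field_count_histogram(sample)
print(hist)

if len(hist) > 1:
    print("Rows have inconsistent field counts; inspect before analysis.")
```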

Health Check

  • Last commit: 5 months ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 47 stars in the last 90 days
