benchmarks-ai-io by Marlon666

AI I/O performance benchmarking and optimization recipes

Created 3 months ago
351 stars

Top 79.3% on SourcePulse

Project Summary

This repository provides open, reproducible benchmarks and practical recipes designed to identify and mitigate I/O bottlenecks in AI training and large-scale inference workloads. It targets engineers, researchers, and power users who need to optimize storage performance and reduce operational costs across a spectrum of hardware, from high-end GPU clusters to modest CPU-only laptops. The project offers a neutral, scriptable harness that emulates common AI data-path stress patterns, enabling teams to validate storage tiers, tune application-level settings, and quantify the return on investment of cloud-provider storage optimizations.

How It Works

The core of the project is a scriptable harness that emulates diverse AI workload I/O patterns, isolating the storage layer for analysis. It models distinct stress profiles characteristic of training (e.g., periodic checkpointing, dataset enumeration, continuous streaming) and inference (e.g., request fan-out, hot/cold data splits, micro-batch assembly). By measuring critical I/O behaviors such as metadata fan-out, pagination costs, checkpoint throughput, and data-loading latency, the benchmarks reveal bottlenecks, whether they originate in AI frameworks such as PyTorch, TensorFlow, or JAX, or in distributed training stacks such as DeepSpeed, Megatron, FSDP, and ZeRO. Results are emitted as portable CSV and YAML outputs, enabling direct comparisons across environments and informing the project's practical optimization playbooks.
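
To make the stress profiles concrete, here is a minimal, hypothetical sketch of the checkpoint-throughput pattern; the function, file sizes, and CSV columns are illustrative assumptions, not the repository's actual harness code:

```python
import csv
import os
import time

def emulate_checkpoint(path: str, size_mb: int = 256, chunk_mb: int = 8) -> float:
    """Write size_mb of random data in chunk_mb chunks and return MB/s."""
    chunk = os.urandom(chunk_mb * 1024 * 1024)
    start = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(size_mb // chunk_mb):
            f.write(chunk)
        f.flush()
        os.fsync(f.fileno())  # pay the durability cost, as a real checkpoint would
    return size_mb / (time.perf_counter() - start)

if __name__ == "__main__":
    mbps = emulate_checkpoint("ckpt.bin")
    # Append a portable CSV row in the spirit of the project's result files
    # (hypothetical schema: workload, size_mb, mb_per_s).
    with open("results.csv", "a", newline="") as f:
        csv.writer(f).writerow(["checkpoint_write", 256, f"{mbps:.1f}"])
    print(f"checkpoint write throughput: {mbps:.1f} MB/s")
```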

Quick Start & Requirements

  • Install: Clone the repository, create a Python environment (3.10+ recommended), and install dependencies with pip install -r <benchmark_folder>/requirements.txt for the chosen benchmark module; a bootstrap sketch follows this list.
  • Prerequisites: Python 3.10+ is recommended. Benchmarks are designed to run on commodity hardware without requiring GPUs, though they scale effectively to GPU-backed clusters for detailed utilization analysis.
  • Links: Configuration details and specific instructions for each benchmark module can be found in their respective README files, such as listing_folder_benchmarks/README.md and serving_benchmarks/README.md.
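
A minimal bootstrap sketch for the install step above; the module folder name is taken from the README links in this section, but verify paths against the repository before running:

```python
# Run from the cloned repository root, inside a fresh virtual environment.
import subprocess
import sys

assert sys.version_info >= (3, 10), "Python 3.10+ is recommended"

# Install one module's dependencies; swap in the benchmark folder you need.
subprocess.run(
    [sys.executable, "-m", "pip", "install", "-r",
     "listing_folder_benchmarks/requirements.txt"],
    check=True,  # raise CalledProcessError if the install fails
)
```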

Highlighted Details

  • Emulates a wide array of AI workload I/O behaviors, including metadata-intensive operations, high-throughput checkpointing, and deep data prefetching.
  • Provides compatibility and analysis across various popular AI frameworks (PyTorch, TensorFlow, JAX) and advanced distributed training strategies (DeepSpeed, FSDP, ZeRO).
  • Generates standardized, portable CSV and YAML outputs, ensuring reproducible results and enabling straightforward comparisons across cloud providers, filesystems, and hardware configurations; a comparison sketch follows this list.
  • Delivers practical optimization playbooks and recipes that translate benchmark findings into actionable strategies for improving model performance and reducing $/token costs.
  • Features distinct benchmark modules: Listing Emulated Benchmark (LEB) for metadata performance, Serving Benchmarks for inference I/O, and Checkpointing Benchmarks for training resilience.
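
Because this summary does not document the output schema, the comparison sketch below assumes a simple CSV layout (workload, size_mb, mb_per_s) and hypothetical result files from two environments; adapt the column names to the harness's actual output:

```python
import csv
from collections import defaultdict

# Hypothetical result files collected on two storage tiers.
files = {"local_nvme": "results_nvme.csv", "cloud_fs": "results_cloud.csv"}

results: dict[str, dict[str, float]] = defaultdict(dict)
for env, path in files.items():
    with open(path, newline="") as f:
        for workload, _size_mb, mb_per_s in csv.reader(f):
            results[workload][env] = float(mb_per_s)

# A side-by-side view makes storage-tier bottlenecks easy to spot.
for workload, by_env in sorted(results.items()):
    row = "  ".join(f"{env}={v:.1f} MB/s" for env, v in by_env.items())
    print(f"{workload:24} {row}")
```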

Maintenance & Community

Contributions are actively welcomed: file GitHub issues for workload-emulation requests, or open pull requests for new modules and configurations. Users can seek support, share tuning tips, or request specific benchmarks via an issue or discussion thread. The near-term roadmap includes a checkpoint-churn simulator and more complex mixed-workload stressors.

Licensing & Compatibility

All code in the repository is released under the terms specified in the LICENSE file. The license type (e.g., MIT, Apache 2.0) and any restrictions on commercial use or closed-source linking are detailed there and require direct review. The benchmarks are designed for broad compatibility, running on commodity hardware and scaling to GPU-accelerated environments.

Limitations & Caveats

The roadmap lists several modules as upcoming, so the current feature set is likely to evolve. The precise licensing terms are not stated in the README text provided; review the repository's LICENSE file before relying on the project in commercial applications. While the benchmarks run on commodity hardware, comprehensive performance validation may still require GPU-backed systems.

Health Check

  • Last Commit: 3 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 1
  • Star History: 126 stars in the last 30 days
