awesome-ai-infrastructures by 1duo

AI infrastructures for scalable ML production workflows

Created 7 years ago

441 stars

Top 67.8% on SourcePulse

Project Summary

This repository curates real-world AI infrastructures: production machine learning systems, pipelines, and platforms. It is aimed at engineers and researchers who want to understand the technology stacks behind stable, scalable, and reliable ML training and inference in production. The collection gives a broad overview of how complex ML systems are architected and deployed.

How It Works

The repository lists and categorizes actively maintained AI infrastructure projects. It focuses on four areas: the overall architecture of end-to-end ML training pipelines, scalable inference for cloud and edge devices, compiler and optimization stacks for diverse hardware, and novel approaches to large-scale distributed training. Its distinguishing feature is that it presents complete, production-ready ML systems rather than isolated frameworks.

Quick Start & Requirements

This repository is a curated list of resources and does not require installation or specific prerequisites. It serves as a directory to explore individual projects.

Highlighted Details

Key Platforms: Comprehensive coverage of major AI infrastructures including Google's TFX and Kubeflow, NVIDIA's RAPIDS, Uber's Michelangelo, Facebook's FBLearner, Intel's BigDL, Amazon's SageMaker, and Microsoft's NNI.
Performance Benchmarks: Documents milestones in large-scale distributed training, with ImageNet training times falling from hours to minutes or even seconds through techniques such as LARS, mixed-precision training, and optimized communication protocols on GPUs and TPUs.
Hardware & Deployment Focus: Features solutions for GPU acceleration (RAPIDS, H2O4GPU), on-device inference (TensorFlow Lite, Core ML), and compiler optimization stacks (TVM, MLIR, TensorRT) for diverse hardware backends.
ML Lifecycle Management: Includes platforms like MLflow for end-to-end ML lifecycle management, covering tracking, projects, and model deployment.
Specialized Tools: Highlights AutoML capabilities (Auto-Keras, NNI, TransmogrifAI), model compression (PocketFlow, Distiller), and distributed execution frameworks (Project Ray, BigDL).
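
As a rough illustration of one technique named above, LARS (layer-wise adaptive rate scaling) scales each layer's learning rate by the ratio of its weight norm to its gradient norm, which is part of what makes very large batch sizes trainable. The following is a minimal pure-Python sketch of that idea, not code from the repository or from any listed project; the function names, the default trust coefficient, and the `eps` guard are assumptions chosen for illustration.

```python
import math


def l2_norm(values):
    """Euclidean norm of a flat list of floats."""
    return math.sqrt(sum(v * v for v in values))


def lars_local_lr(weights, grads, trust_coefficient=0.001,
                  weight_decay=0.0, eps=1e-9):
    """Layer-wise learning-rate multiplier (hypothetical helper).

    local_lr = trust * ||w|| / (||g|| + weight_decay * ||w||)
    Falls back to 1.0 when either norm is zero, so the layer
    simply uses the global learning rate.
    """
    w_norm = l2_norm(weights)
    g_norm = l2_norm(grads)
    if w_norm == 0.0 or g_norm == 0.0:
        return 1.0
    return trust_coefficient * w_norm / (g_norm + weight_decay * w_norm + eps)


def lars_step(weights, grads, global_lr,
              trust_coefficient=0.001, weight_decay=0.0):
    """One plain SGD step for a single layer, scaled by the LARS ratio.

    Momentum is omitted to keep the sketch short.
    """
    local_lr = lars_local_lr(weights, grads, trust_coefficient, weight_decay)
    return [
        w - global_lr * local_lr * (g + weight_decay * w)
        for w, g in zip(weights, grads)
    ]


# Example: one layer with two weights.
layer_weights = [3.0, 4.0]   # norm 5
layer_grads = [0.6, 0.8]     # norm 1
updated = lars_step(layer_weights, layer_grads, global_lr=1.0)
```

In a real framework the same ratio would be computed per parameter tensor inside the optimizer; the flat lists here only keep the sketch self-contained.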

Maintenance & Community

The list is maintained for personal learning purposes, with an open invitation for contributions, forks, and pull requests. The README names no community channels or maintainers beyond the companies behind the listed projects.

Licensing & Compatibility

The license for the curated list itself is not specified in the README. The individual projects linked within the repository are subject to their own respective licenses.

Limitations & Caveats

This repository is a meta-list and does not provide direct tooling or code for implementation. It serves as a directory and educational resource, requiring users to explore individual projects for their specific needs. The information is presented "in no specific order."

Health Check

Last Commit

6 years ago

Responsiveness

Inactive

Star History

3 stars in the last 30 days