Production-Level-Deep-Learning by alirezadir

Guideline for production-level deep learning systems

created 5 years ago
4,488 stars

Top 11.1% on sourcepulse

Project Summary

This repository provides a comprehensive engineering guideline for building production-level deep learning systems. It targets engineers and researchers aiming to deploy ML models in real-world applications, offering best practices and tool recommendations across the entire ML lifecycle, from data management to deployment and monitoring.

How It Works

The guideline breaks an ML system down into four key components: Data Management, Development/Training/Evaluation, Testing/Deployment, and Monitoring. For each component, it recommends specific tools, frameworks, and best practices drawn from industry workshops and practitioners. The approach takes a full-stack perspective: production ML involves much more than model training, and also covers data versioning, workflow orchestration, experiment tracking, and robust deployment strategies.

Highlighted Details

  • Responds to the reported 85% failure rate of AI projects by focusing on practical production challenges.
  • Covers data management aspects like data labeling, storage, versioning (DVC, Pachyderm), and processing (Airflow, Luigi).
  • Recommends tools for experiment management (TensorBoard, MLflow, Weights & Biases) and hyperparameter tuning (Ray Tune, Keras Tuner).
  • Details deployment strategies including containerization (Docker, Kubernetes), model serving (TF Serving, Clipper), and service meshes (Istio).
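
To make the data versioning and experiment management items above more concrete, here is a minimal sketch using only the Python standard library. It is not the API of DVC, MLflow, or any other tool named in the guideline; the function names `dataset_fingerprint` and `record_run`, the JSON-lines log format, and the directory layout are all assumptions made for illustration. The idea it shows is that each training run can be tied to a content hash of the exact data it used, so that results stay reproducible when the data changes.

```python
import hashlib
import json
from pathlib import Path


def dataset_fingerprint(data_dir: Path) -> str:
    """Return a SHA-256 digest covering every file's relative path and bytes.

    Hypothetical helper: real tools such as DVC use their own formats.
    """
    digest = hashlib.sha256()
    # Sort paths so the digest does not depend on filesystem listing order.
    for path in sorted(p for p in data_dir.rglob("*") if p.is_file()):
        digest.update(path.relative_to(data_dir).as_posix().encode("utf-8"))
        digest.update(b"\0")
        digest.update(path.read_bytes())
        digest.update(b"\0")
    return digest.hexdigest()


def record_run(log_path: Path, data_dir: Path, params: dict, metrics: dict) -> dict:
    """Append one experiment record (data version, params, metrics) as a JSON line.

    Hypothetical helper: the log file should live outside data_dir so that
    writing to it does not change the dataset fingerprint.
    """
    record = {
        "data_version": dataset_fingerprint(data_dir),
        "params": params,
        "metrics": metrics,
    }
    with log_path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, sort_keys=True) + "\n")
    return record
```

A call such as `record_run(Path("runs.jsonl"), Path("data"), {"lr": 0.01}, {"val_acc": 0.92})` would append one line to the log. Two runs on unchanged data share the same `data_version`, while any edit to a file under `data` yields a different one. Dedicated tools add remote storage, deduplication, and UI on top of this basic idea.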

Maintenance & Community

  • The repository is under continuous development, welcoming feedback and contributions.
  • Mentions specific workshops and contributors from industry (UC Berkeley, OpenAI, Turnitin, Google, Uber).

Licensing & Compatibility

  • The repository itself is not a software package with a license. The content is presented as a guideline.
  • Tools and frameworks mentioned within the guideline have their own licenses, which users must adhere to.

Limitations & Caveats

  • The document is a guideline and does not provide executable code or a deployable system.
  • Some sections are marked as "TBD" (To Be Determined), indicating areas that are still under development or require further detail.

Health Check

  • Last commit: 1 month ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 53 stars in the last 90 days
