Production-Level-Deep-Learning by alirezadir

Guideline for production-level deep learning systems

Created 6 years ago

4,591 stars

Top 10.7% on SourcePulse



Project Summary

This repository provides a comprehensive engineering guideline for building production-level deep learning systems. It targets engineers and researchers aiming to deploy ML models in real-world applications, offering best practices and tool recommendations across the entire ML lifecycle, from data management to deployment and monitoring.

How It Works

The guideline breaks down the ML system into key components: Data Management, Development/Training/Evaluation, Testing/Deployment, and Monitoring. For each component, it recommends specific tools, frameworks, and best practices derived from industry workshops and practitioners. The approach emphasizes a full-stack perspective, acknowledging that production ML involves much more than just model training, including data versioning, workflow orchestration, experiment tracking, and robust deployment strategies.
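The component breakdown above can be pictured as a dependency graph of pipeline stages. Workflow orchestrators such as Airflow and Luigi model pipelines the same way. The following minimal sketch uses Python's standard `graphlib` module (Python 3.9+). The stage names, the `PIPELINE` mapping, and the `run_pipeline` helper are hypothetical illustrations. They are not taken from the guideline or from any orchestrator's API.

```python
from graphlib import TopologicalSorter

# Hypothetical stages covering the four components named in the guideline
# (data management, training/evaluation, testing/deployment, monitoring).
# Each stage maps to the set of stages that must finish before it runs.
PIPELINE = {
    "ingest_data": set(),
    "version_data": {"ingest_data"},
    "train_model": {"version_data"},
    "evaluate_model": {"train_model"},
    "deploy_model": {"evaluate_model"},
    "monitor_model": {"deploy_model"},
}


def run_pipeline(stages, actions):
    """Run each stage's action only after all of its dependencies have run.

    `actions` maps a stage name to a zero-argument callable; stages without
    an entry are treated as no-ops. Returns the stages in execution order.
    Raises graphlib.CycleError if the dependency graph contains a cycle.
    """
    executed = []
    for stage in TopologicalSorter(stages).static_order():
        action = actions.get(stage)
        if action is not None:
            action()
        executed.append(stage)
    return executed


if __name__ == "__main__":
    print(" -> ".join(run_pipeline(PIPELINE, {})))
```

A real orchestrator adds scheduling, retries, logging, and parallel execution of independent stages on top of this basic dependency ordering.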

Quick Start & Requirements

Installation: No direct installation command; this is a guideline document.
Prerequisites: Familiarity with deep learning concepts, Python, and common ML frameworks (TensorFlow, PyTorch).
Resources: The document references various tools and platforms, some of which may have their own installation and resource requirements (e.g., Docker, Kubernetes, cloud platforms).
Links:
- Full Stack Deep Learning Bootcamp: https://fullstackdeeplearning.com/
- TFX Workshop: https://www.youtube.com/watch?v=f0_f_f73_00
- Kubeflow Meetup: https://www.youtube.com/watch?v=h7035f17h_0

Highlighted Details

Addresses the high failure rate of AI projects (cited as 85%) by focusing on practical production challenges.
Covers data management aspects like data labeling, storage, versioning (DVC, Pachyderm), and processing (Airflow, Luigi).
Recommends tools for experiment management (TensorBoard, MLflow, Weights & Biases) and hyperparameter tuning (Ray Tune, Keras Tuner).
Details deployment strategies including containerization and orchestration (Docker, Kubernetes), model serving (TF Serving, Clipper), and service meshes (Istio).
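Among the serving options listed above, TF Serving exposes a REST endpoint for predictions. The sketch below builds a request in the shape described in the TensorFlow Serving REST API documentation. That shape is `POST /v1/models/<name>[/versions/<n>]:predict` with a JSON body holding an `"instances"` list, sent to REST port 8501 by default. The sketch uses only the standard library. The helper names `build_predict_request` and `predict` and the model name `my_model` are hypothetical and not part of the guideline.

```python
import json
import urllib.request


def build_predict_request(host, model_name, instances, port=8501, version=None):
    """Build (but do not send) a TensorFlow Serving REST predict request.

    URL shape: http://{host}:{port}/v1/models/{model_name}[/versions/{version}]:predict
    Body: {"instances": [...]}, the row-oriented request format.
    """
    path = f"/v1/models/{model_name}"
    if version is not None:
        path += f"/versions/{version}"  # pin a specific model version
    url = f"http://{host}:{port}{path}:predict"
    body = json.dumps({"instances": instances}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(request, timeout=5.0):
    """Send the request and return the "predictions" field of the JSON reply."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())["predictions"]


if __name__ == "__main__":
    req = build_predict_request("localhost", "my_model", [[1.0, 2.0, 5.0]])
    print(req.get_method(), req.full_url)
    # predict(req) only works against a running TF Serving instance.
```

Building the request makes no network call, so the URL and payload can be unit-tested separately from the server. `predict` succeeds only against a running TensorFlow Serving instance, for example one started from the `tensorflow/serving` Docker image with its REST port published.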

Maintenance & Community

The repository is under continuous development, welcoming feedback and contributions.
Mentions specific workshops and contributors from academia and industry (UC Berkeley, OpenAI, Turnitin, Google, Uber).

Licensing & Compatibility

The repository itself is not a software package with a license. The content is presented as a guideline.
Tools and frameworks mentioned within the guideline have their own licenses, which users must adhere to.

Limitations & Caveats

The document is a guideline and does not provide executable code or a deployable system.
Some sections are marked as "TBD" (To Be Determined), indicating areas that are still under development or require further detail.

Health Check

Last Commit: 7 months ago

Responsiveness: 1 week


Star History

47 stars in the last 30 days