hivemq-mqtt-tensorflow-kafka-realtime-iot-machine-learning-training-inference by kaiwaehner

Real-time ML for IoT using streaming data from MQTT via Kafka to TensorFlow

created 6 years ago
416 stars

Top 71.5% on sourcepulse

Project Summary

This project demonstrates real-time big data machine learning for IoT scenarios, specifically predictive maintenance for connected cars. It targets engineers and researchers needing to build scalable, cloud-native IoT data pipelines for model training and inference without relying on traditional data stores like S3 or HDFS. The primary benefit is a simplified, cost-effective architecture for streaming machine learning.

How It Works

The system ingests data from simulated IoT devices via HiveMQ (MQTT), streams it into Apache Kafka for preprocessing (Kafka Streams/KSQL), and then feeds it directly into TensorFlow through the tensorflow-io Kafka plugin. Because TensorFlow consumes from Kafka directly, no intermediate data store is needed for model training or real-time inference. The project supports both unsupervised (Autoencoder) and supervised (LSTM) learning models.
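The flow described above (ingest, preprocess, hand batches to the model) can be sketched with plain Python generators. This is only an illustration under stated assumptions: the in-memory list stands in for a Kafka topic fed by MQTT, and the field names, value ranges, and the helpers `preprocess` and `batches` are hypothetical rather than taken from the project.

```python
import json
from itertools import islice
from typing import Iterable, Iterator, List

# Hypothetical sensor payloads standing in for MQTT messages forwarded to a Kafka topic.
RAW_MESSAGES = [
    '{"car_id": "car-1", "speed": 50.0, "engine_temp": 90.0}',
    '{"car_id": "car-2", "speed": 120.0, "engine_temp": 110.0}',
    'not-json',  # malformed message that the preprocessing step should drop
    '{"car_id": "car-1", "speed": 80.0, "engine_temp": 95.0}',
    '{"car_id": "car-3", "speed": 0.0, "engine_temp": 70.0}',
]

# Assumed value ranges for min-max scaling (illustrative, not from the project).
RANGES = {"speed": (0.0, 200.0), "engine_temp": (50.0, 150.0)}


def preprocess(messages: Iterable[str]) -> Iterator[List[float]]:
    """Parse JSON messages and scale each feature to [0, 1]; skip bad records."""
    for raw in messages:
        try:
            record = json.loads(raw)
            yield [
                (float(record[name]) - low) / (high - low)
                for name, (low, high) in RANGES.items()
            ]
        except (json.JSONDecodeError, KeyError, TypeError, ValueError):
            continue  # drop malformed or incomplete messages


def batches(rows: Iterable[List[float]], batch_size: int) -> Iterator[List[List[float]]]:
    """Group a (possibly unbounded) stream of rows into fixed-size mini-batches."""
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# Rows are pulled lazily, so nothing is written to an intermediate store.
for batch in batches(preprocess(RAW_MESSAGES), batch_size=2):
    print(len(batch), batch)
```

In the actual project the consumer role is played by the tensorflow-io Kafka plugin and the preprocessing by Kafka Streams/KSQL; the generators here only mirror the shape of that pipeline.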

Quick Start & Requirements

  • Install/Run: Deployment is managed via Terraform scripts on Google Kubernetes Engine (GKE).
  • Prerequisites: Requires local installation of gcloud, kubectl, helm, and terraform.
  • Setup: Two to three shell commands to set up the infrastructure.
  • Demo: A 20-minute video and blog post are available for guidance.
  • Docs: Quick Start Guide

Highlighted Details

  • Demonstrates real-time model training and inference from streaming IoT data.
  • Achieves scalability for tens of thousands of devices and millions of messages per second.
  • Eliminates the need for additional data stores and batch processing frameworks (S3, HDFS, Spark) by using Kafka directly with TensorFlow I/O.
  • Supports both unsupervised (Autoencoder) and supervised (LSTM) learning models.
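A supervised sequence model such as an LSTM is typically trained on fixed-length windows of past readings paired with the next value. The sketch below shows that windowing step only; the helper `make_windows`, the `look_back` parameter, and the sample readings are assumptions for illustration and do not reproduce the project's own preprocessing.

```python
from typing import List, Sequence, Tuple


def make_windows(
    series: Sequence[float], look_back: int
) -> Tuple[List[List[float]], List[float]]:
    """Turn a sensor series into (window, next value) training pairs."""
    if look_back < 1:
        raise ValueError("look_back must be at least 1")
    inputs: List[List[float]] = []
    targets: List[float] = []
    # Each window holds look_back consecutive readings; the target is the reading after it.
    for start in range(len(series) - look_back):
        inputs.append(list(series[start:start + look_back]))
        targets.append(series[start + look_back])
    return inputs, targets


# Hypothetical normalized sensor readings from a single device.
readings = [0.1, 0.2, 0.3, 0.4, 0.5]
windows, next_values = make_windows(readings, look_back=3)
print(windows)
print(next_values)
```

A series shorter than or equal to `look_back` yields no pairs, which is usually the desired behavior when a device has just started reporting.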

Maintenance & Community

The project is authored by Kai Waehner, a recognized expert in IoT, Kafka, and AI. Further details and discussions can be found via linked blog posts and presentations.

Licensing & Compatibility

The project's licensing is not explicitly stated in the README. Compatibility for commercial use or closed-source linking would require clarification on the specific licenses of the components used (HiveMQ, Kafka, TensorFlow).

Limitations & Caveats

While the project simplifies the architecture, it still performs batch training rather than true online learning. The setup relies on GCP and GKE, though the architecture is presented as applicable to other cloud providers. The README mentions using enterprise components from HiveMQ and Confluent, which may have licensing implications.
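The distinction between batch training and online learning can be illustrated with a one-parameter model. This is a generic sketch, not the project's training code: `sgd_step`, `train_batch`, `train_online`, the learning rate, and the sample data are all assumptions chosen to keep the example small.

```python
from typing import Iterable, Iterator, Tuple

Record = Tuple[float, float]  # (feature x, target y)


def sgd_step(w: float, x: float, y: float, lr: float = 0.1) -> float:
    """One gradient step on the squared error 0.5 * (w * x - y) ** 2."""
    return w - lr * (w * x - y) * x


def train_batch(records: Iterable[Record], epochs: int, w: float = 0.0) -> float:
    """Batch training: take a bounded snapshot and iterate over it repeatedly."""
    snapshot = list(records)
    for _ in range(epochs):
        for x, y in snapshot:
            w = sgd_step(w, x, y)
    return w


def train_online(stream: Iterable[Record], w: float = 0.0) -> Iterator[float]:
    """Online learning: update once per record as it arrives, then move on."""
    for x, y in stream:
        w = sgd_step(w, x, y)
        yield w  # the model is usable for inference after every update


# Hypothetical records following y = 2 * x.
data = [(1.0, 2.0), (0.5, 1.0), (1.5, 3.0), (1.0, 2.0)]
print(train_batch(data, epochs=20))
print(list(train_online(data)))
```

Batch training needs a replayable, bounded slice of the log, which Kafka's retained offsets provide; online learning would instead update the model continuously as each message arrives.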

Health Check

  • Last commit: 4 years ago
  • Responsiveness: 1 day
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 2 stars in the last 90 days
