hivemq-mqtt-tensorflow-kafka-realtime-iot-machine-learning-training-inference by kaiwaehner

Real-time ML for IoT using streaming data from MQTT via Kafka to TensorFlow

Created 6 years ago

418 stars

Top 70.1% on SourcePulse

Project Summary

This project demonstrates real-time big data machine learning for IoT scenarios, specifically predictive maintenance for connected cars. It targets engineers and researchers needing to build scalable, cloud-native IoT data pipelines for model training and inference without relying on traditional data stores like S3 or HDFS. The primary benefit is a simplified, cost-effective architecture for streaming machine learning.

How It Works

The system ingests data from simulated IoT devices via HiveMQ (MQTT), streams it into Apache Kafka for preprocessing (Kafka Streams/KSQL), and then directly into TensorFlow using the tensorflow-io Kafka plugin. This approach eliminates the need for intermediate data storage, streamlining the pipeline for model training and real-time inference. It supports both unsupervised (Autoencoder) and supervised (LSTM) learning models.
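The flow described above can be sketched without any infrastructure. The following is a minimal illustration, not the project's code: an in-memory generator stands in for the Kafka topic, and the names `sensor_stream`, `parse_record`, and `batched`, as well as the CSV field layout, are assumptions made for this example. In the project itself, the consuming side is the tensorflow-io Kafka plugin feeding TensorFlow.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def sensor_stream() -> Iterator[bytes]:
    """Stand-in for a Kafka topic: yields raw messages as bytes.

    The message layout (car id, two sensor readings) is hypothetical.
    """
    raw = [
        b"car-1,0.91,72.5",
        b"car-2,0.88,70.1",
        b"car-1,0.93,73.0",
        b"car-3,0.40,95.2",
        b"car-2,0.90,71.4",
    ]
    yield from raw


def parse_record(message: bytes) -> List[float]:
    """Decode one CSV message and keep only the numeric features."""
    _car_id, *values = message.decode("utf-8").split(",")
    return [float(v) for v in values]


def batched(records: Iterable[List[float]], size: int) -> Iterator[List[List[float]]]:
    """Group a stream of records into lists of at most `size` items, lazily."""
    it = iter(records)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


# Records flow from the "topic" through parsing into batches with no
# intermediate store. The list() call is only here so the result can be
# printed; a training loop would iterate over the generator directly.
batches = list(batched(map(parse_record, sensor_stream()), size=2))
for i, batch in enumerate(batches):
    print(f"batch {i}: {batch}")
```

Each stage is a lazy iterator, so a record is parsed and batched only when the consumer asks for it. That is the property that lets a stream replace a separate storage layer in this kind of pipeline.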

Quick Start & Requirements

Install/Run: Deployment is managed via Terraform scripts on Google Kubernetes Engine (GKE).
Prerequisites: Requires local installation of gcloud, kubectl, helm, and terraform.
Setup: Two to three shell commands to set up the infrastructure.
Demo: A 20-minute video and blog post are available for guidance.
Docs: Quick Start Guide

Highlighted Details

Demonstrates real-time model training and inference from streaming IoT data.
Achieves scalability for tens of thousands of devices and millions of messages per second.
Eliminates the need for an additional data store (such as S3 or HDFS) or a separate processing framework (such as Spark) by using Kafka directly with TensorFlow I/O.
Supports both unsupervised (Autoencoder) and supervised (LSTM) learning models.
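For the unsupervised case, an autoencoder is commonly used for anomaly detection by comparing each input with its reconstruction and flagging records whose error is unusually large. The sketch below shows only that scoring step. The `reconstruct` function is a hypothetical stand-in for a trained model (it simply returns fixed "normal" values), and the threshold and sample records are arbitrary assumptions, not values from the project.

```python
from typing import Callable, List, Sequence


def reconstruction_error(x: Sequence[float], x_hat: Sequence[float]) -> float:
    """Mean squared error between an input and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def flag_anomalies(
    records: List[List[float]],
    reconstruct: Callable[[List[float]], List[float]],
    threshold: float,
) -> List[bool]:
    """Return True for every record whose reconstruction error exceeds the threshold."""
    return [reconstruction_error(r, reconstruct(r)) > threshold for r in records]


# Hypothetical stand-in for a trained autoencoder: it always "reconstructs"
# the same typical reading, so unusual inputs produce a large error.
NORMAL_READING = [0.9, 0.7]


def reconstruct(record: List[float]) -> List[float]:
    return list(NORMAL_READING)


records = [[0.91, 0.72], [0.40, 0.95], [0.88, 0.70]]
flags = flag_anomalies(records, reconstruct, threshold=0.01)
print(flags)
```

With a real model, `reconstruct` would wrap the model's prediction call, and the threshold would usually be chosen from the error distribution observed on data known to be normal.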

Maintenance & Community

The project is authored by Kai Waehner, a recognized expert in IoT, Kafka, and AI. Further details and discussions can be found via linked blog posts and presentations.

Licensing & Compatibility

The project's licensing is not explicitly stated in the README. Compatibility for commercial use or closed-source linking would require clarification on the specific licenses of the components used (HiveMQ, Kafka, TensorFlow).

Limitations & Caveats

Although the project simplifies the architecture, it still trains models in batches on data read from Kafka rather than performing true online learning, in which the model is updated with each incoming record. The setup relies on GCP and GKE, though the architecture is presented as applicable to other cloud providers. The README mentions using enterprise components from HiveMQ and Confluent, which may have licensing implications.
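The difference between batch training and online learning can be illustrated with a toy one-parameter model. This is a generic sketch, not the project's training code: the functions `online_fit` and `batch_fit`, the learning rate, and the data are all assumptions chosen for the example. The online variant updates the weight after every record, while the batch variant waits until a full batch has been collected.

```python
from typing import Iterable, List, Tuple

Record = Tuple[float, float]  # (input x, target y)


def online_fit(stream: Iterable[Record], lr: float = 0.1) -> Tuple[float, int]:
    """Fit y = w * x with one gradient step per incoming record."""
    w, updates = 0.0, 0
    for x, y in stream:
        w -= lr * 2 * (w * x - y) * x  # gradient of the squared error
        updates += 1
    return w, updates


def batch_fit(stream: Iterable[Record], batch_size: int, lr: float = 0.1) -> Tuple[float, int]:
    """Fit y = w * x with one gradient step per full batch.

    A trailing partial batch is dropped to keep the sketch short.
    """
    w, updates = 0.0, 0
    batch: List[Record] = []
    for record in stream:
        batch.append(record)
        if len(batch) == batch_size:
            grad = sum(2 * (w * x - y) * x for x, y in batch) / len(batch)
            w -= lr * grad
            updates += 1
            batch = []
    return w, updates


# Toy data following y = 2 * x, so the ideal weight is 2.0.
xs = [1.0, 0.5, 1.5, 1.0, 0.5, 1.5, 1.0, 0.5]
data = [(x, 2.0 * x) for x in xs]

w_online, n_online = online_fit(data)
w_batch, n_batch = batch_fit(data, batch_size=4)
print(f"online: w={w_online:.3f} after {n_online} updates")
print(f"batch:  w={w_batch:.3f} after {n_batch} updates")
```

On the same eight records, the online variant performs eight updates and the batch variant two, so the batch model lags behind the stream until its next batch is complete.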

Explore Similar Projects

upgini by upgini

chronon by airbnb

data-analytics-golden-demo by GoogleCloudPlatform

bytewax by bytewax

kubeai by kubeai-project

mlops-with-vertex-ai by GoogleCloudPlatform

streaming by mosaicml

kafka-streams-machine-learning-examples by kaiwaehner

composer by mosaicml

feast by feast-dev

kserve by kserve

serve by jina-ai