Real-time ML for IoT using streaming data from MQTT via Kafka to TensorFlow
This project demonstrates real-time big data machine learning for IoT scenarios, specifically predictive maintenance for connected cars. It targets engineers and researchers needing to build scalable, cloud-native IoT data pipelines for model training and inference without relying on traditional data stores like S3 or HDFS. The primary benefit is a simplified, cost-effective architecture for streaming machine learning.
How It Works
The system ingests data from simulated IoT devices via HiveMQ (MQTT), streams it into Apache Kafka for preprocessing (Kafka Streams/KSQL), and then feeds it directly into TensorFlow using the tensorflow-io Kafka plugin. Because no intermediate data store is needed, the same pipeline serves both model training and real-time inference. It supports both unsupervised (Autoencoder) and supervised (LSTM) learning models.
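The stage-by-stage data flow can be sketched in plain Python. This is a minimal, dependency-free illustration, not the project's code: the topic layout `cars/<car_id>/sensors`, the sensor fields, and every function name are hypothetical. The bridge function stands in for the HiveMQ-to-Kafka connection, `preprocess` stands in for Kafka Streams/KSQL, and `batches` plays the role that the tensorflow-io Kafka plugin plays in the project. Each stage is a generator, so records flow from device to training batch without being written to S3, HDFS, or any other store.

```python
import json
from typing import Iterator

# Hypothetical sensor fields; the real project's car-sensor schema may differ.
FEATURES = ("engine_temp", "oil_pressure", "rpm")


def simulated_mqtt_messages(n: int) -> Iterator[tuple[str, bytes]]:
    """Yield (topic, payload) pairs as an MQTT subscriber would receive them."""
    for i in range(n):
        car_id = f"car-{i % 3}"
        payload = {
            "car_id": car_id,
            "engine_temp": 90.0 + i,
            "oil_pressure": 3.0,
            "rpm": 2000 + 10 * i,
        }
        if i % 4 == 3:
            payload.pop("rpm")  # simulate an incomplete reading
        yield f"cars/{car_id}/sensors", json.dumps(payload).encode("utf-8")


def bridge_to_kafka(messages) -> Iterator[tuple[str, bytes]]:
    """Stand-in for the MQTT-to-Kafka bridge: key each record by car ID."""
    for topic, payload in messages:
        car_id = topic.split("/")[1]
        yield car_id, payload


def preprocess(records) -> Iterator[list[float]]:
    """Stand-in for Kafka Streams/KSQL: parse, filter incomplete events, extract features."""
    for _key, value in records:
        event = json.loads(value)
        if all(field in event for field in FEATURES):
            yield [float(event[field]) for field in FEATURES]


def batches(vectors, size: int) -> Iterator[list[list[float]]]:
    """Group the stream into fixed-size mini-batches, as a training consumer would."""
    batch = []
    for vector in vectors:
        batch.append(vector)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch  # flush the final partial batch


if __name__ == "__main__":
    stream = preprocess(bridge_to_kafka(simulated_mqtt_messages(8)))
    for b in batches(stream, 2):
        print(b)
```

With 8 simulated messages, the two incomplete readings are filtered out and the remaining 6 feature vectors arrive as 3 mini-batches of 2. Chaining generators keeps the sketch lazy, which mirrors the point of the architecture: the model consumes the stream rather than a stored copy of it.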
Quick Start & Requirements
Requires gcloud, kubectl, helm, and terraform.
Highlighted Details
Maintenance & Community
The project is authored by Kai Waehner, a recognized expert in IoT, Kafka, and AI. Further details and discussions can be found via linked blog posts and presentations.
Licensing & Compatibility
The project's licensing is not explicitly stated in the README. Compatibility for commercial use or closed-source linking would require clarification on the specific licenses of the components used (HiveMQ, Kafka, TensorFlow).
Limitations & Caveats
While the project simplifies the architecture, model training still runs in batches rather than as true online learning. The setup relies on GCP and GKE, though the architecture is presented as applicable to other cloud providers. The README mentions using enterprise components from HiveMQ and Confluent, which may have licensing implications.
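The difference between batch training and online learning can be illustrated with a deliberately simple model. The sketch below swaps the project's Autoencoder/LSTM for a hypothetical single-signal z-score detector so the contrast fits in a few lines: `fit_batch` computes its statistics once over an already-collected dataset and must be rerun to see new data, while `OnlineStats` updates its mean and variance with every incoming event using Welford's algorithm. All names here are illustrative assumptions, not part of the project.

```python
import math


def fit_batch(values):
    """Batch training: compute mean and std once over a fixed dataset."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n  # population variance
    return mean, math.sqrt(var)


class OnlineStats:
    """Online learning stand-in: update mean and variance per event (Welford)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def std(self):
        return math.sqrt(self.m2 / self.n) if self.n else 0.0


def is_anomaly(x, mean, std, k=3.0):
    """Flag readings more than k standard deviations from the mean."""
    return std > 0 and abs(x - mean) > k * std


if __name__ == "__main__":
    readings = [1.0, 2.0, 3.0, 4.0, 5.0]
    print("batch:", fit_batch(readings))
    online = OnlineStats()
    for r in readings:
        online.update(r)  # the model changes after every event
    print("online:", (online.mean, online.std))
```

Both approaches reach the same statistics on the same data; the difference is when the model changes. A batch model stays fixed until the next retraining run, whereas the online version adapts as each event arrives, which is the capability the project does not yet provide.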