image-classification-with-local-vlms by Paulescu

Local VLMs for Edge AI Image Classification

Created 9 months ago

373 stars

Top 75.7% on SourcePulse

Project Summary

This repository is a hands-on guide to building and deploying high-accuracy, low-latency image classifiers on edge devices using local Visual Language Models (VLMs). It targets engineers and researchers who need edge AI without cloud dependency, and it walks step by step through developing a specialized classifier.

How It Works

The project teaches through a sequence of increasingly complex image classification tasks, starting with cats vs. dogs. It uses open-weight VLMs (the LFM2-VL family) and demonstrates three techniques: building evaluation pipelines, using structured generation (via Outlines) to control VLM output, and applying LoRA-based supervised fine-tuning to reach production-grade accuracy. iOS deployment is planned but not yet available.
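To make the appeal of LoRA concrete, the following is a minimal sketch of the parameter arithmetic behind it, not code from the repository. LoRA freezes a weight matrix W and trains two small low-rank factors instead; the layer dimensions and rank used here are hypothetical values chosen only for illustration.

```python
def lora_param_counts(d_out: int, d_in: int, rank: int) -> tuple:
    """Return (full, lora) trainable-parameter counts for one weight matrix.

    Full fine-tuning updates every entry of W, which has shape (d_out, d_in).
    LoRA keeps W frozen and trains B (d_out x rank) and A (rank x d_in),
    so the adapted weight is W + (alpha / rank) * B @ A.
    """
    full = d_out * d_in
    lora = rank * (d_out + d_in)
    return full, lora


# Hypothetical layer size and rank; real values depend on the chosen model.
full, lora = lora_param_counts(d_out=2048, d_in=2048, rank=8)
print(f"full fine-tuning: {full:,} trainable parameters")
print(f"LoRA (rank 8):    {lora:,} trainable parameters")
```

For a square layer, the LoRA count grows linearly with the layer width while the full count grows quadratically, which is why the adapter stays small even for large models.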

Quick Start & Requirements

Setup uses uv for Python environment management; typical commands are uv sync, make evaluate, and make fine-tune. A Modal account (pay-as-you-go) is required for GPU-accelerated computation. Jupyter notebooks are used to visualize results. Links to uv setup instructions and Hugging Face datasets are implied but not given explicitly.

Highlighted Details

Achieves 100% accuracy on a cats vs. dogs classification task after fine-tuning.
Demonstrates structured generation with libraries like Outlines to enforce VLM output formats (e.g., JSON).
Implements parameter-efficient supervised fine-tuning using LoRA (low-rank adaptation).
Leverages Modal for accessible GPU acceleration in training and evaluation.
Includes tools for building evaluation pipelines and performing sample-by-sample performance analysis.
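The evaluation and structured-output ideas in the list above can be sketched in a few lines of standard-library Python. This is an illustrative sketch under stated assumptions, not the repository's pipeline: the Sample type, the "label" JSON field, the label set, and the sample data are all hypothetical. Note also that Outlines constrains the model during decoding, whereas this sketch only validates outputs after the fact.

```python
import json
from collections import Counter
from dataclasses import dataclass
from typing import Optional

LABELS = ("cat", "dog")  # assumed label set for the cats vs. dogs task


@dataclass
class Sample:
    image_id: str    # hypothetical identifier for the image
    truth: str       # ground-truth label
    raw_output: str  # text returned by the model


def parse_label(raw_output: str) -> Optional[str]:
    """Return the predicted label, or None if the output is malformed."""
    try:
        payload = json.loads(raw_output)
    except json.JSONDecodeError:
        return None
    label = payload.get("label") if isinstance(payload, dict) else None
    return label if label in LABELS else None


def evaluate(samples):
    """Compute accuracy, a confusion counter, and the per-sample failures."""
    confusion = Counter()
    failures = []
    for s in samples:
        predicted = parse_label(s.raw_output)
        confusion[(s.truth, predicted)] += 1
        if predicted != s.truth:
            failures.append((s.image_id, s.truth, predicted))
    correct = sum(n for (truth, pred), n in confusion.items() if truth == pred)
    accuracy = correct / len(samples) if samples else 0.0
    return accuracy, confusion, failures


# Made-up outputs: two correct, one wrong label, one free-text answer.
samples = [
    Sample("img_001", "cat", '{"label": "cat"}'),
    Sample("img_002", "dog", '{"label": "dog"}'),
    Sample("img_003", "dog", '{"label": "cat"}'),
    Sample("img_004", "cat", "It looks like a cat."),
]
accuracy, confusion, failures = evaluate(samples)
print(f"accuracy: {accuracy:.2f}")
for image_id, truth, predicted in failures:
    print(f"{image_id}: expected {truth}, got {predicted}")
```

Counting malformed outputs as errors, rather than silently dropping them, keeps the accuracy figure honest and shows how much structured generation would help before any fine-tuning is attempted.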

Maintenance & Community

The README lists planned future tasks, including advanced classification and iOS deployment, but the health check below shows the last commit was 8 months ago and marks responsiveness as inactive. Users can subscribe to a newsletter for ongoing AI system tutorials. No explicit community channels (Discord, Slack) or roadmap links are provided.

Licensing & Compatibility

The README omits explicit licensing information. This lack of clarity is a significant adoption blocker, particularly for commercial use or integration into proprietary systems, and requires direct clarification.

Limitations & Caveats

Key features like advanced classification tasks (car brand, human action) and iOS app deployment are marked as "COMING SOON." The current focus is solely on image classification, with broader VLM applications deferred to future work. Method effectiveness depends on dataset quality and VLM selection.

Health Check

Last Commit

8 months ago

Responsiveness

Inactive

Star History

1 star in the last 30 days