label-studio-ml-backend  by HumanSignal

SDK for wrapping ML code into a web server for Label Studio automation

Created 5 years ago
844 stars

Top 42.3% on SourcePulse

Project Summary

This repository provides a Python SDK and boilerplate configurations for integrating custom machine learning models with Label Studio, an open-source data labeling platform. It enables users to automate labeling tasks by serving ML models as web servers that can be connected to a Label Studio instance, supporting pre-annotation, interactive labeling, and model training.

How It Works

The ML backend acts as a bridge between Label Studio and your ML models. It exposes an API that Label Studio calls to get predictions or to train models based on annotations. The SDK provides a base class LabelStudioMLBase that users can inherit from, overriding methods like predict and fit to implement their model's inference and training logic. This approach allows for flexible integration of various ML frameworks and custom model architectures.
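The subclass-and-override pattern described above can be sketched roughly as follows. To keep the example runnable without installing the SDK, it uses a hypothetical stand-in class, BaseBackend, in place of LabelStudioMLBase; the real base class and its exact method signatures may differ. The control names "sentiment" and "text", the toy keyword model, and the shape of the payload passed to fit are all assumptions made for illustration.

```python
from typing import Any, Dict, List, Optional


class BaseBackend:
    """Hypothetical stand-in for the SDK's LabelStudioMLBase (not the real class)."""

    def predict(self, tasks: List[Dict[str, Any]],
                context: Optional[Dict[str, Any]] = None, **kwargs: Any) -> List[Dict[str, Any]]:
        raise NotImplementedError

    def fit(self, event: str, data: Dict[str, Any], **kwargs: Any) -> None:
        raise NotImplementedError


class KeywordSentimentBackend(BaseBackend):
    """Toy model: a text is 'Positive' if it contains a known positive keyword."""

    def __init__(self) -> None:
        self.positive_words = {"good", "great"}

    def predict(self, tasks, context=None, **kwargs):
        predictions = []
        for task in tasks:
            words = set(task["data"]["text"].lower().split())
            is_positive = bool(words & self.positive_words)
            label = "Positive" if is_positive else "Negative"
            predictions.append({
                # One result item per labeled region; the control names
                # "sentiment" and "text" are assumed, not taken from a real config.
                "result": [{
                    "from_name": "sentiment",
                    "to_name": "text",
                    "type": "choices",
                    "value": {"choices": [label]},
                }],
                "score": 1.0 if is_positive else 0.5,
            })
        return predictions

    def fit(self, event, data, **kwargs):
        # "Training" here means learning new keywords from a positive annotation.
        # The payload layout (task + annotation) is an assumed, simplified shape.
        choices = data["annotation"]["result"][0]["value"]["choices"]
        if "Positive" in choices:
            self.positive_words.update(data["task"]["data"]["text"].lower().split())
```

In a real backend, predict would call the model's inference code and fit would run a training or fine-tuning step; the surrounding structure stays the same.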

Quick Start & Requirements

  • Install/Run: Use docker-compose up within a model's example directory (e.g., label_studio_ml/examples/{MODEL_NAME}).
  • Prerequisites: docker-compose, plus the LABEL_STUDIO_URL and LABEL_STUDIO_API_KEY environment variables, which the backend uses to access data in Label Studio.
  • Setup Time: Minimal for provided examples; depends on model complexity for custom development.
  • Docs: https://labelstud.io/guide/ml.html
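As a small illustration of the environment-variable prerequisite, a backend could validate its configuration at startup along these lines. This is a sketch, not part of the SDK: the helper name label_studio_settings and the returned dictionary layout are hypothetical, and only the two variable names come from the summary above.

```python
import os
from typing import Dict, Mapping

REQUIRED_VARS = ("LABEL_STUDIO_URL", "LABEL_STUDIO_API_KEY")


def label_studio_settings(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Read the Label Studio connection settings, failing early if any is missing."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {
        # Strip a trailing slash so paths can be appended consistently.
        "url": env["LABEL_STUDIO_URL"].rstrip("/"),
        "api_key": env["LABEL_STUDIO_API_KEY"],
    }
```

Failing fast at startup makes a missing variable visible immediately, rather than as a data-access error on the first prediction request.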

Highlighted Details

  • Supports a wide range of models including text classification (BERT, scikit-learn), NER (Flair, GLiNER, Hugging Face, SpaCy), OCR (EasyOCR, Tesseract), object detection (MMDetection, YOLO, GroundingDINO, Grounding SAM), and LLMs (Hugging Face, Langchain, WatsonX).
  • Offers interactive labeling capabilities for models like GLiNER, Grounding SAM, and Tesseract.
  • Includes functionality for model training and updating based on user annotations.
  • Provides a label-studio-ml create command to scaffold new custom ML backends.

Licensing & Compatibility

  • License: Apache License 2.0.
  • Compatible with commercial use and closed-source linking.

Limitations & Caveats

The provided ML backend examples are intended for development and may not support production-level inference serving, potentially leading to "Bad Gateway" or "Service Unavailable" errors under heavy load. Windows users may encounter issues with line endings in shell scripts, requiring specific Git configuration adjustments.

Health Check

  • Last Commit: 2 days ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 9
  • Issues (30d): 3
  • Star History: 26 stars in the last 30 days
