label-studio-ml-backend  by HumanSignal

SDK for wrapping ML code into a web server for Label Studio automation

created 5 years ago
798 stars

Top 45.0% on sourcepulse

Project Summary

This repository provides a Python SDK and boilerplate configurations for integrating custom machine learning models with Label Studio, an open-source data labeling platform. It enables users to automate labeling tasks by serving ML models as web servers that can be connected to a Label Studio instance, supporting pre-annotation, interactive labeling, and model training.

How It Works

The ML backend acts as a bridge between Label Studio and your ML models. It exposes an API that Label Studio calls to get predictions or to train models based on annotations. The SDK provides a base class LabelStudioMLBase that users can inherit from, overriding methods like predict and fit to implement their model's inference and training logic. This approach allows for flexible integration of various ML frameworks and custom model architectures.
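The subclassing pattern described above might look roughly like the sketch below. It is illustrative, not the project's own example. The keyword-based "model", the class name KeywordSentimentModel, and the control names "sentiment" and "text" are assumptions and would need to match your own labeling configuration. The method signatures follow the SDK as commonly documented and may differ between versions. A stand-in base class is defined only so the sketch runs without the SDK installed.

```python
try:
    from label_studio_ml.model import LabelStudioMLBase
except ImportError:
    # Stand-in so this sketch runs without the SDK; NOT the real base class.
    class LabelStudioMLBase:
        def __init__(self, **kwargs):
            pass

# Toy "model": a keyword lookup standing in for real inference (assumption).
POSITIVE_WORDS = {"good", "great", "excellent"}


def classify(text):
    """Return 'Positive' if any positive keyword appears, else 'Negative'."""
    words = set(text.lower().split())
    return "Positive" if words & POSITIVE_WORDS else "Negative"


class KeywordSentimentModel(LabelStudioMLBase):
    def predict(self, tasks, context=None, **kwargs):
        """Build one prediction per task in Label Studio's result format."""
        predictions = []
        for task in tasks:
            label = classify(task["data"]["text"])  # assumes a "text" data field
            predictions.append({
                "result": [{
                    "from_name": "sentiment",  # hypothetical Choices tag name
                    "to_name": "text",         # hypothetical Text tag name
                    "type": "choices",
                    "value": {"choices": [label]},
                }],
            })
        return predictions

    def fit(self, event, data, **kwargs):
        """Placeholder: a real backend would update its model from annotations."""
        pass
```

Keeping the inference logic in a plain function such as classify makes it easy to test without a running Label Studio instance.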

Quick Start & Requirements

  • Install/Run: Use docker-compose up within a model's example directory (e.g., label_studio_ml/examples/{MODEL_NAME}).
  • Prerequisites: docker-compose, plus the LABEL_STUDIO_URL and LABEL_STUDIO_API_KEY environment variables so the backend can access data stored in Label Studio.
  • Setup Time: Minimal for provided examples; depends on model complexity for custom development.
  • Docs: https://labelstud.io/guide/ml.html
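Before running docker-compose up, it can help to confirm that both environment variables listed above are set. The helper below is a minimal sketch of such a check. The function name missing_env_vars and the example URL are hypothetical and not part of the SDK.

```python
import os

# Variables named in the prerequisites above.
REQUIRED_VARS = ("LABEL_STUDIO_URL", "LABEL_STUDIO_API_KEY")


def missing_env_vars(env=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


# Example with a placeholder mapping instead of the real environment.
example_env = {"LABEL_STUDIO_URL": "http://localhost:8080"}  # placeholder URL
print(missing_env_vars(example_env))
```

Calling missing_env_vars() with no argument checks the current process environment instead of the example mapping.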

Highlighted Details

  • Supports a wide range of models including text classification (BERT, scikit-learn), NER (Flair, GLiNER, Hugging Face, SpaCy), OCR (EasyOCR, Tesseract), object detection (MMDetection, YOLO, GroundingDINO, Grounding SAM), and LLMs (Hugging Face, Langchain, WatsonX).
  • Offers interactive labeling capabilities for models like GLiNER, Grounding SAM, and Tesseract.
  • Includes functionality for model training and updating based on user annotations.
  • Provides a label-studio-ml create command to scaffold new custom ML backends.

Maintenance & Community

Licensing & Compatibility

  • License: Apache License 2.0.
  • Compatible with commercial use and closed-source linking.

Limitations & Caveats

The provided ML backend examples are intended for development and may not support production-level inference serving, potentially leading to "Bad Gateway" or "Service Unavailable" errors under heavy load. Windows users may encounter issues with line endings in shell scripts, requiring specific Git configuration adjustments.

Health Check

  • Last commit: 2 weeks ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 6
  • Issues (30d): 4
  • Star History: 97 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), Andre Zayarni (Cofounder of Qdrant), and 1 more.

refinery by code-kern-ai

0.1%
1k
Open-source tool for NLP data scaling, assessment, and maintenance
created 3 years ago
updated 7 months ago
Starred by Tobi Lutke (Cofounder of Shopify), Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), and 3 more.

autolabel by refuel-ai

0.3%
2k
Python library to label text datasets using LLMs
created 2 years ago
updated 5 months ago
Starred by Omar Sanseviero (DevRel at Google DeepMind), Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), and 4 more.

argilla by argilla-io

0.4%
5k
Collaboration tool for building high-quality AI datasets
created 4 years ago
updated 5 days ago
Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), Travis Fischer (Founder of Agentic), and 5 more.

cleanlab by cleanlab

0.3%
11k
Data-centric AI package for ML with messy data
created 7 years ago
updated 3 weeks ago