webis-de
Active learning for efficient text classification data labeling
Top 52.1% on SourcePulse
Small-Text addresses the challenge of efficiently labeling training data for text classification, particularly when labeled data is scarce. It offers state-of-the-art active learning strategies, allowing users to easily combine pre-implemented query strategies, initialization methods, and stopping criteria with classifiers from scikit-learn, PyTorch, or Hugging Face Transformers. This accelerates the development of supervised text classification models by intelligently selecting the most informative data points for manual annotation, benefiting researchers and practitioners alike.
How It Works
The library provides a unified interface for active learning workflows. Users can select from various scientifically evaluated components and integrate them with popular machine learning frameworks. It supports GPU acceleration via PyTorch and seamless integration with Transformers for leveraging advanced text classification models. This modular design facilitates experimentation and application building, optimizing the data labeling process by reducing manual annotation effort.
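The workflow described above can be illustrated with a minimal, library-independent sketch. It does not use Small-Text's API. Every name in it (`least_confidence`, `fit_threshold`, `predict_proba`, `run_active_learning`) is hypothetical, the classifier is a toy one-dimensional threshold model, and the stopping criterion is assumed to be a fixed number of iterations. It only shows how a query strategy, an initial labeled set, and a stopping rule might fit together in a pool-based loop.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple


def least_confidence(probabilities: Sequence[Sequence[float]], k: int) -> List[int]:
    """Query strategy: return the positions of the k rows whose most likely
    class has the lowest probability (the most uncertain predictions)."""
    scores = [1.0 - max(row) for row in probabilities]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


def fit_threshold(pool: Sequence[float], labeled: Dict[int, int]) -> float:
    """Toy classifier (an assumption for illustration): the decision
    threshold is the midpoint between the two class means."""
    neg = [pool[i] for i, y in labeled.items() if y == 0]
    pos = [pool[i] for i, y in labeled.items() if y == 1]
    return (sum(neg) / len(neg) + sum(pos) / len(pos)) / 2.0


def predict_proba(threshold: float, xs: Sequence[float]) -> List[Tuple[float, float]]:
    """Return (P(class 0), P(class 1)) for each value via a steep sigmoid."""
    result = []
    for x in xs:
        p1 = 1.0 / (1.0 + math.exp(-10.0 * (x - threshold)))
        result.append((1.0 - p1, p1))
    return result


def run_active_learning(
    pool: Sequence[float],
    oracle: Callable[[float], int],
    initial: Sequence[int],
    query_size: int,
    iterations: int,
) -> Tuple[float, Dict[int, int]]:
    """Pool-based loop: train, query the most uncertain items, label them.

    `initial` must contain at least one example of each class.
    `oracle` stands in for the human annotator.
    """
    labeled = {i: oracle(pool[i]) for i in initial}
    for _ in range(iterations):  # stopping criterion: fixed budget
        threshold = fit_threshold(pool, labeled)
        unlabeled = [i for i in range(len(pool)) if i not in labeled]
        if not unlabeled:
            break
        probs = predict_proba(threshold, [pool[i] for i in unlabeled])
        for position in least_confidence(probs, query_size):
            index = unlabeled[position]
            labeled[index] = oracle(pool[index])
    # Refit once more so the returned model reflects the final labels.
    return fit_threshold(pool, labeled), labeled
```

In this sketch the items closest to the current decision threshold are queried first, which is why the annotator is asked about borderline examples rather than obvious ones. A real setup would replace the toy model with a scikit-learn, PyTorch, or Transformers classifier and the fixed budget with one of the library's stopping criteria.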
Quick Start & Requirements
Slim installation: pip install small-text
Full installation, including transformer support: pip install small-text[transformers]
Highlighted Details
Maintenance & Community
Developed by Christopher Schröder at Leipzig University's NLP group (Webis). The project is funded by the Development Bank of Saxony. Contributions are welcomed. A community survey on active learning in NLP was conducted in March 2026.
Licensing & Compatibility
Licensed under the MIT License, which is permissive for commercial use and integration into closed-source projects.
Limitations & Caveats
Version 2.0.0.dev3 is an alpha release, so its interfaces may still change. The project highlights its progress and feature set, and notes that simple counts do not fully capture its capabilities.