code-kern-ai: Open-source tool for NLP data scaling, assessment, and maintenance
Top 28.1% on SourcePulse
Refinery is an open-source platform designed for data scientists to scale, assess, and maintain natural language processing (NLP) training data. It addresses the challenges of managing unstructured text data, enabling a data-centric approach to building better NLP models by semi-automating labeling, identifying low-quality data subsets, and monitoring data quality.
How It Works
Refinery employs a microservices architecture, integrating with libraries like Hugging Face Transformers and spaCy for NLP tasks and Qdrant for neural search. It supports a data-centric workflow by allowing users to define heuristics (e.g., Python functions, active learning models, zero-shot classifiers) to generate noisy labels. These heuristics, combined with manually labeled data, form a noisy label matrix used for analysis, quality assessment, and iterative model improvement.
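To make the noisy label matrix concrete, here is a minimal sketch in plain Python. It is not Refinery's API. The heuristics (`lf_positive_words`, `lf_negative_words`, `lf_exclamation`), the helper names, and the toy sentiment task are all hypothetical. Each row is a record and each column is one label source: the heuristics first, then a trailing column for manual labels. `None` means the source abstains. A plain majority vote stands in for whatever aggregation Refinery actually uses, and records where the sources disagree are flagged as candidates for review.

```python
from collections import Counter
from typing import Callable, Dict, List, Optional

ABSTAIN = None  # a label source returns None when it has no opinion
Heuristic = Callable[[str], Optional[str]]


# Hypothetical heuristics for a toy sentiment task (naive substring matching).
def lf_positive_words(text: str) -> Optional[str]:
    return "positive" if any(w in text.lower() for w in ("great", "love")) else ABSTAIN


def lf_negative_words(text: str) -> Optional[str]:
    return "negative" if any(w in text.lower() for w in ("bad", "hate")) else ABSTAIN


def lf_exclamation(text: str) -> Optional[str]:
    return "positive" if text.endswith("!") else ABSTAIN


def build_label_matrix(
    records: List[str], heuristics: List[Heuristic], manual: Dict[int, str]
) -> List[List[Optional[str]]]:
    """One row per record: one column per heuristic, plus a trailing manual-label column."""
    matrix = []
    for i, text in enumerate(records):
        row = [h(text) for h in heuristics]
        row.append(manual.get(i, ABSTAIN))
        matrix.append(row)
    return matrix


def majority_vote(row: List[Optional[str]]) -> Optional[str]:
    """Most frequent non-abstaining label; abstain on an empty row or a tie for first place."""
    votes = Counter(v for v in row if v is not ABSTAIN)
    if not votes:
        return ABSTAIN
    (top, count), *rest = votes.most_common()
    if rest and rest[0][1] == count:
        return ABSTAIN
    return top


def conflicts(matrix: List[List[Optional[str]]]) -> List[int]:
    """Indices of records whose non-abstaining sources disagree (candidates for review)."""
    return [
        i for i, row in enumerate(matrix)
        if len({v for v in row if v is not ABSTAIN}) > 1
    ]


records = [
    "I love this, great product!",
    "This is bad.",
    "Bad service but I love the staff!",
    "No opinion.",
]
manual = {1: "negative"}  # only record 1 has been labeled by hand
heuristics = [lf_positive_words, lf_negative_words, lf_exclamation]

matrix = build_label_matrix(records, heuristics, manual)
print([majority_vote(row) for row in matrix])  # ['positive', 'negative', 'positive', None]
print(conflicts(matrix))                       # [2]
```

Record 2 receives a positive label by majority, but it is also flagged as a conflict. This is the kind of low-quality or ambiguous subset that analysis of the matrix is meant to surface.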
Quick Start & Requirements
pip install kern-refinery
refinery start
Then open http://localhost:4455 in a browser.
Highlighted Details
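The services may need some time after `refinery start` before the UI at http://localhost:4455 responds. That delay is an assumption, not something stated above. The sketch below uses only the standard library to poll that URL until it answers or a timeout expires. The function name `wait_for_ui` and its parameters are hypothetical and are not part of the `kern-refinery` CLI.

```python
import time
import urllib.error
import urllib.request


def wait_for_ui(
    url: str = "http://localhost:4455",
    timeout_s: float = 300.0,
    interval_s: float = 5.0,
    request_timeout_s: float = 5.0,
    opener=urllib.request.urlopen,
):
    """Poll `url` until it returns any HTTP status; return None if `timeout_s` elapses first."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            with opener(url, timeout=request_timeout_s) as resp:
                return resp.status
        except urllib.error.HTTPError as err:
            # The server answered with an error status, so it is reachable.
            return err.code
        except (urllib.error.URLError, OSError):
            # Connection refused or similar: the services are still starting up.
            if time.monotonic() >= deadline:
                return None
            time.sleep(interval_s)


if __name__ == "__main__":
    print(wait_for_ui())
```

`HTTPError` is caught before `URLError` because it is a subclass of `URLError`. A server that responds with any status counts as "up".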
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
The open-source version is primarily single-user; multi-user capabilities and enterprise features are part of the commercial offerings. To use a custom Python library inside the labeling-function execution environment, you must open an issue requesting its inclusion.
11 months ago
Inactive
argilla-io
microsoft
explosion