Data-centric AI package for ML with messy data
Cleanlab is a data-centric AI package that automatically detects and helps fix issues in machine learning datasets, particularly messy, real-world data and labels. It uses models you already have to identify problems such as outliers, duplicates, and label errors, which helps improve model reliability across ML tasks including supervised learning, LLMs, and RAG applications.
How It Works
Cleanlab uses confident learning algorithms, grounded in peer-reviewed research, to estimate which examples in a dataset are problematic. It diagnoses issues from an existing ML model's predictions and embeddings rather than from the model's internals. As a result, it requires no changes to existing modeling code and applies to any data type (text, image, audio, tabular) and any ML model (PyTorch, TensorFlow, XGBoost, etc.).
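To make the idea concrete, here is a deliberately simplified, pure-Python sketch of the thresholding step behind confident learning. It is an illustration only, not Cleanlab's implementation: the function names `class_thresholds` and `flag_label_issues` and the toy data are assumptions made for this example. Each class gets a threshold equal to the model's average predicted probability for that class among examples carrying that label; an example is flagged when the model is confidently above threshold for a class other than its given label.

```python
from collections import defaultdict


def class_thresholds(labels, pred_probs):
    """Per-class threshold: mean predicted probability of class k
    over the examples whose given label is k."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for label, probs in zip(labels, pred_probs):
        sums[label] += probs[label]
        counts[label] += 1
    return {k: sums[k] / counts[k] for k in counts}


def flag_label_issues(labels, pred_probs):
    """Return indices of examples whose most confident above-threshold
    class disagrees with the given label (simplified illustration)."""
    thresholds = class_thresholds(labels, pred_probs)
    flagged = []
    for i, (label, probs) in enumerate(zip(labels, pred_probs)):
        # Classes for which the model clears that class's threshold.
        confident = [k for k, t in thresholds.items() if probs[k] >= t]
        if not confident:
            continue  # the model is not confident about any class
        best = max(confident, key=lambda k: probs[k])
        if best != label:
            flagged.append(i)
    return flagged


# Toy data (hypothetical): six examples, two classes.
labels = [0, 0, 0, 1, 1, 1]
pred_probs = [
    [0.90, 0.10],
    [0.80, 0.20],
    [0.10, 0.90],  # labeled 0, but the model strongly predicts 1
    [0.15, 0.85],
    [0.10, 0.90],
    [0.30, 0.70],
]

print(flag_label_issues(labels, pred_probs))  # [2]
```

Only predicted probabilities and given labels are needed, which is why the approach is independent of the model and data type. In practice the probabilities should come from held-out (for example, cross-validated) predictions so the model has not memorized the noisy labels.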
Quick Start & Requirements
pip install cleanlab
or conda install cleanlab
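After installing, a quick check like the following can confirm the package is importable and show a first call. This is a minimal sketch under stated assumptions: the toy labels and probabilities are made up for illustration, and real use should pass out-of-sample predicted probabilities from your own model. It uses `find_label_issues` from `cleanlab.filter`, and skips the call if the package is not found.

```python
import importlib.util

# True once one of the install commands above has succeeded in this environment.
installed = importlib.util.find_spec("cleanlab") is not None

if installed:
    import numpy as np
    from cleanlab.filter import find_label_issues

    # Toy data (hypothetical): given labels and model-predicted class probabilities.
    labels = np.array([0, 0, 0, 1, 1, 1])
    pred_probs = np.array([
        [0.90, 0.10],
        [0.80, 0.20],
        [0.10, 0.90],  # labeled 0, but the model strongly predicts 1
        [0.15, 0.85],
        [0.10, 0.90],
        [0.30, 0.70],
    ])

    # Indices of likely label errors, ordered from most to least suspicious.
    issue_indices = find_label_issues(
        labels=labels,
        pred_probs=pred_probs,
        return_indices_ranked_by="self_confidence",
        n_jobs=1,  # single process keeps this tiny example simple
    )
    print("Likely label issues at indices:", issue_indices)
else:
    print("cleanlab not found; run one of the install commands above first.")
```

The flagged indices are candidates for review rather than guaranteed errors; on a dataset this small the output is only a smoke test.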
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats