AI data quality evaluation tool
Top 65.4% on SourcePulse
Dingo is a comprehensive data quality evaluation tool for LLM and multimodal datasets, aimed at researchers and engineers. It automates the detection of data quality issues by combining built-in and custom rules with model-based assessments, improving dataset reliability across the pre-training, fine-tuning, and evaluation stages.
How It Works
Dingo uses a hybrid approach that combines rule-based checks with LLM-driven evaluations. The rule-based checks apply over 20 heuristic rules to common issues such as completeness and formatting. The LLM evaluations use models (OpenAI, Kimi, or locally hosted models) with customizable prompts to assess quality dimensions such as helpfulness, harmlessness, and relevance. Together, these provide both deterministic, automated checks and more nuanced, context-aware quality assessments.
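To make the hybrid design concrete, here is a minimal, self-contained sketch in plain Python. It is not Dingo's API. The rule names (`not_empty`, `balanced_brackets`, `no_long_char_run`), the `evaluate` function, the 0.5 pass threshold, and the `judge` callable standing in for an OpenAI, Kimi, or local model are all hypothetical. The sketch shows cheap deterministic rules running first, with a prompt-driven LLM judge consulted only for samples that pass them.

```python
# Illustrative sketch only: NOT Dingo's API. All names here are hypothetical.
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class QualityResult:
    # Names of heuristic rules the sample failed.
    failed_rules: List[str] = field(default_factory=list)
    # Per-dimension scores in [0, 1] from the (hypothetical) LLM judge.
    llm_scores: Dict[str, float] = field(default_factory=dict)

    @property
    def passed(self) -> bool:
        # Assumed policy: no rule failures and every LLM score >= 0.5.
        return not self.failed_rules and all(s >= 0.5 for s in self.llm_scores.values())


def rule_not_empty(text: str) -> bool:
    """Completeness: the sample must contain non-whitespace content."""
    return bool(text.strip())


def rule_balanced_brackets(text: str) -> bool:
    """Format: parentheses, brackets, and braces must be balanced."""
    closers = {")": "(", "]": "[", "}": "{"}
    stack: List[str] = []
    for ch in text:
        if ch in "([{":
            stack.append(ch)
        elif ch in closers:
            if not stack or stack.pop() != closers[ch]:
                return False
    return not stack


def rule_no_long_char_run(text: str, max_run: int = 10) -> bool:
    """Noise: reject runs of one repeated character longer than max_run."""
    run, prev = 0, None
    for ch in text:
        run = run + 1 if ch == prev else 1
        if run > max_run:
            return False
        prev = ch
    return True


RULES: Dict[str, Callable[[str], bool]] = {
    "not_empty": rule_not_empty,
    "balanced_brackets": rule_balanced_brackets,
    "no_long_char_run": rule_no_long_char_run,
}

# A judge maps (prompt, text) to a score; a real one would call a model.
LLMJudge = Callable[[str, str], float]

PROMPTS: Dict[str, str] = {
    "helpfulness": "Rate how helpful this response is, from 0 to 1.",
    "harmlessness": "Rate how harmless this response is, from 0 to 1.",
}


def evaluate(text: str, judge: Optional[LLMJudge] = None) -> QualityResult:
    result = QualityResult()
    for name, rule in RULES.items():
        if not rule(text):
            result.failed_rules.append(name)
    # Spend (expensive) LLM calls only on samples that pass the cheap rules.
    if judge is not None and not result.failed_rules:
        for dimension, prompt in PROMPTS.items():
            result.llm_scores[dimension] = judge(prompt, text)
    return result


def stub_judge(prompt: str, text: str) -> float:
    """Stand-in for a real model call; always returns a fixed score."""
    return 0.9
```

For example, `evaluate("((unbalanced", stub_judge).failed_rules` yields `['balanced_brackets']`, and no judge call is made. Ordering the stages this way keeps model costs proportional to the number of samples that survive the deterministic filters.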
Quick Start & Requirements
pip install dingo-python
Highlighted Details
Maintenance & Community
The project is actively maintained by the Dingo Contributors. Community engagement is encouraged via Discord and WeChat. Contribution guidelines are provided.
Licensing & Compatibility
Licensed under Apache 2.0. Dependencies such as fasttext use the MIT License, which is compatible with Apache 2.0. The Apache 2.0 license permits commercial use and integration into closed-source projects.
Limitations & Caveats
The current rules and models focus on common data quality problems, so specialized needs may require developing custom rules. Future plans include support for audio and video modalities and for small-model evaluation.
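One common way to structure custom rule development is a decorator-based registry that groups rules by use case. The sketch below is a hypothetical illustration of that pattern. Dingo's actual extension API may differ, and `register_rule`, `RULE_REGISTRY`, `run_group`, the group names, and both example rules are invented for this example.

```python
# Hypothetical rule registry illustrating custom-rule plug-ins.
# This is NOT Dingo's extension API; all names here are illustrative.
from typing import Callable, Dict, List

Rule = Callable[[str], bool]  # returns True when the sample passes

# group name -> {rule name -> rule function}
RULE_REGISTRY: Dict[str, Dict[str, Rule]] = {}


def register_rule(name: str, groups: List[str]) -> Callable[[Rule], Rule]:
    """Decorator that adds a rule to every listed group."""
    def decorator(fn: Rule) -> Rule:
        for group in groups:
            RULE_REGISTRY.setdefault(group, {})[name] = fn
        return fn
    return decorator


@register_rule("min_length", groups=["default", "pretrain"])
def min_length(text: str) -> bool:
    # Assumed threshold: at least 5 whitespace-separated tokens.
    return len(text.split()) >= 5


@register_rule("no_boilerplate_footer", groups=["pretrain"])
def no_boilerplate_footer(text: str) -> bool:
    # Flag web-page footer phrases that often leak into pre-training data.
    lowered = text.lower()
    return not any(p in lowered for p in ("all rights reserved", "cookie policy"))


def run_group(group: str, text: str) -> List[str]:
    """Return the names of rules in `group` that the sample fails."""
    rules = RULE_REGISTRY.get(group, {})
    return [name for name, rule in rules.items() if not rule(text)]
```

With this setup, `run_group("pretrain", "Copyright 2024. All rights reserved.")` returns `['no_boilerplate_footer']`, while the same text passes the `default` group. Registering rules by group lets one rule set serve several pipeline stages without copying code.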