Data enrichment library for ML pipelines using external data sources
Top 81.5% on SourcePulse
Upgini is a Python library that automates finding external data features and integrating them into machine learning pipelines. It targets data scientists and ML engineers who want to improve model accuracy using public, community, and premium data sources, including LLM-generated features.
How It Works
Upgini acts as a search engine for data. It uses LLMs, GraphNNs, and RNNs to generate and optimize relevant features from hundreds of external data sources. The search selects features that measurably improve model accuracy, rather than those that are merely correlated with the target variable. It also offers automated search key augmentation, stability checks for accuracy gains, and a scikit-learn compatible interface.
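The difference between selecting on accuracy uplift and selecting on correlation can be shown with a toy example. The sketch below is not Upgini's API or its algorithm: `ToyEnricher`, its `fit`/`transform` signature, the `zip` search key, and the `income` and `noise` features are all hypothetical, and a one-variable least-squares fit stands in for the real model. A candidate feature is kept only if it lowers the error on held-out rows compared with a baseline that uses no external data.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x on a single feature."""
    mx, my = mean(xs), mean(ys)
    var = sum((x - mx) ** 2 for x in xs)
    if var == 0:
        return my, 0.0
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / var
    return my - b * mx, b


def mse(preds, ys):
    return mean((p - y) ** 2 for p, y in zip(preds, ys))


class ToyEnricher:
    """Hypothetical fit/transform enricher for illustration; NOT Upgini's API."""

    def __init__(self, key, external, min_gain=0.0):
        self.key = key            # name of the search key column
        self.external = external  # {key value: {feature name: value}}
        self.min_gain = min_gain  # required drop in held-out MSE
        self.selected_ = []

    def _column(self, rows, name):
        # Assumes every key value is present in the external source.
        return [self.external[r[self.key]][name] for r in rows]

    def fit(self, rows, y, eval_rows, eval_y):
        # Baseline with no external data: predict the training mean.
        baseline = mse([mean(y)] * len(eval_y), eval_y)
        names = sorted({n for feats in self.external.values() for n in feats})
        self.selected_ = []
        for name in names:
            a, b = fit_line(self._column(rows, name), y)
            preds = [a + b * x for x in self._column(eval_rows, name)]
            # Keep the feature only if held-out error actually improves.
            if baseline - mse(preds, eval_y) > self.min_gain:
                self.selected_.append(name)
        return self

    def transform(self, rows):
        return [
            {**r, **{n: self.external[r[self.key]][n] for n in self.selected_}}
            for r in rows
        ]


# "noise" matches the target perfectly on the training keys (A-D)
# but is unrelated to it on the held-out keys (E-G).
external = {
    "A": {"income": 1.0, "noise": 1.0},
    "B": {"income": 2.0, "noise": 2.0},
    "C": {"income": 3.0, "noise": 3.0},
    "D": {"income": 4.0, "noise": 4.0},
    "E": {"income": 1.5, "noise": 9.0},
    "F": {"income": 2.5, "noise": 1.0},
    "G": {"income": 3.5, "noise": -5.0},
}
train = [{"zip": k} for k in "ABCD"]
y = [2.0, 4.0, 6.0, 8.0]
holdout = [{"zip": k} for k in "EFG"]
holdout_y = [3.0, 5.0, 7.0]

enricher = ToyEnricher("zip", external).fit(train, y, holdout, holdout_y)
enriched = enricher.transform([{"zip": "E"}])
print(enricher.selected_)  # ['income']
```

Both candidate features are perfectly correlated with the target on the training rows, so a correlation filter would keep both. Scoring on held-out rows drops `noise`, because its training-set relationship does not carry over.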
Quick Start & Requirements
%pip install upgini
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats