featureform  by featureform

A virtual feature store for ML

Created 5 years ago
1,945 stars

Top 22.5% on SourcePulse

Project Summary

Featureform is a virtual feature store: data scientists define, manage, and serve ML features while Featureform orchestrates the data infrastructure they already run. It targets collaboration, experimentation, deployment, reliability, and compliance problems for both individual data scientists and enterprise teams.

How It Works

Featureform is an infrastructure-agnostic framework that turns existing data sources and compute engines (such as Spark) into a feature store. It manages feature definitions, lineage, and deployment orchestration; it does not replace that infrastructure or compute the data itself. Teams keep their preferred data infrastructure and gain a standardized feature store abstraction, with immutable definitions improving reusability and reliability.
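
To make the "definitions, not data" idea concrete, here is a minimal sketch in plain Python. It is not Featureform's API: `FeatureDefinition`, `DefinitionRegistry`, and every field and value below are hypothetical names chosen for illustration. The registry stores only metadata that points at data living in an existing system, and it refuses to overwrite a definition once it has been registered.

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class FeatureDefinition:
    """Metadata describing a feature; it holds no feature values itself."""
    name: str
    variant: str
    provider: str  # hypothetical label for the existing engine that computes the data
    source: str    # hypothetical table or query evaluated by that engine
    entity: str


class DefinitionRegistry:
    """Keeps definitions keyed by (name, variant) and never overwrites one."""

    def __init__(self):
        self._definitions = {}

    def register(self, definition):
        key = (definition.name, definition.variant)
        existing = self._definitions.get(key)
        if existing is not None and existing != definition:
            # Immutability: a changed definition must be a new variant.
            raise ValueError(f"{key} is already registered; add a new variant instead")
        self._definitions[key] = definition
        return definition

    def get(self, name, variant):
        return self._definitions[(name, variant)]


registry = DefinitionRegistry()
v1 = registry.register(FeatureDefinition(
    name="avg_transaction_amount", variant="v1",
    provider="spark", source="transactions_daily", entity="user_id"))

# Changing the logic means registering a new variant next to the old one.
v2 = registry.register(FeatureDefinition(
    name="avg_transaction_amount", variant="v2",
    provider="spark", source="transactions_hourly", entity="user_id"))
```

Because `v1` is never altered, anything built against it keeps referring to the same definition after `v2` is added, which is the reuse guarantee that immutability is meant to provide.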

Quick Start & Requirements

Featureform can be deployed locally via Docker or on Kubernetes, connecting to existing cloud infrastructure. Official guides cover the Kubernetes deployment and a Docker quickstart. Community support is available on Slack, and contribution documentation is provided.

Highlighted Details

  • Native embeddings support includes versioned embedding tables and integration with vector databases for training and inference.
  • Enforces immutability for features, labels, and training sets, ensuring consistent and safe re-use.
  • Orchestrates data pipelines and handles distributed system complexities like retry logic.
  • Offers built-in role-based access control, audit logs, and dynamic serving rules for governance.

Maintenance & Community

The project fosters community engagement through a Slack channel and provides clear contribution guidelines. Issue reporting is encouraged to aid development.

Licensing & Compatibility

Released under the Mozilla Public License 2.0 (MPL 2.0), a file-level copyleft license: commercial use and linking with code under other licenses are permitted, provided that distributed modifications to MPL-licensed files are made available under the same terms.

Limitations & Caveats

The README does not detail specific limitations, alpha status, or known bugs. Because Featureform orchestrates existing systems rather than replacing them, its effectiveness depends on the configuration and stability of the underlying data infrastructure.

Health Check

  • Last commit: 3 months ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 5 stars in the last 30 days
