Python library for ML training data pipelines
Grain is a Python library for efficient, deterministic, and flexible reading and processing of machine learning training data, primarily targeting JAX models but usable with other frameworks. It enables users to define complex data pipelines declaratively, simplifying the preparation of datasets for training and evaluation.
How It Works
Grain employs a declarative API for defining data processing pipelines. Users chain transformations like shuffle, map, and batch to construct a data flow. This approach allows for clear, readable pipeline definitions and enables Grain to optimize the execution of these steps, ensuring deterministic and efficient data handling.
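The chaining idea can be illustrated with a minimal, standard-library-only sketch. ToyDataset and build_pipeline are hypothetical names invented for this illustration; they are not Grain's classes or API, and this toy evaluates each step eagerly, which a real pipeline library need not do. The sketch only shows how a seeded shuffle followed by map and batch produces the same batches on every run.

```python
import random
from typing import Any, Callable, Iterator, List, Sequence


class ToyDataset:
    """Hypothetical stand-in for a chained pipeline; NOT Grain's API."""

    def __init__(self, items: Sequence[Any]) -> None:
        self._items = list(items)

    def shuffle(self, seed: int) -> "ToyDataset":
        # A private RNG seeded explicitly keeps the order reproducible
        # and independent of the global random state.
        rng = random.Random(seed)
        shuffled = list(self._items)
        rng.shuffle(shuffled)
        return ToyDataset(shuffled)

    def map(self, fn: Callable[[Any], Any]) -> "ToyDataset":
        # Apply fn to every element, preserving order.
        return ToyDataset([fn(x) for x in self._items])

    def batch(self, batch_size: int) -> "ToyDataset":
        # Group consecutive elements; the last batch may be smaller.
        items = self._items
        return ToyDataset(
            [items[i:i + batch_size] for i in range(0, len(items), batch_size)]
        )

    def __iter__(self) -> Iterator[Any]:
        return iter(self._items)


def build_pipeline(seed: int) -> List[List[int]]:
    """Chain shuffle -> map -> batch over the integers 0..7."""
    dataset = (
        ToyDataset(range(8))
        .shuffle(seed=seed)
        .map(lambda x: x * 10)
        .batch(batch_size=3)
    )
    return list(dataset)


if __name__ == "__main__":
    for batch in build_pipeline(seed=42):
        print(batch)
```

Calling build_pipeline twice with the same seed returns identical batches, which is the property the section describes as deterministic; changing the seed changes the order but not the set of elements.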
Quick Start & Requirements
pip install grain
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
The library does not use GPUs or TPUs for its transformations, so all processing runs on the CPU. Windows is not supported.