PyTorch framework for training models producing high-quality embeddings
This repository provides Open Metric Learning (OML), a PyTorch-based framework for training and validating models that produce high-quality embeddings for retrieval tasks. It targets researchers and engineers working with large-scale datasets where traditional classification methods fall short, offering a structured approach to metric learning with pre-trained models and practical pipelines.
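To make "embeddings for retrieval" concrete, here is a minimal, generic sketch in plain PyTorch. It is not OML's API. Queries are matched against a gallery by cosine similarity, and retrieval quality is scored with CMC@k, the fraction of queries whose top-k results contain at least one item with the correct label. The helper names retrieve_top_k and cmc_at_k and the toy tensors are hypothetical, chosen only for illustration.

```python
import torch
import torch.nn.functional as F


def retrieve_top_k(queries, gallery, k):
    """Hypothetical helper: indices of the k gallery items most similar to each query."""
    q = F.normalize(queries, dim=1)  # unit-length rows, so dot product = cosine similarity
    g = F.normalize(gallery, dim=1)
    similarity = q @ g.T  # (num_queries, num_gallery)
    return similarity.topk(k, dim=1).indices  # (num_queries, k)


def cmc_at_k(top_k_idx, query_labels, gallery_labels):
    """Hypothetical helper: fraction of queries with a same-label item in their top-k."""
    retrieved = gallery_labels[top_k_idx]  # (num_queries, k) labels of retrieved items
    hits = (retrieved == query_labels.unsqueeze(1)).any(dim=1)
    return hits.float().mean().item()


# Toy data: 2-D vectors stand in for the output of a trained embedding model.
gallery = torch.tensor([[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]])
gallery_labels = torch.tensor([0, 1, 0])
queries = torch.tensor([[1.0, 0.05], [0.1, 1.0]])
query_labels = torch.tensor([0, 1])

top1 = retrieve_top_k(queries, gallery, k=1)
print(cmc_at_k(top1, query_labels, gallery_labels))
```

Real evaluations must also exclude a query's own entry when the query and gallery sets overlap; keeping them separate, as above, sidesteps that.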
How It Works
OML focuses on end-to-end pipelines and practical use cases, abstracting away the complexities of metric learning. It leverages PyTorch Lightning for efficient training, especially with distributed data parallelism (DDP). The framework provides specialized samplers (e.g., CategoryBalanceSampler) and miners (e.g., HardTripletsMiner) to construct effective training batches, aiming to reach state-of-the-art results with simpler heuristics rather than complex mathematical approaches.
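The sampler-plus-miner idea can be sketched in plain PyTorch. This is a hypothetical illustration, not OML's implementation: PKBatchSampler is a simple label-balanced sampler (n_labels classes times n_instances samples per batch) rather than the category-aware CategoryBalanceSampler, and batch_hard_triplet_loss applies the common "batch-hard" rule (farthest positive, closest negative for each anchor), which may differ in detail from HardTripletsMiner. The margin of 0.2 is an arbitrary assumption.

```python
import random
from collections import defaultdict

import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset


class PKBatchSampler:
    """Hypothetical label-balanced batch sampler (not OML's CategoryBalanceSampler).

    Each batch contains `n_labels` distinct labels with `n_instances` samples each.
    """

    def __init__(self, labels, n_labels, n_instances, n_batches, seed=0):
        self.by_label = defaultdict(list)
        for idx, label in enumerate(labels):
            self.by_label[label].append(idx)
        # Only labels with enough samples can fill their slots in a batch.
        self.valid_labels = [lbl for lbl, idxs in self.by_label.items() if len(idxs) >= n_instances]
        if len(self.valid_labels) < n_labels:
            raise ValueError("not enough labels with n_instances samples each")
        self.n_labels = n_labels
        self.n_instances = n_instances
        self.n_batches = n_batches
        self.rng = random.Random(seed)

    def __len__(self):
        return self.n_batches

    def __iter__(self):
        for _ in range(self.n_batches):
            batch = []
            for label in self.rng.sample(self.valid_labels, self.n_labels):
                batch.extend(self.rng.sample(self.by_label[label], self.n_instances))
            yield batch


def batch_hard_triplet_loss(embeddings, labels, margin=0.2):
    """Generic batch-hard triplet loss (an assumption, not OML's HardTripletsMiner)."""
    dist = torch.cdist(embeddings, embeddings)  # (B, B) pairwise Euclidean distances
    same = labels.unsqueeze(0) == labels.unsqueeze(1)
    eye = torch.eye(len(labels), dtype=torch.bool, device=labels.device)
    # Hardest positive: the farthest other sample sharing the anchor's label.
    hardest_pos = dist.masked_fill(~(same & ~eye), float("-inf")).max(dim=1).values
    # Hardest negative: the closest sample with a different label.
    hardest_neg = dist.masked_fill(same, float("inf")).min(dim=1).values
    return F.relu(hardest_pos - hardest_neg + margin).mean()


# Usage: raw features stand in for model outputs to keep the example small.
labels = [0, 0, 0, 1, 1, 1, 2, 2, 2, 3]  # label 3 has one sample, so it is never drawn
features = torch.randn(len(labels), 8)
dataset = TensorDataset(features, torch.tensor(labels))
sampler = PKBatchSampler(labels, n_labels=2, n_instances=2, n_batches=5)
loader = DataLoader(dataset, batch_sampler=sampler)
for x, y in loader:
    loss = batch_hard_triplet_loss(x, y)
```

Balancing and mining work together: hard mining only makes sense if every anchor has at least one positive and one negative in the same batch, which a purely random batch does not guarantee.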
Quick Start & Requirements
pip install -U open-metric-learning
Optional extras such as [nlp], [audio], and [pipelines] are available. Docker images are also provided (omlteam/oml:gpu, omlteam/oml:cpu).
Highlighted Details
… torchvision.
Maintenance & Community
The project is actively maintained by the OML-Team, with contributions from university researchers and industry professionals. Community engagement channels are available via GitHub.
Licensing & Compatibility
The repository does not explicitly state a license in the README. Users should verify licensing for commercial use or integration into closed-source projects.
Limitations & Caveats
ONNX export is not directly supported, but it can be done with PyTorch's own export tooling (torch.onnx.export). The framework's primary focus is PyTorch; integration with other deep learning frameworks would require custom wrappers.