open-metric-learning by OML-Team

PyTorch framework for training models producing high-quality embeddings

Created 3 years ago
972 stars

Top 37.9% on SourcePulse

Project Summary

This repository provides Open Metric Learning (OML), a PyTorch-based framework for training and validating models that produce high-quality embeddings for retrieval tasks. It targets researchers and engineers working with large-scale datasets where traditional classification methods fall short, offering a structured approach to metric learning with pre-trained models and practical pipelines.

How It Works

OML focuses on end-to-end pipelines and practical use cases, hiding much of the complexity of metric learning. It builds on PyTorch Lightning for training, including distributed data parallel (DDP) runs. Specialized samplers (e.g., CategoryBalanceSampler) decide which items go into each training batch, and miners (e.g., HardTripletsMiner) select informative triplets from that batch. The aim is to reach state-of-the-art results with simple batch-construction heuristics rather than more complex mathematical approaches.
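The interplay of a sampler and a miner can be sketched in plain Python. This is an illustrative sketch only, not OML's implementation: the functions balanced_batch and hardest_triplets, the toy 2-D embeddings, and the choice of Euclidean distance are all assumptions made for the example. It shows a label-balanced batch (a fixed number of labels, a fixed number of instances per label) followed by "hardest" triplet mining, where each anchor is paired with its farthest positive and closest negative.

```python
import math
import random
from collections import defaultdict


def balanced_batch(labels, n_labels, n_instances, rng):
    """Hypothetical sampler: pick n_labels classes, then n_instances indices from each."""
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)
    # Only classes with enough items can fill their share of the batch.
    eligible = sorted(l for l, ids in by_label.items() if len(ids) >= n_instances)
    chosen = rng.sample(eligible, n_labels)
    batch = []
    for label in chosen:
        batch.extend(rng.sample(by_label[label], n_instances))
    return batch


def hardest_triplets(embeddings, labels):
    """Hypothetical miner: for each anchor, farthest positive and closest negative."""
    triplets = []
    for a, (emb_a, lab_a) in enumerate(zip(embeddings, labels)):
        positives = [i for i, l in enumerate(labels) if l == lab_a and i != a]
        negatives = [i for i, l in enumerate(labels) if l != lab_a]
        if not positives or not negatives:
            continue  # no valid triplet for this anchor
        pos = max(positives, key=lambda i: math.dist(emb_a, embeddings[i]))
        neg = min(negatives, key=lambda i: math.dist(emb_a, embeddings[i]))
        triplets.append((a, pos, neg))
    return triplets


# Toy dataset (made up): three labels, three 2-D "embeddings" each.
labels = [0, 0, 0, 1, 1, 1, 2, 2, 2]
embeddings = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.2),
    (1.0, 1.0), (1.1, 1.0), (1.0, 1.3),
    (2.0, 0.0), (2.2, 0.0), (2.0, 0.1),
]

rng = random.Random(0)
batch = balanced_batch(labels, n_labels=2, n_instances=2, rng=rng)
batch_labels = [labels[i] for i in batch]
batch_embs = [embeddings[i] for i in batch]

# Triplet indices refer to positions inside the batch, not the full dataset.
triplets = hardest_triplets(batch_embs, batch_labels)
```

Because every label in the batch has at least two instances, each anchor is guaranteed a positive, which is the main reason balanced sampling and triplet mining are usually combined. In a real training loop the embeddings would come from the model's forward pass and the mined triplets would feed a triplet loss.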

Quick Start & Requirements

  • Install via pip: pip install -U open-metric-learning (with optional extras like [nlp], [audio], [pipelines]). Docker images are also available (omlteam/oml:gpu, omlteam/oml:cpu).
  • Supports Python 3.10-3.12.
  • Official documentation: https://open-metric-learning.readthedocs.io/en/latest/
  • Tutorials: English

Highlighted Details

  • Pipelines: Config-driven training and validation for images, texts, and audio.
  • Zoo: Pre-trained models for several modalities (e.g., ViT, ECAPA-TDNN), exposed in a way similar to torchvision's model zoo.
  • Integration: Works with PyTorch Lightning and can be used with pure PyTorch.
  • State-of-the-Art Performance: Achieves competitive results on benchmarks like SOP and DeepFashion using custom samplers and miners.
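Retrieval benchmarks such as SOP are commonly scored with rank-based metrics like Recall@K (also called CMC@K). As a rough illustration of what validating an embedding model for retrieval involves, the sketch below computes such a metric in plain Python. The function name cmc_at_k, the toy data, and the use of Euclidean distance are assumptions for this example and are not taken from OML's API.

```python
import math


def cmc_at_k(query_embs, query_labels, gallery_embs, gallery_labels, k):
    """Fraction of queries with at least one same-label item among the k nearest gallery items."""
    hits = 0
    for q_emb, q_lab in zip(query_embs, query_labels):
        # Rank gallery items by distance to the query, nearest first.
        ranked = sorted(
            range(len(gallery_embs)),
            key=lambda i: math.dist(q_emb, gallery_embs[i]),
        )
        if any(gallery_labels[i] == q_lab for i in ranked[:k]):
            hits += 1
    return hits / len(query_embs)


# Toy gallery and queries (made up): one gallery item per label.
gallery_embs = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.0)]
gallery_labels = [0, 1, 2]
query_embs = [(0.1, 0.0), (0.9, 1.2), (1.2, 0.9)]
query_labels = [0, 1, 2]

cmc1 = cmc_at_k(query_embs, query_labels, gallery_embs, gallery_labels, k=1)
cmc2 = cmc_at_k(query_embs, query_labels, gallery_embs, gallery_labels, k=2)
```

In this toy setup the third query sits closer to a gallery item of another label than to its own, so it counts as a miss at k=1 but a hit at k=2. That is the kind of error a better-trained embedding model is expected to reduce.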

Maintenance & Community

The project is actively maintained by the OML-Team, with contributions from university researchers and industry professionals. Community engagement channels are available via GitHub.

Licensing & Compatibility

The repository does not explicitly state a license in the README. Users should verify licensing for commercial use or integration into closed-source projects.

Limitations & Caveats

OML does not provide its own ONNX export, but models can be exported with PyTorch's built-in tooling (for example, torch.onnx.export). The framework targets PyTorch only, so using it with other deep learning frameworks would require custom wrappers.

Health Check

  • Last commit: 5 months ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 1
  • Star history: 2 stars in the last 30 days
