finetuner by jina-ai

Cloud tool for task-oriented embedding finetuning of models like BERT and CLIP

created 4 years ago
1,507 stars

Top 27.9% on sourcepulse

Project Summary

Jina Finetuner is a Python library designed to simplify and accelerate the process of fine-tuning embedding models for neural search and other AI tasks. It targets developers and researchers seeking to improve embedding quality for applications like semantic search, recommendation systems, and cross-modal retrieval, offering significant performance gains with minimal data and compute.

How It Works

Finetuner streamlines the fine-tuning workflow by abstracting away infrastructure complexity and running training on cloud-based GPUs. It supports mainstream loss functions and optimizers, along with techniques such as layer pruning, weight freezing, and distributed training. Together these let users reach state-of-the-art performance on domain-specific data with relatively small datasets and short training times.
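
Loss functions for embedding fine-tuning generally pull matching items together and push non-matching items apart. As a rough illustration only, the sketch below computes a triplet margin loss in plain Python. It is not Finetuner's API: the function names and the choice of loss are assumptions made for this example, and this summary does not say which losses the library ships.

```python
import math
from typing import Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two embedding vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_margin_loss(
    anchor: Sequence[float],
    positive: Sequence[float],
    negative: Sequence[float],
    margin: float = 1.0,
) -> float:
    """Hypothetical helper: zero once the negative is at least `margin`
    farther from the anchor than the positive is, positive otherwise."""
    d_pos = euclidean(anchor, positive)
    d_neg = euclidean(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)
```

A well-separated triplet yields a loss of zero, so training effort concentrates on triplets the model still confuses.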

Quick Start & Requirements

  • Install via pip: pip install -U finetuner
  • For cloud-based fine-tuning jobs: pip install "finetuner[full]"
  • Requires Python 3.8+.
  • Cloud computing features require an account on Jina AI Cloud. The last version supporting local computing is 0.4.1.
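
The version cutoff for local computing can be checked programmatically. The sketch below is a minimal illustration, assuming plain numeric version strings such as "0.4.1"; the helper names are hypothetical and are not part of Finetuner.

```python
from typing import Tuple

# Per the note above, 0.4.1 is the last release that supports local computing.
LAST_LOCAL_VERSION: Tuple[int, ...] = (0, 4, 1)


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn '0.4.1' into (0, 4, 1). Handles plain numeric releases only;
    pre-release suffixes such as 'rc1' are not supported by this sketch."""
    return tuple(int(part) for part in version.split(".")[:3])


def supports_local_training(version: str) -> bool:
    """Hypothetical helper: True if this release can still train locally."""
    return parse_version(version) <= LAST_LOCAL_VERSION
```

To stay on the last local-capable release, the usual pip pinning syntax applies: pip install "finetuner==0.4.1".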

Highlighted Details

  • Reports significant performance improvements (e.g., a 15.8% gain in mRR on Quora QA and a 78.2% gain in mAP on visual similarity) with only a few hundred samples.
  • Offers over 40 loss functions, 10 optimizers, and features like dimensionality reduction and hard-negative mining.
  • Supports various embedding models including BERT, CLIP, ResNet, and PointNet++.
  • Benchmarks demonstrate substantial gains across diverse tasks like text-to-image search and 3D mesh search.
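
For readers unfamiliar with the mRR metric cited above, the sketch below shows how mean reciprocal rank is commonly computed for a set of search queries. It is an illustrative assumption about the standard definition, not the project's evaluation code, and the function names are hypothetical.

```python
from typing import Hashable, Sequence, Set, Tuple


def reciprocal_rank(ranked_ids: Sequence[Hashable], relevant_ids: Set[Hashable]) -> float:
    """1 / (1-based position of the first relevant result), or 0.0 if none appears."""
    for position, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(
    results: Sequence[Tuple[Sequence[Hashable], Set[Hashable]]],
) -> float:
    """Average the reciprocal rank over all (ranking, relevant set) pairs."""
    if not results:
        return 0.0
    return sum(reciprocal_rank(ranked, relevant) for ranked, relevant in results) / len(results)


# Three queries: hit at rank 1, hit at rank 2, and no hit at all.
queries = [
    (["a", "b", "c"], {"a"}),
    (["a", "b", "c"], {"b"}),
    (["a", "b", "c"], {"z"}),
]
score = mean_reciprocal_rank(queries)  # (1 + 0.5 + 0) / 3 = 0.5
```

A higher score means relevant results tend to appear nearer the top of each ranking, which is what fine-tuning an embedding model for search aims to improve.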

Maintenance & Community

Finetuner is backed by Jina AI. Community support is available via Jina AI's Discord server and public events.

Licensing & Compatibility

Licensed under Apache-2.0. This license is permissive and generally compatible with commercial and closed-source applications.

Limitations & Caveats

Starting with version 0.5.0, all computing runs exclusively on Jina AI Cloud. Users who need local training must stay on version 0.4.1 or earlier.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 14 stars in the last 90 days
