finetuner by jina-ai

Cloud tool for task-oriented embedding finetuning of models like BERT and CLIP

Created 4 years ago

1,508 stars

Top 27.2% on SourcePulse

Project Summary

Jina Finetuner is a Python library designed to simplify and accelerate the process of fine-tuning embedding models for neural search and other AI tasks. It targets developers and researchers seeking to improve embedding quality for applications like semantic search, recommendation systems, and cross-modal retrieval, offering significant performance gains with minimal data and compute.

How It Works

Finetuner streamlines the fine-tuning workflow by abstracting away infrastructure complexity and handling cloud-based GPU training. It supports a wide array of mainstream loss functions, optimizers, and advanced techniques like layer pruning, weight freezing, and distributed training, enabling users to achieve state-of-the-art performance on domain-specific data with relatively small datasets and short training times.
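To make the loss-function idea concrete, here is a minimal pure-Python sketch of a triplet margin loss, a common metric-learning objective, combined with "batch-hard" mining, one simple form of hard-negative mining. This is not Finetuner's API: the function names `euclidean` and `batch_hard_triplet_loss` and the default `margin` are hypothetical. Real training would compute this on GPU tensors and backpropagate through the model rather than operate on Python lists.

```python
import math
from typing import List, Sequence


def euclidean(u: Sequence[float], v: Sequence[float]) -> float:
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def batch_hard_triplet_loss(
    embeddings: List[Sequence[float]],
    labels: List[int],
    margin: float = 1.0,  # hypothetical default, chosen for illustration
) -> float:
    """Mean triplet margin loss with batch-hard mining (illustrative sketch).

    For each anchor, use the farthest same-label embedding (hardest positive)
    and the closest different-label embedding (hardest negative), then apply
    max(0, d(anchor, pos) - d(anchor, neg) + margin).
    """
    losses = []
    for i, anchor in enumerate(embeddings):
        pos = [euclidean(anchor, e) for j, e in enumerate(embeddings)
               if j != i and labels[j] == labels[i]]
        neg = [euclidean(anchor, e) for j, e in enumerate(embeddings)
               if labels[j] != labels[i]]
        if not pos or not neg:
            continue  # this anchor cannot form a triplet in the batch
        losses.append(max(0.0, max(pos) - min(neg) + margin))
    return sum(losses) / len(losses) if losses else 0.0


if __name__ == "__main__":
    # Two well-separated clusters of 2-D "embeddings".
    emb = [[0.0, 0.0], [0.0, 1.0], [3.0, 0.0], [3.0, 1.0]]
    labels = [0, 0, 1, 1]
    print(batch_hard_triplet_loss(emb, labels, margin=1.0))  # 0.0
    print(batch_hard_triplet_loss(emb, labels, margin=2.5))  # 0.5
```

With margin 1.0, every hardest negative (distance 3) is farther than the hardest positive (distance 1) by more than the margin, so the loss is zero; raising the margin to 2.5 makes each anchor contribute 0.5. Mining the hardest examples focuses the gradient on the pairs the model currently confuses most.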

Quick Start & Requirements

Install via pip: pip install -U finetuner
For cloud-based fine-tuning jobs: pip install "finetuner[full]"
Requires Python 3.8+.
Cloud computing features require an account on Jina AI Cloud. The last version supporting local computing is 0.4.1.

Highlighted Details

Achieves substantial performance improvements (e.g., improvements of 15.8% in mRR on Quora QA and 78.2% in mAP on visual similarity) with only a few hundred training samples.
Offers over 40 loss functions, 10 optimizers, and features like dimensionality reduction and hard-negative mining.
Supports various embedding models including BERT, CLIP, ResNet, and PointNet++.
Benchmarks demonstrate substantial gains across diverse tasks like text-to-image search and 3D mesh search.
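The benchmark figures above are quoted as improvements in ranking metrics such as mean reciprocal rank (mRR). As a hedged illustration, and not Finetuner's evaluation code, the sketch below computes MRR from the 1-based rank of the first relevant result for each query and expresses the change between two runs as a relative percentage, assuming the quoted figures are relative gains over the pre-trained model. The rank lists are invented, and `mean_reciprocal_rank` and `relative_improvement` are hypothetical helper names.

```python
from typing import List, Optional


def mean_reciprocal_rank(ranks: List[Optional[int]]) -> float:
    """Mean reciprocal rank over a set of queries.

    Each entry is the 1-based rank of the first relevant result for one query,
    or None if no relevant result was retrieved (that query contributes 0).
    """
    if not ranks:
        raise ValueError("need at least one query")
    return sum(1.0 / r for r in ranks if r is not None) / len(ranks)


def relative_improvement(before: float, after: float) -> float:
    """Relative gain of `after` over `before`, in percent."""
    return (after - before) / before * 100.0


if __name__ == "__main__":
    # Hypothetical ranks for four queries before and after fine-tuning.
    baseline = mean_reciprocal_rank([1, 2, 4, None])  # 0.4375
    tuned = mean_reciprocal_rank([1, 1, 2, 3])        # about 0.708
    print(f"{relative_improvement(baseline, tuned):.1f}%")  # 61.9%
```

Because MRR only rewards the first relevant hit, moving a correct result from rank 2 to rank 1 doubles that query's contribution, which is why modest re-ranking changes can produce large relative gains.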

Maintenance & Community

Finetuner is backed by Jina AI. Community support is available via Jina AI's Discord server and public events.

Licensing & Compatibility

Licensed under Apache-2.0. This license is permissive and generally compatible with commercial and closed-source applications.

Limitations & Caveats

Starting with version 0.5.0, all computation runs exclusively on Jina AI Cloud; users who need local training must stay on an older release (0.4.1 or earlier).

Health Check

Last Commit

1 year ago

Responsiveness

1 day

Star History

0 stars in the last 30 days