finetune by IndicoDataSolutions

NLP finetuning library with scikit-learn style API

created 7 years ago
712 stars

Top 49.1% on sourcepulse

Project Summary

This library provides a scikit-learn-like interface for fine-tuning state-of-the-art NLP models, targeting researchers and developers who need to adapt pre-trained models to specific downstream tasks. It simplifies the process of leveraging models like BERT, RoBERTa, and GPT for classification, regression, and sequence labeling.

How It Works

Finetune abstracts the complexity of transformer architectures and training loops behind a familiar Classifier.fit() API. Users select a base model (e.g., BERT, RoBERTa) and configure training parameters (learning rate, length, regularization) to fine-tune on custom datasets. It also supports two-stage fine-tuning: first on unlabeled data, then on labeled data, which can improve performance when labeled data is limited.
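
The fit/predict flow described above can be sketched roughly as follows. This is a minimal sketch, not the library's documented recipe: the helper two_stage_fit_predict and the function run_with_finetune are hypothetical names introduced for illustration, the n_epochs keyword is an assumed configuration option, and calling fit() with texts only for the unlabeled stage is an assumption about how the two-stage workflow is exposed. The toy texts and labels are made up.

```python
from typing import List, Optional, Sequence


def two_stage_fit_predict(
    model,
    train_texts: Sequence[str],
    train_labels: Sequence[str],
    test_texts: Sequence[str],
    unlabeled_texts: Optional[Sequence[str]] = None,
) -> List:
    """Fit any object exposing scikit-learn-style fit/predict, optionally in two stages.

    Hypothetical helper: it relies only on duck typing, so it works with any
    model whose fit() accepts texts (and optionally labels).
    """
    if unlabeled_texts:
        # Stage 1 (assumed usage): adapt the base model on unlabeled text only.
        model.fit(list(unlabeled_texts))
    # Stage 2: supervised fine-tuning on the labeled examples.
    model.fit(list(train_texts), list(train_labels))
    return list(model.predict(list(test_texts)))


def run_with_finetune() -> List:
    """Wire the helper to finetune itself; requires finetune and TensorFlow 1.x."""
    # Imported lazily so the helper above stays importable without finetune.
    from finetune import Classifier
    from finetune.base_models import RoBERTa

    # n_epochs is an assumed config keyword; check the docs for exact names.
    model = Classifier(base_model=RoBERTa, n_epochs=2)
    return two_stage_fit_predict(
        model,
        train_texts=["great product", "terrible support"],
        train_labels=["positive", "negative"],
        test_texts=["really great support"],
        unlabeled_texts=["the product arrived on time", "support never replied"],
    )
```

Calling run_with_finetune() in an environment with finetune installed would train on the toy data and return the predictions for the test text. Keeping the training steps in a helper that only assumes fit() and predict() makes it easy to swap in another task wrapper, or a stub for testing.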

Quick Start & Requirements

  • Install via pip: pip3 install finetune
  • Requires TensorFlow-GPU >= 1.14.0 and up-to-date NVIDIA drivers.
  • spaCy English tokenizer: python3 -m spacy download en
  • Docker images are available for GPU and CPU-only usage.
  • Full documentation: finetune.indico.io

Highlighted Details

  • Supports BERT, RoBERTa, GPT, GPT2, TextCNN, TCN, and DistilBERT.
  • Offers diverse task-specific wrappers: Classifier, Regressor, SequenceLabeler, Comparison, etc.
  • Enables two-stage fine-tuning for improved performance with limited labeled data.
  • Includes a DeploymentModel for optimizing serialized models for production.

Maintenance & Community

The project is maintained by IndicoDataSolutions. Links to community channels or roadmaps are not explicitly provided in the README.

Licensing & Compatibility

The README does not specify a license. Compatibility for commercial use or closed-source linking is not detailed.

Limitations & Caveats

The library targets TensorFlow 1.x, which is no longer actively developed. The README's stated requirement is tensorflow-gpu >= 1.14.0, and it gives no indication of TensorFlow 2.x support, so compatibility with 2.x should not be assumed.
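
A quick pre-flight check for this constraint might look like the sketch below. It assumes, as this summary does, that only TensorFlow 1.x releases at or above 1.14.0 are suitable. The function names parse_version, is_supported_tensorflow, and describe_environment are hypothetical and not part of finetune.

```python
from typing import Optional, Tuple


def parse_version(version: str) -> Tuple[int, int, int]:
    """Turn a version string such as '1.15.0rc1' into (1, 15, 0)."""
    parts = []
    for piece in version.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break  # stop at suffixes like 'rc1'
            digits += ch
        parts.append(int(digits) if digits else 0)
    while len(parts) < 3:
        parts.append(0)  # pad short versions such as '1.14'
    return (parts[0], parts[1], parts[2])


def is_supported_tensorflow(version: str, minimum: Tuple[int, int, int] = (1, 14, 0)) -> bool:
    """True for 1.x releases at or above the minimum (assumed support range)."""
    parsed = parse_version(version)
    return parsed[0] == 1 and parsed >= minimum


def describe_environment() -> str:
    """Report whether the installed TensorFlow falls in the assumed range."""
    try:
        import tensorflow as tf  # imported lazily; may be absent
    except ImportError:
        return "TensorFlow is not installed"
    version: Optional[str] = getattr(tf, "__version__", None)
    if version is None:
        return "TensorFlow version could not be determined"
    if is_supported_tensorflow(version):
        return "TensorFlow %s is within the assumed supported range" % version
    return "TensorFlow %s is outside the assumed supported range" % version
```

For example, is_supported_tensorflow("1.15.0") returns True, while is_supported_tensorflow("2.4.1") and is_supported_tensorflow("1.13.2") both return False.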

Health Check

  • Last commit: 2 days ago
  • Responsiveness: Inactive
  • Pull requests (30d): 3
  • Issues (30d): 0
  • Star history: 1 star in the last 90 days
