spacy-transformers  by explosion

spaCy extension for transformer models

created 6 years ago
1,391 stars

Top 29.6% on sourcepulse

Project Summary

This package provides spaCy components and architectures that integrate Hugging Face's transformer models (BERT, XLNet, GPT-2, etc.) into spaCy pipelines. It lets users apply pre-trained transformer representations to NLP tasks without leaving the spaCy ecosystem.

How It Works

The package introduces a Transformer pipeline component that acts as a bridge to Hugging Face's transformers library. The component automatically aligns the transformer's outputs to spaCy's tokenization, so the rest of the pipeline can keep working with spaCy tokens. This makes transformer architectures usable within spaCy's established pipeline structure and configuration system.
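
The alignment idea can be illustrated with a small, library-free sketch. It is not the package's actual implementation: the function names `align_by_offsets` and `pool_mean`, the character-offset representation, and the mean pooling are all assumptions chosen for illustration. Each spaCy-style token is mapped to the wordpieces whose character ranges overlap it, and the wordpiece vectors are then averaged per token.

```python
from typing import List, Tuple

# (start_char, end_char) with an exclusive end; an assumed representation.
Span = Tuple[int, int]


def align_by_offsets(token_spans: List[Span], piece_spans: List[Span]) -> List[List[int]]:
    """For each token, list the indices of wordpieces that overlap it."""
    alignment = []
    for t_start, t_end in token_spans:
        overlapping = [
            i
            for i, (p_start, p_end) in enumerate(piece_spans)
            if p_start < t_end and t_start < p_end  # the two ranges intersect
        ]
        alignment.append(overlapping)
    return alignment


def pool_mean(alignment: List[List[int]], piece_vectors: List[List[float]]) -> List[List[float]]:
    """Average the wordpiece vectors aligned to each token."""
    dim = len(piece_vectors[0])
    pooled = []
    for indices in alignment:
        if not indices:
            # A token with no aligned wordpiece gets a zero vector.
            pooled.append([0.0] * dim)
            continue
        pooled.append(
            [sum(piece_vectors[i][d] for i in indices) / len(indices) for d in range(dim)]
        )
    return pooled


# "playing chess": two tokens, three hypothetical wordpieces ("play", "##ing", "chess").
token_spans = [(0, 7), (8, 13)]
piece_spans = [(0, 4), (4, 7), (8, 13)]
piece_vectors = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

alignment = align_by_offsets(token_spans, piece_spans)
token_vectors = pool_mean(alignment, piece_vectors)
```

Here the token "playing" is aligned to two wordpieces and receives their average, while "chess" maps to a single wordpiece and keeps its vector unchanged.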

Quick Start & Requirements

  • Install via pip: pip install 'spacy[transformers]'
  • Requirements: Python 3.6+, PyTorch v1.5+, spaCy v3.0+.
  • GPU installation requires specifying CUDA version: spacy[transformers,cudaXX] (e.g., spacy[transformers,cuda110]).
  • Documentation: https://spacy.io/usage/transformers

Highlighted Details

  • Enables multi-task learning by backpropagating from multiple pipeline components to a single transformer.
  • Integrates with spaCy v3's configuration system for training and customization.
  • Supports automatic alignment of transformer outputs to spaCy's tokenization.
  • Facilitates customization of saved transformer data and document processing length.
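
One way to control how much of a document the transformer sees at once is to split it into overlapping windows of tokens. The sketch below illustrates that idea in plain Python; the function name `strided_windows` and its parameters are hypothetical, and the logic is a simplified stand-in rather than the package's own span-getter code.

```python
from typing import List, Tuple


def strided_windows(n_tokens: int, window: int, stride: int) -> List[Tuple[int, int]]:
    """Return (start, end) token ranges of length `window`, advancing by `stride`.

    Ranges use an exclusive end. A stride smaller than the window produces
    overlapping ranges, so tokens near a boundary appear in two windows.
    """
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    spans = []
    start = 0
    while start < n_tokens:
        end = min(start + window, n_tokens)
        spans.append((start, end))
        if end == n_tokens:
            break  # the document is fully covered
        start += stride
    return spans


# A 10-token document, windows of 4 tokens, moving 3 tokens at a time.
windows = strided_windows(10, window=4, stride=3)
```

With these values, tokens 3 and 6 each fall into two neighbouring windows; a larger stride reduces that overlap at the cost of less shared context at window boundaries.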

Maintenance & Community

  • Issues and bug reports should be filed on spaCy's issue tracker.
  • Discussion threads can be opened on the spaCy discussion board.

Licensing & Compatibility

  • The package is distributed under the MIT License.
  • Compatible with spaCy v3.x.

Limitations & Caveats

The Transformer component itself does not directly support task-specific heads (e.g., for token or text classification). To use pre-trained classification models, the spacy-huggingface-pipelines package is recommended instead.

Health Check

  • Last commit: 2 months ago
  • Responsiveness: Inactive
  • Pull requests (30d): 1
  • Issues (30d): 0
  • Star history: 8 stars in the last 90 days
