text by pytorch

PyTorch library for NLP tasks

Created 8 years ago
3,559 stars

Top 13.7% on SourcePulse

Project Summary

TorchText provides PyTorch-native tools for natural language processing: datasets, data processing utilities, pre-trained models, and tokenizers. It aims to simplify NLP workflows for researchers and developers building with PyTorch. Development has stopped, and 0.18 is the final release.

How It Works

TorchText integrates with torchdata for efficient dataset loading and provides a modular architecture for text processing. It includes components for vocabulary management, text transformations (like tokenization and normalization), and pre-trained model integration, enabling streamlined NLP pipeline construction.
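
The pipeline described above (tokenize, build a vocabulary, map tokens to integer ids) can be sketched in plain Python. This is a conceptual illustration only, not TorchText's implementation: tokenize, build_vocab, numericalize, and the UNK constant are hypothetical names chosen for this example. In TorchText itself the corresponding entry points are torchtext.data.utils.get_tokenizer and torchtext.vocab.build_vocab_from_iterator.

```python
from collections import Counter
from typing import Dict, Iterable, List

UNK = "<unk>"  # hypothetical placeholder for out-of-vocabulary tokens


def tokenize(text: str) -> List[str]:
    """Lowercase and split on whitespace; a stand-in for a real tokenizer."""
    return text.lower().split()


def build_vocab(corpus: Iterable[str], min_freq: int = 1) -> Dict[str, int]:
    """Assign an integer id to every token seen at least min_freq times."""
    counts = Counter(tok for line in corpus for tok in tokenize(line))
    vocab = {UNK: 0}
    # most_common() orders by descending count; ties keep first-seen order.
    for tok, freq in counts.most_common():
        if freq >= min_freq:
            vocab[tok] = len(vocab)
    return vocab


def numericalize(text: str, vocab: Dict[str, int]) -> List[int]:
    """Map each token to its id, falling back to the UNK id."""
    return [vocab.get(tok, vocab[UNK]) for tok in tokenize(text)]


corpus = ["the cat sat", "the dog sat down"]
vocab = build_vocab(corpus)
ids = numericalize("the bird sat", vocab)
print(ids)  # "bird" is not in the corpus, so it maps to the UNK id
```

Reserving id 0 for unknown tokens is a design choice of this sketch. It keeps lookups total, so unseen words at inference time never raise a KeyError.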

Quick Start & Requirements

  • Install with pip (pip install torchtext) or conda (conda install -c pytorch torchtext).
  • Requires PyTorch (version compatibility table in README).
  • Optional, for the spaCy tokenizer: run pip install spacy, then python -m spacy download en_core_web_sm.
  • Documentation: https://pytorch.org/text/
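
Because each TorchText release targets a specific PyTorch release, it can help to check the installed pair before debugging import errors. The sketch below uses only the standard library. The ASSUMED_PAIRS table is an assumption written from memory for illustration and should be confirmed against the compatibility table in the README. installed_version, major_minor, and versions_match are hypothetical helper names.

```python
from importlib import metadata
from typing import Dict, Optional

# Assumed torchtext -> PyTorch pairings (major.minor only).
# Illustrative; verify against the README's compatibility table.
ASSUMED_PAIRS: Dict[str, str] = {"0.18": "2.3", "0.17": "2.2", "0.16": "2.1"}


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string, or None if not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def major_minor(version: str) -> str:
    """Reduce '2.3.0+cu121' to '2.3'."""
    return ".".join(version.split(".")[:2])


def versions_match(
    text_ver: str, torch_ver: str, pairs: Dict[str, str] = ASSUMED_PAIRS
) -> Optional[bool]:
    """True/False if the pairing is known, None if text_ver is not in the table."""
    expected = pairs.get(major_minor(text_ver))
    if expected is None:
        return None
    return major_minor(torch_ver) == expected


text_ver = installed_version("torchtext")
torch_ver = installed_version("torch")
if text_ver and torch_ver:
    print(text_ver, torch_ver, versions_match(text_ver, torch_ver))
else:
    print("torchtext and/or torch is not installed")
```

A None result means the table has no entry for that TorchText version, which is different from a confirmed mismatch.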

Highlighted Details

  • Supports numerous NLP datasets (e.g., WikiText, Multi30k, SQuAD).
  • Integrates pre-trained models like RoBERTa, XLM-RoBERTa, and T5 variants.
  • Offers several tokenizers: SentencePiece, GPT-2 BPE, CLIP, regex-based (RE2), and BERT.
  • Includes tutorials for common NLP tasks like text classification and translation.

Maintenance & Community

TorchText development has stopped, with the 0.18 release (April 2024) being the last stable version.

Licensing & Compatibility

TorchText is released under a BSD-3-Clause license, permitting commercial use and integration with closed-source projects.

Limitations & Caveats

Development has been discontinued, so no further updates or bug fixes are expected. PyTorch releases made after TorchText 0.18 may not be compatible with it, and newer NLP models and methods will not be added.

Health Check

  • Last commit: 1 week ago
  • Responsiveness: Inactive
  • Pull requests (30d): 2
  • Issues (30d): 1
  • Star history: 4 stars in the last 30 days
