text by pytorch

PyTorch library for NLP tasks

created 8 years ago
3,550 stars

Top 13.9% on sourcepulse

Project Summary

TorchText provides PyTorch-native tools for natural language processing: datasets, data processing utilities, pre-trained models, and tokenizers. It aims to simplify NLP workflows for researchers and developers building with PyTorch. Development ended with the 0.18 release.

How It Works

TorchText integrates with torchdata for efficient dataset loading and provides a modular architecture for text processing. It includes components for vocabulary management, text transformations (like tokenization and normalization), and pre-trained model integration, enabling streamlined NLP pipeline construction.
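To make the pipeline stages above concrete, here is a minimal sketch in plain Python of tokenization, vocabulary construction, and numericalization. It deliberately does not call TorchText; the names `simple_tokenize`, `build_vocab`, and `numericalize` are hypothetical and only illustrate the kind of work these components do, not the library's actual API.

```python
import re
from collections import Counter


def simple_tokenize(text):
    """Lowercase the text and keep alphanumeric runs (a stand-in for a real tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_vocab(corpus, specials=("<unk>",), min_freq=1):
    """Map each token to an integer id: specials first, then tokens by descending frequency."""
    counts = Counter(tok for line in corpus for tok in simple_tokenize(line))
    kept = [tok for tok, count in counts.items() if count >= min_freq]
    # Ties are broken alphabetically so the mapping is deterministic.
    ordered = sorted(kept, key=lambda tok: (-counts[tok], tok))
    return {tok: idx for idx, tok in enumerate(list(specials) + ordered)}


def numericalize(text, vocab, unk="<unk>"):
    """Convert text to a list of ids, sending unseen tokens to the unknown id."""
    return [vocab.get(tok, vocab[unk]) for tok in simple_tokenize(text)]


corpus = ["The cat sat.", "The dog sat down."]
vocab = build_vocab(corpus)
ids = numericalize("The bird sat", vocab)
print(ids)  # [2, 0, 1]
```

The word "bird" never appears in the corpus, so it maps to id 0, the `<unk>` entry. The resulting list of ids is the form that would typically be converted to a tensor before being fed to a model.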

Quick Start & Requirements

  • Install with pip (pip install torchtext) or conda (conda install -c pytorch torchtext).
  • Requires a compatible PyTorch version; see the version compatibility table in the README.
  • Optional, for the spaCy tokenizer: pip install spacy, then python -m spacy download en_core_web_sm.
  • Documentation: https://pytorch.org/text/
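After installing, it can help to confirm which versions of torch and torchtext ended up in the environment before consulting the compatibility table. The following is a small sketch using only the standard library; `installed_version` is a hypothetical helper name, and the package names are assumed to match the pip distributions named above.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string for a distribution, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Report both packages so the pair can be checked against the README table.
for name in ("torch", "torchtext"):
    print(f"{name}: {installed_version(name) or 'not installed'}")
```

Because the helper reads package metadata instead of importing the packages, it still reports versions when an incompatible pair would fail at import time.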

Highlighted Details

  • Supports numerous NLP datasets (e.g., WikiText, Multi30k, SQuAD).
  • Integrates pre-trained models like RoBERTa, XLM-RoBERTa, and T5 variants.
  • Offers various tokenizers: SentencePiece, GPT-2 BPE, CLIP, RE2, BERT.
  • Includes tutorials for common NLP tasks like text classification and translation.

Maintenance & Community

TorchText development has stopped, with the 0.18 release (April 2024) being the last stable version.

Licensing & Compatibility

TorchText is released under a BSD-3-Clause license, permitting commercial use and integration with closed-source projects.

Limitations & Caveats

Development has been discontinued, so no further updates or bug fixes are expected. Users should anticipate compatibility issues with future PyTorch versions and should not expect support for newer NLP methods.

Health Check

  • Last commit: 1 day ago
  • Responsiveness: 1 day
  • Pull requests (30d): 1
  • Issues (30d): 3

Star History

  • 16 stars in the last 90 days
