spaCy  by explosion

NLP library for production applications

created 11 years ago
32,073 stars

Top 1.1% on sourcepulse

Project Summary

spaCy is an industrial-strength Natural Language Processing (NLP) library for Python, designed for production use. It offers state-of-the-art speed and neural network models for tasks like tokenization, tagging, parsing, and named entity recognition. It supports tokenization and training for 70+ languages and ships pre-trained pipelines for a subset of them.

How It Works

spaCy is written in Python and Cython for performance and supports multi-task learning with pretrained transformers like BERT. Its pipeline architecture is modular: custom components can be added to a pipeline, and custom models can be written in PyTorch or TensorFlow. This approach prioritizes efficiency and ease of deployment in real-world applications.
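
The modular design described above can be illustrated with a small custom pipeline component. This is a minimal sketch, not taken from the spaCy README: the component name `count_alpha` and the extension attribute `n_alpha` are hypothetical names chosen for illustration, and a blank English pipeline is used so that no trained model needs to be downloaded.

```python
import spacy
from spacy.language import Language
from spacy.tokens import Doc

# Hypothetical custom attribute; "n_alpha" is our own name, not a spaCy built-in.
# force=True lets the snippet be re-run without a "already registered" error.
Doc.set_extension("n_alpha", default=0, force=True)


@Language.component("count_alpha")
def count_alpha(doc):
    """Store the number of alphabetic tokens on the Doc and pass it along."""
    doc._.n_alpha = sum(1 for token in doc if token.is_alpha)
    return doc


# A blank pipeline contains only the tokenizer, so no model download is needed.
nlp = spacy.blank("en")
nlp.add_pipe("count_alpha")

doc = nlp("spaCy parses 3 short sentences.")
print(nlp.pipe_names, doc._.n_alpha)
```

A component is just a callable that receives a `Doc` and returns it, which is why it can be placed anywhere in the pipeline alongside the built-in components.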

Quick Start & Requirements

  • Install: pip install spacy
  • Prerequisites: Python >=3.7, <=3.12 (64-bit). Optional: spacy[lookups] for lemmatization data. GPU support requires CUDA-compatible hardware.
  • Models: Download via python -m spacy download en_core_web_sm.
  • Documentation: https://spacy.io/usage/
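
Once the package and a model are installed, a first run might look like the sketch below. The fallback to a blank pipeline is our own assumption, added so the snippet still runs when `en_core_web_sm` has not been downloaded; in that case only tokenization is available and no entities are predicted.

```python
import spacy

try:
    # Trained pipeline installed via: python -m spacy download en_core_web_sm
    nlp = spacy.load("en_core_web_sm")
except OSError:
    # Assumption for this sketch: fall back to a tokenizer-only pipeline
    # when the trained model is not installed.
    nlp = spacy.blank("en")

doc = nlp("Apple is looking at buying U.K. startup for $1 billion")

tokens = [token.text for token in doc]
entities = [(ent.text, ent.label_) for ent in doc.ents]  # empty for a blank pipeline

print(tokens)
print(entities)
```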

Highlighted Details

  • Supports tokenization and training for 70+ languages, with pre-trained pipelines for a subset of them.
  • Features state-of-the-art speed and neural network models.
  • Includes a production-ready training system and easy model packaging.
  • Integrates with LLMs and offers visualizers for syntax and NER.
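
As an illustration of the NER visualizer mentioned above, the sketch below renders entity markup with displaCy. To avoid depending on a downloaded model, the entities are set by hand on a blank pipeline; the sentence and the labels are assumptions made for this example, not model predictions.

```python
import spacy
from spacy import displacy
from spacy.tokens import Span

nlp = spacy.blank("en")
doc = nlp("Apple is opening an office in Berlin")

# Hand-assigned entities for illustration only (token 0 = "Apple", token 6 = "Berlin").
doc.ents = [Span(doc, 0, 1, label="ORG"), Span(doc, 6, 7, label="GPE")]

# jupyter=False makes render() return the HTML markup as a string.
html = displacy.render(doc, style="ent", jupyter=False)
print(html[:200])
```

Passing `style="dep"` instead renders a dependency parse, which requires a pipeline that includes a parser.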

Maintenance & Community

Maintained by the spaCy team at Explosion. Community support is available via GitHub Discussions, Stack Overflow, and live streams. Introductory guide: https://spacy.io/usage/spacy-101

Licensing & Compatibility

Released under the MIT license, allowing for commercial use and integration into closed-source projects.

Limitations & Caveats

Users who update spaCy may need to retrain custom models to keep them compatible with the new version. The README also notes that some updates might require downloading new statistical models.
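
One way to spot a possible mismatch after an upgrade is to compare the installed spaCy version with the version range a pipeline declares in its metadata. The sketch below is a rough check under that assumption, using a blank pipeline as a stand-in for a trained or custom one; spaCy also provides the `python -m spacy validate` command for checking installed pipelines.

```python
import spacy

# Stand-in for a real pipeline; replace with spacy.load("your_pipeline") in practice.
nlp = spacy.blank("en")

installed = spacy.__version__
# Version range the pipeline says it was built for (None if not recorded).
declared = nlp.meta.get("spacy_version")

print(f"Installed spaCy: {installed}")
print(f"Pipeline expects: {declared}")
```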

Health Check

  • Last commit: 2 months ago
  • Responsiveness: 1 day
  • Pull requests (30d): 5
  • Issues (30d): 4
  • Star history: 653 stars in the last 90 days
