spaCy by explosion

NLP library for production applications

Created 11 years ago
32,489 stars

Top 1.0% on SourcePulse

Project Summary

spaCy is an industrial-strength Natural Language Processing (NLP) library for Python, designed for production use. It provides fast neural network models for tasks such as tokenization, tagging, parsing, and named entity recognition. It supports over 70 languages, with pre-trained pipelines available for a subset of them.

How It Works

spaCy is implemented in Cython for performance and incorporates recent research, including multi-task learning with pretrained transformers like BERT. Its pipeline architecture is modular: custom components can be added to a pipeline, and models written in PyTorch or TensorFlow can be integrated. The design prioritizes efficiency and ease of deployment in real-world applications.
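As a minimal sketch of the modular pipeline, the snippet below registers a custom component and adds it to a blank English pipeline, so no model download is needed. The component name `count_alpha` and the extension attribute `n_alpha` are hypothetical names chosen for illustration; they are not part of spaCy.

```python
import spacy
from spacy.language import Language
from spacy.tokens import Doc

# Hypothetical custom attribute, registered once on the Doc class.
if not Doc.has_extension("n_alpha"):
    Doc.set_extension("n_alpha", default=0)


@Language.component("count_alpha")  # hypothetical component name
def count_alpha(doc):
    """Store the number of alphabetic tokens on the doc and pass it on."""
    doc._.n_alpha = sum(1 for token in doc if token.is_alpha)
    return doc


# A blank pipeline contains only the tokenizer.
nlp = spacy.blank("en")
nlp.add_pipe("count_alpha")

doc = nlp("Hello world, it is 2024.")
print(nlp.pipe_names)
print(doc._.n_alpha)
```

A component is a callable that receives a `Doc` and returns it, which is what lets components be added, removed, or reordered independently.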

Quick Start & Requirements

  • Install: pip install spacy
  • Prerequisites: Python >=3.7, <=3.12 (64-bit only). Optional: install spacy[lookups] for lemmatization lookup data. GPU support requires CUDA-compatible hardware.
  • Models: Download via python -m spacy download en_core_web_sm.
  • Documentation: https://spacy.io/usage/
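After the steps above, a first script might look like the following sketch. It assumes `en_core_web_sm` has been downloaded. The fallback to a blank, tokenizer-only pipeline is an addition for illustration so the script still runs without the model; in that case no entities are predicted.

```python
import spacy

try:
    # Trained pipeline installed via: python -m spacy download en_core_web_sm
    nlp = spacy.load("en_core_web_sm")
except OSError:
    # Model not installed: fall back to a tokenizer-only English pipeline.
    nlp = spacy.blank("en")

doc = nlp("spaCy parses text quickly.")

tokens = [token.text for token in doc]
print("pipeline components:", nlp.pipe_names)
print("tokens:", tokens)

# Entities are only available when the pipeline includes an NER component.
print("entities:", [(ent.text, ent.label_) for ent in doc.ents])
```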

Highlighted Details

  • Supports 70+ languages, with pre-trained pipelines available for a subset of them.
  • Features state-of-the-art speed and neural network models.
  • Includes production-ready training system and easy model packaging.
  • Integrates with LLMs (through the spacy-llm package) and offers built-in visualizers (displaCy) for syntax and NER.
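The NER visualizer mentioned in the list can be tried without a trained model by setting an entity by hand. This is a sketch under that assumption: the `ORG` span below is assigned manually for illustration, not predicted by a pipeline.

```python
import spacy
from spacy import displacy
from spacy.tokens import Span

nlp = spacy.blank("en")
doc = nlp("Explosion maintains spaCy.")

# Manually mark the first token as an ORG entity (illustrative, not predicted).
doc.set_ents([Span(doc, 0, 1, label="ORG")])

# jupyter=False makes render() return the markup as a string.
html = displacy.render(doc, style="ent", jupyter=False)
print(html)
```

Passing `style="dep"` instead renders a dependency parse, which requires a pipeline with a parser.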

Maintenance & Community

Maintained by the spaCy team at Explosion. Community support is available through GitHub Discussions and Stack Overflow, and the team also hosts live streams. Introductory documentation: https://spacy.io/usage/spacy-101

Licensing & Compatibility

Released under the MIT license, allowing for commercial use and integration into closed-source projects.

Limitations & Caveats

Trained pipelines are tied to spaCy versions. After upgrading spaCy, users may need to retrain custom models to keep them compatible, and the README notes that some updates might require downloading new statistical models.
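One way to check this after an upgrade is to compare the installed spaCy version with the version range recorded in a pipeline's metadata. The sketch below uses a blank pipeline as a stand-in for a custom model, which is an assumption made so it runs without any download; for installed pipeline packages, the `python -m spacy validate` command performs a similar check.

```python
import spacy
from spacy.util import is_compatible_version

# Stand-in for a custom pipeline; in practice use spacy.load("<your pipeline>").
nlp = spacy.blank("en")

required = nlp.meta["spacy_version"]  # version range stored with the pipeline
installed = spacy.__version__

compatible = is_compatible_version(installed, required)
print(f"installed spaCy {installed}, pipeline expects {required}: {compatible}")
```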

Health Check

  • Last commit: 3 months ago
  • Responsiveness: 1 day
  • Pull requests (30d): 3
  • Issues (30d): 6
  • Star history: 329 stars in the last 30 days
