nlu by JohnSnowLabs

Python SDK for NLP tasks, built on Spark NLP

Created 5 years ago
938 stars

Top 39.0% on SourcePulse

Project Summary

NLU (Natural Language Understanding) is a Python library that simplifies applying state-of-the-art NLP models. It acts as a facade for Spark NLP, offering over 1000 pre-trained models across 200+ languages, each loadable with a single line of code. It targets data scientists and developers who need fast, accurate, and scalable text analysis.

How It Works

NLU builds on Spark NLP, a distributed NLP library that runs on Apache Spark ML. Its unified API hides the details of Spark NLP, so users can load a model and apply it directly to data structures such as Pandas DataFrames, Spark DataFrames, and NumPy arrays. Because the work runs on Spark, this approach handles large datasets efficiently and fits into existing data science workflows.

Quick Start & Requirements
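
The "single line of code" usage described above can be sketched roughly as follows. This is a minimal sketch, not the project's official quick start: it assumes the `nlu` package and a compatible `pyspark` are already installed, it relies on the library's `nlu.load(...)` and `.predict(...)` entry points, and the wrapper name `predict_with_nlu` and the default model reference `"sentiment"` are illustrative choices made here.

```python
from typing import Any


def predict_with_nlu(data: Any, model_ref: str = "sentiment") -> Any:
    """Load a pre-trained pipeline by reference and apply it to `data`.

    `data` may be a single string, a list of strings, or a DataFrame,
    following the input types listed in the summary. `model_ref` is an
    assumed example reference, not a recommendation.
    """
    # Imported lazily so this module can be imported on machines where
    # Spark and NLU are not installed.
    import nlu

    # Resolve the reference to a pipeline (the model is fetched on first use).
    pipeline = nlu.load(model_ref)

    # Run the pipeline and hand back whatever NLU returns for this input.
    return pipeline.predict(data)
```

Calling `predict_with_nlu("I love NLU!")` would run the assumed sentiment pipeline on one sentence; passing a list of strings or a DataFrame follows the same path. The first call is typically slow because a Spark session has to start and the model has to be downloaded.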

Highlighted Details

  • Supports 1000+ pre-trained models in 200+ languages for tasks like sentiment analysis, NER, POS tagging, and translation.
  • Offers a wide range of word and sentence embeddings, including BERT, ELMo, ALBERT, and XLNet.
  • Provides utilities for text cleaning, normalization, and matching.
  • Integrates tightly with Streamlit for interactive model exploration and web app development.

Maintenance & Community

  • Active community of more than 2,000 AI enthusiasts on Slack.
  • A discussion forum is available for in-depth conversations.
  • Resources include Medium articles, YouTube tutorials, and GitHub issues for bug reporting.

Licensing & Compatibility

  • The README does not explicitly state a license. The underlying Spark NLP library is licensed under Apache 2.0, which suggests, but does not confirm, that NLU is compatible with commercial use.

Limitations & Caveats

  • The README mentions pyspark==3.0.2 as a requirement, which might limit compatibility with newer Spark versions.
  • Some advanced features or specific models might require additional setup or dependencies not detailed in the README.
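
One way to see whether the pyspark pin mentioned above applies to a given environment is to compare the installed version against it before loading any models. The following is a small standard-library sketch; the helper names (`installed_version`, `pin_status`, `describe_pyspark_pin`) are hypothetical, and the pinned value is taken from this summary rather than verified against the current README.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional

# Version named in the summary above; treat it as an assumption to re-check.
PINNED_PYSPARK = "3.0.2"


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def pin_status(installed: Optional[str], pinned: str = PINNED_PYSPARK) -> str:
    """Classify an installed version as 'missing', 'match', or 'mismatch'."""
    if installed is None:
        return "missing"
    return "match" if installed == pinned else "mismatch"


def describe_pyspark_pin() -> str:
    """Build a one-line report about the local pyspark installation."""
    found = installed_version("pyspark")
    status = pin_status(found)
    if status == "missing":
        return f"pyspark is not installed (summary names {PINNED_PYSPARK})"
    if status == "match":
        return f"pyspark {found} matches the version named in the summary"
    return f"pyspark {found} differs from {PINNED_PYSPARK}; compatibility is untested"
```

A mismatch does not prove that NLU will fail; it only flags that the environment differs from the version the README is reported to name.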

Health Check

  • Last commit: 7 months ago
  • Responsiveness: 1 week
  • Pull requests (30d): 0
  • Issues (30d): 0

Star History

  • 4 stars in the last 30 days
