nlu by JohnSnowLabs

Python SDK for NLP tasks, built on Spark NLP

created 5 years ago
933 stars

Top 40.0% on sourcepulse

Project Summary

NLU (Natural Language Understanding) is a Python library that simplifies applying state-of-the-art NLP models. It acts as a facade over Spark NLP and exposes over 1000 pre-trained models across 200+ languages, each loadable with a single line of code. It is aimed at data scientists and developers who want fast, scalable text analysis without assembling Spark NLP pipelines by hand.

How It Works

NLU is built on Spark NLP, a distributed NLP library that runs on Apache Spark ML. It provides a unified API that hides Spark NLP's complexity, so users can load a model and apply it directly to Pandas DataFrames, Spark DataFrames, NumPy arrays, and similar data structures. Because execution happens on Spark, large datasets can be processed efficiently, and the library fits into existing data science workflows.
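
To make the facade idea concrete, here is a toy sketch in plain Python. It is not NLU's implementation: the registry, the keyword-based "sentiment" rule, and the `ToyPipeline` class are hypothetical stand-ins for real Spark NLP pipelines, chosen so the sketch runs without Spark or Java. Only the shape of the call, loading a model by a short reference and then calling `predict` on the input, mirrors NLU's documented usage, e.g. `nlu.load('sentiment').predict(...)`.

```python
from typing import Callable, Dict, List, Union

# Hypothetical stand-ins for real models. In NLU these would be Spark NLP
# pipelines; here they are plain functions so the sketch runs anywhere.
def _toy_tokenize(text: str) -> List[str]:
    return text.split()

_POSITIVE_WORDS = {"good", "great", "love", "awesome"}

def _toy_sentiment(text: str) -> str:
    # Naive keyword rule, not a trained model.
    words = {w.strip(".,!?").lower() for w in text.split()}
    return "positive" if words & _POSITIVE_WORDS else "negative"

# Maps a short reference string to a model, mimicking "one line to load".
_REGISTRY: Dict[str, Callable[[str], object]] = {
    "tokenize": _toy_tokenize,
    "sentiment": _toy_sentiment,
}

class ToyPipeline:
    """Wraps one model behind a single predict() entry point."""

    def __init__(self, model: Callable[[str], object]) -> None:
        self._model = model

    def predict(self, data: Union[str, List[str]]) -> List[object]:
        # Normalize the input type so callers need not care about it.
        texts = [data] if isinstance(data, str) else list(data)
        return [self._model(text) for text in texts]

def load(reference: str) -> ToyPipeline:
    """Look up a model by reference; raise ValueError for unknown names."""
    if reference not in _REGISTRY:
        raise ValueError(f"unknown model reference: {reference!r}")
    return ToyPipeline(_REGISTRY[reference])

if __name__ == "__main__":
    print(load("sentiment").predict(["I love this!", "meh"]))
```

Running the script prints `['positive', 'negative']`. The design point is that the caller never touches the model objects or input conversion directly; NLU applies the same idea to far richer inputs (Pandas and Spark DataFrames, NumPy arrays) and to real Spark NLP pipelines.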

Quick Start & Requirements

Highlighted Details

  • Supports 1000+ pre-trained models in 200+ languages for tasks like sentiment analysis, NER, POS tagging, and translation.
  • Offers a wide range of word and sentence embeddings, including BERT, ELMo, ALBERT, and XLNet.
  • Provides utilities for text cleaning, normalization, and matching.
  • Integrates tightly with Streamlit for interactive model exploration and web app development.

Maintenance & Community

  • Active Slack community of 2000+ AI enthusiasts.
  • A discussion forum is available for in-depth questions.
  • Resources include Medium articles, YouTube tutorials, and GitHub issues for bug reporting.

Licensing & Compatibility

  • The README does not explicitly state a license. The underlying Spark NLP library is Apache 2.0 licensed, which suggests, but does not confirm, that NLU can be used commercially; check the repository's own license before relying on this.

Limitations & Caveats

  • The README lists pyspark==3.0.2 as a requirement; this pin may limit compatibility with newer Spark versions.
  • Some advanced features or specific models might require additional setup or dependencies not detailed in the README.
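
Given the pyspark pin, it can help to check the local environment before installing NLU. The sketch below uses only the standard library (`importlib.metadata`, Python 3.8+). The pin value is copied from the README requirement summarized above, and treating any release with the same major.minor version as a match is an assumption of this sketch, not a compatibility rule published by NLU or Spark NLP.

```python
from importlib.metadata import PackageNotFoundError, version

# Pin taken from the README requirement summarized above; confirm it
# against the current README before relying on it.
README_PYSPARK_PIN = "3.0.2"

def matches_pin(installed: str, pinned: str = README_PYSPARK_PIN) -> bool:
    """Return True if `installed` shares major.minor with `pinned`.

    Comparing only major.minor is a heuristic assumption (patch releases
    are usually compatible), not an official compatibility rule.
    """
    return installed.split(".")[:2] == pinned.split(".")[:2]

def check_pyspark() -> str:
    """Describe how the installed pyspark relates to the README pin."""
    try:
        installed = version("pyspark")
    except PackageNotFoundError:
        return "pyspark is not installed"
    if matches_pin(installed):
        return f"pyspark {installed} matches the README pin {README_PYSPARK_PIN}"
    return f"pyspark {installed} differs from the README pin {README_PYSPARK_PIN}"

if __name__ == "__main__":
    print(check_pyspark())
```

If the check reports a mismatch, installing the pinned version in a separate virtual environment avoids disturbing other Spark projects.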

Health Check

  • Last commit: 6 months ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 23 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems) and Jeff Hammerbacher (Cofounder of Cloudera).

spark-nlp by JohnSnowLabs

NLP library for scalable ML pipelines

  • Top 0.1% on sourcepulse, 4k stars
  • Created 7 years ago, updated 1 day ago