HanLP by hankcs

Multilingual NLP library for research/industry, built on PyTorch and TensorFlow

created 10 years ago
35,443 stars

Top 0.9% on sourcepulse

Project Summary

HanLP is a multilingual NLP library for researchers and enterprises. It applies deep learning to tasks such as tokenization, part-of-speech (POS) tagging, named entity recognition (NER), and dependency parsing across 130 languages. It provides lightweight RESTful APIs for agile development and native Python APIs for deeper integration, with the stated aim of combining state-of-the-art accuracy with efficiency and ease of use.

How It Works

HanLP runs on PyTorch and TensorFlow 2.x and builds on open-access corpora such as Universal Dependencies and OntoNotes. It supports multi-task learning (MTL), in which a single model performs several tasks jointly, and it also offers mono-lingual models that often outperform multilingual ones for specific languages. The library emphasizes reproducibility, guaranteeing that reported scores can be replicated.

Quick Start & Requirements

  • RESTful API: pip install hanlp_restful
  • Native Python API: pip install hanlp (Requires Python 3.6+)
  • Hardware: GPU/TPU acceleration recommended but not mandatory.
  • Documentation: docs
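
The RESTful route above can be sketched roughly as follows. This is a minimal illustration, not the project's official quick start: it assumes the `HanLPClient` class from `hanlp_restful`, the public endpoint `https://www.hanlp.com/api`, and the `language` codes `"zh"` and `"mul"` as given in the upstream README, and it assumes anonymous access (`auth=None`) is permitted. The helper names `make_client` and `parse_text` are hypothetical.

```python
def make_client(url="https://www.hanlp.com/api", auth=None, language="mul"):
    """Build a RESTful client; needs `pip install hanlp_restful`.

    The default URL and language code are assumptions taken from the
    upstream README; check the official docs before relying on them.
    """
    # Imported lazily so this module loads even if the package is absent.
    from hanlp_restful import HanLPClient
    return HanLPClient(url, auth=auth, language=language)


def parse_text(client, text, tasks=None):
    """Send `text` to the server and return whatever the client returns.

    `tasks` optionally restricts the annotations requested, for example
    "tok" for tokenization only (task names are server-defined).
    """
    if not text or not text.strip():
        raise ValueError("text must be a non-empty string")
    # Any object exposing a compatible `parse` method works here, which
    # keeps the helper easy to exercise without network access.
    return client.parse(text, tasks=tasks)
```

A call such as `parse_text(make_client(), "HanLP supports many languages.", tasks="tok")` would contact the remote server, so it needs network access and may be rate limited for anonymous users.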

Highlighted Details

  • Supports 10 joint NLP tasks across 130 languages.
  • Offers both multilingual models and mono-lingual models, the latter often more accurate for their target language.
  • Guarantees reproducible performance scores.
  • Includes functionality to train custom models.

Maintenance & Community

  • Active development with a focus on reproducibility.
  • Community forum available.

Licensing & Compatibility

  • Library licensed under Apache License 2.0, allowing commercial use.
  • Models are licensed under CC BY-NC-SA 4.0, restricting commercial use.

Limitations & Caveats

Multi-task learning models may underperform single-task models, and mono-lingual models generally outperform multilingual ones. Users targeting high accuracy should prioritize single-task mono-lingual models.
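
To make the trade-off above concrete, the sketch below loads either a multi-task model or a single-task tokenizer through the native Python API. It assumes `hanlp.load` and the two pretrained identifiers listed in the upstream README (both Chinese models); the `MODEL_CHOICES` table and the `load_model` helper are hypothetical names introduced only for illustration.

```python
# Attribute names under `hanlp.pretrained`, as listed in the upstream
# README; treat them as assumptions and verify against the current docs.
MODEL_CHOICES = {
    # One model that performs many tasks jointly (multi-task learning).
    "mtl": ("mtl", "CLOSE_TOK_POS_NER_SRL_DEP_SDP_CON_ELECTRA_SMALL_ZH"),
    # A single-task, mono-lingual tokenizer.
    "single_tok": ("tok", "COARSE_ELECTRA_SMALL_ZH"),
}


def load_model(kind):
    """Load a pretrained HanLP component; needs `pip install hanlp`."""
    # Unknown kinds fail here with KeyError, before any import or download.
    module_name, constant = MODEL_CHOICES[kind]
    # Imported lazily so this module loads even if hanlp is not installed.
    import hanlp
    pretrained_module = getattr(hanlp.pretrained, module_name)
    return hanlp.load(getattr(pretrained_module, constant))
```

For example, `tok = load_model("single_tok")` followed by `tok("...")` would run only the tokenizer, while `load_model("mtl")` returns a pipeline covering all of its tasks. The first call downloads model weights, and the models remain subject to the CC BY-NC-SA 4.0 terms noted above.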

Health Check

  • Last commit: 2 months ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 5
  • Star history: 571 stars in the last 90 days
