pororo by kakaobrain

NLP SDK for natural language and speech processing tasks

created 4 years ago
1,303 stars

Top 31.3% on sourcepulse

Project Summary

PORORO is a Natural Language Processing (NLP) and speech processing toolkit designed for ease of use. It gives researchers and developers a unified interface to a wide range of pre-trained models for language and speech tasks, simplifying complex NLP pipelines.

How It Works

PORORO abstracts away the complexities of individual NLP model implementations by offering a task-centric API. Users instantiate a Pororo object with a specified task (e.g., "ner", "mt", "asr") and language, and then pass input data directly to the object. This approach allows for quick experimentation and integration of diverse NLP capabilities without needing to manage individual model dependencies or architectures.
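The call pattern described above can be sketched roughly as follows. The keyword names `task` and `lang` are assumed from the project's README, and `run_task` and `fake_factory` are hypothetical helpers introduced here so the sketch runs without installing pororo or downloading a model.

```python
from typing import Any, Callable, Optional


def run_task(task: str, lang: str, data: Any,
             factory: Optional[Callable[..., Callable[[Any], Any]]] = None) -> Any:
    """Create a task object for (task, lang) and apply it to the input."""
    if factory is None:
        # Imported lazily so the sketch loads even when pororo is not installed.
        from pororo import Pororo
        factory = Pororo
    pipeline = factory(task=task, lang=lang)  # one object per task/language pair
    return pipeline(data)                     # the object itself is callable


def fake_factory(task: str, lang: str) -> Callable[[str], str]:
    """Hypothetical stand-in for Pororo; it only echoes its configuration."""
    return lambda text: f"{task}/{lang}: {text}"


if __name__ == "__main__":
    print(run_task("ner", "en", "Kakao Brain is based in Korea.", factory=fake_factory))
```

With pororo installed, leaving out `factory` would build the real object instead of the stand-in; the shape of the call (construct with a task and language, then call the object on the input) stays the same.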

Quick Start & Requirements

  • Install via pip: pip install pororo
  • Requires PyTorch 1.6 (CUDA 10.1) and Python >= 3.6.
  • Automatic Speech Recognition (ASR) and Speech Synthesis (TTS) require separate installations via asr-install.sh and tts-install.sh respectively.
  • Official documentation: https://github.com/kakaobrain/pororo
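Before installing, it can help to check the local environment against the requirements listed above. The following is a minimal sketch, not part of pororo: the helper names are hypothetical, and it assumes "PyTorch 1.6" means an exact major.minor match, which the source does not state explicitly.

```python
import sys
from typing import List, Optional, Tuple


def version_tuple(text: str) -> Tuple[int, int]:
    """Return (major, minor) from strings such as '1.6.0+cu101'."""
    major, minor = text.split("+")[0].split(".")[:2]
    return int(major), int(minor)


def check_environment(python_version: Tuple[int, int],
                      torch_version: Optional[str]) -> List[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if python_version < (3, 6):
        problems.append("Python >= 3.6 is required")
    if torch_version is None:
        problems.append("PyTorch is not installed")
    elif version_tuple(torch_version) != (1, 6):
        # Assumption: only the 1.6 series counts as matching.
        problems.append(f"PyTorch 1.6 expected, found {torch_version}")
    return problems


def installed_torch_version() -> Optional[str]:
    """Return torch.__version__, or None when torch cannot be imported."""
    try:
        import torch
    except ImportError:
        return None
    return torch.__version__


if __name__ == "__main__":
    issues = check_environment(sys.version_info[:2], installed_torch_version())
    print("\n".join(issues) if issues else "Environment matches the stated requirements.")
```

The check is kept as a pure function over version values so it can be exercised without PyTorch present; only `installed_torch_version` touches the real environment.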

Highlighted Details

  • Supports over 30 NLP and speech tasks, including NER, Machine Translation, Sentiment Analysis, and ASR.
  • Offers multi-language support for many tasks (e.g., English, Korean, Japanese, Chinese).
  • Allows selection of specific models for tasks that have multiple implementations (e.g., different MT models).
  • Provides access to word embeddings, POS tagging, and constituency parsing.

Maintenance & Community

  • Developed by Kakao Brain.
  • Key contributors include Hoon Heo, Hyunwoong Ko, Soohwan Kim, Gunsoo Han, Jiwoo Park, and Kyubyong Park.
  • Issue reporting available via GitHub issues.

Licensing & Compatibility

  • Licensed under the Apache License 2.0.
  • Permissive license suitable for commercial use and integration into closed-source projects.

Limitations & Caveats

The library pins PyTorch 1.6 with CUDA 10.1, an old combination that may not work with newer hardware or CUDA versions. Speech-related tasks also need separate installation scripts, which adds to the setup complexity.

Health Check

  • Last commit: 3 years ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 9 stars in the last 90 days
