PHP library for Chinese text segmentation
Top 29.5% on SourcePulse
This PHP library segments Chinese text into words (word breaking) for developers who need to process Chinese text for analysis, search, or other NLP tasks. It offers multiple segmentation modes, supports Traditional Chinese and custom dictionaries, and provides TF-IDF keyword extraction as well as part-of-speech (POS) tagging.
How It Works
The library combines three standard techniques for word segmentation: a Trie for efficient word-graph scanning, dynamic programming to find the maximum-probability path based on word frequency, and a Hidden Markov Model (HMM) decoded with the Viterbi algorithm to handle words that are missing from the dictionary. The dictionary-based steps cover known words quickly, while the HMM covers the out-of-vocabulary cases.
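The word-graph and dynamic-programming steps can be sketched in a few lines of PHP. This is a toy illustration under stated assumptions, not jieba-php's own code: a plain associative array stands in for the Trie, and the `$freq` counts and the `segmentMaxProb` function name are invented for the example.

```php
<?php
// Toy word-frequency dictionary (hypothetical counts, not jieba-php's real data).
$freq = [
    '研究' => 50, '研究生' => 20, '生命' => 40, '起源' => 30, '的' => 100,
    '研' => 1, '究' => 1, '生' => 5, '命' => 5, '起' => 2, '源' => 2,
];

/**
 * Segment $text by picking the path through the word graph with the
 * highest total log-probability (hypothetical helper for illustration).
 */
function segmentMaxProb(string $text, array $freq): array
{
    // Split into UTF-8 characters without relying on the mbstring extension.
    $chars = preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY);
    $n = count($chars);
    $total = array_sum($freq);

    // Word graph: $dag[$i] lists every end index $j such that
    // chars[$i..$j] is a dictionary word. A Trie would prune this scan early.
    $dag = [];
    for ($i = 0; $i < $n; $i++) {
        $dag[$i] = [];
        $word = '';
        for ($j = $i; $j < $n; $j++) {
            $word .= $chars[$j];
            if (isset($freq[$word])) {
                $dag[$i][] = $j;
            }
        }
        if (!$dag[$i]) {
            $dag[$i][] = $i; // unknown character: fall back to a single-char word
        }
    }

    // Dynamic programming from right to left:
    // $route[$i] = [best log-probability from $i to the end, end index of the word chosen at $i].
    $route = [$n => [0.0, $n]];
    for ($i = $n - 1; $i >= 0; $i--) {
        $best = null;
        foreach ($dag[$i] as $j) {
            $word = implode('', array_slice($chars, $i, $j - $i + 1));
            $score = log(($freq[$word] ?? 1) / $total) + $route[$j + 1][0];
            if ($best === null || $score > $best[0]) {
                $best = [$score, $j];
            }
        }
        $route[$i] = $best;
    }

    // Follow the stored choices from left to right to emit the words.
    $words = [];
    for ($i = 0; $i < $n; $i = $j + 1) {
        $j = $route[$i][1];
        $words[] = implode('', array_slice($chars, $i, $j - $i + 1));
    }
    return $words;
}

echo implode(' / ', segmentMaxProb('研究生命的起源', $freq)), PHP_EOL;
// 研究 / 生命 / 的 / 起源
```

With these toy counts, 研究 + 生命 outscores 研究生 + 命 because the product of its word probabilities is higher, which is the frequency-based disambiguation the paragraph describes.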
Quick Start & Requirements
Install with Composer: composer require fukuball/jieba-php
Include the Composer autoloader (require_once "/path/to/your/vendor/autoload.php";) and initialize classes like Jieba::init().
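Put together, the install and initialization steps might look like the script below. It is a minimal sketch that assumes the package was installed with Composer in the script's own directory. `Jieba::init()`, `Finalseg::init()`, and `Jieba::cut()` are the library's entry points, while the `segmentWithJieba` wrapper, the `vendor/autoload.php` location, and the memory limit value are assumptions made for the example.

```php
<?php
use Fukuball\Jieba\Jieba;
use Fukuball\Jieba\Finalseg;

/**
 * Hypothetical convenience wrapper (not part of jieba-php):
 * initializes the library once, then segments $text with the default mode.
 */
function segmentWithJieba(string $text): array
{
    static $ready = false;
    if (!$ready) {
        // The dictionary is loaded into memory; the exact limit is an assumption.
        ini_set('memory_limit', '1024M');
        Jieba::init();    // load the main dictionary
        Finalseg::init(); // load the HMM used for out-of-dictionary words
        $ready = true;
    }
    return Jieba::cut($text);
}

// Assumed install location: vendor/ next to this script.
$autoload = __DIR__ . '/vendor/autoload.php';
if (is_file($autoload)) {
    require_once $autoload;
    var_dump(segmentWithJieba('怜香惜玉也得要看对象啊！'));
} else {
    echo "vendor/autoload.php not found; run `composer require fukuball/jieba-php` first.", PHP_EOL;
}
```

Initializing once and reusing the loaded dictionary matters in long-running scripts, since loading is the expensive part.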
Highlighted Details
Maintenance & Community
The project is actively maintained by fukuball. Further community engagement details are not explicitly provided in the README.
Licensing & Compatibility
The software is released under the MIT License, allowing for commercial use and integration into closed-source projects.
Limitations & Caveats
The README notes that LLM-based segmentation may yield better results and positions the library as a fast, cost-effective alternative. It also mentions that some words missing from the dictionary may still be recognized through the Viterbi algorithm, which implies that results for out-of-dictionary words can vary.
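How Viterbi decoding can recover a word that is not in any dictionary is easiest to see on a toy model. The sketch below assumes the B/M/E/S (begin, middle, end, single) character-tagging scheme commonly used for HMM-based Chinese segmentation. Every probability, the `viterbiSegment` name, and the sample text are invented for illustration and are not taken from jieba-php.

```php
<?php
const MIN_LOG = -1.0e100; // stands in for log(0), i.e. an impossible event

// Toy HMM parameters as log-probabilities (hypothetical values).
$start = ['B' => log(0.6), 'S' => log(0.4)]; // a sentence starts a word or is a single-char word
$trans = [
    'B' => ['M' => log(0.3), 'E' => log(0.7)],
    'M' => ['M' => log(0.4), 'E' => log(0.6)],
    'E' => ['B' => log(0.5), 'S' => log(0.5)],
    'S' => ['B' => log(0.5), 'S' => log(0.5)],
];
$emit = [
    'B' => ['小' => log(0.5)],
    'M' => [],
    'E' => ['明' => log(0.5)],
    'S' => ['的' => log(0.6)],
];
$unseen = log(0.01); // emission score for any (state, character) pair not listed above

function viterbiSegment(string $text, array $start, array $trans, array $emit, float $unseen): array
{
    $chars = preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY);
    $n = count($chars);
    if ($n === 0) {
        return [];
    }
    $states = ['B', 'M', 'E', 'S'];

    // $score[$t][$s]: best log-probability of any tag sequence ending in state $s at position $t.
    $score = [];
    $back = [];
    foreach ($states as $s) {
        $score[0][$s] = ($start[$s] ?? MIN_LOG) + ($emit[$s][$chars[0]] ?? $unseen);
    }
    for ($t = 1; $t < $n; $t++) {
        foreach ($states as $s) {
            $best = null;
            $bestPrev = null;
            foreach ($states as $p) {
                if (!isset($trans[$p][$s])) {
                    continue; // transition not allowed, e.g. B -> B
                }
                $cand = $score[$t - 1][$p] + $trans[$p][$s];
                if ($best === null || $cand > $best) {
                    $best = $cand;
                    $bestPrev = $p;
                }
            }
            $score[$t][$s] = $best + ($emit[$s][$chars[$t]] ?? $unseen);
            $back[$t][$s] = $bestPrev;
        }
    }

    // A valid segmentation must finish on E or S; then walk the back-pointers.
    $state = $score[$n - 1]['E'] >= $score[$n - 1]['S'] ? 'E' : 'S';
    $tags = [$state];
    for ($t = $n - 1; $t > 0; $t--) {
        $state = $back[$t][$state];
        array_unshift($tags, $state);
    }

    // Close a word whenever the tag is E or S.
    $words = [];
    $current = '';
    foreach ($chars as $i => $ch) {
        $current .= $ch;
        if ($tags[$i] === 'E' || $tags[$i] === 'S') {
            $words[] = $current;
            $current = '';
        }
    }
    return $words;
}

echo implode(' / ', viterbiSegment('小明的', $start, $trans, $emit, $unseen)), PHP_EOL;
// 小明 / 的
```

No dictionary lists 小明 here; the word emerges only because the tag sequence B, E, S is the most probable one under the toy model. With different probabilities the same text could split differently, which is the kind of variability the caveat points to.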