Synonyms by chatopera

NLP tools for chatbot-like applications

created 7 years ago
5,098 stars

Top 10.0% on sourcepulse

Project Summary

This repository provides a Chinese synonym toolkit for natural language processing tasks such as RAG, intelligent Q&A, and semantic similarity. It offers APIs for finding synonyms, comparing sentence similarity, and extracting keywords, targeting developers and researchers working with Chinese NLP.

How It Works

The toolkit uses word2vec models trained on a large corpus to compute semantic similarity. It provides functions to find the nearest synonyms of a word, compare two sentences by the similarity of their word embeddings, and extract keywords ranked by semantic importance. The goal is fast, accurate synonym lookup and semantic matching for Chinese text.
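
To make the embedding-based comparison concrete, here is a minimal sketch of cosine similarity over word vectors, with a sentence represented as the average of its word vectors. It is an illustration of the general technique, not the library's actual implementation: the three-dimensional vectors, the English stand-in tokens, and the helper names `cosine`, `nearest`, and `sentence_vector` are all made up for this example.

```python
import math

# Toy 3-dimensional vectors (hypothetical values); real word2vec
# vectors have far more dimensions and come from a trained model.
TOY_VECTORS = {
    "cat": [0.9, 0.1, 0.0],
    "kitten": [0.8, 0.2, 0.1],
    "car": [0.0, 0.2, 0.9],
}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest(word, vectors, k=2):
    """Return up to k (word, score) pairs most similar to `word`, best first."""
    target = vectors[word]
    scored = [
        (other, cosine(target, vec))
        for other, vec in vectors.items()
        if other != word
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


def sentence_vector(tokens, vectors):
    """Average the vectors of known tokens; return None if no token is known."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return None
    dims = len(known[0])
    return [sum(vec[i] for vec in known) / len(known) for i in range(dims)]


if __name__ == "__main__":
    print(nearest("cat", TOY_VECTORS))
    a = sentence_vector(["cat", "kitten"], TOY_VECTORS)
    b = sentence_vector(["car"], TOY_VECTORS)
    print(cosine(a, b))
```

Averaging is the simplest way to turn word vectors into a sentence vector; it ignores word order, which is one reason real toolkits often combine it with other signals.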

Quick Start & Requirements

  • Install: pip install -U synonyms
  • Requirements: Model package downloads require a license ID from Chatopera License Store. Set the SYNONYMS_DL_LICENSE environment variable to that ID.
  • Usage: After installation and license configuration, model files are downloaded on first use.
  • Documentation: https://github.com/chatopera/Synonyms
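
Because the model download depends on the license variable, it can help to fail early with a clear message when it is missing. The sketch below only checks the SYNONYMS_DL_LICENSE variable named above; the helper `require_license` is hypothetical and is not part of the package.

```python
import os

# Variable name taken from the requirements above.
LICENSE_ENV_VAR = "SYNONYMS_DL_LICENSE"


def require_license(env=None):
    """Return the configured license ID, or raise if it is missing or blank.

    `env` defaults to os.environ; passing a plain dict makes the check
    easy to exercise without touching the real environment.
    """
    env = os.environ if env is None else env
    license_id = env.get(LICENSE_ENV_VAR, "").strip()
    if not license_id:
        raise RuntimeError(
            f"{LICENSE_ENV_VAR} is not set; model files cannot be downloaded."
        )
    return license_id


# Intended use (assumes the package is installed and a license was purchased):
#
#     require_license()
#     import synonyms
```

Running the check before the import keeps a missing license from surfacing later as a less obvious download error.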

Highlighted Details

  • Vocabulary size of 435,729 words in the word vector model.
  • Offers APIs for nearby, compare, seg (segmentation), keywords, and retrieving word vectors.
  • Supports configuration via environment variables for word segmentation dictionaries and word2vec models.
  • Includes a demo script (demo.py) for usage examples.
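
The entry points listed above can be exercised together along the following lines. This is a hedged sketch: the function names come from the list, but the call signatures and return shapes used here (for example, `nearby` returning a pair of word and score lists) are assumptions that should be checked against the project's documentation. The `explore` helper and the stub object are invented for illustration so the example runs without the package or a license.

```python
from types import SimpleNamespace


def explore(api, word, sentence_a, sentence_b):
    """Call nearby, compare, seg, and keywords on `api` and collect the results.

    `api` is any object exposing those four callables; the return shapes
    assumed here are noted in the comments below.
    """
    words, scores = api.nearby(word)  # assumed: ([words], [scores])
    return {
        "nearby": list(zip(words, scores)),
        "similarity": api.compare(sentence_a, sentence_b),  # assumed: float
        "tokens": api.seg(sentence_a),
        "keywords": api.keywords(sentence_a),
    }


# Stand-in for the real module, with canned answers (hypothetical values).
stub = SimpleNamespace(
    nearby=lambda word: (["kitten", "pet"], [0.9, 0.7]),
    compare=lambda a, b: 0.5,
    seg=lambda sentence: sentence.split(),
    keywords=lambda sentence: sentence.split()[:1],
)

if __name__ == "__main__":
    print(explore(stub, "cat", "cats like fish", "dogs like bones"))
```

With the package installed and licensed, the same helper could be called as `explore(synonyms, ...)` after `import synonyms`, provided the assumed return shapes hold.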

Maintenance & Community

  • The project is maintained by Chatopera Inc.
  • Sponsors include Chatopera Cloud Services.
  • The project has a GitHub repository for contributions and issue tracking.

Licensing & Compatibility

  • License: Chunsong Public License, version 1.0.
  • Commercial Use: Downloading the machine learning model packages requires a paid license from Chatopera License Store. Previous contributors may contact the maintainers to discuss fees.

Limitations & Caveats

  • A paid license is required for model package downloads, which are necessary for the toolkit's functionality.
  • The Chinese segmentation function (synonyms.seg) does not remove stop words or punctuation.
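
Since segmentation leaves stop words and punctuation in place, callers who need clean tokens have to filter them afterwards. A minimal post-processing sketch follows; the stop-word set is a tiny hypothetical sample, and `clean_tokens` is an illustrative helper, not something the toolkit provides.

```python
import unicodedata

# Tiny illustrative stop-word sample (hypothetical); use a full list in practice.
STOP_WORDS = {"的", "了", "是"}


def is_punctuation(token):
    """True if every character in the token is in a Unicode punctuation category."""
    return bool(token) and all(
        unicodedata.category(ch).startswith("P") for ch in token
    )


def clean_tokens(tokens, stop_words=STOP_WORDS):
    """Drop blank tokens, stop words, and punctuation-only tokens, keeping order."""
    return [
        t
        for t in tokens
        if t.strip() and t not in stop_words and not is_punctuation(t)
    ]


if __name__ == "__main__":
    print(clean_tokens(["今天", "的", "天气", "，", "很", "好", "。"]))
```

Checking Unicode categories rather than a fixed character list covers both ASCII and full-width Chinese punctuation.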

Health Check

  • Last commit: 1 week ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 3

Star History

  • 34 stars in the last 90 days
