cntext by hiDaDeng

Text analysis package for NLP tasks, including LLMs

created 3 years ago
362 stars

Top 78.7% on sourcepulse

Project Summary

cntext is a comprehensive Python library for text analysis, offering a wide range of functionalities from basic statistics like word count and readability to advanced techniques such as sentiment analysis, document similarity, and word embeddings. It is designed for researchers and practitioners in fields like economics, management, and social sciences who need to process and analyze large volumes of text data.

How It Works

cntext combines traditional text analysis methods with word embedding techniques. It provides tools for calculating statistical metrics, including readability scores based on sentence structure and word complexity. For semantic analysis, it supports building custom dictionaries and using pre-trained word embeddings (Word2Vec, GloVe) to compute semantic distances and projections, which makes it possible to measure biases and conceptual associations encoded in text.
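To make the idea of a semantic projection concrete, here is a minimal sketch in plain Python. It is not cntext's implementation or API: the toy vectors, the word lists, and the helper names (`mean_vector`, `semantic_axis`, `project`) are all assumptions invented for illustration. The approach shown is one common variant: build an axis from the difference between the mean vectors of two opposing word sets, then score a target word by its cosine similarity with that axis.

```python
import math

# Toy 3-dimensional vectors, invented for illustration only.
# Real embeddings would come from a trained Word2Vec or GloVe model.
VECTORS = {
    "rich": [0.9, 0.1, 0.0],
    "wealthy": [0.8, 0.2, 0.1],
    "poor": [-0.9, 0.1, 0.0],
    "needy": [-0.8, 0.2, 0.1],
    "yacht": [0.7, 0.5, 0.2],
    "shelter": [-0.6, 0.4, 0.3],
}


def mean_vector(words):
    """Average the vectors of the given words, dimension by dimension."""
    vecs = [VECTORS[w] for w in words]
    return [sum(column) / len(vecs) for column in zip(*vecs)]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def semantic_axis(pos_words, neg_words):
    """Axis pointing from the negative pole toward the positive pole."""
    pos = mean_vector(pos_words)
    neg = mean_vector(neg_words)
    return [p - n for p, n in zip(pos, neg)]


def project(word, axis):
    """Score a word along the axis; the sign shows which pole it leans toward."""
    return cosine(VECTORS[word], axis)


axis = semantic_axis(["rich", "wealthy"], ["poor", "needy"])
for word in ("yacht", "shelter"):
    print(word, round(project(word, axis), 3))
```

With these toy vectors, "yacht" scores positive (toward the "rich" pole) and "shelter" scores negative. With real embeddings the same arithmetic applies, but the result depends heavily on the corpus and on which pole words are chosen.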

Quick Start & Requirements

  • Install via pip: pip install cntext
  • Requires Python.
  • Official documentation and examples are available.
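After installing, one quick way to confirm that the package is present is to query the installed distribution metadata. The sketch below uses only the Python standard library (`importlib.metadata`, available from Python 3.8) and does not call any cntext function; the helper name `installed_version` is a hypothetical one chosen for this example.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package="cntext"):
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints a version string if cntext is installed, otherwise None.
print(installed_version())
```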

Highlighted Details

  • Supports Chinese and English text analysis.
  • Includes a diverse set of built-in sentiment lexicons and tools for creating custom ones.
  • Offers text similarity measures (cosine, Jaccard, edit distance).
  • Features Text2Mind for uncovering cognitive biases and conceptual information from word embeddings.
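The three similarity measures named in the list above can be sketched in a few lines of plain Python. This is an illustration of the underlying definitions, not cntext's own code or function signatures: the names `tokenize`, `cosine_similarity`, `jaccard_similarity`, and `edit_distance` are assumptions made for this example, and the whitespace tokenizer is only suitable for English. Chinese text would first need a word segmenter.

```python
import math
from collections import Counter


def tokenize(text):
    """Naive whitespace tokenizer (an assumption; fine for English only)."""
    return text.lower().split()


def cosine_similarity(a, b):
    """Cosine similarity between bag-of-words count vectors."""
    ca, cb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(ca[t] * cb[t] for t in ca)
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def jaccard_similarity(a, b):
    """Size of the shared vocabulary divided by the size of the combined vocabulary."""
    sa, sb = set(tokenize(a)), set(tokenize(b))
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def edit_distance(a, b):
    """Levenshtein distance: minimum single-character edits turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        curr = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


print(jaccard_similarity("the cat sat", "the cat ran"))  # 0.5
print(edit_distance("kitten", "sitting"))                # 3
```

Cosine and Jaccard compare vocabularies, while edit distance compares character sequences, so none of the three captures meaning on its own; two paraphrases with no shared words score zero on the first two.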

Maintenance & Community

  • The project has accumulated over 69,000 downloads as of March 2025.
  • Offers both a free public version (1.x) and a paid private version (2.x).
  • Community resources include Bilibili and WeChat official accounts.

Licensing & Compatibility

  • The project is available under a permissive license, allowing for commercial use and integration with closed-source projects.
  • A citation is requested for academic or project use.

Limitations & Caveats

  • The sentiment function does not account for intensifiers or negation.
  • Training GloVe embeddings can be time-consuming.
  • The README mentions a paid version (cntext2.x), which may have different features or licensing.
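The first caveat is easiest to see with a toy example. The sketch below assumes a simple lexicon-counting approach, in which sentiment is measured by counting how many tokens appear in a positive or negative word list. Whether cntext's sentiment function works exactly this way is an assumption; the word lists and the `count_sentiment` helper are hypothetical and exist only for illustration.

```python
# Tiny illustrative lexicons (assumptions, not cntext's built-in dictionaries).
POSITIVE = {"good", "great", "happy"}
NEGATIVE = {"bad", "poor", "sad"}


def count_sentiment(text):
    """Count lexicon hits per category; negation and intensifiers are ignored."""
    tokens = text.lower().split()
    pos = sum(1 for t in tokens if t in POSITIVE)
    neg = sum(1 for t in tokens if t in NEGATIVE)
    return {"pos": pos, "neg": neg, "words": len(tokens)}


# All three sentences get the same positive count, although their meanings differ.
for sentence in ("the service was good",
                 "the service was very good",
                 "the service was not good"):
    print(sentence, "->", count_sentiment(sentence))
```

Because "not" and "very" are not in either lexicon, a pure counting approach treats "not good" and "very good" as equally positive. Results from such a method are best read as word-frequency signals, not as sentence-level polarity.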

Health Check

  • Last commit: 3 months ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 27 stars in the last 90 days
