cntext by hiDaDeng

Text analysis package for NLP tasks, including LLMs

Created 3 years ago
364 stars

Top 77.2% on SourcePulse

Project Summary

cntext is a Python library for text analysis. Its functions range from basic statistics, such as word count and readability, to sentiment analysis, document similarity, and word embeddings. It targets researchers and practitioners in economics, management, and the social sciences who need to process and analyze large volumes of text.

How It Works

cntext combines traditional text analysis methods with word embedding techniques. On the statistical side, it computes metrics such as readability scores based on sentence structure and word complexity. On the semantic side, it supports building custom dictionaries and using word embeddings (Word2Vec, GloVe) to compute semantic distances and projections. These measures make it possible to quantify attitudes, biases, and concept associations that are encoded in a corpus.
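
The idea of a semantic projection can be illustrated with a minimal sketch in plain Python. It does not use cntext's own API; the function names (`concept_axis`, `project`) and the two-dimensional toy vectors are hypothetical and chosen only for illustration. A concept axis is built from two poles of seed words, and other words are scored by how far they fall along that axis.

```python
import math

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def norm(u):
    """Euclidean length of a vector."""
    return math.sqrt(dot(u, u))

def concept_axis(vectors, pos_words, neg_words):
    """Mean of the positive-pole vectors minus mean of the negative-pole vectors."""
    dim = len(next(iter(vectors.values())))

    def mean(words):
        return [sum(vectors[w][i] for w in words) / len(words) for i in range(dim)]

    p, n = mean(pos_words), mean(neg_words)
    return [a - b for a, b in zip(p, n)]

def project(vectors, word, axis):
    """Scalar projection of a word vector onto the (normalized) axis."""
    return dot(vectors[word], axis) / norm(axis)

# Toy 2-d "embeddings" (made up); real ones would come from a trained model.
toy_vectors = {
    "rich": [1.0, 0.2],
    "poor": [-1.0, 0.1],
    "yacht": [0.8, 0.5],
    "debt": [-0.6, 0.4],
}

axis = concept_axis(toy_vectors, ["rich"], ["poor"])
yacht_score = project(toy_vectors, "yacht", axis)
debt_score = project(toy_vectors, "debt", axis)
```

With these toy values, `yacht_score` is positive (closer to the "rich" pole) and `debt_score` is negative (closer to the "poor" pole). With real embeddings, the sign and size of the projection are read as the strength of association with each pole.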

Quick Start & Requirements

  • Install via pip: pip install cntext
  • Requires Python.
  • Official documentation and examples are available.

Highlighted Details

  • Supports Chinese and English text analysis.
  • Includes a diverse set of built-in sentiment lexicons and tools for creating custom ones.
  • Offers text similarity measures (cosine, Jaccard, edit distance).
  • Features Text2Mind for uncovering cognitive biases and conceptual information from word embeddings.
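
For readers unfamiliar with the three similarity measures listed above, the following is a minimal plain-Python sketch of what each one computes. It is not cntext's implementation, and the function names are assumptions made for this example. It tokenizes on whitespace, so Chinese text would first need a word segmenter.

```python
import math
from collections import Counter

def cosine_similarity(a, b):
    """Cosine of the angle between the word-count vectors of two texts."""
    ca, cb = Counter(a.split()), Counter(b.split())
    shared = sum(ca[t] * cb[t] for t in ca)
    na = math.sqrt(sum(c * c for c in ca.values()))
    nb = math.sqrt(sum(c * c for c in cb.values()))
    if na == 0 or nb == 0:
        return 0.0
    return shared / (na * nb)

def jaccard_similarity(a, b):
    """Size of the shared vocabulary divided by the size of the combined vocabulary."""
    sa, sb = set(a.split()), set(b.split())
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)

def edit_distance(a, b):
    """Levenshtein distance: minimum single-character insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, 1):
        curr = [i]
        for j, ch_b in enumerate(b, 1):
            cost = 0 if ch_a == ch_b else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]
```

Cosine and Jaccard compare texts at the word level, while edit distance compares them character by character. All three are lexical measures, so two texts with the same meaning but different wording will still score as dissimilar.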

Maintenance & Community

  • The project has accumulated over 69,000 downloads as of March 2025.
  • Offers both a free public version (1.x) and a paid private version (2.x).
  • Community resources include Bilibili and WeChat official accounts.

Licensing & Compatibility

  • The project is available under a permissive license, allowing for commercial use and integration with closed-source projects.
  • A citation is requested for academic or project use.

Limitations & Caveats

  • The sentiment function does not account for intensifiers or negation.
  • Training GloVe embeddings can be time-consuming.
  • The README mentions a paid version (cntext2.x), which may differ in features or licensing.
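
The first caveat is typical of lexicon-counting approaches in general. The sketch below is a hypothetical word-counting scorer, not cntext's actual sentiment function; the name `count_sentiment` and the tiny lexicons are made up for illustration. It shows why a pure counting method gives the same score to a sentence and to its negation.

```python
def count_sentiment(text, pos_words, neg_words):
    """Count lexicon hits per class; context such as 'not' or 'very' is ignored."""
    tokens = text.lower().split()
    return {
        "pos": sum(t in pos_words for t in tokens),
        "neg": sum(t in neg_words for t in tokens),
        "words": len(tokens),
    }

# Tiny illustrative lexicons (assumed, not taken from cntext).
POS = {"good", "great"}
NEG = {"bad", "awful"}

plain = count_sentiment("the service was good", POS, NEG)
negated = count_sentiment("the service was not good", POS, NEG)
```

Both calls report one positive hit and zero negative hits, because "not" is in neither lexicon. Handling negation or intensifiers would require extra logic beyond counting, such as flipping or weighting a hit based on the preceding words.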

Health Check

  • Last commit: 22 hours ago
  • Responsiveness: 1 week
  • Pull requests (30d): 0
  • Issues (30d): 0

Star History

  • 4 stars in the last 30 days
