humanize-chinese by voidborne-d

Local Chinese AI text detection and humanization tool

Created 4 months ago

373 stars

Top 75.7% on SourcePulse

Project Summary

This project provides a free, local, zero-dependency tool for detecting AI-generated Chinese text and rewriting it to read as human-written. It targets users who want to lower AI-detection scores for academic submissions, content creation, and general writing, and it requires no external services or API keys.

How It Works

The tool combines rule-based patterns with statistical analysis. Detection draws on more than 20 rules, N-gram perplexity, GLTR rank bucketing, DivEye surprisal, and sentence-length burstiness, calibrated against the HC3-Chinese dataset. Rewriting uses a "perplexity-guided picker" to select human-like word choices, injects low-frequency bigrams, randomizes sentence lengths, and applies paraphrasing templates. Because everything runs locally with no external dependencies, the tool is fast and easy to set up.
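To make two of the statistical signals named above more concrete, here is a minimal standard-library sketch of sentence-length burstiness and character-bigram perplexity. It is not the project's code: the function names, the choice of coefficient of variation as the burstiness measure, and the add-one smoothing are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    """Split on common Chinese and ASCII sentence-ending punctuation."""
    parts = re.split(r"[。！？!?；;\n]+", text)
    return [p.strip() for p in parts if p.strip()]


def burstiness(text):
    """Coefficient of variation (std / mean) of sentence lengths in characters.

    Assumption: this is one common way to quantify burstiness; the project
    may define it differently.
    """
    lengths = [len(s) for s in split_sentences(text)]
    if len(lengths) < 2:
        return 0.0
    mean = sum(lengths) / len(lengths)
    variance = sum((n - mean) ** 2 for n in lengths) / len(lengths)
    return math.sqrt(variance) / mean


def train_bigram(corpus):
    """Count character contexts and bigrams in a reference corpus (hypothetical helper)."""
    contexts = Counter(corpus[:-1])            # every character that has a successor
    bigrams = Counter(zip(corpus, corpus[1:]))
    vocab_size = len(set(corpus)) + 1          # +1 slot for unseen characters
    return contexts, bigrams, vocab_size


def bigram_perplexity(text, model):
    """Perplexity of text under an add-one-smoothed character bigram model."""
    contexts, bigrams, vocab_size = model
    if len(text) < 2:
        return float("inf")                    # no bigram to score
    log_prob = 0.0
    for a, b in zip(text, text[1:]):
        p = (bigrams[(a, b)] + 1) / (contexts[a] + vocab_size)
        log_prob += math.log(p)
    return math.exp(-log_prob / (len(text) - 1))
```

In a sketch like this, text with uniform sentence lengths yields a burstiness near zero, and text that closely follows the reference corpus yields low perplexity; a detector could treat both as weak hints of machine generation. How the real tool weights and fuses these signals is not described in the summary.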

Quick Start & Requirements

Install via clawhub install humanize-chinese, by cloning the repository, or as a Claude Code Skill. The tool uses only the Python standard library, so no pip installs are needed. It runs locally, and typical tasks finish within seconds.

Highlighted Details

Zero dependencies, runs locally, no LLM or API keys required.
Offers both AI detection (0-100 score) and text humanization.
Specialized academic AIGC reduction for Chinese platforms like 知网, 维普, and 万方.
Supports 7 distinct Chinese writing style transformations (e.g., Xiaohongshu, Zhihu, Novel).
Detection and rewriting algorithms are grounded in numerous research papers.
Reports 95.5% detection accuracy on the HC3-Chinese benchmark and an average score reduction of 40.6 points after rewriting.

Maintenance & Community

The project is maintained by voidborne-d. Specific community channels like Discord or Slack are not detailed in the provided README.

Licensing & Compatibility

The software is released under the MIT Non-Commercial license. Commercial use, including selling the software, offering it as a paid service, or integrating it into commercial products, is strictly prohibited without explicit authorization from the author via GitHub.

Limitations & Caveats

The fused detection model is stringent and may assign high scores to typical AI outputs. The statistical detection layer uses no neural networks, which keeps the tool dependency-free but may yield a lower AUC than some state-of-the-art detectors. Direct integration with academic platform APIs is not feasible because such APIs are unavailable and the platforms employ anti-scraping measures. The tool aims to make text more human-like but does not guarantee evasion of every AI detector. Short, fact-based Q&A texts are harder to humanize effectively.

Health Check

Last Commit

2 months ago

Responsiveness

Inactive

Star History

0 stars in the last 30 days