baibaiAIGC by poleHansen

De-AIing Chinese academic documents

Created 3 months ago

921 stars

Top 38.8% on SourcePulse

Project Summary

baibaiAIGC reduces traces of AI-generated content (AIGC) in Chinese academic papers and technical documents. It applies a structured, multi-round rewriting process that removes AI markers while preserving the original meaning, terminology, and academic style. It offers Web, Script, and Chat Skill interfaces and targets users who need to iteratively refine AI-assisted content.

How It Works

The core methodology is a strict two-round rewriting process in which round 1 always runs before round 2. Documents are segmented into chunks, each chunk is rewritten through an external OpenAI-compatible API, and the results are reassembled so that the original paragraph structure is maintained. This lets long documents be processed systematically, and the rewriting is constrained to introduce no new facts and to preserve the original terminology, logic, and academic tone.
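The ordering described above can be illustrated with a minimal sketch. This is not the project's code: `rewrite_document`, `fake_rewrite`, and the `Rewriter` signature are hypothetical names introduced here, each paragraph is treated as one chunk for simplicity, and the API call is replaced by a stand-in function so the example runs offline.

```python
from typing import Callable, Sequence

# Hypothetical signature: (chunk_text, round_number) -> rewritten chunk.
Rewriter = Callable[[str, int], str]


def rewrite_document(text: str, rewrite: Rewriter,
                     rounds: Sequence[int] = (1, 2)) -> str:
    """Run every round in order over all paragraphs, keeping paragraph breaks."""
    paragraphs = text.split("\n\n")
    for round_no in rounds:
        # Round 2 only ever sees the output of round 1.
        paragraphs = [
            rewrite(p, round_no) if p.strip() else p  # leave blank paragraphs alone
            for p in paragraphs
        ]
    return "\n\n".join(paragraphs)


def fake_rewrite(chunk: str, round_no: int) -> str:
    """Stand-in for the API call; tags the chunk so the ordering is visible."""
    return f"{chunk}[r{round_no}]"


result = rewrite_document("First paragraph.\n\nSecond paragraph.", fake_rewrite)
print(result)
```

Because the paragraph list is rebuilt in place for each round, the number and order of paragraphs in the output match the input, which is the property the reassembly step is meant to guarantee.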

Quick Start & Requirements

Installation: Python dependencies (pip install -r requirements.txt), Web frontend (cd app && npm install).
Prerequisites: Python 3.x and Node.js/npm. Web and Script modes require an OpenAI-compatible API endpoint (API key, model name, and base URL), configurable via environment variables or command-line arguments.
Usage Modes:
- Web Mode: Run backend (python scripts/web_app.py) and frontend (cd app && npm run dev:web).
- Script API Mode: Use scripts/run_aigc_round.py with specified parameters.
- Dialogue Skill Mode: Integrate via SKILL.md without manual API setup.
Input: Place .txt or .docx files in origin/.
Links: SKILL.md, references/usage.md, references/checklist.md.
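As a rough illustration of "environment variables or command-line arguments", the sketch below resolves the three API settings with command-line values taking precedence over the environment. The variable names (`EXAMPLE_API_KEY`, `EXAMPLE_MODEL`, `EXAMPLE_BASE_URL`), the flag names, and the `load_config` helper are all assumptions made for this example; the names the project actually uses are documented in its own references.

```python
import argparse
import os
from dataclasses import dataclass
from typing import Mapping, Optional, Sequence


@dataclass
class ApiConfig:
    api_key: str
    model: str
    base_url: str


def load_config(argv: Optional[Sequence[str]] = None,
                env: Optional[Mapping[str, str]] = None) -> ApiConfig:
    """Resolve settings: CLI flag first, then environment variable."""
    env = os.environ if env is None else env
    parser = argparse.ArgumentParser(description="Hypothetical API settings loader")
    # Environment values act as defaults, so an explicit flag overrides them.
    parser.add_argument("--api-key", default=env.get("EXAMPLE_API_KEY"))
    parser.add_argument("--model", default=env.get("EXAMPLE_MODEL"))
    parser.add_argument("--base-url", default=env.get("EXAMPLE_BASE_URL"))
    args = parser.parse_args(argv)

    missing = [name for name in ("api_key", "model", "base_url")
               if not getattr(args, name)]
    if missing:
        raise ValueError("missing settings: " + ", ".join(missing))
    return ApiConfig(args.api_key, args.model, args.base_url)


cfg = load_config(
    argv=["--model", "cli-model"],
    env={
        "EXAMPLE_API_KEY": "sk-test",
        "EXAMPLE_MODEL": "env-model",
        "EXAMPLE_BASE_URL": "https://api.example.com/v1",
    },
)
print(cfg.model)  # cli-model
```

Passing `argv` and `env` explicitly keeps the helper easy to test; in a real script both would default to the process arguments and `os.environ`.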

Highlighted Details

Supports .txt and .docx files, with utilities for Word document extraction and rebuilding.
Features a rigid two-round sequential rewriting workflow (1 then 2).
Processes long documents by splitting them into chunks (850 characters by default) that respect paragraph and sentence boundaries.
Offers distinct interfaces: local Web UI, command-line Script API, and integrated Chat Skill.
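The boundary-aware chunking mentioned in the list above could look roughly like the following. This is a sketch under stated assumptions, not the project's implementation: `chunk_text`, `split_units`, and the sentence pattern are hypothetical, paragraphs are assumed to be newline-separated, and a single sentence longer than the limit is left whole rather than cut mid-sentence.

```python
import re
from typing import List

# A sentence ends at a run of terminal punctuation (Chinese or Western),
# optionally followed by the paragraph's newline, or at the end of the text.
SENTENCE = re.compile(r".+?(?:[。！？!?]+\n?|\Z)", flags=re.S)


def split_units(text: str, max_chars: int) -> List[str]:
    """Whole paragraphs where they fit, sentences where a paragraph is too long."""
    units: List[str] = []
    for para in text.splitlines(keepends=True):  # newline stays with its paragraph
        if len(para) <= max_chars:
            units.append(para)
        else:
            units.extend(SENTENCE.findall(para))
    return units


def chunk_text(text: str, max_chars: int = 850) -> List[str]:
    """Greedily pack units into chunks of at most max_chars characters."""
    chunks: List[str] = []
    current = ""
    for unit in split_units(text, max_chars):
        # Start a new chunk when adding this unit would exceed the limit.
        if current and len(current) + len(unit) > max_chars:
            chunks.append(current)
            current = ""
        current += unit
    if current:
        chunks.append(current)
    return chunks


text = "第一句。第二句！第三句？\n短段落。\n"
chunks = chunk_text(text, max_chars=8)
print(chunks)  # ['第一句。第二句！', '第三句？\n', '短段落。\n']
```

Since every unit keeps its own separators, `"".join(chunks)` reproduces the input exactly, which makes reassembly after rewriting a plain concatenation.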

Maintenance & Community

The README acknowledges feedback from the linuxdo (linux.do) community. No community channel links or maintainer details are provided.

Licensing & Compatibility

The README specifies no license. Without an explicit license, the terms for reuse, redistribution, and commercial integration are unclear.

Limitations & Caveats

Long documents must be processed sequentially in chunks; single-pass rewriting is not supported. Dialogue Skill mode may be unstable for lengthy inputs. The two-round sequence is fixed and cannot be reordered. The focus is on stylistic refinement, not on altering content to evade detection.

Health Check

Last Commit

1 month ago

Responsiveness

Inactive

Star History

64 stars in the last 30 days