Chinese-offensive-language-detect by royal12646

Chinese offensive language detection

Created 11 months ago
614 stars

Top 53.4% on SourcePulse

View on GitHub
Project Summary

This project provides a system for detecting six categories of harmful Chinese text: sexually explicit, abusive (including phonetic variations), and discriminatory content based on region, gender, race, or occupation. It targets NLP researchers and developers aiming to build more robust and safer Chinese language processing systems, offering a dataset and fine-tuned models for this purpose.

How It Works

The system employs a multi-stage approach. First, it generates a comprehensive dataset of harmful and safe text pairs using LLMs, keywords, and "model jailbreaking" techniques to prevent keyword overfitting. It then fine-tunes the hfl/chinese-macbert-base model into three specialized detectors: one for sexually explicit content, one for abusive language, and one for bias-related discrimination. Phonetic variations are handled by converting text to pinyin before analysis. Finally, an ensemble method combines the outputs of these specialized detectors using learnable weights to produce a general-purpose offensive language detector.
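The ensemble step described above can be sketched in a few lines. This is a minimal illustration, not the project's implementation: the detector probabilities and weight values below are made-up stand-ins, and in the actual system the weights are learnable parameters trained jointly rather than fixed constants.

```python
import math

def softmax(xs):
    """Numerically stable softmax so the combination weights sum to 1."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def ensemble_score(detector_probs, raw_weights):
    """Combine per-detector P(offensive) scores with softmax weights."""
    weights = softmax(raw_weights)
    return sum(w * p for w, p in zip(weights, detector_probs))

# Illustrative outputs from the sexual-content, abusive-language, and
# discrimination detectors for one input sentence:
probs = [0.10, 0.85, 0.20]
raw_weights = [0.0, 1.0, 0.0]  # stand-in for the learned parameters
score = ensemble_score(probs, raw_weights)  # a value in [0, 1]
```

Because the weights pass through a softmax, the combined score stays a valid probability regardless of the raw weight values, which makes the weights easy to train by gradient descent.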

Quick Start & Requirements

  • Environment Setup: Create a conda environment (conda create -n off-detect python=3.9), activate it (conda activate off-detect), and install dependencies (pip install -r requirements.txt).
  • Training: Run python train.py.
  • Testing: Execute python test.py.
  • Demo: Navigate to ./Chinese-offensive-language-detect/Demo, open TCP port 5000 (sudo ufw allow 5000/tcp), and start the Flask backend (python Flask/app.py). In a second terminal, activate the environment, navigate to ./Chinese-offensive-language-detect/Demo/User, and run node procedure.js. Start the frontend with npm run dev from the ./Chinese-offensive-language-detect/Demo/ directory.
  • Prerequisites: Python 3.9; Node.js and npm for the demo.

Highlighted Details

  • Detects six types of harmful content: sexually explicit, abusive (including phonetic), regional, gender, racial, and occupational discrimination.
  • Utilizes LLM-generated datasets with keyword-based generation and "model jailbreaking" to enhance robustness.
  • Employs an ensemble learning approach with learnable weights for a generalized detection model.
  • Handles phonetic offensive language by converting text to pinyin.
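The pinyin-normalization idea in the last bullet can be sketched as follows. The character-to-pinyin table here is a hypothetical toy for illustration; the project would rely on a full converter (e.g. a library such as pypinyin) rather than a hand-written dictionary.

```python
# Toy character-to-pinyin table, illustration only; a complete converter
# would cover the full character set and handle tones.
PINYIN = {"你": "ni", "妈": "ma", "马": "ma", "好": "hao"}

def to_pinyin(text):
    """Map each character to its pinyin syllable, passing unknowns through."""
    return " ".join(PINYIN.get(ch, ch) for ch in text)

# Homophone substitutions collapse to the same pinyin sequence, so a
# detector operating on pinyin sees the variant spellings as identical.
print(to_pinyin("你妈好"))  # prints "ni ma hao"
print(to_pinyin("你马好"))  # prints "ni ma hao"
```

This is why converting to pinyin before classification defeats the common evasion tactic of swapping an offensive character for a harmless homophone.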

Maintenance & Community

No specific details regarding maintainers, community channels (like Discord/Slack), or roadmaps are provided in the README.

Licensing & Compatibility

The code's license is not explicitly stated. The associated dataset, "ZHateBench: A Comprehensive Chinese Offensive Language Dataset with Harmful–Safe Pairs," is archived on Zenodo (DOI: 10.5281/zenodo.16812052); its specific license terms should be checked on the Zenodo record. Compatibility for commercial use or linking with closed-source projects is not specified.

Limitations & Caveats

The project focuses exclusively on the Chinese language. While the dataset is AI-generated to cover various harms, its inherent biases or potential gaps compared to real-world offensive language are not detailed. The README does not mention specific performance benchmarks or known limitations of the detection models.

Health Check

  • Last Commit: 1 week ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 2
  • Issues (30d): 2
  • Star History: 619 stars in the last 30 days
