ImBD by Jiaqi-Chen-00

Detecting machine-revised text with style optimization

Created 1 year ago
250 stars

Top 100.0% on SourcePulse

View on GitHub
Project Summary

Machine-revised text detection is a challenging problem due to subtle stylistic changes. The ImBD framework offers a novel solution by aligning machine stylistic preferences, enabling state-of-the-art detection performance for revisions made by LLMs like GPT-3.5 and GPT-4o. This project is targeted at researchers and practitioners needing to identify AI-generated or modified content efficiently, even with limited training data.

How It Works

The core of ImBD lies in its Style Preference Optimization (SPO) and Style-CPC components, designed to capture machine-style phrasing. This approach excels at identifying the subtle stylistic cues that separate human-written text from machine-revised content, yielding gains in both accuracy and efficiency over traditional detection methods.
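The summary does not spell out SPO's training objective. As an illustration only, preference-optimization methods in this family typically contrast the log-likelihoods of a preferred sample (here, machine-styled text) and a dispreferred one (human text) under a tuned model versus a frozen reference model. A minimal sketch of such a DPO-style loss; the function name, argument layout, and the `beta` temperature are assumptions for illustration, not ImBD's actual code:

```python
import math

def preference_loss(logp_machine, logp_human,
                    ref_logp_machine, ref_logp_human, beta=0.1):
    """DPO-style preference loss (illustrative, not ImBD's implementation).

    Encourages the tuned model to raise the likelihood of machine-styled
    text relative to human text, measured against a frozen reference model.
    All inputs are sequence log-probabilities.
    """
    margin = (logp_machine - ref_logp_machine) - (logp_human - ref_logp_human)
    # -log(sigmoid(beta * margin)): small when the tuned model already
    # prefers the machine-styled sample more strongly than the reference does
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

# Zero margin (tuned model agrees with the reference) gives loss = log(2)
loss = preference_loss(-10.0, -12.0, -10.0, -12.0)
```

The loss decreases monotonically as the margin grows, which is what drives the tuned model to internalize the machine's stylistic preferences during training.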

Quick Start & Requirements

  • Installation: Set up a Conda environment (conda create -n ImBD python=3.10), activate it (conda activate ImBD), and install dependencies (pip install -r requirements.txt).
  • Prerequisites: Python 3.10, CUDA-enabled GPU.
  • Resource Footprint: Inference and fast evaluation require approximately 11GB of GPU memory. Training and reproducing multi-domain results demand around 40GB of GPU memory.
  • Links: the README references a Website, Paper, Data, Model, and Demo.

Highlighted Details

  • Achieves state-of-the-art performance in detecting revisions from various LLMs, including GPT-3.5 and GPT-4o.
  • Demonstrates significant efficiency, requiring minimal training data.
  • Supports multi-domain and multilingual text revision detection.
  • Provides comprehensive scripts for local demo, result reproduction, and evaluation of other methods.

Maintenance & Community

The provided README does not contain specific details regarding maintainers, community channels (like Discord/Slack), or a public roadmap.

Licensing & Compatibility

The provided README does not specify the project's license or any compatibility notes for commercial use.

Limitations & Caveats

The project lists several "TODO" items: implementing dedicated inference code for detection, improving how trained models are saved, and further reducing the GPU memory usage of the evaluation scripts. Note that the current inference checkpoint contains only LoRA weights, so the base model must be downloaded separately.
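Because the checkpoint ships only LoRA weights, they must be attached to the separately downloaded base model before inference. A hedged sketch using the Hugging Face `peft` library; the base model ID and adapter path below are placeholders, not the project's actual names (check the repo's README for the correct base model):

```python
# Sketch only: attach a LoRA adapter to its base model with peft.
# "gpt2" and "./ckpt/imbd-lora" are placeholder names, not ImBD's real ones.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "gpt2"                  # placeholder: the base model the checkpoint expects
adapter_dir = "./ckpt/imbd-lora"  # placeholder: local path to the downloaded LoRA weights

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(base_id)
model = PeftModel.from_pretrained(base, adapter_dir)  # loads the LoRA deltas
model = model.merge_and_unload()  # optional: fold LoRA weights into the base for plain inference
model.eval()
```

`merge_and_unload()` removes the adapter wrappers after merging, which simplifies downstream scoring code at the cost of no longer being able to detach the adapter.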

Health Check
Last Commit

9 months ago

Responsiveness

Inactive

Pull Requests (30d)
0
Issues (30d)
0
Star History
5 stars in the last 30 days
