ImBD by Jiaqi-Chen-00

Detecting machine-revised text with style optimization

Created 1 year ago
250 stars

Top 100.0% on SourcePulse

View on GitHub
Project Summary

Machine-revised text detection is a challenging problem due to subtle stylistic changes. The ImBD framework offers a novel solution by aligning machine stylistic preferences, enabling state-of-the-art detection performance for revisions made by LLMs like GPT-3.5 and GPT-4o. This project is targeted at researchers and practitioners needing to identify AI-generated or modified content efficiently, even with limited training data.

How It Works

The core of ImBD lies in its Style Preference Optimization (SPO) and Style-CPC components, designed to capture machine-style phrasing. This approach excels at identifying the subtle stylistic cues that separate human-written text from machine-revised content, yielding gains in both accuracy and efficiency over traditional detection methods.
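The summary does not spell out SPO's training objective. As an illustration only, preference-optimization methods in this family typically contrast the log-likelihoods of a preferred sample (here, machine-styled text) and a dispreferred one (human text) under a tuned model versus a frozen reference model. A minimal sketch of such a DPO-style loss; the function name, argument layout, and the `beta` temperature are assumptions for illustration, not ImBD's actual code:

```python
import math

def preference_loss(logp_machine, logp_human,
                    ref_logp_machine, ref_logp_human, beta=0.1):
    """DPO-style preference loss (illustrative, not ImBD's implementation).

    Encourages the tuned model to raise the likelihood of machine-styled
    text relative to human text, measured against a frozen reference model.
    All inputs are sequence log-probabilities.
    """
    margin = (logp_machine - ref_logp_machine) - (logp_human - ref_logp_human)
    # -log(sigmoid(beta * margin)): small when the tuned model already
    # prefers the machine-styled sample more strongly than the reference does
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

# Zero margin (tuned model agrees with the reference) gives loss = log(2)
loss = preference_loss(-10.0, -12.0, -10.0, -12.0)
```

The loss decreases monotonically as the margin grows, which is what drives the tuned model to internalize the machine's stylistic preferences during training.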

Quick Start & Requirements

  • Installation: Set up a Conda environment (conda create -n ImBD python=3.10), activate it (conda activate ImBD), and install dependencies (pip install -r requirements.txt).
  • Prerequisites: Python 3.10, CUDA-enabled GPU.
  • Resource Footprint: Inference and fast evaluation require approximately 11GB of GPU memory. Training and reproducing multi-domain results demand around 40GB of GPU memory.
  • Links: the README references a Website, Paper, Data, Model, and Demo.

Highlighted Details

  • Achieves state-of-the-art performance in detecting revisions from various LLMs, including GPT-3.5 and GPT-4o.
  • Demonstrates significant efficiency, requiring minimal training data.
  • Supports multi-domain and multilingual text revision detection.
  • Provides comprehensive scripts for local demo, result reproduction, and evaluation of other methods.

Maintenance & Community

The provided README does not contain specific details regarding maintainers, community channels (like Discord/Slack), or a public roadmap.

Licensing & Compatibility

The provided README does not specify the project's license or any compatibility notes for commercial use.

Limitations & Caveats

The project lists several "TODO" items: implementing dedicated inference code for detection, improving how trained models are saved, and further reducing the GPU memory usage of the evaluation scripts. Note that the current inference checkpoint contains only LoRA weights, so the base model must be downloaded separately.
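Because the checkpoint ships only LoRA weights, they must be attached to the separately downloaded base model before inference. A hedged sketch using the Hugging Face `peft` library; the base model ID and adapter path below are placeholders, not the project's actual names (check the repo's README for the correct base model):

```python
# Sketch only: attach a LoRA adapter to its base model with peft.
# "gpt2" and "./ckpt/imbd-lora" are placeholder names, not ImBD's real ones.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "gpt2"                  # placeholder: the base model the checkpoint expects
adapter_dir = "./ckpt/imbd-lora"  # placeholder: local path to the downloaded LoRA weights

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(base_id)
model = PeftModel.from_pretrained(base, adapter_dir)  # loads the LoRA deltas
model = model.merge_and_unload()  # optional: fold LoRA weights into the base for plain inference
model.eval()
```

`merge_and_unload()` removes the adapter wrappers after merging, which simplifies downstream scoring code at the cost of no longer being able to detach the adapter.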

Health Check
Last Commit

9 months ago

Responsiveness

Inactive

Pull Requests (30d)
0
Issues (30d)
0
Star History
5 stars in the last 30 days
