Zero-shot tool for detecting LLM-generated text
Top 90.1% on sourcepulse
Binoculars offers a zero-shot, domain-agnostic method for detecting AI-generated text, targeting researchers and developers needing to identify machine-written content without task-specific training. It leverages the shared pretraining data of decoder-only language models to achieve this detection.
How It Works
Binoculars operates on the principle that common pretraining datasets like Common Crawl and Pile create a predictable statistical fingerprint in LLM outputs. By analyzing deviations from this expected distribution, it can identify text likely generated by an LLM. This approach avoids the need for fine-tuning on specific datasets, making it broadly applicable.
Quick Start & Requirements
pip install -e .
after cloning the repository.python app.py
.Highlighted Details
Maintenance & Community
The project is associated with authors from ICML 2024. Further community or maintenance details are not specified in the README.
Licensing & Compatibility
The README does not explicitly state a license. The project is marked for academic purposes only.
Limitations & Caveats
Binoculars is more proficient with English text and is intended for academic use, not as a consumer product. Users are cautioned against relying on it without human supervision.
1 year ago
Inactive