Binoculars by ahans30

Zero-shot tool for detecting LLM-generated text

Created 2 years ago

336 stars

Top 81.9% on SourcePulse

Project Summary

Binoculars offers a zero-shot, domain-agnostic method for detecting AI-generated text, aimed at researchers and developers who need to identify machine-written content without task-specific training. It relies on the overlap in pretraining data shared by decoder-only language models.

How It Works

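Binoculars builds on the observation that decoder-only language models share much of their pretraining data (e.g., Common Crawl and the Pile), so different models tend to agree about which continuations are likely. It scores a passage with two such models: the passage's perplexity under one model is divided by its cross-perplexity, which measures how surprising one model finds the other model's next-token predictions. Machine-generated text tends to receive a lower score than human-written text, and a threshold on the score turns it into a prediction. No fine-tuning on task-specific data is needed, which makes the method broadly applicable.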
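The ratio described above can be illustrated on toy logits. The sketch below is a minimal, dependency-free approximation, not the project's implementation: it assumes the logits are already aligned so that row `i` predicts token `i`, and the role assignment (perplexity taken under model B, cross-entropy of model A's distribution against model B's log-probabilities) is an assumption of this sketch. All function names are hypothetical.

```python
import math
from typing import List, Sequence

Logits = Sequence[Sequence[float]]  # one vocabulary-sized row per position


def log_softmax(row: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one row of logits."""
    m = max(row)
    log_z = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_z for x in row]


def log_perplexity(logits: Logits, token_ids: Sequence[int]) -> float:
    """Mean negative log-likelihood of the observed tokens under one model.

    Assumes logits[i] is the distribution that predicts token_ids[i]
    (the usual one-position shift has already been applied).
    """
    if len(logits) != len(token_ids) or not token_ids:
        raise ValueError("need one logits row per token, and at least one token")
    nll = sum(-log_softmax(row)[tok] for row, tok in zip(logits, token_ids))
    return nll / len(token_ids)


def log_cross_perplexity(logits_a: Logits, logits_b: Logits) -> float:
    """Mean cross-entropy of model A's next-token distribution against
    model B's log-probabilities, averaged over positions."""
    if len(logits_a) != len(logits_b) or not logits_a:
        raise ValueError("both models must score the same non-empty sequence")
    total = 0.0
    for row_a, row_b in zip(logits_a, logits_b):
        p_a = [math.exp(v) for v in log_softmax(row_a)]
        log_q_b = log_softmax(row_b)
        total += -sum(p * lq for p, lq in zip(p_a, log_q_b))
    return total / len(logits_a)


def binoculars_style_score(logits_a: Logits, logits_b: Logits,
                           token_ids: Sequence[int]) -> float:
    """Log-perplexity (under model B) divided by log-cross-perplexity (A vs. B).
    Lower values suggest machine-generated text."""
    return log_perplexity(logits_b, token_ids) / log_cross_perplexity(logits_a, logits_b)


if __name__ == "__main__":
    # Toy 3-token vocabulary with hand-picked logits, purely illustrative.
    tokens = [0, 2, 1]
    model_a = [[2.0, 0.1, 0.3], [0.2, 0.1, 1.5], [0.0, 1.2, 0.4]]
    model_b = [[1.8, 0.2, 0.1], [0.1, 0.3, 1.9], [0.3, 1.0, 0.2]]
    print(round(binoculars_style_score(model_a, model_b, tokens), 4))
```

Dividing by the cross-perplexity normalizes for how "surprising" a passage is inherently: a prompt that forces unusual wording raises both terms, so the ratio stays informative where raw perplexity alone would not.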

Quick Start & Requirements

Install: clone the repository, then run pip install -e . from its root.
Requires Python 3.9.
Run the demo with python app.py.
The README links to the official paper and demo.

Highlighted Details

Zero-shot and domain-agnostic detection.
Based on shared LLM pretraining dataset overlap.
Provides a classification score and prediction.
Can process batches of text.
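Because the tool returns both a raw score and a label and accepts batches, the prediction step amounts to comparing each score against a cutoff. The sketch below assumes the scores have already been computed; the function name, the label strings, and the 0.9 cutoff are hypothetical placeholders, not the project's API or its tuned threshold.

```python
from typing import List, Sequence, Tuple

# Placeholder cutoff for illustration only; it is NOT the project's tuned threshold.
HYPOTHETICAL_THRESHOLD = 0.9


def predict_batch(scores: Sequence[float],
                  threshold: float = HYPOTHETICAL_THRESHOLD) -> List[Tuple[float, str]]:
    """Pair each score with a label. Lower scores point to machine-generated
    text, so anything strictly below the cutoff is flagged."""
    return [
        (s, "likely AI-generated" if s < threshold else "likely human-written")
        for s in scores
    ]


if __name__ == "__main__":
    for score, label in predict_batch([0.75, 1.02]):
        print(f"{score:.2f} -> {label}")
```

Lowering the cutoff flags fewer passages, which reduces false positives at the cost of missing more machine-generated text; the right setting depends on how costly a wrongful flag is.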

Maintenance & Community

The project accompanies a paper published at ICML 2024. The README gives no further details on community channels or maintenance.

Licensing & Compatibility

The README does not explicitly state a license. The project is marked for academic purposes only.

Limitations & Caveats

Binoculars performs better on English text than on other languages and is intended for academic use, not as a consumer product. Users are cautioned against relying on it without human supervision.

Health Check

Last commit: 1 year ago

Responsiveness: Inactive

Pull requests (30d): 0

Issues (30d): 0

Star history: 7 stars in the last 30 days

Explore Similar Projects

LaCLIP by LijieFan

Research paper code and models for improving CLIP training via language rewrites

Created 2 years ago

Updated 2 years ago

Starred by Jeff Hammerbacher (Cofounder of Cloudera).

llm-classifier by lamini-ai

LLM classifier for instant data classification using Llama 2

Created 2 years ago

Updated 1 year ago

Starred by Benjamin Bolte (Cofounder of K-Scale Labs).

tiger by tigerlab-ai

Open-source LLM toolkit for building trustworthy applications

Created 2 years ago

Updated 2 years ago

Starred by Junyang Lin (Core Maintainer at Alibaba Qwen), Shizhe Diao (Author of LMFlow; Research Scientist at NVIDIA), and 1 more.

LMaaS-Papers by txsun1997

Curated list of LMaaS research papers

Created 3 years ago

Updated 1 year ago

Starred by Travis Addair (Cofounder of Predibase), Travis Fischer (Founder of Agentic), and 3 more.

evaporate by HazyResearch

Code and data for a research paper on using LLMs to generate structured views of data lakes

Created 2 years ago

Updated 1 year ago

Starred by Jeff Hammerbacher (Cofounder of Cloudera).

Awesome-LLM4IE-Papers by quqxui

Curated list of LLM papers for generative information extraction (IE)

Created 2 years ago

Updated 1 year ago

detect-gpt by eric-mitchell

Research paper implementation for zero-shot machine-generated text detection

Created 2 years ago

Updated 2 years ago

fast-detect-gpt by baoguangsheng

Zero-shot machine-generated text detection via conditional probability curvature

Created 2 years ago

Updated 4 months ago

Starred by Jesse Clark (Cofounder of Marqo) and Elvis Saravia (Founder of DAIR.AI).

TextBox by RUCAIBox

Text generation library with pre-trained language models

Created 5 years ago

Updated 2 years ago

Starred by Jiaming Song (Chief Scientist at Luma AI).

LISA by JIA-Lab-research

Reasoning segmentation assistant via LLM

Created 2 years ago

Updated 10 months ago

Starred by Shizhe Diao (Author of LMFlow; Research Scientist at NVIDIA) and Piero Molino (Cofounder of Predibase).

PPLM by uber-research

PPLM: Steerable text generation research paper

Created 6 years ago

Updated 1 year ago

Starred by Didier Lopes (Founder of OpenBB), Vincent Weisser (Cofounder of Prime Intellect), and 6 more.

augmentoolkit by e-p-armstrong

Data toolkit for custom LLM creation using open-source AI

Created 2 years ago

Updated 2 months ago
