Analyze LLM output for repetitive patterns
This toolkit identifies and analyzes "slop" (over-represented lexical patterns) in Large Language Model (LLM) outputs. Researchers and developers can use it to generate standardized LLM outputs, profile repetitive word usage, build canonical slop lists, and cluster models by linguistic similarity.
How It Works
The toolkit operates in four main stages: dataset generation, slop profiling, slop list creation, and phylogenetic tree building. Slop profiling involves counting word and phrase frequencies, filtering common words and numbers, and calculating repetition scores and vocabulary complexity. Slop lists are created by aggregating these profiles across models to identify consistently overused terms. Phylogenetic trees are then generated by treating models as species and slop term usage as genetic traits, using bioinformatics tools like PHYLIP for parsimony analysis or falling back to hierarchical clustering.
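The profiling and slop-list stages described above can be illustrated with a minimal sketch. This is not the toolkit's own code: the function names (`profile_words`, `repetition_score`, `build_slop_list`), the tiny stopword set, and the scoring formula are all hypothetical stand-ins chosen for illustration, and phrase counting and vocabulary complexity are omitted.

```python
import re
from collections import Counter

# Illustrative stopword set (assumption); the toolkit lists NLTK's
# stopwords corpus as a requirement and presumably uses that instead.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "it", "was", "with"}


def profile_words(texts):
    """Count word frequencies across texts, skipping stopwords and pure numbers."""
    counts = Counter()
    for text in texts:
        for token in re.findall(r"[a-z0-9']+", text.lower()):
            if token in STOPWORDS or token.isdigit():
                continue
            counts[token] += 1
    return counts


def repetition_score(counts, top_n=3):
    """Hypothetical score: share of all counted tokens taken by the top_n words."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(c for _, c in counts.most_common(top_n)) / total


def build_slop_list(profiles, min_models=2, top_n=10):
    """Keep words that rank in the top_n of at least min_models model profiles."""
    hits = Counter()
    for counts in profiles.values():
        for word, _ in counts.most_common(top_n):
            hits[word] += 1
    return sorted(word for word, n in hits.items() if n >= min_models)


# Made-up sample outputs for two hypothetical models.
outputs = {
    "model_a": ["A tapestry of whispers filled the room.", "The tapestry shimmered in 1999."],
    "model_b": ["Her voice was a tapestry of echoes.", "Echoes danced with whispers."],
}
profiles = {name: profile_words(texts) for name, texts in outputs.items()}
slop_list = build_slop_list(profiles)
print(slop_list)
```

Aggregating per-model top words, rather than pooling all text, keeps one verbose model from dominating the list; a real slop list would more likely compare frequencies against a human-written baseline.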
Quick Start & Requirements
Install dependencies: pip install -r requirements.txt
Prerequisites: NLTK data packages (punkt, punkt_tab, stopwords, cmudict), and optionally PHYLIP for phylogenetic analysis.
Configuration: copy .env.example to .env and set OPENAI_API_KEY and optionally PHYLIP_PATH and OPENAI_BASE_URL.
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
The phylogenetic analysis relies on PHYLIP, which may require manual installation and configuration if it is not available through a package manager. The effectiveness of slop analysis depends on the quality and diversity of the prompts used to generate the dataset.
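Because PHYLIP may be missing, a pipeline of this kind needs to detect it and fall back gracefully. The sketch below shows one way that check could look, under several assumptions: that PHYLIP_PATH names a directory containing PHYLIP executables, that the parsimony program is called `pars` (some distributions package it differently), and that the fallback works from Jaccard distances between per-model slop term sets. The helper names are hypothetical and not taken from the toolkit.

```python
import os
import shutil


def find_phylip(program="pars"):
    """Return a path to a PHYLIP executable, or None if none is found.

    Assumption: PHYLIP_PATH, when set, is a directory holding the binaries.
    """
    phylip_dir = os.environ.get("PHYLIP_PATH")
    if phylip_dir:
        candidate = os.path.join(phylip_dir, program)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    # Otherwise look for the program on the regular PATH.
    return shutil.which(program)


def jaccard_distance(a, b):
    """Distance between two term sets: 1 minus (shared terms / all terms)."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def distance_matrix(traits):
    """Build a symmetric distance matrix over models, in sorted name order."""
    names = sorted(traits)
    matrix = [[jaccard_distance(traits[x], traits[y]) for y in names] for x in names]
    return names, matrix


def choose_method():
    """Pick parsimony when PHYLIP is present, else hierarchical clustering."""
    return "phylip-parsimony" if find_phylip() else "hierarchical-clustering"


# Made-up slop term sets for three hypothetical models.
traits = {
    "model_a": {"tapestry", "whispers"},
    "model_b": {"tapestry", "echoes"},
    "model_c": {"ledger"},
}
names, matrix = distance_matrix(traits)
print(choose_method(), names)
```

In the fallback case, a matrix like this could be condensed with `scipy.spatial.distance.squareform` and passed to `scipy.cluster.hierarchy.linkage` to produce a dendrogram in place of a parsimony tree.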