LLMmap by pasquini-dario

LLM fingerprinting for model identification

Created 2 years ago

372 stars


Project Summary

LLMmap is a tool for identifying Large Language Models (LLMs) from their behavioral traces, using a small number of targeted queries. It aims to provide high-accuracy LLM fingerprinting for researchers and developers. Version 0.2 is a rebuild in PyTorch with updated models and procedures.

How It Works

The core method sends a set of carefully constructed queries to a target LLM, with each query wrapped in several different prompt configurations. LLMmap analyzes the responses to produce a behavioral fingerprint of the model. It then identifies an unknown LLM by comparing that fingerprint against the templates of known models. The PyTorch implementation aims to make this identification both efficient and accurate.
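The steps described above can be illustrated with a minimal sketch in plain Python, not LLMmap's PyTorch code. First, every query is wrapped in every prompt configuration. Then an unknown model's fingerprint is ranked against known-model templates by cosine similarity. All of the following are hypothetical: the queries, the configurations, the toy vectors, and the names `build_probes`, `cosine` and `rank_models`. The sketch also assumes the responses have already been encoded into a fixed-length fingerprint vector. LLMmap's trained encoder, which does that work, is not reproduced here, and its actual comparison may differ.

```python
import itertools
import math

# Hypothetical probe set: placeholders for illustration, NOT LLMmap's real queries.
QUERIES = ["What model are you?", "Who created you?"]
# Hypothetical prompt configurations; "{q}" marks where the query is inserted.
CONFIGS = ["{q}", "You are a helpful assistant.\n{q}", "Answer briefly.\n{q}"]


def build_probes(queries, configs):
    """Wrap every query in every prompt configuration (Cartesian product)."""
    return [cfg.format(q=q) for q, cfg in itertools.product(queries, configs)]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_models(fingerprint, templates):
    """Rank known-model templates by similarity to an unknown fingerprint.

    Assumption: fingerprints and templates are already fixed-length vectors;
    the trained encoder that produces them from responses is not shown.
    """
    scores = [(name, cosine(fingerprint, vec)) for name, vec in templates.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    probes = build_probes(QUERIES, CONFIGS)
    print(len(probes))  # 6 (2 queries x 3 configurations)
    # Toy templates for two hypothetical models.
    templates = {"model-a": [1.0, 0.0, 0.2], "model-b": [0.1, 1.0, 0.0]}
    best_name, best_score = rank_models([0.9, 0.1, 0.3], templates)[0]
    print(best_name)  # model-a
```

Cosine similarity is used here because it compares the direction of fingerprint vectors regardless of their magnitude. It is a common choice for this kind of nearest-template matching, but it is an assumption in this sketch.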

Quick Start & Requirements

Installation: Install dependencies using pip install -r requirements.txt.
Prerequisites: Requires Python 3.11.
Pre-trained Model: A default, ready-to-use inference model is provided at ./data/pretrained_models/default/, including PyTorch weights, configuration, and behavioral templates for 52 LLMs.
Usage: LLMmap can be used programmatically from Python code or through an interactive script (main_interactive.py).
Documentation: A paper detailing the methodology is available.

Highlighted Details

Open-Set Fingerprinting: Allows extending the pre-trained model with new LLM templates without full retraining.
Custom Dataset Creation: The make_dataset.py script automates the generation of training and testing datasets by querying specified LLMs with configurable prompts and queries.
Accuracy Evaluation: Includes a script (test_model.py) to evaluate the top-k accuracy of pre-trained models against a curated list of supported LLMs.
PyTorch Rebuild (v0.2): The project has been re-implemented in PyTorch, featuring updated models and procedures.
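Open-set extension and top-k evaluation can be illustrated with a small sketch. It does not reproduce add_new_template.py or test_model.py. Instead, it assumes hypothetically that each template is the mean of a model's fingerprint vectors and that candidate models are ranked by cosine similarity. Under that assumption, enrolling a new LLM adds one entry and leaves the existing templates untouched. The functions `add_template` and `top_k_accuracy` and all data below are illustrative only.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def add_template(templates, name, fingerprints):
    """Open-set enrollment: store the element-wise mean of a new model's fingerprints.

    Assumption: a template is simply the average fingerprint. Existing
    templates are left untouched, so nothing is retrained.
    """
    n = len(fingerprints)
    templates[name] = [sum(column) / n for column in zip(*fingerprints)]


def top_k_accuracy(samples, templates, k):
    """Fraction of (fingerprint, true_name) pairs whose true model is among
    the k templates most similar to the fingerprint."""
    hits = 0
    for fingerprint, true_name in samples:
        ranked = sorted(
            templates,
            key=lambda name: cosine(fingerprint, templates[name]),
            reverse=True,
        )
        if true_name in ranked[:k]:
            hits += 1
    return hits / len(samples)


if __name__ == "__main__":
    # Toy templates and labeled test fingerprints for hypothetical models.
    templates = {"model-a": [1.0, 0.0], "model-b": [0.0, 1.0]}
    add_template(templates, "model-c", [[1.0, 1.0], [0.8, 1.2]])
    samples = [
        ([0.95, 0.05], "model-a"),
        ([0.1, 0.9], "model-b"),
        ([1.0, 1.05], "model-c"),
        ([0.2, 1.0], "model-c"),  # closest template is model-b, so top-1 misses it
    ]
    print(top_k_accuracy(samples, templates, k=1))  # 0.75
    print(top_k_accuracy(samples, templates, k=2))  # 1.0
```

Top-k accuracy counts a prediction as correct when the true model appears anywhere among the k highest-ranked templates. As a result, it can only stay the same or rise as k grows.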

Maintenance & Community

The project welcomes contributions to keep pace with the rapidly evolving LLM landscape. Collaboration inquiries can be sent to chime.infant_0g@icloud.com. A paper detailing the research is available.

Licensing & Compatibility

The provided README does not specify a software license. This absence may pose compatibility concerns for commercial use or integration into closed-source projects.

Limitations & Caveats

The PyTorch rebuild (v0.2) is not a one-to-one conversion of the previous version, so its models and procedures may differ from earlier releases. The add_new_template.py script, which extends the model to new LLMs, currently supports only Hugging Face LLMs; broader backend support is planned. The lack of a specified license is also a significant caveat.


Explore Similar Projects

LLM-Zoo by DAMO-NLP-SG

ToolkenGPT by Ber666

mlx-llm by riccardomusmeci

LLM-Interview-Code by ckd0817

Awesome-AIGC by wshzd

LLM-PowerHouse-A-Curated-Guide-for-Large-Language-Models-with-Custom-Training-and-Inferencing by ghimiresunil

bert4torch by Tongjilibo

how-to-train-your-gpt by raiyanyahya

instructlab by instructlab

one-small-step by karminski

Awesome-Chinese-LLM by AiHubCN

happy-llm by datawhalechina