LLMmap  by pasquini-dario

LLM fingerprinting for model identification

Created 1 year ago
251 stars

Top 99.8% on SourcePulse

Project Summary

LLMmap identifies Large Language Models (LLMs) by analyzing their behavioral traces in response to a small number of targeted queries. It is aimed at researchers and developers who need high-accuracy LLM fingerprinting. Version 0.2 is a rebuild in PyTorch with updated models and procedures.

How It Works

The core methodology involves sending a set of carefully constructed queries, each wrapped in diverse prompt configurations, to target LLMs. LLMmap analyzes the resulting responses to generate a unique behavioral fingerprint for each model. This fingerprint is then used to identify unknown LLMs by comparing their behavioral patterns against a database of known models. The PyTorch-based approach focuses on efficiency and accuracy in this identification process.
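The matching step described above can be illustrated with a toy sketch. This is not LLMmap's inference model: the hashed character-trigram embedding, the cosine-similarity ranking, and every name below (`embed`, `fingerprint`, `identify`, the `model-a`/`model-b` labels) are assumptions chosen only to show the shape of "responses in, fingerprint out, nearest known template wins".

```python
import hashlib
import math

DIM = 256  # size of the hashed feature vector (arbitrary choice for this sketch)


def embed(text: str) -> list[float]:
    """Hash character trigrams of one response into an L2-normalised vector."""
    vec = [0.0] * DIM
    lowered = text.lower()
    for i in range(len(lowered) - 2):
        digest = hashlib.md5(lowered[i:i + 3].encode("utf-8")).digest()
        vec[int.from_bytes(digest[:4], "big") % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def fingerprint(responses: list[str]) -> list[float]:
    """Average the per-response vectors into a single behavioural fingerprint."""
    vectors = [embed(r) for r in responses]
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def identify(unknown: list[float], templates: dict[str, list[float]]):
    """Rank known models by similarity to the unknown fingerprint, best first."""
    scores = {name: cosine(unknown, tmpl) for name, tmpl in templates.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical template database built from made-up responses.
templates = {
    "model-a": fingerprint(["I am sorry, I cannot help with that request."]),
    "model-b": fingerprint(["Sure! Here is a quick answer: 42."]),
}
unknown = fingerprint(["I am sorry, I cannot help with that."])
print(identify(unknown, templates))
```

In this toy design the database is just a dictionary, so adding a model means adding one entry. The real tool relies on a trained inference model and stored behavioral templates rather than raw text similarity.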

Quick Start & Requirements

  • Installation: Install dependencies using pip install -r requirements.txt.
  • Prerequisites: Requires Python 3.11.
  • Pre-trained Model: A default, ready-to-use inference model is provided at ./data/pretrained_models/default/, including PyTorch weights, configuration, and behavioral templates for 52 LLMs.
  • Usage: Can be used programmatically in Python code or via an interactive script (main_interactive.py).
  • Documentation: A paper detailing the methodology is available.
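Before running anything, the two prerequisites listed above can be checked with a short script. This is a hypothetical helper, not part of the repository; it assumes the script is run from the repository root and treats "Python 3.11" as an exact version match, which may be stricter than the project actually requires.

```python
import sys
from pathlib import Path

REQUIRED_PYTHON = (3, 11)  # taken from the prerequisites above
DEFAULT_MODEL_DIR = Path("./data/pretrained_models/default/")


def preflight(model_dir=DEFAULT_MODEL_DIR, version=sys.version_info[:2]):
    """Return a list of human-readable problems; an empty list means all good."""
    problems = []
    if tuple(version) != REQUIRED_PYTHON:
        problems.append(
            f"Python {version[0]}.{version[1]} found, "
            f"{REQUIRED_PYTHON[0]}.{REQUIRED_PYTHON[1]} expected"
        )
    if not Path(model_dir).is_dir():
        problems.append(f"pre-trained model directory not found: {model_dir}")
    return problems


# Report anything that looks off in the current environment.
for problem in preflight():
    print("warning:", problem)
```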

Highlighted Details

  • Open-Set Fingerprinting: Allows extending the pre-trained model with new LLM templates without full retraining.
  • Custom Dataset Creation: The make_dataset.py script automates the generation of training and testing datasets by querying specified LLMs with configurable prompts and queries.
  • Accuracy Evaluation: Includes a script (test_model.py) to evaluate the top-k accuracy of pre-trained models against a curated list of supported LLMs.
  • PyTorch Rebuild (v0.2): The project has been re-implemented in PyTorch, featuring updated models and procedures.
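For readers unfamiliar with the top-k accuracy metric mentioned above, here is a minimal sketch of how it is commonly computed. It is not the code in test_model.py; the function name and input layout (one ranked candidate list per test sample) are assumptions made for illustration.

```python
def top_k_accuracy(ranked_predictions, true_labels, k):
    """Fraction of samples whose true label is among the first k ranked guesses."""
    if k < 1:
        raise ValueError("k must be at least 1")
    if len(ranked_predictions) != len(true_labels):
        raise ValueError("predictions and labels must have the same length")
    if not true_labels:
        return 0.0
    hits = sum(
        1
        for ranked, truth in zip(ranked_predictions, true_labels)
        if truth in ranked[:k]
    )
    return hits / len(true_labels)


# Made-up rankings for three samples whose true model is always "a".
rankings = [["a", "b", "c"], ["b", "a", "c"], ["c", "b", "a"]]
labels = ["a", "a", "a"]
for k in (1, 2, 3):
    print(k, top_k_accuracy(rankings, labels, k))
```

A prediction counts as correct at a given k as soon as the true model appears anywhere in the first k candidates, so the score can only stay the same or rise as k grows.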

Maintenance & Community

The project welcomes contributions to help it keep pace with the rapidly evolving LLM landscape. Collaboration inquiries can be sent to chime.infant_0g@icloud.com. A paper detailing the research is available.

Licensing & Compatibility

The provided README does not specify a software license. This absence may pose compatibility concerns for commercial use or integration into closed-source projects.

Limitations & Caveats

The PyTorch rebuild (v0.2) is not a one-to-one port of previous versions, so its models and procedures may differ from earlier releases. The add_new_template.py script, used to extend the model with new LLMs, currently supports only Hugging Face LLMs; support for other backends is planned. The lack of a specified license is a significant caveat.

Health Check

  • Last Commit: 8 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 51 stars in the last 30 days
