llm-sp by chawins

LLM security/privacy resources: papers, tools, datasets, blogs

Created 2 years ago

556 stars

Top 57.5% on SourcePulse

Project Summary

This repository compiles papers and resources on the security and privacy of Large Language Models (LLMs). It serves as a curated reference for researchers and practitioners entering or working within this nascent field, offering quick access to key studies on vulnerabilities, attacks, and defenses.

How It Works

The collection is organized by primary contribution, covering areas like prompt injection, jailbreaking, adversarial attacks, privacy concerns (data extraction, membership inference), watermarking, and LLM security in general. Papers are tagged with symbols indicating their focus (e.g., ⭐ for personally recommended, 💽 for datasets, 👤 for PII focus). The curator emphasizes that ⭐ is a subjective indicator of personal understanding and enjoyment, not a measure of paper quality.
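The tagging scheme described above can be illustrated with a minimal sketch. It assumes a hypothetical entry format (a paper title followed by zero or more tag symbols on one line). The repository's actual layout may differ, and the names `TAGS`, `parse_entry`, and `filter_by_tag` are illustrative only, not part of the project.

```python
# Mapping from the tag symbols mentioned in the summary to readable names.
# The readable names are assumptions made for this example.
TAGS = {
    "⭐": "recommended",
    "💽": "dataset",
    "👤": "pii",
}


def parse_entry(line: str) -> tuple[str, set[str]]:
    """Split a list entry into its title and the set of tag names it carries."""
    found = {name for symbol, name in TAGS.items() if symbol in line}
    title = line
    for symbol in TAGS:
        title = title.replace(symbol, "")
    return title.strip(), found


def filter_by_tag(lines: list[str], tag: str) -> list[str]:
    """Return the titles of all entries that carry the given tag name."""
    return [title for title, tags in map(parse_entry, lines) if tag in tags]


# Hypothetical entries, not real papers from the list.
entries = [
    "Example Paper A ⭐ 💽",
    "Example Paper B 👤",
    "Example Paper C",
]

print(filter_by_tag(entries, "dataset"))
```

Because ⭐ reflects the curator's personal preference rather than paper quality, filtering on it is a way to find the curator's picks, not a ranking.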

Quick Start & Requirements

Access: The primary resource is the GitHub repository itself, with a Notion page serving as a more frequently updated source.
Requirements: No specific software installation is required to browse the curated list of papers.

Highlighted Details

Extensive coverage of prompt injection techniques, including indirect injection and remote code execution (RCE) vulnerabilities in LLM-integrated applications.
Detailed categorization of jailbreak methods, from simple prompt manipulation to complex automated attacks and persona modulation.
In-depth exploration of privacy risks, including data extraction, membership inference, and PII leakage, with a focus on empirical studies and attack methodologies.
Analysis of adversarial attacks, including white-box, black-box, and transfer attacks, with a growing focus on post-BERT era techniques.
Resources on watermarking LLM outputs for detection, and on defenses against various adversarial manipulations.

Maintenance & Community

The repository is maintained by the author, and contributions are welcome via GitHub issues or pull requests. The author also manually syncs updates between the repository and the Notion page.

Licensing & Compatibility

No license is confirmed in this summary. The repository is presumed to use a permissive license (e.g., MIT, Apache 2.0) because it is a curated list of research papers, but this should be checked in the repository itself. Each referenced paper is licensed according to its original publication.

Limitations & Caveats

The curator notes that the paper selection is biased toward their own research interests, so the overview may be incomplete. The boundaries between prompt injection, jailbreaking, and adversarial attacks are fluid, and some papers fit more than one category.

Explore Similar Projects

aegis by automorphic-ai

Awesome-ML-SP-Papers by gnipping

prompt-hacker-collections by yunwei37

llm-adaptive-attacks by tml-epfl

Prompt-Hacking-Resources by PromptLabs

ps-fuzz by prompt-security

Awesome-Jailbreak-on-LLMs by yueliu1999

JailbreakingLLMs by patrickrchao

Awesome-LM-SSP by CryptoAILab

awesome-llm-security by corca-ai

llm-guard by protectai

PurpleLlama by meta-llama