This repository is a curated reading list on the trustworthiness of large models (LMs), covering safety, security, and privacy, with a special emphasis on multi-modal LMs. It gives researchers and practitioners a structured collection of papers, toolkits, and surveys.
How It Works
The project maintains a manually collected list of resources, primarily academic papers, sorted into sub-topics within LM safety, security, and privacy. Categories include jailbreaking, prompt injection, adversarial examples, data privacy, and copyright, among others, so readers can go straight to the literature on a specific topic.
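As a rough illustration of working with a categorized list like this, the sketch below counts bullet entries under each heading of a Markdown document. It assumes a layout of `#`-style headings followed by `-` or `*` bullets, which may not match the repository's actual README; the function name `count_entries_by_heading` and the sample text are hypothetical.

```python
import re
from collections import Counter


def count_entries_by_heading(markdown_text: str) -> Counter:
    """Count bullet entries under each Markdown heading.

    Assumes (hypothetically) that categories are '#'-style headings and
    that each resource is a '-' or '*' bullet beneath its category.
    """
    counts = Counter()
    current = None  # heading the following bullets belong to
    for line in markdown_text.splitlines():
        heading = re.match(r"^#{1,6}\s+(.*\S)\s*$", line)
        if heading:
            current = heading.group(1)
            continue
        # Count a bullet only once a heading has been seen.
        if current is not None and re.match(r"^\s*[-*]\s+\S", line):
            counts[current] += 1
    return counts


# Made-up sample input, not taken from the repository.
SAMPLE = """\
## Jailbreak
- Paper A
- Paper B
## Prompt Injection
- Paper C
"""

counts = count_entries_by_heading(SAMPLE)
for category, n in counts.items():
    print(f"{category}: {n}")
```

A tally like this could serve as a quick sanity check on per-category totals, though the real list may use tables or nested headings that this sketch does not handle.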
Quick Start & Requirements
- No installation or execution required; it's a curated list.
- Access to the internet to view linked resources.
- Official quick-start guides, docs, and demos: none provided, since this is a reading list.
Highlighted Details
- Comprehensive collection of over 1500 papers, with significant additions from major conferences like ACL, S&P, and ICLR.
- Detailed categorization across Safety (862 papers), Security (261 papers), and Privacy (474 papers), with sub-categories like Jailbreak, Copyright, and Membership Inference Attacks.
- Includes links to toolkits, competitions, and surveys related to LM trustworthiness.
- Actively updated with recent papers from academic venues.
Maintenance & Community
- Maintained by a team of organizers including Tianshuo Cong, Xinlei He, Zhengyu Zhao, Yugeng Liu, and Delong Ran.
- Inspired by several other "Awesome" lists in the LLM security and privacy space.
- Welcomes contributions via pull requests or issues.
- Community channels (Discord/Slack), social handles, and roadmap: none provided.
Licensing & Compatibility
- License: Not explicitly stated in the README.
- Commercial use and closed-source linking: because the repository is a reading list, direct compatibility concerns are minimal, but users should verify the licenses of linked resources.
Limitations & Caveats
The repository is explicitly marked as "in progress" and relies on manual collection, so some papers may be missing and some categorization choices are debatable. Licenses for linked external resources are not consolidated.