VulnLLM-R by ucsb-mlsec

Specialized LLM for code vulnerability detection

Created 7 months ago

355 stars

Top 78.4% on SourcePulse

Project Summary

VulnLLM-R is a specialized Large Language Model (LLM) for automated vulnerability detection in code, trained to reason about a program before it gives a verdict. It aims to improve the accuracy and efficiency of identifying security flaws, and it targets software engineers and security researchers.

How It Works

The project fine-tunes LLMs for vulnerability detection with an emphasis on "specialized reasoning." It builds its datasets by merging and processing sources such as PrimeVul, SecCodePLT, Juliet, Sven, and Arvo. A key step is generating reasoning chains with other LLMs (e.g., DeepSeek-R1, QwQ) and refining them into training data. Training uses Supervised Fine-Tuning (SFT) and Direct Preference Optimization (DPO).
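The following is a minimal sketch of how teacher-generated reasoning chains could feed both training stages. It is not taken from the repository: the `Chain` schema, the `build_training_records` helper, and the rule "keep a chain only if its verdict matches the ground-truth label" are assumptions made for illustration. The `dpo_loss` function is the standard per-pair DPO objective, written with plain floats instead of tensors.

```python
import math
from dataclasses import dataclass


@dataclass
class Chain:
    """One sampled reasoning chain (hypothetical schema, not the repo's)."""
    text: str      # free-form reasoning produced by a teacher model
    verdict: bool  # the chain's final answer: True means "vulnerable"


def build_training_records(prompt, label, chains):
    """Split sampled chains by whether their verdict matches the label.

    Returns (sft_records, dpo_pairs). Chains with the correct verdict become
    SFT targets; each correct chain is paired with each incorrect one for DPO.
    This filtering rule is an assumption, not the project's documented method.
    """
    good = [c for c in chains if c.verdict == label]
    bad = [c for c in chains if c.verdict != label]
    sft = [{"prompt": prompt, "response": c.text} for c in good]
    pairs = [
        {"prompt": prompt, "chosen": g.text, "rejected": b.text}
        for g in good
        for b in bad
    ]
    return sft, pairs


def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Standard DPO loss for one pair, given sequence log-probabilities
    under the policy and under a frozen reference model."""
    margin = beta * (
        (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    )
    # -log(sigmoid(margin)), computed as a numerically stable softplus(-margin).
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

The loss falls as the policy raises the chosen chain's log-probability relative to the reference model and lowers the rejected chain's. In practice a training framework computes this over batches of token log-probabilities; the scalar version only shows the shape of the objective.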

Quick Start & Requirements

Clone the repository, create and activate a Conda environment with Python 3.11, then install the dependencies:

    pip install -e . -e ./vulscan/train/LLaMA-Factory -e ./vulscan/model_zoo

Reproducing the results requires specific dataset paths and model configurations. The paper is available as arXiv:2512.07533. The test commands use flags such as --tp 2 and --vllm, which suggests GPU-backed inference, possibly across more than one GPU. The README mentions a web demo but gives no URL for it.

Highlighted Details

Curated Datasets: Integrates and processes multiple vulnerability datasets (PrimeVul, SecCodePLT, Juliet, Sven, Arvo) into structured "clean" and "noisy" training/testing sets.
Reasoning Data Generation: Uses reasoning chains generated by models such as DeepSeek-R1 and QwQ as training data to strengthen the model's analysis.
Reproducibility Focus: Provides detailed scripts and commands for reproducing benchmark results, generating comparison plots, and testing various model types.
Model Specialization: Offers a 7B parameter model (VulnLLM-R-7B) specifically fine-tuned for vulnerability detection.
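As an illustration of the dataset curation step listed above, here is a minimal sketch of merging records from several sources into deduplicated "clean" and "noisy" sets. The record schema, the whitespace-based normalization, and the idea of assigning the clean or noisy bucket per source are all assumptions; the README does not describe how the project draws that line. The source names `source_a` and `source_b` are placeholders, not real datasets.

```python
import hashlib


def normalize(code):
    """Collapse whitespace so trivially reformatted copies hash the same."""
    return " ".join(code.split())


def merge_datasets(records, clean_sources):
    """Merge records from several sources into deduplicated clean/noisy sets.

    Each record is a dict with "source", "code", and "label" keys
    (a hypothetical schema). The first occurrence of a snippet wins.
    """
    seen = set()
    merged = {"clean": [], "noisy": []}
    for rec in records:
        digest = hashlib.sha256(normalize(rec["code"]).encode("utf-8")).hexdigest()
        if digest in seen:
            continue  # duplicate snippet already kept from an earlier record
        seen.add(digest)
        # Assumption: label reliability is decided per source.
        bucket = "clean" if rec["source"] in clean_sources else "noisy"
        merged[bucket].append({**rec, "id": digest[:12]})
    return merged


records = [
    {"source": "source_a", "code": "int f() { return 0; }", "label": 0},
    {"source": "source_b", "code": "int  f() {  return 0; }", "label": 0},
    {"source": "source_b", "code": "strcpy(dst, src);", "label": 1},
]
merged = merge_datasets(records, clean_sources={"source_a"})
print(len(merged["clean"]), len(merged["noisy"]))  # prints "1 1"
```

The second record differs from the first only in spacing, so it is dropped as a duplicate. Deduplicating before the train/test split matters for vulnerability benchmarks, because the same function often appears in more than one source dataset.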

Maintenance & Community

The arXiv paper lists the authors as Yuzhou Nie, Hongwei Li, Chengquan Guo, Ruizhe Jiang, Zhun Wang, Bo Li, Dawn Song, and Wenbo Guo. The README does not mention community channels (Discord/Slack), active forums, or a roadmap.

Licensing & Compatibility

The README does not specify a license or give compatibility notes for commercial use or closed-source linking, which may block adoption.

Limitations & Caveats

The project's license is unspecified. The URL for the mentioned web demo is missing. Running tests with commercial models requires users to provide their own API keys. Setup involves multi-step dataset processing and environment configuration.
