AICGSecEval by Tencent

AI code security evaluation benchmark

Created 6 months ago

1,097 stars

Top 34.7% on SourcePulse

Project Summary

A.S.E (AI Code Generation Security Evaluation) is a framework for repository-level security assessment of AI-generated code. It gives researchers and engineers a benchmark that simulates real-world development workflows and uses real CVE vulnerabilities to evaluate how secure LLM-generated code is.

How It Works

A.S.E simulates AI IDEs by evaluating LLM code generation within real GitHub repositories, offering context beyond fragment-level analysis. Its design prioritizes security-sensitive scenarios derived from expert-selected CVEs, employing dual code mutation to mitigate data leakage risks. The framework assesses LLMs across code security, project compatibility, and generation stability.

Quick Start & Requirements

Installation: Install dependencies via pip install -r requirements.txt. Docker is recommended for environment checks.
Prerequisites: Python 3.11 or higher, access to an LLM API (with an API key), and a GitHub access token.
Hardware: At least 50 GB of disk space; 16 GB of RAM recommended.
Execution: Run python invoke.py, passing the model and API parameters.
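
The requirements above can be checked before starting a long evaluation run. The following is a minimal sketch, not part of A.S.E itself: the function name check_prerequisites and the environment variable names LLM_API_KEY and GITHUB_TOKEN are hypothetical placeholders, since the summary does not say how the project reads its credentials. RAM is not checked because the Python standard library has no portable way to query it.

```python
import os
import shutil
import sys

MIN_PYTHON = (3, 11)        # "Python 3.11 or higher"
MIN_FREE_DISK_GB = 50       # "Minimum 50GB disk space"
# Hypothetical variable names; the project may expect different ones.
REQUIRED_ENV_VARS = ("LLM_API_KEY", "GITHUB_TOKEN")


def check_prerequisites(path=".", env=None, python_version=None, free_bytes=None):
    """Return a list of problems; an empty list means every check passed.

    The optional arguments exist so the checks can be exercised without
    touching the real interpreter version, disk, or environment.
    """
    env = os.environ if env is None else env
    if python_version is None:
        python_version = sys.version_info[:2]
    if free_bytes is None:
        free_bytes = shutil.disk_usage(path).free

    problems = []
    if tuple(python_version[:2]) < MIN_PYTHON:
        found = ".".join(str(part) for part in python_version[:2])
        problems.append(f"Python 3.11+ is required, found {found}")

    free_gb = free_bytes / 1024**3
    if free_gb < MIN_FREE_DISK_GB:
        problems.append(
            f"{MIN_FREE_DISK_GB} GB of free disk is required, found {free_gb:.1f} GB"
        )

    for name in REQUIRED_ENV_VARS:
        if not env.get(name):
            problems.append(f"environment variable {name} is not set")
    return problems
```

Calling check_prerequisites() with no arguments inspects the current machine; printing the returned list shows what still needs to be set up.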

Highlighted Details

Repository-level Scenarios: Mimics AI IDE workflows using full GitHub projects for context.
Security-Sensitive Design: Tasks based on real CVEs with expert-defined rules and dual code mutation for data leakage mitigation.
Multi-dimensional Assessment: Evaluates code security, quality, and generation stability across 5 languages and 4 vulnerability types.

Maintenance & Community

Developed by Tencent Security Platform Department's WuKong Code Security Team with academic partners. Community contributions are welcomed via GitHub Issues/Pull Requests; collaboration is open via security@tencent.com or WeChat.

Licensing & Compatibility

Licensed under the permissive Apache-2.0 License, suitable for commercial and closed-source projects.

Limitations & Caveats

Code context extraction currently relies on BM25; more advanced retrieval algorithms are planned. Evaluation runs are time-consuming. The project is at version 1.0 and still under active development.
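
To illustrate what BM25-based context extraction involves, here is a minimal, self-contained sketch of BM25 ranking over code snippets. It is not A.S.E's implementation: the function names tokenize and bm25_rank, the tokenizer, and the parameter defaults k1=1.5 and b=0.75 (common textbook values) are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it on non-alphanumeric characters."""
    return [tok for tok in re.split(r"[^A-Za-z0-9]+", text.lower()) if tok]


def bm25_rank(query, documents, k1=1.5, b=0.75):
    """Return (document index, score) pairs sorted by descending BM25 score."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    avg_len = sum(len(doc) for doc in tokenized) / n_docs if n_docs else 0.0

    # Document frequency: in how many documents each term appears.
    doc_freq = Counter()
    for doc in tokenized:
        doc_freq.update(set(doc))

    query_terms = tokenize(query)
    scores = []
    for idx, doc in enumerate(tokenized):
        term_freq = Counter(doc)
        score = 0.0
        for term in query_terms:
            if term not in term_freq:
                continue
            # Smoothed IDF that stays non-negative for very common terms.
            idf = math.log(
                1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5)
            )
            # Term frequency saturation with document length normalization.
            tf = term_freq[term]
            denom = tf + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf * (k1 + 1) / denom
        scores.append((idx, score))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


snippets = [
    "def sanitize_path(user_path): return os.path.normpath(user_path)",
    "def render_template(name): return env.get_template(name)",
    "class UserSession: pass",
]
ranked = bm25_rank("sanitize user path", snippets)
```

Because BM25 matches on exact tokens, a snippet that uses different identifiers for the same concept scores zero, which is one reason a project might plan to move to more advanced retrieval.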

