AICGSecEval by Tencent

AI code security evaluation benchmark

Created 3 months ago
427 stars

Top 69.4% on SourcePulse

Project Summary

A.S.E (AI Code Generation Security Evaluation) is a framework for repository-level security assessment of AI-generated code. It gives researchers and engineers a realistic benchmark that simulates real-world development workflows and uses real CVEs to evaluate the security of code produced by LLMs.

How It Works

A.S.E simulates AI IDEs by having LLMs generate code inside real GitHub repositories, so models work with full project context rather than isolated code fragments. Its tasks focus on security-sensitive scenarios derived from expert-selected CVEs, and it applies dual code mutation to reduce the risk of data leakage (models having seen the original vulnerable code during training). The framework assesses LLMs on code security, project compatibility, and generation stability.
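The page does not describe how A.S.E's dual code mutation works. To illustrate the general idea only (this is not A.S.E's implementation), one well-known semantics-preserving mutation is consistent identifier renaming: the code behaves the same, but its surface text no longer matches what a model may have memorized. The class name `RenameIdentifiers`, the helper `mutate`, and the `v0, v1, ...` naming scheme are hypothetical choices for this sketch, built on Python's standard `ast` module.

```python
import ast


class RenameIdentifiers(ast.NodeTransformer):
    """Consistently rename parameters and assigned names to neutral ones (v0, v1, ...).

    Pass 1 (in __init__) collects every parameter and every name that is
    assigned somewhere; pass 2 (visit) rewrites all uses of those names.
    Builtins, attributes, and function names are left untouched.
    Not handled in this sketch: keyword-argument call sites, global/nonlocal
    statements, and source that already uses names like v0.
    """

    def __init__(self, tree):
        self.mapping = {}
        for node in ast.walk(tree):
            if isinstance(node, ast.arg):
                self._assign(node.arg)
            elif isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
                self._assign(node.id)

    def _assign(self, old):
        # First occurrence of a name gets the next neutral identifier.
        if old not in self.mapping:
            self.mapping[old] = f"v{len(self.mapping)}"

    def visit_arg(self, node):
        node.arg = self.mapping.get(node.arg, node.arg)
        return self.generic_visit(node)  # also rewrite names inside annotations

    def visit_Name(self, node):
        node.id = self.mapping.get(node.id, node.id)
        return node


def mutate(source):
    """Return a behaviorally equivalent variant of `source` with renamed identifiers."""
    tree = ast.parse(source)
    tree = RenameIdentifiers(tree).visit(tree)
    return ast.unparse(tree)


if __name__ == "__main__":
    snippet = (
        "def read_config(path):\n"
        "    handle = open(path)\n"
        "    data = handle.read()\n"
        "    handle.close()\n"
        "    return data\n"
    )
    print(mutate(snippet))
```

Running it prints `read_config` with its parameter and locals renamed to `v0`, `v1`, `v2`, while `open` and the `.read()`/`.close()` attributes stay intact. Collecting names in a separate first pass keeps renaming consistent even when a name is read before its assignment appears in traversal order (as in list comprehensions).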

Quick Start & Requirements

  • Installation: Install dependencies via pip install -r requirements.txt. Docker is recommended for environment checks.
  • Prerequisites: Python 3.11 or higher. Requires LLM API access with an API key and a GitHub access token.
  • Hardware: At least 50 GB of disk space; 16 GB of RAM recommended.
  • Execution: Run python invoke.py, passing the target model and API parameters.
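Before a long evaluation run, it can help to verify the stated prerequisites up front. The sketch below checks the Python version, free disk space, whether `docker` is on the PATH, and whether a token is present in the environment. The function name `check_environment` and the `GITHUB_TOKEN` variable name are hypothetical assumptions for this example; check the project README for how invoke.py actually receives the GitHub token and LLM API key. RAM is not checked here because the standard library has no portable way to read it.

```python
import os
import shutil
import sys

MIN_PYTHON = (3, 11)     # stated prerequisite
MIN_FREE_DISK_GB = 50    # stated hardware minimum


def check_environment(path=".", token_var="GITHUB_TOKEN"):
    """Return a list of human-readable problems; an empty list means all checks passed.

    token_var is a hypothetical environment variable name used only in this sketch.
    """
    problems = []
    if sys.version_info < MIN_PYTHON:
        found = ".".join(map(str, sys.version_info[:3]))
        problems.append(f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]}+ required, found {found}")
    free_gb = shutil.disk_usage(path).free / 1024**3
    if free_gb < MIN_FREE_DISK_GB:
        problems.append(f"{free_gb:.1f} GB free at {path!r}; at least {MIN_FREE_DISK_GB} GB needed")
    if shutil.which("docker") is None:
        problems.append("docker not found on PATH (recommended for environment checks)")
    if not os.environ.get(token_var):
        problems.append(f"environment variable {token_var} is not set")
    return problems


if __name__ == "__main__":
    issues = check_environment()
    for issue in issues:
        print("WARNING:", issue)
    if not issues:
        print("All prerequisite checks passed.")
```

The checks only report problems rather than exiting, so the function can also be called from a larger setup script.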

Highlighted Details

  • Repository-level Scenarios: Mimics AI IDE workflows using full GitHub projects for context.
  • Security-Sensitive Design: Tasks based on real CVEs with expert-defined rules and dual code mutation for data leakage mitigation.
  • Multi-dimensional Assessment: Evaluates code security, quality, and generation stability across 5 languages and 4 vulnerability types.
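The page lists the evaluation dimensions but not how they are scored. As a hypothetical sketch only, assuming each task is generated several times and each run yields a security verdict and a build result, the dimensions could be aggregated as below. The `RunResult` fields, the `summarize` function, and the definition of stability (all runs of a task agree on the security verdict) are illustrative assumptions, not A.S.E's actual metrics.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class RunResult:
    # Hypothetical per-run record; field names are illustrative, not A.S.E's schema.
    task_id: str
    language: str
    secure: bool   # generated code passed the vulnerability check
    builds: bool   # repository still builds/integrates with the generated code


def _rate(flags):
    """Fraction of True values; 0.0 for an empty sequence."""
    flags = list(flags)
    return sum(flags) / len(flags) if flags else 0.0


def summarize(results, language=None):
    """Aggregate runs into three rates, optionally restricted to one language.

    'stability' is an assumed definition: the share of tasks whose repeated
    runs all reach the same security verdict.
    """
    if language is not None:
        results = [r for r in results if r.language == language]
    verdicts_by_task = defaultdict(list)
    for r in results:
        verdicts_by_task[r.task_id].append(r.secure)
    return {
        "security": _rate(r.secure for r in results),
        "quality": _rate(r.builds for r in results),
        "stability": _rate(len(set(v)) == 1 for v in verdicts_by_task.values()),
    }


if __name__ == "__main__":
    runs = [
        RunResult("t1", "python", True, True),
        RunResult("t1", "python", True, False),
        RunResult("t2", "java", False, True),
        RunResult("t2", "java", True, True),
    ]
    print(summarize(runs))
    print(summarize(runs, language="python"))
```

Keeping the per-language filter in `summarize` makes it easy to break results down across the benchmark's languages without changing the metric definitions.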

Maintenance & Community

Developed by the WuKong Code Security Team of Tencent's Security Platform Department together with academic partners. Community contributions are welcome via GitHub Issues and Pull Requests; for collaboration, contact security@tencent.com or reach out via WeChat.

Licensing & Compatibility

Licensed under the permissive Apache-2.0 License, suitable for commercial and closed-source projects.

Limitations & Caveats

Code context is currently extracted with BM25; more advanced retrieval algorithms are planned. Evaluation runs are time-consuming, and the project is at version 1.0 and still under active development.
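For readers unfamiliar with BM25, here is a minimal Okapi BM25 ranker that scores repository code chunks against a query, which is the kind of retrieval the page says A.S.E uses for code context. Everything else is an assumption for this sketch, not A.S.E's code: splitting the repository into chunks, the identifier tokenizer (lowercased, with snake_case split into parts), the common default parameters k1=1.5 and b=0.75, and the non-negative idf variant used by Lucene.

```python
import math
import re
from collections import Counter

IDENT_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def tokenize(text):
    """Lowercased identifier tokens, with snake_case split into its parts."""
    tokens = []
    for ident in IDENT_RE.findall(text):
        tokens.extend(part.lower() for part in ident.split("_") if part)
    return tokens


class BM25:
    def __init__(self, docs, k1=1.5, b=0.75):
        self.k1, self.b = k1, b
        self.doc_tf = [Counter(tokenize(d)) for d in docs]
        self.doc_len = [sum(tf.values()) for tf in self.doc_tf]
        n = len(docs)
        self.avgdl = sum(self.doc_len) / n if n else 0.0
        df = Counter()
        for tf in self.doc_tf:
            df.update(tf.keys())  # document frequency: count each term once per doc
        # Lucene-style idf, always positive: log(1 + (N - n_t + 0.5) / (n_t + 0.5))
        self.idf = {t: math.log(1 + (n - c + 0.5) / (c + 0.5)) for t, c in df.items()}

    def score(self, query, i):
        """BM25 score of document i for the given query string."""
        tf = self.doc_tf[i]
        length_ratio = self.doc_len[i] / self.avgdl if self.avgdl else 0.0
        norm = self.k1 * (1 - self.b + self.b * length_ratio)
        total = 0.0
        for term in tokenize(query):
            f = tf.get(term, 0)
            if f:
                total += self.idf[term] * f * (self.k1 + 1) / (f + norm)
        return total

    def top_k(self, query, k=3):
        """Indices of the k highest-scoring documents (ties keep input order)."""
        order = sorted(range(len(self.doc_tf)), key=lambda i: self.score(query, i), reverse=True)
        return order[:k]


if __name__ == "__main__":
    chunks = [
        "def parse_sql_query(sql): return cursor.execute(sql)",
        "def render_template(name): return html",
        "def hash_password(pw): return bcrypt.hashpw(pw)",
    ]
    index = BM25(chunks)
    print(index.top_k("execute sql query", k=1))
```

The selected chunks would then be placed in the model's prompt as surrounding context. BM25 matches only on shared tokens, which is one reason a purely lexical retriever can miss semantically related code.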

Health Check

  • Last Commit: 23 hours ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 34
  • Issues (30d): 25
  • Star History: 142 stars in the last 30 days
