web-codegen-scorer by angular

Tool for evaluating LLM-generated web code quality

Created 2 months ago
484 stars

Top 63.5% on SourcePulse

Project Summary

Web Codegen Scorer addresses the need for empirical evaluation of AI-generated web code. It gives developers and researchers a way to make data-driven decisions about LLM-generated code through systematic testing and comparison, supporting prompt iteration, model comparison, and quality monitoring in web development workflows.

How It Works

The scorer targets web code specifically and applies well-established quality metrics. Users configure an evaluation by choosing the LLM, framework, and tools, supplying system instructions, and optionally integrating MCP servers. Built-in checks cover build success, runtime errors, accessibility, security, LLM-based ratings, and coding best practices, and the tool can attempt automatic repair of detected issues. A report viewer UI visualizes and compares evaluation results.
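As a rough sketch, the end-to-end workflow looks like this once the CLI is installed (see Quick Start below); "my-eval" is a hypothetical environment name used for illustration:

    # Scaffold a custom evaluation; this generates an environment config
    # in which the model, framework, system instructions, and any MCP
    # servers are specified (the config details are assumptions here).
    web-codegen-scorer init

    # Run the evaluation against the configured environment
    # ("my-eval" is a placeholder, not a built-in environment).
    web-codegen-scorer eval --env=my-eval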

Quick Start & Requirements

Installation is via npm: npm install -g web-codegen-scorer. Setup requires exporting API keys for LLM providers (e.g., GEMINI_API_KEY, OPENAI_API_KEY) as environment variables. A basic evaluation can be run with web-codegen-scorer eval --env=angular-example. Custom evaluations are initiated with web-codegen-scorer init. For local development, pnpm install is required, followed by commands like pnpm run eval.
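Putting the documented commands together, a minimal session looks roughly like this (the API key values are placeholders):

    # Install the CLI globally
    npm install -g web-codegen-scorer

    # Provide API keys for the LLM providers you plan to use
    export GEMINI_API_KEY="<your-gemini-key>"
    export OPENAI_API_KEY="<your-openai-key>"

    # Run the bundled example evaluation
    web-codegen-scorer eval --env=angular-example

    # For local development of the scorer itself
    pnpm install
    pnpm run eval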

Highlighted Details

  • Comprehensive built-in checks: build success, runtime errors, accessibility, security, LLM rating, and coding best practices.
  • Automatic issue repair functionality for detected code problems.
  • Supports any web library, framework, or LLM, not limited to Angular or Google models.
  • Configurable with custom Retrieval-Augmented Generation (RAG) endpoints.
  • Features an intuitive report viewer UI for results analysis.

Maintenance & Community

Developed by the Angular team at Google, the project has a roadmap for expanding checks, including interaction testing, Core Web Vitals measurement, and evaluation of LLM edits on existing codebases. No specific community channels (e.g., Discord, Slack) or direct social handles are mentioned in the README.

Licensing & Compatibility

The provided README does not specify the software license. Users should verify licensing terms before adoption, especially concerning commercial use or integration with closed-source projects.

Limitations & Caveats

The tool is actively evolving: its current checks are not exhaustive, and features such as interaction testing and additional built-in checks and testing scenarios are planned for future releases.

Health Check

Last Commit: 23 hours ago
Responsiveness: Inactive
Pull Requests (30d): 57
Issues (30d): 2

Star History: 134 stars in the last 30 days

Explore Similar Projects

Starred by Andrej Karpathy (Founder of Eureka Labs; formerly at Tesla, OpenAI; author of CS 231n), Travis Fischer (Founder of Agentic), and 6 more.

AlphaCodium by Codium-ai

Top 0.1% on SourcePulse · 4k stars
Code generation research paper implementation
Created 1 year ago · Updated 11 months ago