Web Codegen Scorer: a tool for evaluating LLM-generated web code quality
Top 63.5% on SourcePulse
Web Codegen Scorer provides empirical evaluation of AI-generated web code quality. By offering systematic testing and comparison capabilities, it lets developers and researchers make data-driven decisions about LLM-generated code, and it supports prompt iteration, model comparison, and quality monitoring in web development workflows.
How It Works
The scorer focuses on web code, employing well-established quality metrics. It allows users to configure evaluations with various LLMs, frameworks, and tools, specifying system instructions and integrating with MCP servers. Built-in checks cover build success, runtime errors, accessibility, security, LLM ratings, and coding best practices, with automatic issue repair capabilities. An intuitive UI visualizes and compares evaluation results.
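As a rough sketch of that workflow (the commands come from the Quick Start section below; the environment name and the checks listed in the comments are illustrative, not an exhaustive or authoritative list):

```sh
# Scaffold a custom evaluation environment (prompt, framework choice, model,
# system instructions, MCP server integration).
web-codegen-scorer init

# Run the evaluation; built-in checks such as build success, runtime errors,
# accessibility, security, LLM rating, and coding best practices run against
# the generated code, and results can be explored in the comparison UI.
web-codegen-scorer eval --env=my-custom-env   # "my-custom-env" is a placeholder
```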
Quick Start & Requirements
Installation is via npm: npm install -g web-codegen-scorer. Setup requires exporting API keys for LLM providers (e.g., GEMINI_API_KEY, OPENAI_API_KEY) as environment variables. A basic evaluation can be run with web-codegen-scorer eval --env=angular-example. Custom evaluations are initiated with web-codegen-scorer init. For local development, pnpm install is required, followed by commands like pnpm run eval.
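Collected into one sequence (the provider key names are from the text above; the key values are placeholders):

```sh
# Install the CLI globally.
npm install -g web-codegen-scorer

# Export API keys for the LLM providers you want to evaluate.
export GEMINI_API_KEY="<your-gemini-key>"
export OPENAI_API_KEY="<your-openai-key>"

# Run the bundled example evaluation.
web-codegen-scorer eval --env=angular-example

# For local development of the scorer itself:
pnpm install
pnpm run eval
```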
Maintenance & Community
Developed by the Angular team at Google, the project has a roadmap for expanding checks, including interaction testing, Core Web Vitals measurement, and evaluating LLM edits on existing codebases. No community channels (e.g., Discord, Slack) or social handles are mentioned in the README.
Licensing & Compatibility
The README does not specify a software license. Users should verify licensing terms before adoption, especially for commercial use or integration with closed-source projects.
Limitations & Caveats
The tool is actively evolving: its current checks are not exhaustive, and additional built-in checks and testing scenarios, such as interaction testing, are planned for future releases.