next-evals-oss by vercel

AI agentic framework for Next.js coding evaluations

Created 6 months ago
257 stars

Top 98.3% on SourcePulse

Project Summary

This repository provides an automated framework for evaluating AI model competency on Next.js coding tasks, covering Next.js versions up to 15.5.6. It lets developers and researchers benchmark AI agents against specific Next.js development challenges, offering insight into model performance and highlighting areas for improvement.

How It Works

The system leverages @vercel/agent-eval to run evaluations. Each evaluation is a self-contained Next.js project in the evals/ directory, comprising a PROMPT.md that defines the task, an EVAL.ts with Vitest assertions (hidden from the agent), and the necessary project files. A smart runner automatically detects new models or evals, executes only uncompleted model/eval pairs, and purges infrastructure failures before exporting results.
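The runner's "only uncompleted pairs" behavior can be sketched as follows. This is a minimal illustration, not the actual @vercel/agent-eval API; the names `pendingPairs` and the `model::eval` key format are hypothetical:

```typescript
// Hypothetical sketch of the memoized runner's pairing logic.
type Pair = { model: string; evalName: string };

function pendingPairs(
  models: string[],
  evals: string[],
  completed: Set<string>, // keys like "model::eval" for finished runs
): Pair[] {
  const pairs: Pair[] = [];
  for (const model of models) {
    for (const evalName of evals) {
      // Skip pairs that already have a recorded result.
      if (!completed.has(`${model}::${evalName}`)) {
        pairs.push({ model, evalName });
      }
    }
  }
  return pairs;
}
```

Running with `--force` would correspond to passing an empty `completed` set, so every model/eval pair is executed again.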

Quick Start & Requirements

  • Install dependencies: npm install
  • Configure environment: Copy .env.local to .env, then set VERCEL_OIDC_TOKEN and AI_GATEWAY_API_KEY.
  • Run evaluations: npm run eval (memoized), npm run eval:dry (preview), npm run eval -- --force (rerun all), npm run eval:smoke (sanity check).
  • Export results: npm run export-results writes to agent-results.json.
  • Adding new evals or models involves creating specific files/configurations within evals/ or experiments/ directories, respectively.
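The env file from the configure step would then contain the two required variables, along these lines (values are placeholders, not real credentials):

```
VERCEL_OIDC_TOKEN=<your-vercel-oidc-token>
AI_GATEWAY_API_KEY=<your-ai-gateway-key>
```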

Highlighted Details

  • Automated evaluation runner intelligently detects new models and evals, executing only missing pairs and skipping completed ones.
  • Supports a range of Next.js specific evaluations, including Pages Router to App Router migration, optimizing fetch usage, preferring Server Actions, and managing cache directives.
  • Non-model related failures (e.g., infra, timeouts) are automatically purged during evaluation runs, ensuring clean results.
  • Results can be exported to agent-results.json and published to nextjs.org/evals.

Maintenance & Community

No specific details regarding contributors, sponsorships, or community channels were found in the provided README.

Licensing & Compatibility

The project's license is detailed in the LICENSE file. Specific compatibility notes for commercial use or closed-source linking are not detailed in the README.

Limitations & Caveats

Requires specific Vercel environment variables for operation. The framework is designed for AI model evaluation rather than direct application development, and its scope is limited to Next.js versions up to 15.5.6.

Health Check

  • Last Commit: 1 day ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 8
  • Issues (30d): 0
  • Star History: 10 stars in the last 30 days
