NVlabs: Evaluation harness for LLMs on Verilog code generation and spec-to-RTL tasks
This repository provides an evaluation harness for benchmarking Large Language Models (LLMs) on Verilog hardware description language (HDL) code generation tasks. It targets researchers and engineers evaluating LLMs for hardware design automation, offering improved prompts, support for specification-to-RTL tasks, and detailed error analysis.
How It Works
The harness uses a Makefile to orchestrate the evaluation workflow and supports two primary tasks: code-complete-iccad2023 and spec-to-rtl. Datasets are managed as plain text files, and LLM parameters such as model choice, in-context learning examples (0-4 shots), number of samples, temperature, and top-p can be configured flexibly. The evaluation process generates Verilog code from LLM prompts and then verifies its correctness using iverilog and verilator; a typical run is sketched below.
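As a rough illustration of that configure-and-make flow, the sketch below assumes the repository exposes task and sampling options through flags of the form shown; the exact flag names, model identifiers, and defaults should be checked against the repository README.

```sh
# Hedged sketch of a typical evaluation run; the configure flags shown
# (--with-task, --with-model, --with-examples, --with-samples,
# --with-temperature, --with-top-p) are assumptions based on the
# workflow described above, not a verified command reference.
mkdir -p build
cd build

# Task: code-complete-iccad2023 or spec-to-rtl; 0-4 in-context examples;
# sampling controlled by number of samples, temperature, and top-p.
../configure \
  --with-task=spec-to-rtl \
  --with-model=gpt-4-turbo \
  --with-examples=2 \
  --with-samples=20 \
  --with-temperature=0.85 \
  --with-top-p=0.95

# Generate completions with the chosen LLM, then verify each sample
# with iverilog/verilator and report pass rates.
make
```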
Quick Start & Requirements
- make
- iverilog v12, built from source (v12 branch); v13 is not supported
- verilator
- python3 (v3.11.0 recommended, e.g., via conda create -n codex python=3.11)
- langchain, langchain-openai, langchain-nvidia-ai-endpoints
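A minimal environment setup is sketched below, assuming conda and pip are available and that the iverilog v12 branch builds with its usual autotools flow; the environment name codex mirrors the suggestion above, and package manager commands may differ on your system.

```sh
# Python environment with the LangChain packages listed above.
conda create -n codex python=3.11
conda activate codex
pip install langchain langchain-openai langchain-nvidia-ai-endpoints

# Icarus Verilog v12 from source; the branch name and build steps follow
# the upstream project's standard procedure and may need adjusting.
git clone https://github.com/steveicarus/iverilog.git
cd iverilog
git checkout v12-branch
sh autoconf.sh
./configure
make
sudo make install

# Verilator is typically available from the distribution package manager.
sudo apt-get install verilator
```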
Highlighted Details
The error analysis breaks down failure modes, including iverilog compilation errors.
Maintenance & Community
The project is associated with NVlabs, and its authors have published research papers detailing the methodology and findings. Links to the relevant papers are provided for citation.
Licensing & Compatibility
The repository does not explicitly state a license in the README. Compatibility for commercial use or closed-source linking is not specified.
Limitations & Caveats
The project is currently Linux-only and requires manual compilation of a specific iverilog version (v12). MachineEval is not supported, and the original Pass@10 metric is no longer reported. A Dockerfile and prebuilt JSONL support are planned but not yet available.