Design2Code by NoviScl

Research benchmark and code for converting visual designs into front-end code

Created 2 years ago

565 stars

Top 56.9% on SourcePulse

Project Summary

This repository provides a benchmark dataset and tools for evaluating how well front-end web development can be automated from visual designs. It targets researchers and engineers working on vision-language models (VLMs) for UI generation, offering a standardized way to measure progress in converting screenshots to functional code.

How It Works

The project introduces the Design2Code benchmark, a set of real-world webpages used as test cases for converting screenshots to HTML. To evaluate VLMs, it provides code for five automatic metrics (Block-Match, Text, Position, Color, CLIP) and scripts for running prompting experiments with models such as GPT-4V, Gemini Pro Vision, and Claude 3.5. The main contribution is the curated dataset and evaluation framework, which are designed to challenge VLMs and quantify how well they translate visual designs into code.
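
The metrics are only named above, so here is a minimal sketch of how two of them, Text and Position, could be computed for blocks that have already been matched between the reference page and the generated page. It assumes text is scored by character-level Sorensen-Dice overlap and position by the larger normalized offset between block centers. The function names, the dictionary layout, and these exact formulas are illustrative assumptions, not the repository's code. Block-Match, Color, and CLIP are left out because they depend on block detection and image models.

```python
from collections import Counter


def text_similarity(a: str, b: str) -> float:
    """Character-level Sorensen-Dice overlap between two strings (assumed formula)."""
    if not a and not b:
        return 1.0
    # Multiset intersection counts each shared character at most as often
    # as it appears in both strings.
    common = sum((Counter(a) & Counter(b)).values())
    return 2 * common / (len(a) + len(b))


def position_similarity(center_a, center_b) -> float:
    """1 minus the larger axis offset between two block centers.

    Centers are (x, y) pairs normalized to [0, 1] by page width and height
    (an assumption made for this sketch).
    """
    (xa, ya), (xb, yb) = center_a, center_b
    return 1.0 - max(abs(xa - xb), abs(ya - yb))


def page_scores(matched_pairs):
    """Average the per-block scores over matched (reference, generated) pairs.

    Each block is a hypothetical dict with "text" and "center" keys.
    """
    if not matched_pairs:
        return {"text": 0.0, "position": 0.0}
    n = len(matched_pairs)
    text = sum(text_similarity(r["text"], g["text"]) for r, g in matched_pairs) / n
    position = sum(
        position_similarity(r["center"], g["center"]) for r, g in matched_pairs
    ) / n
    return {"text": text, "position": position}


if __name__ == "__main__":
    reference = {"text": "Sign up", "center": (0.50, 0.20)}
    generated = {"text": "Sign in", "center": (0.55, 0.20)}
    print(page_scores([(reference, generated)]))
```

Averaging over matched blocks keeps each score in the range 0 to 1, which makes pages with different numbers of blocks comparable.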

Quick Start & Requirements

Install dependencies: pip install -e .
Install browser for screenshots: playwright install
Python 3.11 is recommended.
API keys for OpenAI/Gemini are required for prompting experiments.
Full dataset and model checkpoints are available via Google Drive and Hugging Face.
Links: Dataset, Model Checkpoint, Project Page, Paper
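
Because the prompting experiments need API keys, a small pre-flight check can catch a missing key before any requests are sent. The sketch below uses only the standard library. The environment variable names OPENAI_API_KEY and GEMINI_API_KEY are assumptions chosen for illustration, and the repository's scripts may read keys differently.

```python
import os


def missing_api_keys(required, env=None):
    """Return the names in `required` that are unset or empty in `env`.

    `env` defaults to the process environment; passing a dict makes the
    function easy to test.
    """
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


if __name__ == "__main__":
    # Hypothetical variable names; replace with the ones your scripts read.
    missing = missing_api_keys(["OPENAI_API_KEY", "GEMINI_API_KEY"])
    if missing:
        print("Set these before running prompting experiments:", ", ".join(missing))
    else:
        print("All API keys found.")
```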

Highlighted Details

Includes Design2Code (484 pages) and Design2Code-HARD (80 difficult pages) datasets.
Supports evaluation of multiple VLMs including GPT-4V, Gemini Pro Vision, Claude 3.5, and a custom Design2Code-18B model.
Provides code for fine-tuning and running inference with Design2Code-18B, a model based on CogAgent.
Offers detailed automatic evaluation metrics and scripts for running prompting experiments.

Maintenance & Community

Maintained by the SALT lab from Stanford NLP.
Open to contributions via Pull Requests. Issues and email are available for questions.

Licensing & Compatibility

The data, code, and model checkpoint are licensed for research use only.
The benchmark is built on the C4 dataset, under the ODC Attribution License (ODC-By).
Use for malicious purposes is not permitted.

Limitations & Caveats

The base CogAgent-18B model is reported to perform poorly on this task without fine-tuning. The provided API access scripts may need minor adjustments before they work with direct OpenAI API calls.
