Design2Code by NoviScl

Research paper on converting visual designs into code implementations

Created 2 years ago
556 stars

Top 57.5% on SourcePulse

Project Summary

This repository provides a benchmark dataset and tools for evaluating how well front-end web development can be automated from visual designs. It targets researchers and engineers working on vision-language models (VLMs) for UI generation, and offers a standardized way to measure progress in converting screenshots into functional code.

How It Works

The project introduces the Design2Code benchmark, a set of real-world webpages used as test cases: a model receives a screenshot of a page and must generate HTML that reproduces it. The repository provides code for automatic metrics (Block-Match, Text, Position, Color, CLIP) to evaluate VLM output, and supports running prompting experiments with models such as GPT-4V, Gemini Pro Vision, and Claude 3.5. The core contribution is the curated dataset and evaluation framework, designed to challenge and quantify VLM capabilities in visual-to-code translation.
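To make the Text and Position metrics more concrete, here is a minimal sketch of how two such scores could be computed over pairs of text blocks that have already been matched between the reference page and the generated page. It is an illustration only, not the repository's implementation: the `Block` class, the function names, the use of `difflib` for text similarity, and the choice of the larger x/y center offset for position are all assumptions made for this example.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class Block:
    """Hypothetical text block: its text and center, normalized to 0..1 of page size."""
    text: str
    cx: float
    cy: float


def text_similarity(a: Block, b: Block) -> float:
    """Character-level similarity in [0, 1], using difflib as a stand-in measure."""
    return SequenceMatcher(None, a.text, b.text).ratio()


def position_similarity(a: Block, b: Block) -> float:
    """1 minus the larger of the x and y center offsets (an assumed definition)."""
    return 1.0 - max(abs(a.cx - b.cx), abs(a.cy - b.cy))


def score_pairs(pairs):
    """Average both similarities over (reference, generated) block pairs."""
    if not pairs:
        return {"text": 0.0, "position": 0.0}
    n = len(pairs)
    return {
        "text": sum(text_similarity(r, g) for r, g in pairs) / n,
        "position": sum(position_similarity(r, g) for r, g in pairs) / n,
    }


if __name__ == "__main__":
    reference = Block("Sign up", 0.5, 0.1)
    generated = Block("Sign up", 0.5, 0.3)  # same text, rendered lower on the page
    print(score_pairs([(reference, generated)]))
```

The sketch assumes the block matching has already been done; in a real evaluation, deciding which generated block corresponds to which reference block is a step of its own.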

Quick Start & Requirements

  • Install dependencies: pip install -e .
  • Install browser for screenshots: playwright install
  • Python 3.11 is recommended.
  • API keys for OpenAI/Gemini are required for prompting experiments.
  • Full dataset and model checkpoints are available via Google Drive and Hugging Face.
  • Links: Dataset, Model Checkpoint, Project Page, Paper
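As an illustration of why a browser is installed through Playwright, the following sketch renders a local HTML file in headless Chromium and saves a full-page screenshot using Playwright's synchronous Python API. It is not taken from the repository: the function names `html_file_url` and `take_screenshot` and the file paths are hypothetical, and the repository's own screenshot script may work differently.

```python
from pathlib import Path


def html_file_url(html_path: str) -> str:
    """Turn a local HTML path into a file:// URL that a browser can open."""
    return Path(html_path).resolve().as_uri()


def take_screenshot(html_path: str, png_path: str) -> None:
    """Render a local HTML file in headless Chromium and save a full-page PNG."""
    # Imported here so html_file_url works even when Playwright is not installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.goto(html_file_url(html_path))
        page.screenshot(path=png_path, full_page=True)
        browser.close()
```

Calling `take_screenshot("page.html", "page.png")` (hypothetical paths) requires that `playwright install` has been run so that a Chromium build is available.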

Highlighted Details

  • Includes Design2Code (484 pages) and Design2Code-HARD (80 difficult pages) datasets.
  • Supports evaluation of multiple VLMs including GPT-4V, Gemini Pro Vision, Claude 3.5, and a custom Design2Code-18B model.
  • Provides code for fine-tuning and running inference on the Design2Code-18B model based on CogAgent.
  • Offers detailed automatic evaluation metrics and scripts for running prompting experiments.

Maintenance & Community

  • Maintained by the SALT lab from Stanford NLP.
  • Open to contributions via Pull Requests. Issues and email are available for questions.

Licensing & Compatibility

  • Data, code, and model checkpoint are licensed for research use only.
  • The benchmark is built on the C4 dataset, which is released under the ODC Attribution License (ODC-By).
  • Restrictions apply against malicious use.

Limitations & Caveats

The base CogAgent-18B model is noted as performing poorly on this task without fine-tuning. The provided scripts for API access might require minor adjustments for direct OpenAI API calls.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 5 stars in the last 30 days
