CLI tool for web form generation from document forms
This prototype tool extracts structured data from PDF or image-based forms, converting them into interactive web forms adhering to the GOV.UK Forms schema. It targets users needing to digitize paper or scanned government forms, offering a cost-effective way to create accessible digital versions.
How It Works
The system uses Large Language Models (LLMs) to interpret form layouts: OpenAI's GPT-4o by default, or optionally Claude 3. PDF documents are first rasterized into images using GraphicsMagick. These images are sent to the LLM together with a specific prompt and a JSON schema. The LLM identifies questions, hints, field types, and even conditional logic, and outputs a JSON representation of the form structure. This JSON is then used to dynamically generate multi-page web forms styled with GOV.UK Frontend components.
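The last step of that pipeline, turning the LLM's JSON into a navigable multi-page form, can be sketched as follows. This is a minimal illustration only: the type names, field names (questionText, hintText, conditions, gotoPage) and both functions are assumptions made up for this example, not the tool's actual code or the real GOV.UK Forms schema.

```typescript
// Hypothetical shape of the JSON the LLM returns. The real GOV.UK Forms
// schema and the tool's own field names may differ.
type AnswerType = "text" | "number" | "date" | "selection";

interface Condition {
  answerEquals: string; // answer value that triggers the jump
  gotoPage: number; // id of the page to show next
}

interface FormPage {
  id: number;
  questionText: string;
  hintText?: string;
  answerType: AnswerType;
  options?: string[];
  conditions?: Condition[];
}

interface ExtractedForm {
  pages: FormPage[];
}

// Parse the raw LLM response and run a basic sanity check, because
// generative output is not guaranteed to match the requested schema.
function parseExtractedForm(raw: string): ExtractedForm {
  const data: unknown = JSON.parse(raw);
  if (
    typeof data !== "object" ||
    data === null ||
    !Array.isArray((data as { pages?: unknown }).pages)
  ) {
    throw new Error("Expected an object with a 'pages' array");
  }
  const pages = (data as { pages: FormPage[] }).pages;
  for (const page of pages) {
    if (typeof page.id !== "number" || typeof page.questionText !== "string") {
      throw new Error("Each page needs a numeric id and a questionText");
    }
  }
  return { pages };
}

// Pick the page that follows `current` for a given answer: a matching
// condition wins, otherwise fall through to the next page in order.
// Returns undefined when the end of the form is reached.
function nextPage(
  form: ExtractedForm,
  current: FormPage,
  answer: string
): FormPage | undefined {
  const match = current.conditions?.find((c) => c.answerEquals === answer);
  if (match) {
    return form.pages.find((p) => p.id === match.gotoPage);
  }
  const index = form.pages.findIndex((p) => p.id === current.id);
  return index === -1 ? undefined : form.pages[index + 1];
}
```

With a form where page 1 asks a yes/no question and carries a condition that sends "No" to page 3, nextPage returns page 3 for "No" and page 2 for any other answer. Keeping the conditional logic as data on each page means the same renderer can handle any extracted form.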
Quick Start & Requirements
Install dependencies: npm install
Set an API key as an environment variable: OPENAI_API_KEY or ANTHROPIC_API_KEY
Start the app: npm start dev
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
The tool recognizes only a limited set of question types, and API keys can currently be supplied only through environment variables, not through the UI. As with many generative AI applications, outputs can be unpredictable.