form-extractor-prototype  by timpaul

CLI tool for web form generation from document forms

created 1 year ago
394 stars

Top 74.2% on sourcepulse

GitHubView on GitHub
1 Expert Loves This Project
Project Summary

This prototype tool extracts structured data from PDF or image-based forms, converting them into interactive web forms adhering to the GOV.UK Forms schema. It targets users needing to digitize paper or scanned government forms, offering a cost-effective way to create accessible digital versions.

How It Works

The system leverages Large Language Models (LLMs), defaulting to OpenAI's GPT-4o or optionally Claude 3, to interpret form layouts. PDF documents are first rasterized into images using GraphicsMagick. These images, along with a specific prompt and JSON schema, are sent to the LLM for analysis. The LLM identifies questions, hints, field types, and even conditional logic, outputting a JSON representation of the form structure. This JSON is then used to dynamically generate multi-page web forms styled with GOV.UK Frontend components.

Quick Start & Requirements

Highlighted Details

  • Replicates form structure in JSON following GOV.UK Forms schema.
  • Generates multi-page web forms using GOV.UK Design System components.
  • Supports processing of hand-drawn forms and recognizes conditional routing.
  • Distinguishes between question, hint, and field text, and identifies common question types.

Maintenance & Community

  • Project maintained by timpaul.
  • No explicit community channels or roadmap links provided in the README.

Licensing & Compatibility

  • License not specified in the README.
  • Compatibility for commercial use or closed-source linking is undetermined.

Limitations & Caveats

The tool's knowledge of question types is limited, and API key input is currently restricted to environment variables, not the UI. As with many generative AI applications, outputs can be unpredictable.

Health Check
Last commit

3 months ago

Responsiveness

1 day

Pull Requests (30d)
0
Issues (30d)
0
Star History
9 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen Chip Huyen(Author of AI Engineering, Designing Machine Learning Systems), Travis Fischer Travis Fischer(Founder of Agentic), and
3 more.

TypeChat by microsoft

0.1%
9k
Library for building natural language interfaces using types
created 2 years ago
updated 1 month ago
Feedback? Help us improve.