form-extractor-prototype  by timpaul

CLI tool for web form generation from document forms

Created 1 year ago
393 stars

Top 73.3% on SourcePulse

Project Summary

This prototype tool extracts structured data from PDF or image-based forms and converts it into interactive web forms whose structure follows the GOV.UK Forms schema. It targets users who need to digitize paper or scanned government forms, and offers a low-cost way to create accessible digital versions.

How It Works

The system uses a Large Language Model (LLM) to interpret form layouts: OpenAI's GPT-4o by default, with Claude 3 as an option. PDF documents are first rasterized into images using GraphicsMagick. These images are sent to the LLM for analysis together with a specific prompt and a JSON schema. The LLM identifies questions, hints, field types, and conditional logic, and outputs a JSON representation of the form structure. This JSON is then used to dynamically generate multi-page web forms styled with GOV.UK Frontend components.
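The pipeline described above can be sketched roughly as follows. This is an illustrative Python sketch, not the project's own code: the function names, the payload field names, and the `pages` key are assumptions made for the example, and the payload is not any provider's actual request format. The GraphicsMagick command is built but not run.

```python
import base64
import json


def rasterize_command(pdf_path: str, out_pattern: str, dpi: int = 150) -> list[str]:
    """Build (but do not run) a GraphicsMagick command that writes one image per PDF page."""
    # "+adjoin" asks GraphicsMagick to write each page to its own file.
    return ["gm", "convert", "-density", str(dpi), pdf_path, "+adjoin", out_pattern]


def build_request(image_bytes: bytes, prompt: str, schema: dict) -> dict:
    """Bundle one page image with the prompt and the target JSON schema.

    The field names are hypothetical, chosen only for this sketch.
    """
    return {
        "prompt": prompt,
        "schema": schema,
        "image_base64": base64.b64encode(image_bytes).decode("ascii"),
    }


def parse_response(raw_text: str) -> dict:
    """Parse the model's reply and check it is a JSON object with a 'pages' key."""
    data = json.loads(raw_text)  # raises ValueError on malformed JSON
    if not isinstance(data, dict) or "pages" not in data:
        raise ValueError("expected a JSON object with a 'pages' key")
    return data
```

In a real run, the command list would be passed to a process runner, each resulting image would go through `build_request`, and the reply text from the LLM client would go through `parse_response` before any web form is generated.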

Quick Start & Requirements

Highlighted Details

  • Replicates form structure in JSON following GOV.UK Forms schema.
  • Generates multi-page web forms using GOV.UK Design System components.
  • Supports processing of hand-drawn forms and recognizes conditional routing.
  • Distinguishes between question, hint, and field text, and identifies common question types.
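To make the separation of question, hint, and field type and the idea of conditional routing concrete, here is a small Python sketch. The structure and key names (`question_text`, `hint_text`, `answer_type`, `routing`) are hypothetical and are not taken from the GOV.UK Forms schema or from this project; they only illustrate how a one-question-per-page form with a skip rule might be walked.

```python
from typing import Optional

# Hypothetical extracted form: one question per page, with an optional routing rule.
FORM = {
    "pages": [
        {
            "id": 1,
            "question_text": "Do you have a passport?",
            "hint_text": "Include expired passports.",
            "answer_type": "selection",
            "options": ["Yes", "No"],
            # If the answer is "No", skip the passport number page.
            "routing": {"if_answer": "No", "go_to": 3},
        },
        {
            "id": 2,
            "question_text": "What is your passport number?",
            "hint_text": "",
            "answer_type": "text",
        },
        {
            "id": 3,
            "question_text": "What is your date of birth?",
            "hint_text": "For example, 27 3 2007.",
            "answer_type": "date",
        },
    ]
}


def next_page_id(form: dict, current_id: int, answer: str) -> Optional[int]:
    """Return the id of the page to show next, or None at the end of the form."""
    pages = form["pages"]
    index = next(i for i, page in enumerate(pages) if page["id"] == current_id)
    rule = pages[index].get("routing")
    if rule and rule["if_answer"] == answer:
        return rule["go_to"]  # conditional jump
    if index + 1 < len(pages):
        return pages[index + 1]["id"]  # default: the following page
    return None
```

With this data, answering "No" on page 1 jumps straight to page 3, while "Yes" continues to page 2.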

Maintenance & Community

  • Project maintained by timpaul.
  • No explicit community channels or roadmap links provided in the README.

Licensing & Compatibility

  • License not specified in the README.
  • Compatibility for commercial use or closed-source linking is undetermined.

Limitations & Caveats

The tool recognizes only a limited set of question types, and API keys can currently be supplied only through environment variables, not through the UI. As with many generative AI applications, outputs can be unpredictable.
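Because outputs can be unpredictable and only some question types are known, one reasonable precaution is to check the extracted JSON before rendering it. The Python sketch below is an assumption about how such a check could look, not a feature of the project; the set of known types and the key names are hypothetical.

```python
# Hypothetical set of answer types the renderer knows how to display.
KNOWN_TYPES = {"text", "date", "selection", "number"}


def find_problems(form: dict) -> list[str]:
    """Return human-readable problems found in an extracted form (empty list if none)."""
    problems = []
    for position, page in enumerate(form.get("pages", []), start=1):
        if not page.get("question_text"):
            problems.append(f"page {position}: missing question text")
        if page.get("answer_type") not in KNOWN_TYPES:
            problems.append(
                f"page {position}: unknown answer type {page.get('answer_type')!r}"
            )
    return problems
```

A caller could re-run the extraction or flag the page for manual review whenever the returned list is not empty.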

Health Check

  • Last commit: 4 months ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 0 stars in the last 30 days
