Overview
The PDF Form Parser automatically analyzes uploaded PDF files to detect AcroForm fields, including text inputs, checkboxes, radio buttons, and digital signature fields. This enables form templates to be created from existing PDF documents without manual field mapping.
How PDF Parsing Works
When you upload a PDF form template, the system uses the PdfFormsParserService to extract all interactive fields.
Upload PDF Document
Navigate to Form Templates and click New Form Template. Upload a PDF file containing AcroForm fields.
Automatic Field Detection
The system uses pdftk to enumerate all form fields in the PDF, detecting:
- Text fields (TextBox)
- Checkboxes (Button)
- Radio buttons (Button with options)
- Signature fields (/FT /Sig)
- Dropdown lists (Choice)
Field Metadata Extraction
For each field, the parser extracts:
- Field name (original and sanitized)
- Field type
- Available options (for checkboxes/radio buttons)
- Human-readable label (generated from field name)
- Signature metadata (for signature fields)
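The detection and extraction steps above can be illustrated with a minimal sketch. It parses text in the shape that `pdftk <file> dump_data_fields` prints (records separated by `---`, one `Key: value` pair per line). The helper name `parse_field_dump` and the hash keys are assumptions for illustration, not the actual internals of PdfFormsParserService.

```ruby
# Sample text in the shape printed by `pdftk form.pdf dump_data_fields`.
SAMPLE_DUMP = <<~DUMP
  ---
  FieldType: Text
  FieldName: Inspector_Name
  FieldFlags: 0
  FieldValue:
  FieldJustification: Left
  ---
  FieldType: Button
  FieldName: Sprinklers_OK
  FieldFlags: 0
  FieldValue: Off
  FieldJustification: Left
  FieldStateOption: Off
  FieldStateOption: Yes
DUMP

# Hypothetical helper: converts the dump text into an array of hashes.
def parse_field_dump(text)
  text.split(/^---\s*$/).filter_map do |block|
    field = { options: [] }
    block.each_line do |line|
      # Split on the first colon only, so values may contain colons.
      key, value = line.chomp.split(":", 2)
      next if value.nil?

      value = value.strip
      case key
      when "FieldType"        then field[:type] = value
      when "FieldName"        then field[:name] = value
      when "FieldValue"       then field[:value] = value
      when "FieldStateOption" then field[:options] << value
      end
    end
    field if field[:name] # skip blocks that carry no field name
  end
end

fields = parse_field_dump(SAMPLE_DUMP)
fields.each { |f| puts "#{f[:name]} (#{f[:type]}) options=#{f[:options].inspect}" }
```

Collecting every `FieldStateOption` line into an array is what makes the available options for checkboxes and radio buttons visible to later steps.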
Field Detection Engine
The parsing service is located in app/services/pdf_forms_parser_service.rb:6.
Standard Field Parsing
Human-readable labels are generated from field names, for example:
- Location_row_1 → “Location Row 1”
- buildingAddress → “Building Address”
- Inspector_Name → “Inspector Name”
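One way to produce labels like these is sketched below. The method name `label_from_field_name` is hypothetical, and the rules (split camelCase, turn underscores into spaces, capitalize each word) are inferred from the three examples rather than taken from the service itself.

```ruby
# Hypothetical helper: derives a display label from a raw PDF field name.
def label_from_field_name(name)
  name
    .gsub(/([a-z\d])([A-Z])/, '\1 \2')          # buildingAddress -> building Address
    .tr("_", " ")                               # underscores become spaces
    .split                                      # collapse repeated whitespace
    .map { |word| word[0].upcase + word[1..] }  # capitalize, keep remaining case
    .join(" ")
end

%w[Location_row_1 buildingAddress Inspector_Name].each do |name|
  puts "#{name} -> #{label_from_field_name(name)}"
end
```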
Signature Field Detection
Signature fields are detected using HexaPDF to identify PDF signature annotations (/FT /Sig type).
Signature fields are always preserved during parsing, even if they’re empty or unsigned. The system marks them with is_signature: true for special handling.
UTF-8 and Special Characters
The parser handles international characters and special symbols through multiple encoding strategies:
- UTF-8 Sanitization: Field names are sanitized to remove invalid UTF-8 sequences
- Fallback Parsing: If standard parsing fails, the system uses pdftk dump_data_fields as a backup
- Character Replacement: Invalid characters are replaced rather than causing parse failures
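The sanitization and replacement strategies can be approximated with Ruby's built-in `String#scrub`. This is a sketch only: the method name `sanitize_utf8` and the choice of `?` as the replacement character are assumptions, since the text above does not say which replacement the parser uses.

```ruby
# Hypothetical helper: returns a valid UTF-8 copy of a raw field name,
# replacing each invalid byte sequence instead of raising an error.
def sanitize_utf8(raw)
  raw.dup.force_encoding(Encoding::UTF_8).scrub("?")
end

# \xE9 is "é" in Latin-1 but is not a valid UTF-8 sequence on its own.
raw_name = "Caf\xE9_Name".b
clean_name = sanitize_utf8(raw_name)

puts clean_name
puts clean_name.valid_encoding?
```

Valid multi-byte characters pass through unchanged, so international field names are preserved and only undecodable bytes are replaced.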
Field Filtering
The parser automatically filters out empty or invalid fields:
- Fields with empty label_name values are excluded (except signature fields)
- Fields with value “Off” (unchecked checkboxes in their default state) are filtered
- Signature fields are always preserved regardless of their state
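The three rules above can be expressed as a single predicate. In this sketch the hash shape and the method name `keep_field?` are assumptions; only the `label_name` and `is_signature` names come from the text.

```ruby
# Hypothetical predicate implementing the filtering rules listed above.
def keep_field?(field)
  return true if field[:is_signature]                    # signatures are always kept
  return false if field[:label_name].to_s.strip.empty?   # drop fields with no label
  return false if field[:value] == "Off"                 # drop default unchecked boxes

  true
end

parsed = [
  { label_name: "Inspector Name", value: "" },
  { label_name: "",               value: "x" },
  { label_name: "Sprinklers OK",  value: "Off" },
  { label_name: "",               value: nil, is_signature: true }
]

kept = parsed.select { |field| keep_field?(field) }
puts kept.map { |field| field[:label_name].inspect }.join(", ")
```

Checking `is_signature` first is what lets an unlabeled, unsigned signature field survive the other two rules.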
Error Handling
The parser includes robust error handling for corrupted or non-standard PDFs:
PdftkError - Standard Parsing Failed
When pdftk cannot read the PDF structure, the system automatically switches to the dump_data_fields method, which uses raw PDF data extraction.
Resolution: No action needed - fallback is automatic.
StandardError - Unexpected PDF Format
If the PDF structure is completely unreadable, parsing returns an empty array and logs the error.
Resolution: Verify the PDF is a valid AcroForm document. Some PDFs created with form builders may not have proper field annotations.
Encoding Errors
For PDFs with special characters in field names, the parser attempts multiple encoding approaches.
Resolution: Handled automatically through UTF-8 sanitization and fallback methods.
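The fallback behaviour described in this section follows a common rescue pattern, sketched below. Everything here is illustrative: `PdftkError` is a local stand-in class (assumed to inherit from `StandardError`), and `parse_with_fallback`, `primary`, `fallback`, and `log` are hypothetical names rather than the service's real interface.

```ruby
# Stand-in for the error raised when standard pdftk parsing fails.
class PdftkError < StandardError; end

# Tries the primary parser, switches to the fallback on PdftkError, and
# returns an empty array (after logging) if parsing fails altogether.
def parse_with_fallback(path, primary:, fallback:, log: ->(msg) { warn(msg) })
  begin
    primary.call(path)
  rescue PdftkError
    fallback.call(path) # e.g. a parser built on dump_data_fields output
  end
rescue StandardError => e
  log.call("PDF parsing failed for #{path}: #{e.message}")
  []
end

# Usage with stub parsers: the primary fails, the fallback succeeds.
failing_primary  = ->(_path) { raise PdftkError, "cannot read structure" }
working_fallback = ->(_path) { [{ name: "Inspector_Name" }] }

result = parse_with_fallback("form.pdf", primary: failing_primary, fallback: working_fallback)
puts result.inspect
```

Nesting the `PdftkError` rescue inside the outer `StandardError` rescue means a failure in the fallback itself still ends in an empty array instead of an unhandled exception.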
Background Processing
Large PDFs with many fields are processed asynchronously to avoid blocking the web interface:
- Upload initiates ParseFormTemplateJob
- Job processes PDF in background worker
- Form structure is saved when complete
- Page automatically refreshes to show parsed fields
Supported Field Types
| PDF Field Type | Detected As | Usage |
|---|---|---|
| /FT /Tx | Text | Single-line or multi-line text input |
| /FT /Btn (checkbox) | Button | Checkbox with On/Off state |
| /FT /Btn (radio) | Button | Radio button group with options |
| /FT /Ch | Choice | Dropdown or list selection |
| /FT /Sig | Signature_Field | Digital signature field |
Next Steps
After PDF parsing completes:
Customize Fields
Use the Form Builder to organize, rename, and configure parsed fields
Create Inspections
Start using your form template for fire safety inspections