Online form filling is one of the most common—yet most labor-intensive—daily tasks. Traditional automation tools are usually rule-based scripts that struggle with complex and ever-changing web layouts. With the recent rise of Multimodal Large Language Models (MLLMs), researchers have begun to explore vision-language agents capable of "one-click" form completion. However, current models still fall short in layout understanding and field–value alignment.
FormFactory bridges this gap by providing a high-fidelity benchmark and experimentation platform for multimodal form-filling agents. The project consists of:
- 🌐 Web Front-End – 40+ high-fidelity HTML forms from real-world scenarios spanning eight domains (academia, business, finance, healthcare, etc.).
- 🗄 Dataset –
  - `data1/` contains JSON gold answers (ground-truth field values) for each form.
  - `data2/` stores supporting textual materials that can be used as additional context or prompts.
  - `labeled-images/` provides page screenshots and corresponding bbox annotations (organized into folders A–H).
- ⚙️ Back-End Service – A lightweight Flask API that renders pages, receives submissions, and saves results automatically.
- 📊 Evaluation Scripts – Utilities for computing field-level and form-level accuracy, layout reasoning metrics, and more.
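For a sense of what field-level vs. form-level accuracy means, here is a minimal sketch that scores a model prediction against a gold-answer JSON from `data1/`. The flat `field → value` schema and the file names are assumptions for illustration; the bundled evaluation scripts are the authoritative implementation.

```python
import json

def field_level_accuracy(gold: dict, pred: dict) -> float:
    """Fraction of gold fields whose predicted value matches exactly."""
    if not gold:
        return 0.0
    hits = sum(1 for k, v in gold.items()
               if str(pred.get(k, "")).strip() == str(v).strip())
    return hits / len(gold)

def form_level_accuracy(pairs) -> float:
    """Fraction of forms in which every single field is filled correctly."""
    pairs = list(pairs)
    if not pairs:
        return 0.0
    return sum(field_level_accuracy(g, p) == 1.0 for g, p in pairs) / len(pairs)

if __name__ == "__main__":
    # Hypothetical file names; adapt to the actual data1/ and submission/ layout.
    gold = json.load(open("data/data1/Art_Exhibition_Submission_Form.json", encoding="utf-8"))
    pred = json.load(open("submission/Art_Exhibition_Submission_Form.json", encoding="utf-8"))
    print(f"field-level accuracy: {field_level_accuracy(gold, pred):.2%}")
```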
- Python ≥ 3.8
- Flask == 2.3.*
- Install additional dependencies with `pip install -r requirements.txt`.
git clone https://github.com/formfactory-ai/formfactory.git
cd formfactory
python app.py  # default: http://127.0.0.1:5000/

Open the URL in your browser, pick any form from the dashboard, and start interacting.
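If you prefer to drive the server from a script instead of a browser, the sketch below fetches a page from the running Flask app and lists the input fields it finds, using only the standard library. The URL is the default dev address; substitute the route of any form you picked from the dashboard (the actual URL scheme is defined in `app.py`).

```python
from html.parser import HTMLParser
from urllib.request import urlopen

class FieldCollector(HTMLParser):
    """Collect the name attributes of <input>, <select>, and <textarea> tags."""
    def __init__(self):
        super().__init__()
        self.fields = []

    def handle_starttag(self, tag, attrs):
        if tag in ("input", "select", "textarea"):
            name = dict(attrs).get("name")
            if name:
                self.fields.append(name)

# Replace "/" with the route of a specific form taken from the dashboard.
page = urlopen("http://127.0.0.1:5000/").read().decode("utf-8")
collector = FieldCollector()
collector.feed(page)
print("fields found on the page:", collector.fields)
```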
formfactory/
├─ app.py # Flask back-end entry point
├─ templates/ # 40+ HTML form templates
├─ static/ # Stylesheets & front-end assets
├─ data/ # Dataset root
│ ├─ data1/ # Gold answers in JSON format
│ ├─ data2/ # Supporting textual materials
│ └─ labeled-images/ # Screenshots + bbox annotations (A–H)
├─ submission/ # Auto-generated user/model submissions
└─ README.md # This document
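Because the screenshots in `labeled-images/` and the gold answers in `data1/` describe the same forms, a multimodal agent typically needs them paired up. A small sketch, assuming matching file stems across the two folders (an assumption; the real naming may differ):

```python
from pathlib import Path

DATA = Path("data")
IMAGE_EXTS = {".png", ".jpg", ".jpeg"}

# Index screenshots by file stem across the A-H sub-folders.
screenshots = {p.stem: p
               for p in (DATA / "labeled-images").rglob("*")
               if p.suffix.lower() in IMAGE_EXTS}

# Pair every gold-answer JSON with a screenshot that shares its stem.
pairs = [(screenshots[g.stem], g)
         for g in sorted((DATA / "data1").rglob("*.json"))
         if g.stem in screenshots]

print(f"{len(pairs)} screenshot/gold-answer pairs found")
```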
- Multi-Domain Coverage – Academia, business, arts, technology, finance, healthcare, law, and manufacturing.
- High-Fidelity Pages – Realistic layouts to test visual understanding and field localization.
- Dynamic Fields – Groups, repeatable sections, cascading dependencies, and more.
- Interactive Evaluation – Run the server to observe model behavior in real time; results are saved automatically.
- Easy to Extend – Add a new form by simply providing an HTML template and a JSON answer file.
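To make the last point concrete, here is a hypothetical sketch of registering a new form. The field names, file names, and the flat JSON schema are illustrative only; mirror an existing template in `templates/` and its answer file in `data/data1/` to match the real conventions.

```python
import json
from pathlib import Path

# Hypothetical gold answers for a new form. The keys should match the
# name attributes used by the corresponding HTML template in templates/.
answers = {
    "applicant_name": "Jane Doe",
    "email": "[email protected]",
    "submission_title": "Example Entry",
}

Path("data/data1/My_New_Form.json").write_text(
    json.dumps(answers, indent=2, ensure_ascii=False), encoding="utf-8"
)
```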
# 📁 Navigate to evaluation directory
cd eval
# 🔄 Process data with model API
python batch_processor.py A11 json --filename Art_Exhibition_Submission_Form
# 📊 Evaluate model performance
python evaluator.py --batch
# 📋 View detailed evaluation report
cat evaluation_results/batch_evaluation_*_report.txt

Note: Complete your API call in `model_call.py`, and customize your evaluation in `evaluator.py`.
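What goes into `model_call.py` depends on the model you evaluate. The snippet below is only a sketch of the general shape of a vision-language API call (here an OpenAI-style chat completion with an image attached); the function name, arguments, and client are assumptions, not the file's actual interface.

```python
import base64
from openai import OpenAI  # assumption: an OpenAI-compatible endpoint is being evaluated

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def call_model(image_path: str, prompt: str, model: str = "gpt-4o") -> str:
    """Hypothetical helper: send a form screenshot plus instructions, return the raw reply."""
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("ascii")
    response = client.chat.completions.create(
        model=model,
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
            ],
        }],
    )
    return response.choices[0].message.content
```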
This project is released under the MIT License – see the LICENSE file for details.
If you use FormFactory in your research, please cite:
@misc{li2025formfactoryinteractivebenchmarkingsuite,
title = {FormFactory: An Interactive Benchmarking Suite for Multimodal Form-Filling Agents},
author = {Bobo Li and Yuheng Wang and Hao Fei and Juncheng Li and Wei Ji and Mong-Li Lee and Wynne Hsu},
year = {2025},
eprint = {2506.01520},
archivePrefix= {arXiv},
primaryClass = {cs.CL},
url = {https://arxiv.org/abs/2506.01520}
}

Happy research, and let's push multimodal form-filling agents forward together!

