
2019 International Conference on Document Analysis and Recognition Workshops (ICDARW)

Character Keypoint-based Homography Estimation in Scanned Documents for Efficient Information Extraction

Kushagra Mahajan, Monika Sharma, Lovekesh Vig
TCS Research, New Delhi, India
Email: {kushagra.mahajan, monika.sharma1, lovekesh.vig}@tcs.com

Abstract—Precise homography estimation between multiple images is a pre-requisite for many computer vision applications. One application that is particularly relevant in today's digital era is the alignment of scanned or camera-captured document images such as insurance claim forms for information extraction. Traditional learning-based approaches perform poorly due to the absence of an appropriate gradient. Feature-based keypoint extraction techniques for homography estimation in real scene images either detect an extremely large number of inconsistent keypoints due to sharp textual edges, or produce inaccurate keypoint correspondences due to variations in illumination and viewpoint differences between document images. In this paper, we propose a novel algorithm for aligning scanned or camera-captured document images using character-based keypoints and a reference template. The algorithm is both fast and accurate, and utilizes a standard Optical Character Recognition (OCR) engine such as Tesseract to find unambiguous character-based keypoints, which are used to identify precise keypoint correspondences between two images. Finally, the keypoints are used to compute the homography mapping between a test document and a template. We evaluated the proposed approach for information extraction on two real world anonymized datasets comprised of health insurance claim forms, and the results support the viability of the proposed technique.

Keywords—Homography estimation; Character keypoints; Scanned documents; Information extraction

I. INTRODUCTION

Today's digital world calls for the digitization of every aspect of industry. One such aspect is the digitization of scanned or camera-captured document images such as bank receipts, insurance claim forms etc. for facilitating fast information retrieval from documents. Future references to 'scanned' documents in the paper imply both scanned and camera-captured document images. Automating the task of information extraction from scanned documents suffers from difficulties arising due to variations in the scanning of documents at different orientations and perspectives. This increases the likelihood of errors, and additional human effort is required to extract relevant information from the documents. To circumvent this issue, we resort to image alignment techniques like homography estimation for aligning the given test document with a reference template document. Document alignment facilitates better performance by reducing the errors in field extraction, and also reduces time and costs related to the digitization of scanned documents.

Homography estimation is essential for various tasks in computer vision like Simultaneous Localization and Mapping (SLAM), 3D reconstruction and panoramic image generation [1]–[3]. A homography exists between projections of points on a 3D plane in two different views, i.e., a homography is a transform/matrix which converts points from one perspective to another. To this end, we aim to find a transformation which allows matching/correspondence among pixels belonging to the test perturbed document and the template document image. This transformation can be represented as a matrix, as shown in Equation 1.

    H = \begin{bmatrix} h_{11} & h_{12} & h_{13} \\ h_{21} & h_{22} & h_{23} \\ h_{31} & h_{32} & h_{33} \end{bmatrix}    (1)

    Y \sim HX    (2)

Here, the homography matrix H has 8 free parameters, as either h_{33} = 1 is fixed or a unit-vector constraint (h_{11}^2 + h_{12}^2 + h_{13}^2 + h_{21}^2 + h_{22}^2 + h_{23}^2 + h_{31}^2 + h_{32}^2 + h_{33}^2 = 1) is imposed. This means that we can compute a homography, which describes how to transform the first set of points X to the second set of points Y, using four pairs of matched points in our images.

Existing homography estimation techniques fall into two broad categories, namely direct pixel-based and feature-based methods. Among pixel-based methods, the Lucas-Kanade optical flow technique [5], which utilizes the sum of squared differences between pixel intensity values as the error metric to estimate the motion of the pixels of the image contents, is the most popular. An extension of the method was proposed by Lucey et al. [6], which represents the images in the complex 2D Fourier domain for improved performance. However, in the case of text document image alignment, these direct pixel-based methods fail to produce the desired alignment, because sharp textual edges do not provide a smooth gradient which can be used for learning the homography. Feature-based methods first extract keypoints, and then match the corresponding keypoints between the original and transformed images using their respective keypoint descriptors. This keypoint correspondence is used to estimate the homography between the two images.
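The role of Equations 1 and 2 can be sketched numerically: four matched point pairs determine the 8 free parameters of H exactly, and applying H to a point in homogeneous coordinates realizes the projective mapping Y ∼ HX. The sketch below solves the standard direct linear system with h33 fixed to 1; the specific point values are purely illustrative.

```python
import numpy as np

def homography_from_4_points(src, dst):
    """Solve for the 3x3 homography H (with h33 = 1) mapping each
    src point to the corresponding dst point: dst ~ H @ src."""
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # Each correspondence contributes two linear equations in the
        # 8 unknown entries of H.
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = np.linalg.solve(np.array(A, float), np.array(b, float))
    return np.append(h, 1.0).reshape(3, 3)

# Illustrative correspondences: template corners X and their locations Y
# in a perturbed test image.
src = [(0, 0), (100, 0), (100, 100), (0, 100)]
dst = [(10, 5), (110, 15), (100, 115), (0, 105)]
H = homography_from_4_points(src, dst)

# Equation 2, Y ~ HX: apply H to a homogeneous point, then dehomogenize
# by dividing out the third coordinate.
y = H @ np.array([0.0, 0.0, 1.0])
print(y[:2] / y[2])  # the template corner (0, 0) lands at its matched location (10, 5)
```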

978-1-7281-5054-3/19/$31.00 ©2019 IEEE    DOI 10.1109/ICDARW.2019.30060
The most fundamental feature descriptors used for keypoint extraction and matching tasks are the Scale-Invariant Feature Transform (SIFT) [7] and Oriented FAST and Rotated BRIEF (ORB) [4]. When we use these feature descriptors for detecting keypoints in text document images, a large number of inconsistent keypoints are detected due to sharp textual edges, producing inaccurate keypoint correspondences, as illustrated in Figure 1.

Figure 1. (a) and (b) show SIFT keypoints extracted on patches of a document and a corresponding rotated document respectively. (c) and (d) show ORB [4] outputs on patches of the same image pair. Please note the difference in the keypoint detections between the original and rotated versions of the document. We observe that there is a lack of consistency between the keypoint detections for both feature descriptors, i.e., very often, keypoints are not detected at corresponding locations in the two documents.

To overcome these challenges, in this paper we propose a novel and robust algorithm for aligning scanned document images using character-based keypoints and a reference empty template document. The proposed method utilizes a standard OCR such as Tesseract [8] to find unambiguous character-based keypoints, which are subsequently used to identify precise keypoint correspondences between the two document images. Finally, the keypoint correspondences are used to compute the homography mapping between the two document images. The proposed approach is fast, robust and involves minimal memory requirements, in contrast to complex deep learning based approaches which require huge memory to store the model and large compute power to produce results even at test time. Hence, this method is suited for real-time computation on mobile devices and other electronic gadgets with resource constraints.

To summarize, our contributions in the paper are as follows:

• We propose a novel, fast and memory efficient algorithm for robust character-based unambiguous keypoint detection in scanned textual documents, with keypoints extracted using a standard OCR like Tesseract.

• We demonstrate how existing homography estimation approaches perform poorly when the problem space is extended to scanned document images. The limitations of these approaches are analyzed to motivate our methodology.

• We show the effectiveness of our proposed approach using information extraction from two real world anonymized datasets comprised of health insurance claim forms, and present the qualitative and quantitative results in Section IV.

The remainder of the paper is organized as follows: Section I-A discusses some of the prior work done in the field of image alignment, keypoint extraction and homography estimation. A detailed step-by-step explanation of the proposed approach is presented in Section II. Section III gives details of the two real world anonymized health insurance claim form datasets used for document alignment. Subsequently, the experimental results and discussions on the same are given in Section IV. Finally, we conclude the paper in Section V.

A. Related Work

By far, feature-based methods relying on detection and matching of local image features are the most widely used techniques for homography estimation and subsequent image alignment. [9] uses the centroids of words in the document to compute the features. Since centroid computation at different orientations suffers from a lack of precision, these features cannot be used for our task, which requires exactness. [10] used structures in the text document like punctuation characters as keypoints for document mosaicing, while Royer et al. [11] explored keypoint selection methods which reduce the number of extracted keypoints for improved document image matching. Recently, deep neural networks have become popular for obtaining feature descriptors [12]–[15] that are more powerful than the traditional descriptors. These approaches create patches, with descriptors computed for each patch. Similarity scores and distance measures between the descriptors are then used for obtaining the matches. Similarly, [16] proposed an end-to-end architecture for learning affine transformations without manual annotation, where features for each of the two images are extracted through a siamese architecture, followed by trainable matching and geometric parameter estimation, producing state-of-the-art results on the Proposal Flow dataset. DeTone et al. [17] devised a deep neural network to address the problem of homography estimation. They estimate the displacements between the four corners of the original and perturbed images in a supervised manner, and map them to the corresponding homography matrix. Another work of particular interest to us is that of Nguyen et al. [18], which trains a Convolutional Neural Network (CNN) for unsupervised learning of planar homographies, achieving faster inference and superior performance compared to the supervised counterparts.

[Figure 2 flowchart. Keypoint Extraction: threshold the template and test document images; run the OCR engine for word detection; form lists of characters with distinct left, right, top and bottom keypoints; in the template document, find words that begin or end with the characters in the lists; extract the word bounding boxes. Keypoint Matching: use neighbourhood information to find corresponding words in the test document; find connected components in the bounding boxes of the corresponding words; find keypoints in the first, last or both components, depending on the characters present in the corresponding words. Document Alignment: estimate homography using corresponding keypoint locations; align the test document with the template. Information Extraction: the user marks bounding boxes in the template around fields to be extracted; corresponding fields are retrieved from the test documents; depending on the type of retrieved data, it is processed and information is extracted.]

Figure 2. Flowchart showing the entire pipeline for information extraction from scanned document images after aligning with the template document using character keypoint-based homography estimation.

II. PROPOSED APPROACH

In this section, we discuss in detail the proposed method for information extraction from documents aligned using character keypoint-based homography estimation. The task of information extraction in a scanned document image involves finding values of fields of interest marked by a user. To accomplish this, the proposed method requires a template document image with empty fields for each document dataset. We attempt to align the test document images, which have filled fields of interest, with the empty template document. After the documents are aligned, the desired text fields are retrieved from the filled test documents, and the information is read using an Optical Character Recognition (OCR) engine [8] and a Handwritten Text Recognition (HTR) deep network [19]. The entire pipeline for the algorithm is shown in Figure 2.

A. Character based Keypoint Extraction

We begin by thresholding the empty template document as well as the filled test document from which information is to be extracted. Thresholding allows us to mitigate the impact of illumination variations present in scanned document images. Next, we use Tesseract as the OCR on both documents, read all the words present in them, and obtain the coordinates of the bounding box for each word. We observe an inherent trait present in certain characters like 'A', 'T', 'r' etc.: they have distinct tips. This attribute of characters is used to extract precise and unambiguous character keypoints.

We create four separate lists, namely begCharList, endCharList, topCharList and bottomCharList, for characters that have distinct tips on the left, right, top or bottom respectively, as shown in Figure 3. For example, begCharList includes characters such as 'A', 'V', 'T', 'Y', '4', 'v', 'w' etc. with a distinct left tip, and endCharList consists of characters like 'V', 'T', 'L', 'Y', '7', 'r' etc. with a distinct right tip. We refrain from selecting characters like 'O' or 'D', since there is impreciseness in the keypoint detection in the curved portions, which can ultimately impact the overall image alignment. We ensure that the accuracy of our system is not compromised; therefore, only unambiguous characters are considered for keypoint detection.
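The word-selection rule built on these character lists can be sketched as follows. This is a simplified illustration: only begCharList and endCharList (with the subsets of characters quoted above) are shown, the word boxes stand in for Tesseract output in an assumed (left, top, right, bottom) pixel format, and the box corner stands in for the precise tip location, which in the actual method comes from connected-component analysis.

```python
# Characters with distinct left and right tips (subset from the paper).
begCharList = set("AVTY4vw")   # distinct left tip
endCharList = set("VTLY7r")    # distinct right tip

def select_keypoints(word_boxes):
    """For each detected word, emit a keypoint at the word's left edge if
    its first character has a distinct left tip, and at its right edge if
    its last character has a distinct right tip."""
    keypoints = []
    for text, (left, top, right, bottom) in word_boxes:
        if len(text) <= 2:     # short words are likely OCR false positives
            continue
        if text[0] in begCharList:
            keypoints.append((text, "left", (left, bottom)))
        if text[-1] in endCharList:
            keypoints.append((text, "right", (right, bottom)))
    return keypoints

# Hypothetical word boxes as an OCR engine might return them.
words = [("Venue", (10, 50, 60, 62)),    # begins with 'V' -> left keypoint
         ("Doctor", (80, 50, 140, 62)),  # only the trailing 'r' qualifies
         ("of", (150, 50, 165, 62))]     # too short, ignored
print(select_keypoints(words))
```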

In the next step, we extract the word patches from the document that either begin with one of the characters present in begCharList, topCharList or bottomCharList, or end with one of the characters in endCharList, topCharList or bottomCharList. After that, we run the connected components algorithm to find the components of each such word. Then, we look at the leftmost and the rightmost component in the word. We search for the distinct tips in the first component if the component character is in begCharList, topCharList or bottomCharList, and in the last component if the component character is in endCharList, topCharList or bottomCharList. As a result, we get a set of keypoints in the template document and the corresponding keypoints in the test document. We only use the first and last components of the word, since these are guaranteed to include the first and last characters. Ideally, each character should be detected as a separate component. In reality, however, this may not be the case, because characters within the word may touch each other as a result of thresholding.

Figure 3. Left, right, top and bottom tips are shown for some of the characters included in the begCharList, endCharList, topCharList and bottomCharList respectively.

To improve the performance of our proposed method, certain heuristic checks are also imposed. Words with two or fewer characters are ignored, since they are more likely to be false positive detections by Tesseract. A constraint is put on the font size, because it was found empirically that very small font sizes tend to get broken during thresholding and are likely to be incorrectly detected by Tesseract. We use the Enchant spell checking library in Python to make sure that the words used for detecting the keypoints are valid words of the English language. This prevents junk words from being used for keypoint detection, since such words might be detected differently across the template and test documents.

B. Keypoint Matching

The next step is to obtain correspondences between keypoints of the template and test documents. Since a word in a template can appear multiple times, we need to be sure that keypoints of the corresponding words are being matched. For this, we take a neighbourhood region centred at the word under consideration in the template document. A similar region is taken around the matched candidate word in the test document. In an ideal scenario, all the words in the template word neighbourhood region should also occur in the test candidate neighbourhood region. However, the test candidate neighbourhood region can have some additional words in the form of handwritten or printed text from the filled fields. So, we keep a threshold of 90%: if the test candidate neighbourhood has at least 90% of the words present in the template word neighbourhood, then the test candidate is the corresponding matching word in the test document. An analogy can be drawn with feature matching involving commonly used local descriptors like SIFT and ORB, which compute keypoint descriptors using a neighbourhood region around the keypoint and use the similarity between descriptors for the matching task.

Table I
Character recognition accuracy for fields in the first insurance dataset. Column (A) gives the accuracy of the printed text, column (B) shows the accuracy for handwritten text tested on the HTR [19], while column (C) mentions the accuracy of handwritten text using the Google Vision API.

Field     | Tesseract (A) | HTR [19] (B) | Vision API (C)
Name      | 98.6%         | 88%          | 92.2%
Pet Name  | 99.2%         | 89.5%        | 92.9%
Address   | 98.3%         | 80.4%        | 85.8%
Hospital  | 98.7%         | 77.3%        | 82.5%
Injury    | 97.1%         | 78%          | 82.6%

C. Document Alignment

The next step in the pipeline is to find the homography mapping between the template and test documents from the keypoint correspondences obtained in the previous step. OpenCV's findHomography method finds this transformation matrix between the documents; it makes use of Equations 1 and 2 to find the transformation matrix H. Noise in keypoint detection might hamper system performance. To make the method more robust, we supply a much larger set of keypoint pairs than the minimum four required for homography estimation. RANSAC [20] is used to get rid of any noise in the system, which appears in the form of outliers. The transformation obtained is then applied to the test document using the warpPerspective function in OpenCV, which takes as input the transformation matrix and the image to which it is to be applied. This operation is equivalent to Equation 2 being applied to every pixel in the test document. It gives us the test document aligned with the template document.

D. Information Extraction

Having aligned the test document with the template, the user now simply marks the text field regions in the template document that need to be extracted from each of the test documents. The corresponding patches in the test documents are retrieved. Textual information is best read if the nature of the text is known. Hence, we train a convolutional neural network based classifier to identify whether a textual field is handwritten or printed. The classifier gives near-perfect performance, with an accuracy of 98.5%.
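The neighbourhood-overlap rule from the keypoint matching step can be sketched as a small predicate. The word lists below are hypothetical stand-ins for the words Tesseract would detect in a region around the word under consideration.

```python
# A sketch of the 90% neighbourhood-overlap rule: a test candidate matches a
# template word when at least 90% of the words in the template word's
# neighbourhood also appear around the candidate.
def is_matching_word(template_neighbourhood, test_neighbourhood, threshold=0.9):
    """Both arguments are collections of word strings taken from a region
    centred on the word under consideration."""
    template_words = set(template_neighbourhood)
    if not template_words:
        return False
    overlap = len(template_words & set(test_neighbourhood))
    return overlap / len(template_words) >= threshold

template_nbhd = ["Name", "of", "Patient", "Date", "Signature",
                 "Address", "City", "State", "Phone", "Email"]
# The filled test form adds handwritten words but is missing none of the
# template words, so the candidate still matches.
test_nbhd = template_nbhd + ["John", "Doe"]
print(is_matching_word(template_nbhd, test_nbhd))  # True
```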

If the text is recognized as printed, the retrieved field patch is sent to Tesseract for recognition. For handwritten text, we use the work of Chowdhury et al. [19] and the Google Vision API 1 for recognition.

1 Google Cloud Vision API: https://cloud.google.com/vision/

III. DATASET

We evaluated our proposed approach on two real world anonymized document datasets. The first dataset consists of 15 insurance claim forms and one corresponding empty template form. The second dataset contains 15 life insurance application forms along with one corresponding empty template form. This dataset does not have filled text in printed form; the filled data is only in the form of handwritten text. These datasets contain documents with variations in illumination and different backgrounds like a wooden table, and the documents are affine transformed relative to the template document. All the documents are resized to 1600 × 2400 and converted to grayscale for further experiments.

IV. EXPERIMENTAL RESULTS AND DISCUSSION

In this section, we present our experimental results on the two document datasets of insurance claim forms. We used a threshold of 170, determined empirically, for the binarization of documents. Figure 4 shows some test documents and their corresponding documents aligned with the template. The fourth column in Figure 4 is obtained when we perform an XOR operation between the aligned image and the template. It provides greater visual understanding of how our system performs on the homography estimation and alignment task.

Figure 4. (a), (b), (c) and (d) are the template, test image, aligned image, and the result of the XOR operation between the template and aligned images for a sample document of the first dataset. (e), (f), (g) and (h) are the corresponding images for a sample document of the second dataset. The XOR operation allows us to visualize how perfectly the test document is aligned with the template, and the filled text in the test document stands out distinctly in bright white.

Alignment is followed by text field retrieval and classification of the text into printed or handwritten. We train a 5-layer CNN on patches of printed text cropped from text lines detected by CTPN [21], and patches of handwritten text obtained from the IAM dataset [22]. We obtain a test accuracy of 98.5% when the model is tested on fields extracted from our documents. The quantitative measure of our information extraction pipeline is the character recognition accuracy of the retrieved text fields. Different models are employed for handwritten and printed text, as specified in Section II. Table I reports the accuracies of some of the fields of interest in the first insurance dataset.

To get an estimate of the amount of perturbation that our system can handle, we make use of the second insurance dataset mentioned in Section III and perform varying degrees of transformations like rotation, translation and scaling. We observe that our algorithm is able to handle translations and scaling of the test documents. For rotations, system performance is unaffected up to ±7° in the x-y plane of the image. For rotations beyond this range, Tesseract output degrades significantly and thus the image may not be aligned well. Horizontal and vertical translations range between ±40% of the document width and height respectively. Scaling factors largely depend on the font size in the document, and system performance is not impacted until the image gets pixelated. For our datasets, scaling works perfectly when the width and height are varied from 50% to 200% of their original values. The character recognition accuracies for the fields extracted during this stress test on the second insurance dataset are mentioned in Table III.

Table II
Character recognition accuracy for fields in the second insurance dataset of application forms. Column (A) reports the accuracy for handwritten text tested on the HTR model given by Arindam et al. [19], while column (B) gives the accuracy of handwritten text using the Google Vision API. This dataset does not contain added text in printed form.

Field             | HTR Model [19] (A) | Google Vision API (B)
Agency Name       | 78.7%              | 83.5%
Agency Address    | 78.3%              | 84.6%
First Name        | 80.1%              | 84.5%
Last Name         | 80.7%              | 86.7%
Applicant Address | 78.4%              | 82.6%
City              | 81.9%              | 93.5%
State             | 83.2%              | 89.6%

V. CONCLUSION

We proposed a character keypoint-based approach for homography estimation using the textual information present in the document to address the problem of image alignment, specifically for scanned textual document images. Since such documents do not have smooth pixel intensity gradients for warp estimation, we cannot use the contemporary machine learning and deep learning algorithms relying on pixel intensity values for image alignment. To address these limitations,
we create an automated system which takes an empty image patches via convolutional neural networks,” in Pro-
template document image and the corresponding filled test ceedings of the IEEE conference on computer vision and
pattern recognition, 2015, pp. 4353–4361.
document, and aligns the test document with the template
for extraction and analysis of textual fields. Experiments [13] E. Simo-Serra, E. Trulls, L. Ferraz, I. Kokkinos, P. Fua,
conducted on two real world datasets of insurance forms and F. Moreno-Noguer, “Discriminative learning of deep
support the viability of our proposed approach. convolutional feature point descriptors,” in Proceedings of the
IEEE International Conference on Computer Vision, 2015, pp.
118–126.
R EFERENCES
[14] X. Han, T. Leung, Y. Jia, R. Sukthankar, and A. C. Berg,
[1] R. Mur-Artal, J. M. M. Montiel, and J. D. Tardos, “Orb- “Matchnet: Unifying feature and metric learning for patch-
slam: a versatile and accurate monocular slam system,” IEEE based matching,” in Proceedings of the IEEE Conference on
transactions on robotics, vol. 31, no. 5, pp. 1147–1163, 2015. Computer Vision and Pattern Recognition, 2015, pp. 3279–
3286.
[2] S. Agarwal, N. Snavely, I. Simon, S. M. Seitz, and R. Szeliski,
“Building rome in a day,” in 2009 IEEE 12th international [15] V. Balntas, E. Johns, L. Tang, and K. Mikolajczyk, “Pn-
conference on computer vision. IEEE, 2009, pp. 72–79. net: Conjoined triple deep network for learning local image
descriptors,” arXiv preprint arXiv:1601.05030, 2016.
[3] M. Brown and D. G. Lowe, “Automatic panoramic image
stitching using invariant features,” International journal of [16] I. Rocco, R. Arandjelovic, and J. Sivic, “Convolutional neural
computer vision, vol. 74, no. 1, pp. 59–73, 2007. network architecture for geometric matching,” in Proceedings
of the IEEE Conference on Computer Vision and Pattern
[4] E. Rublee, V. Rabaud, K. Konolige, and G. R. Bradski, “Orb: Recognition, 2017, pp. 6148–6157.
An efficient alternative to sift or surf.” in ICCV, vol. 11, no. 1.
Citeseer, 2011, p. 2. [17] D. DeTone, T. Malisiewicz, and A. Rabinovich, “Deep image
homography estimation,” arXiv preprint arXiv:1606.03798,
2016.
[5] B. D. Lucas, T. Kanade et al., “An iterative image registration
technique with an application to stereo vision,” 1981. [18] T. Nguyen, S. W. Chen, S. S. Shivakumar, C. J. Taylor,
and V. Kumar, “Unsupervised deep homography: A fast and
[6] S. Lucey, R. Navarathna, A. B. Ashraf, and S. Sridharan, robust homography estimation model,” IEEE Robotics and
“Fourier lucas-kanade algorithm,” IEEE transactions on pat- Automation Letters, vol. 3, no. 3, pp. 2346–2353, 2018.
tern analysis and machine intelligence, vol. 35, no. 6, pp.
1383–1396, 2013. [19] A. Chowdhury and L. Vig, “An efficient end-to-end neu-
ral model for handwritten text recognition,” arXiv preprint
[7] D. G. Lowe, “Distinctive image features from scale-invariant arXiv:1807.07965, 2018.
keypoints,” International journal of computer vision, vol. 60,
no. 2, pp. 91–110, 2004. [20] M. A. Fischler and R. C. Bolles, “Random sample
consensus: A paradigm for model fitting with applications to
[8] R. Smith, “An overview of the tesseract ocr engine,” image analysis and automated cartography,” Commun. ACM,
in Proceedings of the Ninth International Conference on vol. 24, no. 6, pp. 381–395, Jun. 1981. [Online]. Available:
Document Analysis and Recognition - Volume 02, ser. http://doi.acm.org/10.1145/358669.358692
ICDAR ’07. Washington, DC, USA: IEEE Computer
Society, 2007, pp. 629–633. [Online]. Available: http: [21] Z. Tian, W. Huang, T. He, P. He, and Y. Qiao, “Detecting text
//dl.acm.org/citation.cfm?id=1304596.1304846 in natural image with connectionist text proposal network,”
in European conference on computer vision. Springer, 2016,
[9] K. Takeda, K. Kise, and M. Iwamura, “Real-time document pp. 56–72.
image retrieval for a 10 million pages database with a memory
efficient and stability improved llah,” in 2011 International [22] U.-V. Marti and H. Bunke, “The iam-database: an english
Conference on Document Analysis and Recognition. IEEE, sentence database for offline handwriting recognition,” In-
2011, pp. 1054–1058. ternational Journal on Document Analysis and Recognition,
vol. 5, no. 1, pp. 39–46, 2002.
[10] M. Block, M. R. Ortegón, A. Seibert, J. Kretzschmar, and
R. Rojas, “Sitt-a simple robust scaleinvariant text feature
detector for document mosaicing,” Proc. of ICIA2009, pp.
400–403, 2007.

[11] E. Royer, J. Chazalon, M. Rusiñol, and F. Bouchara, “Bench-


marking keypoint filtering approaches for document image
matching,” in 2017 14th IAPR International Conference on
Document Analysis and Recognition (ICDAR), vol. 1. IEEE,
2017, pp. 343–348.

30
