
INTERNATIONAL SCIENTIFIC JOURNAL OF ENGINEERING AND MANAGEMENT ISSN: 2583-6129

VOLUME: 02 ISSUE: 05 | MAY – 2023 DOI: 10.55041/ISJEM00744 WWW.ISJEM.COM


Text Extraction From Document Image


Anusha C, Saket Mishra, Rohit Metre, Harsh Gurawalia

Department of CSE, PES University, Bangalore-79, Karnataka

Email: [email protected], [email protected], [email protected], [email protected]


Contact: +91 7676240083, +91 9731963460, +91 9611910110, +91 9901360787

Guided By:
Dr. Sapna V. M., Assistant Professor, Dept. of CSE, PES University, Bangalore, Karnataka
Email: [email protected]

Abstract: We put together an OCR system pipeline in which a convolutional neural network (CNN) classifies each individual character. A CNN requires less training than a fully connected network because it has fewer parameters. To make this work, we first split the lines, then the words, and finally the individual characters that are sent to the CNN. The CNN is trained on an English character dataset. The EMNIST dataset (Extended Modified National Institute of Standards and Technology) has around 8 lakh (800,000) samples divided into 62 classes (10 digits + 26 lowercase letters + 26 uppercase letters). Because EMNIST comprises handwritten characters, we adopted the Chars74k dataset as well. Chars74k has 62 classes, identical to EMNIST, and is a normalized dataset with 1016 samples per character class. To build words, we merge the character labels predicted by the CNN. The prediction may be inaccurate and contain some misclassifications, so some adjustments are required. To accomplish this, we utilize an English word spell checker to locate all similar words and select the most appropriate one.

I. INTRODUCTION
Computer vision and machine learning skills are required to extract text from a document image; only by combining these two skills can one complete the task of text extraction. Text extraction from images has a wide range of practical uses. Images may contain useful text information, and without a way to extract it we lose a vital image feature. Text data can provide useful information that helps to explain an image.

Text extraction has many different applications: extracting text from sign boards, converting physical copies of books to digital versions, copying text data from photos taken with cell phones, real-time vehicle number plate tracking, and many more. The opportunities are unlimited. Text extraction will often be a sub-component in larger systems, hidden beneath other applications. A human can read text easily, but a computer cannot. A key to success is to take inspiration from biology: humans use their brains to extract and make sense of visual input, whereas computers employ neural networks; humans take in visual input through their eyes, while computers use cameras. We can therefore see a link between human and computer vision systems.

Color bleeding, low resolution, low contrast, image dimensions, layout position, document orientation, text color, text background, uneven lighting, and other obstacles will be encountered while extracting text. To compensate for all of these difficulties, we need to perform extensive image processing.

II. LITERATURE REVIEW
The existing research in the field of text recognition is briefly summarized in this section. Text recognition technology has existed for a very long time.

In [1], the task of document layout analysis is carried out to separate pre-defined semantic units from visually rich publications, such as the abstract, author, caption, equation, figure, footer, list, paragraph, section, table, and title. The dataset used here is DocBank, a large dataset produced with weak supervision. It enables models to incorporate both textual and layout information for downstream tasks, and it has the advantage of being usable in any sequence labelling model from an NLP standpoint. BERT, RoBERTa, LayoutLM, and Faster R-CNN are the four baseline models used in their experiments. It is designed to support both NLP and machine vision models with token-level annotations. DocBank is built in three steps: token annotation, document acquisition, and semantic structure detection.

A document D is made up of a discrete token set t = {t_0, t_1, ..., t_n}, where each token t_i = (w, (x_0, y_0, x_1, y_1)) consists of the word w and its bounding box (x_0, y_0, x_1, y_1). The semantic categories into which the tokens are divided are defined by C = {c_0, c_1, ..., c_m}. We are looking for a function F: (C, D) → S, where S = {({t_0^0, ..., t_{n_0}^0}, c_0), ..., ({t_0^k, ..., t_{n_k}^k}, c_k)} is the prediction set. To validate the effectiveness of DocBank, three representative pre-trained language models (BERT, RoBERTa, and LayoutLM) were evaluated on the dataset.


They trained the Faster R-CNN model using DocBank's object detection format and integrated its output with the sequence labelling models in order to compare the performance of models from different modalities. Although the pre-trained BERT and RoBERTa models showed comparable performance, the pre-trained LayoutLM model incorporates both text and layout information and thus outperforms them on the benchmark dataset by a significant margin. This is because 2D position embeddings can capture both the borders of semantic structures and the spatial distances between them in a single framework, increasing detection accuracy. The findings imply that using DocBank to combine text and layout data is an interesting topic that merits more investigation. It is expected that DocBank will eventually make it possible to apply more deep learning models to document layout analysis tasks.

In [2], the major purpose is to process noisy newspaper images, find the textual regions in them, and separate them from the graphical areas using models that develop and use edge information from grayscale document pictures to extract textual blocks. The program traces out a large number of entities before merging them into groups that form the edge points of the textual area. The proposed solutions were motivated by the need to locate text within the varied page layouts of newspaper images.

In today's heavily technical world, paper documentation still plays a significant role. Before applying OCR, an image must be segmented into definite regions and the individual characters extracted from it. While working on this paper, the authors discuss the problem of finding text regions in quality-degraded newspaper images; this work can also serve as a component of a system that makes newspaper article retrieval easier. Newspaper images contain a lot of noise, and text must be recovered from low-quality document images. The extraction design is shown in the figure below. A grey or color image obtained by scanning the newspaper serves as the system's input; input documents contain figures, text, and other information.

Figure 1: Text Region Extraction System

OCR techniques do not operate well on documents with non-structured layouts, and the scanner can also produce poor results. The approach has the advantage of being more efficient and quicker; the disadvantage is that it assumes text is presented horizontally. When textual segments cross incorrect graphical components, the detection ratio gradually decreases. Adaptive thresholds, rather than global thresholds, should be used for further refinement in the future.

In [3], the authors discuss text extraction from PDF documents. A wrapper is a program that uses structured data, such as that found on a webpage, to choose and extract useful information before delivering it to a database. The Lixto visual wrapper assists end-users in extracting data from structured online pages or websites, among other things. Its goal is to create user-friendly hierarchical HTML patterns, and there are numerous technologies that require data extraction from PDF. Where people ask about extracting PDF data using NEXTWRAP, this paper details work with low-level segmentation methods.

The majority of the knowledge on today's web pages and websites is in PDF format. Document analysis has two levels, geometric and logical, each with its own associated structure. The geometric level is available in the PDF format, while the logical level is retrieved using document knowledge. Extraction patterns are forms of the Elog web extraction language that hold the position of an element in the HTML parse tree. The Lixto Visual Wrapper has recently been employing tree-like inheritance for HTML documents to provide proper and clear data for the user.

Initially, the process can be understood by dividing a document into blocks referred to as atomic, i.e. the smallest logical entities of the document structure, which typically include paragraphs, titles, and captions, and experimenting with various algorithms for this process, which are clearly explained. Transformation into a structured format: because of procedural language restrictions, this is critical in determining the structure of logically levelled documentation. Conversion to a structured format such as XML allows us to define the content of the hierarchical structure. The current techniques exist only in the Lixto suite to cover HTML tasks. There are several segmentation algorithms, all of which are improved by OCR association, and which work with binarized images as input; by converting the PDF, we may employ these algorithms, and in this case, for example, we should segment the data immediately. Top-down and bottom-up approaches to page segmentation are the two options.


They created the prototype with the help of the PDFBox package to acquire access to the data in the PDF, and used the XMIllum2 environment for visualization. They developed a new version of the top-down algorithm in it, called the whitespace density graph, which scans horizontally or vertically in a given region of space for the projection profiling technique, with each point holding the complete density of whitespace at its horizontal or vertical position in the graph representation; this (x-y) approach is the most popular algorithm.

Low-level page segmentation forms the premise for higher-level approaches to document interpretation and for collecting data from knowledge bases. They believe that combining both approaches improved the results, and that each strategy has its own pros and cons. They made no attempt to analyze or identify tables on the page, so this algorithm does not produce the expected results with tabular data. With the aid of scoring-based systems, researchers are still working on flexible and effective approaches. When a column's location and the scores are equal, higher-level decisions are made that take into account the plausibility of a segment's proximity to other familiar segments and the confidence measure that goes along with it.
In [4], the paper focuses on the corner points of the image, since corner points are less sensitive to variations in illumination, resolution, etc. Various methods such as segmentation focus on the textual part and not on the non-textual part, so their results are erroneous. Techniques such as binarization and noise removal are able to deliver the correct output but are a little expensive.

The method developed is:
1. Smooth the image by applying a Gaussian filter:

G(x, y) = (1 / (2πσ²)) exp(−(x² + y²) / (2σ²))

where σ is the standard deviation of the filter and x, y are the coordinates of the image.
2. Identify corner points with a corner detection algorithm.
3. Divide the image into blocks and calculate the number of corner points in each block.
4. Find the block with the maximum number of corner points and use it to define a threshold. This value remains correct even if the characteristics of the image change; take 20% of the maximum density as the threshold.
5. Blocks with more corner points than the threshold are added to the text regions; then check the connectivity of the blocks to build the text regions accurately (a sketch of this zoning procedure follows below).

In conclusion, this study shows how to train data and how to pre-process images using binarization, noise reduction, and skew correction. Using a zoning strategy, the method keeps zones with a corner density greater than 20% of the image's maximum density as text blocks after quickly extracting the image's key points. The strategy fails with huge fonts and certain drawings where the corners respond disproportionately. However, it is rather speedy (and easily parallelizable), and it has room for improvement so that it can compete with other cutting-edge systems while remaining simpler. Finally, it seems that this method can extract more complicated layouts, such as lines and paragraphs.
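To make the zoning procedure concrete, here is a minimal Python/OpenCV sketch of the corner-density idea described in [4]. The function name, block size, and the use of Harris corners are our own illustrative assumptions; the paper only specifies Gaussian smoothing, corner detection, block-wise counting, and a threshold of 20% of the maximum density.

```python
import cv2
import numpy as np

def text_regions_by_corner_density(gray, block=32, ratio=0.2):
    # Step 1: smooth with a Gaussian filter to suppress noise.
    smooth = cv2.GaussianBlur(gray, (5, 5), sigmaX=1.0)
    # Step 2: detect corner points (Harris corners assumed here).
    response = cv2.cornerHarris(np.float32(smooth), 2, 3, 0.04)
    corners = (response > 0.01 * response.max()).astype(np.uint8)
    # Step 3: count corners per block.
    h, w = corners.shape
    density = np.zeros((h // block + 1, w // block + 1), dtype=int)
    ys, xs = np.nonzero(corners)
    for y, x in zip(ys, xs):
        density[y // block, x // block] += 1
    # Steps 4-5: keep blocks above 20% of the maximum density as text
    # (the connectivity check that refines the regions is omitted here).
    threshold = ratio * density.max()
    return density > threshold  # boolean mask of candidate text blocks
```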
In [5], the LayoutLM model interfaces between the text and the layout information of scanned document images, which is useful for a variety of real-world document image understanding applications, such as information extraction from scanned documents. Pre-trained models are frequently used in NLP applications; however, they almost always concentrate on text-level manipulation, omitting the layout and style details that are crucial for visual document interpretation.

2D position embedding: unlike the usual position embedding, which represents the location of words in a sequence, 2D position embedding seeks to describe the relative spatial position of words in a document. They also included an image embedding capability that leverages a document's visual features and aligns them with the text; an image feature layer is used in the language representation. The image is divided into several pieces, each matching one of the phrases.

The pre-training method known as LayoutLM integrates text and layout knowledge into a single framework; it is simple yet effective. Token embeddings, layout embeddings, and image embeddings are the multimodal inputs used in this Transformer-based system. Meanwhile, the model can be trained on vast quantities of unlabeled scanned document images under self-supervision. Three tasks are employed to assess the LayoutLM model: form understanding, receipt understanding, and scanned document image classification. Studies show that LayoutLM significantly outperforms a number of SOTA pre-trained models across these tasks.

III. DATASET

The dataset being used here is char62.csv. It is taken from the Chars74k dataset, which has 62 classes covering the numbers [0-9] and the alphabets [A-Z][a-z]; each class contains 1016 images. We converted each image to a 32x32 pixel array and stored it in char62.csv. In addition to this dataset, we also created a dataset of our own by rendering images of the digits and alphabets in every font available in the operating system, at the same size, and storing them in a separate folder. Keras is a Python-based deep learning API, and this is a useful dataset for individuals who want to experiment with pattern recognition. With 20% designated as test data, the dataset is divided into train and test subsets. Using the TensorFlow library, we built a CNN model and trained it using the train dataset.

Figure 2: Some sample images from the dataset.
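As a rough illustration of the training setup described above, here is a minimal Keras sketch. It assumes the 32x32 grayscale character images and 62 class labels have already been loaded from char62.csv; the layer sizes and hyperparameters are our own illustrative choices, not values reported in the paper.

```python
import tensorflow as tf
from sklearn.model_selection import train_test_split

# Assumed: X is (N, 32, 32, 1) float32 in [0, 1] and y is (N,) integer
# labels 0..61, both decoded from char62.csv beforehand.
def build_char_cnn(num_classes=62):
    return tf.keras.Sequential([
        tf.keras.layers.Conv2D(32, 3, activation="relu", input_shape=(32, 32, 1)),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Conv2D(64, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(num_classes, activation="softmax"),
    ])

def train(X, y):
    # 20% of the data is held out as the test split, as in the paper.
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
    model = build_char_cnn()
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    model.fit(X_train, y_train, epochs=10, validation_data=(X_test, y_test))
    return model
```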

IV. METHODOLOGY
We extract text from the image in three steps: pre-processing, processing (the segmentation step), and post-processing. The input image is a scanned document image or an image taken with a mobile camera, and it should contain text in lines or paragraphs.

In the first stage, the pre-processing step, the input image is treated to remove any disturbances that may have been introduced during acquisition. This eliminates the challenges created by noise, uneven lighting, and blurring effects, making the detection, extraction, and recognition of the text embedded in document images simple. In an image, each color channel (RGB) is represented by a 2D array in which each cell, called a pixel, stores an 8-bit integer in the range 0-255.

Some of the important pre-processing stages are:

Binarization - a technique for turning a colored image into a black-and-white (binary) image. Normally, black pixels equal 0 and white pixels equal 255, so the cutoff is 127. Binarization aims to entirely separate black and white pixels by removing any intermediate values: all pixel values below the threshold (127) are set to zero, and all pixel values above the threshold are set to 255.

Figure 3: Binarization
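A one-liner with OpenCV illustrates this thresholding. Treating 127 as the cutoff follows the description above, though in practice an adaptive or Otsu threshold is often substituted; the filename is a placeholder.

```python
import cv2

gray = cv2.imread("document.png", cv2.IMREAD_GRAYSCALE)
# Pixels <= 127 become 0 (black), pixels > 127 become 255 (white).
_, binary = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY)
```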

Skew Correction - in order to readily separate lines, the document must first be properly oriented. We try to realign the given image to the best possible orientation using this method, which makes it easier to distinguish between lines.

Figure 5: Skew Correction
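The paper does not spell out its deskewing algorithm; a common sketch, assumed here, rotates the image by the angle of the minimum-area rectangle fitted around the foreground pixels.

```python
import cv2
import numpy as np

def deskew(binary):
    # Angle of the tightest rotated rectangle around all foreground pixels.
    coords = np.column_stack(np.where(binary > 0)).astype(np.float32)
    angle = cv2.minAreaRect(coords)[-1]
    angle = -(90 + angle) if angle < -45 else -angle
    h, w = binary.shape
    M = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
    return cv2.warpAffine(binary, M, (w, h), flags=cv2.INTER_CUBIC,
                          borderMode=cv2.BORDER_REPLICATE)
```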
Noise Removal - image noise can impact a machine learning model's accuracy, so eliminating noise from the entire image before sending it for classification is critical. Noise removal is a technique for cleaning an image of any dots, texture lines, or other elements that are unimportant to the data it contains.

Figure 4: Noise Reduction
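A median filter is one simple choice for this step; the kernel size below is an assumption, since the paper does not name the filter it uses.

```python
import cv2

# Remove isolated speckles while preserving character edges.
denoised = cv2.medianBlur(binary, 3)
```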
The next step is the processing stage, which contains further text processing operations such as character recognition and segmentation. Segmentation is the process of dividing images into useful areas, and it is done at three levels: line level, word level, and character level.

Line-level segmentation: the horizontal histogram projection technique is used for this. A binary image is a black-and-white image in which pixels containing relevant information, such as text, are called foreground pixels, while the remaining pixels are called background pixels. In this technique, the values of the foreground pixels along each row are added together, so the final result is a 1D array whose size equals the image's height in pixels. This array is then plotted vertically alongside the original image: higher peaks correspond to more foreground pixels, and lower peaks correspond to fewer foreground pixels.

We need to identify lines from a skew-corrected image. When projecting the image with the horizontal histogram (a short sketch follows below):
• Rows that represent text in a line have a higher number of foreground pixels, which translates to higher peaks.
• Gap rows have a low number of foreground pixels, which corresponds to lower peaks.
• This distinguishes one line from the others (lower peaks in the histogram).

Figure 6: Line level segmentation using Horizontal Histogram Projection
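The projection itself reduces to a row sum and a scan for gaps. The minimal sketch below assumes a binary image whose foreground pixels are nonzero; the zero-row test for gaps is our simplification, as a real document may need a small noise tolerance.

```python
import numpy as np

def segment_lines(binary):
    # Horizontal projection: one foreground-pixel count per image row.
    profile = (binary > 0).sum(axis=1)
    lines, start = [], None
    for row, count in enumerate(profile):
        if count > 0 and start is None:
            start = row                      # a text line begins
        elif count == 0 and start is not None:
            lines.append(binary[start:row])  # a gap row ends the line
            start = None
    if start is not None:
        lines.append(binary[start:])
    return lines
```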

Word-level segmentation: we now have the line images from the document, each containing a sequence of words, and the line images obtained by the above process are subjected to word segmentation. For this we employed the scale space technique [14]. The goal is to divide the line image into words, so that we get a separate word image for each word in the line.

Scale space theory investigates how scale affects the meaning of any physical observation, i.e., how certain characteristics and objects only matter at particular scales. Starting with an initial image, smoothed images are produced along the scale dimension of the scale space. Numerous scholars have demonstrated in [4, 6] that, under specific conditions, the Gaussian constructs the image's linear scale space in a unique way.

A document image is made up of structures like letters, words, and lines at various scales. However, document images differ from other sorts of images in that a significant change in scale is not necessary to extract a certain kind of structure. For instance, the scales of all the words are fundamentally similar, making it possible to extract them all without significantly changing the scale parameter. Consequently, a scale exists at which each unique word creates a separate blob, and when the scale parameter is set to this value, the output (blob) is at its maximum.

The procedure for blob extraction entails picking the scale parameter and the multiplication factor. Based on the observation that the best scale for filtering corresponds to the maximum of the spatial extent of the blobs, an analysis is provided that helped build a simple scale selection strategy.

Fig. 7: Word segmentation using scale space technique
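In code, the scale-space idea amounts to blurring a line image with an anisotropic Gaussian until the letters of each word fuse into one blob, then cutting out the connected components. The sketch below shows the shape of the procedure with illustrative sigma values rather than the paper's scale selection strategy, and it assumes a binarized line image with white text on a black background; reducing the blur is the analogous way to separate characters.

```python
import cv2

def segment_words(line, sigma_x=6, sigma_y=3):
    # Blur more along x so the letters of one word merge into a single blob.
    blurred = cv2.GaussianBlur(line, (0, 0), sigmaX=sigma_x, sigmaY=sigma_y)
    _, blobs = cv2.threshold(blurred, 0, 255,
                             cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    n, _, stats, _ = cv2.connectedComponentsWithStats(blobs)
    boxes = [stats[i, :4] for i in range(1, n)]   # label 0 is the background
    boxes.sort(key=lambda b: b[0])                # left-to-right reading order
    return [line[y:y + h, x:x + w] for x, y, w, h in boxes]
```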

Character-level segmentation: we now have an image with a single word, i.e. a sequence of characters, and the goal is to divide this image into distinct characters. For text that is not connected, we may simply reuse the prior method that was used for word separation, with different scaling parameters, to separate the characters.

Figure 8: Letter segmentation

The model is trained using a CNN in the post-processing stage. CNNs are one of the better deep learning methods for the text detection step of OCR. Convolution layers are frequently employed for image classification applications because of how well they extract features: they detect the meaningful edges in an image and, at a higher level, shapes and complicated objects. Compared to fully-connected layers, for example, convolutional layers lower the complexity of a machine learning OCR system by repeating the pattern-detection filters across the image.

Figure 9: Fully-connected layers of the CNN model

Figure 10: Using the pre-trained model to predict a real character

In the next step, the character images obtained above are passed to a CNN model trained on the alphabet and digit images, which returns the text for each image. The letters we get are combined to form a word, and each word is passed through a spellchecker algorithm to obtain the spell-corrected word. These words are joined to form a line, and the final output is the editable text present in the document image. A Python library is used to construct a simple user interface: there is a button for uploading image files, and on submission the text extracted from the image is displayed, allowing us to copy it or download it as a file for later use.
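The character-to-text assembly can be sketched as follows; the paper does not name its spell checker, so the pyspellchecker package is assumed here purely for illustration.

```python
import numpy as np
from spellchecker import SpellChecker  # assumed: pip install pyspellchecker

LABELS = ("0123456789"
          "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
          "abcdefghijklmnopqrstuvwxyz")  # the 62 class labels

def image_to_text(model, lines):
    """lines: list (per line) of lists (per word) of 32x32 character images."""
    spell = SpellChecker()
    out = []
    for line in lines:
        words = []
        for word_chars in line:
            batch = np.stack(word_chars).reshape(-1, 32, 32, 1) / 255.0
            preds = model.predict(batch).argmax(axis=1)
            raw = "".join(LABELS[p] for p in preds)
            # Replace a likely misclassification with the closest English word.
            words.append(spell.correction(raw) or raw)
        out.append(" ".join(words))
    return "\n".join(out)
```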
Figure 11: User interface showing the document image and the text extracted from that image.

V. CONCLUSION AND FUTURE WORK
We have provided a thorough method for detecting text areas and identifying text from photographs in this project. To work more efficiently, the system requires major additional improvements. We performed line segmentation on our image, which yields line images from the paragraphs in the input image; before that, we eliminated the non-text content of the image. We then performed word segmentation on the line images and successfully obtained the separate word images in each line. After this, we segmented the word images into characters and obtained the individual character images, which were passed to the trained CNN model, which predicted the text character matching each image. For our final output, we obtained the editable form of the text that was in the image. We intend to extend this work to compressed-domain processing to make it even faster. The technique we employ is unable to identify non-horizontal text in a picture, and to better track text with complicated motion we must look into ways of improving our technique. For text with complicated backgrounds, our system's recognition accuracy is poor. We have to work further on documents with specific formats, such as IEEE-format documents, forms, and bills; and since we have done text extraction only for English-language text documents, we can extend it to multiple languages.

VI. REFERENCES

[1] Gregory Cohen, Saeed Afshar, Jonathan Tapson and André van Schaik, "EMNIST: an extension of MNIST to handwritten letters," The MARCS Institute for Brain, Behaviour and Development, Western Sydney University.
[2] Vikas Yadav (ECE Department, Visvesvaraya National Institute of Technology, Nagpur) and Nicolas Ragot (Université François Rabelais, Laboratoire Informatique (LI EA6300), Tours, France), "Text extraction in document images: highlight on using corner points," April 2016.
[3] S. Abirami and D. Manjula, "Text Region Extraction from Quality Degraded Document Images," Department of Computer Science & Engg, Anna University, Chennai, India, 2010.
[4] Tamir Hassan and Robert Baumgartner (Database & Artificial Intelligence Group, Vienna University of Technology, Austria), "Intelligent Text Extraction From PDF," 2005.
[5] Xu, Yiheng, et al. "LayoutLM: Pre-training of text and layout for document image understanding." Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2020.
[6] Li, Minghao, et al. "DocBank: A benchmark dataset for document layout analysis." arXiv preprint arXiv:2006.01038 (2020).
[7] Kaplan, Frédéric, et al. "Combining visual and textual features for semantic segmentation of historical newspapers." Journal of Data Mining & Digital Humanities (2021).
[8] Zhang, Peng, et al. "TRIE: End-to-end text reading and information extraction for document understanding." Proceedings of the 28th ACM International Conference on Multimedia, 2020; Cartic Ramakrishnan, Abhishek Patnia, Eduard Hovy and Gully A. P. C. Burns, "Layout-aware text extraction from full-text PDF of scientific articles," 2012.
[9] Zhong, Xu, Jianbin Tang, and Antonio Jimeno Yepes. "PubLayNet: largest dataset ever for document layout analysis." 2019 International Conference on Document Analysis and Recognition (ICDAR). IEEE, 2019.
[10] Oral, Berke, et al. "Information extraction from text intensive and visually rich banking documents." Information Processing & Management 57.6 (2020): 102361.
[11] Subramani, Nishant, et al. "A survey of deep learning approaches for OCR and document understanding." arXiv preprint arXiv:2011.13534 (2020).
[12] Chen, Xiaoxue, et al. "Text recognition in the wild: A survey." ACM Computing Surveys (CSUR) 54.2 (2021): 1-35.
[13] Codebasics. "Simple explanation of convolutional neural network | Deep Learning Tutorial 23 (Tensorflow & Python)." Online video clip. YouTube, 14 Oct 2020. Web. 10 March 2022.
[14] Manmatha, R. and Srimal, N. "Scale space technique for word segmentation in handwritten documents." International Conference on Scale-Space Theories in Computer Vision, 26 Sep 1999, pp. 22-33. Springer, Berlin, Heidelberg.