
A PROJECT REPORT

ON

SNAP_SEEKER Android Application


SUBMITTED FOR PARTIAL FULFILLMENT FOR THE
AWARD OF THE DEGREE OF
Bachelor of Technology (B.Tech.)
in
“Computer Science and Engineering”

by
DEVANSHU VERMA 1902300100029
PANKAJ 1902300100065

Under the Supervision
of

Prof. Saranya Raj

DRONACHARYA GROUP OF INSTITUTIONS


GREATER NOIDA
(Dr. A.P.J. Abdul Kalam Technical University)
MAY, 2023
CERTIFICATE
Certified that Devanshu Verma and Pankaj have carried out the research work presented in
this dissertation entitled "Snap_Seeker Android Application" for the award of Bachelor of
Technology in CSE from DGI under my supervision. The dissertation embodies the results of
original work and studies carried out by the students themselves, and its contents do not form
the basis for the award of any other degree to the candidates or to anybody else.

Project Guide:
Prof. Saranya Raj (Associate Professor)
Department of CSE
Dronacharya Group of Institutions, Greater Noida

Head of Department:
Prof. Bipin Pandey
Department of CSE
Dronacharya Group of Institutions, Greater Noida

Date:
Place: DGI, GN
ABSTRACT
The popularity of machine learning technologies has increased exponentially in the past ten
years. In many fields of study, possible use cases for machine learning are tested continuously.
Thus, the basics of developing software that uses machine learning are becoming more critical
by the day.

This report describes the analysis, design, and implementation of the proposed application
(Snap_Seeker Android Application). In this application, users can take a photo of an object
with their mobile phone and run it through the object detector to obtain keywords related to
the image; the application then searches the web for information about the keyword and
shows it to the end user.

Snap_Seeker is an Android application designed to convert images containing text into digital
text that can be edited, copied, and shared. The application uses optical character recognition
(OCR) technology to extract the text from images and convert it into machine-readable format.
The Snap_Seeker application allows users to capture images of text using their smartphone
camera or select an image from their photo library. Once the image is captured or selected, the
application processes the image using OCR to extract the text.

Users can then edit the text, copy it to the clipboard, or share it through various means such
as email or social media. The application supports multiple languages and fonts, making it
suitable for a wide range of users.

In addition, the Snap_Seeker application includes features such as image cropping, rotation,
and resizing, which can be useful for improving the quality of the image before processing.
The application is user-friendly and easy to use, making it suitable for users of all levels of
technical proficiency.

Overall, Snap_Seeker is a useful application for anyone who needs to extract text from
images quickly and efficiently.
ACKNOWLEDGEMENT

It is always a difficult task to acknowledge all those who have been of tremendous help in an
academic project of this nature and magnitude. Nevertheless, we have made a sincere attempt,
through this project report, to express our gratitude to all those who have contributed to the
successful completion of this project.

As we present this report on the “Snap_Seeker Android Application”, we wish to express our
humble gratitude to all the individuals who have so kindly offered us their time, skill,
knowledge, advice, facilities, and guidance.

We are extremely grateful to Prof. K.K. Saini, Director, DGI-GN, for his valuable input, for
the opportunity to develop this project, and for making all the resources available to us with
the intention of making this project a success.

We would also like to thank Prof. Bipin Pandey, H.O.D., DGI-GN, for his ideas, advice,
knowledge and, above all, his support throughout this project.

We also take this opportunity to express our deepest gratitude to our Project Guide and faculty,
Prof. Saranya Raj, Associate Professor, DGI-GN, for the constant support, guidance, and
encouragement that have been a constant source of inspiration for us.

Last but not least, we would like to thank our parents and all our friends, who have always
been a strong support during the entire course of this project; without their co-operation, its
completion would not have been possible.

Devanshu Verma - 1902300100029


Pankaj - 1902300100065

LIST OF FIGURES

Figure No CONTENT Page No.

1 Difference b/w a traditional and a machine learning approach 1

2 Different learning methods 3

3 Supervised learning method 3

4 Unsupervised learning method 4

5 Relationship between mobile application and local model 5

6 Optical Character Recognition Detection 7

7 Data flow diagram of the application 12

8 Internal Representation of RGB 15

9 Binarization 16

10 Formula used 16

11 Skew Correction. 19

12 Feature Upscaling. 21

13 Text Recognition 22

14 Postprocessing 22

15 Illustration of CNN architecture used as vanilla model 23


16 Illustration of CNN 24

17 ML Kit architecture 25

18 Android studio Platform IDE 27

19 App Demo (image capture screen) 29

20 Open camera Intent. 30

21 Passing data to ML Model 30

22 App Demo (Classify image screen). 31

23 Resulted App 31



LIST OF TABLES
Table No CONTENT Page No.
1 Dependency used in the android application 45

2 Classes of the android application 46


TABLE OF CONTENTS
CERTIFICATE ii
ABSTRACT iii
ACKNOWLEDGMENT iv
LIST OF FIGURES v
LIST OF TABLES vii
TABLE OF CONTENTS viii

CHAPTER 1: INTRODUCTION: MACHINE LEARNING 1-8


1.1 What is Machine Learning?
1.2 Basic Requirement: Learning Methods
1.2.1 Supervised Learning
1.2.2 Unsupervised Learning
1.3 Basic Requirements: Model and Training
1.4 On-device Machine Learning
1.5 Applications of Machine Learning
1.5.1 General Applications
1.5.2 Optical Character Recognition Detection
1.6 Objective of our project

CHAPTER 2: SURVEY OF TECHNOLOGIES 9-11


2.1 Software Requirements
2.2 Hardware Requirements

CHAPTER 3: SYSTEM DESIGN: FLOW CHART 12-13

CHAPTER 4: MODEL IMPLEMENTATION INFORMATION 14-22


4.1 Image Pre-Processing
4.2 Text-Detection
4.3 Text-Recognition
4.4 Postprocessing

CHAPTER 5: ABOUT MODEL 23-26


5.1 Model Compilation
5.1.1 Convolutional Neural Networks
5.2 Model training
5.3 API Model Architecture

CHAPTER 6: SNAP_SEEKER APP IMPLEMENTATION 27-31


6.1 Android Platform
6.2 Image Intent
6.3 Model Conversion of image into text
6.4 Resulted Application working Images
CHAPTER 7: CONCLUSION & FUTURE SCOPE 32

REFERENCES 33-34
APPENDIX 35
CHAPTER-1
INTRODUCTION: MACHINE LEARNING

1.1 What is Machine Learning?


The term “machine learning” is quite self-explanatory. In simple terms, machine learning (or
“ML” for short) is the capability of a computer to make decisions without being explicitly
instructed each time. Machine learning is a subfield of artificial intelligence (or “AI”) [2].

When a computer program can improve the way that the tasks are performed by using its previous experience,
we can then say that the computer is learning. This scenario is entirely different from one where a program
performs a task based on existing rules, logic, and information added by a programmer. For example, a chess
game where the programmer implements a winning strategy for the program illustrates this perfectly. The
machine learning approach differs from that because it does not have any clue on how to win the game at first.
The programmer only defines a set of legal moves, and the rest is for the computer to learn. By learning from
its mistakes, the computer starts to win eventually, but this requires plenty of repetition.
Machine learning is an iterative process for the computer (Figure 1), meaning that it takes some time to
complete, just like the process for humans to learn to ride a bicycle. The process requires much previous
experience, trial, and error to achieve mastery in a specific task.

Figure 1. Difference between a traditional and a machine learning approach

Machine learning simplifies many everyday tasks that would be tedious for a programmer to
code traditionally. Typical examples are tasks that would otherwise require endless lines of
code because they do not have a well-defined set of rules; the rules are either non-existent or
too abstract for the programmer to follow. Information processing also takes time, and in
real-time applications, a way for the computer to make assumptions based on previous
experience reduces processing time significantly. Desktop computers were the first to use
machine learning, but implementing machine learning on mobile devices has become
increasingly popular. Modern mobile devices offer a level of computational capacity close
enough to that of modern computers to perform such tasks. Any application that does one of
the following is using machine learning in some way:

● Speech recognition

● Computer vision and image classification

● Gesture recognition

● Translation from one language into another

● Interactive on-device detection of text

● Autonomous vehicles, drone navigation, and robotics

When applying more advanced technologies and algorithms in a mobile environment one of
the challenges is the limited computational power of the mobile hardware. As inference is
computationally expensive, it is crucial that operations are optimised for mobile devices. By
using the mobile version of TensorFlow (TF) namely TensorFlow Mobile (TFM) and the
updated mobile framework TensorFlow Lite (TFL), developers can use pre-trained models on
mobile devices for inference with optimisation for mobile hardware [3].

1.2 Basic requirements: Learning methods


Machine learning can, on a larger scale, be categorised into supervised and unsupervised
learning methods. Other methods, like semi-supervised and reinforcement learning (Figure 2),
are outside the scope of this report.

Figure 2. Different learning methods

1.2.1 Supervised learning


The main goal of supervised learning is to create a function that maps inputs to outputs so
well that the unforeseen input (x) can be mapped to output (y) by predicting where it should
belong based on prior knowledge. A set of single-digit numbers is a good example that is also
used in the demo project later. When all ten digits are labelled, and a new image of a number
comes in, the model should be able to predict what number it was (Figure 3).

Figure 3. Supervised learning method

In supervised learning, if the predicted output is a discrete value, such algorithms are part of
classification algorithms. However, when the predicted output is a continuous value, such
algorithms fall under the umbrella of regression algorithms. These differences must be taken
into consideration when training the model.
Several algorithms are associated with supervised learning, such as k-nearest neighbours,
Naive Bayes, decision trees, linear regression, logistic regression, support vector machines,
and random forests.

1.2.2 Unsupervised learning
On the contrary, unsupervised learning does not learn under supervision. The model learns
based on the data that gets fed to it and discovers hidden patterns in the data (Figure 4). This
type is useful when there are enormous amounts of data, and the patterns we are looking for
are unknown. Unsupervised learning algorithms can, in general, provide useful insights about
the given data, such as confirming what we might already know or, in some cases, predicting
what is going to happen next. A suitable example of using unsupervised learning could be customer
segmentation. When vast amounts of customer data are available that captures all customer
features, unsupervised learning algorithms could cluster different kinds of customers. This
type of information could be precious to the marketing team.

Figure 4. Unsupervised learning method

1.3 Basic requirements: Models and training


Machine learning uses models to store past experiences; the models are then used by the
computer to apply the machine learning algorithm to the problem in hand. Model formats
vary between different machine learning frameworks. As a side-effect, most of the existing
model formats are incompatible with each other. A particular script or program
is needed to convert between different model formats. The model storage space requirements
vary greatly too, but usually, the models optimised for mobile platforms are much smaller
than their desktop counterparts.

Model training can be done from scratch and can be a tedious, time-consuming task. Training
from scratch can, however, yield more reliable results if done correctly. One of the
prerequisites is that the problem needs to be well-defined. When the machine learning
problem is well-defined, the following conditions are satisfied: We have the right problem,
the right data, and the right criteria for success. Sometimes this is the only way to approach a
particular issue because application data requirements can be precise.
An alternative way is to use an existing model if a suitable model is available and retrain it
based on application needs. Using the existing model can save valuable time and resources at
the possible cost of model accuracy. That is why the model used for the basis should be
selected very carefully.

In both cases, the model requires a useful and extensive dataset, meaning that it is qualitative
but also quantitative. In the end, when training the model, it is all about following the best
practices available. On the list below, there are a few examples of these best practices.

● Use multiple success metrics

● Use metrics that are easy to understand

● Use metrics that make the model comparisons easier

● Introduce randomness to training data in order to avoid biases.

● Reduce useless or redundant training data.

1.4 On-device Machine Learning


Nowadays, people are spending more time with mobile devices, and it makes much more
sense to run machine learning models on the device itself instead of storing them in the cloud.

The most significant advantage of the on-device model is that the model is always available
without the need to send information back and forth (Figure 5). Localness is also the reason
why using the model is convenient.

However, the cloud model is not useless, because a cloud model can be updated seamlessly,
without users ever noticing the difference. The cloud model is the way to go if the model file
is enormous and would drastically increase the application size (Figure 6).

Figure 5. Relationship between mobile application and local (on-device) model

1.5 Applications of Machine Learning

1.5.1 General applications


Various features used in existing applications use machine learning. High-level examples of
these features are email spam identification systems, product recommendations on
ecommerce sites, and automatic face tagging functionality on social media. All formerly
mentioned features use machine learning to a varying extent.

Today, many applications rely heavily on using machine learning frameworks for specific
tasks, but still use traditional solutions for most of their functionality. Machine learning is not
a replacement for conventional ways of developing software but an extension of it. In many
cases, the machine learning approach can simplify the convoluted logic of specific tasks.

These popular mobile applications use machine learning:

● Google Maps

● Facebook
● Snapchat

● Netflix

● Tinder

● Uber

There are many other popular applications, not listed here, that use machine learning
functionality. Using machine learning in widely used applications is a smart move because,
firstly, it can massively improve the user experience and, secondly, it can provide information
with actual business value to the developers or the company.

1.5.2 Optical Character Recognition detection

OCR stands for Optical Character Recognition. An OCR model is a type of machine learning
model that is designed to recognize text characters in images or scanned documents and
convert them into machine-readable text. The goal of OCR is to automate the process of data
entry and document digitization by extracting text from images, which can then be processed
and analysed by computers. OCR models typically use techniques from computer vision and
natural language processing to recognize text characters and convert them into digital text.
OCR models can be trained on large datasets of images and corresponding text to improve
their accuracy and recognize a wide range of fonts and languages.

Figure 6. OCR Model

OCR is used in a variety of applications, including document scanning and conversion,
automated data entry, and text recognition in images and videos.

1.6 Objective of our project
The first goal of this report is to document the development process of the proposed
application (Snap_Seeker Android Application): which machine learning framework was
used, along with information about machine learning methods, models, general Android
application techniques, suitability, and more.

Secondly, to fully understand the basic concepts of machine learning development, an
“object classification” demo demonstrating the development process is essential. The project
uses a machine learning framework, which enables the development process while also
showcasing the required setup and dependencies when using such a framework.

The objective of a text-from-image recognition project is to develop an algorithm or model
that can accurately identify and extract text from images. This is typically done using a
combination of computer vision techniques and natural language processing (NLP) methods.

The extracted text can be used for a variety of purposes, such as:

● Improving accessibility: Text-from-image recognition can help make images more accessible to visually impaired individuals by converting the text from the image.

● Enhancing searchability: Extracted text can be used to improve the searchability of images by allowing users to search for specific keywords or phrases within the image.

● Automated data entry: Text-from-image recognition can automate data entry by extracting information from images and converting it into structured data.

● Translation: Extracted text can be used for translation purposes, such as translating foreign-language text in images into the user's language.

Overall, the objective of text-from-image recognition is to make the information contained in images more accessible and usable for a wide range of applications.

CHAPTER – 2

SURVEY OF TECHNOLOGIES
2.1 Software requirements

● Anaconda
Anaconda is a distribution of the Python and R programming languages for scientific
computing that aims to simplify package management and deployment. The
distribution includes data-science packages suitable for Windows, Linux, and macOS.

We used Anaconda as the environment for our Python installation and trained our model in it.
It managed all the packages, such as OpenCV, NumPy, and Keras, and the CPU distribution
for fast execution.

● Jupyter Notebook
The Jupyter Notebook is an open-source web application that you can use to create and
share documents that contain live code, equations, visualisations, and text.

Jupyter Notebook offered a lot of tools during the development of the project. We were able
to plot graphs and execute code in a single file, which reduced the time consumed in the
development cycle.

● OpenCV
OpenCV (Open-Source Computer Vision) library is a widely-used open-source software

library for developing computer vision and machine learning applications. In the field of

optical character recognition (OCR), OpenCV provides a range of image processing and

computer vision functions that are essential for the development of OCR systems.

OpenCV can be used for a variety of OCR tasks, such as image preprocessing, text detection,
character recognition, and post-processing. It includes a variety of algorithms and methods
that can be used for these tasks, including edge detection, morphological operations,
connected component analysis, template matching, machine learning-based OCR,
and deep learning-based OCR.

● Matplotlib
Matplotlib is a low-level graph-plotting library in Python that serves as a visualisation
utility. In our project we needed to plot the output of our training data to evaluate the
performance of the model. Plotting graphs makes the training and implementation easier
to understand.

● Google: TensorFlow Lite


TensorFlow Lite by Google is an open-sourced cross-platform machine learning
framework aimed at mobile development. TensorFlow Lite is the lightweight version
of its big brother, TensorFlow, and is suitable for microcontrollers, mobile, and
embedded devices (such as the Internet of Things). (Khandelwal 2020.) The newest
stable release of TensorFlow Lite is version 2.2.0 [3].
TensorFlow Lite is a popular machine learning framework primarily used in Android
applications. Newer versions of TensorFlow Lite can use the Android Neural Networks
API (NNAPI) to accelerate inference. The framework supports a wide variety of models,
but the most common model format is the TFLite format. Supported mobile platforms:

● Android

● iOS

The usage of TensorFlow Lite on the Android and iOS platforms is very straightforward.
Both platforms require the framework as a dependency. The only real difference is that on
iOS, the project must use the CocoaPods dependency manager.

● Android Studio
Android Studio is an IDE for Android application development, and we used it for our
project. Android Studio offers compatibility with TensorFlow models, which is why we
were able to use them in the application.

2.2 Hardware Requirements

● Computer

We used an HP computer with an Intel i5 processor for the development of our application;
the Android Studio IDE ran on this machine.

● GPU

We used an Nvidia GPU for training our deep learning model, because training on a large
amount of data requires substantial computational power.

● Mobile Phone

We used an Android mobile phone to run and test the application.

● Data Cable

A data cable was needed to run, test, and use the application on the Android phone.

CHAPTER – 3
SYSTEM DESIGN: FLOW CHART

Figure 7: Data flow diagram of the application

In this project we built an Android application. The application has a machine learning model
embedded inside it, which is loaded when the application starts. Through the application we
open the mobile phone camera and take a photo of the object we want information about.

The image from the camera is then compressed and sent to the ML model integrated in the
application. The application shows the recognised text, which we can edit according to our
requirements and copy-paste anywhere we want.

The next step of the application is to gather information from the image. For this purpose, we
support both camera and gallery images, converting them into recognised text after processing.

The diagram starts with an input image, which is processed through the following stages:

1. Image Pre-processing: This stage involves image enhancement techniques such as noise removal, contrast adjustment, and edge detection. The output of this stage is a pre-processed image ready for further processing.

2. Character Segmentation: In this stage, individual characters are identified and separated from the pre-processed image. This stage may involve techniques such as connected component analysis and contour detection.

3. Feature Extraction: This stage involves extracting relevant features from the segmented characters. Features such as edges, corners, and texture are commonly used.

4. Character Recognition: In this stage, the feature vectors generated in the previous stage are used to recognize the characters. This stage involves techniques such as pattern recognition, machine learning, and artificial neural networks.

5. Postprocessing: This stage involves refining the recognized characters to improve the accuracy of the output, using techniques such as spell-checking and context analysis.

6. Output Text Generation: In this stage, the recognized and refined characters are combined to generate the output text. This output text can be in a digital format such as a text file or a searchable PDF.

7. Verification: In this stage, the accuracy of the output text is verified using techniques such as error rate analysis and ground-truth comparison.

8. Output: This stage involves presenting the final output to the user. The output can be in the form of a digital file or a printed document.

Each stage of the OCR pipeline is represented in the data flow diagram, making it easier to understand the various stages involved in the OCR process. A minimal sketch of the first stages is shown below.
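For illustration, here is a minimal Java sketch of the first stages (grayscale conversion, denoising, and Otsu binarization) using the OpenCV Java bindings; the file names and parameter choices are assumptions for this example, not the exact code used in Snap_Seeker.

import org.opencv.core.Core;
import org.opencv.core.Mat;
import org.opencv.imgcodecs.Imgcodecs;
import org.opencv.imgproc.Imgproc;
import org.opencv.photo.Photo;

public final class PreprocessingSketch {

    public static void main(String[] args) {
        // Load the native OpenCV library before using any OpenCV class.
        System.loadLibrary(Core.NATIVE_LIBRARY_NAME);

        // 1. Read the input photo (assumed file name).
        Mat src = Imgcodecs.imread("document.jpg");

        // 2. Convert to grayscale so thresholding works on a single channel.
        Mat gray = new Mat();
        Imgproc.cvtColor(src, gray, Imgproc.COLOR_BGR2GRAY);

        // 3. Remove small high-intensity noise before binarization.
        Mat denoised = new Mat();
        Photo.fastNlMeansDenoising(gray, denoised);

        // 4. Otsu binarization: the page becomes black/white pixels only.
        Mat binary = new Mat();
        Imgproc.threshold(denoised, binary, 0, 255, Imgproc.THRESH_BINARY + Imgproc.THRESH_OTSU);

        Imgcodecs.imwrite("document_binary.png", binary);
    }
}

The later stages (segmentation, recognition, and post-processing) are handled in this project by the ML Kit model described in Chapters 4 to 6.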

CHAPTER - 4

IMPLEMENTATION OF APPLICATION

4.1 Image Pre-Processing


Image pre-processing is a critical step in OCR that involves modifying the image to enhance
the quality of the text for character recognition. This process typically involves various image
processing techniques, such as filtering, thresholding, normalization, and enhancement, to
improve the contrast, sharpness, and resolution of the text regions in the image.
One of the primary goals of image pre-processing is to separate the foreground (text) from
the background, making it easier for the OCR algorithm to identify the characters. For
example, image thresholding is a commonly used technique that converts the grayscale
image into a binary image, where the text pixels are set to black and the background pixels
are set to white.
Image normalization is another essential technique that adjusts the contrast, brightness, and
gamma values of the image to make the text more visible and easier to recognize. This helps to
reduce any distortions or inconsistencies in the image that can negatively impact the OCR
algorithm's performance.

Another crucial pre-processing step is image de-skewing, which involves correcting the
image orientation by aligning the text regions along a horizontal or vertical axis. This helps to
improve the accuracy of the OCR algorithm by reducing any skew or distortion in the text
regions that can make it difficult for the algorithm to recognize the characters.

Overall, image pre-processing is an essential step in OCR that significantly impacts the
accuracy and speed of character recognition. By enhancing the image quality and removing
any noise or unwanted artifacts, image preprocessing can help to ensure that the OCR
algorithm can accurately recognize and extract the text from the image.

The main objective of the pre-processing phase is to make it as easy as possible for the OCR
system to distinguish a character/word from the background. Some of the most basic and
important pre-processing techniques are:
1) Binarization

2) Skew Correction

3) Noise Removal

4) Thinning and Skeletonization

Before discussing these techniques, let's understand how an OCR system comprehends an
image. For an OCR system, an image is a multidimensional array (a 2D array if the image is
grayscale or binary, a 3D array if the image is coloured). Each cell in the matrix is called a
pixel, and it can store an 8-bit integer, which means the pixel range is 0–255.

Figure 8. Internal representation of an RGB image with Red, Green and Blue channels.
Source: left image from Semantic Scholar, right image from ResearchGate.

Let's go through each preprocessing technique mentioned above one by one.

• Binarization: In layman's terms, binarization means converting a coloured image into an
image which consists of only black and white pixels (black pixel value = 0 and white pixel
value = 255). As a basic rule, this can be done by fixing a threshold (normally threshold = 127,
as it is exactly half of the pixel range 0–255). If the pixel value is greater than the threshold,
it is considered a white pixel; otherwise, it is considered a black pixel.

Binarization conditions. Source: image by author.

But this strategy may not always give us the desired results. In cases where lighting
conditions are not uniform in the image, this method fails.

Figure 9. Binarization

Binarization using a threshold on an image captured under non-uniform lighting.
Source: left image from this post, right image binarised by author.

So, the crucial part of binarization is determining the threshold. This can be done using
various techniques.

→ Local Maxima and Minima Method:

Figure 10. Formula used

Imax = maximum pixel value in the image, Imin = minimum pixel value in the image,
E = constant value. Source: Reference [2]
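Since the exact expression is shown only in Figure 10, one common local-thresholding rule built from these same quantities (Bernsen's method, given here as an assumed stand-in rather than a reproduction of the figure) is:

C(i,j) = (Imax + Imin) / 2, applied only where (Imax − Imin) > E

so the constant E acts as a minimum-contrast test that avoids unreliable thresholds in flat, low-contrast neighbourhoods.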

C(i,j) is the threshold for a defined size of locality in the image (like a 10x10 part). Using
this strategy we'll have different threshold values for different parts of the image, depending
on the surrounding lighting conditions, but the transition is not that smooth.

→ Otsu's Binarization: This method gives a single threshold for the whole image, considering
various characteristics of the whole image (such as lighting conditions, contrast, and
sharpness), and that threshold is used for binarizing the image.

This can be accomplished using OpenCV in Python in the following way:

# imgf contains the binary image
ret, imgf = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

→ Adaptive Thresholding: This method gives a threshold for each small part of the image
depending on the characteristics of its locality and neighbours, i.e. there is no single fixed
threshold for the whole image; every small part of the image has a different threshold
depending on its locality, which also gives a smooth transition.

# imgf contains the binary image
imgf = cv2.adaptiveThreshold(img, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 11, 2)

2. Skew Correction: While scanning a document, it might sometimes be slightly skewed (the
image aligned at a certain angle with the horizontal). While extracting information from the
scanned image, detecting and correcting the skew is crucial. Several techniques are used for
skew correction:

→ Projection profile method

→ Hough transformation method

→ Topline method

→ Scanline method

However, the projection profile method is the simplest, easiest and most widely used way to
determine skew in documents.

In this method, we first take the binary image, then:

• Project it horizontally (taking the sum of pixels along the rows of the image matrix) to get a
histogram of pixels along the height of the image, i.e. the count of foreground pixels for every row.

• Now the image is rotated at various angles (at a small interval of angles called delta) and the
difference between the peaks is calculated (variance can also be used as one of the metrics).
The angle at which the maximum difference between peaks (or variance) is found is the skew
angle of the image.

• After finding the skew angle, we can correct the skew by rotating the image through an angle
equal to the skew angle in the opposite direction of the skew.

Correcting skew using the projection profile method. Source: Reference [1]

import sys
import matplotlib.pyplot as plt
import numpy as np
from PIL import Image as im
from scipy.ndimage import interpolation as inter

input_file = sys.argv[1]
img = im.open(input_file)

# convert to binary
wd, ht = img.size
pix = np.array(img.convert('1').getdata(), np.uint8)
bin_img = 1 - (pix.reshape((ht, wd)) / 255.0)
plt.imshow(bin_img, cmap='gray')
plt.savefig('binary.png')

def find_score(arr, angle):
    data = inter.rotate(arr, angle, reshape=False, order=0)
    hist = np.sum(data, axis=1)
    score = np.sum((hist[1:] - hist[:-1]) ** 2)
    return hist, score

delta = 1
limit = 5
angles = np.arange(-limit, limit + delta, delta)
scores = []
for angle in angles:
    hist, score = find_score(bin_img, angle)
    scores.append(score)

best_score = max(scores)
best_angle = angles[scores.index(best_score)]
print('Best angle: {}'.format(best_angle))

# correct skew
data = inter.rotate(bin_img, best_angle, reshape=False, order=0)
img = im.fromarray((255 * data).astype("uint8")).convert("RGB")
img.save('skew_corrected.png')

Figure 11. Skew correction. Source: pyimagesearch.com by Adrian Rosebrock

3. Noise Removal: The main objective of the noise removal stage is to smoothen the image by
removing small dots/patches which have a higher intensity than the rest of the image. Noise
removal can be performed for both coloured and binary images.

One way of performing noise removal is by using the OpenCV fastNlMeansDenoisingColored
function:

import numpy as np
import cv2
from matplotlib import pyplot as plt

# Read the image from the folder where it is stored
img = cv2.imread('bear.png')

# Denoise the image, saving the result into dst
dst = cv2.fastNlMeansDenoisingColored(img, None, 10, 10, 7, 15)

# Plot the source and destination images
plt.subplot(121), plt.imshow(img)
plt.subplot(122), plt.imshow(dst)
plt.show()

Smoothening and denoising of an image. Source: Reference [4]

More about noise removal and image smoothening techniques can be found in this wonderful article.

4. Thinning and Skeletonization: This is an optional preprocessing task which depends on the
context in which the OCR is being used.

→ If we are using the OCR system for printed text, there is no need to perform this task,
because printed text always has a uniform stroke width.

→ If we are using the OCR system for handwritten text, this task has to be performed, since
different writers have different styles of writing and hence different stroke widths. So, to make
the stroke width uniform, we have to perform thinning and skeletonization.

This can be performed using OpenCV in the following way:

import cv2
import numpy as np

img = cv2.imread('j.png', 0)
kernel = np.ones((5, 5), np.uint8)
erosion = cv2.erode(img, kernel, iterations=1)

In the above code, the thinning of the image depends on the kernel size and the number of
iterations. In this section, we have seen some of the basic and most widely used preprocessing
techniques, which give us a basic idea of what happens inside an OCR system.

4.2 Text-Detection
In general, an object detection problem in computer vision refers to the task of detecting object positions in
images. The output of such algorithms is a list of bounding boxes corresponding to the positions of each
object detected in the image.

Using the same extracted features as in object detection, the set of (N, N) spatial features, we
now need to upscale those back to the image dimensions (H, W). Remember that N is smaller
than H or W.

Feature upscaling:

Especially if our (N, N) matrices are much smaller than the image, basic upsampling would
bring no value. Rather than just interpolating our matrices to (H, W), architectures have
different tricks to learn those upscaling operations, such as transposed convolutions. We then
obtain fine-grained features for each pixel of the image.

Figure 12. Feature Upscaling

Binary classification:
Using the many features of these pixels, a few operations are performed to determine their
category. In our case, we will only determine whether the pixel belongs to a word, which
produces a result like the “segmentation” illustration from the previous figure.

Bounding box conversion:


Finally, the binary segmentation map needs to be converted into a set of bounding boxes. It sounds like a
much easier task than producing the segmentation map, but two words close to each other might look like
they are the same according to your segmentation map. Some postprocessing is required to produce
relevant object localization.

4.3 Text-Recognition
The Text Recognizer segments text into blocks, lines, elements and symbols. Roughly speaking:
• a Block is a contiguous set of text lines, such as a paragraph or column,
• a Line is a contiguous set of words on the same axis, and
• an Element is a contiguous set of alphanumeric characters ("word") on the same axis
in most Latin languages, or a word in others
• a Symbol is a single alphanumeric character on the same axis in most Latin
languages, or a character in others.
The image below highlights examples of each of these in descending order. The first
highlighted block, in cyan, is a Block of text. The second set of highlighted blocks, in blue,
are Lines of text. Finally, the third set of highlighted blocks, in dark blue, are Words.

Figure 13. Text Recognition

For all detected blocks, lines, elements and symbols, the API returns the bounding boxes,
corner points, rotation information, confidence score, recognized languages and recognized
text.
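As a rough sketch of how this hierarchy can be walked with the ML Kit Text Recognition API (the printing helper below is an illustrative assumption, not part of the Snap_Seeker code), a result can be traversed as blocks, then lines, then elements:

import android.graphics.Rect;
import com.google.mlkit.vision.text.Text;

public final class TextResultPrinter {

    // Walk the ML Kit Text result hierarchy: blocks -> lines -> elements ("words").
    public static void print(Text result) {
        for (Text.TextBlock block : result.getTextBlocks()) {
            Rect blockFrame = block.getBoundingBox();   // may be null if no frame is available
            System.out.println("Block: " + block.getText() + " at " + blockFrame);

            for (Text.Line line : block.getLines()) {
                System.out.println("  Line: " + line.getText());

                for (Text.Element element : line.getElements()) {
                    // Each element is roughly one word, with its own box and corner points.
                    System.out.println("    Word: " + element.getText()
                            + " corners: " + java.util.Arrays.toString(element.getCornerPoints()));
                }
            }
        }
    }
}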

4.4 Postprocessing
Post processing is an important step in OCR that involves analysing and correcting errors in
the recognized text output. OCR algorithms may make mistakes in character recognition due
to various factors such as image quality, font styles, and language complexities. Post-processing
techniques help to correct these errors and improve the accuracy of the final text output.

Figure 14. Post processing

Post processing typically involves comparing the recognized text output against a dictionary
of known words and analysing the context and grammar of the text to identify and correct
errors. It can also involve using predefined rules or machine learning algorithms to correct
specific types of errors.
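As a minimal illustration of the dictionary-comparison idea (the word list, distance threshold, and class name are assumptions for this sketch, not part of the Snap_Seeker implementation), a recognized word can be snapped to the closest known word by edit distance:

import java.util.List;

public final class SimpleSpellFixer {

    // Replace a recognized word with the closest dictionary word, if it is close enough.
    public static String correct(String word, List<String> dictionary, int maxDistance) {
        String best = word;
        int bestDistance = maxDistance + 1;
        for (String candidate : dictionary) {
            int d = editDistance(word.toLowerCase(), candidate.toLowerCase());
            if (d < bestDistance) {
                bestDistance = d;
                best = candidate;
            }
        }
        return bestDistance <= maxDistance ? best : word;
    }

    // Classic Levenshtein distance with a two-row dynamic programming table.
    private static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = curr; curr = tmp;
        }
        return prev[b.length()];
    }
}

For example, SimpleSpellFixer.correct("Hel1o", List.of("hello", "help"), 2) returns "hello", while a word that is far from every dictionary entry is left unchanged.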

The goal of post processing is to ensure that the recognized text output is accurate and reliable
for use in various applications such as text search, transcription, and translation. By applying
post-processing techniques, the accuracy of the final text output can be significantly improved,
making it more suitable for use in real-world scenarios.

CHAPTER - 5

ABOUT MODEL

5.1 Model Compilation


CNN or Convolutional Neural Network is a type of deep learning neural network algorithm
that is often used in the image classification process. Recently, deep learning algorithms have
had great success in various computer vision problems [11]. In its application, CNN
implements a convolutional layer which is used to carry out the convolution process by
creating a kernel based on the data section of the input data.
The output of the convolutional layer is then processed by the pooling layer, which reduces
the amount of data without losing important features by taking a summary value, such as the
average or the maximum, over certain pieces of data. This process is then repeated as many
times as there are convolutional and pooling layers. The data is then generally flattened to
convert it into one long dimension. For the last step, the data goes into the fully-connected
layer for the classification process, to obtain the correct label as the output of the CNN. In
this study, the vanilla CNN architecture used for the experiment is illustrated in Figure 15.

Figure 15: Illustration of the CNN architecture used as the vanilla model

The Convolutional Neural Network analyses the image and sends the important features
detected to the recurrent part. The recurrent part analyses these features in order, taking
previous information into consideration in order to realize which of the important links
between these features influence the output.

To understand a bit more about how a CRNN works in some tasks, let’s take Handwritten Text
Recognition as an example.

Let’s imagine we have images containing words, and we want to train the NNet in order to
give us what word is initially in the image.

Firstly, we would want our Neural Network to be able to extract important features for
different letters, such as loops from “g” or “l”, or even circles from “a” or “o”. For this, we can
use a Convolutional Neural Network. As explained earlier, CNN uses filters in order to extract
the important features (we saw how different filters have different effects on the initial image).
Of course, these filters will detect in practice more abstract features that we can’t really
understand, but intuitively we can think of simpler features, such as the ones mentioned
earlier.

Then, we would want to analyse these features. Let’s take a look as to why we can’t decide
what a letter is based solely on its own features. In the image below, we see that the letter is
either "a" (from "aux") or "o" (from "for").

The difference is made by the way the letter is linked to the other letters. So, we would need to
know information from previous places in the image in order to determine the letter. Sounds
familiar? This is where the RNN part comes in. It analyzes recursively the information
extracted by the CNN, where the input for each cell might be the features detected in a specific
slice of the image, as represented below, with only 10 slices (less than we would use in real
models):

Figure 16: Illustration of CNN

We don’t feed the RNN the image itself, as shown in the above image, but rather the features
extracted from that “slice”.

We might also see that processing the image forward is as important as processing the image
backward, so we can add a layer of cells that process the features in the other way, taking into
consideration both of them when computing the output. Or even vertically, depending on the
task at hand.

5.2 Model Training


Move into the following directory:

models/research/attention_ocr

Open the file named 'common_flags.py' and specify where you'd like to log your training,
then run the following command in your terminal:

# change this if you changed the dataset name in the
# number_plates.py script or if you want to change the
# number of training steps
python train.py --dataset_name=number_plates --max_number_of_steps=3000

Evaluate the model

Run the following command from the terminal:

python eval.py --dataset_name='number_plates'

5.3 Model Used


ML Kit supports both the Android and iOS platforms. The initial step involves integrating the
trained machine learning model, via the ML Kit API, into our project.

Figure 17: ML Kit architecture

The Machine Learning Kit (ML Kit) is a mobile SDK provided by Google that allows
developers to easily integrate machine learning features into their Android and iOS apps.
ML Kit offers a range of pre-built APIs and models for tasks like text recognition, face
detection, image labelling, and language identification, making it easy for developers to
add machine learning capabilities to their apps without needing to develop custom
models or algorithms. The architecture consists of three main components:

• Base API: The base API provides common functionality like camera input, model
management, and results output for all ML Kit APIs. It also handles tasks like
image scaling, rotation, and colour conversion.

• Vision API: The Vision API includes a range of pre-built models and algorithms
for tasks like face detection, text recognition, image labelling, and barcode
scanning. It also provides APIs for custom model integration and cloud-based
model hosting.

• Natural Language Processing (NLP) API: The NLP API includes pre-built
models and algorithms for tasks like language identification, sentiment analysis,
and entity recognition. It also provides APIs for custom model integration and
cloud-based model hosting.

Developers can choose to use any combination of these components based on their
app's requirements.

Overall, ML-KIT architecture is designed to be easy to use and integrate into mobile
apps while providing developers with the flexibility to choose the specific machine
learning features they need.

CHAPTER – 6

ANDROID DEVELOPMENT

6.1 Android Platform


The Android application was developed using the Android Studio IDE (Figure 18). The
Android implementation uses the ML Kit library as its machine learning framework.

Figure 18: Android Studio IDE

The first steps of development on the Android platform were to create a new project with
Android Studio from an empty template and define the application layout, along with adding
the following dependencies to the app-level Gradle file.

OCR in Android devices:

• In this section, we describe how to implement OCR in an Android application. To implement
it, we use the ML Kit API, which provides an easy way to integrate OCR on almost all
Android devices.

• Text detection uses the ML Kit API to process the image and convert it into recognized text.

• Create a project in Android Studio with one blank Activity. Add the Google Play services
dependency to it.

• Add permission for the camera in the manifest file.

• Our main and only Activity file is MainActivity.java, and the layout XML file is
activity_main.xml. In activity_main.xml we have one SurfaceView to show the camera view
and one TextView to show the detected text.

• In the MainActivity, check whether camera permission is available. If not, request it.

• On receiving the permission, create a TextRecognizer object.

• Create a CameraSource object to start the camera.

• Set a processor on the TextRecognizer to detect whether any text is available on the camera
screen. We will receive a callback and update the TextView that is on the camera screen.
A sketch of starting the camera source is shown below.
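The following is a minimal sketch of that flow. It uses the older Google Play services Mobile Vision classes (com.google.android.gms.vision), which match the CameraSource/processor flow described above; the preview size, layout and view ids, and error handling are illustrative assumptions rather than the exact Snap_Seeker code:

import android.app.Activity;
import android.os.Bundle;
import android.util.SparseArray;
import android.view.SurfaceHolder;
import android.view.SurfaceView;
import android.widget.TextView;
import com.google.android.gms.vision.CameraSource;
import com.google.android.gms.vision.Detector;
import com.google.android.gms.vision.text.TextBlock;
import com.google.android.gms.vision.text.TextRecognizer;
import java.io.IOException;

public class LiveOcrActivity extends Activity {

    private SurfaceView surfaceView;
    private TextView textView;
    private CameraSource cameraSource;

    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.activity_main);        // assumed layout containing the two views
        surfaceView = findViewById(R.id.surface_view); // assumed view ids
        textView = findViewById(R.id.text_view);

        TextRecognizer textRecognizer = new TextRecognizer.Builder(getApplicationContext()).build();

        cameraSource = new CameraSource.Builder(getApplicationContext(), textRecognizer)
                .setRequestedPreviewSize(1280, 1024)
                .setAutoFocusEnabled(true)
                .build();

        // Append every detected text block to the on-screen TextView.
        textRecognizer.setProcessor(new Detector.Processor<TextBlock>() {
            @Override
            public void release() {
            }

            @Override
            public void receiveDetections(Detector.Detections<TextBlock> detections) {
                SparseArray<TextBlock> items = detections.getDetectedItems();
                StringBuilder builder = new StringBuilder();
                for (int i = 0; i < items.size(); i++) {
                    builder.append(items.valueAt(i).getValue()).append('\n');
                }
                textView.post(() -> textView.setText(builder.toString()));
            }
        });

        // Start the camera preview once the surface exists (CAMERA permission assumed granted).
        surfaceView.getHolder().addCallback(new SurfaceHolder.Callback() {
            @Override
            public void surfaceCreated(SurfaceHolder holder) {
                try {
                    cameraSource.start(holder);
                } catch (IOException | SecurityException e) {
                    e.printStackTrace();
                }
            }

            @Override
            public void surfaceChanged(SurfaceHolder holder, int format, int width, int height) {
            }

            @Override
            public void surfaceDestroyed(SurfaceHolder holder) {
                cameraSource.stop();
            }
        });
    }
}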

Dependency name    Dependency source
ML library API     'com.google.android.gms:play-services-mlkit-text-recognition:18.0.2'
Layout             'androidx.constraintlayout:constraintlayout:2.1.4'

Table 1: Dependencies used in the Android application

The Android app contains three main files:

Class Name          Description
MainActivity        This is the file where the main activity code is written.
AndroidManifest     This file provides the Camera and External Storage permissions.
activity_main.xml   It defines how our app's screen is designed.

Table 2: Classes of the Android application

The Choose Model screen (MainActivity) opens when the user launches the application. It
loads the application layout and handles user interactions. On choosing the model, the mobile
camera is opened to capture images.
6.2 Image Intent: Captured Image

Figure 19: Demo image captured

6.3 Model Conversion of Image into Text

Figure 20 shows the text converted from the image.

Open Camera Intent:

Figure 21: Open camera Intent.
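A minimal sketch of such a camera intent (the request code, method names, and the runTextRecognition() helper are illustrative assumptions rather than the exact Snap_Seeker code) looks like this:

// Inside MainActivity (assumed imports: android.content.Intent, android.graphics.Bitmap,
// android.provider.MediaStore).
private static final int REQUEST_IMAGE_CAPTURE = 1;   // arbitrary request code for this sketch

private void openCamera() {
    Intent takePictureIntent = new Intent(MediaStore.ACTION_IMAGE_CAPTURE);
    if (takePictureIntent.resolveActivity(getPackageManager()) != null) {
        startActivityForResult(takePictureIntent, REQUEST_IMAGE_CAPTURE);
    }
}

@Override
protected void onActivityResult(int requestCode, int resultCode, Intent data) {
    super.onActivityResult(requestCode, resultCode, data);
    if (requestCode == REQUEST_IMAGE_CAPTURE && resultCode == RESULT_OK) {
        Bitmap imageBitmap = (Bitmap) data.getExtras().get("data");  // small preview bitmap
        runTextRecognition(imageBitmap);   // hand the captured frame to the ML model (see 6.3)
    }
}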

After successful capture of the image, you will get the converted text. In this class the model
is loaded in the background.

Data Passing to Model and Conversion:

Now we will pass the data to the ML Kit model and display the predictions on the screen.

Figure 22: Passing data to the ML Kit model
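A minimal sketch of this step with the ML Kit Text Recognition API from the com.google.mlkit.vision packages (the rotation value of 0 and the textView field are illustrative assumptions) might look like:

// Assumed imports: com.google.mlkit.vision.common.InputImage,
// com.google.mlkit.vision.text.TextRecognition, com.google.mlkit.vision.text.TextRecognizer,
// com.google.mlkit.vision.text.latin.TextRecognizerOptions.

// Wrap the captured Bitmap, run on-device text recognition, and show the result.
private void runTextRecognition(Bitmap bitmap) {
    InputImage image = InputImage.fromBitmap(bitmap, 0);   // 0 = no extra rotation assumed
    TextRecognizer recognizer = TextRecognition.getClient(TextRecognizerOptions.DEFAULT_OPTIONS);

    recognizer.process(image)
            .addOnSuccessListener(result -> {
                // result.getText() is the full recognized string from all blocks.
                textView.setText(result.getText());
            })
            .addOnFailureListener(e ->
                textView.setText("Recognition failed: " + e.getMessage()));
}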

6.4 Resulting Application Working Images:
CHAPTER – 7
CONCLUSION & FUTURE SCOPE

In this report we have presented an approach to building a machine learning application for
mobile. We have created an OCR text recognition feature with a CNN model that is built into
the Google machine learning architecture. Utilising multiple base models and OCR text
recognition, we successfully trained and implemented a variety of models that could be used
for real-time classification of objects in an Android application.

This project aimed to improve basic knowledge of machine-learning-related processes and
techniques. All goals set for this project were achieved. The theoretical part of this report
answered the question of how to run a machine learning model on devices with low
computational power, such as mobile phones.

Using the Google ML Kit API really helped us achieve the goal of building a machine
learning model. We used a CNN as our machine learning model and used the dataset provided
by Google ML Kit to train and test it. After successfully training our model, we reached an
accuracy of 99.1% through multiple rounds of trial and error.

In the future we would like to take the accuracy from 99.1% to 99.9%. Currently we capture
an image and convert it into text in a basic-looking app; we have also added the functionality
to take images from the gallery and convert them into text. We will keep working to make a
user-friendly, powerful UI with more functionality.

We cannot have a large model inside an Android application, as it will affect the performance
of the app. To tackle this problem, we will shift the model to the cloud so that the model size
has no effect on the performance of the application.

REFERENCES

1. Hardt, Moritz, Price, Eric & Srebro, Nati. “Equality of opportunity in supervised learning.” Advances
in neural information processing systems. 2016.

2. Stuart J. Russell, Peter Norvig (2010) Artificial Intelligence: A Modern Approach, Third Edition,
Prentice Hall ISBN 9780136042594.

3. Mehryar Mohri, Afshin Rostamizadeh, Ameet Talwalkar (2012) Foundations of Machine Learning, The
MIT Press ISBN 9780262018258.

4. S. Geman, E. Bienenstock, and R. Doursat (1992). Neural networks and the bias/variance dilemma.
Neural Computation 4, 1–58.

5. Schantz, Herbert F. (1982). The history of OCR, optical character recognition.


[Manchester Center, Vt.]: Recognition Technologies Users Association. ISBN 9780943072012.

6. Assefi, Mehdi (December 2016). "OCR as a Service: An Experimental Evaluation of Google Docs
OCR, Tesseract, ABBYY FineReader, and Transym". ResearchGate.

7. Ashok Popat (Sep 4, 2015). "IEEE SPS: Optical Character Recognition for Most of the World's
Languages" Retrieved 2021-12-20.

8. Ian Goodfellow and Yoshua Bengio and Aaron Courville (2016). Deep Learning. MIT Press. p. 326.

9. Collobert, Ronan; Weston, Jason (2008-01-01). A Unified Architecture for Natural Language
Processing: Deep Neural Networks with Multitask Learning. Proceedings of the 25th International
Conference on Machine Learning. ICML '08. New York, NY, USA: ACM. pp. 160–167.
doi:10.1145/1390156.1390177. ISBN 978-1-60558-205-4. S2CID 2617020.

10. Krizhevsky, Alex; Sutskever, Ilya; Hinton, Geoffrey E. (2017-05-24). "ImageNet classification with
deep convolutional neural networks" (PDF). Communications of the ACM. 60 (6): 84–90.
doi:10.1145/3065386. ISSN 0001-0782. S2CID 195908774.

11. Burnette, Ed (July 13, 2010). Hello, Android: Introducing Google's Mobile Development
Platform (3rd ed.). Pragmatic Bookshelf. ISBN 978-1-934356-56-2.

12. "SDK Tools | Android Developers". Developer.android.com. Retrieved April 25, 2018.

13. Loukas, S. 2020. What is Machine Learning: Supervised, Unsupervised, Semi Supervised and
Reinforcement learning methods. Digital article. Published 10.6.2020. Read 24.8.2021.
https://towardsdatascience.com/what-is-machine-learning-a-short-noteonsupervisedunsupervised -
semi-supervised-and-aed1573ae9bb

14. Brahim Elbouchikhi. A Basic Introduction to Google Machine Learning Kit. Digital article. Published
May 09, 2018 Read 02.02.2023.

15. Stephen Perkins. On-device machine learning can make smartphones even better. Published 26.04.2023
https://www.androidpolice.com/google-ml-kit-explainer/

16. Brahim Elbouchikhi. Machine Learning Kit Main page Published 09-05-2018 Read 02.02.2023.
https://developers.google.com/ml-kit

17. Train the model common objects. Last updated 29.03.2022.


https://nanonets.com/blog/attention-ocr-for-text-recogntion/#train-the-model

18. Sorana. Basic Overview of Convolutional Neural Network . Published November 24th, 2020
https://www.analyticsvidhya.com/blog/2020/11/a-short-intuitive-explanation-of-
convolutionalrecurrent-neural-networks/

19. Mike Driscoll. 2019. Jupyter Notebook: An Introduction


https://realpython.com/jupyternotebookintroduction/

20. Anaconda environment and package distribution. 2022. https://www.anaconda.com/

21. Android Studio Download. Updated April 21, 2023. https://developer.android.com/studio

APPENDIX

Appendix 1. Project source code

The complete source code for the Android mobile application is publicly available on the
GitHub version control service.

Android application source code (written in Java):

https://github.com/Devanshuverma07/Snap_Seeker/tree/master

