
OHKWR Offline Handwritten Kannada Words Recognitio

The document presents a study on Offline Handwritten Kannada Words Recognition using a Support Vector Machine (SVM) classifier combined with Convolutional Neural Networks (CNN). It addresses the challenges of recognizing Kannada script due to its complex character set and proposes a character segmentation algorithm to improve Optical Character Recognition (OCR) systems. The methodology includes preprocessing steps, feature extraction, and classification, ultimately demonstrating improved recognition rates compared to existing methods.


International Journal of Innovative Technology and Exploring Engineering (IJITEE)

ISSN: 2278-3075 (Online), Volume-9 Issue-10, August 2020

OHKWR: Offline Handwritten Kannada Words Recognition using SVM Classifier with CNN
Ramesh G, Sandeep Kumar N, Champa H. N

Revised Manuscript Received on August 30, 2020.
* Correspondence Author
Ramesh G*, Department of Computer Science and Engineering, University Visvesvaraya College of Engineering, Bangalore, India. E-mail: [email protected]
Sandeep Kumar N, Department of Computer Science and Engineering, University Visvesvaraya College of Engineering, Bangalore, India. E-mail: [email protected]
Champa H. N, Department of Computer Science and Engineering, University Visvesvaraya College of Engineering, Bangalore, India. E-mail: [email protected]

© The Authors. Published by Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)

Abstract: In the field of handwriting recognition, robust algorithms for recognition and character segmentation have been presented for multilingual Indian archive images in Devanagari and Latin scripts. These documents typically suffer from their layout organization, low print quality, and local skews, and they contain intermixed text (machine-printed and handwritten). To overcome these drawbacks, a character segmentation algorithm is proposed for Kannada handwriting recognition. In this work, segmentation paths are first obtained using the structural properties of the characters and graph distance theory, so that overlapped and connected characters are separated. Finally, results are calculated using the SVM classifier. For character recognition, three new features based on geometrical shape are proposed: the first and second features are obtained from the center pixel of the character, and the third is calculated from the neighborhood information of the text pixels. Benchmarking results show that the proposed algorithms perform better than other contemporary methods, achieving the best segmentation and recognition rates.

Keywords: Convolutional Neural Network, Computer Vision, character recognition, Word recognition, SVM classifier.

I. INTRODUCTION

Computer Vision is a fast-growing and technically emerging field of computer science that is making its way into many domains. Developing a generic and efficient recognition system for handwritten text is an important challenge in Computer Vision that is of interest to a wide range of applications. Reading sign boards, translation between scripts, banks, legislative bodies, offices, literature, assistance to the blind, and archaeology are the main application areas of handwriting recognition. With handwriting recognition, a huge number of archives can be processed efficiently. One such application is identifying the scripts of various languages. Recognizing the scripts of South Indian languages is very difficult and takes considerable effort because of their compound characters and their character sets, which are large among the Indian languages. Therefore, there is good scope for research and a tremendous demand for optical character and word recognition systems that handle handwritten documents. Optical character recognition (OCR) is the technique used to identify the text in handwritten or typed papers. The prevalence of web and multimedia techniques also raises the need for OCR system development. Several attempts have been made to build an efficient OCR system that identifies handwritten characters in Kannada script. Designing an effective and stable OCR system takes a lot of effort and several steps. An OCR system includes the steps of pre-processing, feature extraction, and classification, which must be carried out in a predetermined order to achieve the result. Preprocessing is the procedure in which the raw image is transformed into an appropriately processed image, which is then used as the input for extracting features. In the preprocessing cycle, suitable methods are used to methodically examine the raw input image. The most critical process in OCR system development is feature extraction. A Convolutional Neural Network and a Support Vector Machine classifier are used to recognize Kannada handwritten words.

In image processing, a digital image is composed of a finite number of fundamental elements, each of which has a value. These elements are referred to as picture elements, image elements, or pixels. The digitized image can then be displayed on a high-resolution monitor. To be displayed, the image needs to be stored in a rapid-access buffer memory.

Kannada is the fourth major language of South India. It is spoken by around 50 million people in Karnataka, Andhra Pradesh, Tamil Nadu, and Maharashtra. A low-quality segmentation process results in misrecognition or faulty segmentation because of the existence of conjunct consonant characters. Pre-processing consists of removing noise, identifying skew, recognizing regions, thickening, thinning, and binarizing. The segmentation process involves two steps: segmentation of the word and segmentation of the character.

Retrieval Number: G5821059720/2020©BEIESP
DOI: 10.35940/ijitee.G5821.0891020
Journal Website: www.ijitee.org
Published By: Blue Eyes Intelligence Engineering and Sciences Publication

Word segmentation is the process in which only the words are extracted from the pre-processed image. Since there is a gap between one word and the next, we use the concept of the vertical projection profile [4] for word segmentation. Character segmentation is the method in which we extract only the characters from the segmented words.

Because character segmentation is an important aspect of character identification, various experts have worked on it. The English character set has 26 characters, while the number of Kannada consonant-vowel combinations is 35 x 16 = 560. The number of possible consonant-consonant-vowel combinations is 35 x 35 x 16 = 19600. The curved shapes of these 19600 Kannada character combinations make the recognition system considerably more complex.

Figure 1: Kannada Character set

Kannada is one of the 22 official languages of India and ranks 33rd in the list of the most widely spoken languages in the world. In the state of Karnataka, Kannada is widely spoken and used as the official language. Modern Kannada, or Hosagannada, has 49 base characters and 10 numerals. Samples of these handwritten numerals are shown in Figure 1. The character set of this language has numerous structural features. One of the difficulties in Kannada character recognition is recognizing characters that are similar in shape. These similar characters differ only slightly from one another, and they play a significant role in the recognition accuracy. Because of its structural complexity, Kannada text is complex compared with Latin-based languages. Furthermore, the 49 letters shown in Figure 1 are divided into three groups: vowels and visarga (15), consonants (34), and modifier glyphs (half-letters). The 15 vowels are used to modify the 34 base consonants, making a total of (34 * 15) + 34 = 544 characters; in addition, modifiers are used. This gives a total of (544 * 34) + 15 = 18511 characters. Examples of the modifiers are shown in Figure 2.

Figure 2. Kannada Consonant conjuncts (vattakshara)

Organization: The rest of the paper is organized as follows. Related work is discussed in Section II. The Problem Statement and Objectives are given in Section III. The Proposed System is described in Section IV. Section V discusses Segmentation and Section VI discusses Feature Extraction. Classification is explained in Section VII. Implementation is examined in Section VIII. Section IX contains the Results, and Section X gives the Conclusions.

II. LITERATURE SURVEY

R.S. Kunte et al., [2] were the first to report research on online handwritten recognition of Kannada. Kunte used a wavelet function in 2000 and also identified MILE LAB's efforts to manually write and create a 100,000-word dataset each for Kannada and Tamil in order to establish an online character recognition system. Character segmentation is one of the most important pre-processing steps in all OCR systems. It has been a well-researched area over the previous decade, and its main aim has been to let optical character recognition (OCR) algorithms operate on individual characters. Yungang Zhang et al., [3] presented a character segmentation approach that uses the Hough transform for license plate character segmentation. Researchers have taken several different approaches to the segmentation and identification tasks of character recognition. For segmentation, only a small number of researchers have used Artificial Neural Networks (ANN). Blumenstein et al., [7] presented a new intelligent segmentation technique that can be used together with a neural network classifier and a very simple lexicon to identify difficult handwritten words. Prasad Mahadeva et al., [8] used principal component analysis with a KNN classifier and reported an average recognition rate of 81 percent. Ramakrishan et al., [9] used a Support Vector Machine as the classifier for character recognition on the dataset provided by the MILE Lab and recorded 56% accuracy on the preliminary segmentation (PS) output and 62% accuracy on the Attention Feed-Based Segmentation (AFS) output. Niranjan Jhosi et al., [10] compared elastic matching schemes and dynamic time warping for online handwriting recognition of Tamil characters. S. Karthik et al., [21] introduced a methodology based on a deep learning technique with the distributed average of gradients feature for the recognition of handwritten Kannada characters, which achieved 97.04% accuracy. Ramesh et al., [22] proposed a deep learning technique. ANNs have also been used, with a claimed considerable improvement in results, and the disadvantages of CNNs are addressed with Capsule Networks; very good accuracies have been obtained. Further new methods such as CNNs have been reported for recognizing Kannada numerals. Ramesh et al., [23] presented work on large-scale use of the state-of-the-art technology, deep learning, for handwritten character recognition using a convolutional neural network.


CNNs are known to perform well on classic classification problems in computer vision. That work evaluated the network on the dataset using two different approaches and reported the accuracy obtained.

III. PROBLEM STATEMENT AND OBJECTIVES

The problem considered in the proposed work is the recognition of Kannada handwritten words.
The objectives considered are:
1. To achieve better accuracy in training and validation.
2. To achieve better specificity for words and characters.
3. To maintain a higher precision ratio.
4. To increase sensitivity while recognizing characters.
5. To obtain the highest segmentation and recognition rates.

IV. PROPOSED SYSTEM

This section describes the proposed technique for recognizing unconstrained handwritten Kannada words and characters. Initially, connected component analysis (CCA) and the Vertical Projection Profile (VPP) algorithm are used to detect all the components in a word image. The collected data are digitized on a 300-dpi flatbed HP scanner, which typically yields low-noise document images of good quality. These pictures were manually cropped and stored as grayscale images. Image binarization is performed using Otsu's global thresholding method, and the result is stored in BMP format. The digitized images typically contain noise introduced during digitization and by inaccuracies and erratic hand movements. This noise is removed with a median filter. These procedures are briefly explained in the following subsections and in Figure 3.

Figure 3: Proposed System Architecture of Handwritten Kannada Word Recognition

A. Pre-Processing

Pre-processing converts the input image into an image that is better suited for feature extraction. The pre-processing methods used are slant correction, binarization, skew detection, noise reduction, and morphological operations. They are explained below. Skew detection and correction: the Fourier transform [15] is used to perform this operation.

1) Binarization
This process converts the grayscale images into binary images using Otsu's global thresholding approach. Otsu's method iterates through all possible threshold values and computes a measure of spread for the pixel levels on each side of the threshold, that is, for the pixels that fall in the background and in the foreground. The aim is to find the threshold value at which the sum of the foreground and background spreads is at its minimum.

2) Noise removal
This step removes noise using median filtering, which removes most of the background noise from the image.

3) Normalization
Normalization converts images of irregular size into images of a standard size. It is used to remove size variation between characters and between groups. Before normalization, all the extra blank areas in the image are removed. Finally, the input image is normalized to a standard 32x32 resolution.

4) Thinning
Thinning reduces the binary-valued image regions to lines that approximate the regions' skeletons, which makes the image crisper. The preprocessed images are then ready for the later phases of feature extraction and classification.

V. SEGMENTATION

Segmentation is a strategy for dividing the image into segments. A two-level segmentation is carried out using the vertical projection profile and the bounding box. The projection profile method is also called projection or vertical projection. The two levels, word segmentation and character segmentation, are discussed below.

A. Word Segmentation

This step analyses the distance from one word to the next. In the word segmentation technique, a text line is taken as the input. After a text line is separated, it is scanned vertically. If two or fewer dark pixels are encountered in one vertical scan, the scan is denoted by 0; otherwise it is denoted by the number of dark pixels. In this way a vertical projection profile is built. Figure 4 displays the vertical projection of Kannada script.

Figure 4: Vertical projection of words

B. Character Segmentation

Word segmentation separates the words and generates separate sub-images, from which we extract just the characters. Character segmentation is a very difficult step of an OCR system, as it extracts the meaningful regions for analysis. This step breaks the image down into classifiable units called characters.
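The binarization and word-segmentation steps described in this part of the paper can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a grayscale text-line image held in a NumPy array with dark ink on a light background, and the function names `otsu_threshold`, `vertical_projection`, and `split_words`, as well as the `min_gap` parameter (the number of empty columns treated as a word gap), are hypothetical choices. Only the rule that a column with two or fewer dark pixels counts as 0 is taken from the text.

```python
import numpy as np

def otsu_threshold(gray):
    """Return the threshold that minimizes the summed spread of both classes."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    levels = np.arange(256)
    best_t, best_spread = 0, np.inf
    for t in range(1, 256):
        w0, w1 = hist[:t].sum(), hist[t:].sum()
        if w0 == 0 or w1 == 0:
            continue  # one side is empty, so this split is not meaningful
        m0 = (levels[:t] * hist[:t]).sum() / w0
        m1 = (levels[t:] * hist[t:]).sum() / w1
        # Sum of squared deviations from the class mean on each side.
        spread = (((levels[:t] - m0) ** 2 * hist[:t]).sum()
                  + ((levels[t:] - m1) ** 2 * hist[t:]).sum())
        if spread < best_spread:
            best_t, best_spread = t, spread
    return best_t

def vertical_projection(ink, min_pixels=2):
    """Count dark pixels per column; columns with <= min_pixels become 0."""
    counts = ink.sum(axis=0)
    return np.where(counts <= min_pixels, 0, counts)

def split_words(profile, min_gap=3):
    """Return (start, end) column ranges, end exclusive, split at wide gaps."""
    words, start, gap = [], None, 0
    for x, value in enumerate(profile):
        if value > 0:
            if start is None:
                start = x
            gap = 0
        elif start is not None:
            gap += 1
            if gap >= min_gap:
                words.append((start, x - gap + 1))
                start, gap = None, 0
    if start is not None:
        words.append((start, len(profile) - gap))
    return words

# Synthetic line image (assumed data): two "words" and one isolated noise pixel.
line = np.full((10, 16), 230, dtype=np.uint8)
line[2:8, 2:6] = 20
line[2:8, 10:14] = 20
line[0, 8] = 20
ink = line < otsu_threshold(line)
print(split_words(vertical_projection(ink)))
```

The isolated dark pixel in column 8 falls under the two-pixel rule and is ignored, so the gap between the two blocks stays intact. A suitable `min_gap` would have to be tuned on real handwriting.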


A poor segmentation process leads to misrecognition or rejection. Segmentation is carried out only after the image has been preprocessed.

Wherever a character is segmented, a bounding box is applied to the segmented character to show the segmentation. The bounding box technique for character segmentation is shown in Figure 5.

Figure 5: Character segmentation represented in bounding box.

VI. FEATURE EXTRACTION

Feature extraction is the problem of extracting from the raw data the information that is most relevant for classification, in the sense of minimizing the within-class pattern variability while enhancing the between-class pattern variability. Good features must satisfy the following requirements. First, the intra-class variance must be small. Second, the inter-class separation should be large. To recognize the many variations of the same character, features that are invariant to certain transformations of the character should be used. Then, based on the data contained in the feature vector [10], the pattern recognition system classifies the word from the population. Two kinds of features are extracted: global transformation features and structural features. Structural features depend on the geometrical and topological properties of the character, such as branches, joints, curves, end points, aspect ratio, loops, lines, and crossing points. Wavelet transforms are used for the transformation features.

A. Structural features

The preprocessed image is partitioned into four quadrants Q1, Q2, Q3, and Q4, as shown in Figure 6. From every quadrant, a set of five features, for example, corner detection, relationship, quadrant thickness, and aspect ratio, is extracted. This procedure is applied to all four quadrants, so 20 features are obtained from the four quadrants. One more feature, the width feature, is extracted from the image as a whole. Thus, 21 features are obtained as structural features and stored as 'vector 1'.

Figure 6: Structural Features Extraction

B. Wavelet Transform

Global features are extracted using the wavelet transform. Wavelets are mathematical functions that cut up data into different frequency components and analyze each component with a resolution matched to its scale. Wavelet transforms are localized in both real and Fourier space, in contrast to the Fourier transform. The wavelet transform is used to decompose the image at different frequencies with different resolutions. Multiresolution analysis, decomposing the signal at different frequencies with different resolutions, and separating the signal into a set of signals are some of the uses of wavelets.

C. Combined Features

A set of 21 features is obtained as structural features and stored in 'vector 1'. A set of 128 features is obtained as global features using the wavelet transform and stored in 'vector 2'. Finally, 'vector 1' and 'vector 2' are joined to form a single feature set (149 features). This feature set is used in the classification stage to recognize the handwritten Kannada numerals.

VII. CLASSIFICATION

Besides choosing suitable features, building a reliable character recognizer requires choosing a good classifier. An SVM with a linear kernel is used as the classifier in the experiments to train and test the feature vectors. Separate SVM models were trained for each Kannada character class using feature vectors obtained from 64,811 samples (approx. 500 samples for each class). In addition, statistics on the aspect ratios (width to height ratio) of the samples of each class were used and calculated during segmentation for the purpose of testing.

A. Convolution Neural Network (CNN)

The Convolutional Neural Network (CNN) is a standard class of artificial feed-forward neural networks that is very commonly used in areas like computer vision, for problems such as object or character recognition. What distinguishes a CNN from a "plain" multilayer perceptron (MLP) is its use of convolution layers, pooling layers, and nonlinearities such as ReLU, tanh, and sigmoid. For example, the convolution layer (referred to as CONV) has filters of 5 * 5 * 1 (5 pixels for both width and height, and 1 because the images are grayscale). The main job of the CONV layer is to slide each filter across the width and height of the input image, computing the dot product of the input image region and the learned weight parameters. This generates a 2-dimensional activation map consisting of the filter responses at each region. Figure 7 shows a typical Convolutional Neural Network architecture for handwritten word recognition. It consists of a multi-layer array. The input is first convolved with several sets of filters (C hidden layers) to obtain the feature map values. Next, a sub-sampling layer (S hidden layer) follows each convolution layer to reduce the spatial resolution of the feature map. Convolution layers alternate with sub-sampling layers and extract further features to obtain discriminating properties or information from the raw input images.
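To make the CONV and sub-sampling steps concrete, the sketch below slides a single 5 x 5 filter over a 32 x 32 grayscale image, applies ReLU, and halves the spatial resolution. It is only an illustration under stated assumptions: the paper does not say which sub-sampling operation is used, so 2 x 2 max pooling is assumed here, and the function names `conv2d_valid`, `relu`, and `max_pool2x2` are hypothetical. As in most deep learning libraries, the filter is applied without flipping (cross-correlation).

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide the kernel over the image and take the dot product at each position."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    """Threshold the values at zero."""
    return np.maximum(x, 0.0)

def max_pool2x2(x):
    """Assumed sub-sampling step: keep the maximum of each 2 x 2 block."""
    h, w = x.shape[0] // 2 * 2, x.shape[1] // 2 * 2
    return x[:h, :w].reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

# A normalized 32 x 32 input and one 5 x 5 filter (random weights stand in
# for learned parameters).
rng = np.random.default_rng(0)
image = rng.random((32, 32))
kernel = rng.standard_normal((5, 5))

feature_map = relu(conv2d_valid(image, kernel))   # shape (28, 28)
subsampled = max_pool2x2(feature_map)             # shape (14, 14)
print(feature_map.shape, subsampled.shape)
```

With no padding, a 5 x 5 filter turns the 32 x 32 input into a 28 x 28 activation map, and the pooling step reduces it to 14 x 14.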


Finally, two fully connected layers (FCL) and an output layer follow those layers. Each layer takes the output of the previous layer as its input.

Figure 7: CNN architecture consisting of layers for feature maps applied for recognition of Kannada words and characters.

A base CNN model is implemented with the following architecture:

Figure 8: Architecture of CNN Training and Testing using SVM.

Next, an activation function is applied to introduce nonlinearity. Without it, the model can learn only linear mappings. The most widely used activation function at present is the ReLU function [8], which is preferred over sigmoid and tanh because it has been observed to accelerate the convergence of stochastic gradient descent relative to the other two functions [11]. Moreover, compared with the expensive computation needed by tanh and sigmoid, ReLU is implemented simply by thresholding matrix values at zero (Figures 8 and 9).

An L2-SVM is introduced at the 10th layer of the CNN in place of the conventional softmax function with cross-entropy loss. In other words, the output is converted so that the class labels take values in {−1, +1}. The weight parameters are then learned using the Adam optimizer [10].

B. SVM Classifier

The Support Vector Machine is a powerful discriminative classifier presented by Vapnik [15] and Cortes [16]. It is very commonly used for various pattern recognition and classification tasks with good results [17]. It is known to be a state-of-the-art method for solving linear and non-linear classification problems (Figure 10) [15], owing to its parsimony, mobility, global optimum character, and statistical potential.
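The paper does not give the loss used for the L2-SVM output layer. As a hedged illustration, the sketch below uses one common formulation, the squared hinge loss with one-versus-rest targets in {−1, +1} plus an L2 penalty on the weights. The function names `to_pm_one` and `l2_svm_loss` and the penalty parameter `c` are assumptions made for this example, not details taken from the paper.

```python
import numpy as np

def to_pm_one(labels, num_classes):
    """Encode integer class labels as one-versus-rest targets in {-1, +1}."""
    targets = -np.ones((len(labels), num_classes))
    targets[np.arange(len(labels)), labels] = 1.0
    return targets

def l2_svm_loss(scores, targets, weights, c=1.0):
    """Squared hinge loss over all class scores plus an L2 weight penalty."""
    margins = np.maximum(0.0, 1.0 - targets * scores)
    return 0.5 * np.sum(weights ** 2) + c * np.sum(margins ** 2)

# Two samples, three classes (toy values standing in for the last-layer scores).
labels = np.array([0, 2])
targets = to_pm_one(labels, num_classes=3)
scores = np.array([[1.5, -0.2, -2.0],
                   [-1.0, 0.3, 0.4]])
weights = np.zeros((4, 3))  # hypothetical last-layer weight matrix
print(l2_svm_loss(scores, targets, weights))
```

Scores that are on the correct side of the margin (at least +1 for the true class and at most −1 for the others) contribute nothing to the loss; everything else is penalized quadratically, which keeps the loss differentiable for gradient-based optimizers such as Adam.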


Its formulation is based on structural risk minimization rather than on empirical risk minimization, which is used in artificial neural networks [15]. An SVM is essentially used to determine an optimal separating hyperplane, or decision function (equation 1).

$$ f(x) = W^{T}\,\Phi(x) + b \qquad (1) $$

where $W \in \mathbb{R}^{n}$, $b \in \mathbb{R}$, and $\Phi(x)$ is a feature map. Because the data are not linearly separable in the input space, the input pairs $(x_i, y_i)$ are mapped into a higher-dimensional feature space by the nonlinear operator $\Phi(x)$. As a consequence, the optimal hyperplane can be described as:

$$ f(x) = \operatorname{sgn}\Big( \sum_{i} y_i\,\alpha_i\,K(x_i, x) + b \Big) \qquad (2) $$

where $K(x_i, x) = \exp\left(-\gamma\,\lVert x_i - x \rVert^{2}\right)$ is the kernel based on the radial basis function (RBF) and sgn is the sign function. A second classifier, an SVM with the RBF kernel, is applied in place of the last output layers of the CNN to perform the Arabic handwritten text classification.

Figure 10: Support Vector Machine principle; (a) hyper-plane of two-class example, (b) one-versus-all method

Figure 11: SVM pseudocode

▪ Input: a set of (input, output) training pairs; call the input features x1, x2, …, xn and the output y. Typically there are many input features xi.
▪ Output: a set of weights w (or wi), one for each feature, whose linear combination predicts the value of y.
▪ Important Difference: the optimization maximizes the margin ('street width') in order to reduce the number of nonzero weights to the few that correspond to the important features, those that matter in deciding the separating line. These nonzero weights correspond to the support vectors.

VIII. IMPLEMENTATION

Recognition of handwritten words and characters is an important subject in OCR implementation and in the field of pattern recognition. All Kannada documents with handwritten content are distinct because of their varying font size and style. These handwritten documents are created without restrictions on paper, pen, flow of ink, color of ink, size, etc. For testing, a data set was generated because no standard data set exists. It is considered to consist of neatly typed documents containing Kannada script. For the purpose of research, a minimum of 50 handwritten documents were considered. The proposed model is built on the Windows 10 platform using Python and Django. Figure 12 shows samples of the handwritten documents containing Kannada script, collected from writers of various age groups in their own handwriting. All the gathered images were scanned at a resolution of 300 dpi.

Figure 12. Handwritten custom Kannada dataset.

Table 1 shows the accuracy achieved in the segmentation phase. Word segmentation is achieved with an average accuracy of 97.5 percent. Accuracy cannot be calculated at the character level, however, because the number of segmented characters is greater than the total number of characters: there are 300 characters in the 50 documents and 700 segmented characters. The explanation is that the Kannada consonant modifiers are combined with one of the characters to form compound characters, and these compound characters occur quite often in this language. Segmentation of such compound characters is complicated, and the treatment of compound characters requires a different perspective.

IX. RESULTS

In addition to the parameters used to evaluate handwritten word and character recognition, we also consider accuracy and loss for both training and validation. The difference between them is that accuracy measures the correct recognition of words and characters, while loss measures incorrect recognition. When training the neural network, correct data should be provided so that the network can recognize similar inputs easily and efficiently. In the proposed method the accuracy for each input is above 85%, and the remainder is considered as loss. Figure 13 shows the accuracy graph for training and validation, and Figure 14 shows the loss for training and validation.


Figure 13: Training vs Validation Accuracy in CNN

Figure 14. Training vs Validation Loss in CNN

➢ Accuracy: This is the simplest indicator of performance. Figure 15 shows the accuracy for each character of the words Taalidavanu and Baaliyaanu. The overall accuracy calculated over both words is 98%.

Accuracy = (TP + TN) / (TP + TN + FP + FN) (3)

Figure 15: Accuracy for the words Taalidavanu and Baaliyaanu

➢ Sensitivity: Sensitivity is the proportion of the actual positives that the classifier correctly identifies as positive. Figure 16 shows the sensitivity for each character, grouped by word.

Sensitivity = TP / (TP + FN) (4)

Figure 16: Sensitivity for the words Taalidavanu and Baaliyaanu

➢ Specificity: Specificity refers to the ability of the classifier to identify negative outcomes. Figure 17 shows the specificity for the words and characters, that is, how specific the recognition of each character or word is in handwritten word recognition.

Specificity = TN / (TN + FP) (5)

Figure 17: Specificity for the words Taalidavanu and Baaliyaanu


➢ Precision: Precision measures how many of the retrieved instances are relevant. Figure 18 represents the correctly recognized characters and words in handwritten word recognition.

Precision = TP / (TP + FP) (6)

Figure 18: Precision for the words Taalidavanu and Baaliyaanu

Figure 19: Bar graph of all the parameters with their corresponding percentages

Table I: Accuracy Achieved in Segmentation

Total number of Words Available in 50 documents | 500
Number of Words Segmented | 470
Accuracy of Word Segmentation | 97%
Total number of Characters Available in 50 documents | 1100
Number of Characters Segmented | 1050
Average Accuracy of Word Segmentation | 97.5%

Table II: Comparison of segmentation in the proposed method with the existing methods

Authors | Segmentation Technique | Size of Data Set | Accuracy in Percentage
R Prajna [17] | Morphological operations, projection profile | 100 | 86%
Wangsheng [18] | Horizontal Projection Profile | 204 | 84%
Parul Sahare [1] | Projection Profile | 800 | 96%
Proposed Method | Modified projection profile, connected component | 50 | 97.5%

Table III: Proposed Work with Previous Works

Authors | Method | Accuracy (%)
Karthik et al. [21] | SVM+HOG | 96.41
Karthik et al. 2018 [12] | Deep Belief Networks | 97.04
Parul Sahare et al., 2018 [1] | FCDF + FCCF + NCF (Character) | 99.84
Proposed | CNN+SVM (Word) | 97.81

X. CONCLUSION AND FUTURE ENHANCEMENT

In this article, we proposed a segmentation technique based on the Vertical Projection Profile method. The process begins with pre-processing steps, after which each text line is subdivided into individual words. As given in Table I, when the input is a text image whose words have enough spacing between them, a segmentation accuracy of 97 percent was achieved. A precision of 97 percent was obtained when slightly spaced words were segmented, but for closely spaced text columns a low precision was obtained, because of overlapping characters and the much smaller space between words. The word recognition accuracy was found to be more than 80 percent for each of the correctly segmented words, but where the spacing between two words was very small, the segmentation accuracy was lower.

For the identification and recognition of handwritten Kannada words and characters, a neural network-based Kannada character recognition system was introduced. The pixel information obtained from the resized characters was used directly to train the neural network using image processing techniques.
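The idea behind segmentation with a Vertical Projection Profile can be sketched as follows. This is a plain illustration under stated assumptions, not the authors' modified projection profile with connected components: the image is assumed to be an already binarized text line stored as nested lists (1 = ink, 0 = background), and the names `vertical_projection`, `segment_columns` and the `min_gap` parameter are hypothetical.

```python
from typing import List, Tuple

Image = List[List[int]]  # binary text line: 1 = ink, 0 = background


def vertical_projection(image: Image) -> List[int]:
    """Count the ink pixels in every column of the image."""
    if not image:
        return []
    return [sum(column) for column in zip(*image)]


def segment_columns(image: Image, min_gap: int = 1) -> List[Tuple[int, int]]:
    """Split the line at runs of at least `min_gap` empty columns.

    Returns (start, end) column ranges with `end` exclusive.
    """
    profile = vertical_projection(image)
    segments: List[Tuple[int, int]] = []
    start = None  # first column of the segment being built
    gap = 0       # empty columns seen since the last ink column
    for x, count in enumerate(profile):
        if count > 0:
            if start is None:
                start = x
            gap = 0
        elif start is not None:
            gap += 1
            if gap >= min_gap:
                # Close the segment just after its last ink column.
                segments.append((start, x - gap + 1))
                start = None
                gap = 0
    if start is not None:
        # The segment runs to the end, minus any trailing empty columns.
        segments.append((start, len(profile) - gap))
    return segments


if __name__ == "__main__":
    line = [
        [0, 1, 1, 0, 0, 1, 0, 1, 1, 0],
        [0, 1, 0, 0, 0, 1, 0, 0, 1, 0],
    ]
    print(segment_columns(line, min_gap=1))  # narrow gaps also split
    print(segment_columns(line, min_gap=2))  # only wider gaps split
```

In this sketch a small `min_gap` separates units at every empty column, while a larger `min_gap` splits only at wider gaps, which is why closely spaced or overlapping words are harder to separate with a projection profile alone.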

Compared with other recognition schemes, the proposed system is therefore less complex. Several neural network architectures are used to classify the handwritten Kannada words and characters. A future enhancement of the implemented system will be the recognition of paragraphs, which adds an extra layer of segmentation, namely line segmentation, on top of the existing word and character segmentation.

REFERENCES

1. Parul Sahare and Sanjay B. Dhok, "Multilingual Character Segmentation and Recognition Schemes for Indian Document Images," IEEE, 2018.
2. R. Sanjeev Kunte and R. D. Sudhaker Samuel, "A simple and efficient optical character recognition system for basic symbols in printed Kannada text," 32(5):521-533, October 2008.
3. Celine Mancas-Thillou and Bernard Gosselin, Faculté Polytechnique de Mons, Avenue Copernic 1, 7000 Mons, Belgium, "Character Segmentation-by-Recognition Using Log-Gabor Filters," August 2006.
4. Yungang Zhang and Changshui Zhang, "A New Algorithm for Character Segmentation of License Plate," July 2003.
5. T. V. Ashwin and P. S. Sastry, "A font and size-independent OCR system for printed Kannada documents using support vector machines," Vol. 27, Part 1, February 2002.
6. Rohana K. Rajapakse, A. Ruvan Weerasinghe and E. Kevin Seneviratne, "A Neural network based character recognition system for Sinhala script."
7. M. Blumenstein and B. Verma, "Neural-based Solutions for the Segmentation and Recognition of Difficult Handwritten Words from a Benchmark Database," January 1995.
8. M. Mahadeva Prasad, M. Sukumar, and A. G. Ramakrishnan, "Divide and conquer technique in online handwritten Kannada character recognition," Proceedings of the International Workshop on Multilingual OCR, ACM, 2009.
9. A. G. Ramakrishnan and J. Shashidhar, "Development of OHWR system for Kannada," VishwaBharat@tdil 39 (2013): 40.
10. Niranjan Joshi et al., "Comparison of elastic matching algorithms for online Tamil handwritten character recognition," Ninth International Workshop on Frontiers in Handwriting Recognition (IWFHR-9), IEEE, 2004.
11. Oivind Due Trier, Anil K. Jain and Torfinn Taxt, "Feature Extraction Methods for Character Recognition - A Survey," July 1995.
12. Y. LeCun, F. J. Huang, L. Bottou, "Learning methods for generic object recognition with invariance to pose and lighting," Proc. Computer Vision and Pattern Recognition Conference (CVPR), IEEE Press, 2004.
13. M. Ranzato, F. Huang, Y. Boureau, Y. LeCun, "Unsupervised learning of invariant feature hierarchies with applications to object recognition," Proc. Computer Vision and Pattern Recognition Conference (CVPR), IEEE Press, 2007.
14. Y. LeCun, L. Bottou, Y. Bengio, P. Haffner, "Gradient-based learning applied to document recognition," Proceedings of the IEEE, vol. 86(11), pp. 2278-2324, 1998.
15. V. Vapnik, "Statistical Learning Theory," John Wiley, New York, 1998.
16. C. Cortes, V. Vapnik, "Support vector networks," Machine Learning, vol. 20, pp. 273-297, 1995.
17. C. Burges, "A tutorial on support vector machines for pattern recognition," Data Mining and Knowledge Discovery, vol. 2(2), pp. 121-167, 1998.
18. R. Prajna, Ramya V. R., Mamatha H. R., "A Study of different Text Line Extraction Techniques for Multi-font and Multi-size Printed Kannada Documents," 1-2 Aug. 2014.
19. Wangsheng Zhu, Qin Chen, Chuanyi Wei, and Ziyang Li, "A segmentation algorithm based on image projection for complex text layout," 05 October 2017.
20. Yifan Jiang, Hyunhak Shin, Hanseok Ko, "Precise Regression for Bounding Box Correction for Improved Tracking Based on Deep Reinforcement Learning," 2018.
21. S. Karthik and K. S. Murthy, "Deep belief network based approach to recognize handwritten Kannada characters using distributed average of gradients," Cluster Computing, pp. 1-9, 2018.
22. Ramesh G., J. Manoj Balaji, Ganesh N. Sharma, Champa H. N., "Recognition of Off-line Kannada Handwritten Characters by Deep Learning using Capsule Network," International Journal of Engineering and Advanced Technology (IJEAT), ISSN: 2249-8958, Volume-8 Issue-6, August 2019.
23. Ramesh G., J. Manoj Balaji, Ganesh N. Sharma, Champa H. N., "Offline Kannada Handwritten Character Recognition Using Convolutional Neural Networks," IEEE-WIECON, 2019.

AUTHORS PROFILE

Ramesh G is currently a Research Scholar in the Department of Computer Science and Engineering, University Visvesvaraya College of Engineering (UVCE), Bangalore University, Bangalore. He completed his B.E and M.Tech at Visvesvaraya Technological University (VTU), Karnataka. Both degrees are in the Computer Science and Engineering (CS&E) discipline. He has published papers in reputed international journals and international conferences, and has attended various FDP programs. His current research lies in the areas of Image Processing, Machine Learning and Deep Learning.

Sandeep Kumar N. is currently pursuing his Master of Engineering in Web Technology at University Visvesvaraya College of Engineering (UVCE), Bangalore University, Bangalore. He completed his B.E in Computer Science and Engineering at Rava Institute Technology, affiliated to Visvesvaraya Technological University (VTU). His areas of interest include Image Processing, Networking, Cybersecurity and Cloud Security.

Dr. Champa H N has completed her Bachelor of Engineering, Master of Technology and Doctoral Degree in Computer Science and Engineering. She has 30 years of teaching experience. Currently she is a Professor in the Dept. of CSE, University Visvesvaraya College of Engineering, Bangalore. She has over 20 research papers to her credit and is currently guiding 04 Ph.D students. Her research interests include Image Processing, Artificial Intelligence, Machine Learning and Database Systems.
