Using Machine Learning Techniques For Ev
Tomato Ripeness
Nashwa El-Bendary1,4, Esraa El hariri2,4, Aboul Ella Hassanien3,4, Amr Badr3
1 Arab Academy for Science, Technology, and Maritime Transport, Cairo, Egypt
2 Faculty of Computers and Information, Fayoum University
3 Faculty of Computers and Information, Cairo University, Egypt
4 Scientific Research Group in Egypt
Abstract
Tomato quality is one of the most important factors in ensuring consistent marketing of tomato fruit. As ripeness is the main indicator of tomato quality from the customer's perspective, determining tomato ripeness stages is a basic industrial concern in tomato production in order to obtain a high-quality product. Automatic ripeness evaluation of tomato is an essential research topic, as it may help ensure optimum yield of a high-quality product; this in turn increases income, because tomato is one of the most important crops in the world. This article presents an automated multi-class classification approach for tomato ripeness measurement and evaluation via investigating and classifying the different maturity/ripeness stages. The proposed approach uses color features for classifying tomato ripeness. The approach proposed in this article uses Principal Components Analysis (PCA) in addition to Support Vector Machines (SVMs) and Linear Discriminant Analysis (LDA) algorithms for feature extraction and classification, respectively. Experiments have been conducted on a dataset of 250 images in total, used for both training and testing with 10-fold cross-validation. Experimental results showed that the proposed classification approach obtained a ripeness classification
accuracy of 90.80%, using one-against-one (OAO) multi-class SVMs algo-
1. Introduction
Fruit and vegetable development is characterized by a short period of cell division followed by a longer period of cell elongation by water uptake. The final fruit size mainly depends on the initial cell number rather than cell size Cowan (2001). Fruit ripening, on the other hand, is characterized by the development of color, flavor, texture and aroma. The actual time from anthesis until full maturity can vary tremendously among species/cultivars due to genetic and environmental differences. Even between fruits on the same plant, development and ripening can take more or less time depending on local microclimate conditions and differences in sink/source relations within the plant. In addition, when a fruit is harvested, its time of anthesis is generally unknown, as is its full history El Hariri et al. (2014); Lang and Hübert (2012); Wei et al. (2014).
Coates and Johnson (1997), tomato belongs to the climacteric category, as it can reach the over-ripening stage after being harvested. Tomatoes also pass through several ripeness stages, namely 1) Green, 2) Breaker, 3) Turning, 4) Pink, 5) Light red and 6) Red, so they reach full red color even when harvested green. The Red stage is the most commercially preferred ripeness stage El Hariri et al. (2014); Camelo (2004).
As has been noted, monitoring and controlling the ripeness of produce (fruits and vegetables) has become a very important issue in the crops industry, since customers perceive ripeness as the main quality indicator. The product's appearance is also a major concern for producers, as it strongly influences product quality and consumer preferences. However, to this day, optimal harvest dates and predictions of storage life are still mainly based on subjective interpretation and practical experience El Hariri et al. (2014). Hence, automating that process is of great benefit to both agriculture and industry. In agriculture, it may be used to develop automatic harvesting systems and to save crops from damage caused by environmental changes. In industry, it is used to develop automatic sorting systems or to check the quality of fruits in order to increase customer satisfaction Brezmes et al. (2000); Elhariri, El-Bendary, Fouad, Platoš, Hassanien and Hussein (2014). Accordingly, an objective and accurate ripeness assessment of agricultural crops is important in ensuring optimum yield of high-quality products. Moreover, correctly identifying the physiological and harvest maturity of agricultural crops ensures timely harvest and avoids cutting either under-ripe or over-ripe crops El Hariri et al. (2014); Elhariri, El-Bendary, Fouad, Platoš, Hassanien and Hussein (2014); May and Amaran (2011).
Minya city, Upper Egypt. A dataset of 250 images in total was used for both training and testing with 10-fold cross-validation. The training dataset is divided into 5 classes representing the different stages of tomato ripeness. The proposed approach consists of three phases, namely pre-processing, feature extraction, and classification. During the pre-processing phase, the proposed approach resizes images to 250x250 pixels in order to reduce their color index, removes the background of each image using a background subtraction technique, and converts each image from the RGB to the HSV color space. For the feature extraction phase, the Principal Component Analysis (PCA) algorithm is applied in order to generate a feature vector for each image in the dataset. Finally, for the classification phase, the proposed approach applies Support Vector Machines (SVMs) and Linear Discriminant Analysis (LDA) algorithms to classify the ripeness stages.
Another basic research motivation is that, to the best of our knowledge, none of the recent research works on ripeness classification has addressed the dependency of classification performance on the statistics of the experimented dataset(s).
So, another contribution of this article is that it highlights the most appropriate classification algorithm considering that dependency. This has been achieved by using principal component analysis (PCA) for feature extraction and Support Vector Machines (SVMs) and Linear Discriminant Analysis (LDA) for classification of tomato ripeness stages based on color features. Both training and testing datasets have been generated via 10-fold cross-validation.
An essential finding is that the performance of LDA and SVMs was highly
dependent on statistics of the dataset. That is, on datasets with fewer classes
(ripeness categories), and many training examples per class, SVMs had an
advantage over the LDA classification approach.
The SVMs classification algorithm was selected because it has many advantages: it delivers a unique solution, and it does not need any assumptions about the functional form of the transformation, because the kernel implicitly contains a non-linear transformation. Also, if an appropriate generalization grade is chosen, SVMs can be robust even when the training sample has some bias. Moreover, by choosing an appropriate kernel, one can put more stress on the similarity between samples. However, SVMs also have some limitations, namely the lack of transparency of results and the very long training time needed for large datasets Auria and Moro (2008). On the other hand, the LDA classification algorithm was selected because its projection addresses the problem of illumination by maximizing between-class scatter and minimizing within-class scatter, and because it needs fewer samples to obtain a reliable classifier. However, one common disadvantage of LDA is the singularity problem: it fails when all scatter matrices are singular Kumar and Kaur (2012).
In general, the main limitation of this research is the dataset size, which needs to be larger, as the accuracy of SVMs increases with the number of images per training class; accordingly, a maximum accuracy of 90.2% has been achieved.
2. Related work
This section reviews a number of current research approaches that tackle
the problem of ripeness monitoring and classification for tomatoes and other
fruits/vegetables.
First of all, various research works have been proposed for tomato ripeness classification. In Zhang and McCarthy (2012), the authors presented a tomato maturity evaluation approach using magnetic resonance imaging (MRI). MR images were captured for tomatoes that were harvested from the field at different maturity stages. Then, for each MR image, the mean and histogram features of the voxel intensities in the region of interest (RoI) were calculated. Finally, the partial least squares discriminant analysis (PLS-DA) algorithm was applied to the calculated features and the maturity class variables in order to derive a maturity classification model, showing that the different maturity stages are embedded in the signal intensity of MR images.
Also, in Baltazar et al. (2008), the authors used 128 tomato samples that were harvested and preliminarily sorted with a colorimeter, choosing only those with roughly breaker color, which represents the ripeness stage where there is a definite break in color from green to tannish-yellow. They first applied data fusion to fresh intact tomatoes by assessing both colorimeter and nondestructive firmness measurements for the samples on the selected testing days, using two sensors placed at different points. Then, the measurement data were normalized. Finally, a three-class Bayesian classifier was applied, and the results showed that multi-sensor data fusion performs better than single-sensor data and considerably reduces the classification error.
Moreover, in Polder et al. (2002), the authors proposed an approach based on spectral image analysis to measure the ripeness of tomatoes for automatic sorting. The approach compared hyperspectral images with standard RGB images for classifying tomato ripeness stages. Classification is based on individual pixels and includes a gray reference in each image to compensate automatically for different light sources. The approach applied the linear discriminant analysis (LDA) algorithm to pixel values and showed that spectral images offer more discriminating power than standard RGB images for measuring the ripeness stages of tomatoes.
On the other hand, for oil palm ripeness classification, the authors of Fadilah and Mohamad-Saleh (2014) proposed an automated ripeness classification system for oil palm fresh fruit bunches that uses the bunch color as the ripeness indicator. The system first applied image segmentation using the K-means clustering algorithm to separate fruit pixels from spike pixels. Then, it extracted a hue histogram of 100 bins for each image as a feature vector and reduced the color features with two different techniques, namely principal component analysis (PCA) and stepwise discriminant analysis (SDA). Finally, it applied an Artificial Neural Network (ANN) as a classifier to classify ripeness into four categories: un-ripe, under-ripe, ripe and over-ripe. Results showed that reducing the color features using stepwise discriminant analysis improved classification accuracy by more than 10%.
Also, in Bensaeed et al. (2014), the authors proposed a hyperspectral-based system for classifying the ripeness of oil palm fresh fruit bunches (FFBs). A dataset of 469 fruits in total, covering three types of oil palm FFBs (nigrescens, virescens, oleifera), was collected from the MPOB farm area at Kluang, Johor, Malaysia, to be classified into three ripeness categories: over-ripe, ripe, and under-ripe. The system first scanned the oil palm FFBs using a hyperspectral device. Then, a pixel spectral processing step was performed by applying background removal followed by pixel discrimination (only reflectance data was analyzed). A low-pass filter was applied to the data for noise reduction. Finally, an artificial neural network (ANN) was applied as a classifier to classify the different wavelength regions on oil palm fruit through pixel-wise processing. This system achieved an accuracy of more than 95% for all three types of oil palm fruits.
Moreover, in Jaffar et al. (2009), the authors applied a photogrammetric methodology to relate the color of palm oil fruits to their ripeness stages, after which the fruits were sorted physically. That approach was considered the first form of automation of palm oil grading systems. Earlier grading systems had difficulty using the average color digital number values in RGB color space to determine ripeness, because the palm fruit images were mixed with dirt and branches. The approach applied K-means clustering to segment the colors of the fresh fruit bunches (FFBs) automatically. Then, to differentiate ripe FFBs from unripe ones, it used the R/G and R/B ratios of the digital numbers of the segmented images.
Also, in May and Amaran (2011), the authors proposed an assessment approach for ensuring optimum yield of high-quality oil that overcomes the subjectivity and inconsistency of manual, experience-based human grading. Palm ripeness stages were classified into under-ripe, ripe and over-ripe depending on color intensity. The developed approach is an automated ripeness assessment that uses RGB and a fuzzy logic feature extraction and classification model to assess the ripeness of oil palm. It achieved an accuracy of 88.74%.
Furthermore, in Zhang et al. (2014), the authors proposed a method for classifying harvested dates according to their color. After image capture, a threshold segmentation method was applied to separate the fruit from the background. From the RGB color space, only the R-G plane was used, because the blue channel does not give effective information for grading dates. In the training stage, a 2D histogram was generated for each maturity class by counting the co-occurrence of every color in the R and G channels in that class. Each 2D histogram was then normalized, and a back projection matrix was generated. In the grading stage, after background removal and extraction of the R and G values, a back projection step followed by color index analysis was performed, and then the color grade was computed using some statistics. The method proposed in Zhang et al. (2014) achieved good results without requiring a complicated training process or machine learning algorithms.
search approach achieved an accuracy of 86.51%.
Furthermore, there are many research works on other fruits. In Dadwal and Banga (2012), the authors proposed an approach based on color image segmentation and fuzzy logic to classify apples into ripe, under-ripe and over-ripe categories. The approach is based on RGB color components: for each fruit, four images were captured from different directions. Then, a segmentation algorithm was applied to these images to obtain the area of interest, and the mean value of each color component (R, G and B) was calculated. Finally, the fuzzy logic approach was applied to decide the apple ripeness category from the mean values of the red, green and blue color components.
In Dolaty (2012), the author proposed an image-processing approach for sorting and grading cherries. The approach depended on the RGB color components of the captured cherry images. It used cherry samples from four different stages that were collected at intervals of 5 days. The system sorted cherries according to their ripeness using color criteria and the Total Soluble Solids (TSS) in the fruit to assign each one to the right ripeness stage. In order to minimize the error rate in calculating the average color components, the proposed system achieved 92% accuracy in sorting cherries according to their ripeness stage.
then to implement classification. The proposed system achieved 100% accu-
racy in classifying the lime based on their maturity and ripeness.
advantage over the LDA classification approach. Therefore, the obtained
experimental results showed that using one-against-one (OAO) multi-class
SVMs algorithm with linear kernel function outperformed using both one-
against-all (OAA) multi-class SVMs algorithm with linear kernel function
and LDA classification algorithms.
3. Preliminaries
This section presents a brief idea concerning the core concepts of PCA
feature extraction algorithm in addition to SVMs and LDA classification al-
gorithms.
Algorithm 1 Principal component analysis (PCA) Algorithm
1: Calculate the sample mean $\bar{\mu}$
\[ \bar{\mu} = \frac{\sum_{i=1}^{n} X_i}{n} \]
2: Subtract sample mean from each observation $X_i$
\[ \bar{Z}_i = X_i - \bar{\mu} \]
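Only the first two steps of Algorithm 1 (mean and centering) are shown above. As a minimal sketch, and assuming the remaining steps follow the usual covariance eigendecomposition, the projection can be written with NumPy as follows; the function name `pca_project` and the use of the SVD are illustrative choices, not taken from the article.

```python
import numpy as np

def pca_project(X, n_components):
    """Project the rows of X (one observation per row) onto the top components."""
    X = np.asarray(X, dtype=float)
    mu = X.mean(axis=0)            # step 1: sample mean
    Z = X - mu                     # step 2: subtract the mean from each observation
    # Assumed remaining steps: eigendecomposition of the sample covariance,
    # obtained here from the SVD of the centered data.
    _, s, vt = np.linalg.svd(Z, full_matrices=False)
    components = vt[:n_components]
    explained_var = (s ** 2) / (len(X) - 1)
    return Z @ components.T, components, explained_var[:n_components]

scores, components, variance = pca_project([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]], 1)
```

For these three collinear points, a single component captures all of the variance, so the one-dimensional scores lose no information.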
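\[ \bar{x}_i = \frac{\sum_{j=1}^{M \cdot N} x_{ij}}{M \cdot N} \qquad (1) \]
\[ \partial_i = \sqrt{\frac{1}{M \cdot N} \sum_{j=1}^{M \cdot N} (x_{ij} - \bar{x}_i)^2} \qquad (2) \]
\[ S_i = \sqrt[3]{\frac{1}{M \cdot N} \sum_{j=1}^{M \cdot N} (x_{ij} - \bar{x}_i)^3} \qquad (3) \]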
where xij is the value of image pixel j of color channel i (e.g. in RGB or HSV), x̄i is the mean of channel i (i = H, S and V), ∂i is the standard deviation, and Si is the skewness of each channel Soman et al. (2012); Singh and Hemachandran (2012). HSV channels can be computed from RGB channels using equations (4), (5), and (6), where R, G and B are the color components of the RGB color space Meskaldji K (2009).
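The three color moments of equations (1) to (3) can be sketched in a few lines of NumPy. This is an illustrative sketch rather than the article's Matlab code; the function names are hypothetical, and the skewness uses a signed cube root so that negative third moments remain real-valued.

```python
import numpy as np

def color_moments(channel):
    """Return (mean, standard deviation, skewness) of one color channel."""
    x = np.asarray(channel, dtype=float).ravel()   # the M*N pixel values x_ij
    mean = x.mean()                                # Eq. (1)
    std = np.sqrt(np.mean((x - mean) ** 2))        # Eq. (2)
    skew = np.cbrt(np.mean((x - mean) ** 3))       # Eq. (3), signed cube root
    return mean, std, skew

def hsv_color_moments(hsv_image):
    """Nine moments for an (M, N, 3) HSV image: three per channel."""
    return np.array([m for c in range(3)
                     for m in color_moments(hsv_image[..., c])])

mean, std, skew = color_moments([[0, 0], [0, 4]])
```

For the 2x2 example channel, the mean is 1, the standard deviation is the square root of 3, and the skewness is the cube root of 6.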
\[ H = \cos^{-1} \left( \frac{\frac{1}{2} [(R - G) + (R - B)]}{\sqrt{(R - G)^2 + (R - B)(G - B)}} \right) \qquad (4) \]
\[ S = 1 - \frac{3 [\min(R, G, B)]}{R + G + B} \qquad (5) \]
\[ V = \frac{R + G + B}{3} \qquad (6) \]
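Equations (4) to (6) can be applied directly to an array of pixels. The sketch below is a literal transcription under two assumptions that are not in the article: a small constant `eps` guards the divisions for gray or black pixels, and the hue is returned in radians. The common correction H = 2π − H for pixels with B > G is not part of the equations as given and is therefore omitted.

```python
import numpy as np

def rgb_to_hsv_eq(rgb):
    """Apply Eqs. (4)-(6) to an (..., 3) array of R, G, B values."""
    rgb = np.asarray(rgb, dtype=float)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    eps = 1e-12  # assumption: avoids division by zero for gray or black pixels

    num = 0.5 * ((r - g) + (r - b))
    den = np.sqrt((r - g) ** 2 + (r - b) * (g - b))
    h = np.arccos(np.clip(num / np.maximum(den, eps), -1.0, 1.0))  # Eq. (4)

    s = 1.0 - 3.0 * rgb.min(axis=-1) / np.maximum(r + g + b, eps)  # Eq. (5)
    v = (r + g + b) / 3.0                                          # Eq. (6)
    return np.stack([h, s, v], axis=-1)

hsv = rgb_to_hsv_eq(np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]))
```

Pure red maps to a hue of 0 and pure green to 2π/3 (120 degrees), both with saturation 1 and value 1/3.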
3.2.2. Color histogram
A color histogram is a color descriptor that represents the distribution of colors in an image: it gives the number of pixels whose colors fall in each range of colors El-Bendary et al. (2011). A color histogram can be calculated for many color spaces (e.g. RGB, HSV, etc.). It is often used with 3-dimensional spaces such as the RGB and HSV color spaces. The color histogram is invariant to rotation, translation, and scale Meskaldji K (2009).
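\[ \text{maximize} \quad \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{n} \alpha_i \alpha_j y_i y_j K(x_i, x_j) \qquad (7) \]
\[ \text{subject to} \quad \sum_{i=1}^{n} \alpha_i y_i = 0, \quad 0 \le \alpha_i \le C \qquad (8) \]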
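To make the dual problem of equations (7) and (8) concrete, the sketch below evaluates the objective and checks the constraints for a given multiplier vector. It does not solve the optimization; the helper names and the two-point toy problem are illustrative assumptions.

```python
import numpy as np

def linear_kernel(a, b):
    return float(np.dot(a, b))

def rbf_kernel(a, b, gamma=1.0):
    return float(np.exp(-gamma * np.sum((np.asarray(a) - np.asarray(b)) ** 2)))

def dual_objective(alpha, y, X, kernel):
    """Value of Eq. (7) for multipliers alpha, labels y in {-1, +1}, samples X."""
    n = len(y)
    K = np.array([[kernel(X[i], X[j]) for j in range(n)] for i in range(n)])
    ay = alpha * y
    return float(alpha.sum() - 0.5 * ay @ K @ ay)

def is_feasible(alpha, y, C, tol=1e-9):
    """Check the constraints of Eq. (8)."""
    return bool(abs(alpha @ y) <= tol
                and np.all(alpha >= -tol) and np.all(alpha <= C + tol))

# Toy problem: one sample per class on a line.
X = np.array([[1.0], [-1.0]])
y = np.array([1.0, -1.0])
alpha = np.array([0.5, 0.5])
value = dual_objective(alpha, y, X, linear_kernel)
```

With the linear kernel, these multipliers are feasible for C = 1 and give an objective value of 0.5.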
Algorithm 3 one-against-one (OAO)
1: Create N(N − 1)/2 binary SVMs
2: Train the N(N − 1)/2 binary SVMs on the class pairs
(1, 2), (1, 3), ..., (1, N), (2, 3), (2, 4), ..., (N − 1, N).
3: Select the class with the largest vote (the class label that occurs the most).
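The voting scheme of Algorithm 3 can be sketched independently of the binary learner. In the sketch below, `CentroidBinary` is a deliberately simple stand-in (nearest class mean) and is not an SVM; any binary classifier with `fit` and `predict` methods could be substituted. All names are hypothetical.

```python
from itertools import combinations
import numpy as np

class CentroidBinary:
    """Stand-in binary classifier (nearest class mean), NOT an SVM."""
    def fit(self, X, y):
        self.labels = np.unique(y)
        self.means = np.array([X[y == c].mean(axis=0) for c in self.labels])
        return self

    def predict(self, X):
        d = np.linalg.norm(X[:, None, :] - self.means[None, :, :], axis=2)
        return self.labels[d.argmin(axis=1)]

def oao_fit(X, y, make_binary=CentroidBinary):
    """Train one binary model per class pair: N(N-1)/2 models in total."""
    models = {}
    for a, b in combinations(np.unique(y), 2):
        mask = (y == a) | (y == b)
        models[(a, b)] = make_binary().fit(X[mask], y[mask])
    return models

def oao_predict(models, X):
    """Each pairwise model casts one vote per sample; the most-voted class wins."""
    classes = sorted({c for pair in models for c in pair})
    votes = np.zeros((len(X), len(classes)), dtype=int)
    for model in models.values():
        pred = model.predict(X)
        for k, c in enumerate(classes):
            votes[:, k] += (pred == c)
    return np.array(classes)[votes.argmax(axis=1)]

X = np.array([[0, 0], [0, 1], [5, 5], [5, 6], [10, 0], [10, 1]], dtype=float)
y = np.array([0, 0, 1, 1, 2, 2])
models = oao_fit(X, y)
predicted = oao_predict(models, X)
```

With three classes, three pairwise models are trained, and for five ripeness classes the same code would train ten.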
particular data set thereby guaranteeing maximal separability. The maximization of $\frac{\delta|S_b|}{\delta|S_w|}$.
4.2. Feature extraction phase
As previously stated, the most important characteristic to assess tomato
ripeness is its surface color, so this system uses HSV color histogram and color
moments for ripeness stages classification. For feature extraction phase, PCA
algorithm is applied as features extraction technique in order to generate a
feature vector for each image in the dataset.
Figure 2: sample images before and after applying background subtraction algorithm
The proposed system transforms the input space into sub-spaces for dimensionality reduction. The 1D 16x4x4 HSV histogram uses 16 levels for hue and 4 levels for each of saturation and value. In addition, nine color moments, three for each channel (H, S and V channels) (mean, standard deviation, and skewness), were computed. Then, a feature vector was formed as a combination of the HSV 1D histogram and the nine color moments.
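A minimal sketch of the feature vector described above is given here, assuming HSV channels scaled to [0, 1] and a histogram normalized to relative frequencies; neither assumption is stated in the article, and the function names are hypothetical. The 16x4x4 quantization yields 256 bins, so the combined vector has 265 entries before any PCA reduction.

```python
import numpy as np

def hsv_histogram(hsv, bins=(16, 4, 4)):
    """1D quantized HSV histogram; channels are assumed scaled to [0, 1]."""
    hsv = np.asarray(hsv, dtype=float).reshape(-1, 3)
    # Quantize each channel into its number of levels (16 for H, 4 for S and V).
    idx = [np.minimum((hsv[:, c] * n).astype(int), n - 1)
           for c, n in enumerate(bins)]
    # Combine the three level indices into one bin index in [0, 16*4*4).
    flat = (idx[0] * bins[1] + idx[1]) * bins[2] + idx[2]
    hist = np.bincount(flat, minlength=int(np.prod(bins))).astype(float)
    return hist / hist.sum()   # assumption: normalize to relative frequencies

def feature_vector(hsv, moments):
    """Concatenate the 256-bin histogram with the nine color moments."""
    return np.concatenate([hsv_histogram(hsv), np.asarray(moments, dtype=float)])

image = np.zeros((2, 2, 3))
image[0, 0] = [1.0, 1.0, 1.0]          # one pixel in the last bin
features = feature_vector(image, np.zeros(9))
```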
along with the one-against-one (OAO) approach with 10-fold cross-validation for multi-class SVMs problems. In addition, another classification algorithm, LDA with a quadratic discriminant function, is used with 10-fold cross-validation.
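The 10-fold cross-validation protocol can be sketched as an index splitter: every image is used for testing exactly once and for training in the other nine folds. The shuffling and the fixed seed below are illustrative assumptions; the article does not say how the folds were drawn.

```python
import numpy as np

def kfold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation."""
    rng = np.random.default_rng(seed)       # seed is an illustrative choice
    order = rng.permutation(n_samples)
    folds = np.array_split(order, k)
    for i in range(k):
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, test_idx

splits = list(kfold_indices(250, k=10))
```

For the 250-image dataset, each split holds 225 training and 25 testing indices.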
5. Experimental Results
Simulation experiments in this article were run on a PC with an Intel Core i7 Q720 @ 1.60 GHz CPU and 6 GB of memory. The proposed approach was implemented in Matlab running on Windows 7. The datasets used for the experiments were constructed from real sample images of tomatoes at different ripeness stages, which were collected from different farms in Minya city. The collected datasets contained color JPEG images with a resolution of 3664x2748 pixels that were captured using a Kodak C1013 digital camera with 10.3 megapixels of resolution. A dataset of 250 images in total was used for both training and testing with 10-fold cross-validation. The training dataset is divided into 5 classes representing the different stages of tomato ripeness, as shown in Fig. 3. The classes are the Green & Breaker, Turning, Pink, Light Red, and Red stages U.S. Dept. Agric./AMS (1991). In the Green & Breaker stage, green represents the ripeness stage where the fruit surface is completely green, whereas breaker represents the ripeness stage where there is a definite break in color from green to tannish-yellow, pink or red on not more than 10% of the surface. In the Turning stage, 10% to 30% of the surface is not green. In the Pink stage, 30% to 60% of the surface is not green. In the Light Red stage, 60% to 90% of the surface is not green. Finally, in the Red stage, more than 90% of the surface is not green. Some samples of both training and testing datasets are shown in Fig. 4.
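The class definitions above amount to thresholds on the non-green share of the fruit surface. The sketch below encodes only that labeling rule; it is not the classifier proposed in the article, which works on color features. Treating each upper bound as inclusive is an assumption, since the text does not say which class a boundary value belongs to.

```python
def ripeness_stage(non_green_percent):
    """Map the non-green share of the surface (0-100) to one of the five classes."""
    p = float(non_green_percent)
    if not 0.0 <= p <= 100.0:
        raise ValueError("percentage must be between 0 and 100")
    # Assumption: each upper bound is inclusive (e.g. exactly 30% counts as Turning).
    if p <= 10:
        return "Green & Breaker"
    if p <= 30:
        return "Turning"
    if p <= 60:
        return "Pink"
    if p <= 90:
        return "Light Red"
    return "Red"

labels = [ripeness_stage(p) for p in (5, 20, 45, 75, 95)]
```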
The proposed approach has been implemented considering three scenar-
ios; namely
Figure 3: Examples of tomato ripeness stages
Figure 5: Results for different SVM kernel functions using one-against-one multi-class
approach and 10-fold cross-validation
tic (ROC) curve and area under curve (AUC) for the best feature resulting from different kernel functions using the one-against-one multi-class SVMs approach with 10-fold cross-validation and a total of 250 images (used for both training and testing). The ROC curve separates each class from the other classes. From figure 6, showing the ROC curve for the best feature using the linear kernel function for OAO multi-class SVMs with cross-validation, the applied approach separated each class from each of the other classes with the AUCs shown in Table 1.
Figure 6: ROC curve for the best feature using linear kernel function (OAO multi-class SVMs with cross-validation), AUC=0.8219
Table 1: AUCs of OAO multiclass-SVMs using Linear kernel function & 10-fold cross validation
                  Green & Breaker  Turning  Pink    Light Red  Red
Green & Breaker   -                0.7180   0.8615  0.9424     0.9785
Turning           0.7180           -        0.6649  0.8123     0.9268
Pink              0.8615           0.6649   -       0.6880     0.8652
Light Red         0.9424           0.8123   0.6880  -          0.7615
Red               0.9785           0.9268   0.8652  0.7615     -
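The overall AUC of 0.8219 quoted in the caption of Figure 6 is consistent with the unweighted mean of the ten pairwise AUCs in Table 1. The article does not state how the overall figure was obtained, so the averaging below is an inference from the numbers rather than the authors' documented procedure.

```python
classes = ["Green & Breaker", "Turning", "Pink", "Light Red", "Red"]

# Pairwise AUCs from Table 1 (upper triangle; the table is symmetric).
pairwise_auc = {
    (0, 1): 0.7180, (0, 2): 0.8615, (0, 3): 0.9424, (0, 4): 0.9785,
    (1, 2): 0.6649, (1, 3): 0.8123, (1, 4): 0.9268,
    (2, 3): 0.6880, (2, 4): 0.8652,
    (3, 4): 0.7615,
}

mean_auc = sum(pairwise_auc.values()) / len(pairwise_auc)
hardest_pair = min(pairwise_auc, key=pairwise_auc.get)

print(round(mean_auc, 4))
print(classes[hardest_pair[0]], "vs", classes[hardest_pair[1]])
```

The mean evaluates to about 0.8219, and the lowest pairwise AUC belongs to the adjacent Turning and Pink stages, which matches the intuition that neighboring ripeness stages are the hardest to separate.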
From figure 7, showing the ROC curve for the best feature using the MLP kernel function for OAO multi-class SVMs with cross-validation, the applied approach separated each class from each of the other classes with the AUCs shown in Table 2.
Figure 7: ROC curve for the best feature using MLP kernel function (OAO multi-class
SVMs with cross-validation), AUC=0.7217
Table 2: AUCs of OAO multiclass-SVMs using MLP kernel function & 10-fold cross validation
                  Green & Breaker  Turning  Pink    Light Red  Red
Green & Breaker   -                0.6299   0.6998  0.8084     0.8775
Turning           0.6299           -        0.5985  0.7197     0.8115
Pink              0.6998           0.5985   -       0.6391     0.7521
Light Red         0.8084           0.7197   0.6391  -          0.6810
Red               0.8775           0.8115   0.7521  0.6810     -
From figure 8, showing the ROC curve for the best feature using the RBF kernel function for OAO multi-class SVMs with cross-validation, the applied approach separated each class from each of the other classes with the AUCs shown in Table 3.
Figure 8: ROC curve for the best feature using RBF kernel function(OAO multi-class
SVMs with cross-validation), AUC=0.8191
Table 3: AUCs of OAO multiclass-SVMs using RBF kernel function & 10-fold cross validation
                  Green & Breaker  Turning  Pink    Light Red  Red
Green & Breaker   -                0.7400   0.8646  0.9411     0.9853
Turning           0.7400           -        0.6658  0.7957     0.9231
Pink              0.8646           0.6658   -       0.6555     0.8460
Light Red         0.9411           0.7957   0.6555  -          0.7735
Red               0.9853           0.9231   0.8460  0.7735     -
From figure 9, showing the ROC curve for the best feature using the Polynomial kernel function for OAO multi-class SVMs with cross-validation, the applied approach separated each class from each of the other classes with the AUCs shown in Table 4.
Figure 9: ROC curve for the best feature using Polynomial kernel function(OAO multi-
class SVMs with cross-validation), AUC=0.8070
Table 4: AUCs of OAO multiclass-SVMs using Polynomial kernel function & 10-fold cross validation
                  Green & Breaker  Turning  Pink    Light Red  Red
Green & Breaker   -                0.8621   0.9646  0.9627     0.9234
Turning           0.8621           -        0.8093  0.7620     0.5581
Pink              0.9646           0.8093   -       0.5801     0.8601
Light Red         0.9627           0.7620   0.5801  -          0.7874
Red               0.9234           0.5581   0.8601  0.7874     -
Figure 10: Results for different kernel functions using one-against-All multi-class approach
and 10-fold cross-validation
From figure 11, showing the ROC curve for the best feature using the linear kernel function for OAA multi-class SVMs with cross-validation, the applied approach separated each class from each of the other classes with the AUCs shown in Table 5.
Figure 11: ROC curve for the best feature using linear kernel function (OAA multi-class SVMs with cross-validation), AUC=0.8154
Table 5: AUCs of OAA multiclass-SVMs using Linear kernel function & 10-fold cross validation
                  Green & Breaker  Turning  Pink    Light Red  Red
Green & Breaker   -                0.9043   0.9669  0.9653     0.9284
Turning           0.9043           -        0.8259  0.8176     0.5294
Pink              0.9669           0.8259   -       0.5361     0.8436
Light Red         0.9653           0.8176   0.5361  -          0.8362
Red               0.9284           0.5294   0.8436  0.8362     -
From figure 12, showing the ROC curve for the best feature using the MLP kernel function for OAA multi-class SVMs with cross-validation, the applied approach separated each class from each of the other classes with the AUCs shown in Table 6.
From figure 13, showing the ROC curve for the best feature using the RBF kernel function for OAA multi-class SVMs with cross-validation, the applied approach separated each class from each of the other classes with the AUCs shown in Table 7.
From figure 14, showing the ROC curve for the best feature using the Polynomial kernel function for OAA multi-class SVMs with cross-validation, the applied approach separated each class from each of the other classes with the AUCs shown in Table 8.
Figure 12: ROC curve for the best feature using MLP kernel function (OAA multi-class SVMs with cross-validation), AUC=0.7232
Table 6: AUCs of OAA multiclass-SVMs using MLP kernel function & 10-fold cross validation
                  Green & Breaker  Turning  Pink    Light Red  Red
Green & Breaker   -                0.7334   0.8269  0.9089     0.7826
Turning           0.7334           -        0.6901  0.7289     0.5050
Pink              0.8269           0.6901   -       0.5110     0.7616
Light Red         0.9089           0.7289   0.5110  -          0.7836
Red               0.7826           0.5050   0.7616  0.7836     -
Figure 13: ROC curve for the best feature using RBF kernel function(OAA multi-class
SVMs with cross-validation), AUC=0.8100
Table 7: AUCs of OAA multiclass-SVMs using RBF kernel function & 10-fold cross validation
                  Green & Breaker  Turning  Pink    Light Red  Red
Green & Breaker   -                0.7555   0.8565  0.9242     0.9781
Turning           0.7555           -        0.6641  0.7730     0.9242
Pink              0.8565           0.6641   -       0.6091     0.8244
Light Red         0.9242           0.7730   0.6091  -          0.7911
Red               0.9781           0.9242   0.8244  0.7911     -
Figure 15 shows the ROC curve for the best feature using the LDA system with cross-validation; the applied approach separated each class from each of the other classes with the AUCs shown in Table 9.
From the previously depicted experimental results, we found out that the
One-against-One multi-class SVMs approach is better than the One-against-
All multi-class SVMs and LDA approaches, when applied for ripeness stage
classification. Figure 16 shows a comparison between accuracies obtained by
Figure 14: ROC curve for the best feature using Polynomial kernel function(OAA multi-
class SVMs with Cross-validation), AUC=0.7967
Table 8: AUCs of OAA multiclass-SVMs using Polynomial kernel function & 10-fold cross validation
                  Green & Breaker  Turning  Pink    Light Red  Red
Green & Breaker   -                0.8178   0.9227  0.9225     0.8771
Turning           0.8178           -        0.8480  0.8196     0.6075
Pink              0.9227           0.8480   -       0.5230     0.8368
Light Red         0.9225           0.8196   0.5230  -          0.7924
Red               0.8771           0.6075   0.8368  0.7924     -
Figure 15: ROC curve for the best feature using LDA classifier with Cross-validation,
AUC=0.8531
Figure 16: Comparison between the classification accuracy of OAA, OAO multi-class
SVMs and LDA approaches
same maturity stage Molyneux et al. (2004); Zhang and McCarthy (2012); El Hariri et al. (2014). During ripening, tomatoes go through a series of highly ordered physiological and biochemical changes, such as chlorophyll degradation and increased activity of cell wall-degrading enzymes, which bring about changes in color and firmness and the development of aromas and flavors El Hariri et al. (2014); Prasanna et al. (2007). For tomatoes, ripeness is often handled by classifying harvested produce into discrete color classes going from immature green to mature red, as in some recent studies that have classified tomatoes into different maturity stages based on color measurements El Hariri et al. (2014); Hahn (2002); Aranda-Sanchez et al. (2009). Different tomato products have distinct maturity requirements to achieve quality standards; hence, tomato maturity is one of the most important factors associated with the quality of processed tomato products.
In conclusion, the approach proposed in this article has one main research motivation: providing an automated multi-class classification approach for tomato ripeness measurement and evaluation via investigating and classifying the different maturity/ripeness stages based on color features. An essential finding is that the performance of classification algorithms depended strongly on the statistics of the experimented datasets. That is, on datasets with fewer classes (ripeness categories) and many training examples per class, SVMs had an advantage over other classification approaches. Many aspects of ripeness classification for tomatoes and other fruits/vegetables have been addressed by other researchers; however, none of those classification approaches addressed the dependency of classification performance on the statistics of the experimented dataset(s).
The proposed system has three main phases: pre-processing, implemented by resizing, background removal, and extraction of color components for each image; PCA-based feature extraction, applied to each pre-processed image in order to obtain HSV histogram and color moments feature vectors; and finally, classification, in which SVMs and LDA models are generated for ripeness stage classification. The proposed approach has been implemented considering three scenarios: a One-against-One multi-class SVMs system, a One-against-All multi-class SVMs system, and an LDA system, each using 10-fold cross-validation. Based on the obtained experimental results, the highest ripeness classification accuracies of 90.80% and 84.80% have been achieved by the first and second scenarios, respectively, using the linear kernel function, and 84% by the third scenario. Thus, it can be concluded that the ripeness classification accuracy obtained by the OAO multi-class SVMs approach is better than that obtained by the OAA multi-class SVMs and LDA approaches.
On the other hand, the main limitation of this research is the dataset size, which needs to be larger, as the accuracy of SVMs increases with the number of images per training class; accordingly, a maximum accuracy of 90.2% has been achieved using our current experimented dataset.
References
Ada and Kaur, R. (2013). Feature extraction and principal component analy-
sis for lung cancer detection in ct scan images, International Journal of Ad-
vanced Research in Computer Science and Software Engineering 3(3): 187–
190.
Bensaeed, O., Shariff, A., Mahmud, A., Shafri, H. and Alfatni, M. (2014).
Oil palm fruit grading using a hyperspectral device and machine learn-
ing algorithm, IOP Conference Series: Earth and Environmental Science,
Vol. 20, IOP Publishing, p. 012017.
Brezmes, J., Llobet, E., Vilanova, X., Saiz, G. and Correig, X. (2000). Fruit
ripeness monitoring using an electronic nose, Sensors and Actuators B:
Chemical 69(3): 223–229.
34
Camelo, A. F. L. (2004). Manual for the preparation and sale of fruits and
vegetables: from field to market, Vol. 151, Food & Agriculture Org.
Dadwal, M. and Banga, V. (2012). Estimate ripeness level of fruits using rgb
color space and fuzzy logic technique, International Journal of Engineering
and Advanced Technology 2(1): 225–229.
Dolaty, H. (2012). Sorting and grading of cherries on the basis of ripeness, size
and defects by using image processing techniques, International Journal
of Agriculture and Crop Sciences(IJACS) 4(16): 1144–1149.
35
Fadilah, N. and Mohamad-Saleh, J. (2014). Color feature extraction of
oil palm fresh fruit bunch image for ripeness classification, 13th Interna-
tional Conference on Applied Computer and Applied Computational Sci-
ence (ACACOS ’14), pp. 51–55.
Jaffar, A., Jaafar, R., Jamil, N., Low, C. Y. and Abdullah, B. (2009). Pho-
togrammetric grading of oil palm fresh fruit bunches, International Journal
of Mechanical and Mechatronics Engineering 9: 18–24.
Lang, C. and Hübert, T. (2012). A colour ripeness indicator for apples, Food
and Bioprocess Technology 5(8): 3244–3249.
Li, T., Zhu, S. and Ogihara, M. (2006). Using discriminant analysis for multi-
class classification: an experimental investigation, Journal of Knowledge
and information systems 10(4): 453–472.
36
May, Z. and Amaran, M. (2011). Automated ripeness assessment of oil
palm fruit using rgb and fuzzy logic technique, The 13th WSEAS In-
ternational Conference on Mathematical and Computational Methods in
Science and Engineering (MACMESE 2011), World Scientific and Engi-
neering Academy and Society (WSEAS), Stevens Point, Wisconsin, USA,
pp. 52–59.
Polder, G., Van der Heijden, G. and Young, I. (2002). Spectral image analy-
sis for measuring ripeness of tomatoes, Transactions-American Society of
Agricultural Engineers 45(4): 1155–1162.
Shah Rizam, M., Farah Yasmin, A., Ahmad Ihsan, M. and Shazana, K.
(2009). Non-destructive watermelon ripeness determination using image
processing and artificial neural network (ann), International Journal of
Electrical and Computer Engineering 4(6).
37
Soman, S., Ghorpade, M., Sonone, V. and Chavan, S. (2012). Content
based image retrieval using advanced color and texture features, Interna-
tional Conference in Computational Intelligence (ICCIA 2012), Vol. IC-
CIA, pp. 1–5.
Suganthy, M. and Ramamoorthy, P. (2012). Principal component analysis
based feature extraction, morphological edge detection and localization for
fast iris recognition, Journal of Computer science 8(9): 1428–1433.
Suralkar, S., Karode, A. and Pawade, P. W. (2012). Texture image classifi-
cation using support vector machine, International Journal of Computer
Technology & Applications 3(1).
Syal, S., Mehta, T. and Darshni, P. (2013). Design & development of intel-
ligent system for grading of jatropha fruit by its feature value extraction
using fuzzy logics, International Journal of Advanced Research in Com-
puter Science and Software Engineering(IJARCSSE) 3(7): 1077–1081.
Tzotsos, A. and Argialas, D. (2006). A support vector machine approach
for object based image analysis, International Conference on Object-based
Image Analysis (OBIA06).
U.S. Dept. Agric./AMS, Washington, D. (1991). United states standards for
grades of fresh tomatoes.
URL: [Link]
Vanschoenwinkel, B. and Manderick, B. (2005). Appropriate kernel func-
tions for support vector machine learning with sequences of symbolic data,
Deterministic and Statistical Methods in Machine Learning, Vol. 3635,
Springer, pp. 256–280.
Wei, X., Liu, F., Qiu, Z., Shao, Y. and He, Y. (2014). Ripeness classification
of astringent persimmon using hyperspectral imaging technique, Food and
Bioprocess Technology 7(5): 1371–1380.
Wu, Q. and Zhou, D.-X. (2006). Analysis of support vector machine classi-
fication, Journal of Computational Analysis & Applications 8(2).
Xiao, B. (2010). Principal component analysis for feature extraction of im-
age sequence, 2010 International Conference on Computer and Commu-
nication Technologies in Agriculture Engineering (CCTAE), Vol. 1, IEEE,
pp. 250–253.
38
Yu, H., Li, M., Zhang, H.-J. and Feng, J. (2002). Color texture moments
for content-based image retrieval, International Conference on Image Pro-
cessing, Vol. 3, IEEE, pp. 929–932.
Zhang, Y., Xie, X. and Cheng, T. (2010). Application of pso and svm in
image classification, 2010 3rd IEEE International Conference on Computer
Science and Information Technology (ICCSIT), Vol. 6, IEEE, pp. 629–631.
39