Study of Discretization Methods in Classification

Lavangnananda, K.1 and Chattanachot, S.2


School of Information Technology (SIT)
King Mongkut’s University of Technology Thonburi (KMUTT)
126 Pra-cha-u-tid Road, Bangmod, Thung-Kru, Bangkok 10140, Thailand
E-mail: 1 Kitt@[Link] 2 supharoek.c@[Link]

Abstract—Classification is one of the important tasks in Data Mining or Knowledge Discovery, with prolific applications. Satisfactory classification also depends on the characteristics of the dataset. Numerical and nominal attributes commonly occur in datasets, and classification performance may be aided by discretization of numerical attributes. At present, several discretization methods and numerous techniques for implementing classifiers exist. This study has three main objectives. The first is to study the effectiveness of discretization of attributes, and the second is to compare the efficiency of eight discretization methods. These are ChiMerge, Chi2, Modified Chi2, Extended Chi2, Class-Attribute Interdependence Maximization (CAIM), Class-Attribute Contingency Coefficient (CACC), the Autonomous Discretization Algorithm (Ameva), and the Minimum Description Length Principle (MDLP). Finally, the study investigates the suitability of the eight discretization methods when applied to five commonly known classifiers: Neural Network, K Nearest Neighbour (K-NN), Naive Bayes, C4.5, and Support Vector Machine (SVM).

Keywords—Ameva, CACC, CAIM, ChiMerge, Chi2, C4.5, Extended Chi2, K-Nearest Neighbour, MDLP, Modified Chi2, Naive Bayes, Neural Network, SVM

I. INTRODUCTION

Classification is a subfield of the much larger field of Data Mining or Knowledge Discovery, with prolific applications [1]. Numerous techniques exist for implementing classifiers for various tasks, and understanding the characteristics of these classifiers is essential to satisfactory classification performance. Another crucial factor in successful classification is the nature of the attributes in the dataset. At a superficial level, attributes can be categorized into two types, numerical and nominal, as these two types occur in almost all datasets used in classification. Even though numerical values of an attribute may be normalized into the zero-to-one range, feeding them directly into a classifier may not always be appropriate. For example, if an attribute 'temperature' ranges from 5 to 50 degrees Celsius, there are theoretically infinitely many values within this range. Such instances can be detrimental to classification, especially for some classifiers such as decision trees. Therefore, in many cases classification can be aided by discretization of numerical attributes. To date, several discretization methods have been developed for many different purposes.

This study has three main objectives. The first is to compare classification performance when numerical attribute values are used directly against performance when the attributes are discretized. The second is to compare the efficiency of eight discretization methods: ChiMerge, Chi2, Modified Chi2, Extended Chi2, Class-Attribute Interdependence Maximization (CAIM), Class-Attribute Contingency Coefficient (CACC), the Autonomous Discretization Algorithm (Ameva), and the Minimum Description Length Principle (MDLP). The final objective is to investigate how well these eight discretization methods perform together with five commonly known classifiers: Neural Network, K Nearest Neighbour (K-NN), Naive Bayes, C4.5, and Support Vector Machine (SVM).

The paper is organized as follows. Section II introduces the eight discretization methods. Section III presents the related work. This is followed by Section IV, where the datasets and classifiers selected are described. Section V explains the major steps in the study. Results and discussion are included in Section VI. Finally, the paper is concluded in Section VII.

II. DISCRETIZATION METHODS

Discretization is the process that transforms a continuous or numerical attribute into categorical or nominal data. It is needed in many tasks where using numerical values directly is unnecessary from a computational perspective or where useful information cannot otherwise be extracted. For example, 'recommended exercise duration' becomes tedious if reported for every individual age, but is more comprehensible if ages are expressed as intervals. In classification especially, not only can discretization be necessary, but finding suitable intervals during discretization can also improve performance considerably.

Techniques in discretization can be categorized from two perspectives: supervised vs. unsupervised [2] and top-down vs. bottom-up [3]. Supervised discretization considers class information, whereas class information may not be available in unsupervised discretization. From the other perspective, top-down and bottom-up methods are distinguished by the behavior of the discretization process. A top-down process begins with only one interval and then splits it into many intervals; on the contrary, a bottom-up process starts with many intervals and reduces their number by merging. In classification, the goal of discretization is to improve classification by yielding higher accuracy. As classification is a supervised process, this study investigates supervised discretization methods from both the top-down and the bottom-up perspectives. The eight supervised discretization methods selected for this study are ChiMerge, Chi2, Modified Chi2, Extended Chi2, CAIM, CACC, Ameva, and MDLP. They are briefly introduced as follows:

ChiMerge: This is a bottom-up discretization method which uses the χ² value to determine the merging point [4]. The discretization process of ChiMerge begins with sorting the values of the numerical attribute and calculating the χ² value for every pair of adjacent intervals, as shown in equation (1). The pair of adjacent intervals with the lowest χ² value is merged into one interval. A χ²-threshold is used as the stopping criterion for the merging process.

\chi^2 = \sum_{i=1}^{m} \sum_{j=1}^{k} \frac{(A_{ij} - E_{ij})^2}{E_{ij}}    (1)

where:
m = 2 (the two intervals being compared),
k = number of classes,
A_ij = number of examples in the i-th interval, j-th class,
E_ij = expected frequency of A_ij = (R_i · C_j) / N,
R_i = number of examples in the i-th interval = \sum_{j=1}^{k} A_ij,
C_j = number of examples in the j-th class = \sum_{i=1}^{m} A_ij,
N = total number of examples = \sum_{j=1}^{k} C_j.

Chi2: This is a bottom-up discretization method which also uses the χ² value to determine the merging point [5]. The process of Chi2 is similar to ChiMerge, and the χ² value is likewise calculated by (1). Instead of the user-specified χ²-threshold of ChiMerge, Chi2 introduces an inconsistency rate, which is used in the process to determine the proper χ²-threshold. Discretization in Chi2 also includes feature selection, which makes it perform well when the dataset contains noisy attributes.
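As a concrete illustration of equation (1), the Python sketch below (added in this rewrite; the study itself used CRAN-R, and the function names here are ours) computes the χ² statistic for a pair of adjacent intervals from their per-class counts, and shows how a ChiMerge-style pass would repeatedly merge the adjacent pair with the lowest value:

```python
import numpy as np

def chi2_pair(counts_a, counts_b):
    """Chi-square statistic of equation (1) for two adjacent intervals.

    counts_a, counts_b: per-class example counts of the two intervals
    (length-k sequences). Expected counts are E_ij = R_i * C_j / N;
    cells with zero expected count contribute zero, a common convention.
    """
    A = np.array([counts_a, counts_b], dtype=float)   # shape (2, k)
    R = A.sum(axis=1, keepdims=True)                   # interval totals R_i
    C = A.sum(axis=0, keepdims=True)                   # class totals C_j
    N = A.sum()
    E = R * C / N                                      # expected frequencies
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = np.where(E > 0, (A - E) ** 2 / E, 0.0)
    return float(terms.sum())

def chimerge_like(intervals, threshold):
    """Bottom-up merging sketch: `intervals` is a list of per-class count
    arrays for consecutive intervals; merge while the smallest pairwise
    chi-square value stays below the threshold."""
    intervals = [np.array(c, dtype=float) for c in intervals]
    while len(intervals) > 1:
        chi = [chi2_pair(intervals[i], intervals[i + 1])
               for i in range(len(intervals) - 1)]
        i = int(np.argmin(chi))
        if chi[i] >= threshold:
            break
        intervals[i] = intervals[i] + intervals[i + 1]  # pool the counts
        del intervals[i + 1]
    return intervals
```

For instance, chi2_pair([10, 0], [8, 2]) is about 2.22; with one degree of freedom the χ² critical value at the 0.05 level is about 3.84, so this pair would be merged under that threshold.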
Modified Chi2: This is a bottom-up discretization method which stems from Chi2; the χ² value is also used to determine the merging point [6]. Instead of using the inconsistency rate as the stopping criterion of Chi2, it uses the level of consistency (Lc) borrowed from Rough Sets Theory, as shown in equation (2). The inconsistency check (InConCheck(data) < δ) is therefore replaced by the check (Lc_discretized ≤ Lc_original) after each step of discretization.

L_c = \frac{\sum_{i} card(\underline{B}X_i)}{card(U)}    (2)

where:
U is the set of all objects of the dataset,
X_i can be any subset of U,
\underline{B}X_i is the lower approximation of X_i in B (B ⊆ A),
A is the set of attributes,
card denotes set cardinality.
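The level of consistency in equation (2) can be read as the fraction of records whose equivalence class (records sharing identical discretized attribute values) is pure with respect to the decision attribute. The short Python sketch below is an illustrative reading of that definition added in this rewrite, not the authors' code:

```python
from collections import defaultdict

def level_of_consistency(rows, labels):
    """Fraction of records lying in the lower approximation of their
    decision class, i.e. records whose equivalence class (identical
    discretized attribute values) contains a single decision label.

    rows:   list of tuples of discretized attribute values
    labels: list of decision labels, same length as rows
    """
    groups = defaultdict(set)
    for row, label in zip(rows, labels):
        groups[tuple(row)].add(label)
    consistent = sum(1 for row in rows if len(groups[tuple(row)]) == 1)
    return consistent / len(rows)

# Example: the equivalence class {(1, 2)} is pure, the class {(3, 1)} is
# not, giving Lc = 2 / 4 = 0.5.
print(level_of_consistency([(1, 2), (1, 2), (3, 1), (3, 1)],
                           ["a", "a", "a", "b"]))
```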
Extended Chi2: This is yet another extension of Chi2 which uses the χ² value in its discretization process [7]. Its main feature is the ability to deal with uncertain or misclassified data. It uses the least upper bound ξ after each step of discretization, checking (ξ_discretized < ξ_original) instead of the inconsistency check (InConCheck(data) < δ). Equation (3) is used to calculate the least upper bound of the dataset.

\xi(C, D) = \max\{m_1, m_2\}    (3)

where:
m_1 = 1 − min{ c(E, D) | E ∈ C* and 0.5 < c(E, D) },
m_2 = max{ c(E, D) | E ∈ C* and c(E, D) < 0.5 },
c(E, D) = 1 − card(E ∩ D) / card(E),
C is the equivalence relation set,
D is the decision set,
C* = {E_1, E_2, …, E_n} is the set of equivalence classes.

CAIM: Class-Attribute Interdependence Maximization (CAIM) is a top-down discretization method proposed by Kurgan and Cios [8]. The main goal of CAIM is to maximize the class-attribute interdependence while generating a minimal number of discrete intervals. The CAIM criterion is a heuristic measure used to quantify the interdependence between the classes and the discrete intervals; a larger CAIM value indicates higher interdependence. The CAIM method begins the discretization process by selecting a numerical attribute and sorting its values in ascending order; the CAIM value is then computed using equation (4) for the midpoint of every pair of adjacent values, and the cutting point is the midpoint with the highest CAIM value.

CAIM(C, D \mid F) = \frac{\sum_{r=1}^{n} \frac{max_r^2}{M_{+r}}}{n}    (4)

where:
n is the number of intervals,
r iterates through all intervals,
max_r is the maximum value within the r-th column of the quanta matrix,
M_{+r} is the total number of continuous values of attribute F falling in the r-th interval,
C is the class variable,
D is the discretization variable,
F is the attribute.

CACC: The Class-Attribute Contingency Coefficient (CACC) is a top-down discretization method [9]. In CACC, the contingency coefficient is used to determine the cutting point in the discretization process. To speed up discretization and to reduce the number of intervals, the contingency coefficient formula has been customized by dividing y by log(n). The CACC value can then be calculated using equation (5).

CACC = \sqrt{\frac{y'}{y' + M}}    (5)

where:
y = M \left( \sum_{i=1}^{S} \sum_{r=1}^{n} \frac{q_{ir}^2}{M_{i+} M_{+r}} - 1 \right),
y' = y / \log(n),
M is the total number of samples,
S is the number of classes,
n is the number of intervals,
q_ir is the number of samples with class i in the r-th interval,
M_i+ is the total number of samples with class i,
M_+r is the total number of samples in the interval (d_{r-1}, d_r].
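Under the same quanta-matrix reading, the CAIM criterion of equation (4) is straightforward to compute. The Python sketch below is an illustration added in this rewrite (the function name is ours) and evaluates the criterion for one candidate set of interval boundaries:

```python
import numpy as np

def caim_criterion(quanta):
    """CAIM value of equation (4) for a quanta matrix.

    quanta: array of shape (num_classes, num_intervals); entry (i, r) is
    the number of attribute values that fall in interval r and belong to
    class i. Assumes every interval is non-empty.
    """
    quanta = np.asarray(quanta, dtype=float)
    n = quanta.shape[1]                  # number of intervals
    max_r = quanta.max(axis=0)           # column maxima of the quanta matrix
    m_plus_r = quanta.sum(axis=0)        # column totals M_{+r}
    return float(np.sum(max_r ** 2 / m_plus_r) / n)

# Example: two classes, two candidate intervals. Interval 1 holds 8 samples
# of class 0 and 2 of class 1; interval 2 holds the reverse.
print(caim_criterion([[8, 2],
                      [2, 8]]))   # (8**2/10 + 8**2/10) / 2 = 6.4
```

A top-down CAIM pass would evaluate this value for each candidate midpoint and keep the cut that maximizes it.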
Ameva: The Autonomous Discretization Algorithm (Ameva) is a top-down discretization method based on the χ² statistic [10]. A low number of intervals is presented as the main feature of Ameva. Since Ameva is a top-down method, discretization begins with one interval, which is repeatedly divided into more intervals by maximizing the Ameva criterion shown in equation (6).

Ameva(k) = \frac{\chi^2(k)}{k(l-1)}    (6)

where:
k is the number of intervals under evaluation,
l is the number of classes.

MDLP: The Minimum Description Length Principle (MDLP) method is a top-down discretization which uses the entropy minimization heuristic to discretize continuous or numeric data into intervals [11]. The process starts by sorting the selected numeric attribute, as in the other top-down methods; the entropy is then computed for every pair of adjacent values in order to find the optimal cutting point that forms the boundary of new intervals. MDLP is used as the criterion to decide whether a given partition should be accepted or rejected: if the condition in equation (7) is true, the partition induced by the cut point is accepted, and it is rejected otherwise.

Gain(A, T; S) > \frac{\log_2(N-1)}{N} + \frac{\Delta(A, T; S)}{N}    (7)

where:
A is the selected attribute,
T is the cut point,
S is the set of examples,
N is the number of examples.
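Equation (7) leaves Δ(A, T; S) undefined above; in the standard Fayyad–Irani formulation it is Δ(A, T; S) = log₂(3^k − 2) − [k·Ent(S) − k₁·Ent(S₁) − k₂·Ent(S₂)], where k, k₁ and k₂ are the numbers of classes present in S and in the two subsets induced by the cut. The Python sketch below, added in this rewrite and assuming that standard form, evaluates the acceptance test for one candidate cut:

```python
import numpy as np
from collections import Counter

def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    counts = np.array(list(Counter(labels).values()), dtype=float)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def mdlp_accepts_cut(left, right):
    """MDLP acceptance test of equation (7) for the partition of S into
    `left` and `right` label lists induced by a candidate cut point."""
    s = list(left) + list(right)
    n = len(s)
    ent_s = entropy(s)
    ent_l, ent_r = entropy(left), entropy(right)
    gain = ent_s - (len(left) / n) * ent_l - (len(right) / n) * ent_r
    k, k1, k2 = len(set(s)), len(set(left)), len(set(right))
    delta = np.log2(3 ** k - 2) - (k * ent_s - k1 * ent_l - k2 * ent_r)
    return gain > (np.log2(n - 1) + delta) / n

# A clean split by class is accepted; an uninformative split is rejected.
print(mdlp_accepts_cut(["a"] * 20, ["b"] * 20))            # True
print(mdlp_accepts_cut(["a", "b"] * 10, ["b", "a"] * 10))  # False
```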
III. RELATED WORKS

Many discretization methods have been introduced and are widely used in classification, and the use of different discretization methods for classification has been investigated with different objectives. Discretization methods have been compared by taxonomy, without experiments with classifiers, in several aspects such as merge or split, use of class information, stopping criterion, and sensitivity to outliers, in order to suggest suitable methods to users in various environments and for various datasets [3]. Extended Chi2 was found superior to Modified Chi2 and Boolean reasoning with the VPRS classifier [7]. Chi2 was compared against the original, undiscretized dataset with respect to accuracy and tree size in C4.5 [5]. A comparison between Chi2 and MDLP was carried out in [6], while the work in [10] compared Ameva with CAIM. Application of discretization to C4.5 and Naive Bayes using Equal Width binning, RMEP, Error-based, and SOM-based methods showed an improvement in accuracy and a reduction in construction time [12]. The work in [8] claimed that CAIM was the discretization that guaranteed the lowest number of intervals and the highest dependence between classes and discrete intervals when compared to Equal-Width, Equal-Frequency, Paterson-Niblett, Maximum Entropy, CADD, and IEM with the CLIP4 and C5.0 classifiers. Similar work also confirmed this in [13]; however, Naïve Bayes and Semi-Naïve Bayes with IEM discretization were the best for AODE classifiers in terms of accuracy. ChiMerge had been used together with ID3 in [4], while MDLP was adopted for a neural network classifier in [11]. MDLP was also used as the discretization method in a comparison of different variations of Naive Bayes classifiers on medical datasets in [14]. Naïve Bayes with Minimal Entropy Partitioning yielded better results than with Equal Width Interval or Holte's 1R discretization [15]. A combination of Naïve Bayes with the LD, NDD, WPKID, and WNDD discretization methods was adopted for natural datasets in [16]. Nevertheless, there is a study which contradicts the use of discretization by suggesting that using real data directly could yield better results for ID3 and C4.5 [17].

Therefore, it can be said that, in most cases, discretization of numerical attributes leads to better performance and higher accuracy in classification. While there have been some previous studies comparing discretization methods, they tend to have specific objectives and cover only a few types of classifiers. This work is the first that attempts to compare popular discretization methods together with popular classifiers, with the particular objective of finding suitable combinations of discretization method and type of classifier.

IV. DATASETS AND CLASSIFIERS USED IN THE INVESTIGATION

A. Datasets

In order to validate the study, the number of numerical datasets tested ought to be as large and as varied as possible, and the number of samples in each dataset must be sufficiently large. The datasets used in this study are taken from the well-known UCI Repository [18]. Attributes in these datasets are of numerical values only; such datasets are relatively fewer than those with nominal or mixed attribute types. The number of samples in the datasets ranges from 106 to 20,000, the number of attributes varies from as few as 4 to as many as 90, and the number of categories varies from as few as 2 to as many as 30. Altogether, twenty-five datasets are selected; they are summarized in Table I.

TABLE I. DATASETS USED IN THE STUDY

     Dataset                            No. of samples   No. of attributes   No. of categories
 1   letter-recognition                      20000              16                  26
 2   pen-based-recognition                    7494              16                  10
 3   wall-following-robot-navigation          5456              24                   4
 4   waveform                                 5000              21                   3
 5   winequality-white                        4898              11                   7
 6   winequality-red                          1599              11                   6
 7   yeast                                    1480               8                  10
 8   banknote-authentication                  1372               4                   2
 9   pima-indians-diabetes                     768               8                   2
10   transfusion                               748               4                   2
11   user-knowledge-modeling                   403               5                   5
12   movement-libras                           360              90                  15
13   leaf                                      340              15                  30
14   ecoli                                     336               7                   8
15   vertebral-column-3C                       310               6                   3
16   vertebral-column-2C                       310               6                   2
17   haberman                                  306               3                   2
18   glass                                     214               9                   6
19   seeds                                     210               7                   3
20   sonar-all-data                            208              60                   2
21   parkinson                                 195              22                   2
22   planning-relax                            182              12                   2
23   wine                                      178              13                   3
24   iris                                      150               4                   3
25   breast-tissue                             106               9                   6

B. Classifiers

A classifier in this study is taken to mean any algorithm or technique which enables classification by means of supervised learning. Selecting an appropriate classifier for the task at hand is essential in order to ensure a satisfactory result. At present, many classifiers exist and new ones are continually being developed. Unless the purpose is developing or researching particular aspects of classification, scientists and engineers tend to resort to popular, open-source classifiers available in the public domain. This study therefore selects popular, well-known classifiers which are likely to be chosen when classification is required. Five popular classifiers are selected for the study: K-Nearest Neighbor (K-NN), Naive Bayes, Decision Tree (C4.5), Neural Network, and Support Vector Machine (SVM). These classifiers represent different characteristics, from a basic classification algorithm (i.e. K-NN) to a mathematically oriented one (i.e. SVM). As each of these five classifiers is available in the public domain, users tend to assume that suitable discretization has been applied to the numerical attributes of the dataset used, even though some of these five classifiers, such as the neural network, are capable of dealing with numerical values directly. It is therefore also an objective of this study, as stated in Section I, to investigate whether discretization of numerical attributes is preferable to using numerical values directly.

V. METHODOLOGY

This study intentionally chooses publicly available implementations of the five classifiers stated in Section IV instead of implementing them, since this also provides an opportunity to test some of their default discretization methods. The CRAN-R software is selected as the main tool [19], together with the CARET library [20] and the classification and regression libraries e1071 [21], klaR [22], and RWeka [23]. The study can be summarized in three steps as follows:

Data Preparation: This is the initial step, which comprises two tasks, replacing missing values and normalization. Each dataset was scanned, and it was found that some samples had missing attribute values. Missing values were replaced by the mean of that attribute in the dataset. After this was done, normalization was carried out on each attribute of each dataset in order to reduce bias between the attributes. The popular Z-score was adopted for normalization.

Discretization: The datasets are now ready for discretization. For each dataset, eight duplicates are made so that each discretization method can be applied to its own duplicate. Therefore, for each type of dataset there are 9 datasets of that type: one is the original with real numerical attribute values, and the other eight are those whose attributes are discretized by the eight discretization methods described in Section II. As there are 25 different types of data used in this study, there are altogether 225 datasets.

Classification: This process is the application of the five different types of classifiers described in Section IV. As there are 9 datasets for each type of data, each classifier is applied to each of them, i.e. with original numerical attribute values and with nominal attribute values from the eight different discretization methods. To ensure validity of the classification, ten-fold cross validation is used and the process is repeated 5 times for each classifier. Each discretization method is therefore applied to 25 datasets and subjected to 5 classifiers, so 5,625 different classifications are carried out in this study (see the sketch after this section for an analogous pipeline).
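As a rough illustration of the data-preparation and evaluation protocol described above (the study itself used CRAN-R with CARET, e1071, klaR, and RWeka; the Python/scikit-learn sketch below is only an analogous pipeline added in this rewrite, with an unsupervised equal-frequency binner standing in for the eight supervised discretization methods), one dataset/classifier combination could be evaluated as follows:

```python
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler, KBinsDiscretizer
from sklearn.pipeline import make_pipeline
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score

def evaluate(X, y, discretize=True):
    """Mean accuracy of one dataset/classifier combination under the
    protocol of Section V: mean imputation, Z-score normalization,
    optional discretization, then ten-fold CV repeated 5 times.

    KBinsDiscretizer is only a stand-in; the paper's eight supervised
    methods are not part of scikit-learn.
    """
    steps = [SimpleImputer(strategy="mean"), StandardScaler()]
    if discretize:
        steps.append(KBinsDiscretizer(n_bins=5, encode="ordinal",
                                      strategy="quantile"))
    steps.append(GaussianNB())        # Naive Bayes as the example classifier
    pipeline = make_pipeline(*steps)
    cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=5, random_state=0)
    return cross_val_score(pipeline, X, y, cv=cv, scoring="accuracy").mean()

# Example with the iris dataset, one of the 25 datasets listed in Table I.
from sklearn.datasets import load_iris
X, y = load_iris(return_X_y=True)
print(evaluate(X, y, discretize=False), evaluate(X, y, discretize=True))
```

Replicating the full study would repeat this over the 225 prepared datasets and the five classifiers, averaging the accuracies of the 5 repetitions as described above.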

VI. RESULTS AND DISCUSSION

There are several perspectives from which the results of the study can be shown; however, it would be tedious to present results at a detailed level, such as the best discretization for each dataset, the best classifier for each dataset, and the best combination of both for each type of dataset. Hence, this section reveals the key findings of the study.

Classification accuracy (i.e. the percentage of correct classifications in the test phase) is the simplest measure of each classification. As stated in Section V, the average of the 5 classification accuracies is taken as the representative of classification performance; in fact, the variation in accuracy among the 5 classifications in all combinations revealed little discrepancy. It was found that the default discretization of the C4.5 and Naïve Bayes classifiers performed poorly and did not yield the best performance from any perspective; it is therefore omitted from the discussion. From the discretization perspective, and regardless of the classifier used, the number of datasets (among the total of 25) for which each discretization method yields the best classification performance is depicted in Figure 1.

Fig. 1. Number of datasets where each discretization method yields the best performance.

Notice that there are altogether only 20 datasets in Figure 1. This is because there are 5 datasets for which using the original numerical values directly yielded better performance than any of the eight discretization methods with some classifiers: Neural Network yielded the best performance for three of these datasets, while SVM and K-NN yielded the best for one each. This is perhaps because these three classifiers were originally developed with numerical values in mind.

Referring to the results in Figure 1, Chi2, MDLP, and CACC perform relatively well. Chi2 yielded the best performance among all the χ²-based methods (Ameva, ChiMerge, Modified Chi2, and Extended Chi2), and also the best overall performance. This is probably because of the feature selection embedded in its discretization. Since CAIM, Ameva, ChiMerge, Modified Chi2, and Extended Chi2 did not manage to yield the best performance for more than two datasets, they are omitted from further discussion. It must be borne in mind that it is not the objective of this study to discredit any of the selected eight discretization methods, since they may well be very suitable for discretization in tasks other than classification.

Among the six datasets for which Chi2 yielded the best performance, the classifiers used are shown in Table II.

TABLE II. CLASSIFIERS WITH WHICH CHI2 YIELDED THE BEST PERFORMANCE

Classifier         No. of datasets
C4.5                      2
Naive Bayes               2
SVM                       1
Neural Network            1

Among the five datasets for which MDLP yielded the best performance, the classifiers used are shown in Table III.

TABLE III. CLASSIFIERS WITH WHICH MDLP YIELDED THE BEST PERFORMANCE

Classifier         No. of datasets
SVM                       2
K-NN                      1
C4.5                      1
Neural Network            1

Among the four datasets for which CACC yielded the best performance, the classifiers used are shown in Table IV.

TABLE IV. CLASSIFIERS WITH WHICH CACC YIELDED THE BEST PERFORMANCE

Classifier         No. of datasets
C4.5                      1
Neural Network            1
Naive Bayes               1
SVM                       1

Referring to Tables II, III and IV, C4.5 appears to be the classifier most suited to datasets with nominal attributes. From the discretization perspective, Chi2 and CACC are the most likely choices for discretization in classification using C4.5, followed by Naïve Bayes; note that these two classifiers were originally developed with nominal attributes in mind. It is interesting to find that SVM appears to be quite a powerful classifier here, even though it was developed for numerical attributes.

VII. CONCLUSION

This study investigated eight discretization methods with five popular classifiers. It was found that the three methods yielding the best performance are Chi2, MDLP, and CACC, although none of the eight discretization methods outperformed the others by a significant margin. Note that discretization of numerical attributes may not always guarantee better performance, especially when the classifiers used are those originally developed to deal with numerical values (i.e. Neural Network, K-NN, and SVM). Nevertheless, this study recommends Chi2 and MDLP for discretization. Hence, discretization of numerical attributes by these two methods ought to be carried out, especially when resorting to public domain software.

With respect to classifiers, this study recommends C4.5 as the first choice of nominal classifier, together with either Chi2 or MDLP as the discretization method. It must be emphasized that the above are recommendations and are not meant to be an absolute rule, as the study reveals that the nature of the dataset tends to influence the choice of classifier as well as of discretization. If discretization of numerical attributes cannot be carried out or is disregarded, then classifiers such as Neural Network, SVM, and K-NN ought to be the preferred choice.

As discretization is an additional process in data preparation, the computational complexity of each discretization method must also be considered, especially in applications where response time is critical, as some methods are likely to be more resource-demanding than others. Future work can also be carried out on lesser-known classifiers, as it may reveal further useful facts about discretization.

ACKNOWLEDGMENT

The authors gratefully acknowledge the School of Information Technology (SIT), King Mongkut's University of Technology Thonburi (KMUTT) for the partial scholarship for Mr. Supharoek Chattanachot and for the support of the computing facilities.

REFERENCES

[1] J. Han, J. Pei, and M. Kamber, Data Mining: Concepts and Techniques, Third Edition, Elsevier, 2011.
[2] I. H. Witten and E. Frank, Data Mining: Practical Machine Learning Tools and Techniques, Second Edition, Morgan Kaufmann, 2005.
[3] L. Peng, W. Qing, and G. Yujia, "Study on Comparison of Discretization Methods," in Proc. International Conference on Artificial Intelligence and Computational Intelligence (AICI '09), 2009, vol. 4, pp. 380–384.
[4] R. Kerber, "ChiMerge: Discretization of Numeric Attributes," in Proc. Tenth National Conference on Artificial Intelligence (AAAI-92), San Jose, California, 1992, pp. 123–128.
[5] H. Liu and R. Setiono, "Chi2: Feature selection and discretization of numeric attributes," in Proc. Seventh International Conference on Tools with Artificial Intelligence (ICTAI '95), 1995, pp. 388–391.
[6] F. E. H. Tay and L. Shen, "A modified Chi2 algorithm for discretization," IEEE Trans. Knowledge and Data Engineering, vol. 14, no. 3, pp. 666–670, May 2002.
[7] C.-T. Su and J.-H. Hsu, "An extended Chi2 algorithm for discretization of real value attributes," IEEE Trans. Knowledge and Data Engineering, vol. 17, no. 3, pp. 437–441, Mar. 2005.
[8] L. A. Kurgan and K. J. Cios, "CAIM discretization algorithm," IEEE Trans. Knowledge and Data Engineering, vol. 16, no. 2, pp. 145–153, Feb. 2004.
[9] C.-J. Tsai, C.-I. Lee, and W.-P. Yang, "A discretization algorithm based on Class-Attribute Contingency Coefficient," Information Sciences, vol. 178, no. 3, pp. 714–731, Feb. 2008.
[10] L. Gonzalez-Abril, F. J. Cuberos, F. Velasco, and J. A. Ortega, "Ameva: An autonomous discretization algorithm," Expert Systems with Applications, vol. 36, no. 3, Part 1, pp. 5327–5332, Apr. 2009.
[11] U. M. Fayyad and K. B. Irani, "Multi-interval discretization of continuous-valued attributes for classification learning," in Proc. 13th International Joint Conference on Artificial Intelligence, 1993, pp. 1022–1027.
[12] F. Kaya, "Discretizing continuous features for naïve Bayes and C4.5 classifiers," University of Maryland publications, College Park, MD, USA, 2008.
[13] M. Mizianty, L. Kurgan, and M. Ogiela, "Comparative Analysis of the Impact of Discretization on the Classification with Naive Bayes and Semi-Naive Bayes Classifiers," in Proc. Seventh International Conference on Machine Learning and Applications (ICMLA '08), 2008, pp. 823–828.
[14] R. Abraham, J. B. Simha, and S. S. Iyengar, "A comparative analysis of discretization methods for medical datamining with naïve Bayesian classifier," in Proc. Information Technology (ICIT '06), 2006, pp. 235–236.
[15] J. Dougherty, R. Kohavi, M. Sahami, and others, "Supervised and unsupervised discretization of continuous features," in Proc. Twelfth International Conference on Machine Learning (ML-95), 1995, vol. 12, pp. 194–202.
[16] Y. Yang and G. I. Webb, "A comparative study of discretization methods for naive-Bayes classifiers," in Proc. 2002 Pacific Rim Knowledge Acquisition Workshop (PKAW 2002), 2002.
[17] D. Ventura and T. R. Martinez, "An empirical comparison of discretization methods," in Proc. Tenth International Symposium on Computer and Information Sciences (ISCIS X), 1995, pp. 443–450.
[18] M. Lichman, UCI Machine Learning Repository, 2013. [Online]. Available: [Link]
[19] R Core Team, R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria, 2016. [Online]. Available: [Link]
[20] M. Kuhn, The Caret Package, 2016. [Online]. Available: [Link]/web/packages/caret/[Link]
[21] D. Meyer et al., e1071, 2015. [Online]. Available: [Link]/web/packages/e1071
[22] C. Roever et al., klaR, 2014. [Online]. Available: [Link]/web/packages/klaR
[23] K. Hornik, C. Buchta, and A. Zeileis, "Open-source machine learning: R meets Weka," Computational Statistics, vol. 24, no. 2, pp. 225–232, May 2008.
