
An Experimental Study of Classification Algorithms for Crime Prediction
Article in Indian Journal of Science and Technology · March 2013
DOI: 10.17485/ijst/2013/v6i3.6


Indian Journal of Science and Technology

An Experimental Study of Classification Algorithms for Crime Prediction

Rizwan Iqbal1*, Masrah Azrifah Azmi Murad2, Aida Mustapha3, Payam Hassany Shariat Panahy4, and Nasim Khanahmadliravi5

Faculty of Computer Science and Information Technology, Universiti Putra Malaysia, 43400 UPM Serdang, Selangor, Malaysia.
[email protected], [email protected], [email protected], [email protected], [email protected]

Abstract
Classification is a well-known supervised learning technique in data mining. It is used to extract meaningful information from large datasets and can be effectively used for predicting unknown classes. In this research, classification is applied to a crime dataset to predict the ‘Crime Category’ for different states of the United States of America. The crime dataset used in this research is real in nature; it was collected from socio-economic data from the 1990 US Census, law enforcement data from the 1990 US LEMAS survey, and crime data from the 1995 FBI UCR. This paper compares two different classification algorithms, namely Naïve Bayesian and Decision Tree, for predicting the ‘Crime Category’ for different states in the USA. The results from the experiment showed that the Decision Tree algorithm outperformed the Naïve Bayesian algorithm and achieved 83.9519% accuracy in predicting the ‘Crime Category’ for different states of the USA.

Keywords: Crime Prediction, Crime Category, Algorithm.

1. Introduction

The primary goal of data mining is to discover interesting and hidden knowledge in the data and summarize it in a meaningful form [6, 11, 14]. One of the most commonly used and important techniques in data mining is classification. Classification is a supervised class prediction technique. It allows predicting class labels, which should be nominal [5]. Classification has previously been used in many domains, including weather forecasting, health care, medicine, finance, homeland security and business intelligence [9, 11]. This research will focus on applying different classification algorithms on real crime data and comparing the accuracy of their results in predicting the crime categories.

The birth and growth of crime in a community is based on many characteristics related to the community and society. These characteristics are: different races in a society, different income groups, different age groups, family structure (single, divorced, married, number of kids), level of education, the locality where people live (cheap or expensive housing, size of houses, number of rooms), number of police officers allocated to a locality, number of employed and unemployed people, etc.

In this research, a real crime dataset is used for data mining [13]. The attributes of this dataset are the characteristics related to a community or a society, some of which are already discussed above. Two different classification algorithms are used to perform classification on the dataset, namely Decision Tree and Naïve Bayesian. Through experiment, the results of both algorithms will be compared and studied, and the most efficient algorithm in predicting the goal class (crime category) will be identified. There are many tools available for data

* Corresponding author:
Rizwan Iqbal ([email protected])

mining, for this research WEKA is chosen. It is an open source tool written in JAVA [16].

The organization of this paper is as follows. Section 2 covers the preliminary discussion and experiment preparation, the classification methods, the UCI communities and crime dataset, crime dataset pre-processing, and the measures for performance evaluation. Section 3 discusses the experimental results of the classification algorithms for predicting the ‘Crime Category’ attribute in different states of the USA. Finally, Section 4 covers the conclusion and future work.

2. Preliminary Discussion and Experiment Preparation

Classification is a class prediction technique, which is supervised in nature. This technique possesses the ability to predict the label for classes, provided that a sufficient number of training examples is available [10]. There is a variety of classification algorithms available, including Support Vector Machines, k-Nearest Neighbours, weighted voting and Artificial Neural Networks. All these techniques can be applied to a dataset for discovering a set of models to predict the unknown class label.

In classification, the dataset is divided into two sets, namely the training set (dependent set) and a test set (independent set). The data mining algorithm initially runs on the training set, then the predicting model is applied on the test set [5, 12].

The dataset used in this experiment contains 128 attributes. From this large list of attributes, only twelve are chosen, namely: US state, population of community, median household income, median family income, per capita income, number of people under the poverty level, percentage of people 25 and over with less than a 9th grade education, percentage of people 25 and over that are not high school graduates, percentage of people 25 and over with a bachelor’s degree or higher education, percentage of people 16 and over in the labour force and unemployed, percentage of people 16 and over who are employed, and total number of violent crimes per 100K population.

There are different methods available for attribute or feature selection. For this experiment, a manual method was chosen for attribute selection [16], based on human understanding and intellect. It is practical, especially when dealing with a large number of attributes. It was also taken into account that only those attributes are chosen which do not contain any missing values. Classification was applied using the Decision Tree and Naïve Bayesian classifiers. In the first step, the model is built on the training set with known class labels, and in the second step, the proposed model is applied by assigning class labels on the test set. After performing the experiment using the above-mentioned classification algorithms, the accuracy of both algorithms is evaluated for predicting the ‘Crime Category’ attribute.

2.1 Crime Dataset Collection

The dataset used for this experiment is real and authentic. The dataset is acquired from the UCI machine learning repository website [13]. The title of the dataset is ‘Crime and Communities’. It is prepared using real socio-economic data from the 1990 US Census, law enforcement data from the 1990 US LEMAS survey, and crime data from the 1995 FBI UCR [13]. This dataset contains a total of 128 attributes and 1994 instances. All data provided in this dataset is numeric and normalized. The data in each instance belong to different states of the US. The states are represented in the form of a number, every number representing its respective US state [15]. The complete details of all 128 attributes can be acquired from the UCI machine learning repository website [13]. For the sake of saving space, the list of attributes used in this experiment is given in Table 1.

2.2 Crime Dataset Pre-processing

The dataset used for the experiment consists of a total of 1994 instances, which contain some missing values. In order to perform data processing, it is essential to improve the data quality [5]. There are a few techniques in practice which are employed for the purpose of data pre-processing: data cleaning, integration, reduction and transformation [5, 11]. Before applying a classification algorithm, some pre-processing is usually performed on the dataset.

In the first step, data reduction is performed by selecting the most informative attributes in the dataset, while attempting to lose no critical information for classification. Only twelve attributes are selected from

www.indjst.org | Vol 6 (3) | March 2013 Indian Journal of Science and Technology | Print ISSN: 0974-6846 | Online ISSN: 0974-5645

Table 1. Crime Dataset Attributes

Attribute            Data Type          Description
State                Numeric            US state (by number)
population           Numeric - decimal  Population for community
medIncome            Numeric - decimal  Median household income
medFamInc            Numeric - decimal  Median family income (differs from household income for non-family households)
perCapInc            Numeric - decimal  Per capita income
NumUnderPov          Numeric - decimal  Number of people under the poverty level
PctLess9thGrade      Numeric - decimal  Percentage of people 25 and over with less than a 9th grade education
PctNotHSGrad         Numeric - decimal  Percentage of people 25 and over that are not high school graduates
PctBSorMore          Numeric - decimal  Percentage of people 25 and over with a bachelor’s degree or higher education
PctUnemployed        Numeric - decimal  Percentage of people 16 and over, in the labor force, and unemployed
PctEmploy            Numeric - decimal  Percentage of people 16 and over who are employed
ViolentCrimesPerPop  Numeric - decimal  Total number of violent crimes per 100K population
Crime Category       Nominal            Crime categorization into three categories, namely Low, Medium, High. GOAL attribute (to be predicted)
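The goal attribute in Table 1, ‘Crime Category’, is derived in Section 2.2 by thresholding the normalized ViolentCrimesPerPop values at 25 and 40 percent. A minimal sketch of that binning rule follows; the function name and the reading of the percentages as 0.25 and 0.40 on the dataset’s normalized [0, 1] scale are assumptions for illustration, not part of the original paper:

```python
def crime_category(violent_crimes_per_pop):
    """Map a normalized ViolentCrimesPerPop value to a nominal crime category."""
    if violent_crimes_per_pop < 0.25:    # below 25 percent -> Low
        return "Low"
    elif violent_crimes_per_pop < 0.40:  # 25 up to 40 percent -> Medium
        return "Medium"
    return "High"                        # 40 percent or above -> High

print(crime_category(0.10))  # Low
print(crime_category(0.30))  # Medium
print(crime_category(0.75))  # High
```

Applying such a rule to all 1994 instances would reproduce the nominal goal attribute that the classifiers in Section 2.3 predict.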

a large collection of 128 attributes. There are different methods available for attribute or feature selection [16]. For this experiment, a manual method was chosen for attribute selection [16], based on human understanding and intellect. It is practical, especially when dealing with a large number of attributes. It was also taken into account that only those attributes are chosen which do not contain any missing values.

In the second step, a new attribute called ‘Crime Category’ was added to the dataset. This added attribute is based on the values of the ‘Violent Crimes Per Pop’ attribute, which depicts the total number of violent crimes per 100K population. The reason to add this new attribute is that, in order to perform prediction, the class (goal) attribute should be nominal in nature. In the case of the original dataset, all the original attributes are numeric [13], so a new attribute has to be added to enable prediction. As mentioned earlier, this attribute is based on the data values in ‘Violent Crimes Per Pop’; this dependency also retains the integrity of the dataset. The new attribute just acts as a means of providing different nominal labels for the values in ‘Violent Crimes Per Pop’, for prediction purposes.

The newly added nominal attribute has three values: ‘Low’, ‘Medium’, and ‘High’. If the value in ‘Violent Crimes Per Pop’ is less than 25 percent, the value of ‘Crime Category’ is ‘Low’. If the value in ‘Violent Crimes Per Pop’ is equal to or greater than 25 percent and less than 40 percent, the value of ‘Crime Category’ is ‘Medium’. If the value in ‘Violent Crimes Per Pop’ is equal to or greater than 40 percent, the value of ‘Crime Category’ is ‘High’. All the values were added to the newly created attribute carefully for the 1994 instances, and cross-checked multiple times by all authors, to eradicate any chance of error.

2.3 Building Classifiers and Measurements for Performance Evaluation

Bayesian classifiers adopt a supervised learning approach. They have the ability to predict the probability that a given tuple belongs to a particular class [1]. The strength of the Naïve Bayesian classifier as a powerful probabilistic model has been proven for solving classification tasks effectively [3]. For any given instance X = (x1, x2, ..., xn), where xi is the value of attribute Xi, the Bayesian classifier calculates P(C|X) for all possible class values C and predicts C* = argmax_c P(c|X) as the class value for instance X. Hence, estimating P(C|X), which is proportional to P(X|C)P(C), is the key step of a Bayesian classifier.

Decision Tree is also a famous and commonly used predictive model, following the supervised learning approach [5, 17]. As the name suggests, a Decision Tree forms a tree-like structure, where each node in the tree denotes a test on an attribute value. The leaves represent the classes or class distributions that the model predicts for classification. The branches represent conjunctions of features, which lead to classes. The tree structure carries high potential to easily produce classification


rules for the applied dataset [1]. The algorithm treats the whole dataset as a large single set and then proceeds to recursively split the set. The algorithm applies the top-down approach to construct the tree until some stopping criterion is met. Gain in entropy is used to guide the algorithm in the creation of nodes [4, 7].

Both the classifiers used for this experiment have some pros and cons. Naïve Bayesian requires a short training time, offers fast evaluation, and is more suitable for real-world problems. For solving complex classification problems, however, Naïve Bayesian is not a recommended choice. In order to handle complex classification problems, Decision Tree is a better choice. It can produce reasonable and interpretable classification trees, which can be used for making critical decisions. However, it does not work well on all datasets. The results from both algorithms will be evaluated on three performance measurements, which are defined below:

a. Precision and Recall are two significant performance measures for evaluating classification algorithms [2]. In this experiment, Precision refers to the proportion of data which is classified correctly using the classification algorithm. Here, Recall refers to the percentage of information which is relevant to the class and is correctly classified.
b. Accuracy is the percentage of instances which are classified correctly by the classifiers [5].
c. F-Measure is another performance measure, which combines Recall and Precision into a single measure [8]. This measure is commonly used in classification.

3. Experiment Results, Analysis and Performance Evaluation

In the experiment, a comparison between the Naïve Bayesian and Decision Tree algorithms was performed over the crime dataset [13]. During the experiment, the pre-processed dataset was converted to an .ARFF file, which is the standard file type for WEKA input [16]. 10-fold cross-validation was applied to the input dataset, separately for both the Naïve Bayesian and Decision Tree algorithms. The Accuracy under 10-fold cross-validation for Naïve Bayesian and Decision Tree is 70.8124% and 83.9519%, respectively. Hence, Decision Tree outperformed Naïve Bayesian and manifested higher performance. Moreover, the confusion matrices for Naïve Bayesian and Decision Tree are shown in Table 2 and Table 3. Figure 1 shows the comparison between the two algorithms.

Table 2. Confusion Matrix Using Naïve Bayesian

Actual \ Predicted   Low    High   Medium   Total
Low                  1198   70     47       1315
High                 183    172    31       386
Medium               179    72     42       293

Table 3. Confusion Matrix Using Decision Tree

Actual \ Predicted   Low    High   Medium   Total
Low                  1264   32     19       1315
High                 86     290    10       386
Medium               125    48     120      293

Table 2 illustrates the classification of the Low, High and Medium classes using the Naïve Bayesian algorithm. The result from the confusion matrix is discussed for each class below:

There are 1315 items classified into class Low.
• 1198 of these items are correctly classified into class Low.
• 70 of these items are wrongly classified into class High.
• 47 of these items are wrongly classified into class Medium.

There are 386 items classified into class High.
• 172 of these items are correctly classified into class High.
• 183 of these items are wrongly classified into class Low.
• 31 of these items are wrongly classified into class Medium.

There are 293 items classified into class Medium.
• 42 of these items are correctly classified into class Medium.
• 179 of these items are wrongly classified into class Low.
• 72 of these items are wrongly classified into class High.
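The Naïve Bayesian decision rule described in Section 2.3, C* = argmax_c P(c) · Π P(x_i | c), can be illustrated with a small sketch. The prior values below roughly follow the class proportions implied by the row totals above (1315, 386 and 293 of 1994 instances); the two discretized attributes and their likelihood values are invented purely for illustration and are not taken from the crime dataset:

```python
# Class priors, approximating the Low/High/Medium proportions in the dataset.
priors = {"Low": 0.66, "High": 0.19, "Medium": 0.15}

# Hypothetical P(attribute value | class) tables for two made-up attributes.
likelihoods = {
    "Low":    {"income=high": 0.7, "unemployment=low": 0.8},
    "High":   {"income=high": 0.2, "unemployment=low": 0.3},
    "Medium": {"income=high": 0.4, "unemployment=low": 0.5},
}

def predict(observed_values):
    """Pick the class with the largest unnormalized posterior P(c) * prod P(x|c)."""
    scores = {}
    for cls, prior in priors.items():
        score = prior
        for value in observed_values:
            score *= likelihoods[cls][value]
        scores[cls] = score
    return max(scores, key=scores.get)

print(predict(["income=high", "unemployment=low"]))  # Low
```

Normalizing the scores by their sum would give the posterior probabilities themselves; for choosing the class, the unnormalized products suffice.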


Figure 1. Comparison between different measures of the two algorithms.

Table 4. Accuracy, Incorrectly Classified Instances, Precision, Recall and F-Measure for Both Algorithms

Method          Accuracy (correctly     Incorrectly           Precision   Recall   F-Measure
                classified instances)   classified instances
Decision Tree   83.9519%                16.0481%              0.835       0.84     0.826
Naïve Bayesian  70.8124%                29.1876%              0.664       0.708    0.675
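The summary values in Table 4 can be recomputed from the confusion matrices. The sketch below rederives the Decision Tree row from Table 3, under the assumption (consistent with the reported numbers) that rows are actual classes, columns are predicted classes, and that Precision, Recall and F-Measure are averaged across classes weighted by class size, as WEKA reports them:

```python
# Table 3 confusion matrix for the Decision Tree (rows: actual, columns:
# predicted; class order Low, High, Medium).
cm = [
    [1264, 32, 19],   # actual Low (1315 instances)
    [86, 290, 10],    # actual High (386 instances)
    [125, 48, 120],   # actual Medium (293 instances)
]
n = len(cm)
total = sum(sum(row) for row in cm)

accuracy = sum(cm[i][i] for i in range(n)) / total

precision = recall = f_measure = 0.0
for i in range(n):
    actual_i = sum(cm[i])                    # true class-i instances
    predicted_i = sum(row[i] for row in cm)  # instances predicted as class i
    p = cm[i][i] / predicted_i               # per-class precision
    r = cm[i][i] / actual_i                  # per-class recall
    w = actual_i / total                     # weight by class size
    precision += w * p
    recall += w * r
    f_measure += w * (2 * p * r / (p + r))   # per-class F, then weighted

print(round(accuracy * 100, 4))  # 83.9519
print(round(precision, 3))       # 0.835
print(round(recall, 3))          # 0.84
print(round(f_measure, 3))       # 0.826
```

These values match the Decision Tree row of Table 4. Note that class-weighted Recall equals Accuracy by construction, which explains why those two columns are nearly identical.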

Similarly, Table 3 illustrates the classification of the Low, High and Medium classes using the Decision Tree algorithm. The result from the confusion matrix is discussed for each class below.

There are 1315 items classified into class Low.
• 1264 of these items are correctly classified into class Low.
• 32 of these items are wrongly classified into class High.
• 19 of these items are wrongly classified into class Medium.

There are 386 items classified into class High.
• 290 of these items are correctly classified into class High.
• 86 of these items are wrongly classified into class Low.
• 10 of these items are wrongly classified into class Medium.

There are 293 items classified into class Medium.
• 120 of these items are correctly classified into class Medium.
• 125 of these items are wrongly classified into class Low.
• 48 of these items are wrongly classified into class High.

The results from the confusion matrices and their explanation above clearly show that Decision Tree performed better than Naïve Bayesian. Decision Tree performed better in predicting all the classes, namely Low, Medium and High. Table 4 illustrates the Accuracy (correctly classified instances), incorrectly classified instances, Precision, Recall and F-Measure for both the algorithms used in the experiment. Figure 1 shows a comparison between the different measures of the two algorithms. The values of Accuracy, Precision and


Figure 2. Percentage comparison of correctly and incorrectly classified instances.

Recall for Decision Tree are almost the same, whereas for Naïve Bayesian the Accuracy and Recall are almost the same but the value of Precision is a little lower. The percentage comparison of correctly classified instances for the two algorithms is demonstrated in Figure 2.

4. Conclusions and Future Work

This paper presents a comparison between two classification algorithms, namely Decision Tree and Naïve Bayesian, for predicting the ‘Crime Category’ attribute, having the labels ‘Low’, ‘Medium’, and ‘High’. For Decision Tree, the Accuracy, Precision and Recall are 83.9519%, 83.5% and 84%. On the other hand, the Accuracy, Precision and Recall values for Naïve Bayesian are 70.8124%, 66.4% and 70.8%, respectively. Experimental results for both algorithms manifest that Decision Tree performed better than Naïve Bayesian for the crime dataset, using WEKA. This experiment was performed using 10-fold cross-validation. It is evident that law enforcement agencies can take great advantage of machine learning algorithms like Decision Tree to effectively fight crime and the war against terrorism. For future research, there is a plan to further apply other classification algorithms on the crime dataset and evaluate their prediction performance. Another direction for future work is to use other techniques for feature selection, and study their effect on the prediction performance of different algorithms.

5. References

1. Batchu V, Aravindhar D J et al. (2011). A classification based dependent approach for suppressing data, IJCA Proceedings on Wireless Information Networks & Business Information System (WINBIS 2012), Foundation of Computer Science (FCS).
2. Cios J, Pedrycz W et al. (1998). Data Mining in Knowledge Discovery, Academic Publishers.
3. Geenen P L, van der Gaag L C et al. (2011). Constructing naive Bayesian classifiers for veterinary medicine: A case study in the clinical diagnosis of classical swine fever, Research in Veterinary Science, vol 91(1), 64–70.
4. Hamou A, Simmons A et al. (2011). Cluster analysis of MR imaging in Alzheimer’s disease using decision tree refinement, International Journal of Artificial Intelligence, vol 6(S11), 90–99.
5. Han J, and Kamber M (2006). Data Mining: Concepts and Techniques, Morgan Kaufmann Publishers, San Francisco, CA.


6. Kochar B, and Chhillar R (2012). An Effective Data Warehousing System for RFID using Novel Data Cleaning, Data Transformation and Loading Techniques, Arab Journal of Information Technology, vol 9(3), 208–216.
7. Kováč S (2012). Suitability analysis of data mining tools and methods, Bachelor’s Thesis.
8. Kumar V, and Rathee N (2011). Knowledge discovery from database using an integration of clustering and classification, International Journal of Advanced Computer Science and Applications, vol 2(3), 29–32.
9. Li G, and Wang Y (2012). A privacy-preserving classification method based on singular value decomposition, Arab Journal of Information Technology, vol 9(6), 529–534.
10. Ngai E W T, Xiu L et al. (2009). Application of data mining techniques in customer relationship management: A literature review and classification, Expert Systems with Applications, vol 36(2), 2592–2602.
11. Santhi P, and Bhaskaran V M (2010). Performance of clustering algorithms in healthcare database, International Journal for Advances in Computer Science, vol 2(1), 26–31.
12. Selvaraj S, and Natarajan J (2011). Microarray data analysis and mining tools, Bioinformation, vol 6(3), 95–99.
13. UCI Machine Learning Repository (2012). Available from: http://archive.ics.uci.edu/ml/datasets.html
14. Wahbeh A H, Al-Radaideh Q A et al. (2011). A comparison study between data mining tools over some classification methods, International Journal of Advanced Computer Science and Applications, Special Issue, 18–26.
15. WikiPedia (2012). List of U.S. states by area. Available from: http://en.wikipedia.org/wiki/List_of_U.S._states_by_area
16. Witten I, Frank E et al. (2011). Data Mining: Practical Machine Learning Tools and Techniques, Morgan Kaufmann.
17. Xu Y, Dong Z Y et al. (2011). A decision tree-based on-line preventive control strategy for power system transient instability prevention, International Journal of Systems Science.
