Cybercrime Forecasting Using Data Mining Technique
Preeti, Rajesh Kumar
Student, Asst. Professor
Gurgaon Institute of Technology and Management, MDU University, Rohtak, Haryana India
Gurgaon, Haryana, India.
Email: [email protected], [email protected]
Abstract: This paper presents crime data mining, a recently emerging area in the field of information security. It also surveys the available mining methodologies, along with the data mining steps involved in crime analysis. Crime can be national or international, but it is always a destructive process in society.

Index Terms: Crime data mining, Precision, Recall, Hotspots, Techniques, CRISP-DM methodology.

I. INTRODUCTION

According to Thakur, crime is an act that is punishable by law. However, an act that is considered a crime in one place and time may not be considered one in another place or time. According to Andargachew (1988), a criminal is an individual who has committed a legally forbidden act. In fact, several factors have to be taken into account to decide whether a person should be considered a criminal. Among these, the individual should be of competent age in the eyes of the law, and there must be a well-defined punishment for the particular act committed.

Crime has increasingly become as complex as human nature. Modern technological advancement and tremendous progress in communication have enabled criminals from every corner of the world to commit a crime using sophisticated equipment in one place and then escape to another. Nowadays the world faces a proliferation of problems such as illicit drug trafficking, smuggling, hijacking, kidnapping, and terrorism.

The level of crime also depends on the situation and varies from state to state.

Crime Prevention

The causes of the growing crime rate include unemployment, economic backwardness, overpopulation, illiteracy, and inadequate equipment of the police force. The nature, seriousness, and extent of crime depend on the form of a society, and so its nature changes with the growth and development of the social system. Every generation has its own most critical, new, and special problems of crime, although the crime problem is as old as man himself. In addition, the techniques employed to commit crime are new in the sense that they make use of modern knowledge and technique. The rise in crime, both national and international, is generally thought to be the result of the interplay between socio-economic changes. The circumstances surrounding the individual offender, such as personality, physical characteristics, intelligence, family background, and environmental surroundings such as peer groups and neighbors, have been the subject of the study of crime (Andargachew, 1988). Understanding the attributes of criminals therefore helps in designing and implementing proper crime prevention strategies. Governments usually establish organizations such as courts, prosecution offices, and police, which are responsible for maintaining law and order in their respective countries. These agencies and other related organizations are responsible for curbing the rate and occurrence of crime. The crime prevention agencies need to issue and implement crime prevention strategies:

[...]ds the life and property of the society whom the authorities are in duty to protect.
[...] bodily and mental.
[...] follows along the way of sensing a crime.
[...] difficulty of producing crime at all strange hours of the afternoon and evening and of using immediate activity for the investigation.

II. DATA MINING

Data mining is the computational process of discovering patterns in large data sets, involving methods at the intersection of artificial intelligence, machine learning, statistics, and database systems. The overall goal of the data mining process is to extract information from a data set and transform it into an understandable structure for further
use. Aside from the raw analysis step, it involves database and data management aspects, data pre-processing, model and inference considerations, interestingness metrics, complexity considerations, post-processing of discovered structures, visualization, and online updating.

The actual data mining task is the automatic or semi-automatic analysis of large quantities of data to extract previously unknown, interesting patterns such as groups of data records (cluster analysis), unusual records (anomaly detection), and dependencies (association rule mining). This usually involves database techniques such as spatial indices. These patterns can then be seen as a kind of summary of the input data and may be used in further analysis, for instance in machine learning and predictive analytics. For example, the data mining step may identify multiple groups in the data, which can then be used to obtain more accurate prediction results by a decision support system. Neither data collection, data preparation, nor result interpretation and reporting is part of the data mining step, but they do belong to the overall KDD process as additional steps. This paper contains the following sections: Data Generation, which describes the data set; Handling Information; and the techniques involved in data mining.

1. Data Generation

The research data was gleaned from multiple city agencies. Every data entry is a record of a crime or related event. Each record contains the type of crime, the location of the crime in longitude and latitude, and the date and time at which the incident happened. Before data mining begins, preprocessing is required to make the data suitable for classification.

A. Data Grid

For deployment of this crime prediction model, the police department's requirement is to forecast crimes such as residential burglary over space and time. Accordingly, the model classifies burglaries monthly across a uniform grid. The grid divides the city into checkerboard-like cells. Each cell contains data combined into six categories, namely Arrest, Residential Burglary, Commercial Burglary, Motor Vehicle Larceny, Street Robbery, and Foreclosure. Each cell is populated on a monthly basis.

The data was studied at two resolutions. The first is a grid of 24-by-20 square cells and the second is 41-by-40. Each cell in the 24-by-20 grid is one-half mile square. In the 41-by-40 grid, each cell is over one-quarter mile square. In both cases, the data set is a monthly matrix of the six categories mentioned earlier. Two resolutions are used because the finer resolution allows the grid to be interrogated in more detail for the spatial information inherent in the data set. Conversely, the lower resolution has the effect of generalizing the spatial knowledge.

B. Empty Grid Cells

Empty grid cells need to be removed from the data sets because they have a detrimental yet counterintuitive side effect: they improve the performance of the classifiers. It is easy for any given classifier to correctly predict that nothing will happen in an empty grid cell. That "intelligence" is artificial. An empty grid cell is defined as a cell with no count in any of the investigated categories over the entire time period being analyzed. The many empty grid cells have two explanations. First, the city limits are not rectangular like the grid being used. Second, there are many areas within the city limits, such as airport runways, bodies of water, and public open spaces, where these events simply do not happen. The result is empty grid cells that have to be removed.

2. Handling Information

One challenge in crime prediction, as in other rare-event prediction, is that hotspots and coldspots are unbalanced: coldspots are far more common than hotspots. In our data set, this is especially true of the higher-resolution 41-by-40 grid. This has the consequence of confusing the measures of precision, recall, and F1. In particular, the F1 score of hotspots is far lower than the F1 score of coldspots, because the classifiers are well trained on coldspots. The F1 score in our study is computed as follows:

F1 = (2 * Precision * Recall) / (Precision + Recall)

where

Precision = TP / (TP + FP)

Recall = TP / (TP + FN)

and

TP = number of true positives, i.e., hotspots correctly predicted as hotspots
FP = number of false positives, i.e., coldspots incorrectly predicted as hotspots
FN = number of false negatives, i.e., hotspots incorrectly predicted as coldspots

To address this problem, we adjust the weights of hotspots and coldspots. By raising the weight of hotspots on the basis of the ratio between hotspots and coldspots, the data set can be balanced before the classification process. The weight function is defined as follows:

where

C = total number of coldspots and
H = total number of hotspots
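As a minimal sketch of the measures in Section 2, the following Python computes precision, recall, and F1 for the hotspot class from TP, FP, and FN counts, and derives a hotspot weight from the coldspot and hotspot totals C and H. The function names, the example counts, and in particular the weight rule (hotspot weight C/H, never below 1) are assumptions made for illustration; the text does not spell out the weight function actually used.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, F1) for the hotspot class."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def hotspot_weight(c, h):
    """Assumed rule: weight hotspots by C/H when coldspots outnumber them."""
    if h == 0:
        raise ValueError("no hotspots in the data set")
    return max(1.0, c / h)


# Hypothetical counts for one month of predictions.
p, r, f1 = precision_recall_f1(tp=30, fp=10, fn=20)
print(p, r, f1)

# Hypothetical totals: 900 coldspots and 100 hotspots.
print(hotspot_weight(c=900, h=100))
```

With such a weight, each hotspot example would count as much as C/H coldspot examples during training, which is one common way to balance a skewed data set before classification.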
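The data grid and the removal of empty grid cells described in Section 1 can be sketched in the same spirit. This is a minimal illustration in plain Python, assuming a rectangular bounding box around the city, a hypothetical record layout of (category, longitude, latitude, month), and made-up coordinates; none of these names or values come from the study, and reading "24-by-20" as 24 rows by 20 columns is also an assumption.

```python
from collections import defaultdict


def cell_index(lon, lat, bounds, n_rows, n_cols):
    """Map a point to the (row, col) cell of a uniform grid over `bounds`."""
    min_lon, min_lat, max_lon, max_lat = bounds
    col = int((lon - min_lon) / (max_lon - min_lon) * n_cols)
    row = int((lat - min_lat) / (max_lat - min_lat) * n_rows)
    # Points on the upper edges are placed in the last row or column.
    return min(row, n_rows - 1), min(col, n_cols - 1)


def monthly_counts(records, bounds, n_rows, n_cols):
    """Count events per (cell, month, category)."""
    counts = defaultdict(int)
    for category, lon, lat, month in records:
        cell = cell_index(lon, lat, bounds, n_rows, n_cols)
        counts[(cell, month, category)] += 1
    return counts


def non_empty_cells(counts):
    """Cells with at least one event in any category over the whole period."""
    return {cell for (cell, _month, _category) in counts}


# Hypothetical bounding box and records.
bounds = (0.0, 0.0, 10.0, 12.0)  # min_lon, min_lat, max_lon, max_lat
records = [
    ("Residential Burglary", 1.3, 1.3, "2010-01"),
    ("Arrest", 1.2, 1.4, "2010-01"),
    ("Street Robbery", 10.0, 12.0, "2010-02"),
]
counts = monthly_counts(records, bounds, n_rows=24, n_cols=20)
kept = non_empty_cells(counts)
print(len(kept), "of", 24 * 20, "cells kept")
```

Only the cells returned by `non_empty_cells` would be passed on to the classifiers; every other cell has no count in any category over the period and is dropped.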
III. DATA MINING IN CRIME

Most law enforcement agencies today are faced with large volumes of data that must be processed and transformed into useful information (Brown, 2003). Data mining can greatly improve crime analysis and aid in reducing and preventing crime. Brown (2003) stated that "no field is in greater need of data mining technology than law enforcement." One potential area of application is spatial data mining tools, which provide law enforcement agencies with significant capabilities to learn crime trends on where, how, and why crimes are committed (Veenendaal and Houweling, 2003). Brown (2003) developed a spatial data mining tool known as the Regional Crime Analysis Program (ReCAP), which is designed to aid local police forces (e.g., University of Virginia (UVA), City of Charlottesville, and Albemarle County) in the analysis and prevention of crime. This system provides crime analysts with the capability to sift through data to catch criminals. It provides spatial, temporal, and attribute matching techniques for pattern extraction.

Data mining is a powerful tool that enables criminal investigators who may lack extensive training as data analysts to explore large databases quickly and efficiently.

Table 1 shows that some types of crime, such as traffic violations and arson, primarily concern police at the city, county, and state levels.

Table 1: Crime data national and international level

IV. CRIME DATA MINING TECHNIQUES

By increasing efficiency and reducing errors, crime data mining techniques can facilitate police work and enable investigators to allocate their time to other valuable tasks. Some of the techniques are standard and some are currently in use. The flow chart below shows the techniques involved in crime data mining:

FLOW CHART:

Entity extraction identifies particular patterns from data such as text, images, or audio materials. It has been used to automatically identify persons, addresses, vehicles, and personal characteristics from police narrative reports. In computer forensics, the extraction of software metrics, including the data structure, program flow, organization and quantity of comments, and use of variable names, can facilitate further investigation by, for example, grouping similar programs written by hackers and tracing their behavior. Entity extraction provides basic information for crime analysis, but its performance depends greatly on the availability of extensive amounts of clean input data.

Clustering techniques group data items into classes with similar characteristics to maximize or minimize intraclass similarity, for example, to identify suspects who conduct crimes in similar ways or distinguish
among groups belonging to different gangs. These techniques do not have a set of predefined classes for assigning items. Some researchers use the statistics-based concept space algorithm to automatically associate different objects such as persons, organizations, and vehicles in crime records. Using link analysis techniques to identify similar transactions, the Financial Crimes Enforcement Network AI System exploits Bank Secrecy Act data to support the detection and analysis of money laundering and other financial crimes. Clustering crime incidents can automate a major part of crime analysis but is limited by the high computational intensity typically required.

Association rule mining discovers frequently occurring item sets in a database and presents the patterns as rules. This technique has been applied in network intrusion detection to derive association rules from users' interaction history. Investigators can also apply this technique to network intruders' profiles to help detect potential future network attacks. Similar to association rule mining, sequential pattern mining finds frequently occurring sequences of items over a set of transactions that occurred at different times. In network intrusion detection, this approach can identify intrusion patterns among time-stamped data. Showing hidden patterns benefits crime analysis, but obtaining meaningful results requires rich and highly structured data.

Deviation detection uses specific measures to study data that differs markedly from the rest of the data. Also called outlier detection, this technique can be applied to fraud detection, network intrusion detection, and other crime analyses. However, such activities can sometimes appear to be normal, making it difficult to identify outliers.

Classification finds common properties among different crime entities and organizes them into predefined classes. This technique has been used to identify the source of e-mail spamming based on the sender's linguistic patterns and structural features. Often used to predict crime trends, classification can reduce the time required to identify crime entities. However, the technique requires a predefined classification scheme. Classification also requires reasonably complete training and testing data, because a high degree of missing data would limit prediction accuracy.

String comparator techniques compare the textual fields in pairs of database records and compute the similarity between the records. These techniques can detect deceptive information, such as name, address, and Social Security number, in criminal records. Investigators can use string comparators to analyze textual data, but the techniques often require intensive computation.

Social network analysis describes the roles of and interactions among nodes in a conceptual network. Investigators can use this technique to construct a network that illustrates criminals' roles, the flow of tangible and intangible goods and information, and associations among these entities. Further analysis can reveal critical roles, subgroups, and vulnerabilities inside the network. This approach enables visualization of criminal networks, but investigators still might not be able to discover the network's true leaders if they keep a low profile.

Similarity Measures: Whether two entities are similar depends semantically on the application and is defined by the user. There are different similarity measures for different types of data. For quantitative data, we can use Euclidean distance, Minkowski distance, and other measures of similarity. For qualitative attributes, a simple and commonly used approach is the binary similarity measure. Suppose a_i and b_i are the values of the i-th attribute of A and B, respectively. Let s_i(A, B) denote the similarity on the i-th attribute between A and B: s_i(A, B) = 1 if a_i = b_i, and 0 if a_i ≠ b_i. In this way, qualitative data can be converted into quantitative data, and similarity measures for quantitative data can be used. If the sets have a weighted structure, the similarity is defined by taking into account the values of the weights w_i:

We now look at the methodologies available in the current crime data mining literature. The CRISP-DM methodology (Cross-Industry Standard Process for Data Mining), like the SEMMA methodology (Sample, Explore, Modify, Model, Assess), refers to a more general data mining process. The CIA intelligence methodology, also well known, refers to the life cycle of converting data into intelligence. Van der Hulst's methodology is specifically developed for criminal networks and includes specific steps for identifying and analysing criminal networks. Last but not least, the AMPA (Actionable Mining and Predictive Analytics) methodology was developed by McCue for a better understanding of crime data mining. Table 2 gives details of the available methodologies.
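The similarity measures of this section can be sketched in Python as follows. The Minkowski distance (with Euclidean distance as the case p = 2) and the binary per-attribute similarity follow the definitions in the text. The weighted form, a weighted average of the per-attribute similarities s_i with weights w_i, is an assumption chosen for illustration, as are all function names and sample records.

```python
def minkowski(a, b, p=2):
    """Minkowski distance between numeric vectors; p=2 is Euclidean."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)


def binary_similarity(a, b):
    """Per-attribute s_i: 1 if the i-th values match, else 0."""
    return [1 if x == y else 0 for x, y in zip(a, b)]


def weighted_similarity(a, b, weights):
    """Assumed form: sum(w_i * s_i) / sum(w_i)."""
    s = binary_similarity(a, b)
    return sum(w * si for w, si in zip(weights, s)) / sum(weights)


# Hypothetical qualitative descriptions of two suspects.
suspect_a = ("male", "tattoo", "blue sedan")
suspect_b = ("male", "scar", "blue sedan")

print(binary_similarity(suspect_a, suspect_b))
print(weighted_similarity(suspect_a, suspect_b, [1, 2, 1]))

# Quantitative data: Euclidean distance between two points.
print(minkowski((0, 0), (3, 4)))
```

Here the second attribute carries twice the weight of the others, so a mismatch on it lowers the overall similarity more than a mismatch elsewhere would.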
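The string comparator techniques listed among the methods of Section IV can be illustrated with a normalized edit-distance similarity. The sketch below uses the Levenshtein distance as one possible comparator; the text does not name a specific algorithm, so this choice, the function names, and the sample names are assumptions.

```python
def levenshtein(s, t):
    """Minimum number of single-character edits turning s into t."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def string_similarity(s, t):
    """Similarity in [0, 1]; 1.0 means identical strings."""
    if not s and not t:
        return 1.0
    return 1 - levenshtein(s, t) / max(len(s), len(t))


# Hypothetical name fields from two records.
print(levenshtein("kitten", "sitting"))
print(string_similarity("Jon Smith", "John Smith"))
```

Comparing every pair of records in a large database this way is quadratic in the number of records, which is consistent with the remark that string comparators often require intensive computation.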
V. CONCLUSION

In this paper the authors presented a systematic method of crime detection at the national and international level. As the volume of crime data keeps increasing, controlling crime has again become a difficult task. To address this problem, the authors present a systematic way of mining crime data for detection and classification, so that the crime problem becomes easier to solve throughout the world.

REFERENCES

[1] U.M. Fayyad and R. Uthurusamy, "Evolving Data Mining into Solutions for Insights," Comm. ACM, Aug. 2002, pp. 28-31.

[2] W. Chang et al., "An International Perspective on Fighting Cybercrime," Proc. 1st NSF/NIJ Symp. Intelligence and Security Informatics, LNCS 2665, Springer-Verlag, 2003, pp. 379-384.

[3] C. Morselli, Inside Criminal Networks, New York, USA: Springer Science+Business Media LLC, 2009.

[4] G. Wang, H. Chen, and H. Atabakhsh, "Automatically Detecting Deceptive Criminal Identities," Comm. ACM, Mar. 2004, pp. 70-76.

[5] D.E. Brown and S.C. Hagen, "Data association methods with applications to law enforcement," Decision Support Systems, 34(4), 2003, pp. 369-378.

[6] H. Bao, "Knowledge Discovery and Data Mining Techniques and Practice," 2003. http://www.netnam.vn/unescocourse/knowlegde/3-1.htm

[7] G. Kelling and C. Coles, Fixing Broken Windows: Restoring Order and Reducing Crime in Our Communities, ISBN: 0-684-83738-2.

[8] P. Chapman, J. Clinton, R. Kerber, T. Khabaza, T. Reinartz, C. Shearer, and R. Wirth, "CRISP-DM 1.0 step-by-step data mining guide," Technical report, The CRISP-DM Consortium, http://www.crispdm.org/CRISPWP-0800.pdf, August 2000.

[9] C. Thakur, "Crime Control," 2003. http://ncthakur.itgo.com/chand3c.htm

[10] S. Ruggieri, D. Pedreschi, and F. Turini, "Data mining for discrimination discovery," ACM Transactions on Knowledge Discovery from Data, 4(2), Article 9, 2010.