Chapter 2

Chapter 2 discusses data mining (DM) and its applications, emphasizing the extraction of valuable knowledge from large datasets across various fields such as healthcare, finance, and education. It outlines the Knowledge Discovery in Databases (KDD) process, which includes steps like data selection, cleaning, transformation, and mining to uncover patterns. The chapter also highlights various data mining techniques, their applications, and challenges, along with an introduction to machine learning and its types.

Uploaded by

pankaj kumar singh

CHAPTER 2

DATA MINING AND ITS APPLICATION

2.1 Data Mining (DM)

“Knowledge shows the way to Power and Success”

Data mining technology arose to meet people's needs. DM is sometimes also called
Knowledge Discovery in Databases (KDD). An enormous amount of data and information is
being collected with the help of computing devices and the latest technologies. Data is
now everywhere: business transactions, government, healthcare, websites, scientific data,
and so on. Simple retrieval is not enough for decision-making, so DM comes into the picture
to summarize data into valuable information, i.e., knowledge discovery and the discovery
of patterns in raw data [9].
In the beginning, we started storing all data. Unfortunately, these gigantic collections of
data, accumulated in dissimilar data structures, very rapidly became overwhelming. In
practical applications, DM can extract implicit, previously unknown, but potentially useful
information and knowledge from large amounts of noisy, incomplete, random, and fuzzy data.
DM is an active field and a powerful means of extracting useful knowledge from massive
amounts of data, bridging the gap between data and knowledge.

Another definition of DM is the investigation and analysis of huge quantities of data in
order to discover valid, novel, potentially useful, and ultimately understandable patterns.
It is the process of analyzing large databases with intelligent algorithms to find patterns
that are:

• Valid: the patterns hold true in general.
• Novel: the patterns were not known beforehand.
• Valuable: the patterns can lead to actions.
• Understandable: the patterns can be interpreted and understood.
DM and KDD form a relatively new interdisciplinary field, merging ideas from statistics,
machine learning, databases, and parallel computing.
Researchers have defined the term 'data mining' in many ways, and several definitions of
DM or KDD are available in the literature.

2.2 KDD (Knowledge Discovery in Databases)

The KDD process is a data mining methodology used to extract hidden knowledge from a
large database by applying pre-processing and data transformation steps.

Figure 2.1: KDD Process (Identification of Goal → Target Data Set → Data Pre-Processing → Data Transformation → Data Mining → Presentation)

This research will predict diabetes by using the Knowledge Discovery in Databases
(KDD) methodology. KDD is the process of extracting knowledge from large databases
and emphasizes the "high-level" application of particular data mining methods. The KDD
process consists of nine steps, which are iterative and interactive in nature [9]. The
process is iterative at each step, meaning that one might have to move back to a previous
step. The process starts with determining the KDD goals and ends with the
implementation of the discovered knowledge.

KDD Steps:

1. Developing an understanding of:
   • the relevant prior knowledge;
   • the goal of the end user.
2. Creating a target data set, i.e., selecting the data set on which discovery is to be
   performed.
3. Data cleaning and pre-processing:
   • removing noise from the data set;
   • deciding on a plan of action for handling missing data.
4. Data reduction:
   • finding useful features to represent the data, depending on the goal of the task;
   • using dimensionality reduction methods to reduce the number of variables used to
     represent the data.
5. Choosing the data mining task:
   • deciding whether the goal of the KDD process is classification, regression,
     clustering, or another task.
6. Choosing the data mining algorithms:
   • selecting the methods to be used to search for patterns in the data;
   • deciding which models and parameters may be appropriate.
7. Data mining:
   • searching for patterns of interest in a particular representational form, such as
     classification rules or trees, regression, or clustering.
8. Interpreting the mined patterns.
9. Consolidating the discovered knowledge.
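Steps 2 to 4 of the KDD process (selection, cleaning and pre-processing, and reduction/transformation) can be illustrated with a minimal plain-Python sketch. It is not the procedure used in this research: the toy records, the field names `glucose`, `bmi`, `age` and `label`, the plausibility range used to discard noisy glucose readings, mean imputation for missing values, and min-max scaling are all assumptions chosen for illustration.

```python
from statistics import mean

# Hypothetical toy records; None marks a missing value.
RAW = [
    {"id": 1, "glucose": 148.0, "bmi": 33.6, "age": 50, "label": 1},
    {"id": 2, "glucose": 85.0, "bmi": None, "age": 31, "label": 0},
    {"id": 3, "glucose": 0.0, "bmi": 23.3, "age": 32, "label": 1},  # 0 glucose: treated as noise
    {"id": 4, "glucose": 89.0, "bmi": 28.1, "age": 21, "label": 0},
    {"id": 5, "glucose": 137.0, "bmi": 43.1, "age": 33, "label": 1},
]

def select(records, fields):
    """Step 2: create the target data set by keeping only the chosen fields."""
    return [{f: r[f] for f in fields} for r in records]

def clean(records, field, low, high):
    """Step 3a: drop records whose (non-missing) value lies outside a plausible range."""
    return [r for r in records if r[field] is None or low <= r[field] <= high]

def impute_mean(records, field):
    """Step 3b: replace missing values with the mean of the observed values."""
    fill = mean(r[field] for r in records if r[field] is not None)
    return [{**r, field: fill if r[field] is None else r[field]} for r in records]

def min_max_scale(records, field):
    """Step 4: rescale a feature to the range [0, 1]."""
    values = [r[field] for r in records]
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid division by zero for a constant feature
    return [{**r, field: (r[field] - lo) / span} for r in records]

data = select(RAW, ["glucose", "bmi", "label"])
data = clean(data, "glucose", low=40.0, high=400.0)
data = impute_mean(data, "bmi")
for f in ("glucose", "bmi"):
    data = min_max_scale(data, f)
print(len(data), data[0])
```

In practice each of these choices (which fields to keep, what counts as noise, how to impute, how to scale) depends on the goal fixed in step 1, and the process may loop back to an earlier step.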

2.3 Data mining process

Data mining is the process of extracting hidden, previously unknown patterns from a huge
database or data warehouse. Data mining is also known as knowledge discovery from
data (KDD). Data mining plays an important role in various areas such as banking,
education, health care, and medicine. Many organizations use data mining techniques to
analyse large datasets, to support the decision-making process, and to get better results
for their long-term needs.

Figure 2.2: Data Mining Process Steps (Data Selection → Data Processing → Data Transformation → Data Mining → Data Evaluation)

Health organizations use data mining techniques to identify hidden patterns in disease
and drug datasets. These patterns are used for the prediction and detection of different
diseases, and they also support the decision-making process in clinical diagnosis.
Different data mining techniques are used for the prediction and detection of different
diseases; some of these techniques are listed below [24].
2.3.1 Data Mining Techniques

Classification is the process of finding a model that describes and distinguishes data
classes or concepts based on a class label. There are many classification algorithms,
for example Artificial Neural Networks (ANN), decision trees, Bayesian networks, and
naïve Bayes.
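As a small concrete example of a classifier, the sketch below implements k-nearest neighbour (k-NN), a supervised classification algorithm chosen here only because it is short enough to show in full. The toy points, the labels "A" and "B", Euclidean distance, and the default k = 3 are illustrative assumptions.

```python
from collections import Counter
from math import dist

def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points (Euclidean)."""
    # Order training indices by their distance to the query point.
    order = sorted(range(len(train_X)), key=lambda i: dist(train_X[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]

# Hypothetical 2-D training data with two class labels.
X = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (6.0, 5.0), (7.0, 7.0), (6.5, 6.0)]
y = ["A", "A", "A", "B", "B", "B"]
print(knn_predict(X, y, (1.8, 1.5)))  # query near the "A" group
print(knn_predict(X, y, (6.0, 6.5)))  # query near the "B" group
```

Unlike the model-building algorithms listed above, k-NN builds no explicit model: it stores the labelled data and defers all work to prediction time.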

Clustering is the process of analysing data objects without consulting a class label. It is
the process of grouping objects into new classes so as to maximize intra-class similarity
and minimize inter-class similarity. A commonly used clustering algorithm is k-means
clustering; k-nearest neighbour (k-NN), which is sometimes listed alongside it, is in fact
a supervised classification algorithm.
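The k-means algorithm can be sketched as Lloyd's iteration: assign each point to its nearest centroid, move each centroid to the mean of its assigned points, and repeat until nothing changes. In this sketch the initial centroids are supplied by hand (an assumption made for reproducibility; practical implementations usually use random or k-means++ initialisation), and the data points are hypothetical.

```python
from math import dist

def assign(points, centroids):
    """Assignment step: put each point in the cluster of its nearest centroid."""
    clusters = [[] for _ in centroids]
    for p in points:
        nearest = min(range(len(centroids)), key=lambda j: dist(p, centroids[j]))
        clusters[nearest].append(p)
    return clusters

def k_means(points, initial_centroids, max_iter=100):
    centroids = [tuple(c) for c in initial_centroids]
    for _ in range(max_iter):
        clusters = assign(points, centroids)
        # Update step: move each centroid to the mean of its members (keep it if empty).
        updated = [
            tuple(sum(axis) / len(members) for axis in zip(*members)) if members else c
            for c, members in zip(centroids, clusters)
        ]
        if updated == centroids:
            break  # converged: no centroid moved
        centroids = updated
    return centroids, assign(points, centroids)

points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (6.0, 5.0), (7.0, 7.0), (6.5, 6.0)]
centroids, clusters = k_means(points, initial_centroids=[points[0], points[3]])
print(centroids)
```

No class labels are used anywhere: the two groups emerge only from the distances between points.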

Association rule learning is a machine learning method used for finding frequent
patterns. Some association algorithms are the Apriori algorithm, the Eclat algorithm, and
the FP-growth algorithm.
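The core idea of the Apriori algorithm, that every subset of a frequent itemset must itself be frequent, can be shown in a compact sketch. The shopping baskets and the minimum support of 0.4 are hypothetical, and the code favours clarity over efficiency (Eclat and FP-growth reduce the cost of Apriori's repeated database scans in different ways).

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return frequent itemsets (as frozensets) mapped to their support counts."""
    transactions = [frozenset(t) for t in transactions]
    min_count = min_support * len(transactions)

    def count(candidates):
        return {c: sum(1 for t in transactions if c <= t) for c in candidates}

    # Level 1: frequent single items.
    items = {frozenset([i]) for t in transactions for i in t}
    current = {c: n for c, n in count(items).items() if n >= min_count}
    frequent = dict(current)
    k = 2
    while current:
        prev = list(current)
        # Join step: combine frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step (Apriori property): all (k-1)-subsets must be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        current = {c: n for c, n in count(candidates).items() if n >= min_count}
        frequent.update(current)
        k += 1
    return frequent

baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "butter", "milk"},
]
result = apriori(baskets, min_support=0.4)
for itemset, n in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), n)
```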

2.3.2 Applications of Data mining

Figure 2.3: Areas Where DM Is Used (Traffic Prediction, Video Surveillance, Search Engine Result Refining, Online Fraud Detection, Product Recommendations)

Traffic Prediction: Google uses DM algorithms for traffic prediction. We all use GPS
navigation systems; these systems save data in a central database and update the
location of each vehicle. The underlying problem is that only a limited number of cars
are equipped with GPS. Machine learning in such scenarios helps to estimate the regions
where congestion can be found on the basis of daily experience [7].
Video Surveillance: Imagine a single person monitoring multiple video cameras: a
difficult job, and a boring one as well. This is why the idea of training computers to do this
job makes sense.

Video surveillance systems nowadays are powered by AI, which makes it possible to
detect crimes before they happen. They track unusual behaviour of people, such as
standing motionless for a long time, stumbling, or napping on benches.

Search Engine Result Refining: Google and other search engines use DM to improve
the search results for you. Every time you execute a search, the algorithms at the backend
keep watch on how you respond to the results. If you open the top results and stay on the
web page for a long time, the search engine assumes that the results it displayed were in
accordance with the query. Similarly, if you reach the second or third page of the search
results but do not open any of the results, the search engine estimates that the results
served did not match the requirement. In this way, the algorithms working at the backend
improve the search results [7].

Online Fraud Detection: DM is proving its potential to make cyberspace a secure place,
and tracking monetary fraud online is one example. For instance, PayPal uses ML for
protection against money laundering.

Product Recommendations: DM algorithms are used in product recommendations: a user
sees, on his social media account, the same product that he viewed on an e-commerce
website.

Future Healthcare: Data mining can improve health systems. It uses data and analytics to
identify best practices that improve care and reduce costs. Researchers use data mining
approaches such as multi-dimensional databases, machine learning, soft computing, data
visualization, and statistics. Data mining can be used to predict the number of patients in
each category. Methods are developed to make sure that patients receive appropriate
care at the right place and at the right time.

Market Basket Analysis: Market basket analysis is a modelling technique based on the
theory that if you buy a certain group of items, you are more likely to buy another group
of items. This method may allow the shopkeeper to know the purchasing behaviour of a
customer. This information can help the shopkeeper to understand the customer's
requirements and change the shop's layout accordingly.

Education: A new emerging field, known as Educational Data Mining (EDM), is concerned
with developing techniques that discover knowledge from data obtained from
educational environments. The objectives of EDM are identified as predicting students'
future learning behaviour, understanding the effects of educational support, and
improving scientific knowledge about learning. Data mining can be used by an institution
to make correct decisions and to predict students' progress. With the results, the
institution can focus on how to teach and what to teach [7].

CRM: Customer Relationship Management (CRM) is about acquiring and retaining
customers, improving customer loyalty, and developing customer-focused strategies in
order to maintain a good relationship with the customer.

2.3.3 Data Mining Challenges

• Developing a Unifying Theory of Data Mining.
• Scaling Up for High Dimensional Data/High Speed Streams.
• Mining Sequence Data and Time Series Data.

2.4 Introduction to Machine Learning

Machine learning works on a very simple concept: learning from experience.

Just as humans and animals learn from experience, machine learning teaches computers
to learn from experience. Machine learning comprises algorithms that learn from past data
and predict future data: we train a computer with an algorithm on some data and then use
it to predict future results. The algorithms adaptively improve their performance as the
number of samples available for learning increases.

2.4.1 Types of Machine Learning Techniques

Figure 2.4: Types of Machine Learning (Supervised, Unsupervised, Semi-Supervised, Reinforcement, Multitask Learning, Ensemble Learning, Neural Network, Instance-Based Learning)

Supervised Learning: In supervised learning, we train the model with prior knowledge
(labelled examples) so that it can behave like an intelligent program. Once it has been
trained, the program can be used on new data.

Unsupervised Learning: In unsupervised learning, the model is trained without any prior
knowledge (i.e., without labelled examples), which makes it more difficult to get the
program to behave intelligently.

Reinforcement Learning: In reinforcement learning, a program learns which steps to take
on the basis of its experience. It lies between supervised and unsupervised learning. A
key role is played by an agent, which takes actions and learns decisions on the basis of
the feedback (rewards) received for its previous actions.
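To make the agent idea concrete, here is a sketch of tabular Q-learning, one standard reinforcement learning algorithm (the chapter does not name a specific one). The five-state corridor environment, its reward of 1 at the goal, and the hyperparameters `alpha`, `gamma` and `epsilon` are hypothetical choices for illustration. The agent starts knowing nothing and, from the rewards it experiences, should learn that moving right is the best action in every non-goal state.

```python
import random

N_STATES = 5            # states 0..4 on a line; state 4 is the goal
GOAL = N_STATES - 1
ACTIONS = (-1, +1)      # index 0 = move left, index 1 = move right

def step(state, move):
    """Hypothetical environment: walls at both ends, reward 1 only on reaching the goal."""
    nxt = min(max(state + move, 0), GOAL)
    reached = nxt == GOAL
    return nxt, (1.0 if reached else 0.0), reached

def q_learning(episodes=200, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0, max_steps=100):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # q[state][action index]

    def greedy(s):
        best = max(q[s])
        return rng.choice([a for a in (0, 1) if q[s][a] == best])  # break ties at random

    for _ in range(episodes):
        s = 0
        for _ in range(max_steps):
            # Epsilon-greedy: explore with probability epsilon, otherwise exploit.
            a = rng.randrange(2) if rng.random() < epsilon else greedy(s)
            s_next, reward, done = step(s, ACTIONS[a])
            target = reward if done else reward + gamma * max(q[s_next])
            q[s][a] += alpha * (target - q[s][a])  # temporal-difference update
            s = s_next
            if done:
                break
    return q

q = q_learning()
policy = ["right" if q[s][1] > q[s][0] else "left" for s in range(GOAL)]
print(policy)
```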

Multitask Learning: Multitask Learning (MTL) is an inductive transfer mechanism whose
main goal is to improve generalization performance. MTL improves generalization by
using the domain-specific information contained in the training signals of related tasks.

Decision Tree Model: A decision tree model is one of the most common data mining
models. It is popular because the resulting model is easy to understand. The algorithms
use a recursive partitioning approach. A decision tree is a supervised learning algorithm
that is mostly used for classification problems.
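One step of recursive partitioning is choosing the split that makes the resulting groups as pure as possible. The sketch below uses the Gini impurity criterion (as in CART; other algorithms use entropy-based measures instead) on a single numeric feature. The glucose values and yes/no labels are hypothetical; a full tree would apply `best_split` again to each side until the groups are pure or a stopping rule is met.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(xs, ys):
    """Find the threshold on one numeric feature that minimises weighted Gini impurity."""
    best = (None, gini(ys))  # (threshold, impurity); None means "no split"
    values = sorted(set(xs))
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2  # candidate threshold midway between adjacent distinct values
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(ys)
        if score < best[1]:
            best = (t, score)
    return best

# Hypothetical data: one feature (a glucose reading) and a categorical yes/no target.
glucose = [85, 89, 95, 137, 148, 160]
diabetic = ["no", "no", "no", "yes", "yes", "yes"]
threshold, impurity = best_split(glucose, diabetic)
print(threshold, impurity)  # 116.0 0.0
```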

Decision trees are of two types, depending on the type of target variable:

Figure 2.5: Types of Decision Tree (Categorical Variable Decision Tree, Continuous Variable Decision Tree)

Categorical Variable Decision Tree: A decision tree that has a categorical target
variable is called a categorical variable decision tree. Example: a tree whose target
variable is "Will it rain today?", with the values YES or NO.

Continuous Variable Decision Tree: A decision tree that has a continuous target variable
is called a continuous variable decision tree. Example: the salary of a person.

Support Vector Machine Model: A Support Vector Machine (SVM) searches for so-called
support vectors, which are data points that lie at the edge of a region in space forming
the boundary between one class of points and another. In SVM terminology, the space
between the regions containing data points of different classes is called the margin
between those classes. The support vectors are used to identify a hyperplane (in many
dimensions; a line in the case of two-dimensional data) that separates the classes with
the largest possible margin [6].
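For a linearly separable two-class problem, the hyperplane and margin described above are commonly formalised as the hard-margin SVM below. The symbols are introduced here for illustration: w is the normal vector of the hyperplane, b its offset, and (x_i, y_i) the training points with class labels y_i in {-1, +1}.

```latex
\begin{aligned}
&\text{Separating hyperplane:} && \mathbf{w}^{\top}\mathbf{x} + b = 0\\
&\text{Margin width:} && \frac{2}{\lVert \mathbf{w} \rVert}\\
&\text{Training problem:} && \min_{\mathbf{w},\,b}\ \tfrac{1}{2}\lVert \mathbf{w} \rVert^{2}
\quad \text{subject to} \quad y_i\,(\mathbf{w}^{\top}\mathbf{x}_i + b) \ge 1 \ \ \text{for all } i
\end{aligned}
```

The support vectors are the training points for which the constraint holds with equality, and a new point x is classified by the sign of w^T x + b. Minimising the norm of w is the same as maximising the margin width.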

Figure 2.6: Model of Support Vector Machine (axes: X-Axis, Y-Axis)

Artificial neural network

An artificial neural network (ANN) is a prediction algorithm that uses a learning rate and
momentum to classify data accurately. An ANN predicts the output by adjusting weights.
It consists of three layers: an input layer, a hidden layer, and an output layer.

Figure 2.7: Layers of Artificial Neural Network (Input Layer, Hidden Layer, Output Layer)
Back-propagation is an artificial neural network training algorithm in which each neuron
learns by adjusting the weights associated with it in order to correct or reduce the error.
It is a supervised learning algorithm that uses the gradient descent optimization algorithm
to adjust the weights of the neurons by computing the gradient of the loss function [6].
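The weight update that back-propagation performs at every neuron, gradient descent with a learning rate and momentum, can be seen in isolation on a single sigmoid neuron. This is a deliberate simplification: with no hidden layer there is nothing to propagate backwards, whereas a multilayer network applies the same update to every layer using gradients obtained by the chain rule. The logical-OR data, the cross-entropy loss, and the values `lr=0.5`, `momentum=0.9` and `epochs=2000` are assumptions for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(w, b, x):
    """Output class 1 if the neuron's activation is at least 0.5, else 0."""
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5 else 0

def train_neuron(X, y, lr=0.5, momentum=0.9, epochs=2000):
    """Full-batch gradient descent with momentum on one sigmoid unit (cross-entropy loss)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    vw, vb = [0.0] * d, 0.0  # momentum ("velocity") terms
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # For sigmoid + cross-entropy, dLoss/dz simplifies to (output - target).
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                gw[j] += err * xi[j] / n
            gb += err / n
        # Momentum: v <- momentum * v - lr * gradient, then w <- w + v.
        vw = [momentum * v - lr * g for v, g in zip(vw, gw)]
        vb = momentum * vb - lr * gb
        w = [wj + v for wj, v in zip(w, vw)]
        b += vb
    return w, b

X = [(0, 0), (0, 1), (1, 0), (1, 1)]
y = [0, 1, 1, 1]  # logical OR
w, b = train_neuron(X, y)
print([predict(w, b, x) for x in X])
```

The momentum term accumulates past gradients, so steps grow along directions where the gradient is consistent; that is the role the text attributes to momentum.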

Advantages of Artificial Neural Networks

This study chooses the ANN algorithm because of the following advantages:

1) Ability to classify nonlinear data and complex relationships.

2) High tolerance to noisy data and missing values.

3) Ability to classify patterns on which it has not been trained.

Clustering: Clustering is the process of grouping physical or abstract objects into
classes of similar objects, i.e., partitioning a set of data (or objects) into a set of
meaningful sub-classes called clusters. It is an unsupervised learning method: there are
no predefined classes. A good clustering technique generates high-quality clusters in
which intra-class similarity is high and inter-class similarity is low. The quality of a
clustering result depends on both the similarity measure used by the technique and its
implementation. The quality of a clustering technique is also measured by its ability to
discover some or all of the hidden patterns.
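The criterion of high intra-class similarity and low inter-class similarity can be checked numerically. The hypothetical helper below compares two groupings of the same four points by their average within-cluster and between-cluster Euclidean distance (a smaller distance meaning greater similarity). It is an illustration only; established indices such as the silhouette coefficient are normally used in practice.

```python
from itertools import combinations
from math import dist
from statistics import mean

def intra_inter_distances(clusters):
    """Return (mean within-cluster distance, mean between-cluster distance).
    Assumes every cluster has at least two points. A good clustering has a small
    first value (high intra-class similarity) and a large second value
    (low inter-class similarity)."""
    intra = [dist(p, q) for c in clusters for p, q in combinations(c, 2)]
    inter = [dist(p, q) for a, b in combinations(clusters, 2) for p in a for q in b]
    return mean(intra), mean(inter)

good = [[(1.0, 1.0), (1.0, 2.0)], [(8.0, 8.0), (8.0, 9.0)]]
bad = [[(1.0, 1.0), (8.0, 8.0)], [(1.0, 2.0), (8.0, 9.0)]]
print(intra_inter_distances(good))
print(intra_inter_distances(bad))
```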

Boosting: Boosting is one of the most important recent developments in classification
methodology. It works by applying a classification algorithm sequentially to reweighted
versions of the training dataset and then taking a weighted majority vote of the sequence
of classifiers thus produced. This simple strategy results in dramatic improvements in
performance for many classification algorithms. The phenomenon can be understood in
terms of well-known statistical principles, namely additive modelling on the logistic scale
using maximum Bernoulli likelihood as the criterion.
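The reweighting and weighted-vote steps described above are those of AdaBoost, the classic boosting algorithm for two classes. The sketch shows a single round and the final vote; the weak learner's predictions are passed in directly as hypothetical values, whereas the full algorithm trains a new weak learner on each updated weight vector and repeats for a chosen number of rounds, stopping if a learner's weighted error reaches 0.5.

```python
import math

def adaboost_round(weights, y_true, y_pred):
    """One AdaBoost reweighting step for labels in {-1, +1}.
    Returns the classifier's vote weight (alpha) and the renormalised sample weights."""
    error = sum(w for w, t, p in zip(weights, y_true, y_pred) if t != p)
    error = min(max(error, 1e-10), 1 - 1e-10)  # guard against log(0) and division by zero
    alpha = 0.5 * math.log((1 - error) / error)
    # t * p is -1 for a mistake, so misclassified samples are up-weighted.
    new = [w * math.exp(-alpha * t * p) for w, t, p in zip(weights, y_true, y_pred)]
    total = sum(new)
    return alpha, [w / total for w in new]

def weighted_vote(alphas, predictions):
    """Final boosted prediction: sign of the alpha-weighted sum of weak-learner votes."""
    score = sum(a * p for a, p in zip(alphas, predictions))
    return 1 if score >= 0 else -1

alpha, w = adaboost_round([0.25] * 4, [1, 1, -1, -1], [1, 1, -1, 1])
print(round(alpha, 4), [round(x, 4) for x in w])  # 0.5493 [0.1667, 0.1667, 0.1667, 0.5]
```

After the update the single misclassified sample carries half of the total weight, which forces the next weak learner to concentrate on it.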

Association Rule Mining: Association rule analysis is a technique to uncover how
items are associated with each other. Association rule mining means finding frequent
patterns, associations, correlations, or causal structures among sets of items in
transaction databases; for example, it reveals what customers buy together by finding
associations and correlations between the different items that customers place in their
baskets.
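The strength of an association rule A → C is usually measured by its support and confidence, and often also by its lift. The sketch below computes these measures over a handful of hypothetical shopping baskets; it assumes the antecedent occurs in at least one basket, since otherwise the confidence is undefined.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= set(t)) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent) = support(A and C) / support(A)."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)

def lift(transactions, antecedent, consequent):
    """Confidence divided by the consequent's support; values above 1 suggest positive association."""
    return confidence(transactions, antecedent, consequent) / support(transactions, consequent)

baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "butter", "milk"},
]
print(support(baskets, {"bread", "butter"}))       # 3 of 5 baskets
print(confidence(baskets, {"butter"}, {"bread"}))  # every butter basket also has bread
print(lift(baskets, {"butter"}, {"bread"}))
```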

Applications of association rule mining

1) Basket data analysis.

2) Cross-marketing.

3) Catalog design.

4) Loss-leader analysis.

2.5 Importance of Boosting Method

Boosting is a machine learning meta-algorithm for reducing bias (and also variance) in
supervised learning; it converts weak learners into a strong learner. Kearns and Valiant
posed the question: "Can a group of weak learners make a strong learner?" Here a weak
learner is defined as a classifier that is only slightly correlated with the true
classification (it labels examples better than random guessing). In contrast, a strong
learner is a classifier that is arbitrarily well correlated with the true classification.

2.6 Types of Classification Algorithms

Figure 2.8: Types of Classification Algorithms (Naïve Bayes, Support Vector Machine, Logistic Regression, Decision Tree, Random Forest, K-Means, Neural Network, Fuzzy k-NN, Genetic Algorithm)
