Data Mining
Tasks and Techniques
Data Mining Tasks
Descriptive
Classification and Prediction
Descriptive Function
• Deals with the general properties of data in the database.
➡ Class/Concept Description
➡ Mining of Frequent Patterns
➡ Mining of Associations
➡ Mining of Correlations
➡ Mining of Clusters
Class/Concept Description
• Data can be associated with classes or concepts
• Descriptions of a class or concept can be derived in the following ways
➡ Data Characterization - summarizes the data of the class under study. The class
under study is called the Target Class
➡ Data Discrimination - compares the target class with one or more
predefined contrasting groups or classes
Mining of Frequent Patterns
• Frequent patterns are patterns that occur frequently in transactional data
➡ Frequent Item Set - a set of items that frequently appear together
➡ Frequent Subsequence - a sequence of events that occurs frequently, such as
the purchase of a camera followed by the purchase of a memory card
➡ Frequent Substructure - substructure refers to different structural forms, such as
graphs, trees, or lattices, which may be combined with item sets or subsequences
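As a rough illustration of frequent item sets, the sketch below counts how often each pair of items appears together and keeps the pairs whose support (the fraction of transactions containing the pair) reaches a threshold. The transactions, the function name `frequent_itemsets`, and the 0.4 threshold are made up for this example. Practical miners such as Apriori prune candidates instead of counting every combination.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions (made-up data for illustration only).
transactions = [
    {"camera", "memory card", "tripod"},
    {"camera", "memory card"},
    {"milk", "bread"},
    {"camera", "memory card", "bread"},
    {"milk", "bread", "biscuits"},
]

def frequent_itemsets(transactions, size, min_support):
    """Return item sets of the given size whose support meets min_support.

    Support is the fraction of transactions that contain the item set.
    """
    counts = Counter()
    for basket in transactions:
        # Sort so that the same item set always produces the same tuple.
        for itemset in combinations(sorted(basket), size):
            counts[itemset] += 1
    n = len(transactions)
    return {items: c / n for items, c in counts.items() if c / n >= min_support}

frequent_pairs = frequent_itemsets(transactions, size=2, min_support=0.4)
print(frequent_pairs)
```

Only the pairs that appear in at least 40% of these toy baskets survive the filter.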
Mining of Associations
• Associations are used in retail to identify items that are frequently
purchased together.
• Association mining refers to uncovering the relationships among data
and determining the association rules
• Example: a retailer generates an association rule that shows that 70% of the
time milk is sold with bread and only 30% of the time biscuits are sold with
bread
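The percentages in the retail example can be read as rule confidence: of all baskets that contain bread, the share that also contain the other item. The sketch below assumes that reading and uses ten hypothetical baskets constructed to reproduce the 70% and 30% figures. The helper names `support` and `confidence` are illustrative.

```python
# Hypothetical baskets, constructed so the numbers match the example above.
transactions = [{"bread", "milk"}] * 7 + [{"bread", "biscuits"}] * 3

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    hits = sum(1 for basket in transactions if itemset <= basket)
    return hits / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Confidence of the rule antecedent -> consequent.

    Computed as support(antecedent and consequent) / support(antecedent).
    """
    return support(antecedent | consequent, transactions) / support(antecedent, transactions)

conf_milk = confidence({"bread"}, {"milk"}, transactions)
conf_biscuits = confidence({"bread"}, {"biscuits"}, transactions)
print(f"bread -> milk:     {conf_milk:.0%}")
print(f"bread -> biscuits: {conf_biscuits:.0%}")
```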
Mining of Correlation
• It is an additional analysis performed to uncover interesting statistical
correlations between associated attribute-value pairs or between item sets, to
determine whether they have a positive, negative, or no effect on each other
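One common way to quantify such a correlation between two items is lift, the ratio of their observed joint frequency to the frequency expected if they were independent. The slides do not name a specific measure, so lift is an assumption here, and the baskets below are made up. A lift above 1 suggests a positive correlation, below 1 a negative one, and a value near 1 suggests no effect.

```python
# Hypothetical baskets (made-up data for illustration only).
transactions = [
    {"milk", "bread"},
    {"milk", "bread"},
    {"milk", "bread"},
    {"milk", "bread", "tea"},
    {"tea"},
    {"coffee"},
    {"tea"},
    {"coffee"},
]

def frequency(items, transactions):
    """Fraction of transactions that contain every item in items."""
    return sum(1 for basket in transactions if items <= basket) / len(transactions)

def lift(a, b, transactions):
    """Lift of items a and b: P(a and b) / (P(a) * P(b))."""
    joint = frequency({a, b}, transactions)
    return joint / (frequency({a}, transactions) * frequency({b}, transactions))

lift_milk_bread = lift("milk", "bread", transactions)   # bought together often
lift_tea_coffee = lift("tea", "coffee", transactions)   # never bought together
print(lift_milk_bread, lift_tea_coffee)
```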
Mining of Clusters
• Cluster analysis forms groups of objects that are very similar to each
other but highly different from the objects in other clusters
Classification and Prediction
• Classification is the process of finding a model that describes and
distinguishes data classes or concepts
• The main purpose is to be able to use this model to predict the class of objects
whose class label is unknown.
• The derived model is based on the analysis of a set of training data and can
be presented in the following forms:
Classification (If-Then) Rules
Decision Trees
Mathematical Formulae
Neural Networks
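To make the first form concrete, here is a minimal sketch that derives a single if-then rule from labeled training data and then uses it to predict the class of an unlabeled object. The training set, the attribute (a customer's age), and the function names are hypothetical. The sketch also assumes that larger values indicate the "yes" class, which a real learner would not take for granted.

```python
# Hypothetical training data: (age, buys_product) pairs, made up for illustration.
training = [(22, "no"), (25, "no"), (31, "no"), (38, "yes"), (45, "yes"), (52, "yes")]

def learn_threshold(training, positive="yes"):
    """Pick the threshold t that minimizes training errors for the rule
    'IF value > t THEN positive ELSE not positive'."""
    values = sorted(v for v, _ in training)
    # Candidate thresholds are midpoints between consecutive distinct values.
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:]) if a != b]

    def errors(t):
        return sum((v > t) != (label == positive) for v, label in training)

    return min(candidates, key=errors)

def classify(value, threshold):
    """Apply the learned if-then rule to an object with unknown class label."""
    return "yes" if value > threshold else "no"

threshold = learn_threshold(training)
print(f"IF age > {threshold} THEN yes ELSE no")
print(classify(40, threshold), classify(30, threshold))
```

A decision tree can be seen as a nested set of such tests, one per internal node.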
Data Mining Techniques
Data mining includes the use of refined data analysis
tools to find previously unknown, valid patterns and
relationships in huge data sets. These tools can
incorporate statistical models, machine learning
techniques, and mathematical algorithms such as neural
networks and decision trees. Thus, data mining
incorporates both analysis and prediction
• Classification
➡ This technique is used to obtain
important and relevant information about
data and metadata
➡ This data mining technique helps
classify data into different classes
• Clustering
➡ It is the division of data into groups
of related objects
➡ Describing the data by a few clusters
loses certain fine details but
achieves simplification. It models
data by its clusters
➡ Cluster analysis is a data mining
technique to identify similar data
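The slides do not name a clustering algorithm. As one possible illustration, the sketch below uses k-means, which repeatedly assigns each point to its nearest centroid and then moves each centroid to the mean of its assigned points. The points and the starting centroids are made up, and the starting centroids are fixed so that the run is deterministic.

```python
import math

def kmeans(points, centroids, iterations=10):
    """Plain k-means on tuples of coordinates, starting from the given centroids.

    Returns the final centroids and the cluster assignment made in the
    last iteration.
    """
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            # Assign each point to the index of its nearest centroid.
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its points (unchanged if it has none).
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical 2-D points forming two visibly separate groups.
points = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 9.5), (8.5, 9)]
centroids, clusters = kmeans(points, centroids=[(0, 0), (10, 10)])
print(clusters)
```

Each cluster is then summarized by its centroid, which is the loss of fine detail in exchange for simplification described above.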
• Regression
➡ Regression analysis is a data mining
process used to identify and analyze the
relationship between variables, that is, how
one variable changes in the presence of another factor.
➡ Regression is primarily a form of
planning and modeling
➡ It estimates the relationship between two
or more variables in the given data set
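As a minimal sketch of the idea, the code below fits a straight line to two variables by ordinary least squares. The data points are made up, and a straight-line model is an assumption; the fit is an estimate of the relationship, not an exact description of it.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums around the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical observations (made-up data for illustration only).
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_line(xs, ys)
predicted = slope * 6 + intercept  # estimate y for an unseen x
print(round(slope, 2), round(intercept, 2), round(predicted, 2))
```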
• Association Rules
➡ This is a technique that helps discover
the link between two or more items.
➡ It finds hidden patterns in the data set
➡ Association rules are if-then statements that
show the probability of interactions
between data items within large data sets
in different types of databases
➡ Association rule mining has several
applications and is commonly used to
find correlations in sales data or medical
data sets
• Outlier Detection
➡ This technique relates to the observation
of data items in a data set that do not
match an expected pattern or behavior
➡ This technique may be used in various
domains such as intrusion detection and
fraud detection
➡ An outlier is a data point that diverges too
much from the rest of the data set
➡ It is valuable in fields like network
interruption identification and credit or debit
card fraud detection
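One simple way to flag points that diverge too much from the rest is a z-score test: measure how many standard deviations each value lies from the mean and flag those beyond a cutoff. The slides do not prescribe a method, so this is only one option. The readings and the cutoff of 2.0 are assumptions chosen for the example.

```python
import statistics

def zscore_outliers(values, cutoff=2.0):
    """Return the values whose absolute z-score exceeds the cutoff."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)  # population standard deviation
    if stdev == 0:
        return []  # all values identical, so nothing diverges
    return [v for v in values if abs(v - mean) / stdev > cutoff]

# Hypothetical readings with one value far from the rest.
readings = [10, 12, 11, 13, 12, 11, 95, 12]
outliers = zscore_outliers(readings)
print(outliers)
```

Because an extreme value inflates the mean and standard deviation it is judged against, robust variants based on the median are often preferred on small samples.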
• Sequential Patterns
➡ It is a data mining technique specialized
in evaluating sequential data to discover
sequential patterns
➡ It consists of finding interesting
subsequences in a set of sequences,
where the value of a sequence can be
measured in terms of different criteria
like length, occurrence frequency, etc.
➡ This technique helps to discover or
recognize similar patterns in transaction
data over a period of time
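The sketch below illustrates the occurrence-frequency criterion on the smallest interesting case: ordered pairs of items, where the first item appears somewhere before the second in a customer's purchase history. The purchase histories and the 0.5 support threshold are made up. Full sequential pattern miners handle longer subsequences and time constraints, which this sketch does not.

```python
from itertools import combinations

# Hypothetical purchase histories, one ordered list per customer.
sequences = [
    ["camera", "memory card", "tripod"],
    ["camera", "bag", "memory card"],
    ["memory card", "camera"],
    ["camera", "memory card"],
]

def frequent_ordered_pairs(sequences, min_support):
    """Return ordered pairs (a, b), with a occurring before b, whose support
    (fraction of sequences containing the pair in that order) meets min_support."""
    counts = {}
    for seq in sequences:
        # combinations() keeps positional order, so (a, b) means a came first.
        # The set ensures each pair is counted once per sequence.
        for pair in set(combinations(seq, 2)):
            counts[pair] = counts.get(pair, 0) + 1
    n = len(sequences)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}

patterns = frequent_ordered_pairs(sequences, min_support=0.5)
print(patterns)
```

Unlike a frequent item set, the pair is directional: a camera followed by a memory card is counted separately from a memory card followed by a camera.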
• Prediction
➡ This technique uses a combination of
other data mining techniques such as
trend analysis, clustering, classification, etc.
➡ It analyzes past events or instances in
the right sequence to predict future
events