2-Concept Hierarchy To Classification of DMS

The document discusses concept hierarchies, which organize data into levels of abstraction for improved analysis and understanding. It outlines various types of hierarchies, such as schema, set-grouping, operation-derived, and rule-based hierarchies, and their applications in data mining. Additionally, it covers OLAP operations, the KDD process, and the functionalities of data mining, including classification, regression, and association analysis.

Uploaded by

chris

Concept Hierarchy

• A concept hierarchy defines a sequence of mappings from a set of low-level concepts to higher-level, more general concepts.
• Organizing data hierarchically makes data analysis more efficient and effective.
• It allows drilling down to more specific levels of detail when needed.
• Use: to organize and classify data so that it is easier to understand and analyze.

• Main idea:
• The same data can be viewed at different levels of granularity (levels of detail).
• Organizing the data hierarchically makes it easier to understand and analyze.
Types of Concept Hierarchies
Schema Hierarchy
• Used to organize the schema of a database in a logical and meaningful way, grouping similar objects together.
• Formally, it is a total or partial order among attributes in the schema, e.g., street < city < province_or_state < country.
• Can be used to organize different types of data, such as tables, attributes, and relationships.
• Useful in data warehousing, where data from multiple sources needs to be integrated into a single database.
Types of Concept Hierarchies
Set-Grouping Hierarchy

• Based on set theory.
• Each set in the hierarchy is defined in terms of its membership in other sets.
• Can be used for data cleaning, data pre-processing, and data integration.
• Can be used to:
• identify and remove outliers, noise, or inconsistencies from the data;
• integrate data from multiple sources.
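A set-grouping hierarchy can be sketched as a mapping from raw values to named groups. The example below is only an illustration: the age ranges, the group names, and the function `generalize_age` are assumptions, not taken from the slides.

```python
# Hypothetical set-grouping hierarchy for an "age" attribute.
# The ranges and group names are illustrative assumptions.
AGE_GROUPS = [
    (range(0, 20), "youth"),
    (range(20, 40), "young_adult"),
    (range(40, 60), "middle_aged"),
    (range(60, 130), "senior"),
]


def generalize_age(age: int) -> str:
    """Map a raw age (low-level concept) to its group (higher-level concept)."""
    for ages, label in AGE_GROUPS:
        if age in ages:
            return label
    raise ValueError(f"age out of supported range: {age}")


ages = [18, 25, 47, 63]
groups = [generalize_age(a) for a in ages]
print(groups)
```

Each group is itself a member of the top-level set of all ages, which is what gives the hierarchy its levels.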
Types of Concept Hierarchies
Operation-Derived Hierarchy
• Organizes data by applying a series of operations or transformations to the data.
• The operations are applied in a top-down fashion.
• Each level of the hierarchy represents a more general or abstract view of the data than the level below it.
• Typically used in data mining tasks such as clustering and dimensionality reduction.
• The operations applied can be mathematical or statistical, such as aggregation and normalization.
• Eg: email address: login name < department < university < country
• [email protected]
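The email example can be sketched as a decoding operation that derives the hierarchy levels from the string itself. This is a minimal sketch that assumes every address has the form login@department.university.country; the address used below and the function `email_hierarchy` are hypothetical.

```python
def email_hierarchy(address: str) -> dict:
    """Split an address of the assumed form login@department.university.country."""
    login, _, domain = address.partition("@")
    parts = domain.split(".")
    if not login or len(parts) != 3:
        raise ValueError("expected login@department.university.country")
    department, university, country = parts
    # Levels are listed from most specific to most general.
    return {
        "login_name": login,
        "department": department,
        "university": university,
        "country": country,
    }


# Hypothetical address, used only for illustration.
levels = email_hierarchy("jdoe@cs.someuniv.ca")
print(levels)
```

Real addresses do not always follow this pattern, so a production decoder would need rules for other domain layouts.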
Types of Concept Hierarchies
Rule-based Hierarchy
• Used to organize data by applying a set of rules or conditions to the data.
• Useful in data mining tasks such as classification, decision-making, and data
exploration.
• It allows a class label or decision to be assigned to each data point based on its characteristics.
• Identifies patterns and relationships between different attributes of the data.
Need of Concept Hierarchy in Data Mining
• There are several reasons why a concept hierarchy is useful in data mining:
1. Improved Data Analysis
2. Improved Data Visualization and Exploration
3. Improved Algorithm Performance
4. Data Cleaning and Pre-processing
5. Domain Knowledge
Applications of Concept Hierarchy
There are several applications of concept hierarchy in data mining, for example:
• Data Warehousing
• Business Intelligence
• Online Retail
• Healthcare
• Natural Language Processing
• Fraud Detection
OLAP Operations
• Online Analytical Processing (OLAP) provides a user-friendly environment for interactive data analysis.
• In the multidimensional model, data are organized into multiple
dimensions, and each dimension contains multiple levels of
abstraction defined by concept hierarchies.
OLAP operations
• ROLL-UP (aka DRILL-UP): summarize data
• DRILL-DOWN (aka ROLL-DOWN): reverse of roll-up
• SLICING AND DICING: project and select
• PIVOT (ROTATE): reorient the cube

• Additional operations:
• Drill-across
• Drill-through
Roll Up/Drill Up/Aggregation
• Performs aggregation on a data cube, either by climbing up a concept
hierarchy for a dimension or by dimension reduction.
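Both forms of roll-up can be sketched on a tiny fact table. The sales figures, the city-to-country hierarchy, and the function names below are assumptions made for illustration only.

```python
from collections import defaultdict

# Hypothetical sales facts: (city, quarter, item, units_sold).
SALES = [
    ("Toronto", "Q1", "mobile", 605),
    ("Toronto", "Q2", "modem", 680),
    ("Vancouver", "Q1", "mobile", 825),
    ("Chicago", "Q1", "modem", 440),
    ("New York", "Q2", "mobile", 1560),
]

# Assumed concept hierarchy for the location dimension: city -> country.
CITY_TO_COUNTRY = {
    "Toronto": "Canada",
    "Vancouver": "Canada",
    "Chicago": "USA",
    "New York": "USA",
}


def roll_up_location(facts):
    """Climb the location hierarchy from city to country, summing units."""
    totals = defaultdict(int)
    for city, quarter, item, units in facts:
        totals[(CITY_TO_COUNTRY[city], quarter, item)] += units
    return dict(totals)


def roll_up_drop_time(facts):
    """Dimension reduction: aggregate the time dimension away entirely."""
    totals = defaultdict(int)
    for city, _quarter, item, units in facts:
        totals[(city, item)] += units
    return dict(totals)


by_country = roll_up_location(SALES)
without_time = roll_up_drop_time(SALES)
print(by_country)
print(without_time)
```

Drill-down is the inverse navigation: it needs the detailed facts to still be available, since the aggregated totals alone cannot be split back into cities or quarters.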
Roll Down/Drill-down
• Drill-down is the reverse of roll-up.
• Drill-down is like zooming-in on the data cube.
• It navigates from less detailed data to more detailed data.
• Drill-down can be realized by either stepping down a concept
hierarchy for a dimension or introducing additional dimensions.
Slice
• A slice is a subset of the cube corresponding to a single value for one or more members of a dimension.
• Eg: a customer wants a selection on one dimension of a three-dimensional cube, resulting in a two-dimensional slice.
• The slice operation performs a selection on one dimension of the given cube, resulting in a subcube.
• Eg: a slice operation in which the sales data are selected from the central cube for the dimension time using the criterion time = “Q1”.
Dice
• The dice operation defines a subcube by performing a selection on two or more dimensions.
• Eg: a dice operation on the central cube based on the following selection criteria, which involve three dimensions:
(location = “Toronto” or “Vancouver”)
and (time = “Q1” or “Q2”)
and (item = “mobile” or “modem”).
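Slice and dice can both be sketched as filters over cube cells. The cells, unit counts, and function names below are hypothetical; only the selection criteria follow the examples in the slides.

```python
# Hypothetical cube cells: one dict per (location, time, item) combination.
CUBE = [
    {"location": "Toronto", "time": "Q1", "item": "mobile", "units": 605},
    {"location": "Toronto", "time": "Q3", "item": "modem", "units": 680},
    {"location": "Vancouver", "time": "Q2", "item": "modem", "units": 825},
    {"location": "Chicago", "time": "Q1", "item": "mobile", "units": 440},
    {"location": "Vancouver", "time": "Q1", "item": "security", "units": 400},
]


def slice_cube(cube, dimension, value):
    """Select on ONE dimension; the fixed dimension is dropped from the result."""
    return [
        {k: v for k, v in cell.items() if k != dimension}
        for cell in cube
        if cell[dimension] == value
    ]


def dice_cube(cube, criteria):
    """Select on TWO OR MORE dimensions; criteria maps dimension -> allowed values."""
    return [
        cell
        for cell in cube
        if all(cell[dim] in allowed for dim, allowed in criteria.items())
    ]


q1_slice = slice_cube(CUBE, "time", "Q1")
diced = dice_cube(
    CUBE,
    {
        "location": {"Toronto", "Vancouver"},
        "time": {"Q1", "Q2"},
        "item": {"mobile", "modem"},
    },
)
print(q1_slice)
print(diced)
```

The slice result has one dimension fewer than the cube, while the dice result keeps all three dimensions but with restricted value ranges.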
Pivot

• The pivot operation is also called a rotation.
• Pivot is a visualization operation.
• It rotates the data axes in view to provide an alternative presentation of the data.
• It may swap the rows and columns, or move one of the row dimensions into the column dimensions.
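For a two-dimensional view, swapping rows and columns amounts to a transpose. The sketch below uses a hypothetical nested-dict table; the figures and the function name `pivot` are assumptions.

```python
# Hypothetical 2-D view of a cube: rows are locations, columns are items.
TABLE = {
    "Toronto": {"mobile": 605, "modem": 680},
    "Vancouver": {"mobile": 825, "modem": 512},
}


def pivot(table):
    """Swap the row and column axes of a nested-dict table."""
    rotated = {}
    for row_key, columns in table.items():
        for col_key, value in columns.items():
            rotated.setdefault(col_key, {})[row_key] = value
    return rotated


by_item = pivot(TABLE)
print(by_item)
```

The values are unchanged; only the presentation differs, which is why pivot is described as a visualization operation.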
Other OLAP Operations
• Drill-across executes queries involving (i.e., across) more than one
fact table.
• The drill-through operation uses relational SQL facilities to drill
through the bottom level of a data cube down to its back-end relational
tables.
Introduction to KDD process
KDD: Knowledge Discovery in Databases
1. Data cleaning (to remove noise and
inconsistent data)
2. Data integration (where multiple data
sources may be combined)
3. Data selection (where data relevant to the
analysis task are retrieved from the
database)
4. Data transformation (where data are
transformed and consolidated into forms
appropriate for mining by performing
summary or aggregation operations)
5. Data mining (an essential process where
intelligent methods are applied to extract
data patterns)
6. Pattern evaluation (to identify the truly
interesting patterns representing knowledge
based on interestingness measures)
7. Knowledge presentation (where
visualization and knowledge representation
techniques are used to present mined
knowledge to users)
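The seven steps can be sketched as a chain of small functions. Everything in this example is an assumption made for illustration: the two record sources, the choice of co-occurring item pairs as the mined pattern, and the 60% support threshold.

```python
from collections import Counter

# Two hypothetical sources of (customer, item) records; None marks a noisy record.
SOURCE_A = [("c1", "milk"), ("c1", "bread"), ("c2", None), ("c2", "milk")]
SOURCE_B = [("c2", "bread"), ("c3", "milk"), ("c3", "bread"), ("c3", "pen")]


def clean(records):  # 1. data cleaning: drop records with a missing item
    return [r for r in records if r[1] is not None]


def integrate(*sources):  # 2. data integration: combine the sources
    return [r for source in sources for r in source]


def select(records, items):  # 3. data selection: keep task-relevant items only
    return [r for r in records if r[1] in items]


def transform(records):  # 4. data transformation: one basket (set) per customer
    baskets = {}
    for customer, item in records:
        baskets.setdefault(customer, set()).add(item)
    return baskets


def mine(baskets):  # 5. data mining: count how often each item pair co-occurs
    pairs = Counter()
    for items in baskets.values():
        ordered = sorted(items)
        for i, first in enumerate(ordered):
            for second in ordered[i + 1:]:
                pairs[(first, second)] += 1
    return pairs


def evaluate(pairs, n_baskets, min_support):  # 6. pattern evaluation
    return {p: c / n_baskets for p, c in pairs.items() if c / n_baskets >= min_support}


records = integrate(clean(SOURCE_A), clean(SOURCE_B))
baskets = transform(select(records, {"milk", "bread", "pen"}))
patterns = evaluate(mine(baskets), len(baskets), min_support=0.6)
for pair, support in patterns.items():  # 7. knowledge presentation
    print(f"{pair[0]} + {pair[1]}: support {support:.0%}")
```

In practice the steps are iterative: the result of pattern evaluation often sends the analyst back to selection or transformation.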
Advantages of KDD
• Improves decision-making
• Increased efficiency
• Better customer service
• Fraud detection
• Predictive modeling
Disadvantages of KDD

• Privacy concerns
• Complexity
• Unintended consequences
• Data Quality
• High cost
• Overfitting
Data Mining
Definition
Data mining is the process of discovering interesting patterns and
knowledge from large amounts of data.
• The data sources can include databases, data warehouses, the Web, other
information repositories, or data that are streamed into the system
dynamically.
Key Outcomes of Data Mining
• Automatic discovery of patterns
• Prediction of likely outcomes
• Creation of actionable information
• Focus on large datasets and databases
What is Data Mining?
• The process of extracting knowledge or insights from large amounts of data
using various statistical and computational techniques.
• The data can be structured, semi-structured or unstructured.
• Data can be stored in various forms such as databases, data warehouses,
and data lakes.
• Primary goal - to discover hidden patterns and relationships in the data
that can be used to make informed decisions or predictions.
• How? – By exploring the data using various techniques such as clustering,
classification, regression analysis, association rule mining, and anomaly
detection.
• Applications- marketing, finance, healthcare, and telecommunications.
• Eg: in marketing, data mining can be used to identify customer segments
and target marketing campaigns, while in healthcare, it can be used to
identify risk factors for diseases and develop personalized treatment plans.
Alternative names for Data Mining
1. Knowledge discovery (mining) in databases (KDD)
2. Knowledge extraction
3. Data/pattern analysis
4. Data archaeology
5. Data dredging
6. Information harvesting
7. Business intelligence
Data Mining on what kinds of data?
• Flat Files
• Relational Databases
• Data Warehouse
• Transactional Database
• Multimedia Database
• Spatial Database
• Time-series database
• WWW
Parameter: KDD vs. Data Mining

Definition
• KDD: KDD refers to a process of identifying valid, novel, potentially useful, and ultimately understandable patterns and relationships in data.
• Data Mining: Data mining refers to a process of extracting useful and valuable information or patterns from large data sets.

Objective
• KDD: To find useful knowledge from data.
• Data Mining: To extract useful information from data.

Techniques Used
• KDD: Data cleaning, data integration, data selection, data transformation, data mining, pattern evaluation, and knowledge representation and visualization.
• Data Mining: Association rules, classification, clustering, regression, decision trees, neural networks, and dimensionality reduction.

Output
• KDD: Structured information, such as rules and models, that can be used to make decisions or predictions.
• Data Mining: Patterns, associations, or insights that can be used to improve decision-making or understanding.

Focus
• KDD: Focus is on the discovery of useful knowledge, rather than simply finding patterns in data.
• Data Mining: Focus is on the discovery of patterns or relationships in data.

Role of domain expertise
• KDD: Domain expertise is important in KDD, as it helps in defining the goals of the process, choosing appropriate data, and interpreting the results.
• Data Mining: Domain expertise is less critical in data mining, as the algorithms are designed to identify patterns without relying on prior knowledge.
Data Mining Functionalities
• Data mining functionalities specify the kind of patterns to be found in
data mining tasks.
• In general, data mining tasks can be classified into two categories:
descriptive and predictive.
• Descriptive mining tasks characterize the general properties of the
data in the target data set.
• Predictive mining tasks perform inference on the current data in
order to make predictions.
Concept/Class Description
Data can be associated with classes or concepts.
Class : A collection of things sharing a common attribute
Classes of items for sale include computers and printers
Concept: An abstract or general idea inferred or derived from specific instances
Concepts of customers include bigSpenders and budgetSpenders.
Summarized, concise and precise descriptions of individual classes and concepts are
called class/concept descriptions.
These descriptions can be derived using
(1) data characterization, by summarizing the data of the class under study
(often called the target class) in general terms, or
(2) data discrimination, by comparison of the target class with one or a set of
comparative classes (often called the contrasting classes), or
(3) both data characterization and discrimination.
Concept/Class Description
• Data characterization is a summarization of the general characteristics or
features of a target class of data.
• The data corresponding to the user-specified class are typically collected by a
query.

For example, to study the characteristics of software products with sales that
increased by 10% in the previous year, the data related to such products can be
collected by executing an SQL query on the sales database.
• Simple data summaries can be done based on statistical measures and plots.
• The data cube–based OLAP roll-up operation can be used to perform data
summarization along a specified dimension.
• An attribute-oriented induction technique can be used to perform data
generalization and characterization without step-by-step user interaction
Concept/Class Description
• The output of data characterization can be presented in various forms.
• Eg: pie charts, bar charts, curves, multidimensional data cubes, and
multidimensional tables, including crosstabs.

• The resulting descriptions can also be presented as generalized relations or in rule form
(called characteristic rules).
Eg: Data characterization
A customer relationship manager at AllElectronics may order the following data
mining task:
“Summarize the characteristics of customers who spend more than $5000 a year at
AllElectronics.”

The result is a general profile of these customers, such as that they are 40 to 50
years old, employed, and have excellent credit ratings.

The data mining system should allow the customer relationship manager to drill
down on any dimension, such as on occupation to view these customers according
to their type of employment.
Data discrimination
• Data discrimination is a comparison of the general features of the target class
data objects against the general features of objects from one or multiple
contrasting classes.
• The target and contrasting classes can be specified by a user, and the
corresponding data objects can be retrieved through database queries.
• For example, a user may want to compare the general features of software products with sales
that increased by 10% last year against those with sales that decreased by at least 30%
during the same period.
• The methods used for data discrimination are similar to those used for data
characterization.
Data discrimination
• The forms of output presentation are similar to those for characteristic
descriptions, although discrimination descriptions should include comparative
measures that help to distinguish between the target and contrasting classes.
• Discrimination descriptions expressed in the form of rules are referred to as
discriminant rules.
Eg:Data discrimination
A customer relationship manager at AllElectronics may want to compare two groups of
customers—those who shop for computer products regularly (e.g., more than twice a
month) and those who rarely shop for such products (e.g., less than three times a year).
The resulting description provides a general comparative profile of these customers, such
as that
• 80% of the customers who frequently purchase computer products are between 20 and
40 years old and have a university education,
• whereas 60% of the customers who infrequently buy such products are either seniors
or youths, and have no university degree.
Drilling down on a dimension like occupation, or adding a new dimension like income
level, may help to find even more discriminative features between the two classes.
Mining Frequent Patterns
Frequent patterns are patterns that occur frequently in data.

Frequent itemset: a set of items that often appear together in a transactional data set.
Eg: milk and bread, which are frequently bought together in grocery stores by many customers.
Sequential pattern: a frequently occurring subsequence.
Eg: customers tend to purchase first a laptop, followed by a digital camera, and then a memory card.
Frequent substructure: a substructure can refer to different structural forms (e.g., graphs, trees, or lattices) that may be
combined with itemsets or subsequences. If a substructure occurs frequently, it is called a (frequent)
structured pattern.

Mining frequent patterns leads to the discovery of interesting associations and correlations within
data.
Frequent itemset mining is a fundamental form of frequent pattern mining.
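A brute-force version of frequent itemset mining can be sketched in a few lines. The baskets and the 40% minimum support below are hypothetical, and the exhaustive enumeration is only for illustration; practical miners prune candidate itemsets instead of counting every combination.

```python
from collections import Counter
from itertools import combinations

# Hypothetical grocery transactions; each set is one customer basket.
TRANSACTIONS = [
    {"milk", "bread", "eggs"},
    {"milk", "bread"},
    {"bread", "butter"},
    {"milk", "bread", "butter"},
    {"eggs"},
]


def frequent_itemsets(transactions, size, min_support):
    """Return itemsets of the given size whose support meets min_support."""
    counts = Counter()
    for basket in transactions:
        # sorted() gives each itemset one canonical tuple form.
        for itemset in combinations(sorted(basket), size):
            counts[itemset] += 1
    n = len(transactions)
    return {items: c / n for items, c in counts.items() if c / n >= min_support}


frequent_pairs = frequent_itemsets(TRANSACTIONS, size=2, min_support=0.4)
print(frequent_pairs)
```

Here bread and milk appear together in three of the five baskets, so the pair clears the threshold, matching the milk-and-bread example in the slide.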
Association analysis.
Suppose that, as a marketing manager at AllElectronics, you want to know which
items are frequently purchased together (i.e., within the same transaction).

An example of such a rule, mined from the AllElectronics transactional database, is:

buys(X, “computer”) ⇒ buys(X, “software”) [support = 1%, confidence = 50%]

where X is a variable representing a customer.

A confidence, or certainty, of 50% means that if a customer buys a computer, there
is a 50% chance that she will buy software as well.
A 1% support means that 1% of all the transactions under analysis show that
computer and software are purchased together.
Association analysis.
• The association rule involves a single attribute or predicate (i.e., buys) that
repeats.
• Association rules that contain a single predicate are referred to as single-
dimensional association rules.
• Dropping the predicate notation, the rule can be written simply as computer ⇒ software [1%, 50%].
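Example: Multidimensional Association rules
Given the AllElectronics relational database related to purchases, a data mining
system may find association rules like

age(X, “20..29”) ∧ income(X, “40K..49K”) ⇒ buys(X, “laptop”) [support = 2%, confidence = 60%]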
Example: Multidimensional Association rules
Of the AllElectronics customers under study
• 2% are 20 to 29 years old with an income of $40,000 to $49,000 and have
purchased a laptop (computer) at AllElectronics.
• There is a 60% probability that a customer in this age and income group will
purchase a laptop.

• This is an association involving more than one attribute or predicate (i.e., age, income, and buys).
• Each attribute is referred to as a dimension, so such a rule is called a multidimensional association rule.
Classification and Regression for Predictive Analysis

Classification

• Classification is the process of finding a model (or function) that describes and
distinguishes data classes or concepts.
• The model is derived based on the analysis of a set of training data (i.e., data objects for
which the class labels are known).
• The model is used to predict the class label of objects for which the class label is
unknown.
• How is the derived model presented?
• classification rules (i.e., IF-THEN rules)
• Decision trees
• Mathematical formulae
• neural networks
Decision tree
• A flowchart-like tree structure
• Each node denotes a test on an attribute value
• Each branch represents an outcome of the test
• Tree leaves represent classes or class distributions.
• Decision trees can easily be converted to classification rules.
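The conversion from a decision tree to classification rules can be sketched as a walk over every root-to-leaf path. The tree below is written by hand for illustration; its attributes, outcomes, and class labels are hypothetical and not learned from data.

```python
# A hand-written decision tree; attributes, values, and labels are hypothetical.
# An internal node is (attribute, {outcome: subtree}); a leaf is a class label.
TREE = ("age", {
    "youth": ("income", {"high": "good_response", "low": "no_response"}),
    "middle_aged": "mild_response",
    "senior": "no_response",
})


def classify(tree, record):
    """Follow the branch matching each tested attribute until a leaf is reached."""
    while not isinstance(tree, str):
        attribute, branches = tree
        tree = branches[record[attribute]]
    return tree


def to_rules(tree, conditions=()):
    """Convert every root-to-leaf path into one IF-THEN classification rule."""
    if isinstance(tree, str):
        premise = " AND ".join(f"{a} = {v}" for a, v in conditions) or "TRUE"
        return [f"IF {premise} THEN class = {tree}"]
    attribute, branches = tree
    rules = []
    for outcome, subtree in branches.items():
        rules.extend(to_rules(subtree, conditions + ((attribute, outcome),)))
    return rules


rules = to_rules(TREE)
label = classify(TREE, {"age": "youth", "income": "high"})
for rule in rules:
    print(rule)
```

Each leaf yields exactly one rule, so the rule set and the tree classify every record identically.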

Neural network
• used for classification
• A collection of neuron-like processing units with weighted connections between the units.

Other Classification Models:

Naïve Bayesian classification
Support Vector Machines
k-nearest-neighbor classification.
Regression

• Regression models continuous-valued functions.

• Used to predict missing or unavailable numerical data values rather than
(discrete) class labels.
• The term prediction covers both numeric prediction and class label prediction.
• Regression analysis - a statistical methodology that is most often used for
numeric prediction.
• Regression also encompasses the identification of distribution trends based on the
available data.
Classification and Regression for Predictive Analysis

Relevance analysis
• Classification and regression may need to be preceded by relevance analysis.
• Attempts to identify attributes that are significantly relevant to the classification
and regression process.
• Other attributes, which are irrelevant, can then be excluded from consideration.
Eg: Classification
Suppose that, as a sales manager of AllElectronics, you want to classify a large set of
items in the store, based on three kinds of responses to a sales campaign: good
response, mild response, and no response.
Derive a model for each of these three classes based on the descriptive features of
the items, such as price, brand, place made, type, and category.
The resulting classification should maximally distinguish each class from the others,
presenting an organized picture of the data set.
Eg: Regression
• Predict the amount of revenue that each item will generate during an upcoming
sale at AllElectronics, based on the previous sales data.
• This is an example of regression analysis because the regression model constructed will
predict a continuous function (or ordered value).
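Numeric prediction can be sketched with ordinary least squares on a single predictor. The slides do not specify a model, so the linear form, the discount-versus-revenue data, and the function `fit_line` are all assumptions.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (one predictor)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical history: discount percentage vs. revenue (in $1000s) in past sales.
discounts = [5, 10, 15, 20]
revenues = [12.0, 17.0, 22.0, 27.0]

slope, intercept = fit_line(discounts, revenues)
predicted = slope * 25 + intercept  # numeric prediction for an unseen discount
print(slope, intercept, predicted)
```

The output is a continuous value rather than a class label, which is the distinction the slide draws between regression and classification.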
Cluster Analysis
• Clustering analyzes data objects without consulting class labels.
• Clustering can be used to generate class labels for a group of data.
• The objects are clustered or grouped based on the principle of maximizing the
intraclass similarity and minimizing the interclass similarity.
• Objects within a cluster have high similarity in comparison to one another, but are
rather dissimilar to objects in other clusters.
• Each cluster so formed can be viewed as a class of objects, from which rules can
be derived.
• Clustering facilitates taxonomy formation, that is, the organization of observations into
a hierarchy of classes that group similar events together.
Example
• Cluster analysis can be performed on AllElectronics customer data to identify
homogeneous subpopulations of customers.
• These clusters may represent individual target groups for marketing.
• Three clusters of data points are evident.
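Grouping objects without class labels can be sketched with k-means, one common clustering method chosen here for illustration; the slides do not name a specific algorithm. The customer points, the starting centroids, and the iteration count are assumptions.

```python
import math


def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then recompute."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: math.dist(p, centroids[i]),
            )
            clusters[nearest].append(p)
        # Recompute each centroid as the mean of its cluster;
        # keep the old centroid if a cluster ends up empty.
        centroids = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster)) if cluster else old
            for cluster, old in zip(clusters, centroids)
        ]
    return clusters, centroids


# Hypothetical customers as (age, yearly spend in $1000s), left unscaled for brevity.
customers = [
    (22, 1.0), (25, 1.5), (24, 1.2),
    (45, 6.0), (48, 6.5), (47, 5.8),
    (70, 2.0), (68, 2.4),
]
clusters, centers = kmeans(customers, centroids=[(20, 1.0), (50, 6.0), (70, 2.0)])
print(clusters)
```

No labels are supplied; the three groups emerge from distances alone, and each resulting cluster could then be treated as a class.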
Outlier Analysis
• A data set may contain objects that do not comply with the general behavior or model of
the data. These objects are outliers.
• Many data mining methods discard outliers as noise or exceptions.
• In some applications the rare events can be more interesting than the more regularly
occurring ones.
• The analysis of outlier data is referred to as outlier analysis or anomaly mining.
• Detected using:
• statistical tests that assume a distribution or probability model for the data
• distance measures where objects that are remote from any other cluster are considered
outliers.
• Density-based methods may identify outliers in a local region, although they look
normal from a global statistical distribution view.
Example -Outlier analysis
• Outlier analysis may uncover fraudulent usage of credit cards by detecting
purchases of unusually large amounts for a given account number in comparison
to regular charges incurred by the same account.
• Outlier values may also be detected with respect to the locations and types of
purchase, or the purchase frequency.
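The credit card example can be sketched with a simple statistical test: the z-score, which assumes the regular charges are roughly normally distributed. The charge amounts and the threshold of 2.0 are hypothetical choices for illustration.

```python
import statistics


def zscore_outliers(values, threshold=2.0):
    """Return the values lying more than `threshold` standard deviations from the mean."""
    mean = statistics.mean(values)
    spread = statistics.pstdev(values)
    if spread == 0:
        return []  # all values identical, so nothing stands out
    return [v for v in values if abs(v - mean) / spread > threshold]


# Hypothetical charges for one account; the last one is unusually large.
charges = [42.0, 55.5, 38.2, 61.0, 47.3, 52.8, 44.1, 58.9, 49.6, 2500.0]
outliers = zscore_outliers(charges)
print(outliers)
```

A large outlier inflates the mean and standard deviation it is measured against, so robust statistics or the distance-based and density-based methods listed in the slides may be preferable on real data.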
Are All Patterns Interesting?
• No—only a small fraction of the patterns potentially generated would
actually be of interest to a given user.

“What makes a pattern interesting?
Can a data mining system generate all of the interesting patterns?
Can the system generate only the interesting ones?”
What makes a pattern interesting?

A pattern is interesting if it is
(1) easily understood by humans
(2) valid on new or test data with some degree of certainty
(3) potentially useful
(4) novel.

A pattern is also interesting if it validates a hypothesis that the user sought to confirm.
An interesting pattern represents knowledge.
Objective measures of pattern interestingness
• Based on the structure of discovered patterns and the statistics underlying them.

• An objective measure for association rules of the form X ⇒ Y is rule support.
• It represents the percentage of transactions from a transaction database that the given rule
satisfies.
• This is taken to be the probability P(X ∪ Y), where X ∪ Y indicates that a transaction
contains both X and Y, that is, the union of itemsets X and Y.

• Another objective measure is confidence, which assesses the degree of certainty of the detected association.
• This is taken to be the conditional probability P(Y | X), that is, the probability that a
transaction containing X also contains Y.
• More formally, support and confidence are defined as
support(X ⇒ Y) = P(X ∪ Y)
confidence(X ⇒ Y) = P(Y | X)
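Support and confidence for a rule X ⇒ Y can be computed directly from their probabilistic definitions. The transactions below are hypothetical, and the function names are chosen for illustration.

```python
# Hypothetical transactions at an electronics store.
TRANSACTIONS = [
    {"computer", "software"},
    {"computer"},
    {"computer", "software", "printer"},
    {"printer"},
    {"computer", "printer"},
]


def support(transactions, x, y):
    """P(X ∪ Y): fraction of transactions containing every item of X and of Y."""
    both = x | y
    return sum(1 for t in transactions if both <= t) / len(transactions)


def confidence(transactions, x, y):
    """P(Y | X): among transactions containing X, the fraction also containing Y."""
    with_x = [t for t in transactions if x <= t]
    if not with_x:
        raise ValueError("no transaction contains X")
    return sum(1 for t in with_x if y <= t) / len(with_x)


s = support(TRANSACTIONS, {"computer"}, {"software"})
c = confidence(TRANSACTIONS, {"computer"}, {"software"})
print(f"support = {s:.0%}, confidence = {c:.0%}")
```

Two of the five transactions contain both items, and two of the four transactions with a computer also contain software, which gives the two measures their different denominators.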
Objective measures of pattern interestingness
• Accuracy: the percentage of data that are correctly classified by a rule.
• Coverage: similar to support, the percentage of data to which a rule applies.
• Although objective measures help identify interesting patterns, they are often
insufficient unless combined with subjective measures that reflect a particular
user’s needs and interests.
• For example, patterns describing the characteristics of customers who shop
frequently at AllElectronics should be interesting to the marketing manager, but
may be of little interest to other analysts studying the same database for patterns
on employee performance.
• Many patterns that are interesting by objective standards may represent common
sense and, therefore, are actually uninteresting.
Subjective interestingness measures
• Based on user beliefs in the data.
• These measures find patterns interesting if the patterns are unexpected (contradicting a
user’s belief) or offer strategic information on which the user can act (actionable
patterns).
• For example, patterns like “a large earthquake often follows a cluster of small quakes”
may be highly actionable if users can act on the information to save lives.
• Patterns that are expected can be interesting if they confirm a hypothesis that the user
wishes to validate or they resemble a user’s hunch.
• Eg: During a clinical trial for a new medication, researchers might expect the medication
group to show improvement in certain symptoms compared to the placebo group.
Observing this expected pattern strengthens the evidence for the medication's
effectiveness.
Can a data mining system generate all of the interesting patterns?

• Refers to the completeness of a data mining algorithm.

• It is often unrealistic and inefficient for data mining systems to generate all
possible patterns.
• Instead, user provided constraints and interestingness measures should be used to
focus the search.
• For some mining tasks, such as association, this is often sufficient to ensure the
completeness of the algorithm.
• Association rule mining is an example where the use of constraints and
interestingness measures can ensure the completeness of mining.
Can a data mining system generate only interesting patterns?
• An optimization problem in data mining.
• It is highly desirable for data mining systems to generate only interesting patterns.
• Otherwise, users and data mining systems have to search through the generated patterns
to identify the truly interesting ones.
• Progress has been made, but such optimization remains a challenging issue in data mining.
• Measures of pattern interestingness are essential for the efficient discovery of
patterns by target users.
• Such measures can be used after the data mining step to rank the discovered patterns
according to their interestingness, filtering out the uninteresting ones.
• Alternatively, such measures can be used to guide and constrain the discovery process, improving the search
efficiency by pruning away subsets of the pattern space that do not satisfy pre-specified
interestingness constraints.
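Using interestingness measures after the mining step can be sketched as a filter followed by a ranking. The rules, their measures, and the thresholds below are hypothetical, and ranking by confidence is just one possible choice.

```python
# Hypothetical mined rules with their objective measures.
RULES = [
    {"rule": "computer => software", "support": 0.01, "confidence": 0.50},
    {"rule": "bread => milk", "support": 0.20, "confidence": 0.70},
    {"rule": "pen => laptop", "support": 0.001, "confidence": 0.90},
]


def filter_and_rank(rules, min_support, min_confidence):
    """Drop rules below either threshold, then rank the rest by confidence."""
    kept = [
        r for r in rules
        if r["support"] >= min_support and r["confidence"] >= min_confidence
    ]
    return sorted(kept, key=lambda r: r["confidence"], reverse=True)


ranked = filter_and_rank(RULES, min_support=0.005, min_confidence=0.4)
for r in ranked:
    print(r["rule"], r["support"], r["confidence"])
```

Applying the same thresholds during mining, rather than afterwards, is what allows parts of the pattern space to be pruned before they are ever generated.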
Data Mining System Classification
• A data mining system can be classified according to the following criteria −

• Database Technology
• Statistics
• Machine Learning
• Information Science
• Visualization
• Other Disciplines

• Apart from these, a data mining system can also be classified based on the kind of
(a) databases mined
(b) knowledge mined
(c) techniques utilized
(d) applications adapted
Classification Based on the Databases Mined
• We can classify a data mining system according to the kind of databases mined.
• Database systems can be classified according to different criteria such as data
models, types of data, etc.
• The data mining system can be classified accordingly.

• For example, if we classify a database according to the data model, then we may
have a relational, transactional, object-relational, or data warehouse mining
system.
Classification Based on the kind of Knowledge Mined

• We can classify a data mining system according to the kind of knowledge mined.
• A data mining system is classified on the basis of functionalities such as −
• Characterization
• Discrimination
• Association and Correlation Analysis
• Classification
• Prediction
• Outlier Analysis
• Evolution Analysis
Classification Based on the Techniques Utilized
• We can classify a data mining system according to the kind of techniques used.
• We can describe these techniques according to the degree of user interaction involved or the
methods of analysis employed.
• Examples: machine learning, visualization, pattern recognition, neural networks, and database-oriented or
data warehouse-oriented techniques.

• Classification by User Interaction:

• Supervised Learning: Decision Trees, Support Vector Machines (SVMs)
• Unsupervised Learning: Clustering, Association Rule Learning

• Classification by Analysis Methods:

• Statistical Techniques: Linear Regression, Logistic Regression
• Machine Learning Techniques: Artificial Neural Networks (ANNs), Random Forests
Classification Based on the Applications Adapted

• We can classify a data mining system according to the applications adapted.

• Eg:
• Finance
• Telecommunications
• DNA
• Stock Markets
• E-mail

No ratings yet
Pengaruh Insight Media Sosial Instagram Terhadap Penjualan Produk Online
15 pages
The Energy Footprint of Blockchain Consensus Mechanisms Beyond Proof-of-Work
No ratings yet
The Energy Footprint of Blockchain Consensus Mechanisms Beyond Proof-of-Work
10 pages
Econometrics II
No ratings yet
Econometrics II
15 pages
MBA - Digital Marketing
No ratings yet
MBA - Digital Marketing
20 pages
Quality Analysis of Liquid Soap with VCO
No ratings yet
Quality Analysis of Liquid Soap with VCO
5 pages
Financial Stability in Companies With High ESG Scores Evidence From North America Using The Ohlson OScoreSustainability Switzerland
No ratings yet
Financial Stability in Companies With High ESG Scores Evidence From North America Using The Ohlson OScoreSustainability Switzerland
13 pages
Applied Machine Learning Exam Guide
No ratings yet
Applied Machine Learning Exam Guide
1 page
Group 25 Capstone Project Proposal
No ratings yet
Group 25 Capstone Project Proposal
4 pages
TADESSE DE MBA Thesis
No ratings yet
TADESSE DE MBA Thesis
79 pages
Mathematics for Machine Learning Guide
No ratings yet
Mathematics for Machine Learning Guide
416 pages
INST627 Spring2018 Syllabus
No ratings yet
INST627 Spring2018 Syllabus
6 pages
Logistic Regression Model Analysis
No ratings yet
Logistic Regression Model Analysis
3 pages
The Effect of Promotion, Product Quality and Brand Image Towards Purchase Intention of Fiesta Chicken Meat in West Jakarta
No ratings yet
The Effect of Promotion, Product Quality and Brand Image Towards Purchase Intention of Fiesta Chicken Meat in West Jakarta
5 pages
Exercise 6
No ratings yet
Exercise 6
2 pages
Regression MCQuestions
No ratings yet
Regression MCQuestions
8 pages
Rainfall Prediction for India
No ratings yet
Rainfall Prediction for India
3 pages
(Ebook PDF) Business Analytics 4th Edition by Jeffrey D. Camm Download
100% (1)
(Ebook PDF) Business Analytics 4th Edition by Jeffrey D. Camm Download
53 pages
Bass - A New Product Growth For Model Consumer Durables - 1969
No ratings yet
Bass - A New Product Growth For Model Consumer Durables - 1969
14 pages
Scatter Diagrams Exam Questions
No ratings yet
Scatter Diagrams Exam Questions
21 pages
MODERATION
No ratings yet
MODERATION
2 pages
ACC2706 Cheatsheet
No ratings yet
ACC2706 Cheatsheet
10 pages
1.08 Hypothesis Testing
No ratings yet
1.08 Hypothesis Testing
7 pages
Samuel Abotowuro
No ratings yet
Samuel Abotowuro
84 pages


