
UNIT 4

TRANSACTION PROCESSING

Data Mining Tasks, OLAP and Multidimensional data analysis, Basic concept of
Association Analysis and Cluster Analysis - Transaction processing v/s Analytic
Processing- OLTP v/s OLAP- OLAP Operations - Data models for OLTP (ER model) and
OLAP (Star & Snowflake Schema)

Data Mining Tasks:


Data mining tasks are designed to be semi-automatic or fully automatic, targeting large datasets
to uncover patterns such as groups or clusters, anomalies, and dependencies like associations
and sequential patterns. Once these patterns are uncovered, they can serve as summaries of the
input data. Further analysis can then be conducted using machine learning and predictive
analytics techniques.
Data mining tasks fall into two broad categories: descriptive and predictive.
1. Descriptive Data Mining: This involves extracting knowledge to understand what is
happening within the data without prior assumptions. It highlights common features
within the dataset, such as count, average, etc.
2. Predictive Data Mining: This type of data mining provides insights into future
outcomes by using historical data to make predictions about critical business metrics.
For example, it can predict the business volume for the next quarter based on
performance in previous quarters over several years, or determine if a patient is
suffering from a particular disease based on findings from their medical examinations.
Key Data Mining Tasks
1) Characterization and Discrimination
 Data Characterization: Data characterization summarizes the general characteristics of
objects in a target class, producing what are called characteristic rules.
A database query typically retrieves the data relevant to a user-specified class and passes it
through a description module that summarizes the data at various abstraction levels.
E.g., bar charts, curves, and pie charts.
 Data Discrimination: Data discrimination produces a set of rules, called discriminant
rules, that contrast the general characteristics of objects in the target class with those of
objects in a contrasting class.
2) Prediction
Prediction uses regression analysis to estimate inaccessible or missing numeric values in the
data. When the class label is absent, classification is used to make the prediction. Prediction
is common because of its relevance in business intelligence. There are two ways of predicting
data: predicting the class label using a previously built class model, and predicting missing
or incomplete numeric data using regression analysis.
3) Classification
Classification builds a model over predefined classes so that the model can be used to classify
new instances whose class is not known. The instances used to build the model are known as
training data. Such a classification process can produce a decision tree or a set of
classification rules that can be used to label future records, for example classifying the
likely salary band of a new employee based on the classified salaries of related employees in
the company.
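
For illustration, the following minimal Python sketch (using scikit-learn, with invented feature values and salary bands) shows how a decision-tree classifier is trained on labeled instances and then used to classify a new, unlabeled instance:

```python
# Minimal classification sketch with scikit-learn (assumed available).
# Features and salary bands are invented, for illustration only.
from sklearn.tree import DecisionTreeClassifier

# Training data: [years_of_experience, education_level (0=BSc, 1=MSc, 2=PhD)]
X_train = [[1, 0], [3, 1], [5, 1], [8, 2], [10, 2], [2, 0]]
y_train = ["low", "medium", "medium", "high", "high", "low"]  # salary bands

model = DecisionTreeClassifier(max_depth=3, random_state=0)
model.fit(X_train, y_train)

# Classify a new employee whose salary band is unknown
print(model.predict([[6, 1]]))  # e.g. ['medium']
```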
4) Association Analysis
Association analysis discovers links between data items and the rules that bind them,
associating two or more data attributes that occur together regularly in transactions. The
result is a set of association rules, commonly used in market basket analysis. Two measures
qualify a rule: confidence, which indicates the likelihood of the associated items occurring
together, and support, which reports how frequently the association has occurred in the past.
5) Outlier Analysis
Data components that cannot be clustered into a given class or cluster are outliers. They are
often referred to as anomalies or surprises and are also very important to remember. Although
in some contexts, outliers can be called noise and discarded, they can disclose useful
information in other areas, and hence can be very important and beneficial for their study.
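
As a small illustration (not from the original text), one simple statistical outlier test flags values that lie far from the mean; the data below are made up:

```python
# Simple z-score outlier test with NumPy; the sample data are invented.
import numpy as np

data = np.array([10.1, 9.8, 10.3, 9.9, 10.0, 42.0, 10.2])
z = (data - data.mean()) / data.std()

# Flag values more than two standard deviations from the mean
outliers = data[np.abs(z) > 2]
print(outliers)  # [42.]
```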
6) Cluster Analysis
Clustering is the arrangement of data in groups. Unlike classification, however, class labels
are undefined in clustering and it is up to the clustering algorithm to find suitable classes.
Clustering is often called unsupervised classification, since the grouping is not guided by
given class labels. Most clustering methods are based on the principle of maximizing the
similarity between objects of the same class (intra-class similarity) and minimizing the
similarity between objects in different classes (inter-class similarity).
7) Evolution & Deviation Analysis
Evolution and deviation analysis uncovers patterns and shifts in behavior over time. With
such analysis, we can find features such as time-series trends, periodicity, and similarities
in patterns. Applications range from space science to retail marketing.
OLAP and Multidimensional Data Analysis:
OLAP (Online Analytical Processing): OLAP is a category of software tools that provides
analysis of data stored in a database. It enables users to perform complex queries and analyses
on large datasets quickly and interactively. OLAP systems are optimized for read-heavy
operations and are used primarily for data analysis and business intelligence. They support
operations such as slicing and dicing, drilling down into data, and rolling up data to provide
summaries.
Multidimensional Data Analysis: This involves analyzing data from multiple perspectives
and dimensions. A multidimensional data model is typically implemented using data cubes,
which allow data to be modeled and viewed in multiple dimensions. Each dimension represents
a different attribute of the data, such as time, geography, product lines, etc. Multidimensional
analysis enables users to uncover patterns, trends, and insights by exploring data across various
dimensions and hierarchies.
Key Operations in OLAP and Multidimensional Data Analysis:
1. Slicing: Extracting a subset of data by fixing a single value on one dimension of the data
cube, resulting in a new sub-cube.
2. Dicing: Creating a sub-cube by selecting specific values for multiple dimensions.
3. Drill-Down: Navigating from summary data to more detailed data.
4. Roll-Up: Aggregating data by climbing up a hierarchy or reducing dimensions.
5. Pivoting: Re-orienting the multidimensional view of data to explore different
perspectives.
Slice:
A slice is a subset of the cube corresponding to a single value for one or more members of a
dimension. For example, a slice operation is executed when the user selects one dimension of
a three-dimensional cube, resulting in a two-dimensional slice. So the slice operation
performs a selection on one dimension of the given cube, producing a sub-cube.
For example, if we make the selection temperature = cool, we obtain the following cube:

The following diagram illustrates how Slice works.

Here, slice is applied to the dimension "time" using the criterion time = "Q1", forming a new
sub-cube.
Dice:
The dice operation defines a sub-cube by performing a selection on two or more dimensions.
For example, applying the selection (time = day 3 OR time = day 4) AND (temperature = cool OR
temperature = hot) to the original cube yields the following sub-cube (still two-dimensional).
Consider the following diagram, which shows the dice operations.

The dice operation on the cube involves three dimensions, based on the following selection
criteria (a small code sketch follows the list).
o (location = "Toronto" or "Vancouver")
o (time = "Q1" or "Q2")
o (item =" Mobile" or "Modem")
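
To make slice and dice concrete, here is a minimal pandas sketch on a toy table standing in for the cube; the sales figures are invented, and only the dimension values follow the criteria above:

```python
# Slice and dice on a tiny invented sales "cube" held as a pandas DataFrame.
import pandas as pd

cube = pd.DataFrame({
    "location": ["Toronto", "Toronto", "Vancouver", "Vancouver"],
    "time":     ["Q1", "Q2", "Q1", "Q2"],
    "item":     ["Mobile", "Modem", "Mobile", "Modem"],
    "sales":    [605, 825, 14, 400],   # invented figures
})

# Slice: fix a single value on ONE dimension (time = "Q1")
slice_q1 = cube[cube["time"] == "Q1"]

# Dice: select values on TWO OR MORE dimensions at once
dice = cube[cube["location"].isin(["Toronto", "Vancouver"])
            & cube["time"].isin(["Q1", "Q2"])
            & cube["item"].isin(["Mobile", "Modem"])]

print(slice_q1)
print(dice)
```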
Drill-Down:
The drill-down operation (also called roll-down) is the reverse of roll-up. Drill-down is
like zooming in on the data cube. It navigates from less detailed data to more detailed data.
Drill-down can be performed either by stepping down a concept hierarchy for a dimension or by
adding additional dimensions.
The figure shows a drill-down operation performed on the dimension time by stepping down a
concept hierarchy defined as day < month < quarter < year. Drill-down proceeds by descending
the time hierarchy from the level of quarter to the more detailed level of month.
Because drill-down adds more detail to the given data, it can also be performed by adding a
new dimension to a cube. For example, a drill-down on the central cube of the figure can occur
by introducing an additional dimension, such as customer group.
Example
Drill-down adds more details to the given data. The following diagram illustrates how Drill-
down works.
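
A minimal pandas sketch of this drill-down, using an invented month-level detail table (real systems keep the detailed data and navigate down to it):

```python
# Drill-down sketch: move from a quarter-level summary to month-level detail.
import pandas as pd

detail = pd.DataFrame({
    "month":   ["Jan", "Feb", "Mar", "Apr", "May", "Jun"],
    "quarter": ["Q1", "Q1", "Q1", "Q2", "Q2", "Q2"],
    "sales":   [100, 120, 140, 90, 110, 130],   # invented figures
})

# Quarter-level view: the starting, less detailed cube
by_quarter = detail.groupby("quarter")["sales"].sum()

# Drill-down: step down the time hierarchy from quarter to month
by_month = detail.groupby(["quarter", "month"], sort=False)["sales"].sum()

print(by_quarter)
print(by_month)
```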
Roll-Up:
The roll-up operation (also known as drill-up or aggregation) performs aggregation on a data
cube, either by climbing up a concept hierarchy or by dimension reduction.
Roll-up is like zooming out on the data cube. The figure shows the result of a roll-up
operation performed on the dimension location, whose hierarchy is defined as street < city <
province or state < country. The roll-up operation aggregates the data by ascending the
location hierarchy from the level of city to the level of country. When a roll-up is performed
by dimension reduction, one or more dimensions are removed from the cube.
Example: Consider the following cube, showing counts of temperatures recorded on certain days
each week:

Temperature | 64 | 65 | 68 | 69 | 70 | 71 | 72 | 75 | 80 | 81 | 83 | 85
Week1       |  1 |  0 |  1 |  0 |  1 |  0 |  0 |  0 |  0 |  0 |  1 |  0
Week2       |  0 |  0 |  0 |  1 |  0 |  0 |  1 |  2 |  0 |  1 |  0 |  0

Suppose we want to set up the levels hot (80-85), mild (70-75), and cool (64-69) for
temperature in the cube above. To do this, we group the columns and add up the values
according to the concept hierarchy. This operation is a roll-up, and it yields the following
cube:

Temperature | cool | mild | hot
Week1       |  2   |  1   |  1
Week2       |  1   |  3   |  1

The roll-up operation groups the information by levels of temperature.
The following diagram illustrates how roll-up works.
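
The same roll-up can be sketched in Python with pandas, grouping the temperature columns by the cool/mild/hot levels of the concept hierarchy (a direct translation of the example above):

```python
# Roll-up sketch: group detailed temperature counts into concept-hierarchy
# levels (cool 64-69, mild 70-75, hot 80-85), mirroring the example above.
import pandas as pd

temps = [64, 65, 68, 69, 70, 71, 72, 75, 80, 81, 83, 85]
counts = pd.DataFrame(
    [[1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0],
     [0, 0, 0, 1, 0, 0, 1, 2, 0, 1, 0, 0]],
    index=["Week1", "Week2"], columns=temps,
)

def level(t):
    # Map a raw temperature to its level in the concept hierarchy
    return "cool" if t <= 69 else ("mild" if t <= 75 else "hot")

# Group the temperature columns by level and sum the counts
rolled_up = counts.T.groupby(level).sum().T
print(rolled_up)  # Week1: cool 2, mild 1, hot 1; Week2: cool 1, mild 3, hot 1
```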

Pivot:
The pivot operation is also called rotation. Pivot is a visualization operation that rotates
the data axes in the view to provide an alternative presentation of the data. It may involve
swapping the rows and columns, or moving one of the row dimensions into the column dimensions.

Consider the following diagram, which shows the pivot operation.
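
A minimal pandas sketch of pivoting, reusing the invented item/location figures from the slice-and-dice sketch above:

```python
# Pivot (rotation) sketch: swap the row and column dimensions for a new view.
import pandas as pd

view = pd.DataFrame({
    "item":     ["Mobile", "Mobile", "Modem", "Modem"],
    "location": ["Toronto", "Vancouver", "Toronto", "Vancouver"],
    "sales":    [605, 14, 825, 400],   # invented figures
})

# View 1: items as rows, locations as columns
by_item = view.pivot(index="item", columns="location", values="sales")

# Pivot rotates the axes: locations as rows, items as columns
pivoted = by_item.T
print(by_item, pivoted, sep="\n\n")
```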


Multidimensional model (MOLAP):
Databases configured for OLAP use a multidimensional data model, enabling complex analysis
and ad hoc queries at a rapid rate. The multidimensional data model is analogous to the
relational database model, with the variation of having multidimensional structures for
organizing data and expressing relationships between the data. The data is stored in the form
of cubes and can be accessed within the confines of each cube. Most data warehousing scenarios
use two- or three-dimensional cubes; a cube with more than three dimensions is referred to as
a hypercube.
As per the formal definition, “Each cell within a multidimensional structure contains
aggregated data related to elements along each of the dimensions.” The multidimensional
analytical databases are helpful in providing data-related answers to complex business queries
quickly and accurately. Further, unlike other data models, OLAP in data warehousing enables
users to view data from different angles and dimensions, thereby presenting a broader analysis
for business purposes.
It has been observed that an OLAP cube can answer a query in about 0.1% of the time consumed
for a similar query by an OLTP (Online Transaction Processing) relational database.
OLAP systems are mainly classified into three types:
 MOLAP (Multidimensional OLAP): works with specialized multidimensional storage
 ROLAP (Relational OLAP): works with relational databases
 HOLAP (Hybrid OLAP): divides data between relational and specialized storage

Basic operations of OLAP

Basic Concept of Association Analysis and Cluster Analysis


Association Analysis:
Association mining aims to extract interesting correlations, frequent patterns, associations,
or causal structures among sets of items or objects in transaction databases, relational
databases, or other data repositories. Association rules are widely used in areas such as
telecommunication networks, market and risk management, inventory control, cross-marketing,
catalog design, loss-leader analysis, clustering, and classification.
Examples:
Rule Form: Body->Head [Support, confidence]
Buys (X, “Computer”) -> Buys (X, “Software”) [40%, 50%]
Association Rule:
Basic Concepts:
Given: (1) a database of transactions, where (2) each transaction is a list of items
(purchased by a customer in a visit).
Find: all rules that correlate the presence of one set of items with that of another set of
items.
E.g., 98% of people who purchase tires and auto accessories also get automotive services done.
E.g., market basket analysis: this process analyzes customer buying habits by finding
associations between the different items that customers place in their "shopping baskets".
The discovery of such associations can help retailers develop marketing strategies by gaining
insight into which items are frequently purchased together by customers.
Applications:
 Maintenance agreements (what should the store do to boost maintenance agreement sales?)
 Home electronics (what other products should the store stock up on?)
 Attached mailing in direct marketing
Association Rule:
An association rule is an implication expression of the form X → Y, where X and Y are disjoint
itemsets, i.e., X ∩ Y = ∅. The strength of an association rule can be measured in terms of its
support and confidence. Support determines how often a rule is applicable to a given data set,
while confidence determines how frequently items in Y appear in transactions that contain X.
The formal definitions of these metrics are:
Support, s(X → Y) = σ(X ∪ Y) / N
Confidence, c(X → Y) = σ(X ∪ Y) / σ(X)
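
A minimal Python sketch of computing these two measures over a toy transaction list (the items and transactions are invented for illustration):

```python
# Compute support and confidence for a rule X -> Y over toy transactions.
transactions = [
    {"computer", "software"},
    {"computer", "printer"},
    {"computer", "software", "printer"},
    {"printer"},
    {"computer", "software"},
]

def support(itemset):
    # sigma(itemset) / N : fraction of transactions containing the itemset
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(X, Y):
    # sigma(X u Y) / sigma(X)
    return support(X | Y) / support(X)

X, Y = {"computer"}, {"software"}
print(support(X | Y))    # 3/5 = 0.6
print(confidence(X, Y))  # 3/4 = 0.75
```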
Cluster Analysis:
 Cluster analysis is the process of partitioning a set of data objects (or observations)
into subsets (clusters).
 Objects within the same cluster are similar to one another;
 Objects in different clusters are dissimilar.
Clustering is known as unsupervised learning because class label information is not present.
For this reason, clustering is a form of learning by observation rather than learning by
examples.
Requirements for Cluster Analysis:
1. Scalability (high)
Clustering on only a sample of a given large data set may lead to biased results
2. Ability to deal with different types of attributes
e.g. graphs, sequences, images, and documents.
3. Discovery of clusters with arbitrary shape :
a cluster could be of any shape.
4. Requirements for domain knowledge to determine input parameters
Such parameters are often hard to determine.
5. Deal with noisy data (Outliers)
Need clustering methods that are robust to noise.
6. Incremental clustering and insensitivity to input order
Support incremental updates without recomputing the clustering from scratch, and
produce the same result regardless of the order in which the input is presented.
7. Capability of clustering high-dimensionality data
Finding clusters of data objects in a high- dimensional space is challenging,
especially considering that such data can be very sparse and highly skewed.
8. Constraint-based clustering: clustering under user-specified or application-oriented
constraints.
9. Interpretability and usability
It is important to study how an application goal may influence the selection of
clustering features and clustering methods.
Clustering Methods
1. The partitioning criteria
e.g. hierarchical or not
2. Separation of clusters
e.g. clusters are mutually exclusive or not
3. Similarity measure
e.g.
o distance
 often take advantage of optimization techniques
 e.g. Euclidean space, road network, vector space,
o connectivity based on density or continuity
 can often find clusters of arbitrary shape
4. Clustering space
Search for clusters within the entire given data space or subspace. Subspace
clustering discovers clusters and subspaces (often of low dimensionality) that
manifest object similarity.
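
As an illustration of distance-based, unsupervised clustering, here is a minimal sketch using scikit-learn's KMeans (assumed available) on invented 2-D points; note that no class labels are supplied:

```python
# Unsupervised clustering sketch with scikit-learn's KMeans.
# No class labels are given; the algorithm discovers the groups itself.
import numpy as np
from sklearn.cluster import KMeans

# Two visually obvious groups of 2-D points (invented data)
X = np.array([[1.0, 1.1], [0.9, 1.0], [1.2, 0.8],
              [8.0, 8.2], [7.9, 8.1], [8.3, 7.8]])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_)           # cluster assignment per point, e.g. [0 0 0 1 1 1]
print(kmeans.cluster_centers_)  # one centroid per discovered cluster
```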
Transaction Processing V/S Analytic Processing
Transaction processing:
 Each transaction involves a relatively small amount of data
 There are inserts and updates to one or more tables
 The database should be normalized, i.e., any piece of information should be stored in one
place only, with very few exceptions
 There are often requirements for audit data: who created the transaction, and when
 The data typically requires validation checks before processing (valid customer,
product, account #, etc)
Analytical Processing
 Read-only, unless you need to build a temporary table, or populate a results table for
multiple reports
 Often large volumes of data
 Database may be denormalized for faster performance
 No validations required unless the source transaction system has been sloppy

Transactional processing and Analytical Processing


OLTP V/S OLAP:
OLAP stands for On-Line Analytical Processing. It is used for analyzing database information
from multiple database systems at one time, such as sales analysis and forecasting, market
research, and budgeting. A data warehouse is an example of an OLAP system.
OLTP stands for On-Line Transaction Processing. It is used for maintaining online transactions
and record integrity in multi-access environments. An OLTP system manages a very large number
of short online transactions, for example an ATM.

Architecture of OLTP and OLAP

Sr. No. | Key               | OLAP                                          | OLTP
1       | Basic             | Used for data analysis                        | Used to manage a very large number of short online transactions
2       | Database type     | Uses a data warehouse                         | Uses a traditional DBMS
3       | Data modification | Mainly used for reading data                  | Manages all insert, update, and delete transactions
4       | Response time     | Processing is comparatively slow              | Milliseconds
5       | Normalization     | Tables in an OLAP database are not normalized | Tables in an OLTP database are normalized
OLAP Operations:
OLAP stands for Online Analytical Processing. It is a software technology that allows users
to analyze information from multiple database systems at the same time. It is based on the
multidimensional data model and allows the user to query multidimensional data (e.g.,
Delhi -> 2018 -> Sales data). OLAP databases are divided into one or more cubes, known as
hypercubes.

OLAP operations:
There are five basic analytical operations that can be performed on an OLAP cube:
1. Drill down: In drill-down operation, the less detailed data is converted into highly
detailed data. It can be done by:
 Moving down in the concept hierarchy
 Adding a new dimension
In the cube given in overview section, the drill down operation is performed by moving
down in the concept hierarchy of Time dimension (Quarter -> Month).

2. Roll up: It is just opposite of the drill-down operation. It performs aggregation on the
OLAP cube. It can be done by:
 Climbing up in the concept hierarchy
 Reducing the dimensions
In the cube given in the overview section, the roll-up operation is performed by climbing up
in the concept hierarchy of the Location dimension (City -> Country).

3. Dice: It selects a sub-cube from the OLAP cube by selecting two or more dimensions. In
the cube given in the overview section, a sub-cube is selected by selecting following
dimensions with criteria:
 Location = “Delhi” or “Kolkata”
 Time = “Q1” or “Q2”
 Item = “Car” or “Bus”

4. Slice: It selects a single value on one dimension of the OLAP cube, which results in the
creation of a new sub-cube. In the cube given in the overview section, slice is performed on
the dimension Time = "Q1".

5. Pivot: It is also known as rotation operation as it rotates the current view to get a new
view of the representation. In the sub-cube obtained after the slice operation, performing
pivot operation gives a new view of it.
Data Models for OLTP (ER MODEL):
ER Model is used to model the logical view of the system from data perspective which
consists of these components:

Entity, Entity Type, Entity Set –


An Entity may be an object with a physical existence – a particular person, car, house, or
employee – or it may be an object with a conceptual existence – a company, a job, or a
university course. An entity is an object of an entity type, and the set of all entities of
that type is called an entity set. E.g., E1 is an entity of entity type Student, and the set
of all students is the entity set. In an ER diagram, an entity type is represented by a
rectangle.
Attribute(s):
Attributes are the properties which define the entity type. For example, Roll_No, Name,
DOB, Age, Address, and Mobile_No are the attributes which define the entity type Student.
In an ER diagram, an attribute is represented by an oval.

1. Key Attribute –

The attribute which uniquely identifies each entity in the entity set is called a key
attribute. For example, Roll_No will be unique for each student. In an ER diagram, a key
attribute is represented by an oval with the attribute name underlined.

2. Composite Attribute –

An attribute composed of several other attributes is called a composite attribute. For
example, the Address attribute of the Student entity type consists of Street, City, State,
and Country. In an ER diagram, a composite attribute is represented by an oval comprising
other ovals.

3. Multivalued Attribute

An attribute consisting of more than one value for a given entity. For example,
Phone_No (a student can have more than one). In an ER diagram, a multivalued attribute is
represented by a double oval.

4. Derived Attribute

An attribute which can be derived from other attributes of the entity type is known as a
derived attribute. E.g., Age (can be derived from DOB). In an ER diagram, a derived
attribute is represented by a dashed oval.

The complete entity type Student with its attributes can be represented as:

Relationship Type and Relationship Set:


A relationship type represents the association between entity types.
For example, 'Enrolled in' is a relationship type that exists between the entity types Student
and Course. In an ER diagram, a relationship type is represented by a diamond connecting the
participating entities with lines.

A set of relationships of the same type is known as a relationship set. The following
relationship set depicts: S1 is enrolled in C2, S2 is enrolled in C1, and S3 is enrolled in C3.

Degree of a relationship set:


The number of different entity sets participating in a relationship set is called the degree
of the relationship set.
1. Unary Relationship –
When only ONE entity set participates in a relation, the relationship is called a unary
relationship. For example, one person is married to only one person (both drawn from the same
Person entity set).
2. Binary Relationship –
When TWO entity sets participate in a relation, the relationship is called a binary
relationship. For example, Student is enrolled in Course.

3. n-ary Relationship –
When n entity sets participate in a relation, the relationship is called an n-ary
relationship.
Cardinality:
The number of times an entity of an entity set participates in a relationship set is known
as cardinality. Cardinality can be of different types:
1. One to one – When each entity in each entity set can take part only once in the
relationship, the cardinality is one to one. Let us assume that a male can marry only one
female and a female can marry only one male. So the relationship will be one to one.

Using Sets, it can be represented as:

2. Many to one – When entities in one entity set can take part only once in the relationship
set and entities in other entity set can take part more than once in the relationship
set, cardinality is many to one. Let us assume that a student can take only one course but one
course can be taken by many students. So the cardinality will be n to 1. It means that for one
course there can be n students but for one student, there will be only one course.

Using Sets, it can be represented as:


In this case, each student is taking only 1 course but 1 course has been taken by many
students.
3. Many to many – When entities in all entity sets can take part more than once in the
relationship, the cardinality is many to many. Let us assume that a student can take more than
one course and one course can be taken by many students. So the relationship will be many
to many.

Using sets, it can be represented as:

In this example, student S1 is enrolled in C1 and C3, and course C3 is taken by S1, S3, and
S4. So it is a many-to-many relationship.
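
As an aside (not part of the original notes), the Student-Course fragment can be mapped to relational tables, the usual OLTP data model: each entity set becomes a table keyed by its key attribute, and the many-to-many 'Enrolled in' relationship set becomes a junction table. A minimal sqlite3 sketch with hypothetical column names:

```python
# Mapping the Student / Course ER fragment to relational tables with sqlite3.
# A many-to-many relationship set becomes a separate junction table.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE student (
    roll_no INTEGER PRIMARY KEY,   -- key attribute of entity type Student
    name    TEXT NOT NULL
);
CREATE TABLE course (
    course_id TEXT PRIMARY KEY,    -- key attribute of entity type Course
    title     TEXT NOT NULL
);
-- Relationship set 'Enrolled in' (many-to-many): one row per relationship
CREATE TABLE enrolled_in (
    roll_no   INTEGER REFERENCES student(roll_no),
    course_id TEXT    REFERENCES course(course_id),
    PRIMARY KEY (roll_no, course_id)
);
""")
con.execute("INSERT INTO student VALUES (1, 'S1')")
con.execute("INSERT INTO course VALUES ('C1', 'Databases')")
con.execute("INSERT INTO enrolled_in VALUES (1, 'C1')")
print(con.execute("SELECT * FROM enrolled_in").fetchall())  # [(1, 'C1')]
```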
Participation Constraint:
Participation Constraint is applied on the entity participating in the relationship set.
1. Total Participation – Each entity in the entity set must participate in the relationship. If
each student must enroll in a course, the participation of student will be total. Total
participation is shown by double line in ER diagram.
2. Partial Participation – An entity in the entity set may or may NOT participate in the
relationship. If some courses have no enrolled students, the participation of Course will be
partial.
The diagram depicts the ‘Enrolled in’ relationship set with Student Entity set having total
participation and Course Entity set having partial participation.

Using set, it can be represented as,


Every student in Student Entity set is participating in relationship but there exists a course
C4 which is not taking part in the relationship.
Weak Entity Type and Identifying Relationship:
As discussed before, an entity type has a key attribute which uniquely identifies each entity
in the entity set. But there exist some entity types for which a key attribute can't be
defined. These are called weak entity types.
For example, A company may store the information of dependents (Parents, Children,
Spouse) of an Employee. But the dependents don’t have existence without the employee. So
Dependent will be weak entity type and Employee will be Identifying Entity type for
Dependent.
A weak entity type is represented by a double rectangle. The participation of weak entity type
is always total. The relationship between weak entity type and its identifying strong entity
type is called identifying relationship and it is represented by double diamond.

Component Description of ER Diagram


OLAP (STAR & SNOWFLAKE SCHEMA):
Star Schema vs. Snowflake Schema: The Main Difference
The two main elements of the dimensional model of the star and snowflake schema are:
1. Facts table. The table holding the largest amount of data, also known as a cube.
2. Dimension tables. Derived data structures that provide answers to ad hoc queries, often
called lookup tables.
Connecting the chosen dimensions to a facts table forms the schema. Both the star and
snowflake schemas make use of the dimensionality of data to model the storage system.
The main differences between the two schemas are:
                  | Star Schema                                            | Snowflake Schema
Elements          | Fact table, dimension tables                           | Fact table, dimension tables, sub-dimension tables
Structure         | Star-shaped                                            | Snowflake-shaped
Dimensions        | One table per dimension                                | Multiple tables for each dimension
Model direction   | Top-down                                               | Bottom-up
Storage space     | Uses more storage                                      | Uses less space
Normalization     | Denormalized dimension tables                          | Normalized dimension tables
Query performance | Fast; fewer JOINs needed because of fewer foreign keys | Slower; more JOINs required because of more foreign keys
Query complexity  | Simple and easier to understand                        | Complicated and more challenging to understand
Data redundancy   | High                                                   | Low
Use case          | Dimension tables with several rows; typical with data marts | Dimension tables with multiple rows; found with data warehouses
Due to the complexity of the snowflake schema and its lower query performance, the star schema
is the preferred option whenever possible. One typical way to get around the problems of the
snowflake schema is to decompose the dedicated storage into multiple smaller entities, each
with a star schema.
What Is a Star Schema?
A star schema is a logical structure for the development of data marts and simpler data
warehouses. The simple model consists of dimension tables connected to a facts table in the
center.

The facts table typically consists of:


 Quantifiable numerical data, such as values or counts.
 References to the dimensions through foreign keys.
The lookup tables represent descriptive information directly connected to the facts table.
For example, to model the sales of an ecommerce business, the facts table for purchases
might contain the total price of the purchase. On the other hand, dimensional tables have
descriptive information about the items, customer data, the time or location of purchase.

The star schema for the analysis of purchases in the example has four dimensions. The facts
table connects to the dimensional tables through the concept of foreign and primary keys. Apart
from the numerical data, the facts table therefore also consists of foreign keys to define relations
between tables.
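
To see this shape in queries, here is a minimal pandas sketch of a facts table joined to two dimension tables through foreign keys; all table and column names are invented for illustration:

```python
# Star-schema sketch: a purchases facts table joined to two dimension tables.
import pandas as pd

dim_customer = pd.DataFrame({"customer_id": [1, 2], "city": ["Delhi", "Kolkata"]})
dim_item     = pd.DataFrame({"item_id": [10, 20], "item": ["Car", "Bus"]})

# Facts table: numeric measures plus foreign keys into the dimensions
fact_purchases = pd.DataFrame({
    "customer_id": [1, 1, 2],
    "item_id":     [10, 20, 10],
    "total_price": [5000.0, 120.0, 4800.0],   # invented figures
})

# One JOIN per dimension: the query pattern a star schema is built for
report = (fact_purchases
          .merge(dim_customer, on="customer_id")
          .merge(dim_item, on="item_id")
          .groupby(["city", "item"])["total_price"].sum())
print(report)
```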
Characteristics of a Star Schema
The main characteristics of the star schema are:
 Simplified and fast queries. Fewer JOIN operations due to denormalization make
information more readily available.
 Simple relationships. The schema works great with one-to-one or one-to-many
relationships.
 Singular dimensionality. One table describes each dimension.
 OLAP friendly. OLAP systems widely use star schema to design data cubes.
Drawbacks of a Star Schema
The disadvantages of using the star schema are:
 Redundancy. The dimensional tables are one-dimensional, and data redundancy is
present.
 Low integrity. Due to denormalization, updating information is a complex task.
 Limited queries. The set of questions is limited, which also narrows down the
analytical power.
What Is a Snowflake Schema?
The snowflake schema has a branched-out logical structure used in large data warehouses.
From the center to the edges, entity information goes from general to more specific. Starting
from the dimensional model's common elements, the snowflake schema further decomposes
dimensional tables into subdimensions.
The ecommerce sales analysis model from the previous example further branches
("snowflakes") into smaller categories and subcategories of interest.

The four dimensions decompose into subdimensions. The lookup tables are further normalized
through a series of connected objects.
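
Continuing the invented example above, a minimal pandas sketch showing the extra JOIN a snowflaked (normalized) dimension costs: the item dimension now references a category sub-dimension:

```python
# Snowflake sketch: the item dimension is normalized into a category
# sub-dimension, so reaching the category name costs one extra JOIN.
import pandas as pd

dim_category = pd.DataFrame({"category_id": [100], "category": ["Vehicles"]})
dim_item     = pd.DataFrame({"item_id": [10, 20],
                             "item": ["Car", "Bus"],
                             "category_id": [100, 100]})
fact = pd.DataFrame({"item_id": [10, 20, 10],
                     "total_price": [5000.0, 120.0, 4800.0]})  # invented figures

report = (fact
          .merge(dim_item, on="item_id")          # fact -> dimension
          .merge(dim_category, on="category_id")  # dimension -> sub-dimension
          .groupby("category")["total_price"].sum())
print(report)
```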
Characteristics of a Snowflake Schema
The main features of the snowflake schema include:
 Small storage. The snowflake schema does not require as much storage space.
 High granularity. Dividing tables into subdimensions allows analysis at various depths
of interest. Adding new subdimensions is a simple process as well.
 Integrity. Due to normalization, the schema has a higher level of data integrity and low
redundancies.
Drawbacks of a Snowflake Schema
The weaknesses of the snowflake schema are:

 Complexity. The database model is complex, and so are the executed queries. Multiple
multidimensional tables make the design complicated to work with overall.
 Slow processing. Many lookup tables require multiple JOIN operations, which slows
down information retrieval.
 Hard to maintain. A high level of granularity makes the schema hard to manage and
maintain.
PART-A

1. List out the two categories of data mining tasks.

2. Discuss on the clustering methods.

3. Define Entity.

4. List out the characteristics of a star schema.

5. What are the two main elements of the dimensional model in star and snowflake schema?

6. State the different types of cardinality.

7. List out the drawbacks of snowflake schema.

8. Discuss on Analytical processing.

9. Discuss on transactional processing.

10. List the requirements for cluster analysis.

PART-B

1. Explain in detail the various Data Mining Techniques.

2. Explain in detail about the various OLTP operations.

3. Explain the concept of Association Analysis and clustering Analysis.

4. Differentiate between OLTP and OLAP.

5. Explain in detail about the ER model with suitable diagrams.

6. Explain in detail about the star and snowflake schema with suitable diagrams.
