
8. Data Cube Computation and Data Generalization

Lesson Introduction

Data warehousing and OLAP perform data generalization by summarizing data at varying levels of
abstraction. In this lesson, we will cover efficient methods of data cube computation, including
multiway array cube computation and the bottom-up computation (BUC) approach. Finally, we will cover
another method of data generalization, known as attribute-oriented induction.

Learning Outcomes:
After successful completion of this lesson, students will be able to understand the different ways of
computing data cubes efficiently, and the methods that can be used for attribute reduction and data
generalization.

Lesson Outline:
• Data cube computation.
• Attribute reduction.
• Data generalization.

8.1 Efficient Methods for Data Cube Computation

Data cube computation is an essential task in data warehouse implementation. The precomputation of all
or part of a data cube can greatly reduce the response time and enhance the performance of on-line
analytical processing. However, such computation is challenging because it may require substantial
computational time and storage space.

8.1.1 Cube Materialization

Figure 8-1 shows a 3-D data cube for the dimensions A, B, and C, and an aggregate measure, M. A data
cube is a lattice of cuboids. Each cuboid represents a group-by. ABC is the base cuboid, containing all three
of the dimensions. Here, the aggregate measure, M, is computed for each possible combination of the
three dimensions. The base cuboid is the least generalized of all the cuboids in the data cube. The most
generalized cuboid is the apex cuboid, commonly represented as all. It contains one value. It aggregates
measure M for all the tuples stored in the base cuboid. To drill down in the data cube, we move from the
apex cuboid, downward in the lattice. To roll up, we move from the base cuboid, upward. For the purposes
of our discussion in this lesson, we will always use the term data cube to refer to a lattice of cuboids rather
than an individual cuboid.
A cell in the base cuboid is a base cell. A cell from a nonbase cuboid is an aggregate cell. An aggregate cell
aggregates over one or more dimensions, where each aggregated dimension is indicated by a “*” in the
cell notation. Suppose we have an n-dimensional data cube. Let a = (a1, a2, ..., an, measures) be a cell
from one of the cuboids making up the data cube. We say that a is an m-dimensional cell (that is, from an
m-dimensional cuboid) if exactly m (m ≤ n) values among {a1, a2, ..., an} are not "*". If m = n, then a is a
base cell; otherwise (m < n), it is an aggregate cell.
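
To make this notation concrete, here is a minimal Python sketch (illustrative only, not part of the lesson) that counts the non-"*" values of a cell to classify it:

# A cell is modeled as a tuple of dimension values; "*" marks an
# aggregated dimension (the measure is kept separately).

def cell_dimensionality(cell):
    """Return m, the number of dimension values in the cell that are not '*'."""
    return sum(1 for v in cell if v != "*")

def is_base_cell(cell):
    """A cell of an n-D cube is a base cell when all n dimensions are set."""
    return cell_dimensionality(cell) == len(cell)

print(cell_dimensionality(("Jan", "*", "*")))        # 1 -> a 1-D aggregate cell
print(is_base_cell(("Jan", "Toronto", "Business")))  # True -> a base cell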

Figure 8-1: A 3-D data cube with the dimensions A, B, and C for some aggregate measure, M.

Base and aggregate cells: Consider a data cube with the dimensions month, city, and customer group, and
the measure price. (Jan, *, *, 2800) and (*, Toronto, *, 1200) are 1-D cells, (Jan, *, Business, 150) is a 2-
D cell, and (Jan, Toronto, Business, 45) is a 3-D cell. Here, all base cells are 3-D, whereas 1-D and 2-D cells
are aggregate cells. An ancestor-descendant relationship may exist between cells. In an n-dimensional
data cube, an i-D cell a = (a1, a2, ..., an, measures_a) is an ancestor of a j-D cell b = (b1, b2, ..., bn,
measures_b), and b is a descendant of a, if and only if (1) i < j, and (2) for 1 ≤ m ≤ n, am = bm whenever am ≠
"*". In particular, cell a is called a parent of cell b, and b a child of a, if and only if j = i + 1 and b is a
descendant of a.
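
The ancestor test can be written down directly from conditions (1) and (2); the following sketch (again illustrative only) checks both:

def is_ancestor(a, b):
    """a is an ancestor of b iff a is strictly more aggregated (condition 1)
    and agrees with b on every dimension a instantiates (condition 2)."""
    i = sum(v != "*" for v in a)   # dimensionality of a
    j = sum(v != "*" for v in b)   # dimensionality of b
    return i < j and all(av == bv for av, bv in zip(a, b) if av != "*")

def is_parent(a, b):
    """a is a parent of b iff b is a descendant of a exactly one level down."""
    return is_ancestor(a, b) and \
        sum(v != "*" for v in b) == sum(v != "*" for v in a) + 1

print(is_ancestor(("Jan", "*", "*"), ("Jan", "Toronto", "Business")))  # True
print(is_parent(("Jan", "*", "*"), ("Jan", "Toronto", "*")))           # True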

In order to ensure fast on-line analytical processing, it is sometimes desirable to precompute the full cube
(i.e., all the cells of all the cuboids for a given data cube). The number of cuboids, however, is exponential
in the number of dimensions: a data cube of n dimensions contains 2^n cuboids. There are even more cuboids if
we consider concept hierarchies for each dimension. In addition, the size of each cuboid depends on the
cardinality of its dimensions. Thus, precomputation of the full cube can require huge and often excessive
amounts of memory.
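
These counts are easy to verify with the standard formula from the data-cube literature: if dimension i has L_i hierarchy levels (excluding the virtual top level all), a full cube contains the product of (L_i + 1) over all dimensions; with no hierarchies each L_i = 1, giving 2^n. A quick Python check:

from math import prod

def total_cuboids(levels):
    """levels[i] = number of hierarchy levels of dimension i,
    excluding the virtual top level 'all'."""
    return prod(L + 1 for L in levels)

print(total_cuboids([1, 1, 1]))   # 8 = 2**3, the lattice of Figure 8-1
print(total_cuboids([4, 4, 4]))   # 125 cuboids once hierarchies are considered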

In many cases, a substantial amount of the cube's space could be taken up by a large number of cells with
very low measure values, because cube cells are often sparsely distributed within a multidimensional
space. For example, a customer may only buy a few items in a store at a time. Such
an event will generate only a few non-empty cells, leaving most other cube cells empty. In such situations,
it is useful to materialize only those cells in a cuboid (group-by) whose measure value is above some
minimum threshold. In a data cube for sales, say, we may wish to materialize only those cells for which
count ≥ 10 (i.e., where at least 10 tuples exist for the cell's combination of dimension values), or only
those cells representing sales ≥ $100. This not only saves processing time and disk space but also leads
to a more focused analysis: the cells that cannot pass the threshold are likely to be too trivial to warrant
further analysis. Such partially materialized cubes are known as iceberg cubes.
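
As a toy illustration, using the cells from the earlier example and the condition sales ≥ $100, an iceberg cube simply drops the cells that fail the threshold:

# Hypothetical precomputed cells, (month, city, customer_group) -> sales:
cube = {
    ("Jan", "*", "*"): 2800,
    ("*", "Toronto", "*"): 1200,
    ("Jan", "Toronto", "Business"): 45,
}

# Materialize only the cells satisfying the iceberg condition sales >= 100.
iceberg = {cell: sales for cell, sales in cube.items() if sales >= 100}
print(iceberg)   # the cell with sales 45 is pruned as too trivial to keep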
8.2 Strategies for Cube Computation

With the different kinds of cubes described above, we can expect that there are a good number of methods
for efficient computation. In general, there are two basic data structures used for storing cuboids.
Relational tables are used as the basic data structure for the implementation of relational OLAP (ROLAP),
while multidimensional arrays are used as the basic data structure in multidimensional OLAP (MOLAP). In
this subsection, we introduce two methods for efficient cube computation: the multiway array
aggregation (MultiWay) method and bottom-up computation (BUC).

8.2.1 Multiway Array Cube Computation

Consider a 3-D data array containing the three dimensions A, B, and C. The 3-D array is partitioned into
small, memory-based chunks. In this example, the array is partitioned into 64 chunks as shown in Figure
8-2. Dimension A is organized into four equal-sized partitions, a0, a1, a2, and a3. Dimensions B and C are
similarly organized into four partitions each. Chunks 1, 2, . . . , 64 correspond to the subcubes a0b0c0,
a1b0c0, . . . , a3b3c3, respectively. Suppose that the cardinality of the dimensions A, B, and C is 40, 400, and
4000, respectively. Thus, the size of the array for each dimension, A, B, and C, is also 40, 400, and 4000,
respectively. The size of each partition in A, B, and C is therefore 10, 100, and 1000, respectively. Full
materialization of the corresponding data cube involves the computation of all the cuboids defining this
cube.

Figure 8-2: A 3-D array for the dimensions A, B, and C, organized into 64 chunks.
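
As a quick check of this arithmetic, the following sketch reproduces the chunk layout just described:

cardinality = {"A": 40, "B": 400, "C": 4000}   # dimension sizes from the text
partitions = 4                                  # equal partitions per dimension

chunk_size = {d: c // partitions for d, c in cardinality.items()}
print(chunk_size)                        # {'A': 10, 'B': 100, 'C': 1000}
print(partitions ** len(cardinality))    # 64 memory-based chunks in total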
The resulting full cube consists of the following cuboids:

• The base cuboid, denoted by ABC (from which all of the other cuboids are directly or indirectly
computed). This cuboid is already computed and corresponds to the given 3-D array.
• The 2-D cuboids, AB, AC, and BC, which respectively correspond to the group-by's AB, AC, and BC.
These cuboids must be computed.
• The 1-D cuboids, A, B, and C, which respectively correspond to the group-by's A, B, and C. These
cuboids must be computed.
• The 0-D (apex) cuboid, denoted by all, which corresponds to the group-by (); that is, there is no
group-by here. This cuboid must be computed. It consists of one value. If, say, the data cube
measure is count, then the value to be computed is simply the total count of all of the tuples in
ABC.

Consider the following example of multiway array aggregation.

Table 8-1: Ordered Set Representation of a Data Cube

Consider the first base cell {P1, Calgary, Vance} in Table 8-1: the corresponding cube cell gets the
value 2, as indicated in Figure 8-3. When the {P1, Calgary, Vance} cell is filled, the following cells are
simultaneously aggregated with value 2.

2-D cuboids: {P1, Calgary, *},{P1, *, Vance},{*, Calgary, Vance}

1-D cuboids: {*, *, Vance}, {*, Calgary, *},{P1, *, *}

Apex cuboid: {*, *, *}

The same computation is applied to all the rows given in Table 8-1, and the final data cube is shown in
Figure 8-3.
Figure 8-3: Data Cube computation
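
This propagation can be sketched in a few lines of Python. The sketch below illustrates only the simultaneous-aggregation idea, not the actual MultiWay chunk-scanning algorithm: each base cell updates itself and every ancestor cell obtained by starring out a subset of its dimensions.

from itertools import combinations

def add_base_cell(cube, base_cell, value):
    """Add one base cell and propagate its value to every ancestor cell,
    obtained by starring out each subset of the dimensions."""
    n = len(base_cell)
    for k in range(n + 1):
        for starred in combinations(range(n), k):
            cell = tuple("*" if i in starred else base_cell[i]
                         for i in range(n))
            cube[cell] = cube.get(cell, 0) + value

cube = {}
add_base_cell(cube, ("P1", "Calgary", "Vance"), 2)
print(cube[("P1", "*", "Vance")])   # 2 -- one of the 2-D cells listed above
print(cube[("*", "*", "*")])        # 2 -- the apex cell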

8.2.2 Bottom-Up Computation (BUC)

Unlike multiway array aggregation, BUC constructs the cube from the apex cuboid toward the base cuboid.
Figure 8-1 shows a lattice of cuboids making up a 3-D data cube with the dimensions A, B, and C. The apex
(0-D) cuboid, representing the concept all (that is, (*, *, *)), is at the top of the lattice. This is the most
aggregated or generalized level. The 3-D base cuboid, ABC, is at the bottom of the lattice. It is the least
aggregated (most detailed or specialized) level. This representation of a lattice of cuboids, with the apex
at the top and the base at the bottom, is commonly accepted in data warehousing. It consolidates the
notions of drill-down (where we can move from a highly aggregated cell to lower, more detailed cells) and
roll-up (where we can move from detailed, low-level cells to higher-level, more aggregated cells). BUC
stands for "Bottom-Up Construction": its authors view the lattice in the reverse order, with the apex
cuboid at the bottom and the base cuboid at the top. According to the lattice convention described above
and used throughout this course, however, the order of processing of BUC is actually top-down; because we
adopt the application worldview, where drill-down refers to drilling from the apex cuboid down toward the
base cuboid, the exploration process of BUC is regarded as top-down. BUC's exploration for the computation
of a 3-D data cube is shown in Figure 8-4.
Figure 8-4: BUC exploration for the computation of a 3-D data cube
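
A much-simplified Python sketch of the BUC idea follows, using count as the measure (the real algorithm also orders dimensions by cardinality and uses counting sort, both omitted here). Starting from the apex, it recursively partitions the input on each remaining dimension and prunes any partition that fails the iceberg condition, which safely skips all descendants of the failing cell:

def buc(rows, dims, min_sup, cell, out):
    """rows: list of tuples; dims: indices of dimensions still to expand;
    cell: the current (partially starred) cell; out: maps cell -> count."""
    out[tuple(cell)] = len(rows)                    # emit the current cell
    for j, d in enumerate(dims):
        groups = {}
        for row in rows:                            # partition on dimension d
            groups.setdefault(row[d], []).append(row)
        for val, part in groups.items():
            if len(part) >= min_sup:                # iceberg (Apriori) pruning
                child = cell[:d] + [val] + cell[d + 1:]
                buc(part, dims[j + 1:], min_sup, child, out)

rows = [("P1", "Calgary", "Vance"), ("P1", "Calgary", "Vance"),
        ("P2", "Toronto", "Bob")]
out = {}
buc(rows, dims=[0, 1, 2], min_sup=2, cell=["*", "*", "*"], out=out)
print(out)   # only cells with support >= 2, plus the apex cell (count 3)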

8.2.3 Star Attributes and Star Nodes

If the single-dimensional aggregate on an attribute value p does not satisfy the iceberg condition, it is
useless to distinguish such values in the iceberg cube computation. Thus, the node p can be replaced by
"*" so that the cuboid tree can be further compressed. We say that a node p in an attribute A is a star-
node if the single-dimensional aggregate on p does not satisfy the iceberg condition; otherwise, p is a non-
star-node. A cuboid tree that is compressed using star-nodes is called a star-tree.

Consider a base cuboid table with 5 tuples and 4 dimensions, where the cardinalities of
dimensions A, B, C, and D are 2, 4, 4, and 4, respectively. The one-dimensional aggregates for all attributes are
shown in Table 8-2.

Table 8-2: Base Cuboids before star reduction.


Suppose min_sup = 2 in the iceberg condition. Only the attribute values a1, a2, b1, c3, and d4 satisfy the
condition. All other values are below the threshold and thus become star-nodes. Each attribute
value that does not satisfy the minimum support is replaced by "*", as shown in Table 8-3. By collapsing
star-nodes, we obtain the reduced base table in Table 8-4. Notice that it contains two fewer rows and also
fewer distinct values than Table 8-2. We use the reduced base table to construct the cuboid tree because
it is smaller. The resulting star-tree is shown in Figure 8-5. To help identify which nodes are star-nodes, a
star-table is constructed for each star-tree. Figure 8-5 also shows the corresponding star-table for the star-
tree (where only the star-nodes are shown in the star-table).

Table 8-3: Base cuboids after replacement by "*".

Table 8-4: Base Cuboids after star reduction.

Figure 8-5: Star-tree and star-table.
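
The reduction can be sketched as follows. Since the base table itself is not reproduced in this document, the rows below are hypothetical but consistent with the stated cardinalities and with the surviving values a1, a2, b1, c3, and d4:

from collections import Counter

def star_reduce(rows, min_sup):
    """Replace every attribute value whose one-dimensional aggregate fails
    min_sup with '*' (a star-node), then collapse identical rows."""
    n = len(rows[0])
    keep = [
        {v for v, c in Counter(row[d] for row in rows).items() if c >= min_sup}
        for d in range(n)
    ]
    return Counter(
        tuple(v if v in keep[d] else "*" for d, v in enumerate(row))
        for row in rows
    )

rows = [("a1", "b1", "c1", "d1"), ("a1", "b1", "c4", "d3"),
        ("a1", "b2", "c2", "d2"), ("a2", "b3", "c3", "d4"),
        ("a2", "b4", "c3", "d4")]
print(star_reduce(rows, min_sup=2))
# 3 collapsed rows, e.g. (a1, b1, *, *) with count 2 -- two fewer than before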


8.3 Data Generalization: Attribute-Oriented Induction

Data generalization summarizes data by replacing relatively low-level values (such as numeric values for
an attribute age) with higher-level concepts (such as young, middle-aged, and senior). Given a large
amount of data stored in databases, it is useful to be able to describe concepts in concise and succinct
terms at generalized (rather than low) levels of abstraction. Allowing datasets to be generalized at
multiple levels of abstraction facilitates users in examining the general behavior of the data.

The attribute-oriented induction (AOI) approach to concept description was first proposed in 1989, a few
years before the introduction of the data cube approach. The data cube approach is essentially based on
materialized views of the data, which typically have been precomputed in a data warehouse. In general,
it performs off-line aggregation before an OLAP or data mining query is submitted for processing. On the
other hand, the attribute-oriented induction approach is a query-oriented, generalization-based, on-line
data analysis technique.

The general idea of attribute-oriented induction is to first collect the task-relevant data using a database
query and then perform generalization based on the examination of the number of distinct values of each
attribute in the relevant set of data. The generalization is performed by either attribute removal or
attribute generalization. Aggregation is performed by merging identical generalized tuples and
accumulating their respective counts. This reduces the size of the generalized dataset. The resulting
generalized relation can be mapped into different forms for presentation to the user, such as charts or
rules.
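
A compact sketch of this core loop is given below. It performs only a single-step hierarchy climb (real AOI keeps climbing until the attribute generalization threshold is met), and all names are illustrative:

from collections import Counter

def aoi(rows, attrs, hierarchies, threshold):
    """rows: list of dicts attr -> value; hierarchies: attr -> dict mapping
    a value to its higher-level concept. Attributes with too many distinct
    values are removed (no hierarchy) or generalized (hierarchy available);
    identical generalized tuples are then merged with accumulated counts."""
    plan = []
    for a in attrs:
        distinct = {row[a] for row in rows}
        if len(distinct) <= threshold:
            plan.append((a, None))              # retain the attribute as-is
        elif a in hierarchies:
            plan.append((a, hierarchies[a]))    # attribute generalization
        # else: attribute removal -- a is dropped from the working relation
    return Counter(
        tuple(h[row[a]] if h else row[a] for a, h in plan)
        for row in rows
    )

rows = [{"gender": "M", "major": "CS"}, {"gender": "F", "major": "Physics"},
        {"gender": "M", "major": "Math"}]
hier = {"major": {"CS": "science", "Physics": "science", "Math": "science"}}
print(aoi(rows, ["gender", "major"], hier, threshold=2))
# Counter({('M', 'science'): 2, ('F', 'science'): 1})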

Consider the following example of attribute-oriented induction.

Suppose that a user would like to describe the general characteristics of graduate students in the Big
University database, given the attributes name, gender, major, birthplace, birth date, residence, phone#
(telephone number), and GPA (grade point average). A data mining query for this characterization can be
expressed in the data mining query language, DMQL, as follows:

use Big_University_DB
mine characteristics as "Science_Students"
in relevance to name, gender, major, birth_place, birth_date, residence, phone#, GPA
from student
where status in "graduate"


The first step is to obtain the initial relation. Data focusing should be performed before attribute-oriented
induction; this step corresponds to the specification of the task-relevant data (i.e., the data for analysis).
The data are collected based on the information provided in the data mining query, and the resulting initial
relation is given in Table 8-5.

Table 8-5: Initial relation
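
In relational terms, data focusing is just a selection and projection. A Python sketch over hypothetical student records (stored as dicts) might look like this:

# Hypothetical student records; only the task-relevant attributes are kept
# for the tuples satisfying the query's where-clause.
relevant = ["name", "gender", "major", "birth_place", "birth_date",
            "residence", "phone#", "GPA"]

def initial_relation(students):
    """Data focusing: select graduate students, project relevant attributes."""
    return [{a: s[a] for a in relevant}
            for s in students if s["status"] == "graduate"]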

The second step is to prepare for generalization. Attribute removal is based on the following rule: if there
is a large set of distinct values for an attribute of the initial working relation, but either (1) there is no
generalization operator on the attribute (e.g., there is no concept hierarchy defined for the attribute), or
(2) its higher-level concepts are expressed in terms of other attributes, then the attribute should be
removed from the working relation.

Attribute generalization is based on the following rule: if there is a large set of distinct values for an
attribute in the initial working relation, and there exists a set of generalization operators on the attribute,
then a generalization operator should be selected and applied to the attribute. This rule is based on the
following reasoning: generalizing an attribute value makes the resulting tuple cover more of the original
data tuples, thus generalizing the concept it represents.

For each attribute of the relation, the generalization proceeds as follows:

• name: Since there are a large number of distinct values for name and there is no generalization
operation defined on it, this attribute is removed.
• gender: Since there are only two distinct values for gender, this attribute is retained, and no
generalization is performed on it.
• major: Suppose that a concept hierarchy has been defined that allows the attribute major to be
generalized to the values {arts & science, engineering, business}. Suppose also that the attribute
generalization threshold is set to 5, and that there are more than 20 distinct values for major in
the initial working relation. By attribute generalization and attribute generalization control, major
is therefore generalized by climbing the given concept hierarchy.
• birth_place: This attribute has a large number of distinct values; therefore, we would like to
generalize it. Suppose that a concept hierarchy exists for birth_place, defined as "city < province
or state < country". If the number of distinct values for country in the initial working relation
is greater than the attribute generalization threshold, then birth_place should be removed,
because even though a generalization operator exists for it, the generalization threshold would
not be satisfied. If, instead, the number of distinct values for country is less than the attribute
generalization threshold, then birth_place should be generalized to birth_country.
• birth_date: Suppose that a hierarchy exists that can generalize birth_date to age, and age to
age_range, and that the number of age ranges (or intervals) is small with respect to the attribute
generalization threshold. Generalization of birth_date should therefore take place.
• residence: Suppose that residence is defined by the attributes number, street, residence_city,
residence_province_or_state, and residence_country. The number of distinct values for number
and street will likely be very high, since these are quite low-level concepts. The attributes number
and street should therefore be removed, so that residence is generalized to residence_city,
which contains fewer distinct values.
• phone#: As with the attribute name above, this attribute contains too many distinct values and
should therefore be removed in generalization.
• GPA: Suppose that a concept hierarchy exists for GPA that groups grade point average values
into numerical intervals like {3.75–4.0, 3.5–3.75, ...}, which in turn are grouped into descriptive
values, such as {excellent, very good, ...}. The attribute can therefore be generalized (a sketch
of this two-step climb follows the list).
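
As promised above, here is a sketch of the two-step GPA climb. The interval boundaries and descriptive labels are hypothetical, since the full hierarchy is not given in the lesson:

def gpa_to_interval(gpa):
    """First climb: numeric GPA -> interval (hypothetical boundaries)."""
    if gpa >= 3.75:
        return "3.75-4.0"
    if gpa >= 3.5:
        return "3.5-3.75"
    return "below 3.5"

# Second climb: interval -> descriptive value (hypothetical labels).
interval_to_desc = {"3.75-4.0": "excellent", "3.5-3.75": "very good",
                    "below 3.5": "good"}

print(interval_to_desc[gpa_to_interval(3.8)])   # excellent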

Table 8-6 shows how attribute-oriented induction is performed on the initial working relation of
Table 8-5.

Table 8-6: Selecting attribute removal and attribute generalization

Step 3 derives the prime relation, P. This is performed by inserting the generalized tuples into P; the result
is given in Table 8-7.

Table 8-7: Prime relation

Finally, the presentation step: the user interacts with the output, adjusting the levels of abstraction by
drilling and pivoting, and mapping the generalized relation into forms such as rules or charts. The end user
controls how the result is visualized; Table 8-8 shows one such output.
Table 8-8: Visualization and Presentation
