International Journal of Information Technology & Decision Making

Vol. 12, No. 6 (2013) 1223-1259

© World Scientific Publishing Company

DOI: 10.1142/S0219622013500351

Int. J. Info. Tech. Dec. Mak. 2013.12:1223-1259. Downloaded from www.worldscientific.com


by AUCKLAND UNIVERSITY OF TECHNOLOGY on 08/16/16. For personal use only.

AN AUTOMATIC METHOD FOR THE DESIGN
OF MULTIDIMENSIONAL SCHEMAS FROM
OBJECT ORIENTED DATABASES

YASSER HACHAICHI
Multimedia, InfoRmation Systems and Advanced
Computing Laboratory, University of Sfax
Higher Institute of Business Administration
BP 1013, Sfax, 3018, Tunisia
[email protected]
JAMEL FEKI
Multimedia, InfoRmation Systems and Advanced
Computing Laboratory, University of Sfax
Faculty of Economics and Management
BP 1018, Sfax, 3018, Tunisia
[email protected]

A data warehouse (DW) is a large data repository system designed for decision-making
purposes. Its design relies on a specific model called multidimensional. This multidimensional
model supports analyses of huge volumes of data that trace the enterprise's activities over
time. Several design methods have been proposed to build multidimensional schemas from either
the relational data model or the entity-relationship data model. Almost all proposals that
treat the object-oriented data model assume the presence of the data source UML class
diagram. However, in practice, such a diagram either does not exist or is obsolete due to
multiple changes/evolutions of the information system. Furthermore, these few proposals
require intense manual intervention by the designer, which demands high expertise both in
the DW domain and in the object database domain. To overcome these disadvantages, this
work proposes an automatic DW schema design method starting from an object database
(schema and its instances). This method applies a set of extraction rules to identify
multidimensional concepts and to generate star schemas. It is defined for the standard ODMG
model and, thus, can be adapted with slight changes to other object database models. In
addition, its extraction rules have the merit of being independent of the domain semantics.
Furthermore, they automatically generate schemas classified according to their analytical
potential; this classification helps the DW designer in selecting the most relevant schemas
among the generated ones. Finally, being automatic, our method is supported by a toolset that
also prepares for the automatic generation of the Extract Transform and Load
procedures used to load the DW.
Keywords: Multidimensional model; data warehouse; automatic design; object-oriented
database.


1. Introduction
Organizations generate huge volumes of data pertinent to various aspects of their
business processes, such as customers' management, suppliers' management and
procurement. In addition, they need to access and analyze this data to support either
their daily operations or business decisions.52 Thus, today's organizations have two
types of information systems: operational information systems and decision support
systems (DSS). The operational information system supports the execution of the
daily business operations. On the other hand, a DSS manages historical information
used to analyze the business performance over time in order to take appropriate
business decisions.
Data warehousing is a technology that intends to provide decision makers access to
various levels of information.52 A data warehouse (DW) provides an architectural
model for the flow of data from the operational information system to decision support
environments.16 It is periodically populated with data from operational information
systems, e.g., for equipment management, accounting, inventory and customer
management. Essentially, a DW collects all relevant data into one central system and
organizes it efficiently so that it is consistent and convenient for many purposes, such as
retrieving and using data or keeping old data for historical analyses.52 In addition, a DW is a
central data repository used to build and load data marts (DM); each DM contains an
extract of the DW and is oriented to a specific subject of decisional analyses.
On the other hand, given their functional differences, designing a DW/DM
requires a methodology different from those commonly adopted for operational
information systems. In fact, current commercial software tools only assist the
administrators in the DW/DM structure specification and production of analytical
results; hence, they suppose that the DW/DM schemas are designed beforehand.
This shortage motivated the proposition of several design methods for DW/DM
schemas.8,10,12,13,18,20,22,24,29,30,32,36,37,42,45,50,51,55,61 These methods differ in three
aspects: their approach (top-down,11,37,45,51 bottom-up12,13,18,24,30,32,36,42,50,61 or
mixed7,8,12,18,20,24,49,55), the data model they assume (relational,8,12,13,18,29,31,42,55
object-oriented22,50,61,62 or XML32,36,60), and their degree of automation (automatic,13,18,20,29,30,32
semi-automatic24,36,42,45,50,55,61 or manual11,12,22,37,51). The
current state of the art shows that several DW/DM design methods have been
proposed for relational databases with various degrees of automation. However, XML
and object-oriented databases still interest researchers. In this paper, we focus on
object-oriented databases (ODB) as they better handle the complex objects
increasingly in use in today's computer applications.9 In addition, as set forth by
Barry,3 "generally, an ODB is a good choice when you have all three factors: business
need, high performance, and complex data. Recently, people have also been considering an ODB
even when their data is not particularly complex. An ODB can allow
for a smaller team and faster development because there is only one data model."
Examples of ODBMS (Object Data Base Management System) in use in industrial
applications are given in ODBMS FAQ44; for instance, British Airways uses the
Versant Object Database for its Origin and Destination Revenue Management
System.
To our knowledge, there are very few DW/DM schema design methodologies that
start from an ODB.22 Almost all methods that rely on an object-oriented data source start
from a UML class diagram.50,61,62 However, in practice, the organization either does
not have such a diagram or, even when it has one, the diagram is often obsolete: it is not kept up to
date to reflect the evolution/maintenance of the operational information system. In
addition, the few proposed methods for the design of DW/DM based on objects
require human intervention with high expertise in both the OLAP (On-Line
Analytical Processing) and ODB domains. Furthermore, in the design process, these works do
not address some specificities of the object data sources, such as methods and
structured attributes.
In this paper, we propose a DM schema design method from ODB (schema and its
instances). This method is supported by a software engineering tool and is domain
independent because it relies on the structural properties of the data source independently of its semantics. It automatically applies a set of rules to extract, from the
ODB, multidimensional concepts (i.e., candidate facts, dimensions) and generates
star schemas. Moreover, it keeps track of the origin (e.g., object name, attribute
name, data type, length) of each multidimensional concept in the generated DM
schemas; this traceability helps the automatic generation of ETL (Extract Transform and Load) procedures to load the DM. Finally, to ensure its adaptability to
various ODBMS, we base our method on the standard object-model ODMG14
(Object Database Management Group); thus, any ODBMS compliant with this
standard can benefit from our method for the design of DM schemas.
The remainder of this paper is organized as follows. Section 2 presents the ODMG
object model as a standard, and the multidimensional model. Section 3 puts our
method in its scope by overviewing DW/DM bottom-up design approaches based on
object-oriented data models. Section 4 describes our design method of DM from an
ODB. Section 5 overviews the CAME-BDO toolset which supports our DM schema
design method. The evaluation of CAME-BDO is discussed in Sec. 6. Finally, the
conclusion summarizes the presented work and outlines its perspectives.

2. Background and Terminology


To ensure a generic DM schema design method, we relied on the object data model
standardized by the ODMG, a consortium of the leading ODB vendors.14 In this
section, we present a conceptual overview of both the ODMG object model and
the multidimensional model.
2.1. The ODMG object data model
We overview the object model supported by ODBMSs compliant with the latest
release of the standard, ODMG 3.0.14


The object model specifies the kind of semantics that can be defined explicitly for
an ODBMS. Among other things, the semantics of the object model determines the
characteristics of objects, how objects can be related to one another, and how objects
can be named and identified. The object model specifies the constructs that are
supported by an ODBMS.58
The basic modeling concepts are the object and the literal. Each object has a
unique identifier whereas a literal has no identifier. In addition:
• Objects and literals can be categorized by their types. All elements of a given type
have a common range of states (i.e., the same set of properties) and a common
behavior (i.e., the same set of operations). An object is sometimes referred to as an
instance of its type.
• The state of an object is defined by the values it carries for the set of properties.
These properties can be attributes of the object itself or relationships with other
objects.
• The behavior of an object is defined by the set of operations that can be executed
on or by the object. Operations may have a list of input and output typed parameters
and may return a typed result; and
• An ODBMS has an appropriate meta-model according to which instances of ODBs
are stored and managed.
The ODMG Object meta-model (Fig. 1) specifies what is meant by objects, literals, types,
operations, properties, attributes, relationships, and so forth. It includes
significantly richer semantics than the relational model does, by declaring relationships
and operations explicitly. In addition, the ODMG standard is based on a
common object model and uses several aspects of OMG's object model. It supports
types (interface) and classes (implementation), encapsulation, inheritance and
polymorphism.

[Figure: diagram of the ODMG meta-model relating the concepts Class, Operation, Property, Attribute, Relationship, Traversal path and Object through the links support, instantiate, extends, has and define.]

Fig. 1. ODMG standard Meta-model: main concepts.14


We next briefly explain the basic concepts of the ODMG Model; for more details
the reader is referred to the Object Data Standard: ODMG 3.0.14
• A Relationship is a property of an object. The ODMG Model supports only binary
relationships, i.e., relationships between two types (i.e., object or literal). A
relationship is defined explicitly by declaring its traversal paths, which enable
applications to use logical connections between objects. Traversal paths are
declared in pairs, one for each traversal direction of the relationship.
• A Key uniquely identifies an instance of a type. Simple and compound keys are
supported.
• Object IDentifiers (OID) are unique within a storage domain. The value of an
object OID never changes over its lifetime and is never re-assigned.
For the inheritance of state and behavior, the ODMG object model defines the
extends relationship. The extends relationship applies only to object types; thus, only
classes (but not literals) may inherit state. It is a single inheritance relationship
between two classes whereby the subordinate class inherits all of the properties and
all of the behavior of the class that it extends.
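The concepts above (attributes, keys, paired traversal paths, single inheritance through extends) can be illustrated with a minimal Python sketch. It is not part of the ODMG standard or of the authors' tooling: the names `ClassDef` and `relate` are hypothetical, the class and attribute names are borrowed from the Estimate Management example of Fig. 3, and the cardinalities and relationship endpoints are assumptions made only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class ClassDef:
    """A class of a hypothetical, simplified ODMG-like schema."""
    name: str
    attributes: Dict[str, str] = field(default_factory=dict)   # attr_name -> attr_type
    keys: List[str] = field(default_factory=list)              # simple or compound key
    extends: Optional["ClassDef"] = None                        # single inheritance only
    # traversal path name -> (target class name, cardinality toward the target)
    paths: Dict[str, Tuple[str, str]] = field(default_factory=dict)

    def all_attributes(self) -> Dict[str, str]:
        """Own attributes plus everything inherited through `extends`."""
        inherited = self.extends.all_attributes() if self.extends else {}
        return {**inherited, **self.attributes}


def relate(a: ClassDef, path_ab: str, card_ab: str,
           b: ClassDef, path_ba: str, card_ba: str) -> None:
    """Declare a binary relationship as a pair of inverse traversal paths."""
    a.paths[path_ab] = (b.name, card_ab)
    b.paths[path_ba] = (a.name, card_ba)


# Names taken from Fig. 3; cardinalities and endpoints are illustrative assumptions.
task = ClassDef("Task", {"O_id": "Numeric", "Description": "String"}, keys=["O_id"])
automatic = ClassDef("Automatic", {"A_Duration": "Interval", "A_cost": "Numeric"},
                     extends=task)
machine = ClassDef("Machine", {"MC_id": "String", "Name": "String"}, keys=["MC_id"])
relate(automatic, "executed_by", "one", machine, "executes", "many")
```

Here `automatic.all_attributes()` returns the two attributes of Task followed by the two attributes declared on Automatic, which mirrors the rule that a subordinate class inherits all properties of the class it extends.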
In terms of typing, as illustrated in Fig. 2, the ODMG object model defines:
• Collection Types, which are available to build complex objects. They include Set<t>,
Bag<t>, List<t> and Array<t> collections.
• The Atomic Literals supported are: Long, Short, Unsigned Long, Unsigned Short,
Float, Double, Boolean, Octet, Char, String and Enum.
• The Structured Literals supported are: Date, Time, Timestamp and Interval.

Type
  Object
    Atomic Obj.
    Collection Obj.: Set<>, Bag<>, List<>, Array<>, Dictionary<>
    Structured Obj.: Date, Time, Timestamp, Interval
  Literal
    Atomic Lit.: Long, Short, Ulong, Ushort, Float, Double, Character, Boolean, string, octet, enum<>
    Collection Lit.: Set<>, Bag<>, List<>, Array<>, Dictionary<>
    Structured Lit.: Structure<>, Date, Time, Timestamp, Interval

Fig. 2. The ODMG full set of Built-in types.

Figure 3 depicts an example of an ODB modeling the 'Estimate management for an
enterprise'. In this example, object classes are shown as rectangles; operations are
distinguished from attributes by a dedicated symbol; attributes and operations are linked
to their object by horizontal lines; attribute and operation types are specified
between parentheses "(N)umeric, (S)tring, (B)oolean, (D)ate, (I)nterval"; key
attributes are tagged with K; relationships are drawn as arrowed lines between
objects, with a distinct arrow style for each of the one-to-one, one-to-many,
many-to-many and extends relationships.

[Figure: class diagram of the example ODB. Classes with their attributes and operations:]
  Material: M_id (K, N), Name (S), Description (S)
  Client: C_id (K, N), Name (S), First_Name (S), struct Address {number (N), street (S), city_name (S)}
  Article: P_id (K, N), Name (S), Description (S)
  Model: MO_id (K, S), Name (S), Price (N)
  Study: S_id (K, N), Description (S), Set_Model Price (void)
  Estimate: E_id (K, N), E_date (D), Total_price (N), Total_price_TI (N)
  Estimate detail: Quantities (N), Price (N), Total_detail (N), Total_detail_TI (N)
  Machine: MC_id (K, S), Name (S), State (B), Hour_cost (N), Purchase_date (D)
  Personnel: P_id (K, S), Name, Affectaion_date (D), Wage (N), set Struct qualification {Name (S), specialty (S), date (D)}
  Task: O_id (K, N), Description (S)
  Automatic: A_Duration (I), A_cost (N)
  Manual: M_Duration (I), MO_cost (N)
  Semi-automatic: SA_Duration (I), SA_cost (N)
  Relationship labels: requests, used_by, belongs_to, concern, executes, estimated_for, designed_for, needs, used_for, Elaborated_for, requires, uses, contains, related_to, associated_to, carries out, carried out_by, executed_by

Fig. 3. An ODB example for Estimate Management.
In this example, an Estimate is elaborated for a Client and contains Estimate
details. Each Estimate detail concerns one Model and needs a Study to define the associated
Materials, Articles and Tasks. A Task can be Automatic, in which case it is executed by a
Machine; Manual, i.e., carried out by Personnel; or Semi-automatic, requiring both
Automatic and Manual interventions. We will refer to this example to illustrate our
DM design method.
2.2. The multidimensional model
A conceptual schema is used to dene logical schemas supported by a class of software systems (e.g., relational DBMS). Then, a logical schema is translated into a
physical one, supported by a specic software system (e.g., Oracle).34
In the data warehousing context, it was early realized that traditional conceptual
models for database modeling, such as the E/R model,4 do not provide a suitable means
to describe the fundamental aspects of such applications. The crucial point is that, in
designing a DW, there is the need to represent explicitly certain important characteristics of the information contained therein. These characteristics are not related to
the abstract representation of real-world concepts, but rather to the nal goal of the
data warehouse: supporting data analyses oriented to decision making.57
In the last few years, multidimensional modeling has attracted the attention
of several researchers who defined various solutions, each focusing on the set of
information they considered strictly relevant. Some of these solutions have no2,46 or
limited12 graphical support, and aim at establishing a formal foundation for
representing cubes, hierarchies and an algebra for querying them. On the other hand,
we believe that a distinguishing feature of DW conceptual models is that of providing
a graphical support to be easily understood by both designers and decision makers
when discussing requirements and validating conceptual solutions. Hence, we opted to
use the Dimensional Fact Model (DFM), which is a graphical conceptual model
specifically devised for multidimensional design. DFM was first proposed in 1998 by
Golfarelli et al.25 and has been continuously enriched and refined in order to suit
optimally the variety of modeling situations that may be encountered in real projects
from small to large complexity.27

Fig. 4. A simplified Meta-model for multidimensional schemas.18
A conceptual design according to the DFM consists of a set of fact schemas
(hereafter multidimensional schemas) where the basic concepts are facts, measures,
dimensions and hierarchies.27 Figure 4 is a meta-model grouping these concepts.
A formal definition of these concepts can be found in the DFM.26 Here, we
informally present these concepts through the example shown in Fig. 5, which depicts
an introductory multidimensional schema according to the DFM model. This schema
allows decision makers to analyze the Sales fact.

[Figure: DFM schema of the SALES fact, with measures quantity and unitPrice and dimensions DATE, STORE, PRODUCT and CUSTOMER. Attribute labels appearing in the figure: dateID, week, month, year; storeID, city, state, country, sname, managerG; productID, type, markGroup, category, brand, pname; customerID, cname, phone. Callouts mark the fact, the measures, the dimensions, dimensional and non-dimensional attributes, and the hierarchy of store.]

Fig. 5. A basic multidimensional model for the SALES fact.


A fact is a focus of interest for the decision-making process; typically, it models a set
of events occurring in the organization and/or its environment. A fact is graphically
represented by a box divided into two compartments, the upper compartment is for
the fact name and the bottom one is for the fact measures. Examples of facts in
the commercial domain are sales, shipments, purchases; others in the financial domain
are stock exchange transactions, contracts for insurance policies, etc.
A measure is a numerical property of a fact; it describes a quantitative aspect of
interest for decisional analyses. For instance, the fact Sales has two measures:
quantity and unitPrice. Measures should be numerical because they are used for
computing aggregated values using aggregate functions (e.g., Sum, Average, Count).
Rarely, a fact may have no measures; this happens when the only interesting thing to
be recorded is the occurrence of events. In this case the fact is said to be empty and is
typically queried to count the events that occurred.
A dimension is directly linked to a fact, which can be viewed as an association linking its
dimensions. Dimensions of a fact set the context of recording its measures and,
therefore represent the fact analysis coordinates. Graphically, dimensions are represented as rectangles attached to the fact by straight lines. Typical dimensions for
the Sales fact (Fig. 5) are Product, Customer, Store and Date. Usually one of the fact
dimensions represents the time which is necessary to extract time series from the DW
data. In addition, note that, in the multidimensional schema of Fig. 5, each measure
depends on all dimensions; i.e., concerns one product, sold to one customer, delivered
from one store at a given date.
Aggregation is the basic OLAP operation since it produces summarized information
from large amounts of detailed data. An aggregation is carried out on
measures along dimensions. For instance, we can compute the total amount of
sales (sum of quantity * unitPrice) by Product (or by any combination of
dimensions). In addition, this total can be obtained at different levels of detail (by
Product type or category) thanks to the definition of dimensional attributes organized
into hierarchies.
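As a small illustration of this aggregation, the following Python sketch computes the total sales amount (sum of quantity * unitPrice) first by product and then at the coarser type level. The sample rows and the helper name `total_amount` are hypothetical and serve only to make the roll-up concrete; they do not come from the paper.

```python
from collections import defaultdict

# Hypothetical detailed sales events: (productID, type, quantity, unitPrice)
sales = [
    ("p1", "laptop", 2, 900.0),
    ("p1", "laptop", 1, 900.0),
    ("p2", "laptop", 3, 700.0),
    ("p3", "phone", 5, 300.0),
]


def total_amount(rows, level):
    """Sum quantity * unitPrice, grouped by the column at index `level`."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[level]] += row[2] * row[3]
    return dict(totals)


by_product = total_amount(sales, 0)   # finest level: one total per productID
by_type = total_amount(sales, 1)      # coarser level: one total per product type
```

Grouping by `type` instead of `productID` is exactly the change of detail level that a hierarchy of dimensional attributes makes possible.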
A dimensional attribute (also called parameter) is a property of a dimension. It is
graphically represented by a circle. Relationships between dimensional attributes are
expressed by hierarchies. A hierarchy of a dimension d is a directed graph, rooted
at the identifier of d, all of whose nodes are dimensional attributes of d and whose
arcs model many-to-one associations between pairs of dimensional attributes. Hierarchies
determine how primary events can be aggregated into secondary events and
selected significantly for the decision-making process. In Fig. 5, the hierarchies of the
Product dimension enable us to aggregate measures by type and markGroup or by
type and category.
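The graph view of a hierarchy can be sketched in Python as follows. The adjacency list encodes the Product hierarchies as we read them from the text (productID rolls up to type, which rolls up to either markGroup or category); this arc layout, the helper names `rollup_paths` and `is_many_to_one`, and the sample rows are assumptions for illustration only.

```python
# Arcs go from a finer dimensional attribute to the coarser ones it rolls up to.
hierarchy = {
    "productID": ["type"],
    "type": ["markGroup", "category"],
    "markGroup": [],
    "category": [],
}


def rollup_paths(graph, root):
    """Enumerate every aggregation path from `root` to a terminal attribute."""
    children = graph.get(root, [])
    if not children:
        return [[root]]
    return [[root] + tail
            for child in children
            for tail in rollup_paths(graph, child)]


def is_many_to_one(rows, child, parent):
    """Check on instances that each `child` value maps to a single `parent` value."""
    seen = {}
    for row in rows:
        if seen.setdefault(row[child], row[parent]) != row[parent]:
            return False
    return True


paths = rollup_paths(hierarchy, "productID")
products = [
    {"productID": "p1", "type": "laptop"},
    {"productID": "p2", "type": "laptop"},
    {"productID": "p1", "type": "laptop"},
]
valid_arc = is_many_to_one(products, "productID", "type")
```

The instance-level check reflects the requirement that every arc of a hierarchy models a many-to-one association: a product with two different types would invalidate the arc from productID to type.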
Note that each dimensional attribute may functionally determine some non-dimensional
attributes; the latter are called descriptive (or weak) attributes. They
are linked to their corresponding dimensional attributes by a line. For example, in
Fig. 5, the dimensional attribute storeID has one descriptive attribute called sname
(for store name).


3. Related Works
In the literature, DW/DM development approaches are classified into three categories:
(1) Data-driven approaches12,13,18,24,29,30,32,36,42,50,61 which rely on the
analysis of the corporate data model of the OLTP (On-Line Transaction Processing)
system and its relevant transactions; (2) User-driven approaches11,37,45,51 which start
from a set of analytical requirements defined by the decision makers of the future
DSS; and (3) Mixed approaches7,8,12,18,20,24,49,55 which combine data-driven and
user-driven approaches in order to profit from the advantages of both.
User-driven (a.k.a. top-down) approaches presume that users have enough expertise
in expressing their analytical requirements in order to design schemas loadable
from the organization OLTP system. Mixed approaches are advisable when the data
source model (i.e., logical schema) is well known and has substantial size and
complexity.28 Finally, data-driven (a.k.a. bottom-up) approaches benefit from two major
advantages: first, they help decision makers since they offer potential multidimensional
schemas built on the source data model and, second, they guarantee that
the organization's OLTP system can feed the user-selected schemas with the needed
data. In data-driven approaches, user-requirements elicitation is deliberately
neglected. In fact, Inmon35 argued that user requirements are the last thing to be
considered in a DSS development since they are well understood only after the DW is
populated with data and query results are analyzed by decision makers. Considering
these advantages, we elected to propose a DM design method within the data-driven
approaches category.
Current DW/DM design methods consider either E/R,8,12,42,55 relational,18,29,31
XML32,36,60 or object models.22,50,61,62 Since this paper treats ODB, in this section we
restrict our review to works that consider object models.
From our point of view, the research combining the DW and object paradigm
fields follows three main lines.
The first research line applies the object paradigm for the multidimensional
modeling of the DW schema. Works of this line propose multidimensional conceptual
object-oriented models based on UML (Unified Modeling Language)1,17,23,39,43,48;
some of them focus their model on a specific area of the DW such as temporal
aspects,47 security or access control,21 requirements11,56 and association rules mining
models.63
The second research line addresses the problem of how to build a DW from an
object model. More precisely, it aims at the design of multidimensional conceptual
models from object data sources.22,50,61,62
In the third research line, we gather works that implement a DW by means of an
object or object-relational language/databases.15,38,53,59 They mainly address the
efficient acquisition, storage, query, change control, and schema integration of data
into an object/object-relational DW.
Since our work aims to design multidimensional conceptual models from object
data sources, it fits into the second research line. Due to space limitations,
in the remainder of this section we limit ourselves to the context of the second
research line and review its main works.
In this context, Gandhi and Jain22 propose an approach called object-oriented
methodology for DW design. It is a two-step approach:
(1) Transformation of an object model into a relational schema; the authors use
three trivial transformation rules: (i) an object class maps to a table, (ii) an
association maps to a table, (iii) a generalization maps to a super class plus a series of sub
class tables. They also manually supply details that are missing from the object
model, such as primary keys.
(2) Transformation of the obtained relational schema
into a DW schema; to do so, the authors define five transformations to be applied
manually; nevertheless, they do not specify in which case each transformation should
be applied.
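To make the three mapping rules of step (1) concrete, here is a minimal Python sketch under strong simplifying assumptions. It is not the procedure of Gandhi and Jain: the function name `object_model_to_tables`, the naming of association tables, and the reading of rule (iii) as "one table for the super class and one per sub class" are all our own assumptions, and only table names are produced (no columns or keys).

```python
def object_model_to_tables(classes, associations, generalizations):
    """Apply the three mapping rules and return the resulting table names."""
    tables = list(classes)                               # (i) one table per object class
    tables += [f"{a}_{b}" for a, b in associations]      # (ii) one table per association
    for parent, children in generalizations.items():     # (iii) super class plus sub class tables
        for name in (parent, *children):
            if name not in tables:
                tables.append(name)
    return tables


# Class names borrowed from the Estimate Management example; the model itself is illustrative.
tables = object_model_to_tables(
    classes=["Estimate", "Client", "Task"],
    associations=[("Estimate", "Client")],
    generalizations={"Task": ["Automatic", "Manual"]},
)
```

Even in this toy form, the sketch shows why the approach adds an intermediate relational level: the multidimensional design only starts once these tables exist.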
For their part, Prat et al.50 propose a conceptual design phase that starts with the
definition of a UML class diagram representing the decision makers' initial
requirements. This definition uses no multidimensional concepts, thereby enabling maximal
reuse of traditional methods commonly used for OLTP systems engineering. The
designed UML model is then enriched/transformed in order to facilitate subsequent
mapping into a logical multidimensional schema. To do so, the authors define four
transformations applied on the UML conceptual model: (1) determination of
identifying attributes for the classes; (2) manual determination of attributes representing
measures to distinguish them from qualitative (i.e., descriptive) attributes; (3)
migration of attributes of 1-1 and N-1 associations into one of the participating
classes; and finally, (4) transformation of the generalizations into aggregations to
enable their automatic mapping into multidimensional hierarchies. After these four
transformations, the logical multidimensional schema is generated, from the enriched
UML conceptual model, through the semi-automatic transformations detailed in
Prat et al.50 "These transformations are semi-automated, more specifically in the
logical phases human interaction is required to validate the step-by-step application of
the transformations or to provide information".50 The result of this step is a UML
class diagram extended with the multidimensional concepts. This diagram can be
mapped semi-automatically into a physical multidimensional schema.
Also, Zribi et al.61 proposed a method for the construction of DM schemas,
starting from a UML class diagram, in five semi-automatic steps: (1) identification of
transaction entities on which they build facts; (2) construction of decisional UML
packages each containing a transaction entity and its associated classes (directly or
indirectly linked to the transaction entity); (3) graphical annotation of packages to
identify multidimensional concepts resulting from the previous stage (the annotation
uses a set of stereotypes to represent the multidimensional concepts); (4)
validation of the annotation by the decisional designer; and (5) automatic generation
of a DM star schema modeled according to the DFM formalism,25 from each annotated package.
On the other hand, Zepeda et al.62 proposed a DW design method based on the
UML class diagram and user requirements. This method is divided into two phases.


The first phase starts with facts identification; its goal is to identify the entities that
are candidates to become facts. Once facts are identified, a recursive algorithm is
applied to find their dimensions. This algorithm accepts the UML class diagram with
the set of candidate facts, and then produces a snowflake schema for each candidate
fact. The second phase gets a set of metrics computed on the basis of user
requirements. These metrics help in selecting snowflake schemas from the candidate ones
previously generated.
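The general idea of a recursive search for dimensions can be sketched as below. This is not the algorithm of Zepeda et al., whose details are not given here; it is a generic depth-first traversal that, starting from a candidate fact, follows links toward related classes and nests them as dimension levels. The link table, its direction, and the function name `build_snowflake` are hypothetical.

```python
def build_snowflake(links, fact, visited=None):
    """Recursively follow links from `fact`; return nested dimension levels as a dict."""
    visited = set() if visited is None else visited
    visited.add(fact)
    schema = {}
    for target in links.get(fact, []):
        if target not in visited:          # guard against cycles in the class diagram
            schema[target] = build_snowflake(links, target, visited)
    return schema


# Hypothetical "to-one" links between classes of the Estimate Management example.
links = {
    "Estimate detail": ["Estimate", "Model"],
    "Estimate": ["Client"],
    "Client": [],
    "Model": [],
}
snowflake = build_snowflake(links, "Estimate detail")
```

Because the `visited` set is shared across branches, a class reachable through two paths appears only under the first one explored; a real design tool would need an explicit policy for such shared dimensions.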
In summary, we notice the following six shortcomings in the DW/DM
design methods proposed so far:
(1) Most DW/DM design methods relying on an object data-source start from a
UML class diagram. However, in practice, such a diagram either does not exist
within an organization, or it may be obsolete so that it does not reflect the
modifications due to the evolution of the OLTP system.
(2) The method starting the design from an ODB22 goes through one unnecessary
intermediate modeling level and performs the transformations manually.
(3) Current methods consider that the built candidate DMs are equally important
whereas some of them may be insignificant for the decisional process.
(4) Except for the method of Prat et al.,50 the proposed methods try to represent the
main DM properties at a conceptual level by abstracting away details of an
envisaged implementation platform. "Unfortunately, none of these approaches
defines a set of formal transformations in order to: (i) univocally and automatically
derive every possible logical representation from the conceptual model;
or (ii) help the designer to obtain the most suitable logical representation of the
developed conceptual model".41
(5) The existing methods for the design of DW/DM based on object sources require
manual human intervention, which necessitates high expertise both in
the OLAP domain and in the ODB domain.
(6) In the DW/DM design process, current works neglect some specifics of the object
data-sources, like complex attributes (i.e., collections, structures) and method
definitions.
In this paper, we propose a DM design method that overcomes these limits.
To reach this objective, first, our method relies on the most recent version of the object
data source, which we extract directly from the ODBMS repository; the latter contains
the ODB meta-data (i.e., structures of objects, methods). Second, it automates
the main design steps and assists the designer in the choice of relevant multidimensional
concepts among those extracted; this assistance is ensured by assigning
to each concept a relevance level reflecting its analytical potential for decision
making. Third, it keeps track of the origin of each component in the generated DM
schema. This traceability is fundamental both to automatically derive logical
representations and to prepare the generation of ETL (Extract, Transform and Load)
procedures.

4. Multidimensional Models from ODB


Our bottom-up DM design approach starts from an ODB compliant with the ODMG
model. It is composed of three steps, namely ODB schema retrieval, multidimensional
concepts identification and DM construction, and DM schemas display and adjustment.
Among these steps, only the third is conducted manually by decision makers/
designers, who adapt the automatically constructed DM schemas to their
particular OLAP analytical requirements.
The ODB schema retrieval step extracts the ODB schema (e.g., object names,
attribute names and types, operation names and types, key constraints) from the
ODBMS repository, and constructs the sets of objects strongly linked by functional
association. The multidimensional concepts identification and DM construction step
extracts the multidimensional concepts (facts and their measures, dimensions and
their attributes organized into hierarchies) from the ODB schema. It produces
potential DM schemas. To automate this step, we defined a set of extraction rules
for each multidimensional concept. Our rules are domain independent because they rely
on the structural properties of the data source independently of their semantics. In
addition, they have the merit of keeping track of the origin of each multidimensional
concept in the generated DM schemas. This traceability is fundamental during the
definition of ETL procedures. In the third step, DM schemas display and adjustment,
decision makers/designers are presented with a set of potential DM schemas that they
can adapt to meet their particular analytical needs. During this adaptation, they can
add derived measures, and remove and/or rename DM schema components (i.e., dimension,
attribute, hierarchy) using a set of operations.32 The application of these
adaptation operations is constrained to ensure that the resulting schemas are
syntactically well formed.6,54 As examples of simple constraints, we cite the following: a
fact must be linked to at least two dimensions; a dimension must have at least one
hierarchy; all hierarchies of a dimension d must start from the identifier of d.
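As an illustration only, the three simple constraints cited above could be checked as in the following Python sketch. The class names `Fact` and `Dimension`, the function `check_fact`, and the sample hierarchies are assumptions introduced here for the example; they are not part of the method or of the CAME tools.

```python
from dataclasses import dataclass, field


@dataclass
class Dimension:
    name: str
    identifier: str
    # Each hierarchy lists attribute names from finest to coarsest level.
    hierarchies: list = field(default_factory=list)


@dataclass
class Fact:
    name: str
    dimensions: list = field(default_factory=list)


def check_fact(fact):
    """Return the list of violated constraints (empty if well formed)."""
    problems = []
    if len(fact.dimensions) < 2:
        problems.append(f"{fact.name}: linked to fewer than two dimensions")
    for dim in fact.dimensions:
        if not dim.hierarchies:
            problems.append(f"{dim.name}: has no hierarchy")
        for hierarchy in dim.hierarchies:
            if not hierarchy or hierarchy[0] != dim.identifier:
                problems.append(
                    f"{dim.name}: hierarchy does not start from {dim.identifier}"
                )
    return problems


# Hypothetical sample data, loosely inspired by the running example.
machine = Dimension("Machine", "MC_id",
                    [["MC_id", "Purchase_date", "Day", "Month", "Year"]])
task = Dimension("Automatic-Task", "O_id", [["O_id", "A_Duration"]])
good = Fact("F-AutomaticTask-executedby-Machine", [machine, task])
bad = Fact("F-Broken", [Dimension("Orphan", "X_id")])
```

Here `check_fact(good)` reports no problem, whereas `check_fact(bad)` reports both the missing second dimension and the missing hierarchy.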
In the remainder of this section, we focus on the second step and present our rules
for the identification of multidimensional concepts from an ODB. For this, we adopt
the following notation:

• Given a relationship R between two objects $O_1$ and $O_2$ with multiplicity $(m_1, M_1)$
on the side of $O_1$ and $(m_2, M_2)$ on the side of $O_2$, the function $\mathrm{Max}(O_1, O_2)$
returns the two maximum multiplicities $(M_1, M_2)$.
• The transitive closure of an object $O$, noted $O^+$, denotes the set containing $O$ and
all objects $O'$ directly or transitively linked to $O$ by relationships with
$\mathrm{Max}(O, O') = (1, 1)$. Note that if an object $O'$ belongs to $O^+$ then ${O'}^+ = O^+$; this
avoids the computation of ${O'}^+$ and therefore optimizes the approach. In our
running example (Fig. 3), $\mathit{Model}^+ = \{\mathit{Model}, \mathit{Study}\}$ and $\mathit{Model}^+ = \mathit{Study}^+$.

In our work, we consider that the set $O^+$ can be semantically seen as a single
complex real-world object; indeed, all objects of $O^+$ are linked by strong functional
dependencies.
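The notation above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: relationships are encoded as tuples `(O1, O2, M1, M2)` of maximum multiplicities, `"*"` stands for "many", and the names `RELS`, `max_mult` and `closure` are hypothetical helpers, not part of the method's implementation.

```python
from collections import defaultdict

# Assumed encoding: (O1, O2, M1, M2), with M1 on the side of O1 and M2 on
# the side of O2. Only two relationships of the running example are shown.
RELS = [
    ("Model", "Study", 1, 1),
    ("Automatic", "Machine", "*", "*"),
]


def max_mult(rels, o1, o2):
    """Max(O1, O2): the two maximum multiplicities, or None if unrelated."""
    for a, b, m1, m2 in rels:
        if (a, b) == (o1, o2):
            return (m1, m2)
        if (a, b) == (o2, o1):
            return (m2, m1)
    return None


def closure(rels, obj):
    """O+: obj plus all objects reachable through (1, 1) relationships."""
    adjacent = defaultdict(set)
    for a, b, m1, m2 in rels:
        if (m1, m2) == (1, 1):
            adjacent[a].add(b)
            adjacent[b].add(a)
    seen, stack = {obj}, [obj]
    while stack:
        for neighbour in adjacent[stack.pop()]:
            if neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    return frozenset(seen)
```

Because the closure of every member of a set is the set itself, an implementation can cache the result per member and skip recomputation, which is the optimization mentioned above.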


Note that, in the remainder of this paper, ODB, fact, dimension and hierarchy
denote their schemas, and objects indicate object classes. When we refer to instances
of these concepts, we will state this explicitly.
Now we detail the identification step; it starts with fact identification and then
continues with the identification of measures, dimensions and hierarchies.


4.1. Fact identification


A fact is composed of numerical factual data on which analyses are focused. In an
ODB, we identify facts either from relationships (rule RF1) or from objects (rules
RF2, RF2.1 and RF2.2).
In the data warehousing field, many-to-many relationships are commonly used to
build facts.20,24,37,42,50,55,61
RF1. Each relationship between two objects $O_1$ and $O_2$ such that
$\mathrm{Max}(O_1, O_2) = (N, M)$ with $N > 1$ and $M > 1$ is identified as a fact.
We conventionally name such a fact F-O1gen-R-O2gen, where R is the name of the
relationship traversal path (Fig. 1) and O1gen (respectively O2gen) is the
concatenation of the names of $O_1$ (respectively $O_2$) and all its generalizing objects, if any.
Note also that, because a relationship between two objects in an ODB does not
contain attributes, rule RF1 generates empty facts, i.e., facts without measures
(cf. Sec. 2.2).
The application of rule RF1 to the example of Fig. 3 finds that Max(Automatic,
Machine) satisfies its condition and produces the fact F-AutomaticTask-executed by-Machine,
named according to the above convention. Similarly,
the fact F-ManualTask-carried out by-Personnel is produced.
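Rule RF1 and its naming convention can be sketched as below. The tuple layout `(O1, path, O2, M1, M2)`, the helper names, and the `("Model", "described by", "Study", 1, 1)` entry (whose path name is invented) are assumptions made for this example; only the two many-to-many relationships and the generalizing object Task come from the running example.

```python
def is_many(multiplicity):
    """True when a maximum multiplicity is greater than 1 ('*' means many)."""
    return multiplicity == "*" or (
        isinstance(multiplicity, int) and multiplicity > 1
    )


def rf1_facts(rels, generalizations=None):
    """Name one empty fact per many-to-many relationship (rule RF1)."""
    generalizations = generalizations or {}

    def gen_name(obj):
        # Object name followed by the names of its generalizing objects.
        return "".join([obj] + generalizations.get(obj, []))

    facts = []
    for o1, path, o2, m1, m2 in rels:
        if is_many(m1) and is_many(m2):
            facts.append(f"F-{gen_name(o1)}-{path}-{gen_name(o2)}")
    return facts


rels = [
    ("Automatic", "executed by", "Machine", "*", "*"),
    ("Manual", "carried out by", "Personnel", "*", "*"),
    ("Model", "described by", "Study", 1, 1),  # hypothetical path name
]
generalizations = {"Automatic": ["Task"], "Manual": ["Task"]}
facts = rf1_facts(rels, generalizations)
```

With these inputs, `facts` holds the two names given in the text, and the (1, 1) relationship yields no fact.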
As previously mentioned, we also extract facts from objects. To do so, we first
compute the transitive closure of each object. Indeed, the set of objects forming a
transitive closure may be semantically seen as a single object of the real world, since a
strong functional dependency exists between them.
RF2. Each $O^+$ containing an object with a nonkey numeric attribute or with an
operation returning numeric value(s) is identified as a fact.
The name of such a fact is the concatenation of "F-" with the names of the objects
belonging to $O^+$.
We note that:

• The term numeric attribute covers ODB attributes that are atomic, collections, or
part of a structured attribute (Fig. 2).
• Given a fact F built on $O^+$, each object in $O^+$ is useful to extract measures and
dimensions for F. Thus, considering all the objects in $O^+$ allows building
facts that cover a large variety of analyses.
• If an object does not have a key attribute, then the designer can intervene to select
one. Such an optional intervention improves the result of our identification method.
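A minimal Python sketch of rule RF2 follows. The dictionary layout of the object descriptions, the set `NUMERIC` of type names, the attribute types, and the object `Color` are all assumptions made for the example; the real method reads this information from the ODBMS repository.

```python
# Assumed names of numeric types; the actual list depends on the ODBMS.
NUMERIC = {"short", "long", "float", "double"}


def rf2_facts(closures, objects):
    """Name a fact for each closure holding a nonkey numeric attribute
    or an operation that returns a numeric value (rule RF2)."""
    facts = []
    for members in closures:
        for name in sorted(members):
            obj = objects[name]
            numeric_attr = any(
                atype in NUMERIC and attr not in obj["keys"]
                for attr, atype in obj["attributes"].items()
            )
            numeric_op = any(
                rtype in NUMERIC for rtype in obj["operations"].values()
            )
            if numeric_attr or numeric_op:
                facts.append("F-" + "-".join(sorted(members)))
                break
    return facts


objects = {
    "Model": {"keys": {"MO_id"},
              "attributes": {"MO_id": "string", "Price": "float"},
              "operations": {}},
    "Study": {"keys": {"S_id"},
              "attributes": {"S_id": "string"},
              "operations": {}},
    # Hypothetical object whose only numeric attribute is its key.
    "Color": {"keys": {"C_code"},
              "attributes": {"C_code": "long", "Label": "string"},
              "operations": {}},
}
closures = [frozenset({"Model", "Study"}), frozenset({"Color"})]
facts = rf2_facts(closures, objects)
```

The closure {Model, Study} becomes a fact because of the nonkey numeric attribute Price, while `Color` is skipped since its only numeric attribute is a key. The sketch orders object names alphabetically; the paper does not prescribe an order.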

Applying rule RF2 to our running example, we first identify from
$\mathit{Model}^+ = \{\mathit{Model}, \mathit{Study}\}$ the fact F-Study-Model, since the object Model contains the nonkey
numeric attribute Price. In addition, we identify seven other facts: F-Estimate-detail,
F-Estimate, F-Automatic, F-Manual, F-Semiautomatic, F-Machine and F-Personnel.
Note that rule RF2 can produce facts built either on specialized or generalized
objects. We next examine whether it is better to group each such fact with its
generalizing/specialized objects into a single fact. For this, we need to examine the
instances of these objects according to the next two sub-rules.
RF2.1. If S is a specialized object of n generalized objects $G_i$ ($n \geq 1$) and S produced
a fact F-S (via rule RF2), then replace the fact F-S with a new fact built on
$S^+ \cup \bigcup_{i=1}^{n} G_i^+$.
The name of the new fact is F-S concatenated with the names of all its generalized objects.
The main intuition behind this rule is that a specialized object cannot exist
without its generalizing objects: there is a strong functional dependency from the
specialized object to its generalizing object. Hence, by grouping the specialized object
with its generalizing objects, we construct a richer fact offering more analysis
potential. As illustrated in Fig. 6, every instance of the specialized object (SO) is
necessarily linked to an instance of its generalizing object (GO).
For our example, rule RF2.1 replaces F-Automatic and F-Manual by F-Automatic-Task
and F-Manual-Task, respectively, by including the generalizing object Task.
Similarly, F-Semiautomatic is replaced by F-Semiautomatic-Automatic-Manual-Task,
which includes the three generalizing objects Automatic, Manual and Task.
RF2.2. If a generalizing object G produced a fact F-G (on $G^+$ through RF2), and if
not all instances of G are linked to an instance of its specialized objects, then
maintain F-G as a fact independent of its specialized objects.
Rule RF2.2 guarantees that instances of G not associated with instances of its
subclasses will be analyzed alone through the fact F-G. Figure 6, case 2, depicts this
situation, where GO Instance 2 has no specialized instance (SO).
For our running example, rule RF2.2 maintains the two facts F-Manual-Task
and F-Automatic-Task without the specialized object Semiautomatic. Indeed, in our
database, there are instances of Manual Tasks and Automatic Tasks not linked to
instances of Semiautomatic.

[Figure 6 shows two cases at the instance level. Case 1: each instance of GO is linked to an instance of SO. Case 2: some instances of GO are not linked to SO.]

Fig. 6. The generalization/specialization link at the instance level.


The application of RF2.1 and RF2.2 to a specialized object S of a generalized object G
can simultaneously identify two facts, F-G and F-G-S, with their transitive closures.
Conceptually, the fact F-G is included in the fact F-G-S. However, at the logical
level, F-G is loaded with instances of G not attached to any instance of S, and
F-G-S with those of G linked to S. Hence, in our example, the fact F-Automatic-Task
(resp. F-Manual-Task) will first be loaded with objects Automatic and Task (resp.
Manual and Task) which are not attached to any instance of Semiautomatic. Second,
the fact F-Semiautomatic-Automatic-Manual-Task will be loaded with those linked to
Semiautomatic.
To focus on the most important extracted facts, we consider that facts issued from
relationships (using rule RF1) are more relevant for decision makers than those built
on objects or transitive closures of objects (using rule RF2). Indeed, a relationship
generally contains data that describe a business activity; these data result from
transactions of the OLTP system and evolve more quickly than objects.

4.2. Measure identification

Measures serve to compute summarized results by means of aggregate functions.
This requires measures to have numeric values. In our method, we extract
measures from properties (i.e., attributes and operations) of any object O belonging
to the transitive closure $O^+$ which produced a fact.
Remember that, in general, $O^+ = \{O, O_1, \ldots, O_n\}$; $O^+$ could be restricted to $\{O\}$ if
there is no $O_i$ with $\mathrm{Max}(O, O_i) = (1, 1)$. The following measure identification
rules take this situation into account.
RM1. Given a transitive closure $O^+$ on which a fact F is built, every operation (i.e.,
method) in an object $O_i$ that belongs to $O^+$ and returns a numeric value is a candidate
measure for F.
In the object paradigm, an operation is a service that can return a value directly,
or indirectly through a parameter. If the returned value is numeric, then rule RM1
considers it as a candidate measure.
RM2. Nonkey numerical attributes in an object that belongs to a transitive closure
$O^+$ identified as a fact F are candidate measures for F.
Note that rule RM2 excludes key attributes because they are generally artificial
and redundant data; in addition, they do not record/trace a business activity.
Our method also has the advantage of determining measures from
structures; this enriches the set of identified measures as follows:
RM3. Numerical atomic attributes in a simple Structure of an object that belongs to
a transitive closure $O^+$ identified as a fact F are candidate measures for F.
This rule considers only simple structures and ignores collections of structures.
Indeed, a simple structure S in an object O can be seen as a relationship between O and
S, where $\mathrm{Max}(O, S) = (1, 1)$. Furthermore, as shown in the ODB meta-model
(Fig. 2), a numeric attribute can be atomic or a collection (e.g., Set, Bag, List,
Array). A collection is multi-valued, whereas fact measures are simple values.
Therefore, the designer should add a numeric function that computes a single value
from each numeric collection attribute identified as a measure. Such a function
is obviously semantic and domain dependent.
Rules RM1 and RM2 identify measures, issued from operations and attributes,
that belong directly to the object(s) identified as a fact, whereas RM3 extracts
measures from a Structure within the object(s) identified as a fact. Since the relevance of
a measure decreases as we move away from the initial fact object,31 we consider
that measures obtained with RM1 and RM2 are more relevant than those obtained
with RM3. In addition, we notice that measures defined on operations (RM1) are
more relevant than those defined on attributes (RM2) because the latter are likely
to be keys or descriptive attributes. Hence, we opt for three relevance levels for
measures, in descending order: RM1, then RM2, then RM3.
For our running example, the application of these three rules produces, for each
identified fact, the measures shown in Table 1.
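The three measure rules and their relevance order can be sketched as follows. The object description format, the `NUMERIC` type set, the attribute types and the key name `ED_id` are assumptions for this example; the measure names for F-Estimate-detail are those listed in Table 1.

```python
# Assumed names of numeric types; the actual list depends on the ODBMS.
NUMERIC = {"short", "long", "float", "double"}


def candidate_measures(closure_objects):
    """Collect candidate measures, most relevant first (RM1, RM2, RM3)."""
    found = []
    for obj in closure_objects:
        # RM1: operations returning a numeric value.
        for op, rtype in obj.get("operations", {}).items():
            if rtype in NUMERIC:
                found.append((1, op + "()", "RM1"))
        # RM2: nonkey numeric attributes.
        for attr, atype in obj.get("attributes", {}).items():
            if atype in NUMERIC and attr not in obj.get("keys", ()):
                found.append((2, attr, "RM2"))
        # RM3: numeric atomic attributes of simple (non-collection) structures.
        for struct in obj.get("structures", {}).values():
            for attr, atype in struct.items():
                if atype in NUMERIC:
                    found.append((3, attr, "RM3"))
    return [(name, rule) for _, name, rule in sorted(found)]


estimate_detail = {
    "keys": {"ED_id"},  # hypothetical key name
    "attributes": {"ED_id": "long", "Price": "float", "Quantities": "long"},
    "operations": {"Total_detail": "float", "Total_detail_TI": "float"},
}
measures = candidate_measures([estimate_detail])
```

The result lists the two operations (RM1) before the attributes Price and Quantities (RM2); the numeric key is excluded.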

4.3. Dimension identification

Recall that a dimension represents a business analysis axis. Also, dimensional attributes
are organized into one or more hierarchies of levels, which correspond to different
ways to aggregate fact measures. To complete our DM design method, we determine, for
each identified fact, a set of candidate dimensions either from objects or from attributes.
For dimensions built on objects, we define the following two rules:
RD1. Given an empty fact F built, with rule RF1, on a relationship between $O_1$ and
$O_2$, the transitive closure of each of $O_1$, $O_2$ is a dimension for F.
Conventionally, the name of this dimension is the concatenation of the object
names in $O_i^+$.
This rule is graphically explained in Fig. 7.
RD2. Let $O_1$ be an object directly linked to an object $O_2$ with $\mathrm{Max}(O_1, O_2) = (1, *)$,
and $O_2$ belongs to a transitive closure identified as a fact F. Then, $O_1^+$ builds a
dimension for F.
Conventionally, this dimension name is the concatenation of the object names in $O_1^+$.
Rule RD2 identifies as a dimension d every object linked maximally with (1, *)
because every fact instance is linked to exactly one instance of its dimension. In
addition, considering $O_1^+$ as a dimension (as opposed to $O_1$ only) enables us to
extract all dimensional attributes for d.
In order to complete the set of dimensional attributes obtained using rules RD1
and RD2, we add to each specialized object SO belonging to the transitive closure
$O_1^+$ on which a dimension d is built all generalizing objects GO (of SO) together with
their transitive closures ($GO^+$). (The name of d will include the names of $GO^+$.) For our
running example, the impact of this on the results obtained so far is that we replace
the dimensions Automatic and Manual by Automatic-Task and Manual-Task (Table 1,
column 3), since Automatic and Manual have the generalizing object Task.

[Figure 7: $O_1^+$ groups O1, O3 and O4; $O_2^+$ groups O2, O5 and O6; O1 and O2 are linked by the paths Path_name1 and Path_name2. The fact F-O1-Path_name1-O2 (RF1) receives the dimensions O1-O3-O4 (RD1) and O2-O5-O6 (RD1).]

Fig. 7. Graphical illustration of rule RD1.

Table 1. Multidimensional concepts identified from the ODB of Fig. 3.

| Fact | Measure | Dimension | Identifier | Hierarchy |
|---|---|---|---|---|
| F-AutomaticTask-executedby-Machine (RF1) | | Automatic-Task (RD1) | O_id (RDI1) | MO_id → M_id; MO_id → P_id; S_id → M_id; S_id → P_id; A_Duration → Second → Minute → Hour → Day |
| | | Machine (RD1) | MC_id (RDI1) | Purchase_date → Day → Month → Year; State |
| F-ManualTask-carriedoutby-Personnel (RF1) | | Manual-Task (RD1) | O_id (RDI1) | MO_id → M_id; MO_id → P_id; S_id → M_id; S_id → P_id; M_Duration → Second → Minute → Hour → Day |
| | | Personnel (RD1) | P_id (RDI1) | Affectation_date → Day → Month → Year; SeqId_qualification → Date → Day → Month → Year |
| F-Model-Study (RF2) | Price (RM2) | Article (RD2) | P_id (RDI1) | |
| | | Material (RD2) | M_id (RDI1) | |
| F-Estimate-detail (RF2) | Price (RM2); Quantities (RM2); Total_detail() (RM1); Total_detail_TI() (RM1) | Estimate (RD2) | E_id (RDI1) | C_id → Address; E_date → Day → Month → Year |
| | | Model-Study (RD2) | S_id (RDI2) | P_id; M_id |
| F-Estimate (RF2) | Total_Price() (RM1); Total_Price_TI() (RM1) | Client (RD2) | C_id (RDI1) | Address |
| | | E_date (RD5) | E_date (RDI3) | Day → Month → Year |
| F-Automatic-Task (RF2, RF2.1 & RF2.2) | AO_cost() (RM1) | Model-Study (RD2) | S_id (RDI2) | P_id; M_id |
| | | A_Duration (RD5) | A_Duration (RDI3) | Second → Minute → Hour → Day |
| F-Manual-Task (RF2, RF2.1 & RF2.2) | MO_cost() (RM1) | Model-Study (RD2) | S_id (RDI2) | P_id; M_id |
| | | M_Duration (RD5) | M_Duration (RDI3) | Second → Minute → Hour → Day |
| F-Semiautomatic-Automatic-Manual-Task (RF2 & RF2.1) | SA_cost() (RM1); AO_cost() (RM1); MO_cost() (RM1) | Model-Study (RD2) | S_id (RDI2) | P_id; M_id |
| | | M_Duration (RD5) | M_Duration (RDI3) | Second → Minute → Hour → Day |
| | | A_Duration (RD5) | A_Duration (RDI3) | Second → Minute → Hour → Day |
| | | SA_Duration (RD5) | SA_Duration (RDI3) | Second → Minute → Hour → Day |
| F-Machine (RF2) | Hour_cost (RM2) | State (RD4) | State (RDI3) | |
| | | Purchase_date (RD5) | Purchase_date (RDI3) | Day → Month → Year |
| F-Personnel (RF2) | Wage (RM2) | Affectation_date (RD5) | Affectation_date (RDI3) | Day → Month → Year |
| | | Qualification (RD3) | SeqId_qualification (RDI4) | Date → Day → Month → Year |

Besides rule RD2, we can identify a dimension from an attribute of a particular
data type as well as from some collections of attributes. Such a dimension is known as a
degenerate dimension.a,37 Rules RD3, RD4 and RD5 treat degenerate dimensions.
RD3. Every attribute of a structure collection in an object that belongs to a transitive
closure identified as a fact F is a degenerate dimension for F.
A structure attribute can be likened to an object composed of a set of attributes.
Thus, a collection of structures S (e.g., Set<S>, List<S>) in an object O
simulates a link between O and S. In addition, in an ODB, two objects cannot share the
same instance of an attribute. This guarantees that the link between S and O has
$\mathrm{Max}(S, O) = (1, *)$ and, therefore, we identify the collection of S as a dimension for
the fact built on $O^+$.
Degenerate dimensions also come from attributes of special data types, as stated
in the next two rules.
RD4. An atomic Boolean attribute in an object that belongs to a transitive closure
identified as a fact F is a dimension for F.
Naturally, a Boolean attribute splits its object instances into two subsets; thus, it
is a candidate analysis axis.
Finally, as commonly claimed, the DW is a chronological collection of data.37
Consequently, the time dimension appears in all DWs. We take this claim into
account to define the temporal dimension rule.
RD5. A temporal attribute (Date, Time, Timestamp, or Interval) in an object that
belongs to a transitive closure identified as a fact F is a dimension for F.
Table 1 (column 3) shows the complete set of dimensions identified with these
rules.
a A degenerate dimension is a dimension reduced to one attribute stored as part of the fact, and not in a
separate dimension.
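Rules RD3 to RD5 amount to a type-driven scan of the attributes of each object in the fact's closure. The sketch below is one possible reading, assuming attribute types are available as strings such as `"Date"` or `"Set<QualificationStruct>"`; the structure name `QualificationStruct`, the type spellings and the function name are hypothetical, while the resulting dimensions match Table 1.

```python
import re

TEMPORAL = {"Date", "Time", "Timestamp", "Interval"}
COLLECTION = re.compile(r"(Set|Bag|List|Array)<\s*(\w+)\s*>")


def degenerate_dimensions(obj):
    """Return (attribute, rule) pairs for rules RD3, RD4 and RD5."""
    dims = []
    for attr, atype in obj["attributes"].items():
        match = COLLECTION.fullmatch(atype)
        if atype in TEMPORAL:
            dims.append((attr, "RD5"))
        elif atype == "Boolean":
            dims.append((attr, "RD4"))
        elif match and match.group(2) in obj.get("structures", ()):
            # Collection whose element type is a structure of the object.
            dims.append((attr, "RD3"))
    return dims


# Attribute types below are assumed for illustration.
machine = {"attributes": {"MC_id": "String", "Hour_cost": "Float",
                          "State": "Boolean", "Purchase_date": "Date"}}
personnel = {"attributes": {"P_id": "String", "Wage": "Float",
                            "Affectation_date": "Date",
                            "Qualification": "Set<QualificationStruct>"},
             "structures": {"QualificationStruct"}}
```

Under these assumptions, `degenerate_dimensions(machine)` yields State (RD4) and Purchase_date (RD5), and `degenerate_dimensions(personnel)` yields Affectation_date (RD5) and Qualification (RD3).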

Since we aim to assist the decisional designer, we assign relevance levels to
dimensions. We consider that dimensions identified by RD1, RD2, RD4 or RD5
are the most relevant, as they are extracted from objects linked to facts or from
atomic attributes of specific data types (Temporal/Boolean), whereas dimensions
identified by RD3 are less relevant because they are extracted from complex
attributes that are sometimes difficult to aggregate.10
The star schema is the keystone construction in multidimensional modeling.55 In
this paper, we have shown how to build star schemas. In addition, we can
automatically build constellation schemas in two different manners: by merging star
schemas that have common dimensions, or by looking for correlated facts
within the OLTP data model. This matter is studied in Ref. 19.
Once dimensions are identified, we continue the construction of multidimensional
schemas by identifying dimensional attributes and organizing them into hierarchies.

4.4. Hierarchies identification

Recall that a dimension hierarchy is made up of discrete dimensional attributes
organized from the finest to the coarsest granularity. The dimension identifier
attribute is the finest aggregation granularity; the remaining attributes define
progressively coarser granularities. In addition to these organized attributes, a hierarchy may
include weak attributes that are descriptive information for dimensional attributes.
Since all hierarchies of a dimension d start from the identifier of d, we first extract the
dimension identifier.

4.4.1. Dimension identifier identification

Based on the data structure from which a dimension d is built (i.e., object, attribute),
we define four rules to extract the identifier. The first rule extracts an identifier built
on the single-attribute key of an object.
RDI1. The key attribute in an object O identified as a dimension d is the identifier of d.
As an example, the identifier of the dimension Machine is MC_id.
RDI2. The identifier of a dimension built on the transitive closure $O^+$ of an object O
is any key attribute belonging to one of the objects in $O^+$; we consider the key
attributes of the remaining objects in $O^+$ as weak attributes for the selected identifier.
For example, the Model-Study dimension can be identified by either of the two
key attributes S_id (identifier of Study) or MO_id (identifier of Model). In Table 1
(column 4), we have arbitrarily chosen the identifier S_id and, therefore, consider
MO_id as a weak attribute for S_id (cf. RND1).
Note that if none of the objects considered by rules RDI1 and RDI2 has a
key attribute, then the identifier of the dimension may be the OID of any of these
objects.
As we can build dimensions on attributes, the next rule defines identifiers for such
dimensions.


RDI3. The identifier of a dimension built on an atomic attribute A is A.
For instance, the identifier of the dimension A_Duration is the attribute
A_Duration. Finally, rule RDI4 sets how to define an identifier for a dimension built
on a structured attribute.
RDI4. The identifier of a dimension built on a structured attribute A is a surrogate
(i.e., sequential artificial) identifier named SeqId_A.
Table 1 (column 4) gives the identifier of every identified dimension.
We continue the construction of hierarchies by identifying the remaining
dimensional attributes, i.e., those located after the identifier.

4.4.2. Dimensional attributes identification

First, we extract the dimensional attributes located immediately after the dimension
identifier (those of the second level of the hierarchy); second, for each one, we extract
its successors. For this, we defined four rules: the first one for extraction from
objects, and the three others for extraction from attributes.
RDA1. Let $O_1$ be an object directly related to $O_2$ with $\mathrm{Max}(O_1, O_2) = (1, *)$ and $O_1$
belongs to a transitive closure that produced a dimension d. Then, the key attribute of
every object in $O_1^+$ is a dimensional attribute of level two for d.
As with dimension identifiers, if an object does not have a key attribute,
then the OID (assigned by the ODBMS) is assumed as a default dimensional
attribute of level two.
RDA2. Every attribute with a structure collection type in an object belonging to a
transitive closure identified as a dimension d is a dimensional attribute of level two for d.
As we did for dimension identification, attributes of a specific data
type can be identified as dimensional attributes.
RDA3. Every atomic Boolean attribute in an object belonging to a transitive closure
that produced a dimension d is a dimensional attribute of level two for d.
RDA4. A temporal structured attribute (Date, Time, Timestamp, or Interval) in an
object belonging to a transitive closure that produced a dimension d is a dimensional
attribute of level two for d.
Note that rules RDA1 to RDA4 extract dimensional attributes of level
two. To obtain attributes at higher levels, we recursively apply either the above four
rules to objects from which a dimensional attribute is extracted by RDA1, or
rules RDA2 to RDA4 to attributes identified by RDA2. In the second case,
we do not apply RDA1 because an attribute cannot be linked to an object by a
relationship, and RDA1 is based on relationships between objects.
Note that temporal attributes resulting from rule RDA4 (Date, Interval, Time and
Timestamp) are defined in the ODMG object model, by interfaces, as
structured objects. We have studied the ODMG factory operations for creating
such objects and we have defined for each one its corresponding hierarchy. Table 2
gives the most used constructors for temporal attributes and their corresponding
hierarchies. In Table 2, Attribute1 → Attribute2 means that Attribute1 has a finer
granularity than Attribute2.

Table 2. Hierarchies built on temporal attributes.

| Data type | ODMG constructor14 | Corresponding hierarchy |
|---|---|---|
| Date | Julian_date (in unsigned short year, in unsigned short day) | day → year |
| Date | Calendar_date (in unsigned short year, in unsigned short month, in unsigned short day) | day → month → year |
| Time | From_hmsm (in unsigned short hour, in unsigned short minute, in unsigned short second, in unsigned short millisecond) | millisecond → second → minute → hour |
| Interval | D_Interval (in unsigned short day, in unsigned short hour, in unsigned short minute, in unsigned short second) | second → minute → hour → day |
| TimeStamp | d_Timestamp (in unsigned short year, in unsigned short month, in unsigned short day, in unsigned short hour, in unsigned short minute, in unsigned short second) | second → minute → hour → day → month → year |

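The correspondence of Table 2 lends itself to a simple lookup. The sketch below is an illustration under assumptions: the dictionary and function names are invented here, the Date entry uses the calendar constructor (the Julian constructor would give day → year), and level names are capitalized as in Table 1.

```python
# Levels from finest to coarsest, following Table 2.
TEMPORAL_HIERARCHIES = {
    "Date": ["day", "month", "year"],  # calendar-date constructor
    "Time": ["millisecond", "second", "minute", "hour"],
    "Interval": ["second", "minute", "hour", "day"],
    "Timestamp": ["second", "minute", "hour", "day", "month", "year"],
}


def hierarchy_for(attribute, odmg_type):
    """Build the hierarchy of a temporal attribute as a readable path."""
    levels = TEMPORAL_HIERARCHIES[odmg_type]
    return " -> ".join([attribute] + [level.capitalize() for level in levels])
```

For instance, `hierarchy_for("E_date", "Date")` gives `E_date -> Day -> Month -> Year`, which matches the hierarchy listed for E_date in Table 1.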
Similar to the classification of dimensions, we consider that dimensional attributes
produced by RDA1, RDA3 and RDA4 are more relevant for decision
making than those obtained with RDA2, since the latter are issued from complex
attributes. For our running example, Table 1 (column 5) lists the hierarchies for each
dimension.
We continue the construction of hierarchies by identifying, for each dimensional
attribute, its weak attributes (i.e., nondimensional attributes), if any.

4.4.3. Weak attributes identification

To identify weak attributes, we defined the following four rules.
RND1. Key attributes of objects belonging to $O^+$ that produced a dimension d, other
than those extracted through RDI2 as the dimension identifier for d, are weak attributes
for the identifier of d.
We have already introduced this rule in RDI2. The idea is to associate with the
identifier of dimension d a descriptive attribute from each object that participates in
the construction of d: the MO_id attribute becomes a descriptive attribute for the
Model-Study dimension (Fig. 8).

[Figure 8: DFM diagrams of two star schemas. F-AutomaticTask-executedby-Machine is linked to the dimensions Machine (identifier MC_id) and Automatic-Task (identifier O_id). F-Estimate-detail, with measures Price, Quantities, Total_detail() and Total_detail_TI(), is linked to the dimensions Estimate (identifier E_id) and Model-Study (identifier S_id).]

Fig. 8. Generated star schemas F-AutomaticTask-executedby-Machine and F-Estimate-detail.

RND2. Every nonkey (textual or numerical) atomic attribute belonging to an object
of a transitive closure providing a dimensional attribute p (through rules RDI1, RDI2
or RDA1) is a weak attribute for p.
RND3. If an object provides a dimensional attribute p, through rules RDI1,
RDI2 or RDA1, and contains a simple structure attribute S (not a collection), then
the textual and numerical atomic attributes of S are weak attributes for p.
RND4. Every nonkey textual or numerical atomic attribute belonging to a collection
of structured attributes providing a dimensional attribute p (through rules RDI4 or
RDA2) is a weak attribute for p.
In rules RND2 to RND4, we have considered both textual and numerical
attributes. Because weak attributes are descriptive, we consider that textual
attributes are more significant than numerical ones. In practice, numerical attributes
may be insignificant as descriptions.
Figure 8 shows graphically, according to the DFM model, the two facts
F-AutomaticTask-executedby-Machine and F-Estimate-detail among those extracted and listed in
Table 1. In this figure, the most relevant multidimensional concepts are in bold font.
In Appendix A, we give an algorithm for the identification of multidimensional
concepts.
To examine experimentally whether our proposed rules extract all pertinent
multidimensional concepts starting from an ODB and produce significant/useful
multidimensional schemas, we have developed the CAME-BDO toolset, which supports
our proposed method.


5. CAME-BDO: A Toolset for DM Construction


CAME-BDO extends the CAME [31] software tool, which carries out the design of conceptual
DM schemas starting either from a relational database [30] or from a set of
XML documents compliant with a given DTD [32]. CAME-BDO is an assistance toolset
for the decisional designer: its main functions cover the steps of our DM design method
starting from a MATISSE ODB. More precisely, it enables the construction of
multidimensional schemas. We have implemented our tool using SQL to access the
repository and to compute some results, such as the transitive closure of an object, and Java
as the programming language.
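The transitive closure mentioned above can be illustrated with a small graph traversal. This is a sketch in Python rather than the SQL used by the tool, and it assumes that the closure of an object is the set of objects reachable by following its relationships; the precise definition used by the method is the one given with the identification rules and may restrict which links are followed. The relationship map is a hypothetical fragment, not the actual links of any schema discussed here.

```python
from collections import deque
from typing import Dict, List, Set


def transitive_closure(start: str, links: Dict[str, List[str]]) -> Set[str]:
    """Objects reachable from `start` by following relationships (start excluded)."""
    seen: Set[str] = set()
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for target in links.get(current, []):
            if target not in seen and target != start:
                seen.add(target)
                queue.append(target)
    return seen


# Hypothetical relationship map: object -> objects it refers to
links = {
    "Purchase": ["Date", "Product"],
    "Date": ["Quarter"],
    "Quarter": ["Year"],
    "Product": ["Product_type"],
}
print(sorted(transitive_closure("Purchase", links)))
```

The `seen` set guarantees termination even when relationships form a cycle, which is common when inverse relationships are stored alongside direct ones.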
MATISSE is an ODBMS compliant with the ODMG standard. Its strength lies
mainly in its meta-schema (Fig. 9). In MATISSE, everything is an object: a class is
an instance of the meta-class Mt-Class, relationships are instances of the meta-class
Mt-Relationship, attributes are instances of the meta-class Mt-Attribute, and
methods (i.e., operations) are instances of the meta-class Mt-Method. For more
details about MATISSE, the reader can refer to the Matisse release notes [40].
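To make the "everything is an object" idea concrete, the sketch below models a few meta-objects in plain Python and rebuilds a class description from them. It does not use the MATISSE API: the record layout (`meta_class`, `name`, `owner`), the helper `describe_class` and the element names are assumptions made for illustration, and only the meta-class names come from the text above.

```python
from typing import Dict, List

# Each schema element is itself an object, tagged with the meta-class it instantiates.
# The element names below are illustrative only.
meta_objects = [
    {"meta_class": "Mt-Class", "name": "Media"},
    {"meta_class": "Mt-Attribute", "name": "media_name", "owner": "Media"},
    {"meta_class": "Mt-Attribute", "name": "advertising_price", "owner": "Media"},
    {"meta_class": "Mt-Relationship", "name": "has_type", "owner": "Media"},
    {"meta_class": "Mt-Method", "name": "Set_media_name", "owner": "Media"},
]


def describe_class(class_name: str, objects: List[dict]) -> Dict[str, List[str]]:
    """Group the meta-objects owned by a class by the kind of element they describe."""
    kinds = {
        "Mt-Attribute": "attributes",
        "Mt-Relationship": "relationships",
        "Mt-Method": "methods",
    }
    description: Dict[str, List[str]] = {label: [] for label in kinds.values()}
    for obj in objects:
        label = kinds.get(obj["meta_class"])
        if label is not None and obj.get("owner") == class_name:
            description[label].append(obj["name"])
    return description


print(describe_class("Media", meta_objects))
```

Because schema elements are ordinary objects, retrieving a schema reduces to querying instances of the four meta-classes, which is why a tool can read the schema in use directly from the repository.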
The remainder of this section is devoted to the description of the CAME-BDO features and of its GUI, using the source object database Media planning [50]
(Fig. 10). For this ODB source, Prat et al. [50] have semi-automatically identified
multidimensional concepts. In Sec. 6, we compare our results to those obtained by
Prat et al. [50] in order to evaluate CAME-BDO and highlight the benefits of our method.

Fig. 9. The ODBMS MATISSE meta-schema [40].
[Diagram not reproduced. It shows the meta-classes Mt Class, Mt Attribute, Mt Relationship, Mt Method, Mt Index, Mt Message and Mt Universal Method, with their properties (e.g., Mt Name, Mt Type, Mt Default Value, Mt Cardinality) and the links between them (e.g., Mt Attributes, Mt Relationships, Mt Superclasses, Mt Subclasses, Mt Successors, Mt Inverse Relationship).]

Fig. 10. A source object database modeling Media planning [50]. A dashed class denotes a class repeated to avoid overlapped links.
[Class diagram not reproduced. Classes shown: Main_shareholder, Shareholder, Public_shareholder, Private_shareholder, Media, Media_type, Company, Region, Person, Exposure, Purchase, Date, Quarter, Year, Target, Product, Product_type, Influence_of_company, Advertising_campaign.]

CAME-BDO features cover the three steps of our design method (Fig. 11):

• Object database schema retrieval. This step first displays the list of databases
implemented under the MATISSE ODBMS. After the designer selects one of these
ODB sources, the DM schema construction process starts. CAME-BDO accesses
the MATISSE meta-model (i.e., repository), extracts the objects of the selected
data source and displays them in a tabular format, as depicted in the interface of
Fig. 12. In this GUI, the designer can see the attributes, methods (operations),
relationships, and type (generalized, specialized or normal) of the current
object. Here, the designer can keep all or some of the objects for the DM construction
process (i.e., select/deselect objects) and finally launch the identification of
multidimensional concepts in order to obtain all DM schemas.

Fig. 11. CAME-BDO functional architecture.
[Diagram not reproduced. It relates the three design method steps (object database schema retrieval; multidimensional concepts identification and star schema construction; DM schemas display and adjustment) to the CAME-BDO functions, to MPI-EDITOR through XML, and to three repositories: the ODBMS MATISSE repository, the extracted schemas repository and the multidimensional schemas repository.]

Fig. 12. The schema of the ODB of Fig. 10 as extracted from MATISSE repository.
[Screenshot not reproduced. Callouts: current object; select/deselect objects; type of the current object; methods, attributes and relationships of the current object; button that starts the identification process.]

Fig. 13. (Color online) Identified facts, measures and dimensions in the Media planning ODB.
[Screenshot not reproduced. Callouts: identified facts; empty facts; measures of the selected fact; dimensions of the selected fact.]

• Multidimensional concepts identification and star schema construction.
This step automatically applies our identification rules to the selected
objects of the chosen ODB. It extracts facts and, for each one, identifies its
measures and its dimensions, with their dimensional attributes organized into hierarchies
and their weak attributes, if any. The identified concepts are stored in the multidimensional schemas repository together with their corresponding elements (i.e.,
names of objects, attributes, methods) in the ODB source. This stored traceability
has a twofold benefit: first, it is fundamental to deriving logical representations automatically and, second, it helps to generate ETL procedures.

Applied to the example of Fig. 10, our identification rules construct 10 multidimensional schemas (Fig. 13). For example, the schema of the
fact called F-Main_shareholder is obtained through the following steps:
RF1 identifies F-Main_shareholder as a fact; indeed, Main_shareholder
{Main_shareholder} and Main_shareholder contains the nonkey numeric attribute
Percentage_of_shareholder.
RM2 identifies the nonkey numeric attribute Percentage_of_shareholder as a
measure for F-Main_shareholder.
RD2 identifies each of the objects Shareholder, Media and Date as a dimension
for F-Main_shareholder: each one is linked to Main_shareholder with maximum
multiplicities (1, *). Applying RDI1, the identifier of each of these dimensions is
the key attribute.
Date and Media hierarchies are constructed using RDA1 (applied twice for
Date).
The hierarchy of the Shareholder dimension is constructed by RDA4; the ODMG
Calendar date constructor (Table 2) gives its last three parameters.

Fig. 14. DM schema built with CAME-BDO and displayed using MPI-Editor.
• DM schemas display and adjustment. To display the DM star schemas built
in the previous step, CAME-BDO offers two formats: tabular and graphical. The
tabular format (Fig. 13) displays the DM schemas extracted with CAME-BDO for
the ODB of Fig. 10. For the selected fact F-Main_shareholder, this interface displays its measures and dimensions classified by relevance level (dark blue for the
highest relevance). Once a dimension is selected, the interface shows its
attributes and hierarchies, ordered by relevance level (i.e., high or low).
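The second step, as walked through above for F-Main_shareholder, can be sketched in Python as follows. The conditions coded here are deliberately simplified readings of RF1, RM2 and RD2 taken from that walkthrough only (a nonkey numeric attribute yields a fact and a measure; an object linked with maximum multiplicities (1, *) yields a dimension), and Percentage_of_shareholder is treated as a plain numeric attribute. The types `SourceObject` and `StarSchema`, the function `build_star` and the `trace` dictionary are hypothetical and do not reflect the internal structures of CAME-BDO.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class SourceObject:
    name: str
    attributes: Dict[str, Tuple[str, bool]]  # attribute -> (type, is_key)
    # linked object -> maximum multiplicities on the two ends of the link
    links: Dict[str, Tuple[str, str]] = field(default_factory=dict)


@dataclass
class StarSchema:
    fact: str
    measures: List[str] = field(default_factory=list)
    dimensions: List[str] = field(default_factory=list)
    trace: Dict[str, str] = field(default_factory=dict)  # DM element -> ODB element


def build_star(obj: SourceObject) -> Optional[StarSchema]:
    numeric = [a for a, (typ, is_key) in obj.attributes.items()
               if typ == "number" and not is_key]
    if not numeric:
        return None  # simplified RF1: no nonkey numeric attribute, no fact
    schema = StarSchema(fact=f"F-{obj.name}")
    schema.trace[schema.fact] = obj.name
    for attr in numeric:  # simplified RM2: nonkey numeric attributes become measures
        schema.measures.append(attr)
        schema.trace[attr] = f"{obj.name}.{attr}"
    for target, multiplicities in obj.links.items():  # simplified RD2
        if multiplicities == ("1", "*"):
            schema.dimensions.append(target)
            schema.trace[target] = target
    return schema


main_shareholder = SourceObject(
    name="Main_shareholder",
    attributes={"Percentage_of_shareholder": ("number", False)},
    links={"Shareholder": ("1", "*"), "Media": ("1", "*"), "Date": ("1", "*")},
)
schema = build_star(main_shareholder)
print(schema.fact, schema.measures, schema.dimensions)
```

Keeping the `trace` mapping next to each identified element is one simple way to retain the origin of every measure and dimension, which is the information later needed to derive logical representations and ETL procedures.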

The interface of Fig. 14 shows graphically a DM schema constructed with CAME-BDO and displayed according to the DFM graphical notation. This interface is
obtained with the MPI-Editor [5] toolset, which communicates with CAME-BDO through
XML technology. The decisional designer can validate the schema by adding derived
measures and by removing and/or renaming dimensional elements. These adjustments can
be performed either through the CAME-BDO tabular format or through the MPI-Editor GUI.
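An XML-based exchange of this kind can be sketched with Python's standard `xml.etree.ElementTree` module. The element and attribute names used here (`star_schema`, `fact`, `measure`, `dimension`, `name`) are invented for the example; the actual format exchanged between CAME-BDO and MPI-Editor is not described in this section.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple


def star_to_xml(fact: str, measures: List[str], dimensions: List[str]) -> str:
    """Serialize a star schema into a small XML document (hypothetical layout)."""
    root = ET.Element("star_schema")
    fact_el = ET.SubElement(root, "fact", name=fact)
    for measure in measures:
        ET.SubElement(fact_el, "measure", name=measure)
    for dimension in dimensions:
        ET.SubElement(root, "dimension", name=dimension)
    return ET.tostring(root, encoding="unicode")


def xml_to_star(document: str) -> Tuple[str, List[str], List[str]]:
    """Read back the fact, its measures and its dimensions from the XML document."""
    root = ET.fromstring(document)
    fact_el = root.find("fact")
    measures = [m.get("name") for m in fact_el.findall("measure")]
    dimensions = [d.get("name") for d in root.findall("dimension")]
    return fact_el.get("name"), measures, dimensions


doc = star_to_xml(
    "F-Main_shareholder",
    ["Percentage_of_shareholder"],
    ["Shareholder", "Media", "Date"],
)
print(doc)
print(xml_to_star(doc))
```

A neutral text format of this sort lets the tabular and graphical editors modify the same schema independently, since each one only needs to read and write the shared document.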

6. Discussion and Evaluation


Our design method for multidimensional schemas from an object-oriented database
relies on transformation rules inspired by those we have defined for the
construction of DM schemas from relational databases [31]. Moreover, these rules
take full advantage of the ODB specificities: they consider object operations (rules RF2,
RM1), exploit inheritance between objects (RF2.1, RF2.2) and complex attributes
(RM3, RD3, RDI4, RDA2, RND3, RND4). In addition to these advantages, our
method differs from those of the literature because it considers the most recent version of
the data source (i.e., the version in use, actually extracted from the ODBMS repository). Furthermore, it assigns a relevance level to the extracted multidimensional
elements (e.g., measures, dimensions, hierarchies) and traces the DM schema
elements back to the data source schema elements.

Table 3. CAME-BDO evaluation. Each cell gives the number of multidimensional concepts
identified by CAME-BDO / the number of multidimensional concepts identified by the authors.

Data source (a)                                     Fact    Measure   Dimension   Dimensional attribute
Counseling system for technician admission [22]     2/1     3/3       5/3         2/1
Media-planning [50]                                 10/9    9/8       10/8        10/8
Faculty members load [61]                           9/8     11/9      8/8         5/4
Farmer company [62] (b)                             6/6     3/3       5/4         9/6

(a) We have created a MATISSE ODB for each data source.
(b) We have restricted the comparison of measures, dimensions and dimensional attributes to the
(two) facts for which a complete schema is constructed by the authors.
To evaluate our method, we have tested our CAME-BDO software
prototype on several object databases, some of which are taken from the literature
and for which multidimensional schemas were built manually or semi-automatically [22, 50, 61, 62]. Table 3 summarizes a comparative analysis between the results obtained with CAME-BDO
and the schemas constructed by the authors of these cases.
From these evaluations, we draw the following conclusions:
• CAME-BDO (and thus our method) identifies all the facts that a manual bottom-up
analysis can figure out; moreover, it finds empty facts.
• In most cases, CAME-BDO extracts more measures than those obtained manually;
this is because we consider both operations and multi-valued numerical attributes.
However, we noticed that CAME-BDO does not identify calculated measures since
they are semantics dependent. In practice, these measures can be added manually
by the designer in the third step of our method.
• The number of dimensions and dimensional attributes extracted by CAME-BDO
is slightly higher than the number obtained manually. Thus, CAME-BDO builds schemas
that offer a wider range of analyses. This difference is due to the fact
that our rules take into account Boolean, temporal and structured attributes, and
are not limited to objects.


7. Conclusion
In this paper, we have tackled DW design issues at the conceptual level. More
specifically, we have proposed an automatic DW design method that starts from an
ODB compliant with the ODMG standard and generates DM schemas modeled
according to the well-known DFM model. In order to generate DM schemas based on
the most recent version of the ODB source, our method extracts the ODB logical
model directly from the ODBMS repository. Thus, the generated DMs can reflect all
of the organization's activities.
Furthermore, to automatically generate DM schemas, we have defined a set of
rules for the extraction of the DM schema components (i.e., facts, measures,
dimensions with their attributes organized into hierarchies). Our rules have the
merit of being independent of the semantics of the ODB source domain. To achieve this independence, they take advantage of the structural semantics offered by the ODMG
object model. In an additional attempt to assist the DW/DM designer, our rules
assign to each component of the generated DM schemas a relevance level reflecting its
analytical potential. This helps the designer to choose the DM concepts that are
most interesting for the decision-making process. Moreover, as a third advantage, our method keeps track of the origin of each component in the generated
DM schema. This traceability has a twofold benefit: first, it is fundamental to
automatically deriving logical representations and, second, it helps to generate ETL
procedures.
For this method, we have developed a software prototype, called CAME-BDO, which we
used to conduct some experimental evaluations. CAME-BDO extends our
CAME [31] software tool, which carries out the design of conceptual DM schemas
starting either from a relational database source [30] or from a set of XML documents
compliant with a given DTD [32]. These preliminary evaluations showed the feasibility of
our method in identifying all facts and their measures, dimensions and hierarchies.
To generalize these results, we are looking for a more substantial evaluation on a set
of ODBs.
In addition, as an immediate extension of this work, we are currently developing a
software tool for the automatic generation of ETL procedures under OWB (Oracle
Warehouse Builder); some preliminary results in this direction were recently published [33]. Furthermore, we are examining how to integrate adjusted/validated DM
schemas obtained with CAME to build a DW schema loadable from heterogeneous
data sources (i.e., relational databases, XML data-centric documents and object
databases), and how to generate platform-independent ETL procedures according to
the MDE (Model Driven Engineering) approach.

Acknowledgments
We would like to thank the anonymous reviewers for their valuable comments and
suggestions to improve the quality of the paper.


References
1. A. Abello, J. Samos and F. Saltor, YAM2 (Yet Another Multidimensional Model): An
Extension of UML, in IDEAS '02 (IEEE Computer Society, 2002), pp. 172–181.
2. R. Agrawal, A. Gupta and S. Sarawagi, Modeling multidimensional databases, in Proc.
Int. Conf. Data Engineering, Birmingham, UK (1997), pp. 232–243.
3. D. K. Barry, When an object database should be used (2011), available at
http://www.service-architecture.com/object-oriented-databases/articles/when_an_object_database_should_be_used.html.
4. C. Batini, S. Ceri and S. Navathe, Conceptual Database Design: An Entity-Relationship
Approach (Benjamin-Cummings Publishing, Redwood City, USA, 1992), p. 470.
5. M. Ben Abdallah, J. Feki and H. Ben-Abdallah, MPI-EDITOR: OLAP requirement
specification tool for the reuse of logical multidimensional patterns, Workshop on Decisional Systems (ASD'06), Morocco (2006) (in French).
6. M. Ben Abdallah, H. Ben-Abdallah and J. Feki, Well formed multidimensional patterns
for the build of data marts, Journal of Global Management Research 16(3) (2008).
7. A. Bonifati, F. Cattaneo, S. Ceri, A. Fuggetta and S. Paraboschi, Designing data marts
for data warehouse, ACM Transaction on Software Engineering and Methodology 10
(2001) 452–483.
8. M. Bohnlein and A. Ulbrich-vom Ende, Deriving initial data warehouse structures from
the conceptual data models of the underlying operational information systems, DOLAP,
Missouri (1999), pp. 15–21.
9. R. Bloor, The failure of relational database, the rise of object technology and the need for
the hybrid database, White Papers & Analyst Reports, Advanced software technologies
for breakthrough applications, InterSystems (2003).


10. F. Bret and O. Teste, Graphical construction of data warehouses and data marts,
INFORSID'99, La Garde, France (1999) (in French).
11. R. Bruckner, B. List and J. Schiefer, Developing requirements for data warehouse systems
with use cases, 7th Americas Conf. Information Systems, Boston (2001), pp. 329–335.
12. L. Cabibbo and R. Torlone, A logical approach to multidimensional databases, Conf.
Extended Database Technology, Valencia, Spain (1998), pp. 187–197.
13. A. Carme, J. N. Mazon and S. Rizzi, A model-driven heuristic approach for detecting
multidimensional facts in relational data sources, 12th Int. Conf. Data Warehousing and
Knowledge Discovery, Bilbao, Spain (2010), pp. 13–24.
14. R. G. Cattell and D. Barry, The Object Data Standard: ODMG 3.0 (Morgan Kaufmann
Pub., San Francisco, CA, 2002), p. 288.
15. B. D. Czejdo, J. Eder, T. Morzy and R. Wrembel, Designing and implementing an object
relational data warehousing system, in Proc. IFIP TC6/WG6.1 Third Int. Working Conf.
New Developments in Distributed Applications and Interoperable Systems (2001), pp.
311–316.
16. K. Decker, A. Oaks and M. Salinas, Building a cost engineering data warehouse, AACE
International Transactions, IM.06, AACE International, Morgantown (1997).
17. B. Dhawan and A. Gosain, Extending UML for multidimensional modeling in data warehouse, International Journal of Computer & Communication Technology 2(7) (2011)
59–64.
18. J. Feki and Y. Hachaichi, Assisted data mart design: A method and a toolset, Journal of
Decision Systems 16(3) (2007) 303–333 (in French).
19. J. Feki and Y. Hachaichi, Constellation discovery from OLTP parallel-relations, International Arab Conf. Information Technology (ACIT'07), Lattakia, Syria, November 2007.
20. J. Feki, A. Nabli, H. Ben-Abdallah and F. Gargouri, An automatic data warehouse
conceptual design approach, in Encyclopedia of Data Warehousing and Mining (IGI
Global Publication, 2008).
21. E. Fernandez-Medina, J. Trujillo, R. Villarroel and M. Piattini, Developing secure data
warehouses with a UML extension, Information Systems 32(6) (2007) 826–856.
22. S. K. Gandhi and S. Jain, Data warehouse design: An object-oriented approach, Journal
of Engineering, Science & Management Education 4 (2011) 16–20.
23. A. Gosain and M. Suman, Object oriented multidimensional model for a data warehouse
with operators, International Journal of Database Theory and Application 3(4) (2010)
35–40.
24. M. Golfarelli, D. Maio and S. Rizzi, Conceptual design of data warehouses from E/R
schemas, Conf. System Sciences, Kona-Hawaii, Vol. VII (IEEE Computer Society,
Washington, DC, USA, 1998).
25. M. Golfarelli, D. Maio and S. Rizzi, The dimensional fact model: A conceptual model for
data warehouses, International Journal of Cooperative Information Systems 7(2–3)
(1998) 215–247.
26. M. Golfarelli, The DFM: A conceptual model for data warehouse, in Encyclopedia of Data
Warehousing and Mining, J. Wang (ed.), 2nd edn. (IGI Global, Hershey, PA, 2008).
27. M. Golfarelli, From user requirements to conceptual design in data warehouse design: A
survey, in Data Warehousing Design and Advanced Engineering Applications: Methods
for Complex Construction, Part of the Advances in Data Warehousing and Mining
(ADWM) Book Series (IGI Global Publication, 2009).
28. P. Giorgini, S. Rizzi and G. Maddalena, Goal-oriented requirement analysis for data
warehouse design, DOLAP, Bremen, Germany (2008), pp. 47–56.
29. Y. Hachaichi and J. Feki, From relational to multidimensional model: Data mart design,
EDA 07, RNTI B-3 (2008), pp. 5–19 (in French).


30. Y. Hachaichi, J. Feki and H. Ben-Abdallah, From XML source to multidimensional
model: Data mart design, EDA 08, RNTI B-4 (2008), pp. 45–59 (in French).
31. Y. Hachaichi, J. Feki and H. Ben-Abdallah, Designing data marts from XML and relational data sources, in Data Warehousing Design and Advanced Engineering Applications: Methods for Complex Construction, Part of the Advances in Data Warehousing
and Mining (ADWM) Book Series (IGI Global Publication, 2009), pp. 55–80.
32. Y. Hachaichi, J. Feki and H. Ben-Abdallah, Multidimensional modeling of data-centric
XML documents, Journal of Decision Systems 19 (2010) 313–345.
33. J. Hajlaoui, J. Feki and Y. Hachaichi, DM-Creator: A tool for implementing data mart
schemas, Workshop on Decisional Systems (ASD'10), Sfax, Tunisia (2010), pp. 13–24 (in
French).
34. R. Hull, Managing semantic heterogeneity in databases: A theoretical perspective, 16th
ACM SIGACT SIGMOD SIGART Symp. Principles of Database Systems (1997), pp. 51–61.
35. W. H. Inmon, Building the Data Warehouse, 3rd edn. (Wiley, New York, 2002).
36. M. Jensen, T. Møller and T. B. Pedersen, Specifying OLAP cubes on XML data, Journal
of Intelligent Information Systems 17(2–3) (2001) 255–280.
37. R. Kimball, L. Reeves, M. Ross and W. Thornthwaite, The Data Warehouse Lifecycle
Toolkit (John Wiley, 1998).
38. A. Konovalov, Object-oriented data model for data warehouse, ADBIS 2002, LNCS,
Vol. 2435 (Springer-Verlag, Berlin, Heidelberg, 2002), pp. 319–325.
39. S. Lujan-Mora, J. Trujillo and I. Y. Song, A UML profile for multidimensional modeling
in data warehouses, Data & Knowledge Engineering 59(3) (2006) 725–769.
40. Matisse 8.3.3 Release Notes, © 1992–2010 Matisse Software Inc., available at
http://www.matisse.com/pdf/developers/rn_840.pdf.
41. J.-N. Mazon and J. Trujillo, An MDA approach for the development of data warehouses,
Decision Support Systems 45(1) (2008) 41–58.
42. D. Moody and M. Kortnik, From enterprise models to dimensional models: A methodology for data warehouse and data mart design, DMDW'00, Sweden (2000).
43. T. B. Nguyen, A. M. Tjoa and R. Wagner, An object oriented multidimensional data
model for OLAP, Web-Age Information Management (WAIM) (Springer-Verlag, 2000),
pp. 69–82.
44. ODBMS FAQ (2011), available at
http://www.service-architecture.com/object-oriented-databases/articles/odbms_faq.htm.
45. F. R. S. Paim and J. B. Castro, DWARF: An approach for requirements definition and
management of data warehouse systems, Conf. Requirements Eng., Monterey (2003).
46. T. B. Pedersen and C. Jensen, Multidimensional data modeling for complex data, in Proc.
Int. Conf. Data Engineering, Sydney, Australia (1999), pp. 336–345.
47. F. Ravat and O. Teste, A temporal object-oriented data warehouse model, in Proc. 11th
Int. Conf. Database and Expert Systems - DEXA 2000, Greenwich, London, UK (2000).
48. F. Pinet and M. Schneider, A unified object constraint model for designing and implementing multidimensional systems, Journal on Data Semantics, Lecture Notes in Computer Science, Vol. 5530 (Springer, 2009), pp. 37–71.
49. C. Phipps and K. Davis, Automating data warehouse conceptual schema design and
evaluation, 4th Int. Workshop Design and Management of Data Warehouses, Vol. 58
(2002), pp. 23–32.
50. N. Prat, J. Akoka and I. Comyn-Wattiau, A UML-based data warehouse design method,
Decision Support Systems 42 (2006) 1449–1473.
51. N. Prakash and A. Gosain, Requirements driven data warehouse development, The 15th
Conf. Advanced Information Systems Engineering, Austria (2003).


52. T. Rujirayanyong and J. J. Shi, A project-oriented data warehouse for construction,
Knowledge Enabled Information System Applications in Construction 15(6) (2006)
800–807.
53. A. Sarkar, S. Choudhury, N. Chaki and S. Bhattacharya, Object relational implementation of graph based conceptual level multidimensional data model, The 9th Int. Conf.
Computer Information Systems and Industrial Management Applications (CISIM'10)
(October 2010), pp. 154–159.
54. M. Schneider, Well-formed data warehouse structures, 5th Int. Workshop at VLDB'03 on
Design and Management of Data Warehouses (DMDW'2003) (Berlin, Germany, 2003).
55. I. Y. Song, R. Khare and B. Dai, SAMSTAR: A semi-automated lexical method for
generating star schemas from an entity-relationship diagram, The ACM Tenth Int.
Workshop on Data Warehousing and OLAP (Lisbon, Portugal, November 2007).
56. V. Stefanov and B. List, A UML profile for modeling data warehouse usage, The 2007
Conf. Advances in Conceptual Modeling: Foundations and Applications (Auckland, New
Zealand, 5–9 November, 2007).
57. R. Torlone, Conceptual multidimensional models, in Multidimensional Databases: Problems and Solutions (IGI Global Publication, 2003).
58. M. Torres and J. Santos, A language to define external schemas in ODMG databases,
Journal of Object Technology 3(10) (2004) 181–192.
59. J. Trujillo, M. Palomar and J. Gomez, The GOLD definition language (GDL): An object
oriented formal specification language for multidimensional databases, SAC 1 (2000),
pp. 346–350.
60. B. Vrdoljak, M. Banek and S. Rizzi, Designing web warehouses from XML Schema, 5th
Int. Conf. DAWAK, Prague, Czech Republic (2003).
61. S. Zribi and J. Feki, From UML class diagram to multidimensional model, 2nd Workshop on
Decisional Systems, Sousse, Tunisia (2007) (in French).
62. L. Zepeda, M. Celma and R. Zatarain, A methodological framework for conceptual
data warehouse design, 43rd ACM Southeast Conf. (Kennesaw, GA, USA, March 2005),
pp. 18–20.
63. J. Zubcoff and J. Trujillo, A UML 2.0 profile to design association rule mining models in
the multidimensional conceptual modeling of data warehouses, Data & Knowledge
Engineering 63 (2007) 44–62.
