An Automatic Method For The Design of Multidimensional Schemas From Object Oriented Databases
An Automatic Method For The Design of Multidimensional Schemas From Object Oriented Databases
DOI: 10.1142/S0219622013500351
YASSER HACHAICHI
Multimedia, InfoRmation Systems and Advanced
Computing Laboratory, University of Sfax
Higher Institute of Business Administration
BP 1013, Sfax, 3018, Tunisia
[email protected]
JAMEL FEKI
Multimedia, InfoRmation Systems and Advanced
Computing Laboratory, University of Sfax
Faculty of Economics and Management
BP 1018, Sfax, 3018, Tunisia
[email protected]
A data warehouse (DW) is a large data repository system designed for decision-making
purposes. Its design relies on a specic model called multidimensional. This multidimensional
model supports analyses of huge volumes of data that trace the enterprise's activities over
time. Several design methods were proposed to build multidimensional schemas from either
the relational data model or the entity-relationship data model. Almost all proposals that
treated the object-oriented data model assume the presence of the data source UML classdiagram. However, in practice, either such a diagram does not exist or is obsolete due to
multiple changes/evolutions of the information system. Furthermore, these few proposals
require an intense manual intervention of the designer, which requires a high expertise both in
the DW domain and in the object database domain. To overcome these disadvantages, this
work proposes an automatic DW schema design method starting from an object database
(schema and its instances). This method applies a set of extraction rules to identify multidimensional concepts and to generate star schemas. It is dened for the standard ODMG
model and, thus, can be adapted with slight changes for other object database models. In
addition, its extraction rules have the merit of being independent of the domain semantics.
Furthermore, they automatically generate schemas classied according to their analytical
potential; this classication helps the DW designer in selecting the most relevant schemas
among the generated ones. Finally, being automatic, our method is supported by a toolset that also prepares for the automatic generation of the Extract Transform and Load
procedures used to load the DW.
Keywords: Multidimensional model; data warehouse; automatic design; object-oriented
database.
1223
1224
1. Introduction
Organizations generate huge volumes of data pertinent to various aspects of their
business processes, such as customers' management, suppliers' management and
procurement. In addition, they need to access and analyze this data to support either
their daily operations or business decisions.52 Thus, today's organizations have two
types of information systems: operational information systems and decision support
systems (DSS). The operational information system supports the execution of the
daily business operations. On the other hand, a DSS manages historical information
used to analyze the business performance over time in order to take appropriate
business decisions.
Data warehousing is a technology that intends to provide decision makers access to
various levels of information.52 A data warehouse (DW) provides an architectural
model for the ow of data from the operational information system to decision support
environments.16 It is periodically populated with data from operational information
systems, e.g., for equipment management, accounting, inventory and customer
management. Essentially, a DW collects all relevant data into one central system,
organizes data eciently so it is consistent and convenient for many purposes such as
retrieving/using or keeping old data for historical analyses.52 In addition, a DW is a
central data repository used to build and load data marts (DM); each DM contains an
extract of the DW and is oriented to a specic subject of decisional analyses.
On the other hand, given their functional dierences, designing a DW/DM
requires a methodology dierent from those commonly adopted for operational
information systems. In fact, current commercial software tools only assist the
administrators in the DW/DM structure specication and production of analytical
results; hence, they suppose that the DW/DM schemas are designed beforehand.
This shortage motivated the proposition of several design methods for DW/DM
schema.8,10,12,13,18,20,22,24,29,30,32,36,37,42,45,50,51,55,61 These methods dier in three
aspects: their approaches (top-down,11,37,45,51 bottom-up12,13,18,24,30,32,36,42,50,61 versus
mixed7,8,12,18,20,24,49,55) the data model they assume (relational,8,12,13,18,29,31,42,55
object-oriented22,50,61,62 versus XML32,36,60), and their degree of automation (automatic,13,18,20,29,30,32 semi-automatic24,36,42,45,50,55,61 versus manual11,12,22,37,51). The
current state-of-the-art shows that several DW/DM design methods have been
proposed for relational databases with various degree of automation. However, XML
and object-oriented databases still interest researchers. In this paper, we focus on
object-oriented databases (ODB) as they have better treated complex objects
increasingly in use in today's computer applications.9 In addition, as set forth by
Barry,3 \generally, an ODB is a good choice when you have all three factors: business
need, high performance, and complex data. Recently, people have also been considering an ODB even when their data is not particularly complex. An ODB can allow
for a smaller team and faster development because there is only one data model."
Examples of ODBMS (Object Data Base Management System) in use in industrial
applications are given in ODBMS FAQ44; for instance, British Airways uses the
1225
Versant Object Database for its Origin and Destination Revenue Management
System.
To our knowledge, there are very few DW/DM schema design methodologies that
start from an ODB.22 Almost all methods rely on an object-oriented data source start
from a UML class diagram.50,61,62 However, in practice, the organization either does
not always have such a diagram or even when it has one it is often obsolete: not up to
date to reect the evolution/maintenance of the operational information system. In
addition, the few proposed methods for the design of DW/DM based on objects
require human intervention with high expertise in both domains OLAP (On-Line
Analytical Processing) and ODB. Furthermore, in the design process, these works do
not address some specicities of the object data sources, such as methods and
structured attributes.
In this paper, we propose a DM schema design method from ODB (schema and its
instances). This method is supported by a software engineering tool and is domain
independent because it relies on the structural properties of the data source independently of its semantics. It automatically applies a set of rules to extract, from the
ODB, multidimensional concepts (i.e., candidate facts, dimensions) and generates
star schemas. Moreover, it keeps track of the origin (e.g., object name, attribute
name, data type, length) of each multidimensional concept in the generated DM
schemas; this traceability helps the automatic generation of ETL (Extract Transform and Load) procedures to load the DM. Finally, to ensure its adaptability to
various ODBMS, we base our method on the standard object-model ODMG14
(Object Database Management Group); thus, any ODBMS compliant with this
standard can benet from our method for the design of DM schemas.
The remainder of this paper is organized as follows. Section 2 presents the ODMG
object model as a standard, and the multidimensional model. Section 3 puts our
method in its scope by overviewing DW/DM bottom-up design approaches based on
object-oriented data models. Section 4 describes our design method of DM from an
ODB. Section 5 overviews the CAME-BDO toolset which supports our DM schema
design method. The evaluation of CAME-BDO is discussed in Sec. 6. Finally, the
conclusion summarizes the presented work and outlines its perspectives.
1226
The object model species the kind of semantics that can be dened explicitly for
an ODBMS. Among other things, the semantics of the object model determines the
characteristics of objects, how objects can be related to one another, and how objects
can be named and identied. The object model species the constructs that are
supported by an ODBMS.58
The basic modeling concepts are the object and the literal. Each object has a
unique identier whereas a literal has no identier. In addition,
Objects and literals can be categorized by their types. All elements of a given type
have a common range of states (i.e., the same set of properties) and a common
behavior (i.e., the same set of operations). An object is sometimes referred to as an
instance of its type.
. The state of an object is dened by the values it carries for the set of properties.
These properties can be attributes of the object itself or relationships with other
objects.
. The behavior of an object is dened by the set of operations that can be executed
on or by the object. Operations may have a list of input and output typed parameters and may return a typed result; and
. An ODBMS has an appropriate meta-model according to which instances of ODBs
are stored and managed.
The ODMG Object meta-model (Fig. 1) species what is meant by objects, literals, types, operations, properties, attributes, relationships, and so forth. It includes
signicantly richer semantics than the relational model does, by declaring relationships and operations explicitly. In addition, the ODMG standard is based on a
common object model and uses several aspects of OMG's object model. It supports
types (interface) and classes (implementation), encapsulation, inheritance and
polymorphism.
support
Class
Instantiate
key_list
extent_name *
super_class 1
extends
Operation
signature
has
invoke
return
return_abnormally
Property
Object
OID
has_name?
names
class
create
delete
exits
same_has?
Attribute
attr_name
attr_type
set_value
get_value
Traversal path
path_name
to_cardinality
to_type
traverse
creator_iterator
2
Relationship
define
1
1227
We next shortly explain the basic concepts of the ODMG Model; for more details
the reader is referred to the Object Data Standard: ODMG 3.0.14
Type
Object
Atomic Obj.
Collection Obj.
Set <>
Bag <>
List <>
Array <>
Dictionary<>
Literal
Structured Obj.
Date
Time
Timestamp
Interval
Atomic Lit.
Long
Short
Ulong
Ushort
Float
Double
Character
Boolean
string
octet
enum<>
Collection Lit.
Structured Lit.
Set <>
Bag <>
List <>
Array <>
Dictionary<>
Structure<>
Date
Time
Timestamp
Interval
1228
M_id (K, N)
Name (S)
Description (S)
Material
Client
C_id (K,N)
Name (S)
First_Name (S)
struct Address {number (N), street (S), city_name (S)}
requests
used_by
P_id (K, N)
Name (S)
Description (S)
Article
belongs_to
Model
concern
S_id (K, N)
Description (S)
Set_Model Price (void)
Machine
executes
MO_id (K,S)
Name (S)
Price (N)
estimated_for
designed_for
needs
used_for
study
Elaborated_for
requires
uses
Estimate
detail
contains
related_to
Quantities (N)
Price (N)
Total_detail (N)
Total_detail_TI (N)
MC_id (K, S)
Name (S)
State (B)
Hour_cost (N)
Purchase_date(D)
E_id (K,N)
E_date (D)
Total_price (N)
Total_price_TI (N)
Estimate
associated_to
Task
P_id (K,S)
Name
Affectaion_date (D)
Wage(N)
set Struct qualification {Name (S),
specialty (S), date (D)}
carries out
O_id (K,N)
Description(S)
Personnel
carried out_by
Automatic
A_Duration(I)
A_cost (N)
Manual
Semi-automatic
SA_Duration(I)
SA_cost (N)
M_Duration(I)
MO_cost (N)
executed_by
to their object by horizontal lines; attribute and operation types are specied
between parentheses \(N)umeric, (S)tring, (B)oolean, (D)ate, (I)nterval"; key
attributes are tagged with K; relationships are drawn by arrowed lines between
objects as follows:
: one-to-one,
: one-to-many,
: many-tomany and
: extends relationship.
In this example, an Estimate is elaborated for a Client and contains Estimate
details. Each Estimate detail concerns one Model, needs a Study to dene associated
Materials, Articles and Tasks. A Task can be Automatic, and therefore is executed by
Machine, Manual i.e., carried out by Personnel, or Semi-automatic requiring both
Automatic and Manual interventions. We will refer to this example to illustrate our
DM design method.
2.2. The multidimensional model
A conceptual schema is used to dene logical schemas supported by a class of software systems (e.g., relational DBMS). Then, a logical schema is translated into a
physical one, supported by a specic software system (e.g., Oracle).34
In the data warehousing context, it was early realized that traditional conceptual
models for database modeling, such as the E/R model,4 do not provide a suitable means
to describe the fundamental aspects of such applications. The crucial point is that, in
designing a DW, there is the need to represent explicitly certain important characteristics of the information contained therein. These characteristics are not related to
the abstract representation of real-world concepts, but rather to the nal goal of the
data warehouse: supporting data analyses oriented to decision making.57
In the last few years, multidimensional modeling has attracted the attention
of several researchers who dened various solutions each focusing on the set of
information they considered strictly relevant. Some of these solutions have not2,46 or
1229
have limited12 graphical support, and aim at establishing a formal foundation for
representing cubes, hierarchies and an algebra for querying them. On the other hand,
we believe that a distinguishing feature of DW conceptual models is that of providing
a graphical support to be easily understood by both designers and decision makers
when discussing requirement and validating conceptual solutions. Hence, we opted to
use the Dimensional Fact Model (DFM) which is a graphical conceptual model,
specically devised for multidimensional design. DFM was rst proposed in 1998 by
Golfarelli et al.25 and has been continuously enriched and rened in order to suit
optimally the variety of modeling situations that may be encountered in real projects
from small to large complexity.27
A conceptual design according to the DFM consists of a set of fact schemas
(thereafter multidimensional schemas) where the basic concepts are facts, measures,
dimensions and hierarchies.27 Figure 4 is a meta-model grouping these concepts.
A formal denition of these concepts can be found in the DFM.26 Here, we
informally present these concepts through the example shown in Fig. 5 which depicts
an introductory multidimensional schema according to the DFM model. This schema
allows decision makers to analyze the Sales fact.
year
month
dateID
Hierarchy of store
country state
Dimensions
STORE
STORE
sname
Dimensional Attributes
brand
quantity
quantity
unitPrice
unitPrice
Non-Dimensional Attribute
managerG
Fact
DATE
DATE
SALES
SALES
city storeID
week
productID type
PRODUCT
PRODUCT
pname
customerID
CUSTOMER
CUSTOMER
cname
phone
Measures
markGroup
category
1230
A fact is a focus of interest for the decision-making process; typically, it models a set
of events occurring in the organization and/or its environment. A fact is graphically
represented by a box divided into two compartments, the upper compartment is for
the fact name and the bottom one is for the fact measures. Examples of facts in
the commercial domain are sales, shipments, purchases; others in the nancial domain
are stock exchange transactions, contracts for insurance policies, etc.
A measure is a numerical property of a fact; it describes a quantitative aspect of
interest for decisional analyses. For instance, the fact Sales has two measures:
quantity and unitPrice. Measures should be numerical because they are used for
computing aggregated values using aggregate functions (e.g., Sum, Average, Count).
Rarely, a fact may have no measures, this happens when the only interesting thing to
be recorded is the occurrence of events; in this case the fact is said to be empty and is
typically queried to count the events that occurred.
A dimension is directly linked to a fact considered as an association linking
dimensions. Dimensions of a fact set the context of recording its measures and,
therefore represent the fact analysis coordinates. Graphically, dimensions are represented as rectangles attached to the fact by straight lines. Typical dimensions for
the Sales fact (Fig. 5) are Product, Customer, Store and Date. Usually one of the fact
dimensions represents the time which is necessary to extract time series from the DW
data. In addition, note that, in the multidimensional schema of Fig. 5, each measure
depends on all dimensions; i.e., concerns one product, sold to one customer, delivered
from one store at a given date.
Aggregation is the basic OLAP operation since it produces summarized information from large amounts of detailed data. An aggregation is carried out on
measures thanks to dimensions. For instance, we can compute the total amount of
sales (sum of quantity * unit price) by Product (or even by any combination of
dimensions). In addition, this total can be obtained at dierent levels of details (by
Product type or category) thanks to the denition of dimensional attributes organized
into hierarchies.
A dimensional attribute (also called parameter) is a property of a dimension. It is
graphically represented by a circle. Relationships between dimensional attributes are
expressed by hierarchies. A hierarchy of a dimension d is a directed graph, rooted
from the identier of d, whose all nodes are dimensional attributes of d and, whose
arcs model many-to-one associations between pairs of dimensional attributes. Hierarchies determine how primary events can be aggregated into secondary events and
selected signicantly for the decision-making process. In Fig. 5, the hierarchies of the
Product dimension enable us to aggregate measures by type and markGroup or by
type and category.
Note that each dimensional attribute may functionally determine some nondimensional attributes; these latter are called descriptive (or weak) attributes. They
are linked to their corresponding dimensional attributes by a line. For example, in
Fig. 5, the dimensional attribute StoreID has one descriptive attribute called sname
(for store name).
1231
3. Related Works
In the literature, DW/DM development approaches are classied into three categories: (1) Data-driven approaches12,13,18,24,29,30,32,36,42,50,61 which rely on the
analysis of the corporate data model of the OLTP (On-Line Transaction Processing)
system and its relevant transactions; (2) User-driven approaches11,37,45,51 which start
from a set of analytical requirements dened by the decision makers of the future
(DSS); and (3) Mixed approaches7,8,12,18,20,24,49,55 which combine data-driven and
user-driven approaches in order to prot of their oered advantages.
User-driven (aka. top-down) approaches presume that users have enough expertise in expressing their analytical requirements in order to design schemas loadable
from the organization OLTP system. Mixed approaches are advisable when the data
source model (i.e., logical schema) is well known and has substantial size and complexity.28 Finally, data-driven (aka. bottom-up) approaches benet from two major
advantages: rst, they help decision makers since they oer potential multidimensional schemas built on the source data model and, second they guarantee that
the organization's OLTP system can feed the user-selected schemas with the needed
data. In data-driven approaches, user-requirements elicitation is voluntary neglected. In fact, Inmon35 argued that user-requirements are the last thing to be considered in a DSS development since they are well understood after the DW is
populated with data and query-results are analyzed by decision makers. Considering
these advantages, we elected to propose a DM design method within the data-driven
approaches category.
Current DW/DM design methods consider either E/R,8,12,42,55 relational,18,29,31
XML32,36,60 or object models.22,50,61,62 Since this paper treats ODB, in this section we
restrict our review to works that consider object models.
From our point of view, the research combining the DW and object paradigm
elds follows three main lines.
The rst research line applies the object paradigm for the multidimensional
modeling of the DW schema. Works of this line propose multidimensional conceptual
object-oriented models based on UML (Unied Modeling Language)1,17,23,39,43,48;
some of them focus their model on a specic area of the DW such as temporal
aspects47 security or access control,21 requirements11,56 and association rules mining
models.63
The second research line addresses the problem of how to build a DW from an
object model. More precisely, it aims at the design of multidimensional conceptual
models from object data sources.22,50,61,62
In the third research line, we gather works that implement a DW by means of an
object or object-relational language/databases.15,38,53,59 They mainly address the
ecient acquisition, storage, query, change control, and schema integration of data
into an object/object-relational DW.
Since, our work aims to design multidimensional conceptual models from object
data sources therefore it ts into to the second research line. Due to space limitation,
1232
in the remainder of this section, we limit ourselves to the context of the second
research line where we review the main referred works.
In this context, Gandhi and Jain22 propose an approach called object-oriented
methodology for DW design. It is a two-step approach:
(1) Transformation of an object model into a relational schema; the authors use
three trivial transformation rules: (i) an object class maps to a table, (ii) an association maps to a table, (iii) a generalization maps to a super class plus a series of sub
class tables. They also supply manually details that are missing from an object
model, such as primary key. (2) Transformation of the obtained relational schema
into a DW schema; to do so, the authors dene ve transformations to be applied
manually; nevertheless, they do not precise in which case each transformation should
be applied.
On their side, Prat et al.50 propose a conceptual design phase that starts with the
denition of a UML class diagram representing the decision makers' initial requirements. This denition uses no multidimensional concepts, thereby enabling maximal
reuse of traditional methods commonly used for OLTP systems engineering. The
designed UML model is then enriched/transformed in order to facilitate subsequent
mapping into a logical multidimensional schema. To do so, the authors dene four
transformations applied on the UML conceptual model: (1) determination of identifying attributes for the classes; (2) manual determination of attributes representing
measures to distinguish them from qualitative attributes (i.e., descriptive); (3)
migration of attributes of 1-1 and N -1 associations into one of the participating
classes; and nally, (4) transformation of the generalizations into aggregations to
enable their automatic mapping into multidimensional hierarchies. After these four
transformations, the logical multidimensional schema is generated, from the enriched
UML conceptual model, through the semi-automatic transformations detailed in
Prat et al.50 \These transformations are semi-automated, more specically in the
logical phases human interaction is required to validate the step-by-step application of
the transformations or to provide information".50 The result of this step is a UML
class diagram extended with the multidimensional concepts. This diagram can be
mapped semi-automatically into a physical multidimensional schema.
Also, Zribi et al.61 proposed a method for the construction of DM schemas,
starting from a UML class diagram, in ve semi-automatic steps: (1) identication of
transaction entities on which they build facts; (2) construction of decisional UMLpackages each containing a transaction entity and its associated classes (directly or
indirectly linked to the transaction entity); (3) graphical annotation of packages to
identify multidimensional concepts resulting from the previous stage
the annotation uses a set of stereotypes to represent the multidimensional concepts; (4)
validation of the annotation by the decisional designer; and (5) automatic generation
of a DM star schema modeled according to the DFM formalism,25 from each annotated package.
On the other hand, Zepeda et al.62 proposed a DW design method based on the
UML class diagram and user requirements. This method is divided into two phases.
1233
The rst phase starts with facts identication; its goal is to identify the entities that
are candidates to become facts. Once facts are identied, a recursive algorithm is
applied to nd their dimensions. This algorithm accepts the UML class diagram with
the set of candidate facts, and then produces a snowake schema for each candidate
fact. The second phase gets a set of metrics computed on the bases of user requirements. These metrics help selecting snowake schemas from the candidate ones
previously generated.
In summary, we notice the six following shortages in the so far proposed DW/DM
design methods:
(1) Most DW/DM design methods relying on an object data-source start from a
UML class diagram. However, in practice, such diagram either does not exist
within an organization, or it may be obsolete so that it does not reect the
modications due to the evolution of the OLTP system.
(2) The method starting the design from an ODB22 goes through one unnecessarily
intermediate modeling level and, manually performs the transformations.
(3) Current methods consider that the built candidate DMs are equally important
whereas some of them may be insignicant for the decisional process.
(4) Except for the method of Prat et al.,50 the proposed methods try to represent the
main DM properties at a conceptual level by abstracting away details of an
envisaged implementation platform. \Unfortunately, none of these approaches
denes a set of formal transformations in order to: (i) univocally and automatically derive every possible logical representation from the conceptual model;
or (ii) help the designer to obtain the most suitable logical representation of the
developed conceptual model ".41
(5) The existing methods for the design of DW/DM based on object sources require a
manual human intervention. However, they necessitate a high expertise both in
OLAP domain and in the ODB domain.
(6) In the DW/DM design process, current works neglect some specics of the object
data-sources, like complex attributes (i.e., collections, structures) and method
denitions.
In this paper, we propose a DM design method that overcomes these limits.
To reach this objective, rst our method relies on the recent version of the object
data source that we directly extract from the ODBMS repository; this latter contains
the OBD meta-data (i.e., structures of objects, methods). Second, it automates
the main design steps and assists the designer in the choice of relevant multidimensional concepts among those extracted; this assistance is ensured by assigning
to each concept a relevance level reecting its analytical potential for the decision
making. Third, it keeps track of the origin of each component in the generated DM
schema. This traceability is fundamental both to automatically derive logical representations and to prepare the generation of ETL (Extract Transform and Load)
procedures.
1234
1235
Note that, in the remaining of this paper, ODB, fact, dimension and hierarchy
denote their schemas; and objects indicate object classes. When we refer to instances
of these concepts, we will explicit this.
Now we detail the identication step; it starts with facts identication and then
continues with measures, dimensions and hierarchies identication.
The term numeric attribute covers ODB attributes that are atomic, collection or
belonging to a structured attribute (Fig. 2).
. Given a fact F built on O , each object in O is useful to extract measures and
dimensions for F. Thus, considering all the objects in O provides for building
facts covering a large variety of analyses.
. If an object does not have a key attribute, then the designer can intervene to select
one. Such an optional intervention improves the result of our identication method.
1236
Applying rule RF2 on our running example, rst we identify from Model
fModel; Studyg the fact F-Study-Model since the object Model contains the nonkey
numeric attribute Price. In addition, we identify seven other facts: F-estimate detail,
F-estimate, F-Automatic, F-Manual, F-semiautomatic, F-Machine and F-Personnel.
Note that rule RF2 can produce facts built either on specialized or generalized
objects. We next examine whether it is better to group each such fact with its
generalizing/specialized objects into a single fact. For this, we need to examine the
instances of these objects according to the next two sub-rules.
RF2.1. If S is a specialized object of n generalized objects G i n 1) and S produced
a fact F-S (via rule RF2), then replace the fact F-S with a new fact built on
S
S [ ni1 G i .
The name of the new fact is F-S concatenated with all its generalized objects.
The main intuition behind this rule is that a specialized object cannot exist
without its generalizing objects: there is a strong functional dependency from the
specialized object to its generalizing object. Hence, by grouping the specialized object
with its generalizing objects, we construct a richer fact oering more analysis
potentials. As illustrated in Fig. 6, every instance of the specialized object (SO) has
to be necessarily linked to an instance of its generalizing object (GO).
For our example, rule RF2.1 replaces F-Automatic and F-Manual by F-AutomaticTask and F-Manual-Task, respectively, by including the generalizing object Task.
Similarly, F-Semiautomatic is replaced by F-Semiautomatic-Automatic-ManualTask which includes three generalizing objects Automatic, Manual and Task.
RF2.2. If a generalizing object G produced a fact F-G (on G through RF 2), and if
G has not all its instances linked to any instance of its specialized objects, then
maintain F-G as a fact independent of its specialized objects.
Rule RF2.2 guarantees that instances of G not associated with instances of its
subclasses will be analyzed alone through the fact F-G. Figure 6 case 2 depicts this
situation where GO-instance2 has no specialized instance (SO).
For our running example, rule RF2.2 maintains the two facts F-Manual-Task
and F-Automatic-Task without the specialized object semiautomatic. Indeed, in our
database, there are instances of Manual Tasks and Automatic Tasks not linked to
instances of Semiautomatic.
GO
GO Instance 1
GO Instance 2
GO Instance 3
GO Instance 1
SO Instance 1
SO Instance 2
SO Instance 3
SO Instance 1
GO Instance 2
GO Instance 3
SO Instance 2
SO
Case1: Each instance of GO is linked to an instance of SO
1237
1238
(Fig. 2), a numeric attribute can be atomic or a collection (e.g., Set, Bag, List,
Array). A collection is multi-valued. However, fact measures are simple values.
Therefore, a numeric function to compute a single value, from the numeric collection
of attributes identied as measures, should be added by the designer. Such a function
is obviously semantic and domain dependent.
Rules RM1 and RM2 identify measures, issued from attributes and operations,
that are directly belonging to object(s) identied as fact whereas, RM3 extracts
measures from Structure within object(s) identied as a fact. Since, the relevance of
measure decrease when we move away from the initial fact object31 then, we consider
that measures obtained with RM1 and RM2 are more relevant than those obtained
with RM3. In addition, we notice that measures dened on operations (RM1) are
more relevant than those dened on attributes (RM2) because these latter are likely
to be keys or descriptive attributes. Hence, we opt for three relevance levels for
measures in the descending order: RM1 then RM2 then RM3.
For our running example, the application of these three rules produces, for each
identied fact, the measures shown in Table 1.
4.3. Dimension identication
Recall that a dimension represents a business analysis axis. Also, dimensional attributes are organized into one or more hierarchies of levels, which correspond to dierent
ways to aggregate fact measures. To complete our DM design method, we determine for
each identied fact, a set of candidate dimensions either from objects or from attributes.
For dimensions built on objects, we dene the following two rules:
RD1. Given an empty fact F built, with rule RF1, on a relationship between O1 and
O 2 . The transitive closure of each of O 1 ; O 2 is a dimension for F.
Conventionally, the name of this dimension is the concatenation of the object
names in O i.
This rule is graphically explained in Fig. 7.
RD2. Let O 1 be an object directly linked to an object O2 with Max(O 1 ; O 2 (1, *),
and O 2 belongs to a transitive closure identied as a fact F. Then, O
1 builds a
dimension for F.
Conventionally, this dimension name is the concatenation of the object names in O 1 :
Rule RD2 identies as a dimension d, every object linked maximally with (1, *)
because every fact instance is one-to-one linked to an instance of its dimension. In
addition, considering O 1 as a dimension (as opposed to O 1 only) enables us to
extract all dimensional attributes for d.
In order to complete the set of dimensional attributes obtained using rules RD1
and RD2, we add to each specialized object SO belonging to the transitive closure
O 1 on which a dimension d is built, all generalizing objects GO (of SO) together with
their transitive closures (GO ). (The name of d will include names of GO ). For our
running example, the impact of this on the results obtained so far is that we replace
Estimate
(RD2)
Model-Study
(RD2)
Price
(RM2)
Quantities
(RM2)
Total detail()
(RM1)
Total detail TI()
(RM1)
S id
(RDI2)
E id
(RDI1)
M id
(RDI1)
Material
(RD2)
F-Estimate-detail
(RF2)
P id
(RDI1)
Article
(RD2)
Price
(RM2)
M id
C id ! Address
E date ! Day ! Month !
Year
P id
P id
(RDI1)
Personnel
(RD1)
F-Model-Study
(RF2)
MO id ! M id
MO id ! P id
S id ! M id
S id ! P id
M Duration ! Second !
Minute ! Hour ! Day
F-ManuelTask-carriedoutbyPersonnel
(RF2)
O id
(RDI1)
MC id
(RDI1)
Machine
(RD1)
MO id ! M id
MO id ! P id
S id ! M id
S id ! P id
A Duration ! Second !
Minute ! Hour ! Day
Hierarchy
Manual-Task
(RD1)
O id
(RDI1)
Automatic-Task
(RD1)
F-AutomaticTaskexecutedby-Machine
(RF1)
Identier
Dimension
Fact
Table 1.
A Duration
(RDI3)
Model-Study
(RD2)
A Duration
(RD5)
AO cost()
(RM1)
MO cost()
(RM1)
SA cost()
(RM1)
F-Automatic-Task
(RF2, RF21 & RF2.2)
F-Manual-Task
(RF2, RF21 & RF2.2)
F-semiautomatic-AutomaticManual-Task
(RF2 & RF2.1)
Hour cost
(RM2)
Wage
(RM2)
F-Machine
(RF2)
F-Personnel
(RF2)
AO cost()
(RM1)
MO cost()
(RM1)
S id
(RDI2)
E date
(RD5)
Aectation date
(RD5)
Qualication
(RD3)
State
(RD4)
Purchase date
(RD5)
Aectation date
(RDI3)
SeqId qualication
(RDI4)
State
(RDI3)
purchase date
(RDI3)
A Duration
(RDI3)
SA Duration
(RDI3)
M Duration
(RDI3)
M Duration
(RD5)
A Duration
(RD5)
SA Duration
(RD5)
S id
(RDI2)
M Duration
(RDI3)
M Duration
(RD5)
Model-Study
(RD2)
S id
(RDI2)
Model-Study
(RD2)
E date
(RDI3)
C id
(RDI1)
Client
(RD2)
F-Estimate
(RF2)
Total Price()
(RM1)
Identier
Dimension
Measure
Fact
Table 1. (Continued )
Address
Hierarchy
Second ! Minute !
Hour ! Day
Second ! Minute !
Hour ! Day
Second ! Minute !
Hour ! Day
P id
M id
Second ! Minute !
Hour ! Day
P id
M id
Second ! Minute !
Hour ! Day
P id
M id
1240
Y. Hachaichi & J. Feki
O1+
1241
O2+
O3
O5
Path_name1
O1
Path_name2
O2
O6
O4
Dimension O 1-O3-O4
(RD1)
F-O1-Path_name1-O2
(RF1)
Dimension O 2-O5-O6
(RD1)
1242
1243
1244
Corresponding hierarchy
day ! year
day ! month ! year
structured objects. We have studied the factory of ODMG operations for creating
such objects and we have dened for each one its corresponding hierarchy. Table 2
gives the most used constructors for temporal attributes and their corresponding
hierarchies. In Table 2, Attribute1 ! Attribute2 means that Attribute1 has lower
granularity than Attribute2.
Similar to the classication of dimensions, we consider that dimensional attributes produced by RDA1, RDA3 and RDA4 are more relevant for decision
making than those obtained with RDA2. The last ones are issued from complex
attributes. For our running example, Table 1 (column 5) lists the hierarchies for each
dimension.
We continue the construction of hierarchies by identifying for each dimensional
attribute its weak attributes (i.e., nondimensional attributes), if any.
4.4.3. Weak attributes identication
To identify weak attributes, we dened the following four rules.
RND1. Key attributes of objects belonging to O that produced a dimension d, other
than those extracted through RDI 2 as dimension identier for d, are weak attributes
for d identier.
We have already introduced this rule in RDI2. The idea is to associate for the
identier of dimension d a descriptive attribute from each object that participates to
d construction: the Mo id attribute becomes a descriptive attribute for the ModelStudy dimension (Fig. 8).
1245
M_id
MO_id
Price
Description
State
MC_id
Machine
Machine
F-AutomaticTaskF-AutomaticTaskexecutedby-Machine
executedby-Machine
Name
Hour_cost
Name
Description
P_id
O_id
Name
Description
M_id
Automatic-Task
Automatic-Task
Description
S_id
Name
Description
P_id
Price
Description
Name
Description
M_id
Year Month Day E_date
F-Estimate-detail
F-Estimate-detail
E_id
Address C_id
Estimate
Estimate
Price
Price
Quantities
Quantities
Total_detail()
Total_detail()
Total_detail_TI()
Total_detail_TI()
S_id
Model-Study
Model-Study
Name
Description
P_id
MO_id
Name
Name
Description
Price
Description
1246
Mt Universal Method
Mt Internal Function
Mt Documentation
Mt Overridable
Mt Message
Mt Interpretation Of
Mt Selector
Mt Documentation
Mt Interpretation
Mt Method
Classe
Attribute1
Attribute2
...
Relationship
Inverse Relationship
Mt Subclasses
Mt Internal Function
Mt Documentation
Mt
Ow
nM
Mt
M
eth
od
Mt Attribute
eth
od
Mt Superclasses
Mt Class
Mt Attributes
Mt Successors
Mt
C
ritir
ria
ion
Mt Classes
rite
Mt Relationship
Mt Relationships
Mt Name
Mt Instance Check Function
Attributes Of
Mt
C
Relationship Of
Of
Mt Index
Mt Name
Mt Default Value
Mt AttributeCheck Function
Mt Make Entry Function
Mt Before Attribute
Modification Function
Mt Type
Legend:
Mt Successors Of
Mt Inverse
Relationship
Mt Index
Mt Name
Mt Criteria Size
Mt Criteria Order
Fig. 9.
Mt Name
Mt Relationship Check Function
Mt Before Adding Successor
Mt After Adding Successor
Mt Before Removing Successor
Mt After Removing Successor
Mt Cardinality
Mt Inverse Relationship
Main_shareholder
Percentage_of_shareholder
(List(N))
Shareholder
Wage(N)
shareholder_name(K,S)
date_recrutement(D)
1247
publicshareholder_level(S)
Public_shareholder
nbr_ans_exp(N)
Private_shareholder
media_name(k,S)
Media
Get_nbr_ans_exp
(N)
advertising_price(N)
Set_media_name(void)
Company
region(k,S)
media_type(S)
Media_type
Region
manager_
name(S)
Person
nbr_exp_
travail(N)
number_of_inhabitants(N)
Set_region(void)
media_unit(S)
Get_media_type(S)
quantity(N)
Exposure
media_exposure(N)
amount(N)
Purchase
Total_quantity(N)
Date
dd_mm_yyyy(k,S)
Target
Quarter
quarter(k,S)
product_code(k,S)
Product
Year
year(k,S)
Influence_of_company
quarter(k,S)
Quarter
Target_code(K,N)
Status(S)
minimum_age(S)
maximum_age(S)
sex(S)
pourcentage_of_region(N)
product_name(S)
influence_of_coefficient(N)
product_type(k,S)
Product_type
Advertising_campaign
product_unit(S)
campaign_code(k,N)
CAME-BDO features cover the three steps of our design method (Fig. 11):
.
Object database schema retrieval. This step, rst displays the list of databases
implemented under the MATISSE ODBMS. After the selection of one of these
ODB sources, the DM schemas construction process starts. CAME-BDO accesses
the MATISSE meta-model (i.e., repository), extracts the objects of the selected
data source and displays them in a tabular format as depicted in the interface of
Fig. 12. In this GUI, the designer can see attributes, methods (operations),
relationships, and the type (generalized, specialized or normal) of every pointed
object. Here, the designer can keep all/some objects for the DM construction
process (i.e., Select/deselect objects) and, nally launches the identication of
multidimensional concepts in order to obtain all DM schemas.
1248
CAME-BDO Functions
MPI-EDITOR
Multidimensional concepts
Multidimensional
identification andconcepts
star
identification
and star
schema
construction
schema construction
DM Schemas
DM Schemas
display
and
display and
adjustment
adjustment
XML
XML
XML
CAME-BDO Repositories
ODBMS
MATISSE
Repository
Extracted schemas
repository
Multidimensional
schemas repository
Current object
Select/deselect objects
Fig. 12. The schema of the ODB of Fig. 10 as extracted from MATISSE repository.
1249
Identified facts
Empty
facts
Fig. 13. (Color online) Identied facts, measures and dimensions in the Media planning ODB.
Applied on the example of Fig. 10, our identication rules construct 10 multidimensional schemas (Fig. 13). For example, the obtainment of the schema of the
fact called F-Main shareholder follows steps below:
RF1 identies F-Main shareholder as a fact; indeed, Main shareholder
fMain shareholderg and Main shareholder contains the nonkey numeric attribute
Percentage of shareholder.
RM2 identies the nonkey numeric attribute Percentage of shareholder as a
measure for F-Main shareholder.
RD2 identies each of the objects Shareholder, Media and Date as a dimension
for F-Main shareholder: each one is linked to Main shareholder with maximum
1250
Fig. 14. DM schema built with CAME-BDO and displayed using MPI-Editor.
multiplicities (1, *). Applying RDI1, the identier of each of these dimensions is
the key attribute.
Date and Media hierarchies are constructed using RDA1 (applied twice for
Date).
The hierarchy of the Shareholder dimension is constructed by RDA4; the ODMG
Calendar date constructor (Table 2) gives its last three parameters.
.
The interface of Fig. 14 shows graphically a DM schema constructed with CAMEBDO, it is displayed according to the DFM graphical notation. This interface is
obtained with MPI-Editor 5 toolset which communicates with CAME-BDO through
XML technology. The decisional designer can validate the schema by adding derived
measures, removing and/or renaming dimensional elements. These adjustments can
be performed either through the CAME-BDO tabular format, or through the MPIEditor GUI.
1251
CAME-BDO evaluation.
Measure
Dimension
Dimensional attribute
2/1
3/3
5/3
2/1
10/9
9/8
6/6
9/8
11/9
3/3
10/8
8/8
5/4
10/8
5/4
9/6
a We
b We
CAME-BDO (and thus our method) identies all the facts that a bottom-up
analysis can manually gure out; moreover, it nds out empty facts.
. In most cases, CAME-BDO extracts more measures than those obtained manually;
this is because we consider both operations and multi-valued numerical attributes.
However, we noticed that CAME-BDO does not identify calculated measures since
they are semantics dependent. In practice, these measures can be added manually
by the designer in the third step of our method.
. The number of dimensions and dimensional attributes extracted by CAME-BDO
is slightly higher than those obtained manually. Thus, CAME-BDO builds schemas
that oer a larger panoply of analyses. This variation in numbers is due to the fact
that our rules take into account Boolean, temporal and structured attributes, and
are not limited to work on objects.
1252
7. Conclusion
In this paper, we have tackled DW design issues at the conceptual level. More
specically, we have proposed an automatic DW design method that starts from an
ODB compliant to the ODMG standard and generates DM schemas modeled
according to the well-known DFM model. In order to generate DM schemas based on
the most recent version of the ODB source, our method extracts the ODB logical
model directly from the ODBMS repository. Thus, the generated DMs can reect all
of the organization activities.
Furthermore, to automatically generate DM schemas, we have dened a set of
rules for the extraction of the DM schema components (i.e., facts, measures,
dimensions with their attributes organized into hierarchies). Our rules have the
merit to be independent of the semantics of the ODB source domain. To be independent, they take advantage of the structural-semantics oered by the ODMG
object model. In an additional attempt to assist the DW/DM designer, our rules
assign to each component of the generated DM schemas a relevance level reecting its
analytical potential. This assists the designer to choose those DM concepts that are
more interesting for the decision-making process. Moreover, and as a third advantage, our method keeps track of the origin of each component in the generated
DM schema. This traceability has a twofold benet; rst, it is fundamental to
automatically derive logical representations and, second, it helps to generate ETL
procedures.
For this method, we have developed a software prototype, called CAME-BDO, we
used to conduct some experimental evaluations. In fact, CAME-BDO extends our
CAME31 software tool which carries out the design of conceptual DM schemas
starting either from a relational database source,30 or from a set of XML documents
compliant to a given DTD.32 These preliminary evaluations showed the feasibility of
our method in identifying all facts and their measures, dimensions and hierarchies.
To generalize these results, we are looking for a more consistent evaluation on a set
of ODB.
In addition, as an immediate extension of this work, we are currently developing a
software tool for the automatic generation of ETL procedures under OWB (Oracle
Warehouse Builder); some preliminary results in this perspective are recently published.33 Furthermore, we are examining how to integrate adjusted/validated DM
schemas obtained with CAME to build a DW schema loadable from heterogeneous
data sources (i.e., relational database, XML data-centric documents and object
databases), and how to generate platform-independent ETL procedures according to
the MDE (Model Driven Engineering) approach.
Acknowledgments
We would like to thank the anonymous reviewers for their valuable comments and
suggestions to improve the quality of the paper.
1254
Y. Hachaichi & J. Feki
1256
References
1. A. Abello, J. Samos and F. Saltor, Y AM2 (Yet Another Multidimensional Model): An
Extension of UML, In IDEAS '02 (IEEE Computer Society, 2002), pp. 172181.
2. R. Agrawal, A. Gupta and S. Sarawagi, Modeling multidimensional databases, in Proc.
Int. Conf. Data Engineering, Birmingham UK (1997), pp. 232243.
3. D. K. Barry, When an object database should be used (2011), available at http://www.
service-architecture.com/object-oriented-databases/articles/when an object database
should be used.html.
4. C. Batini, S. Ceri and S. Navathe, Conceptual Database Design: An Entity-Relationship
Approach (Benjamin-Cummings Publishing, Redwood City, USA, 1992), p. 470.
5. M. Ben Abdallah, J. Feki and H. Ben-Abdallah, MPI-EDITOR: OLAP requirement
specication tool for the reuse of logical multidimensional patterns, Workshop on Decisional Systems (ASD'06), Maroc (2006) (in French).
6. M. Ben Abdallah, H. Ben-Abdallah and J. Feki, Well formed multidimensional patterns
for the build of data marts, Journal of Global Management Research 16(3) (2008).
7. A. Bonifati, F. Cattaneo, S. Ceri, A. Fuggetta and S. Paraboschi, Designing data marts
for data warehouse, in ACM Transaction on Software Engineering and Methodology 10
(2001) 452483.
8. M. Bohnlein and A. Ulbrich-vom Ende, Deriving initial data warehouse structures from
the conceptual data models of the underlying operational information systems, DOLAP,
Missouri (1999), pp. 1521.
9. R. Bloor, The failure of relational database, the rise of object technology and the need for
the hybrid database, White Papers & Analyst Reports, Advanced software technologies
for breakthrough applications, InterSystems (2003).
1257
10. F. Bret and O. Teste, Graphical construction of data warehouses and data marts,
INFORSID'99, La Garde, France (1999) (in French).
11. R. Bruckner, B. List and J. Schiefer, Developing requirements for data warehouse systems
with use cases, 7th Americas Conf. Information Systems, Boston (2001), pp. 329335.
12. L. Cabibbo and R. Torlone, A logical approach to multidimensional databases, Conf.
Extended Database Technology, Valencia, Spain (1998), pp. 187197.
13. A. Carme, J. N. Mazon and S. Rizzi, A model-driven heuristic approach for detecting
multidimensional facts in relational data sources, 12th Int. Conf. Data Warehousing and
Knowledge Discovery, Bilbao, Spain (2010), pp. 1324.
14. R. G. Cattell and D. Barry, The Object Data Standard: ODMG 3.0 (Morgan Kaufmann
Pub., Sans Francisco, CA, 2002) p. 288.
15. B. D. Czejdo, J. Eder, T. Morzy and R. Wrembel, Designing and implementing an object
relational data warehousing system, in Proc. IFIP TC6/WG6.1 Third Int. Working Conf.
New Developments in Distributed Applications and Interoperable Systems (2001), pp.
311316.
16. K. Decker, A. Oaks and M. Salinas, Building a cost engineering data warehouse, AACE
International Transactions, IM.06, AACE International, Morgantown (1997).
17. B. Dhawan and A. Gosain, Extending uml for multidimensional modeling in data warehouse, International Journal of Computer & Communication Technology 2(7) (2011)
5964.
18. J. Feki and Y. Hachaichi, Assisted data mart design: A method and a toolset, Journal of
Decision Systems 16(3) (2007) 303333 (in French).
19. J. Feki and Y. Hachaichi, Constellation discovery from OLTP parallel-relations, International Arab Conf. Information Technology (ACIT'07), Lattakia, Syria, November 2007.
20. J. Feki, A. Nabli, H. Ben-Abdallah and F. Gargouri, An automatic data warehouse
conceptual design approach, in Encyclopedia of Data Warehousing and Mining (IGI
Global Publication, 2008).
21. E. Fernandez-Medina, J. Trujillo, R. Villarroel and M. Piattini, Developing secure data
warehouses with a uml extension, Information Systems 32(6) (2007) 826856.
22. S. K. Gandhi and S. Jain, Data warehouse design: An object-oriented approach, Journal
of Engineering, Science & Management Education 4 (2011) 1620.
23. A. Gosain and M. Suman, Object oriented multidimensional model for a data warehouse
with operators, International Journal of Database Theory and Application 3(4) (2010)
3540.
24. M. Golfarelli, D. Maio and S. Rizzi, Conceptual design of data warehouses from E/R
schemas, Conf. System Sciences, Kona-Hawaii, Vol. VII (IEEE Computer Society,
Washington, DC, USA, 1998).
25. M. Golfarelli, D. Maio and S. Rizzi, The dimensional fact model: A conceptual model for
data warehouses, International Journal of Cooperative Information Systems 7(23)
(1998) 215247.
26. M. Golfarelli, The DFM: A conceptual model for data warehouse, in Encyclopedia of Data
Warehousing and Mining, J. Wang (ed.), 2nd edn. (IGI Global, Hershey, PA, 2008).
27. M. Golfarelli, From user requirements to conceptual design in data warehouse design a
survey, in Data Warehousing Design and Advanced Engineering Applications: Methods
for Complex Construction, Part of the Advances in Data Warehousing and Mining
(ADWM) Book Series (IGI Global Publication, 2009).
28. P. Giorgini, S. Rizzi and G. Maddalena, Goal-oriented requirement analysis for data
warehouse design, DOLAP Bremen, Germany (2008), pp. 4756.
29. Y. Hachaichi and J. Feki, From relational to multidimensional model: Data mart design,
EDA 07, RNTI B-3 (2008), pp. 519 (in French).
1258
1259