Data Warehousing
Data Model Overview
Modeling for the Enterprise while Serving the Individual
Debbie Smith
Data Warehouse Consultant
Teradata Global Sales Support
Table of Contents

Executive Summary
Introduction
Revisit Dr. E. F. Codd's 12 Rules
So What is an EDW?
Data Infrastructure
Data Modeling Theory
From Logical to Physical Modeling
Physical Models
Impact of Information Delivery Tools
Impact of ETL Tools
Surrogate Keys
Changing Dimensions
The Teradata Direction
Impact of the Data Warehousing Evolution
Summary
Appendix A: Codd's 12 Rules
Appendix B: A Comparison
Appendix C: Glossary
Appendix D: Endnotes
Appendix E: References
EB-2406
> 1007 > PAGE 2 OF 18
Executive Summary
The choice of data model for the data warehouse is often a matter of great controversy. The desire is to offer a self-service environment that allows business users easy access with acceptable response times. Response time also includes the time required between the conception of a new application and the delivery of that application. What data model do you employ to provide ease of use for the business user while still being able to address current and future needs of the data warehouse in terms of updating, expansion, availability, and management? This paper provides an overview of popular data modeling approaches and the Teradata Corporation position regarding data modeling.
Introduction

Data model choices are often a matter of great controversy when discussing building a data warehouse. When building the data warehouse, how do you build it? Do you attempt to build it for the business entity, current and future, or do you build it in a manner that satisfies current business users' needs? This paper will discuss the idea of building the centralized enterprise data warehouse to sustain longevity, and using a view layer to form fit individual business users' needs and requirements, combining the benefits of a normalized model with the benefits of a dimensional model.

Revisit Dr. E. F. Codd's 12 Rules

In 1985, Dr. Codd published a list of 12 principles that have become the design guidelines for relational database systems (see Appendix A).1 Dr. Codd is credited as the creator of the relational model, and within these guidelines he outlines the structuring of information into tables, that null values are properly identified, and that a high level of insert, update, and delete integrity is maintained. Note guidelines 6, 8, and 9:

> Number 6, View Updating Rule: logical views support full-range data manipulation.
> Number 8, Physical Data Independence: the user is isolated from how the data is physically stored.
> Number 9, Logical Data Independence: the user is not impacted should the physical data structure change.

These three guidelines specifically provide a methodology to isolate users from the impact of IT activities, and they lay the foundation for building a logical, user-friendly data structure that is independent of the physical data structure. This foundation provides for form fitting individual business users' needs while maintaining a physical model that facilitates the needs of ETL, update frequencies, and data management requirements, giving the enterprise data warehouse user friendliness.

So What is an EDW?

The foundation of this paper is built on the concept of an enterprise data warehouse (EDW). Many people use the term data warehouse to represent similar concepts; however, there can be some variances in what is meant. It is generally accepted that the data warehouse is the environment that provides for decision support within the organization. By appending the word enterprise, that environment is now thought of, or expected, to become reflective of the entire enterprise. Before continuing, let's discuss the environment that Teradata refers to when we say enterprise data warehouse.

We define an EDW as an area where the data of the business (the enterprise) is centrally integrated, centrally stored, and accessed through common business-defined methods. We believe, and our customers have shown us, that the value of centralizing the data is the synergy gained by storing the data once, managing it once, and accessing it for and in many varied ways, times, methods, and reasons. (By gathering and integrating the data of the business, business users are able to have a 360-degree view of the data.) Of course, the accumulation of business data is an ongoing process and ever evolving. One of the goals is to develop the long-term strategy that provides a methodology of adding and refining business data. The higher the degree of customer success, the greater the degree to which the infrastructure is able to answer any question, from any user (internal or external), on any data, at any time, including the yet unknown queries or unknown future application needs, without the need or intervention of additional resources. This synergy allows the business to use the information of the business to quickly respond to and create changes in the marketplace.
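Codd's data independence guidelines, which underpin the view-layer approach described above, can be sketched with Python's built-in sqlite3. All table, column, and view names here are invented for illustration; the point is that the business user's query against the view survives a physical restructuring of the storage:

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Hypothetical physical table behind a user-facing view.
con.executescript("""
CREATE TABLE emp_phys (emp_id INTEGER PRIMARY KEY, name TEXT, dept TEXT);
INSERT INTO emp_phys VALUES (1, 'Lee', 'Sales'), (2, 'Kim', 'Finance');
CREATE VIEW employee AS SELECT emp_id, name, dept FROM emp_phys;
""")
query = "SELECT name FROM employee WHERE dept = 'Sales'"
before = con.execute(query).fetchall()

# IT later restructures the physical storage (splits the table in two).
# The view is redefined, but the user's query is untouched -- logical
# data independence in miniature.
con.executescript("""
DROP VIEW employee;
CREATE TABLE emp_core (emp_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE emp_dept (emp_id INTEGER PRIMARY KEY, dept TEXT);
INSERT INTO emp_core SELECT emp_id, name FROM emp_phys;
INSERT INTO emp_dept SELECT emp_id, dept FROM emp_phys;
DROP TABLE emp_phys;
CREATE VIEW employee AS
  SELECT c.emp_id, c.name, d.dept
  FROM emp_core c JOIN emp_dept d ON c.emp_id = d.emp_id;
""")
after = con.execute(query).fetchall()
print(before == after, after)  # True [('Lee',)]
```

The same separation is what lets a normalized physical model carry a dimensional-looking view layer on top.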
Data Infrastructure

The data model is the core of the data warehouse. The decisions made when defining the data model determine the data infrastructure. This data infrastructure can impact performance, time to market for new applications, responsiveness to changes in the marketplace, business users' ability to retrieve information, data latency, and the optimization of the data warehouse over the long term. The focal point of the data warehouse data modeling discussion typically covers normalization theory and dimensional theory.

As data warehouses evolve and take on greater responsibility and active roles in how an enterprise conducts business, data model requirements and expectations are changing at a faster rate to support this increased responsibility. The data warehouse is expected to reflect changes within the enterprise with increasing speed. It is also driving changes in how the enterprise conducts business by providing insight into current business practices. These increasing requirements of the data warehouse impact data modeling choices.

The goal of the EDW is to address the needs of the enterprise. This requires a data model that provides an enterprise solution rather than a localized solution that only addresses specific application needs for individuals within the organization. The charge of the EDW is to enable the business at an enterprise level. To do that, hard decisions concerning data modeling must be addressed. Questions such as: should the data model reflect the needs of the enterprise or the needs of the individual? What is the impact on the data model as the needs of the individual and/or the needs of the enterprise change? Do you have to sacrifice one over the other? What if you could minimize those sacrifices? Experience has shown us that with Teradata Database, it is possible to take full advantage of Codd's guidelines of physical data independence, logical data independence, and view updating to create a data model that provides the flexibility to satisfy both current and future needs of the enterprise, and then, through the use of views, create a view model that addresses the needs of the individual. This provides a methodology to utilize the normalized model and augment it with views that use dimensional modeling techniques, taking the best of both worlds.

With more than 35 years of experience in database technology and data warehouse design, Bill Inmon is recognized as an authority in the data warehousing industry. His lectures and papers promote a normalized model as the model of choice for the data warehouse, reserving the use of star schema models for data marts if and when necessary. He states, "It is not possible to build an effective DSS environment without an enterprise data warehouse. The design of the enterprise data warehouse is typically normalized. The classical entity relationship structuring of data accompanied by the ensuing normalization of the data structures fits very conveniently with the requirements of the usage of the enterprise data warehouse."2

Teradata Database, as an RDBMS, is and always has been agnostic about which well-defined data model is chosen. There are many customers who execute a normalized model, others using a snowflake model, others using a star schema, and many others using some variation or derivation of each. With that said, it is important to note that Teradata as a company, with its vast experience in data warehousing, does have preferences to ensure the data model employed provides the highest flexibility and responsiveness to the business for not only current needs but also future needs.
Data Modeling Theory
Entity-Relationship Modeling
Before designing and establishing the
physical data model, there's a logical data
modeling process. This process includes
extended discussions with the business
community. During these discussions, the
business requirements are identified, the
data entities and elements required to meet
those business requirements are established, and the relationships between the
data entities are captured. These insights
are later diagrammed into a representation
of the business, and referred to as an Entity
Relationship (ER) model. An ER model or
diagram represents the data entities of the
business and the relationships between
those entities. At this stage, the specific
functions or queries the model will be used
Figure 1. Example ER diagram: a Customer entity (with attributes CustomerName and CustomerAddr) and a TravelRequest entity (e.g., Boston to New York) joined by the Places relationship. Business rules: a Customer places zero or more TravelRequests; a TravelRequest is placed by only one Customer.
to support are not included. This is a logical representation of the enterprise and is later used to develop a physical data model or schema. There are a number of tools available to map the ER diagram, ERwin, now known as AllFusion ERwin Data Modeler, being the most recognized. ER modeling is guided by the rules of normalization. These are strict rules meant to ensure the essence of business relationships is captured. Examples of the components (entities and relationships) of an ER diagram are shown in Figure 1.

Dimensional Modeling

Dimensional modeling is another logical design method used to organize data for functional groups of users or business functions. It identifies the dimensions and levels within the business, separating out the facts or measures of the business. The dimensional model enforces additional rules that eliminate many-to-many relationships between entities, allowing only many-to-one relationships. It fuses multiple entities together into a new entity. This new entity does not directly reflect the entities and relationships that occur in the business, but is established for the convenience of storing a data point or metric that is important to the targeted group of business users. The goal of the dimensional model is to provide a presentation layer that facilitates easy navigation of the data for business users and quick access to reports.

The dimensional model is often thought of or represented as a cube, with dimensions such as time, product, and geography. The business metric is at the intersection of these dimensions. Visualizing the dimensional model as a cube, sometimes referred to as a star schema (see Figure 2), makes it easy to imagine being able to slice and dice that segment of data. Creating these cubes for reporting requires understanding of what questions will be asked before designing. Each cube is designed to facilitate quick and easy access for a specific business application.
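The Figure 1 business rule translates directly into a physical schema. A minimal sketch using Python's built-in sqlite3, with the Places relationship expressed as a foreign key (column names are illustrative, not from the paper):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")  # enforce referential integrity

# The Customer and TravelRequest entities of Figure 1; the one-to-many
# Places relationship becomes a NOT NULL foreign key on the request.
con.executescript("""
CREATE TABLE customer (
    customer_id   INTEGER PRIMARY KEY,
    customer_name TEXT NOT NULL,
    customer_addr TEXT);
CREATE TABLE travel_request (
    request_id  INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
    origin      TEXT,
    destination TEXT);
""")
con.execute("INSERT INTO customer VALUES (1, 'A. Smith', '1 Main Street')")

# A Customer places zero or more TravelRequests...
con.execute("INSERT INTO travel_request VALUES (10, 1, 'Boston', 'New York')")

# ...but every TravelRequest must be placed by an existing Customer.
rejected = False
try:
    con.execute("INSERT INTO travel_request VALUES (11, 99, 'Boston', 'Chicago')")
except sqlite3.IntegrityError:
    rejected = True
n_requests = con.execute("SELECT COUNT(*) FROM travel_request").fetchone()[0]
print(rejected, n_requests)  # True 1
```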
Figure 2. A cube (star schema). Dimensions: Product (Dept, Minor Dept, Category, Sub Category, SKU, UPC), Geography (Region, District, Store), and three time hierarchies, Calendar (Year, Quarter, Month, Week, Day), Fiscal (Year, Quarter, Period, Week), and Advertising (Ad Year, Ad Period, Ad Week), intersecting at the Sales metric.
The dimensional model flattens the dimensions via denormalization. Business rules are used to validate and enforce additional restrictions (as in Figure 2). For example, many companies have a fiscal year that differs from the calendar year; they may even have another time dimension, such as advertising, which differs from both calendar and fiscal. When the advertising week does not fall within the fiscal week, business rules will be used to determine and assign the ad week to a fiscal week, providing a methodology for drilling through to lower levels. In effect, the functional rules and processes are accommodated within the model. This, in turn, means that much more needs to be understood about the queries and outputs that the model is expected to support. A dimensional model can be created from an ER model; however, an ER model could not be created from a dimensional model. Once entities are fused together, separating those fused entities is difficult, if not impossible.

With a dimensional model, the central fact table is forced to a single grain, causing the initial fact table design to become brittle or inflexible. You can't incorporate new data sources without breaking the original star schema design or creating separate fact tables or data marts. To get an enterprise view, you would then have to drill across these different stars or data marts.
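The slice-and-dice navigation a cube like Figure 2 supports can be sketched with a toy star schema in Python's built-in sqlite3 (tables and values are invented for illustration):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
-- Toy dimension tables and a central fact table in a star arrangement.
CREATE TABLE product_dim (sku INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE store_dim   (store_id INTEGER PRIMARY KEY, region TEXT);
CREATE TABLE sales_fact (
    sku       INTEGER REFERENCES product_dim(sku),
    store_id  INTEGER REFERENCES store_dim(store_id),
    sale_date TEXT,
    amount    REAL);
INSERT INTO product_dim VALUES (100012, 'Snacks'), (200012, 'Dairy');
INSERT INTO store_dim VALUES (1, 'East'), (2, 'West');
INSERT INTO sales_fact VALUES
  (100012, 1, '2001-01-10', 10.00),
  (100012, 2, '2001-03-04', 15.00),
  (200012, 1, '2001-01-15',  2.50);
""")

# "Slicing" the cube: total sales by region for one product category.
rows = con.execute("""
    SELECT s.region, SUM(f.amount)
    FROM sales_fact f
    JOIN product_dim p ON p.sku = f.sku
    JOIN store_dim   s ON s.store_id = f.store_id
    WHERE p.category = 'Snacks'
    GROUP BY s.region
    ORDER BY s.region
""").fetchall()
print(rows)  # [('East', 10.0), ('West', 15.0)]
```

Swapping which dimension appears in the WHERE clause versus the GROUP BY is exactly the dice-versus-slice choice; the brittleness arises when a question needs a grain or a dimension the fact table was not built with.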
From Logical to Physical Modeling

Why would you create a physical model that is different from the logical model? The physical model often differs from the logical model to yield the best overall performance for specific database technologies, and not all database technologies are equal. Technologies differ so much that in an article written for Intelligent Enterprise magazine, Neil Raden states, "As you know, the primary schema for a data warehouse is either a star schema or a normalized schema. The latter is a term so loosely defined that it's hard to describe, but a normalized schema typically resembles a third (or higher) normal form (3NF) schema that's not dimensional. These 3NF designs don't support query and analysis. Their sole purpose is to act as a staging area, an upstream data repository for a series of star schemas, online analytic processing (OLAP) cubes, and other structures that are directly queried by analysts. The only routine exception to this is Teradata implementations: Because of the unique characteristics of the massively parallel architecture and database optimizer, Teradata can process analytical SQL against a 3NF schema with acceptable performance."

Figure 3. The modeling spectrum: normalized models (less data redundancy) at one end, moving through snowflake and star schemas to flattened, denormalized models (simpler data access) at the other.

This implies the physical model is often changed from the logical model not necessarily due to business requirements but to facilitate the technology being used and ensure query speed. This transition from the logical model to the physical model is where the greatest variation, deviation, and difference of opinion takes shape, which, of course, forms the basis for the confusion surrounding data modeling. On one end of the spectrum is the flattened denormalized model, while on the other end is the normalized model. Of course, the reality is that most customers develop a data model that falls somewhere between the opposing ends, or some combination of both. This is often the result of the data warehouse evolving to meet business demands.

Physical Models

In his book Data Modeler's Workbench, Steve Hoberman discusses what he refers to as the "normalization hike" and the "denormalization survival guide." Hoberman's principles state that the logical model is completely normalized, up to fifth normal form. His analogy states that once you've accomplished this, you can see the business functions as if looking down from a skyscraper. The denormalization survival guide then provides guidelines for how to transform the normalized logical model into a physical model. This survival guide provides some of the principles for denormalization, the basics of which are to minimize denormalization so as not to compromise the business's current and future needs.

Normalized Physical Model

As you move along the modeling spectrum (see Figure 3), the further to the left your data model is, the greater the degree of normalization. A normalized model separates data into tables based on very strict rules that relate an attribute to the primary key.
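Those strict attribute-to-key rules have a practical payoff that can be sketched with Python's built-in sqlite3 (schema and values invented for illustration): a fact stored once is changed once, while a denormalized copy must be rewritten everywhere it appears.

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Flattened (denormalized): the category name repeats on every sales row.
con.executescript("""
CREATE TABLE sales_flat (sku INTEGER, category TEXT, amount REAL);
INSERT INTO sales_flat VALUES
  (1001, 'Snacks', 10.0), (1001, 'Snacks', 15.0), (1001, 'Snacks', 2.5);
""")
# Renaming the category means touching every duplicated row.
n = con.execute("UPDATE sales_flat SET category='Confectionery' "
                "WHERE sku=1001").rowcount
print(n)  # 3 rows rewritten

# Normalized: the category is stored once and referenced by key, so the
# same business change is a single-row update.
con.executescript("""
CREATE TABLE product (sku INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE sales (sku INTEGER REFERENCES product(sku), amount REAL);
INSERT INTO product VALUES (1001, 'Snacks');
INSERT INTO sales VALUES (1001, 10.0), (1001, 15.0), (1001, 2.5);
""")
m = con.execute("UPDATE product SET category='Confectionery' "
                "WHERE sku=1001").rowcount
print(m)  # 1 row rewritten
```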
One of the fundamental rules of normalization is the elimination of data redundancy within the model, keeping each attribute within each table functionally dependent upon the primary key. By following this rule, eliminating data redundancy within the model, the number of tables, along with the complexity of the data model, increases. As new data subjects for the data warehouse are identified, the model is extended following the same normalization rules to include the new data subjects.

[While data warehouse modelers are often embroiled in controversy over how to model the data warehouse, the world of transactional databases is in agreement that a normalized model is the optimal choice. As data warehouses evolve towards transactional decision making using very current data, the need for a more normalized physical design becomes evident.] A normalized data model provides a methodology for capturing and storing the lowest level of data, eliminates multiple updates to various tables with the same transaction, and is the model of choice for the OLTP types of transactions that are targeted and specific.

Customer experience has shown that the normalized model can answer new business questions or previously unknown questions without making changes to the structure of the database, because the relationships and entities have been represented physically and provide the greatest flexibility for the business. It eliminates data duplication and, therefore, the complexity of maintaining duplicated data. Even though there are typically more tables in a normalized model, the greater the degree of normalization, the greater the degree of storage space conservation, because data is not duplicated. However, denormalized structures, such as summary tables, will and often do exist in a data warehouse that has been developed under the constructs of normalization. When the business value warrants the cost to build and manage summary table structures, they are a viable augmentation of the normalized data model.

Denormalized Model

Moving right along the modeling spectrum (see Figure 3), the degree of denormalization increases. A flattened model essentially resembles an Excel spreadsheet. Denormalized data models flourished out of a need to provide business users with quick and easy access to data without the requirement of understanding the underlying data model. Business users, whose job does not include knowing SQL or the logic of data models and, therefore, how to join tables together, needed easy access to information. However, those same business users did understand spreadsheet formats and, in fact, often requested information in a report format. Therefore, when data was presented in a two-dimensional form (similar to a spreadsheet), as opposed to a series of connected tables, business users were protected from data model complexity. Because the data model was generated to address specific business constituencies, new user constituencies (often called power users) were created within the organization to code and generate reports for business users. These power users became so busy that requests were often queued for weeks at a time. Business users wanted direct access to information, and they wanted it instantaneously. Denormalized models provided for simple, easy navigation of the data. The data are typically aggregated along one of the lines of dimension. The types of queries requested are well understood, and the model is established to answer those questions. These pre-defined models ensure the known query will return an answer set quickly. When new queries are identified, the model is extended to include a duplication of existing data in a form that allows the business user to navigate and quickly retrieve the answer set for the new set of queries.

The star schema model is typically only considered for use in data marts or the data warehouse and is built to address a specific business need, report, or application. Each star schema is built as an individual data mart. When the business needs a new report, another star schema, or data mart, is built. If the needed dimensions are the same as what has already been built, they are duplicated to this new data mart. Each of these data marts caters to the needs of the individual. There may be a temptation to think of a group of these data marts as constituting a data warehouse. However, they are distinct structures that have duplicated data in order to store it either in a different format or on a different platform, either a different database or just different hardware. The star schema model doesn't have the goal of storing data once for multiple uses. Its goal is to facilitate end-user usage. When the business community has a new request, a new star schema is typically built to facilitate the request.

For example, when a business user wants to know the monthly sales of products, the report produced would have headings such as Item, January Sales, February Sales, March Sales, etc. The normalized Sales table would consist of columns of Item, Date, and Sales. This table would provide the greatest flexibility, allowing for the next questions. These questions might include what are sales per week for specific items, what were fiscal month sales, what items sell on Mondays, or what are weekend sales. However, because it is often difficult for users or even database software to make the conversion from the table structure to the report layout, in the dimensional model, DBAs simply store the table in the denormalized report format (see Figure 4).

Normalized format:

Item    Date      Sales
100012  01102001  10.00
300012  02122001   3.00
200012  01152001   2.50
100012  03042001  15.00

Denormalized (report) format:

Item    Jan Sales  Feb Sales  Mar Sales  Apr Sales
100012     345.00     450.00     326.50     245.90
200012     456.60     376.50     210.00     390.00
300012     254.00     112.00     310.00     295.00
400012     510.00     610.00     590.00     545.00

Figure 4. Example table in normalized and denormalized format

In an effort to maintain conformity, the star schema is built with constrained dimensions. Ralph Kimball explains that data are placed in a staging area for transformation. These data, using what he refers to as the data warehouse bus architecture, are then propagated out to all the different data marts that require it, forcing all dimensions to conform to what is established in the staging area. This bus architecture is the roadwork that connects all the different data marts and servers being used. Kimball states, "It is acceptable to create a normalized database to support the staging processes; however, this is not the end goal. The normalized structures must be off-limits to user queries because they defeat understandability and performance. As soon as a database supports query and presentation services, it must be considered part of the data warehouse presentation area. By default, normalized databases are excluded from the presentation area, which should be strictly dimensionally structured."

Contrasting Ralph Kimball's belief that data warehouses are best suited for a star schema, Bill Inmon firmly believes the EDW has at its heart a normalized model, reserving the use of star schema models for data marts if and when necessary. Inmon states, "In short, simply doing dimensional modeling as a basis for data warehouse design leads down a dark path when multiple star joins are considered. It is never apparent that there is a problem with star joins when you are looking at just one star join. But when you look at multiple star joins, the limitations of dimensional modeling become apparent. Does this mean that dimensional modeling is invalid as a database design technique for data warehousing? The answer is not at all. Dimensional modeling and star joins fit the bill very nicely for data marts. In fact, if I had to design a data mart tomorrow, I would not consider using any other approach. But, when it comes to the foundation data, it's another story. The foundation data, the data warehouse, requires a different treatment than dimensional modeling altogether. The data warehouse, which is the proper foundation for all DSS activity, including star joins, requires very granular, very flexible data. The ideal structure for the data warehouse is normalized data. The normalized data can be bent and shaped any old way.

"The fact table is connected to the dimension tables by means of foreign key relationships. The keys to the dimension tables sit in the fact table. There may be many relationships that find their way into the fact table. In addition, the fact table carries with it other non-key data if needed. In other words, the dimension table may share with the fact table data other than just the key. When the fact table and the dimension tables are connected, they form a star, hence the name star join. The net effect of the fact table is a structure which is highly denormalized and which is very efficient to access. Once the fact table is created, it can be efficiently analysed. But there is a tradeoff. The extreme efficiency of the star join also makes it inflexible. If there is a desire to see the data in the star join in a manner other than that for which the structure is built, then the star join is very inflexible."8

Impact of Information Delivery Tools

Denormalized models, specifically star schemas, gained general acceptance on the basis of the ease provided for business users to directly access data. After gathering the detailed business requirements, the developers often determined that to mitigate risk and provide the business users what they requested, a denormalized model would be the easiest to implement. Since the data warehouse was updated or refreshed on a weekly or even monthly basis, the update schedule mitigated issues surrounding the complexity of updates to maintain simple data models. Business users were expected to know, understand, and code simple SQL for retrieving information, allowing IT to turn the data warehouse over to the business users and only maintain responsibility for keeping it updated. When the business began coming up with additional questions that the data model couldn't address, and joins between star schemas became complex, information delivery tools were developed to help business users retrieve the information they needed through an easy interface.

While many of these information delivery tools have grown up expecting this star data model, many are growing in sophistication and technologically developing to be able to recognize and use more normalized data models, making it easier for end users to navigate a more complex schema.

Impact of Extraction, Transformation, and Loading (ETL) Tools

While information delivery tools were developed to assist the business user in retrieving information from the data warehouse, ETL tools were developed to assist in extracting data from transactional systems. Since the ETL tools are meant to extract data from the transactional systems, and those are in normalized format, ETL tools work better with the normalized data model.

If the data warehouse is built with a denormalized data model, then most ETL tools require additional steps to land the data in a separate area. While the ETL tools work well with a normalized model, moving those data into a denormalized model requires manipulation or transformation. The suggested way of handling this manipulation or transformation is the creation of a data staging area. This staging area is used to land the extracted data, then manipulate or transform them to the format needed before loading. This staging area also provides the platform from where data, in particular dimensions, can be duplicated out to the different star schema data marts.

When loading to a normalized model, the need to transform the data for modeling purposes is significantly less and can actually happen during the load process, eliminating the need for a staging area. In a normalized model, those using a staging area are typically using it to correct data quality issues that exist in the transactional systems, for example, ensuring character fields follow the same set of rules. If an address contains the word street, then all formats of street are transformed to be the same; for example, ST becomes Street and St. becomes Street.
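A toy sketch of the kind of character-field standardization described above; the rule table and sample addresses are invented for illustration:

```python
# Hypothetical suffix-standardization rule of the kind applied in a staging
# area (or during the load itself, for a normalized target).
SUFFIXES = {"ST": "Street", "ST.": "Street", "STREET": "Street"}

def standardize_address(addr: str) -> str:
    # Replace a recognized street-type suffix with the standard spelling.
    words = addr.split()
    if words and words[-1].upper() in SUFFIXES:
        words[-1] = SUFFIXES[words[-1].upper()]
    return " ".join(words)

print(standardize_address("12 Main ST"))    # 12 Main Street
print(standardize_address("4 Elm St."))     # 4 Elm Street
print(standardize_address("7 Oak Street"))  # 7 Oak Street
```

In practice such rules live in the ETL tool's transformation layer; the point here is only that a normalized target lets this cleanup run once, during the load, rather than per data mart.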
Data Model Overview
Surrogate Keys
within the table minimal. Another
surrogate key, a process to link the values
In general, a logical key comprises the data
argument is that when the natural key is
the user does know with the surrogate
fields whose value ensures uniqueness for
large or consists of multiple columns, a
value must be established to retrieve
each row, and whose value will not change
surrogate key would be smaller and require
information. Typically, this requires
during the life of the row. Implemented
less space as it is populated throughout the
additional secondary indexes. Using
physically, it is a natural key. Often, the
data model. To facilitate access, a process is
surrogate keys tends to add both a column
terms key and index are used interchange-
developed that will lead to the surrogate
and a unique index to each table, some-
ably, but they are not the same. An index
key for joins and retrieval of information, and is a mechanism used to optimize performance. The physical implementation differs depending on the database management system employed.

Surrogate Keys

A surrogate key is an artificial key, or generated key, that is used as a primary key in substitution for natural keys. Using a surrogate key is a choice rather than a requirement. Typically, they are numerical and randomly generated to uniquely identify a row. The use of a surrogate key is often promoted on the basis of adding flexibility and data independence to the data model. In business, change is an ongoing fact of being in business. Businesses are involved in mergers and acquisitions; new products and channels are added; they are reorganized; and they enter into new markets. There are a couple of arguments for surrogate keys. One is that as business changes, so do the values, and potentially, the format of natural keys. When the natural key is used as the primary key, changing either the value or the format of that key is an intrusive effort, which could result in loss of historical information and data relationships. When a surrogate key is used as the primary key, the impact of changing the value of the natural key is the same as changing the value of any column in the table.

The data warehouse is subject oriented, which contrasts with the application or function orientation of transactional environments. Integrating and reorganizing multiple systems that represent the same subject, such as customer, can require the integration of data that does not occur naturally in the operations environment. In a data warehouse, which brings together data, often disparate data, from the transactional environment, surrogate keys may provide a viable methodology to do so. A surrogate key will uniquely identify each instance of that integrated subject, each customer, without worry of duplication in the future as the business continues to change. This provides a stable data model that will withstand the test of time. However, beware not to use surrogates as replacements for good data modeling practices.

While surrogate keys are frequently associated with star schemas, their use is not reserved for specific data models. The decision to use or not use a surrogate key should be part of the data model design considerations, and it should be predicated on what the data modeler believes to be best for the organization. Will there be requirements for the natural key to change? Is there a need to ensure that every row is unique? Will there be a resulting impact from adding secondary indexes?

Before committing to surrogate keys, you must consider some things. To generate the surrogate key, a process must be put in place to ensure uniqueness and consistency. Updating and inserting new rows require first the creation of the surrogate key and also a lookup process to ensure the newly generated key hasn't been used. Since the user doesn't know the surrogate key, retrieval often requires multiple secondary indexes, and in some cases, entirely new tables. Surrogate key values also must be propagated to all the dependent tables as foreign keys.

For example, if the natural key for an order table is order number, and a business user wanted to know how many units of product X were ordered by company ABC, chances are he wouldn't know the order number. But would the business user know the product number and/or the company name? When a surrogate key is used as the primary key, order number, product, and ordering company would probably each be made secondary indexes. When the natural key of order number is used as the primary key, product and ordering company would probably each be made secondary indexes.
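As a sketch of this trade-off, the following uses Python's sqlite3 with hypothetical table and column names; SQLite's AUTOINCREMENT stands in, purely for illustration, for a surrogate-key generator such as Teradata's Identity column:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Surrogate-key variant: an artificial, system-generated integer stands in
# for the natural key (order number), which is demoted to an ordinary column.
cur.execute("""
    CREATE TABLE orders (
        order_sk     INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
        order_number TEXT NOT NULL,                      -- natural key, demoted
        product      TEXT NOT NULL,
        company      TEXT NOT NULL,
        units        INTEGER NOT NULL
    )
""")

# The natural key, and the columns business users actually query on,
# each become secondary indexes.
cur.execute("CREATE UNIQUE INDEX ix_order_number ON orders(order_number)")
cur.execute("CREATE INDEX ix_product ON orders(product)")
cur.execute("CREATE INDEX ix_company ON orders(company)")

cur.executemany(
    "INSERT INTO orders (order_number, product, company, units) VALUES (?, ?, ?, ?)",
    [("ORD-1", "X", "ABC", 5), ("ORD-2", "X", "ABC", 3), ("ORD-3", "Y", "ABC", 7)],
)

# The business user's question needs no knowledge of order numbers or
# surrogate values; only product and company are referenced.
cur.execute(
    "SELECT SUM(units) FROM orders WHERE product = ? AND company = ?", ("X", "ABC")
)
print(cur.fetchone()[0])  # prints 8
```

The query answers the business question through the secondary indexes alone, which is the point of the example above: the surrogate key is invisible to the business user either way.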
SCD Type One (update in place):

SKU    Catg
1001   01
1002   02
1003   03

SCD Type Two (insert new row with effective date):

SKU    Catg   Updated
1001   03     20021029
1001   01     20010101
1002   02     20010101
1003   01     20020915
1003   03     20010101

SCD Type Three (current value plus previous value column):

SKU    Catg   Prev Catg
1001   03     01
1001   01     15
1001   15     20
1002   02     02
1003   01     03
1003   03     12
1003   12     05
1003   05     03

Figure 5. Methods to address SCD.
Outside of the maintenance issues of the surrogate key, the data modeler must determine when it is wise to use either a surrogate key or a natural key. For example, while order number may make sense to use as the primary key, in the case of a customer table, it may make sense to use a surrogate key for customer id. The decision to use a surrogate key should be evaluated on a case-by-case basis and not entered into as a standard.

Changing Dimensions

A changing dimension is often referred to as a Slowly Changing Dimension (SCD). The impact of time on the data warehouse can be significant. Time has a tendency to generate many changes. Customers move and their addresses change; people change their names. Area demographics change, populations shift. Products move from one category to another. These are changing dimensions. Time impacts all data models, and dimensions will change over time. However, a changing dimension is only a problem when the model is built on dimensions. Typically, there are three methods used to address SCDs (see Figure 5).

> Type one updates in place the dimension value as it changes. This provides a dimension that is always current, eliminating the capture of history because the association of the data with the old dimension value has been overlaid.

> Type two inserts a new row into the table as the dimension changes. This method provides both current and history information of the changing dimension, and usually involves carrying an effective date to indicate which row is current.

> Type three updates the dimension in place after moving the changed dimension to an old column. This method provides for the current and most recent changes.

A normalized model is typically developed as type two, developing a history as changed rows are inserted into the table. If the business application requires only the current value, a view could be used to present to the application current values only. Those applications that need history, or the ability to identify values during a specific time frame, would then have access to all the history data.

While the normalized model provides for the type two SCD, the construct of the dimensional model increases the challenge of providing for changing dimensions. Included in this challenge is the need to determine how to reflect the changing dimension. Which type satisfies the business need for the dimension? Does it satisfy all the business needs? Which method satisfies the needs of all the data marts in the dimensional model?
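The three methods can be sketched side by side with Python's sqlite3 (an illustrative stand-in for any SQL database; table and column names follow Figure 5 and are hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Type one: update in place. History is overlaid and lost.
cur.execute("CREATE TABLE dim_t1 (sku TEXT PRIMARY KEY, catg TEXT)")
cur.execute("INSERT INTO dim_t1 VALUES ('1001', '01')")
cur.execute("UPDATE dim_t1 SET catg = '03' WHERE sku = '1001'")

# Type two: insert a new row per change, carrying an effective date so
# current and historical values coexist in the same table.
cur.execute("CREATE TABLE dim_t2 (sku TEXT, catg TEXT, updated TEXT)")
cur.execute("INSERT INTO dim_t2 VALUES ('1001', '01', '20010101')")
cur.execute("INSERT INTO dim_t2 VALUES ('1001', '03', '20021029')")

# Type three: move the old value to a 'previous' column, then update in
# place. Only the current and most recent prior values survive.
cur.execute("CREATE TABLE dim_t3 (sku TEXT PRIMARY KEY, catg TEXT, prev_catg TEXT)")
cur.execute("INSERT INTO dim_t3 VALUES ('1001', '01', NULL)")
cur.execute("UPDATE dim_t3 SET prev_catg = catg, catg = '03' WHERE sku = '1001'")

# Only type two retains the full history: two rows for SKU 1001.
print(cur.execute("SELECT COUNT(*) FROM dim_t2 WHERE sku = '1001'").fetchone()[0])  # prints 2
```

The type two table is the shape a normalized model naturally takes, which is why the paper treats it as the default for the data warehouse.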
The Teradata Direction

Identifying the most appropriate choice of data model for a Teradata solution is a two-step process. First, you must distinguish between what Teradata Database lends itself to and what Teradata professionals advocate. Teradata Database is impartial to the data model. The database has the technical ability and the power to perform with any well-designed data model. Teradata's ability to parallelize every aspect of query processing eliminates the technical issues and complexity of decision support processing of large volumes of data, including the multi-table joins and joins within joins that are often the catalyst for denormalized data models on some technologies. Teradata engineering is always reviewing and identifying areas within the optimizer to improve performance, whether the data model is normalized or the denormalized star schema. Join efficiency improvements, such as large table/small table joins in the early 1990s, were targeted at star schemas. Advanced indexing features, the join index introduced in Teradata Database V2R3 and the aggregate join index introduced in Teradata Database V2R4, provide a methodology for users of Teradata Database to have a normalized model and use these features to create star schemas if and where processing merits it. Teradata Database V2R5 has made generating surrogate keys easier with the introduction of the Identity column. Optimizer learning and intelligence occurs without regard to data model.

One of the key factors that sets the Teradata Database apart from all others is the ability to use a normalized data model that facilitates ease of management and ease of expansion, simplifies load strategies, and allows for full integration of enterprise data. This facilitates building for the enterprise. Addressing the needs of the individual becomes as easy as applying a view to the underlying data structure. This view then allows IT and the data warehouse to address the needs of the individual.

Years of field experience have shown Teradata professionals that a normalized model of atomic data provides the greatest flexibility and, therefore, often the greatest long-term business benefit. A well integrated data warehouse is a valuable mechanism in business operations. As businesses use the data warehouse to uncover patterns in their manufacturing processes, their ordering processes, or in customer purchasing habits, business processes change. The business expects the data warehouse to enable and reflect those business changes. If business changes require an additional column (dimension) in one of the fact tables, or a product in one category is transferred to another category, the business expects the data warehouse to reflect these changes based on a business timeline. If that timeline is negatively impacted by the time required to address data model changes that enable and reflect the business changes, the data warehouse becomes an obstacle to the business's success. Customer and field experience repeatedly shows that a normalized model provides the most flexibility and ease in delivering new products to market and in facilitating rapid business changes. So Teradata professionals often suggest starting with a normalized model because of its flexibility and the ease of adding new data elements. However, this does not mean that the data warehouse follows all the rules of third normal form in the strictest sense. Based on business requirements, summary tables are often found useful to augment the normalized model. The goal is to provide a model that supports the business through all of the changes that occur over time.

Views are a transparent layer on top of the underlying data structure and provide a methodology for creating a presentation layer that eases business user access. In Teradata Database, views don't require space and will transition the underlying data model into nearly any presentation with such efficiency that the cost of using views is reduced to simply managing them.

Impact of the Data Warehousing Evolution

As data warehousing evolves to an active environment, where frequent and/or continual data updates are required, where processing becomes event driven rather than report driven, and where the need to address changes in the marketplace is expected with increasing speed, the data model becomes even more critical. The mixed workload of shorter tactical queries with longer strategic queries against the same data creates additional denormalization challenges.
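A minimal sketch of this presentation-layer idea, using Python's sqlite3 and hypothetical table names: the base tables stay normalized (with a type two history), while views expose current values and a flat, star-like shape at no storage cost:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
    -- Normalized base tables (names are illustrative only).
    CREATE TABLE product  (sku TEXT PRIMARY KEY, descr TEXT);
    CREATE TABLE category (sku TEXT, catg TEXT, eff_date TEXT);  -- type two history
    CREATE TABLE sales    (sku TEXT, sale_date TEXT, units INTEGER);

    INSERT INTO product  VALUES ('1001', 'Widget');
    INSERT INTO category VALUES ('1001', '01', '20010101'),
                                ('1001', '03', '20021029');
    INSERT INTO sales    VALUES ('1001', '20030110', 4);

    -- A view presenting only the current category row: applications that
    -- need just current values never see the history.
    CREATE VIEW current_category AS
        SELECT sku, catg
        FROM category c
        WHERE eff_date = (SELECT MAX(eff_date) FROM category WHERE sku = c.sku);

    -- A denormalized, star-like presentation layer built entirely from
    -- the normalized model; the views themselves cost no storage.
    CREATE VIEW sales_star AS
        SELECT s.sale_date, s.units, p.sku, p.descr, cc.catg
        FROM sales s
        JOIN product p           ON p.sku = s.sku
        JOIN current_category cc ON cc.sku = s.sku;
""")
print(cur.execute("SELECT descr, catg, units FROM sales_star").fetchall())
# prints [('Widget', '03', 4)]
```

If the presentation needs change, only the view definitions change; the normalized base tables and their load processes are untouched.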
Evolving to an active data warehouse eliminates many of the known variables, such as what queries will be submitted and who will submit them, that were used in defining the data models of early data warehouses. To accommodate this evolution, some of Teradata's competitors' technologies are beginning to add functions and features that enable a more normalized model. Tools such as information delivery tools are also preparing for this evolution. As the data warehouse evolves to an active environment, the requirement for fresh data to be available and accessible erodes the time available to maintain denormalized models.

For example, one Teradata customer had their business users request a new application from IT. The applications group said that it would take six months just to find the data; once it was found, they would then have to determine the time required to generate the application. The business users left the meeting believing they would be unable to resolve their issues in a timely manner, time that had a direct impact on the success of the business. After determining the data was available in the data warehouse, in a normalized format, the data warehouse group was able to generate a presentation layer that resolved the business needs within a couple of hours. The ability to do this was valuable to the business bottom line.

Summary

The goal of a data warehouse is to provide the business with a tool that enables it to make better business decisions and to take timely action on those decisions. The robustness of the data infrastructure determines the long-term success of the data warehouse and the ability of the business to harness the information for its own success. The robustness of the Teradata Database provides the ability to combine the best of all worlds. The DBA can generate a DBA-friendly normalized data model that facilitates the full integration of data representing the enterprise, following the teachings of Codd, Date, and Inmon. The use of views, a presentation layer that employs the user friendliness of dimensional modeling, enables catering to the needs of the individual, interfacing with the data warehouse for business user ease of access, capitalizing on Codd's guidelines for relational databases, and following the teachings of Kimball. This allows data to be centrally stored, eliminates redundant data (easing data management), and provides a methodology to support future business needs by capitalizing on existing data structures and shortening time to market.

The data model chosen for the data warehouse should be chosen with a clear understanding of the benefits of each approach. The early days of data warehousing tended to depend on a denormalized model. Much of that decision was based on addressing specific business needs rather than the breadth of the business, or on existing technology that limited choices. During those early days, when Teradata professionals recommended a normalized model, it was considered unusual. As data warehousing evolves to being active, and as the need and requirement for businesses to change increase, the idea of a normalized data model for the data warehouse no longer seems so radical. In fact, many of the leading analysts and data warehouse professionals understand the value of a normalized model and are becoming more critical of denormalized models.

The data model for the data warehouse should reflect the requirements of the business, and it should enable the business. Taking a hard line with any specific data model type won't satisfy the business needs. However, the impact of time generates change within the business, and the data warehouse requires the flexibility to address and yield to those changes. A normalized data model is the correct place to begin.

Debbie Smith is a Data Warehouse Consultant in Global Sales Support. She has 14 years of experience with Teradata systems at a retail customer. Her responsibilities while with the retail customer included application development/support, database administrator, business power user, and responsibility for developing, implementing, and managing an Information Delivery strategy for business end users. Since joining Teradata, Debbie has worked extensively with prospects as well as new and mature data warehouse customers to implement effective tactical and strategic data warehousing initiatives.
Appendix A: Codd's 12 Rules

Rule 1: The Information Rule
All data should be presented to the user in table form.

Rule 2: Guaranteed Access Rule
All data should be accessible without ambiguity. This can be accomplished through a combination of the table name, primary key, and column name.

Rule 3: Systematic Treatment of Null Values
A field should be allowed to remain empty. This involves the support of a null value, which is distinct from an empty string or a number with a value of zero. Of course, this can't apply to primary keys. In addition, most database implementations support the concept of a non-null field constraint that prevents null values in a specific table column.

Rule 4: Dynamic On-Line Catalog Based on the Relational Model
A relational database must provide access to its structure through the same tools that are used to access the data. This is usually accomplished by storing the structure definition within special system tables.

Rule 5: Comprehensive Data Sublanguage Rule
The database must support at least one clearly defined language that includes functionality for data definition, data manipulation, data integrity, and database transaction control. All commercial relational databases use forms of standard SQL (Structured Query Language) as their supported comprehensive language.

Rule 6: View Updating Rule
Data can be presented to the user in different logical combinations, called views. Each view should support the same full range of data manipulation that direct access to a table has available. In practice, providing update and delete access to logical views is difficult and is not fully supported by any current database.

Rule 7: High-level Insert, Update, and Delete
Data can be retrieved from a relational database in sets constructed of data from multiple rows and/or multiple tables. This rule states that insert, update, and delete operations should be supported for any retrievable set rather than just for a single row in a single table.

Rule 8: Physical Data Independence
The user is isolated from the physical method of storing and retrieving information from the database. Changes can be made to the underlying architecture (hardware, disk storage methods) without affecting how the user accesses it.

Rule 9: Logical Data Independence
How a user views data should not change when the logical structure (table structure) of the database changes. This rule is particularly difficult to satisfy. Most databases rely on strong ties between the user view of the data and the actual structure of the underlying tables.

Rule 10: Integrity Independence
The database language (like SQL) should support constraints on user input that maintain database integrity. This rule is not fully implemented by most major vendors. At a minimum, all databases do preserve two constraints through SQL: no component of a primary key can have a null value (see Rule 3), and if a foreign key is defined in one table, any value in it must exist as a primary key in another table.

Rule 11: Distribution Independence
A user should be totally unaware of whether or not the database is distributed (whether parts of the database exist in multiple locations). A variety of reasons make this rule difficult to implement.

Rule 12: Nonsubversion Rule
There should be no way to modify the database structure other than through the multiple row database language (like SQL). Most databases today support administrative tools that allow some direct manipulation of the data structure.

(From: ITWorld.com, Codd's 12 Rules, Data Management Strategies, 05-07-2001.)
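Rule 3's distinction between a null, an empty string, and zero can be demonstrated in any SQL database; here is a small sketch using Python's sqlite3, with a hypothetical table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# NULL is a distinct marker for "no value": it is not an empty string and
# it is not zero, and it only matches the IS NULL predicate.
cur.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, name TEXT, qty INTEGER)")
cur.execute("INSERT INTO t VALUES (1, NULL, NULL), (2, '', 0)")

print(cur.execute("SELECT COUNT(*) FROM t WHERE name IS NULL").fetchone()[0])  # prints 1
print(cur.execute("SELECT COUNT(*) FROM t WHERE name = ''").fetchone()[0])     # prints 1
print(cur.execute("SELECT COUNT(*) FROM t WHERE qty = 0").fetchone()[0])       # prints 1; NULL <> 0
```

Row 1 never satisfies the `= ''` or `= 0` comparisons, illustrating why nulls require their own systematic treatment.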
Appendix B: A Comparison

Active Data Warehousing
Normalized Model: Supports active data warehousing initiatives.
Denormalized (Star) Model: Challenges the ability to be active.

Any Question
Normalized Model: The flexibility of the normalized model provides for any question to be answered.
Denormalized (Star) Model: The simplicity of the denormalized model limits what questions can be answered without changes.

Complex Analysis
Normalized Model: A normalized model supports complex analysis.
Denormalized (Star) Model: The enforced simplicity of a denormalized model is unable to support complex analysis.

Data Duplication
Normalized Model: The rules of normalization require a single version of data.
Denormalized (Star) Model: Data is often duplicated multiple times to satisfy different groups of questions.

Data Granularity
Normalized Model: The normalized model easily stores the lowest level of data.
Denormalized (Star) Model: To maintain ease of navigation, the denormalized model is typically aggregated along one of the dimension lines.

Data Navigation
Normalized Model: The normalized model is more difficult to navigate. Business users have a difficult time at best in understanding the model, the necessary joins, and sometimes even the relationships between data elements.
Denormalized (Star) Model: The denormalized model is specifically designed for easy navigation by the business user.

Flexibility
Normalized Model: The greater the degree of normalization in the data model, the greater the flexibility.
Denormalized (Star) Model: The denormalized model is meant to be simplistic and is designed for specific requirements. As requirements change, changes to the model are needed.

Maintenance
Normalized Model: Updates to the normalized model are easier because a piece of information is in one place and requires one update. The normalized model is the standard for the transactional environment. Continuing that model throughout the data warehouse provides a closer link to and reflection of transactional processes.
Denormalized (Star) Model: Updating the denormalized model requires preprocessing, aggregation, and longer time frames.
Appendix C: Glossary

Attribute
Data that is attributed to or describes an entity.

Cube
A way to visualize how a dimensional model is organized. It is also a type of Dimensional Model design where the measures and dimensions are stored in a format that is expected by an OLAP tool (many times a proprietary format). One cube may represent one Fact table with its contributing Dimensions.

Data Mart (physical)
A segment of data that is aggregated or preprocessed to enable specific end-user queries and requirements.

Data Mart (logical)
The data is stored in one place, but Views are used to present to the end user a portion or derivation of the underlying data for their purposes. Data is not replicated.

Data Warehouse
A reflection of integrated data that is organized by subject areas and is stored to address cross-functional and multiple end-user requirements and queries.

Dimension
The corporate line along which facts are measured (time, product, geography).

Dimensional Model
A design method that models the data based on a predefined set of business questions to facilitate ease of access and response time speed.

Drill (Across)
Movement of an end-user request across different dimensions.

Drill (Up or Down)
Movement of an end-user request through data along a dimension.

Entity
An identifiable object that is distinguishable from all other objects.

Entity Relationship Model
Shows the enterprise entities (or objects) and the relationships between the entities in that enterprise, without duplication.

ETL
Terminology given to a group of tools that extract data, transform it, and then load it. Sometimes the sequence of processing changes to reflect ELT, which is extract, load, and then transform.

Fact
A measurement or value (quantity).

Flattened Model
Made up of rows that contain the data elements of all the column headings of a business report, so that nothing has to be calculated to generate portions of that report.

Foreign Key
Columns in a table that reflect primary keys of other entities. Determines the relationship between entities.

Hierarchy
The arrangement or ranking of entities into a graded series.

Intelligent Key
A key that is made up of, or consists of, values that reflect a business requirement or use.

Natural Key
The data fields whose value ensures uniqueness for each row, and whose value will not change during the life of the row.

Normalized Model
Model of entities designed to remove data redundancy; the higher the form (2NF, 3NF, 4NF, etc.), the less redundancy.

OLTP
Online transaction processing. Consists of a set of instructions that serially, mechanically, and repetitively processes many thousands of transactions through a predefined access path that ultimately updates the underlying data store. OLTP transactions are used by a transactional system that is focused on the acquisition of data. OLTP transactions normally have strict service level agreements (SLAs) attached to them requiring sub-second response time. OLTP concentrates on data updates and/or inserts.
Primary Key
Unique identifier for rows in a table.

Schema
Structured framework that, for purposes of this paper, provides and represents relationships between entities.

Service Level Agreement (SLA)
An agreement between IT and the business that pertains to a performance aspect of a process. The agreement could be based on availability, enforcing that data is loaded by a specific timeframe, or it could be based on retrieval processing, enforcing a processing time for specific queries or types of queries.

Slowly Changing Dimensions (SCD)
The impact of time on a dimension and that dimension's changes (i.e., customer addresses).

Snowflake Schema
Made up of multiple Star Schemas, without the duplication of dimension tables, that can be joined together.

Star Schema
Individual fact table surrounded by dimensional tables to which it will join.

Surrogate Key
Artificial key used as a substitute for natural data keys (i.e., customer id).

Appendix D: Endnotes

1. Dr. E. F. Codd, Codd's 12 Rules, Data Management Strategies, ITWorld.com, 05-07-2001.
2. William Inmon, Enterprise Data Warehouse, http://www.billinmon.com/library/articles/introedw.asp.
3. Ralph Kimball and Margy Ross, The Data Warehouse Toolkit, Second Edition, Wiley Computer Publishing, New York, 2002, pp. 11-12.
4. Neil Raden, edited by Ralph Kimball, Intelligent Enterprise, Data Warehouse Designer, Real Time: Get Real, Part II, www.intelligententerprise.com, June 30, 2003.
5. Steve Hoberman, Data Modeler's Workbench, Tools and Techniques for Analysis and Design, John Wiley & Sons, New York, 2002.
6. Ralph Kimball and Margy Ross, The Data Warehouse Toolkit, Second Edition, Wiley Computer Publishing, 2002, page 9.
7. William Inmon, The Problem with Dimensional Modeling, http://www.billinmon.com/library/articles/artdimmd.asp.
8. William Inmon, Star Joins, http://www.billinmon.com/library/articles/starjoin.asp.

Appendix E: References

Ralph Kimball, The Data Warehouse Toolkit: Practical Techniques for Building Dimensional Data Warehouses, Solutions from the Expert, John Wiley & Sons, New York, 1996.

Daniel L. Moody, Department of Information Systems, University of Melbourne, and Mark Kortink, Simsion Bowles & Associates, From Enterprise Models to Dimensional Models: A Methodology for Data Warehouse and Data Mart Design, 2000.

William Inmon/Peter Meers, The Dilemma of Change: Managing Changes over Time in the Data Warehouse/DSS Environment, white paper on http://www.billinmon.com/library/whiteprs/Dilemma of Change.pdf, March 2001.

AllFusion is a trademark and ERwin is a registered trademark of Computer Associates International, Inc. Teradata continually improves products as new technologies and components become available. Teradata, therefore, reserves the right to change specifications without prior notice. All features, functions, and operations described herein may not be marketed in all parts of the world. Consult your Teradata representative or Teradata.com for more information.

Copyright 2004-2007 by Teradata Corporation
All Rights Reserved.
Produced in U.S.A.