Dimensional
Modelling
Important Concepts
DATA ENGINEERING - DIMENSIONAL MODELLING
1
DIMENSIONAL MODELING
A data modeling technique optimized for
data warehousing and decision support
systems, focusing on ease of querying.
Using star and snowflake schemas to
organize data in a data warehouse.
Shwetank Singh
GritSetGrow - [Link]
DATA ENGINEERING - DIMENSIONAL MODELLING
2
STAR SCHEMA
A type of database schema that consists
of one or more fact tables referencing any
number of dimension tables.
A sales database where a fact table
contains sales data, and dimension tables
include information about products,
customers, and time.
Shwetank Singh
GritSetGrow - [Link]
DATA ENGINEERING - DIMENSIONAL MODELLING
3
SNOWFLAKE SCHEMA
A more complex version of the star
schema where dimension tables are
normalized into multiple related tables.
A sales database where dimension tables
such as products and customers are
further split into related tables like product
categories and customer demographics.
Shwetank Singh
GritSetGrow - [Link]
DATA ENGINEERING - DIMENSIONAL MODELLING
4
FACT TABLE
Central table in a star schema that
contains quantitative data for analysis.
A sales fact table that includes measures
like sales revenue, quantity sold, and
discount applied.
Shwetank Singh
GritSetGrow - [Link]
DATA ENGINEERING - DIMENSIONAL MODELLING
5
DIMENSION TABLE
Tables that contain descriptive attributes
(dimensions) related to the facts.
A product dimension table that includes
product names, categories, and
descriptions.
Shwetank Singh
GritSetGrow - [Link]
DATA ENGINEERING - DIMENSIONAL MODELLING
6
GRAIN
The level of detail or granularity of the
data stored in a fact table.
Daily sales data versus monthly sales
data in a fact table.
Shwetank Singh
GritSetGrow - [Link]
DATA ENGINEERING - DIMENSIONAL MODELLING
7
SURROGATE KEY
A unique identifier for each row in a
dimension table, not derived from
application data.
An auto-incremented integer used as the
primary key in a customer dimension
table.
Shwetank Singh
GritSetGrow - [Link]
DATA ENGINEERING - DIMENSIONAL MODELLING
8
SLOWLY CHANGING DIMENSIONS
(SCD)
Techniques for managing and tracking
changes in dimension table attributes
over time.
Type 1 SCD updates data directly, Type 2
SCD adds new rows, and Type 3 SCD adds
new columns to track changes.
Shwetank Singh
GritSetGrow - [Link]
DATA ENGINEERING - DIMENSIONAL MODELLING
9
TYPE 1 SCD
Overwrites old data with new data in a
dimension table.
Updating a customer's address directly in
the customer dimension table.
Shwetank Singh
GritSetGrow - [Link]
DATA ENGINEERING - DIMENSIONAL MODELLING
10
TYPE 2 SCD
Creates a new record with a new
surrogate key when a change occurs in
the dimension data.
Adding a new row for a customer who has
moved to a new address, retaining the
history of the old address.
Shwetank Singh
GritSetGrow - [Link]
DATA ENGINEERING - DIMENSIONAL MODELLING
11
TYPE 3 SCD
Adds new columns to a dimension table
to track changes over time.
Adding columns for "previous address"
and "current address" in a customer
dimension table.
Shwetank Singh
GritSetGrow - [Link]
DATA ENGINEERING - DIMENSIONAL MODELLING
12
CONFORMED DIMENSIONS
Dimensions that are shared across
multiple fact tables and/or data marts.
A date dimension table used across sales,
inventory, and finance data marts.
Shwetank Singh
GritSetGrow - [Link]
DATA ENGINEERING - DIMENSIONAL MODELLING
13
JUNK DIMENSION
Combines low-cardinality flags and
indicators into a single dimension table.
Combining boolean flags like
"is_promotional" and "is_returned" into a
single junk dimension table.
Shwetank Singh
GritSetGrow - [Link]
DATA ENGINEERING - DIMENSIONAL MODELLING
14
ROLE-PLAYING DIMENSIONS
A single physical dimension table used in
different contexts within the schema.
A date dimension table used for "order
date," "ship date," and "delivery date" in a
sales schema.
Shwetank Singh
GritSetGrow - [Link]
DATA ENGINEERING - DIMENSIONAL MODELLING
15
FACTLESS FACT TABLE
A fact table that captures events or
conditions with no associated numeric
facts.
A table capturing student attendance
records without any numerical measures.
Shwetank Singh
GritSetGrow - [Link]
DATA ENGINEERING - DIMENSIONAL MODELLING
16
AGGREGATE FACT TABLE
A summarized fact table that improves
query performance by reducing the
amount of data processed.
A monthly sales summary table that
aggregates daily sales data.
Shwetank Singh
GritSetGrow - [Link]
DATA ENGINEERING - DIMENSIONAL MODELLING
17
BRIDGE TABLE
A table used to handle many-to-many
relationships between fact and dimension
tables.
A table linking customers to multiple sales
regions they belong to.
Shwetank Singh
GritSetGrow - [Link]
DATA ENGINEERING - DIMENSIONAL MODELLING
18
DEGENERATE DIMENSION
A dimension attribute stored in the fact
table itself.
An order number stored directly in the
sales fact table without a separate
dimension table.
Shwetank Singh
GritSetGrow - [Link]
DATA ENGINEERING - DIMENSIONAL MODELLING
19
ETL (EXTRACT, TRANSFORM,
LOAD)
The process of extracting data from
source systems, transforming it, and
loading it into a data warehouse.
Extracting sales data from transactional
databases, transforming it to match the
warehouse schema, and loading it into
the data warehouse.
Shwetank Singh
GritSetGrow - [Link]
DATA ENGINEERING - DIMENSIONAL MODELLING
20
BUS ARCHITECTURE
A framework for organizing data marts
and warehouses using conformed
dimensions and facts.
Ensuring that all data marts use the same
date dimension table to provide a
consistent view of time across the
organization.
Shwetank Singh
GritSetGrow - [Link]
DATA ENGINEERING - DIMENSIONAL MODELLING
21
DATA MART
A subset of the data warehouse focused
on a specific business area or
department.
A sales data mart that contains sales-
related data for analysis by the sales
department.
Shwetank Singh
GritSetGrow - [Link]
DATA ENGINEERING - DIMENSIONAL MODELLING
22
GRAIN DECLARATION
The process of defining the granularity of
the fact table during design.
Deciding that the grain of the sales fact
table will be at the daily transaction level.
Shwetank Singh
GritSetGrow - [Link]
DATA ENGINEERING - DIMENSIONAL MODELLING
23
DIMENSIONAL HIERARCHY
A structure that organizes dimension
attributes into levels of granularity.
A time hierarchy with levels for year,
quarter, month, and day.
Shwetank Singh
GritSetGrow - [Link]
DATA ENGINEERING - DIMENSIONAL MODELLING
24
MULTIVALUED DIMENSIONS
Dimensions where attributes can have
multiple values for a single entity.
A customer dimension where a customer
can have multiple phone numbers or
email addresses.
Shwetank Singh
GritSetGrow - [Link]
DATA ENGINEERING - DIMENSIONAL MODELLING
25
OUTRIGGER DIMENSION
A dimension table that is linked to another
dimension table rather than directly to a
fact table.
A product subcategory table linked to a
product category table, which in turn is
linked to the fact table.
Shwetank Singh
GritSetGrow - [Link]
DATA ENGINEERING - DIMENSIONAL MODELLING
26
MINI-DIMENSION
A dimension table that captures rapidly
changing attributes, separated from the
main dimension table.
A mini-dimension for tracking changes in
customer preferences separate from the
main customer dimension table.
Shwetank Singh
GritSetGrow - [Link]
DATA ENGINEERING - DIMENSIONAL MODELLING
27
DATA STEWARDSHIP
The management and oversight of an
organization's data assets to ensure data
quality and governance.
Implementing policies and procedures for
data entry, validation, and maintenance
to ensure accurate and reliable data in
the data warehouse.
Shwetank Singh
GritSetGrow - [Link]
DATA ENGINEERING - DIMENSIONAL MODELLING
28
SURROGATE KEY PIPELINE
The process of assigning surrogate keys to
records as they are loaded into the data
warehouse.
Generating unique integer surrogate keys
for customer records during the ETL
process.
Shwetank Singh
GritSetGrow - [Link]
DATA ENGINEERING - DIMENSIONAL MODELLING
29
SLOWLY CHANGING MEASURE
Measures in a fact table that change over
time and need to be tracked historically.
Tracking historical changes in product
prices in the sales fact table.
Shwetank Singh
GritSetGrow - [Link]
DATA ENGINEERING - DIMENSIONAL MODELLING
30
FACT TABLE GRAIN
The level of detail or granularity of the
data stored in a fact table.
Storing sales data at the transaction level
versus the daily summary level.
Shwetank Singh
GritSetGrow - [Link]
DATA ENGINEERING - DIMENSIONAL MODELLING
31
SNAPSHOT FACT TABLE
Captures the state of a process at a
specific point in time.
A table capturing the month-end
inventory levels for each product.
Shwetank Singh
GritSetGrow - [Link]
DATA ENGINEERING - DIMENSIONAL MODELLING
32
ACCUMULATING SNAPSHOT FACT
TABLE
Tracks the progress of a process over
time, updating records as milestones are
reached.
A table tracking the stages of an order
from placement to shipment and delivery,
with updates to the same record as the
order progresses.
Shwetank Singh
GritSetGrow - [Link]
DATA ENGINEERING - DIMENSIONAL MODELLING
33
SEMI-ADDITIVE MEASURES
Measures that can be summed across
some dimensions but not others.
Bank account balances that can be
summed across time but not across
accounts.
Shwetank Singh
GritSetGrow - [Link]
DATA ENGINEERING - DIMENSIONAL MODELLING
34
NON-ADDITIVE MEASURES
Measures that cannot be summed across
any dimension.
Ratios or percentages, such as profit
margins, that cannot be summed
meaningfully.
Shwetank Singh
GritSetGrow - [Link]
DATA ENGINEERING - DIMENSIONAL MODELLING
35
ETL STAGING AREA
A temporary storage area used during the
ETL process to hold data before it is
transformed and loaded.
Using a staging database to store raw
sales data extracted from transactional
systems before transforming and loading
it into the data warehouse.
Shwetank Singh
GritSetGrow - [Link]
DATA ENGINEERING - DIMENSIONAL MODELLING
36
DATA QUALITY
Ensuring the accuracy, completeness, and
reliability of data in the data warehouse.
Implementing data validation checks
during the ETL process to ensure accurate
and consistent data.
Shwetank Singh
GritSetGrow - [Link]
DATA ENGINEERING - DIMENSIONAL MODELLING
37
DATA LINEAGE
Tracking the origin and transformations of
data as it moves through the data
warehouse.
Maintaining metadata that documents
the source systems, transformations, and
loading processes for each data element
in the warehouse.
Shwetank Singh
GritSetGrow - [Link]
DATA ENGINEERING - DIMENSIONAL MODELLING
38
BUSINESS PROCESS
A series of activities or tasks that produce
a specific outcome, often used as the
basis for defining fact tables.
The order fulfillment process, which
includes order placement, processing,
shipment, and delivery.
Shwetank Singh
GritSetGrow - [Link]
DATA ENGINEERING - DIMENSIONAL MODELLING
39
DRILL-DOWN ANALYSIS
The ability to navigate from summarized
data to more detailed data.
Analyzing sales data by drilling down from
monthly sales totals to daily sales
transactions.
Shwetank Singh
GritSetGrow - [Link]
DATA ENGINEERING - DIMENSIONAL MODELLING
40
DERIVED TABLE
A table created as the result of a query,
often used to simplify complex joins and
calculations.
Creating a derived table to calculate the
average order value for each customer
segment.
Shwetank Singh
GritSetGrow - [Link]
DATA ENGINEERING - DIMENSIONAL MODELLING
41
FACT TABLE AGGREGATION
The process of summarizing detailed data
in a fact table to improve query
performance.
Aggregating daily sales data into monthly
sales summaries to speed up reporting
queries.
Shwetank Singh
GritSetGrow - [Link]
DATA ENGINEERING - DIMENSIONAL MODELLING
42
PERFORMANCE TUNING
Techniques used to optimize the
performance of the data warehouse and
its queries.
Indexing key columns in fact and
dimension tables to improve query
performance.
Shwetank Singh
GritSetGrow - [Link]
DATA ENGINEERING - DIMENSIONAL MODELLING
43
DATA WAREHOUSE
ARCHITECTURE
The overall structure and organization of
the data warehouse, including schemas,
ETL processes, and data storage.
Implementing a three-tier architecture
with staging, integration, and presentation
layers.
Shwetank Singh
GritSetGrow - [Link]
DATA ENGINEERING - DIMENSIONAL MODELLING
44
DIMENSIONAL INTEGRITY
Ensuring consistency and accuracy of
dimensions across the data warehouse.
Implementing referential integrity
constraints to ensure dimension keys in
fact tables match primary keys in
dimension tables.
Shwetank Singh
GritSetGrow - [Link]
DATA ENGINEERING - DIMENSIONAL MODELLING
45
LATE ARRIVING DATA
Data that arrives after the initial load of a
fact table and needs to be integrated into
the existing data.
Handling late arriving sales transactions
that need to be added to a fact table after
the end of the reporting period.
Shwetank Singh
GritSetGrow - [Link]
DATA ENGINEERING - DIMENSIONAL MODELLING
46
DATA WAREHOUSE LIFECYCLE
The stages of development and
maintenance of a data warehouse, from
initial planning to ongoing management.
Following a lifecycle approach that
includes requirements gathering, design,
implementation, testing, and
maintenance phases.
Shwetank Singh
GritSetGrow - [Link]
DATA ENGINEERING - DIMENSIONAL MODELLING
47
OLAP (ONLINE ANALYTICAL
PROCESSING)
A category of software tools that provide
analysis of data stored in a database,
often used in data warehousing.
Using OLAP tools to perform
multidimensional analysis of sales data,
such as slicing, dicing, and pivoting.
Shwetank Singh
GritSetGrow - [Link]
DATA ENGINEERING - DIMENSIONAL MODELLING
48
KIMBALL METHODOLOGY
An approach to data warehousing that
emphasizes the use of dimensional
modeling and incremental development.
Implementing a data warehouse using
the Kimball Methodology, starting with a
data mart and gradually integrating
additional data sources.
Shwetank Singh
GritSetGrow - [Link]
DATA ENGINEERING - DIMENSIONAL MODELLING
49
CORPORATE INFORMATION
FACTORY (CIF)
A framework for enterprise data
warehousing that integrates data from
various sources into a central repository.
Using CIF to integrate data from sales,
finance, and HR systems into a centralized
data warehouse for enterprise-wide
reporting and analysis.
Shwetank Singh
GritSetGrow - [Link]
DATA ENGINEERING - DIMENSIONAL MODELLING
50
DATA VAULT MODELING
A database modeling methodology
designed to provide long-term historical
storage of data from multiple systems.
Implementing a data vault model to store
historical sales data from multiple
transactional systems in a central
repository.
Shwetank Singh
GritSetGrow - [Link]