Fadilat Braimah is a Data Analyst and Research Consultant with
over 3 years of working experience in various industries. Fadilat
has worked as a market analyst in food supply chain startup
and a Data analyst and research consultant in a real estate and
proptech company. Fadilat is interested in helping data
enthusiasts transitioned fully into data analytics.
Twitter: @F_deela
LinkedIn: https://www.linkedin.com/in/fadilat121221/
DATA ANALYTICS CLASS
DAY THREE
Learning Objectives
• Understanding Data Modelling and its necessity
• Components of a Data Model
• Facts and Dimension Table
• Primary and Foreign keys
• Schema Design
• Relationships and Cardinality
Understanding Data Modelling
What is a data modelling?
The process of organizing and structuring data based on relationships
in a way that supports efficient analysis and reporting in Power BI.
Data is organized into tables as either:
➢Fact Table(s) – where the main business events are such a sales,
prices, transaction date and time. It can contain several repeated
values.
OR
➢Dimension Table(s) – where the description of the business events
are such as location, customer details, products. It contains only
unique values.
Why is Data Modelling Important?
• It helps you organize data efficiently.
• It helps you create accurate dashboards and reports.
• It helps reduce redundancy.
• It helps improve performance
• It helps you easily create calculated columns and measures.
Components of a Data Model
• Entity: An entity is a person, place or concept about which data is collected. For
example, in a customer database, the “customer” entity might include
information such as name, address, email and phone number.
• Attribute: It contains piece of information about an entity. “Name”, “Address”,
and “Phone Number” would all be attributes of a “customer” entity.
• Relationship: A relationship is a connection between two or more entities. For
example, in a database of customer and orders, there might be a “customer
places order” relationship between the “customer” and “order” entities.
• Cardinality: This refers to the number of entities that can be associated with one
another in a given relationship. It could be “one-to-many”, “one-to-one”, or
many-to-many.
Concept of Facts and Dimension Tables
Fact Table: It contains measures and is used for analysis. It contains
primary keys of Dimension Tables and associated facts or measures
like quantity sold, amount sold and average sales.
The Fact Table contains FACTS about the business-like quantity sold,
revenue, etc.
It can contain several repeated values.
Concept of Facts and Dimension Tables
Dimension Table: Dimension tables provide descriptive information
for all the measurements recorded in the fact table.
Dimensions are relatively very small compared to fact table.
Commonly used dimensions are:
Customer – attributes of a customer
Employee – attributes of an employee
Organization – attributes of an organization
Date – attributes of a date (ex. fiscal year, month, day)
Concept of Facts and Dimension Tables
Payments Table: Example of a FACT Table
1. Contains numerical measures/data to measure
performance
2. Transactional and each transaction is complemented
with a date
3. Fact tables are vertically large because they contain
significant number of transactions
4. Fact table contains a considerable number of foreign
keys
Concept of Facts and Dimension Tables
Customer Table: Example of a Dimension Table
1. Contains descriptive data that are largely text-
based
2. Dimension tables explains specific entities of
the business. E.g. Customer, Product,
Distributor
3. Dimension tables are small when compared to a
fact table
4. Dimension tables start with primary keys to
uniquely identify records
Schema Design
STAR SCHEMA
A star schema is a type
of dimensional model
in which data is
organized into a
central fact table
surrounded by a
number of dimensional
tables.
Schema Design
SNOWFLAKE SCHEMA
In a snowflake schema,
the dimensional tables
are connected to other
dimension tables
which are in turn
connected to the fact
tables.
Concept of Primary and Foreign Key
Primary Key: Primary key is a column in a table that uniquely identify a
record in the table. Primary keys must contain unique values. A primary
key segment can’t have NULL qualities
Foreign Key: Foreign key is a field in a table that is a primary key in
another table. It acts as a bridge between one table and another table in
the database.
A foreign key is a column or a set of columns in a table whose values
correspond to the values of the primary key in another table. In order to
add a row with a given foreign key value, there must exist a row in the
related table with the same primary key value
Concept of Primary and Foreign Key
• The "PersonID" column in the "Persons" table is the PRIMARY KEY in the "Persons" table.
• The "PersonID" column in the "Orders" table is a FOREIGN KEY in the "Orders" table.
.
Relationships and Cardinality
Cardinality refers to the relationship between the data in two tables.
There are four different types of cardinality
• One-to-many (1:*)
• Many-to-one (*:1)
• One-to-one (1:1)
• Many-to-many (*:*)
Relationships and Cardinality
1. One-to-many (and many-to-one) cardinality
The one-to-many and many-to-one cardinality options are essentially the same,
and they're also the most common cardinality types. It exists when one row in a
table can be related to many roles in another table.
2. One-to-one cardinality
A one-to-one relationship means both columns contain unique values. This
cardinality type isn't common.
3. Many-to-many cardinality
A many-to-many relationship means both columns can contain duplicate
values. This cardinality type is infrequently used.
How to build a Data Model
Split the tables into Fact & Dimensions
1. Separate the reporting fields into categories and values.
2. Consider all value fields as fact table fields.
3. Look for hierarchies within the categories.
a) Consider all categorical fields without hierarchies as fact table
fields.
b) Consider each category of hierarchy as dimension tables.
4. Identify the smallest unit within each dimension as a primary key.
5. Replicate all primary key fields in the fact table
Assignment
Replicate the data modelling process
Dataset (Mar 2014) is available on the drive
Link: https://drive.google.com/drive/folders/1W5tRqyiAljFjTTG5nxbgrnrthHmtdgab?usp=drive_link
Q&A