
Basics of Power BI Modeling

The fundamental lessons of building a data model that works best for Power BI
solutions

Author: Reza Rad


September 2020
First edition
PUBLISHED BY
RADACAD Systems Limited
http://radacad.com

24 Riverhaven Drive,
Wade Heads,
Whangaparaoa 0932
New Zealand
Copyright © 2020 by RADACAD, Reza Rad
All rights reserved. No part of the contents of this book may be reproduced or
transmitted in any form or by any means without the written permission of the publisher.
Agenda
About the Author
Introduction: from the author
Part 1: Relationships
Chapter 1: Power BI Relationship Demystified
Chapter 2: Many-to-One or Many-to-Many? The Cardinality of Power BI
Relationship
Chapter 3: What is the Direction of Relationship in Power BI?
Chapter 4: UseRelationship or Role-Playing Dimension; Dealing with Inactive
Relationships in Power BI
Chapter 5: Relationship with Multiple Columns
Part 2: Dimensional Modeling
Chapter 6: Data Preparation; The First and Foremost Important Task in Power BI
Chapter 7: Why My Values are Not Showing Correctly in Power BI?
Chapter 8: What is a Dimension Table and Why Say No to a Single Big Table
Chapter 9: Fact Tables
Chapter 9: Do You Need a Date Dimension?
Chapter 10: Power BI Date Dimension, Default, or Custom?
Chapter 11: Script to Create Date Dimension in Power BI using Power Query
Chapter 12: Script for Creating TIME Table in Power BI with Hours, Minutes, and
Seconds Buckets
Chapter 13: Star Schema and How to Build it
Part 3: Star Schema in Action
Chapter 14: Combining Dimension Tables in Power BI using Power Query
Chapter 15: Creating a Shared Dimension in Power BI Using Power Query
Chapter 16: Combine Tables or Create Relationships?
Chapter 17: What Fields to Hide in Your Power BI Solution?
Chapter 18: Build Your First Star Schema Model in Action
Chapter 19: How to Use Time and Date Dimensions in a Power BI Model
Chapter 20: Budget vs. Actual: Zero Complexity Model in Power BI
Chapter 21: One Dimension Filters Another Dimension in Power BI
Part 4: Calculations
Chapter 22: M or DAX? That is the Question!
Chapter 23: Scenarios of Using Calculated Tables in Power BI
Chapter 24: Measure vs. Calculated Column
Part 5: Better Data Model
Chapter 25: Move Your Shared Tables to Dataflow; Build a Consistent Table in
Power BI
Chapter 26: Power BI Shared Datasets: What is it? How does it work? And Why
should you care?
Chapter 27: Reduce the size of the Power BI file
Chapter 28: Design Tip for Power BI: Be Careful of Dates
Other books from Reza Rad
About the Author
Reza Rad is a Microsoft Regional Director, an author, trainer, speaker, and
consultant. He has a BSc in Computer Engineering and more than 20 years of
experience in data analysis, BI, databases, programming, and development, mostly on
Microsoft technologies. He has been a Microsoft Data Platform MVP for ten consecutive years.
Reza is an active blogger and co-founder of RADACAD, and also co-founder and
co-organizer of the Difinity conference in New Zealand.
His articles on different aspects of technologies, especially on MS BI, can be found on
his blog: https://radacad.com/blog.
He has written several books on Microsoft BI and Power BI topics. He has been an active
member of online technical forums such as MSDN and Experts-Exchange, a
moderator of the MSDN SQL Server forums, and holds MCP, MCSE, and MCITP certifications in BI. He
is the leader of the New Zealand Business Intelligence user group. He is also the
author of the very popular book Power BI from Rookie to Rock Star, which is free and has
more than 1700 pages of content, and of Power BI Pro Architecture, published by
Apress.
He is an international speaker at Microsoft Ignite, Microsoft Business Applications
Summit, Data Insight Summit, PASS Summit, SQL Saturday, and user groups.
Connect to Reza
LinkedIn: https://www.linkedin.com/in/rezarad/
LinkedIn: https://www.linkedin.com/company/radacad
Twitter: https://twitter.com/Rad_Reza
Twitter: https://twitter.com/RADACAD_COM
Twitter: https://twitter.com/DifinityConf
Introduction: from the author
I have been dealing with many Power BI challenges in my professional life as a Power
BI consultant and trainer. Challenges normally come as calculation or DAX questions,
or sometimes as performance questions. However, after digging deeper into the
problem, it soon becomes clear that the problem is related to a more fundamental
challenge: data modeling.
If you have a Power BI implementation with many calculation-related or performance-
related issues, I strongly suggest looking into your data model because that is where
most of the problems start.
A good data model is a solid foundation upon which you can build many stories of
calculations and analysis. A bad data model causes problems on every level you
add on top of it, and might sometimes cause the whole solution to collapse.
Fortunately, data modeling is not rocket science. I explain the basic principles of
data modeling with examples in this book. Use this book as a learning path towards a
better data model. Most of the tips mentioned in this book are product-agnostic (such as
star schema, dimension, and fact tables). However, this book is particularly designed
and developed for Power BI users.
This book is for you if you are building a Power BI solution. If your task is only
visualizing existing data, you might not need this book. However, what I
have seen in many cases is that the requirement starts with "just visualize the data,"
then more data tables appear, and you get into the tunnel of data modeling without
knowing its principles. This book is a guide for you through that tunnel.
Most of the chapters of this book come from my blog articles and videos, with countless
comments from my readers applied to them. If you ever have a question
that you can't find the answer to in this book, feel free to contact me directly.

Download the files and code for this book from here
[http://radacad.com/books/files/BookFiles-BasicsOfPowerBIModeling.zip].
Part 1: Relationships

The book starts with understanding the basics of relationships in Power BI. In this part,
you will learn why relationships are needed and what the types of
relationships are. This part continues with some advice about how to use the right
relationship in your model.
Chapter 1: Power BI Relationship Demystified

This chapter covers the basics of relationships: what a relationship is, why you
need it, and what happens logically behind the scenes of a relationship.
Sample Dataset
If you want to walk through the examples in this chapter, create a new Power BI Desktop
file and get data from the Pubs.xlsx Excel file.
What is a Relationship in Power BI?
Let's go through the meaning of a relationship with an example. Say you have a
single table for Store, and you want to analyze it in Power BI. Our Store table looks
like this:

In the Power BI report, if you want to build a visualization that shows the count of stores
per state, it would look like this:
As you can see, the State can filter the table to show the count of records for each state.
If I were to visualize the filtering, it would look like this:

What if there is more than one table?


Now let's check what happens if we have more than one table. Let's say there is also a Sales
table as part of the solution. The Sales table has information such as how many copies of
each book title were sold, on which date, and at which store. Here is a screenshot of the data in
the Sales table;
To learn relationships, I recommend turning off the auto-detect
relationship feature (just for the purpose of this learning). Having auto-detect on
stops you from seeing what is happening behind the scenes. To turn off this
feature, first go to the Options of Power BI Desktop;
And these are the options to uncheck if you want to turn off the auto-detect relationship;
If we load the Sales table into the same solution, we now have two tables,
without any relationship between them (because auto-detect is turned off), as below;
Now, if we want to filter and find out the total Qty of book titles sold in each
store, we can do that using a visual like the one below;
This is very similar to the previous visualization on the Store table. The filtering
all happens in one table; however, this time, the table is Sales;

Now, let's build a visualization like this: the total Qty of titles sold in each state.
As you can see in the above visualization, the total Qty for each state is 493! And the
total Qty in the Sales table is also 493! Obviously, this time the filtering didn't happen
correctly, because Sales only knows about the stor_id, not the state:

If we want to filter this table by store, the only field that gives us that ability is stor_id,
and it doesn't know anything about the state.
On the other hand, the Store table has the state field but knows nothing about the Sales
and Qty! However, there is a stor_id field in this table too, which is the unique ID of the
store in each row (one row per store);

In this example, we need both tables because they both contain fields that we need in
our visualization: the State field from the Store table, and the Qty field from the Sales table.
The stor_id field is the field that can help us look up a value from one table in the other.
Let's see that through an example:
The store Barnum's is located in Tustin, in the state of CA. The ID of this store is 7066.
The Sales table, on the other hand, tells us how many sales we had in store 7066 (which
is Barnum's);

These two tables can be connected to each other using stor_id. Or, in other
words, these two tables can be RELATED to each other using the stor_id field.
Relationships
When you want to query data from two tables, and the data of those two tables should
be somehow connected, you can create a relationship between them (if the two
tables are not already related through other tables).
To create a relationship between two tables, you need a field that can link them
together. We call this field the relationship field. You can simply drag that field
from one of the tables (it doesn't matter which is the source or destination) and drop it on the
matching field in the other table.
After the drag and drop, you should see the relationship created, like below;

The relationship line might not be drawn in a way that points exactly from
stor_id to stor_id. However, when you click on the relationship line, the fields
that are part of the relationship are highlighted.
Fields can Have Different Names
The two fields from the two tables can have the same name or different names. If they
have the same name (or a similar name pattern), Power BI can automatically
detect the relationship. Remember that we turned that feature off earlier in this chapter
to make the learning process easier. By default, auto-detect is on and
would have detected a scenario like this, where we had stor_id in both tables, and would
have created the relationship.
If the field names are different, then you have to create the relationship manually, as
we have done above. Alternatively, there is another way to create a relationship.
Creating a Relationship Using the Manage Relationships Menu Option
Under the Home tab in Power BI Desktop, you can click on Manage Relationships;
You will see the Manage Relationships window, which shows all existing
relationships and their "from" and "to" columns. You can create a new one by clicking
on New. You may also notice that there is an "Autodetect" option right beside New,
in case you want to use it.

We cannot see the relationship created in the previous step here because I
removed it manually, so that I can create it again and show you this second method
of creating it.
After clicking on New, in the Create Relationship window, you can
set the source table and the destination table, and you also have to click on the
relationship field in each table to highlight it, as illustrated in the image below:
The choice of having the Stores table as the source or the Sales table as the source
makes no difference. However, the cardinality does make a difference to the process. Power
BI is usually smart enough to work out the cardinality. Cardinality itself is a
big topic within relationships, which I will talk about later in another chapter.
There are many other configurations in the Create Relationship window, and each of
them deserves a full chapter of discussion. In this chapter, we are only
focusing on the basics of relationships. After clicking OK with the configuration
above, you should end up with the same relationship diagram as in the previous
step:
Edit Relationship
If you want to edit a relationship, you can either go to the Manage
Relationships section under the Home tab, find that relationship, and edit it;

Or you can just double-click on the relationship line between the two tables.
Data Types Should Match
The two fields that you are connecting should have the same data type, or
data types that can be matched to each other (for example, text to number). Otherwise,
you get an error when you create the relationship.
Linking Field Should Exist in Both Tables
If you don't have the linking field in one of the tables, then obviously you cannot create a
relationship between them. In the above example, stor_id is our linking field, and it
HAS TO exist in both tables.
Relationship Means Filtering Across Tables
The whole purpose of creating relationships in Power BI is to be able to filter data by
fields from two separate tables. Creating a relationship like the above example
means connecting the two tables like below;

Now, after creating the relationship, we can see the visualization works fine:
Behind the scenes, this means filtering like below:
The stor_id field acts as the connector between these two tables. The first table gets
filtered by state, and then the second table (through the stor_id relationship) also
gets filtered for each state.
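To see this filter propagation with an explicit calculation, you can write a simple measure. This is a sketch based on the Sales table from the Pubs sample used in this chapter; the measure name is my own:

```dax
-- Hypothetical measure name; Sales[Qty] follows the Pubs sample
-- used in this chapter.
Total Qty = SUM ( Sales[Qty] )
```

When Total Qty is placed next to the state field from the Store table in a visual, each state filters the Store rows, and the relationship propagates that filter through stor_id to the Sales table before the SUM is evaluated.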
Other Properties of a Relationship
There are some other important properties of relationships, which I am going to
explain in the next few chapters, such as:
The cardinality of the relationship (1-1, 1-Many, Many-1, Many-Many)
The direction of the relationship
Active or inactive relationship
Creating a relationship based on multiple fields
Summary
Relationships are one of the most essential parts of Power BI modeling. If you are
coming from a database background, you already have a good understanding of
relationships. If you are coming from an Excel or business background, you can think of
each relationship as a lookup between tables.
Relationships are what make filtering across multiple tables possible. In this
chapter, you learned the basics of relationships: why we need them, how
to create them, and what logically happens behind the scenes of a relationship. In
future chapters, you will learn about other important features of relationships.
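The lookup analogy can be made concrete with DAX's RELATED function: once the stor_id relationship exists, each Sales row can look up its store's state. This is a hypothetical calculated column, shown only for illustration; nothing in this chapter requires it:

```dax
-- Hypothetical calculated column on the Sales table. RELATED walks
-- the many-to-one relationship and fetches the matching Store row's
-- state for each Sales row, exactly like a lookup.
State = RELATED ( Stores[state] )
```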
Chapter 2: Many-to-One or Many-to-Many? The
Cardinality of Power BI Relationship

In the previous chapter, you learned the basics of relationships: why we
need a relationship, and what its filtering impact is across multiple tables. In this
chapter, you will learn about one of the most important properties of a relationship,
called Cardinality. The purpose of this chapter is to understand the meaning of 1-1,
1-Many, Many-1, and Many-Many relationships.
Relationships in Power BI
Power BI relationships give us the ability to use fields from multiple tables and to
filter across multiple tables in the data model. A relationship is based on a
field that connects two tables and filters one based on the other (or vice versa,
depending on the direction). As an example, we can filter the Qty of the Sales table by the
State in the Store table, as long as there is a relationship between the Sales and Store tables
based on stor_id;
And the relationship between the tables is as below;

To learn more about the details of relationships, and why we need them, read Chapter 1.
What is the Cardinality of a Relationship?
When you create a relationship between two tables, you get two indicators, which can be
1 or * on the two ends of the relationship. This is called the Cardinality of
the relationship.
The values 1 and * tell you how many instances of the relationship field's value exist
per row in each table. Let's check it with an example;
In the Stores table, we have one unique value per stor_id per row.

So if this field participates in one side of a relationship, that side takes 1 as
the cardinality indicator, which is called the ONE side of the relationship.
However, the stor_id in the Sales table is not unique per row in that table. We
have multiple rows for each stor_id. In other words, in each store, there are multiple sales
transactions happening (which is normal, of course);
So if the stor_id in the Sales table is part of a relationship, that side of the relationship
becomes the *, or what we call the MANY side of the relationship.
So, based on what we know so far, if we create a relationship based on stor_id between
the two tables of Sales and Stores, here is the output:
This relationship can be read in two ways;
One-to-Many (1-*) relationship from the Stores table to the Sales table
Many-to-One (*-1) relationship from the Sales table to the Stores table
They are both the same, of course, and they look exactly alike in the
diagram view. Now that you know what Cardinality is, let's check all the different types
of Cardinality.
Types of Cardinality
There are four types of cardinality, as below:
1-1: one-to-one
1-*: one-to-many
*-1: many-to-one
*-*: many-to-many

Let’s check each of these types one by one.


One-to-Many, or Many-to-One
This is the most common type of cardinality used in data models. It means one
of the tables has unique values per row for the relationship
field, and the other one has multiple values. The example that you saw previously
between the Stores and Sales tables based on the stor_id is a many-to-one or one-to-
many relationship;
There are two ways of naming this relationship, One-to-Many or Many-to-One, depending
on which table is the source and which is the destination.
For example, the configuration below means that the relationship from the Sales table to the
Stores table is Many-to-One.
And the below shows the relationship as One-to-Many from the Stores table to the Sales
table;
Both end up creating the same relationship, as below:
It means there is no difference between one-to-many and many-to-one, except the angle
from which you are reading the relationship. If you look at it from the Stores table's side,
you have a one-to-many relationship. If you look at it from the angle of the Sales table,
you have a many-to-one relationship. They are the same, with no difference at all. So
from now on in this chapter, whenever you read many-to-one or one-to-many, you know
that you can read it the other way around too.
For the rest of the chapter, I will be using the terms FACT and DIMENSION tables,
which I will explain separately and in detail in another chapter. A short explanation
of the terms is as follows;
Fact table: the table that has the numeric values that we want, either at an
aggregated level or as detailed output. Fields from this table are usually used in
the VALUE section of visuals in Power BI.
Dimension table: the table that has descriptive information, used for slicing and
dicing the data of the fact table. Fields from this table are often used as Slicers,
Filters, or the Axis of visuals in Power BI.
Many-to-One Relationship between Fact and Dimension tables
Many-to-one is the relationship commonly used between the fact table and the dimension
tables around it. The example above is between Sales (which is the fact table) and
Stores (which is a dimension table). If we bring another table into the model, Titles
(related based on title_id in both the Sales and Titles tables), you can see the same
many-to-one relationship pattern exists;
Many-to-One relationship between Dimension and Dimension tables

This type of relationship, although often used in many models, should always be the
subject of investigation for better modeling. In an ideal data model, you should NOT
have a relationship between two dimension tables directly. Let's check that with an
example;
Let's say we have a different model from what you have seen so far in this example: a
Sales table, a Product table, and two tables for the category and subcategory
information of the product, as below;
As you can see in the above relationship diagram, all relationships are many-to-one,
which is fine. However, if you want to slice and dice the data of the fact table
(SalesAmount, for example) by a field from the DimProductCategory table
(the ProductCategory name, for example), three relationships have to be processed;
This consumes extra processing power, and it also brings a lot
of other issues, such as needing to filter from the "many" side of a relationship to the "one"
side of it. This type of relationship is not recommended. I have dedicated a whole
chapter to the discussion of why you need to combine dimension tables to avoid such a
scenario.

A better model would combine the category and subcategory tables with the product
table, resulting in one single many-to-one relationship from the fact table to the
DimProduct table.
One-to-One Relationship
A one-to-one relationship happens in only one scenario: when the relationship field has
unique values per row in both tables. An example of such a scenario is when you have a
Titles table and a Titles Details table! They both have one row per title. So if I create a
relationship, it looks like this:

When you have a one-to-one relationship between two tables, they are a good candidate
to be combined with each other using Merge in Power Query
[https://radacad.com/append-vs-merge-in-power-bi-and-power-query]. Both
tables have the same number of rows most of the time, and even if one of them has fewer
rows, given the compression method of the Power BI xVelocity engine, the
memory consumption would be about the same if you bring them into one table. So if you have a
one-to-one relationship, seriously consider combining those tables.
It would be better to combine both tables above into one table that has everything
about the Title in it.
Many-to-Many Relationship: Weak Relationship
A many-to-many relationship means you have multiple records for each value of the joining
field, in both tables. If you think about it for a second, you can see that this scenario happens when
you have tables that are related to each other with no shared dimension in between!
Let's check an example; Let's say I have a Fact Inventory table and a Fact Sales table.
They both have multiple records per product, and I want to connect them together using
the Product ID. This has to be a many-to-many relationship because in neither table
does the Product ID field have unique values.

What if you have more than one table in that scenario?
You would end up creating a many-to-many relationship between every pair of tables, and
then, because that causes circular filtering in the relationships, you end up with inactive
relationships! There are tons of other issues with many-to-many relationships, and
most of the time, they are all the result of not having a shared dimension in between. I
have dedicated a whole chapter to what a shared dimension is and how to create it to
avoid a modeling mistake like the above.

The many-to-many relationship causes tons of issues, and that is
why it is also called a weak relationship. Most of the time, it can
be resolved by creating a shared dimension and creating one-to-
many relationships from the shared dimension to the fact tables.
AVOID this type of relationship in your model.

A better model for the above sample would use shared dimensions, as in the diagram
below;
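As a quick sketch of the shared-dimension idea, a shared product dimension could even be built as a DAX calculated table from the distinct ProductID values of both fact tables. The table and column names here are assumptions for illustration; later chapters build shared dimensions in Power Query, which is usually the better place:

```dax
-- Hypothetical table and column names. UNION stacks the two ID
-- lists, and the outer DISTINCT removes duplicates, yielding one
-- row per product -- a valid "one" side for one-to-many
-- relationships to both fact tables.
Shared Product Dimension =
DISTINCT (
    UNION (
        DISTINCT ( FactSales[ProductID] ),
        DISTINCT ( FactInventory[ProductID] )
    )
)
```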

Summary
The cardinality of a relationship describes whether there are unique or multiple instances
per value of the joining field in each of the two tables. The most common type of cardinality is
one-to-many or many-to-one, which occurs between fact and dimension tables. However, you
can find one-to-one relationships too; these are a good candidate to
be combined into one table. Sometimes, for some types of one-to-many relationships, it
is better to combine tables as well, to create a flattened dimension. The type of
relationship that you should avoid is the many-to-many relationship, which can be
resolved by creating a shared dimension.
Chapter 3: What is the Direction of Relationship
in Power BI?

Relationships in Power BI are a bit different from those in other database management systems.
In most systems, you just have a relationship, and there is no "direction" to it. In
Power BI, however, a relationship has a direction. The direction of a
relationship plays a critical role in the way that filtering works in Power BI.
Understanding the direction of the relationship is an important step towards
modeling in Power BI. In this chapter, you will learn what the direction of a
relationship is and what the difference is between a bi-directional and a single-directional
relationship.
Sample Dataset
If you want to walk through the example in this chapter, create a new Power BI Desktop
file, get data from the AdventureWorksDW2012.xlsx Excel file, and select these
tables: DimCustomer, DimProduct, FactInternetSales. Load the data into Power BI.
After loading the data, in the relationship view, you should see all three tables
related to each other. You can see the direction of each relationship, which is usually
from the "one" side of the relationship to the "many" side of it.
What is the Meaning of the Direction of the Relationship?
The most important question is: what does the direction of a relationship mean? The answer
is: filtering. The direction of a relationship determines how Power
BI filters the data. In the above screenshot, you can see that the direction of the relationship
is from DimCustomer to FactInternetSales. It means any column from DimCustomer
can filter the data in FactInternetSales. As an example, you can slice and dice the
SalesAmount (in the FactInternetSales table) using EnglishEducation (in
DimCustomer), as below;
So far, nothing seems strange or weird. You can also slice and dice the data of the fact table
using columns from DimProduct, because there is a relationship direction from
DimProduct to FactInternetSales. The problem (or let's say the strange part) comes
when you want to filter in a different direction than the one defined in the relationship.
Let's see an example.

The direction of a relationship means the way that filters propagate in Power BI.

Filtering Based on a Different Direction


Sometimes you need to filter against the direction of the relationship.
Let's look at an example. Assume you want to get the number of products sold
in each education category. As the first step, you can create a table visual with EnglishEducation
(from DimCustomer) and ProductKey (from DimProduct). Then, in the Fields
list of the visual, change the aggregation of ProductKey to Count, as shown in the
screenshot below;
The result would be the count of products for each customer education category. But
wait, the result doesn't look correct! It shows 606 for every education category!
As you can see in the above screenshot, the count of ProductKey is 606 for every
EnglishEducation value. The reason is the DIRECTION of the relationship. Let's look at the
direction again.
As you can see in the above screenshot, with the current direction of the relationships,
only filtering from DimProduct or DimCustomer to FactInternetSales is allowed.
However, what we are trying to achieve in this example is a bit different:
we want to filter DimProduct based on the selection of Education in DimCustomer.

You need a different direction in the relationship to get this working.


Bi-Directional Relationship
For the example above to work, you need to change the direction of the relationship to
bi-directional. Please note that you SHOULD NOT do this all the
time. The bi-directional relationship has a significant performance drawback
(we will talk about it later). For now, to see what a bi-directional relationship
does, double-click on the relationship line between DimProduct and FactInternetSales
and make it bi-directional.

Now you should see the correct result in the table;


The reason the bi-directional relationship works here is that it enables filtering
towards DimProduct, as the screenshot below illustrates;

So, there you go; now you know what the direction of a relationship is, and the
difference between a bi-directional and a single-directional relationship. Before you go,
however, you need to read one critical note about the bi-directional relationship!
Be Careful! Performance Considerations for Bi-Directional
Relationships
After you've done the example above, it seems that the bi-directional relationship is
good! You may then wonder why we should not use it all the time. If you were
using Power BI Desktop in its early days, in the second half of 2015, the default
direction was bi-directional. At that time, I got many calls and emails from people
saying their model was slow! Why do you think that was? Because of bi-directional relationships!

A bi-directional relationship is one of the ways you can kill
the performance of your Power BI model!

Yes, you read that correctly. Bi-directional relationships cause performance
issues. Also, you cannot always create a bi-directional relationship, because sometimes it
creates a loop of filtering! So what is the solution? The solution is not
short enough to cover fully in this chapter, so I will just point out two methods here.
Method 1: Change the Data Model! Design Appropriately
Yes, the right data model does not need bi-directional relationships in many places.
If your model needs bi-directional relationships for the majority of its
relationships, then your model is not designed well. I will explain the modeling
principles in detail in the next part of this book. Good modeling can eliminate the need for
bi-directional relationships.
Method 2: Using CrossFilter DAX function ONLY IF the first method
does not work
Only and only if you have designed your model properly, and still you cannot get what
you need, then you can write a DAX expression using CrossFilter to get the result you
want. Doing it this way is still using the both-directional relationship for that
calculation. However, the both-directional relationship would be used only for
calculating that single measure. All other times, performance should be normal.
Summary
The direction of the relationship plays a very important role in modeling in Power BI.
The direction of the relationship means the way that filter propagates in Power BI. The
single-directional relationship will filter one table based on the other one. Sometimes
you need to filter in a different direction, that is when the both-directional relationship
comes into play. However, both directional relationship comes with a cost of
performance issues. Do not use both-directional relationships blindly. Make sure you
have designed your model in the right way first, and if that doesn’t work, then try other
methods such as CrossFilter DAX functions. I will write later about how to resolve the
both-directional issue in a Power BI model.
Chapter 4: UseRelationship or Role-Playing
Dimension; Dealing with Inactive Relationships
in Power BI

In a Power BI model, relationships are important for passing filters. Filter propagates
through relationships. However, sometimes you create the relationship between two
tables, and the relationship is a dashed line. In this chapter, I’ll explain to you
everything you need to know about a dashed relationship or as it is called an Inactive
relationship. I will explain two different methods that you can deal with this kind of
relationship. So, ready? Let’s go through it.
Why Relationships in Power BI?
To start this chapter, the very first topic to discuss is to understand why a relationship in
Power BI is important? A relationship in relational database systems is important to
link tables to each other, but in Power BI, the relationship also plays another even more
important role; Filtering.
To understand how the relationship works, let’s check this example:
I have a sample Power BI file getting data from the AdventureWorksDW2012.xlsx
Excel file example, and I get information from two tables: FactInternetSales, and
DimDate. These two tables are NOT related to each other at the beginning.

Now, let’s create a simple column chart with the SalesAmount from the
FactInternetSales table, and the FullDateAlternateKey from the DimDate table. Because
the FullDateAlternateKey is a date field, Power BI brings the default hierarchy, and I’ll
see the visual slicing and dicing data by the highest level of the hierarchy, which is
Year.
But wait, it isn’t actually slicing and dicing! It is showing the same SalesAmount for
every single year from 2005 to 2010! The value is very close to $30 million, which is
actually the grand total of the sales in my dataset. The fact is that the
FullDateAlternateKey field is NOT filtering the FactSalesAmount table.
Relationship Means Filtering
Now, let’s create the relationship between these two tables, based on the OrderDateKey
in the FactInternetSales table and the DateKey in the DimDate table;
That’s it, let’s go and check the same visualization again:
As you can see, the same visual, this time filters by the date field. Or better to say,
DimDate can now FILTER the FactInternetSales table. All of that because of the
relationship. Without a relationship, we cannot filter data across tables just by itself.
You may need to do some DAX expressions instead.

Relationship in Power BI means Filtering and the ability to slice


and dice a table by another table.

Now that you now relationships are for Filtering let’s check out what the inactive
relationship is.
Inactive Relationship
The type of relationship you have seen above is called an active relationship. There is
another type of relationship called Inactive. Let’s see how an inactive relationship will
be created. In the previous example, we sliced and diced data by the OrderDateKey
field, because that was the field connected through the relationship to the DimDate
table. Now, let’s say we want to slice and dice data by the ShipDateKey. The very
simple approach is to create another relationship between the DimDate table and
FactInternetSales but this time to the ShipDateKey. Here is the result:
As you can see, this new type of relationship is different. It is a dashed line, compared
to the active, which was a solid line. This is an inactive relationship. You can only have
one active relationship between two tables. Any other relationships will become
inactive.

You can only have one active relationship between two tables.
Any additional relationships will become inactive.

An inactive relationship doesn’t pass filtering. It doesn’t do anything by itself. I still see
many people creating inactive relationships in their model, thinking that just the inactive
relationship by itself will do some filtering. It doesn’t. If I use the FullDateAlternateKey
from the DimDate table to slice and dice the SalesAmount from the FactInternetSales
table, which field I’m filtering based on? The field that is related through an Active
relationship is the base of the filtering. Here is a result for that (which is apparently the
same as what you have seen in the previous example because the inactive relationship
doesn’t do anything. It is just the active relationship that passes the filter);
Inactive Relationship Doesn’t pass the filtering by itself. It needs
treatment!

Yes, the inactive relationship needs a special treatment to work. Let’s see how this can
work. I explain two treatments for an inactive relationship; the Role-playing dimension
and the UseRelationship method.
Role-playing Dimension
A dimension that plays the role of multiple dimensions is called a role-playing
dimension in the data warehousing terminologies. In the above example, DimDate is
going to play the role of Order Date in some scenarios, and the role of Ship Date in
other scenarios, and also sometimes the role of Due Date in other times.
One method to deal with the inactive relationship is to remove the cause to create it! If
having multiple relationships between two tables is causing the creation of an inactive
relationship, one way to avoid it seems to be creating multiple instances of the same
table, and then you would need only one relationship, not more than that.
Let’s create a copy of the DimDate. One way to create the copy is to use a Calculated
Table with ALL DAX function in it;
ALL is a function that gives you the entire table. In this case, we are creating a copy of
the DimDate table and calling it ShipDate. Now you can create a normal active
relationship between ShipDate and the FactInternetSales table (I have removed the
inactive relationship from the previous section);
And now, as a result, you have slice and dice by the ShipDate table as well as the Order
Date (or let’s say DimDate table);
The role-playing dimension is one of the ways that you can
handle an inactive relationship, but be careful of memory
consumption!

Copy only small tables


The role-playing dimension method is actually copying the table and is doubling up
memory consumption. The extra memory consumption can be overlooked if the table is
small. A Date table is a small table. Every year is 365 rows, and for 20 years, the Date
table will have around 7,000 rows. It is very small compared to a fact table with
millions of rows. This solution is good for small tables. But don’t use this method for
big tables. If you have a dimension table with 5 million rows and 30 columns, then the
role-playing dimension method means the consumption of the same amount of space
twice or three times or more.

Avoid the role-playing dimension if you have a large dimension.


This method is only good for small tables.

UseRelationship Function in DAX


Another method to handle an inactive relationship is to use a function in DAX called
UseRelationship. This DAX function is literally saying to Power BI that for this
expression, use this relationship, even if it is inactive. Let’s see how this function
works.
If we continue the same example of slicing and dicing by Ship Date and assume that
there is no Ship Date calculated table created, then we can do it this way; Create the
inactive relationship between DimDate and FactInternetSales again based on the
ShipDateKey.
Now, let’s create a Measure in Power BI with the below expression:

Sales by Ship Date = CALCULATE(


SUM(FactInternetSales[SalesAmount]),
USERELATIONSHIP(
FactInternetSales[ShipDateKey],
DimDate[DateKey]
)
)

This measure calculates the sum of sales by ship date. The whole secret is the usage of
the UseRelationship function. This is a really simple function to use. You just need to
provide two input columns to it, the two columns that are two sides of the relationship.
Their order is not important.
UseRelationship (<column 1>, <column 2>)

The important tip to consider is that you HAVE to have an existing inactive relationship
for this function to work. Otherwise, you get the error below:

An inactive relationship must exist; otherwise, the


UseRelationship doesn’t work.

One table filters the other table based on multiple fields


The main benefit of using this method is that you can now have the DimDate table to
filter the fact table based on both ShipDateKey and OrderDateKey at the same time, as
illustrated below:
As you can see in the above screenshot, one date table filters the fact table based on
multiple fields. One is based on OrderDateKey, which is an active relationship, and the
other one is based on ShipDateKey through the usage of the UseRelationship method in
the measure.

This method doesn’t consume extra memory. However, you do


need to create a measure for every single calculation with the
UseRelationship function.

Summary
In this chapter, you learned about inactive relationships and how to handle them through
two methods; Role-playing dimension and UseRelationship function in DAX. The role-
playing dimension method is good for smaller tables where the extra memory
consumption is not the issue. UseRelationship method, on the other hand, can be a good
substitute when the tables are bigger. There are other benefits, such as getting one table
filtering based on multiple fields at the same time as you’ve seen.
Chapter 5: Relationship with Multiple Columns

You can create relationships in Power BI between tables. Relationships are useful for
some functions to work across multiple tables and produce the result. The relationship
between tables also makes visualization and report elements more efficient because the
result of selection in one chart can affect another chart from a different table. However,
there is a limitation in the Power BI relationship that you can’t create a relationship
based on more than one column. In other words, if you want to create a relationship
with two or more columns, you cannot! Fortunately, there is a workaround that I’ll
explain in this chapter.
Defining the Problem
Assume that we have a budget table with the fiscal year, fiscal period, and budget
amount. Here is a screenshot of this table:
If I want to do date-based analysis and date-based calculations, it is best to create a
relationship between the budget table and a date dimension.

To join these two tables, I can add a day column to the budget table, and then join them
based on three columns: fiscal year, fiscal period, and day (day of the month). So here
is the budget table with the new day column added and the month value a bit polished to
remove “Mth” from the month label;

Now, if I want to create a relationship between the date dimension and budget table
based on these three columns, I cannot! The create relationship dialog doesn’t allow me
to select multiple columns, and because with a single column, a Key field won’t be
identified so the relationship can’t be created.
Workaround
The workaround for this problem is easy. Power BI doesn’t allow relationship in model
based on multiple columns, but Power Query can join tables with as many as columns
we want. So what I can do as a workaround is to join the budget table to date dimension
in Power Query and fetch the date key. Then I’ll use the date key as a single field
relationship in Power BI modeling section.
First, I open Merge Queries from the Combine section of the Home tab;
Here is how I join two tables based on multiple columns: I can hold the CTRL key and
select columns one by one (in the right order of joining)
Then I’ll have the new table embedded as a result of the join;
So I’ll just pick the key field from the embedded table;
After save and closing the query editor window, I can create a relationship in the Power
BI model based on a single column;
Part 2: Dimensional Modeling

In this part, you will learn about the principles of dimensional modeling.
You will learn why star schema is needed, what are the components of it,
what are fact and dimension tables. You will learn how to build a star-
schema from an operational data store.
Chapter 6: Data Preparation; The First and
Foremost Important Task in Power BI

I’ve talked about Data Preparation many times in conferences such as PASS Summit,
BA Conference, and many other conferences. Every time I talk about this, I realize how
much more I need to explain this. Data Preparation tips are basic but very important. In
my opinion, this is the most important task in building in BI system. In this chapter, I’ll
explain why data preparation is necessary and what are five basic steps you need to be
aware of when building a data model with Power BI (or any other BI tools). This
chapter is totally conceptual, and you can apply these rules to any tools.
Why Data Preparation?
All of us have seen many data models, like the screenshot below. A transactional
database by nature has many tables and relationships between tables. Transactional
databases are built for CRUD operations (Create, Retrieve, Update, Delete rows).
Because of this single purpose, transactional databases are built in a Normalized way to
reduce redundancy and increase the consistency of the data. For example, there should
be one table for Product Category with a Key related to a Product Table because then
whenever a product category name changes, there is only one record to update, and all
the products related to that will be updated automatically (because they are just using
the key). There are books to read if you are interested in how database normalization
works.

I’m not going to talk about how to build these databases. In fact, for building a data
model for the BI system, you need to avoid this type of modeling! This model works
perfectly for a transactional database (when there are systems and operators do data
entry and modifications). However, this model is not good for a BI system. There are
several reasons for that, here are two most important reasons;
1. The model is hard to understand for a Report User
2. Too many tables and many relationships between tables make a reporting query
(that might use 20 of these tables at once) very slow and not efficient.

You never want to wait for hours for a report to respond. The response time of the
reports should be fast. You also can’t expect that the report users (of self-service report
users) understand the schema above. It is sometimes even hard for a database developer
to understand how this works in the first few hours! You need to make your model
simpler, with few tables and relationships. Your first and the most important job as a BI
developer should be transforming the above schema to something like below;
This model is far simpler and faster. There is one table that keeps sales information
(Fact_Sales) and a few other tables which keep descriptions (such as Product
description, name, brand, and category). There is one relationship that connects the
table in the middle (Fact) to each of the tables around (Dimensions). This is the best
model to work within a BI system. This model is called a Star Schema. Building a star
schema or dimensional modeling is your most important task as a BI developer.
How to Design a Star Schema?
To build a star schema for your data model, I strongly suggest you take one step at a
time. What I mean by that is choosing one or a few use cases and start building the
model for that. For example, instead of building a data model for the whole Dynamics
AX or CRM, which might have more than thousands of tables, just choose the Sales side
of it, or Purchasing, or GL. After choosing one subject area (for example, Sales), then
start building the model for it considering what is required for the analysis.
Fact Table
Fact tables are tables that are holding numeric and additive data normally. Examples are
these numeric and additive values are; quantity sold, sales amount, discount, cost, or
things like that. These values are numeric and can be aggregated. Here is an example of
a fact table for sales;
Dimension Table
Any descriptive information will be kept in Dimension tables. For example, customer
name, customer age, customer geoinformation, customer contact information, customer
job, customer id, and any other customer-related information will be kept in a table
named Customer Dimension.

Each dimension table should contain a key column. This column should be numeric
(integer or big integer depends on the size of dimension), which is auto-increment
(Identity in SQL Server terminology). This column should be unique per each row in the
dimension table. This column will be the primary key of this table and will be used in
the fact table as a relationship. This column SHOULDN’T be the ID of the source
system. There are several reasons why. This column is called the Surrogate Key. Here
are a few reasons why you need to have a surrogate key:
Codes (or IDs) in the source system might be Text, not Integer.
Short Int, Int, or Big Int are the best data types for surrogate key because these
are values that will be used in the fact table. Because the fact table is the largest
table in the data set, it is important to keep it in the smallest size possible (using
Int data types for dimension foreign keys is one of the main ways of doing that).
Codes (or IDs) might be recycled.
You might want to keep track of changes (Slowly Changing Dimension), so one
ID or Code might be used for multiple rows.

The surrogate key of the dimension should be used in the fact table as a foreign key.
Here is an example;

Other dimensions should also be added in the same way. In the example below, a Date
Dimension and Product Dimension is also created. You can easily see in the screenshot
below that why it is called star schema; the Fact table is in the middle, and all other
dimensions are around with one relationship from the fact table to other dimensions.
Design Tips
Building star schema or dimensional modeling is taking more reading than what you had
in this chapter. That is why we have more chapters in this book to explain things more in
detail. However, it would be great to leave some tips here for you to get things started
towards better modeling. These tips are simple but easy to overlook. The number of BI
solutions that I have seen suffer from not obeying these rules are countless. These are
rules that, if you do not follow, you will be soon far away from proper data modeling,
and you have to spend ten times more to build your model proper from the beginning.
Here are the tips:
Tip 1: DO NOT add tables/files as is
Why?
Tables can be joined together to create a more flatten and simpler structure.
Solution: Create a flatten structure for tables (especially dimensions).

Tip 2: DO NOT flatten your FACT table


Why?
Fact tables are the largest entities in your model. Flattening them will make them even
larger!
Solution: Create relationships with dimension tables.

Tip 3: DO NOT leave naming as is


Why?
Names such as std_id or dimStudent are confusing for users.
Solution: Set the naming of your tables and columns for the end-user.

Tip 4: DO NOT leave data types as is


Why?
Some data types are spending memory (and CPU) like decimals. Appropriate data types
are also helpful for engines such as Q&A in Power BI, which are working based on the
data model.
Solution: Set proper data types based on the data in each field.

Tip 5: DO NOT load the whole data if you don’t require it


Why?
Filtering part of the data before loading it into memory is cost and performance
effective.
Solution: Filter out part of the data that is not required.

Summary
This was just a very quick introduction to data preparation with some tips. This is the
beginning of this part about dimensional modeling. Other chapters in this part will help
you to understand the concepts more in detail.
Chapter 7: Why My Values are Not Showing
Correctly in Power BI?

To start understanding why we need a dimensional model, let’s go through a common


challenge. I see often people get confused about how to use tables in their model, they
get ambiguity of what table should be used for what, and if they bring a count of a field
from one table, why it doesn’t work, whereas compared to another table it works. This
chapter is about how to find the right table to get your values from and get that showing
in Power BI visual correctly.
Sample Model
The sample model that I use here is coming from the AdventureWorksDW2012.xlsx
Excel file. I have a few tables in my model as below: Tables are DimCustomer,
FactInternetSales, DimDate, and DimProduct.
Values that Exists Only in One Table
When you get a value which you can only fetch from one table, there is no problem.
Example: SalesAmount field exists only in the FactInternetSales table. So in
visualization, I can easily build a report like this:
If I want the count of orders, still because that should come from the table that includes
order details (which in this case, it would be the FactInternetSales), it is easy. I just
drag a field (no matter which one, because the count of any of the fields in the same
table is always the same), and then change the aggregation to count, and it works
perfectly fine:

Both visualizations above are showing slicing and dicing the data of orders (one is the
SalesAmount and the other one count of orders) by the EnglishEduction, which is from
the DimCustomer table.
Values that can be fetched from Multiple Tables
The problem happens when there is a value that you can fetch from multiple tables.
Let’s talk about one scenario as an example:
We have FullName (from DimCustomer table) of customers in the table below, and we
want to add the count of products that have purchased besides them as a value column.
In a scenario like this, one of the very first actions for many developers is to go and
search for something about Product in the fields pane.

The product search will bring a lot of fields if we narrow it down with using
ProductKey, we see that it is available in two tables: FactInternetSales and
DimProduct.
In a properly designed data model, the ProductKey has to be hidden because it is a
technical column. If you use the ProductKey from the DimProduct table, then you get this
output:
If you wonder why all count of ProductKey values in the visual above is 606? I have
explained it fully in detail in chapter three about the direction of the relationship in
Power BI. As you see in the model diagram below, the DimCustomer table cannot filter
the value in the DimProduct table because of the single-directional relationship between
the DimProduct and FactInternetSales table.
You might then think of changing the relationship to both-directional between the
DimProduct and FactInternetSales table, which is a big modeling mistake. Your model
should always look like this:

So, How should you achieve that? Well, by choosing the value from the correct data
table, FactInternetSales table.
If I do the same with getting the value from the FactInternetSales table, then we get the
correct value;
Depends on the logic of your calculation, you might need to get Count of that field or
Count (Distinct) of that (because there are duplicate ProductKey values in the
FactInternetSales table; a product can be sold multiple times, of course). As you see in
the above visualization, the value is shown correctly.
How Do I Know Which Table Should I Get the Value from?
Well, we solved the above scenario. However, in a more complicated model, where
you have more than two or three tables, finding the right table to get the value from can
be a challenge. Here is the golden rule to select the values:

Any field that is going to be used as a slicer, filter, Axis of a


chart, headers of rows or columns in a table or Matrix
visual should come from Dimension tables.

Examples of these fields are FullName of the customer. EnglishEduction of the


customer. Date of the order, Product name, Product color, Age of the customer. As you
can see, some of these are even numbers (age), but because they have used for slicing
and dicing, they should come from a Dimension table.
In the above visuals, BirthDate and YearlyIncome are coming from their Dimension
tables.

Any field that is going to be used as a value of a chart, table, or


Matrix visual, should come from Fact tables.

Examples of these values are; SalesAmount, Order Quantity, Count of Products been
sold, Count of Customers purchased specific products, etc. The Count of ProductKey
from the FactInternetSales table that you see above was one of the examples;
What If I don’t have Fact or Dimension tables designed property?
At this point, you might say that it is easy in a proper star-schema design where there
are fact and dimension tables designed properly. However, in my model, everything is
messed up. What should I do with that? That is a great question. My answer is this:
Having a proper Power BI model design is the building block of all other steps
afterward. If you don’t have the right data model, everything is going to be messed up on
top of it. You cannot build a house on top of a building block, which is not solid itself.
I’d say whatever stage or step of the Power BI development you are, consider re-
designing your model seriously. There are simple things that you can do to change your
model to a good data model, it would take time, but it would well worth it.
Summary
Values in your visuals should come from Fact tables, and slicers/filters/axis/column or
row headers should come from Dimension tables. For that, you do need to have proper
dimension and fact table design. However, that is not hard to implement.
Chapter 8: What is a Dimension Table and Why
Say No to a Single Big Table

Now is a good time to explain the two types of tables we deal with every day in a
reporting data model; Fact and Dimension table. In this chapter, I will explain what a
Dimension table is, and examples of it, and how it can be created. Although examples of
this chapter are all using Power BI and Power Query, the basics of modeling are valid
regardless of the tool that you use. If you heard about Dimension and have no idea what
it is, or if you haven’t heard about it and you want to build a data model for reporting,
this chapter is what you need to read.
Two Types of Tables
In a reporting data model, there are two types of tables. Before I go further on this, I
have to point out that there is a big difference between a data model build for a
transactional or operational system, compared to the data model build for a reporting
system. In this chapter, we are focusing on a data model for reporting systems. In a
reporting data model, we have two types of tables; Fact table, and Dimension table;
A Fact table is a table that keeps numeric data that might be aggregated in the reporting
visualizations.
A Dimension table is a table that keeps descriptive information that can slice and dice
the data of the fact table.
The definition above, although correct, can lead to creating heaps of different types of
tables and calling those a Dimension or Fact table. You do need to understand where
and how a Dimension table can be created, what are rules of thumb for creating such a
table, and what is a good dimension and what is bad. And that is what we are focusing
on in this chapter.
A Type of Field: Descriptive
A Dimension table is a table that has descriptive information, which is or can be used
for slicing and dicing. A Dimension table has fields that are descriptive. Descriptive
fields can be of datatype text, date and time, or even a number. Let’s have a look at
some examples;
The table below is a table full of information about customers, and they are all text data
type information;
This is a Dimension table because you use the data of this table mainly in visualizations
like this:

Most of the time, you use values from this table in an axis or legend of a chart of
visualization, but not as values. (You might use these as values in the table visual, but
that is an exception because the table visual, will show you everything). In the
screenshot above, the value is SalesAmount (coming from a different table), and it is
always sliced and diced by a field from the Customer table; Gender, EnglishEducation,
Lastname, etc. That brings us to the first part of the definition of the dimension table;

Fields from a Dimension table are always used for slicing and
dicing data of other tables (either through filters or slicers or
through axis or legend of the same visual). We call these fields
as Descriptive fields.
You also found that fields of the Customer table above were all text. That can be
different. Let’s check it out here; we have two new fields in the Customer table now;

The date field (BirthDate), and the numeric field (YearlyIncome) will also be used for
slicing and dicing as the below screenshot shows it;

No matter you have date and time fields or even numeric fields in the customer table,
they are used in the same way as text fields, they are used for slicing and dicing.
The data type of the descriptive fields can be text, date and time,
number, or anything else. The data type is not the important
property of these fields; the nature of use case for them is
important, which is slicing and dicing.

Another Type of Field: Values


Descriptive fields are not the only type of field that we work with it in a data model.
There is also another type of field, which is called as Fact. These fields are not used for
slicing and dicing. These fields are used as VALUEs in the visualization. In all
visualizations above, there was a field that was the Value part of the visual:
SalesAmount.

SalesAmount is not used for slicing and dicing. It is used as a value. In fact,
SalesAmount is sliced and diced by the descriptive fields of Gender, Education, and
Lastname. SalesAmount is not a descriptive field; it is a field that is used as a value and
can be aggregated sometimes (like what you see in SalesAmount by Gender or
SalesAmount by EnglishEducation). SalesAmount is a FACT field.

Fields that are the VALUE part of a visualization (not for table,
and a few other visuals), and sliced and diced by descriptive
fields, often aggregated are FACT fields.

Golden Rule: Fact and Descriptive Fields Not in the Same Table!
Now that you know what a Fact field is and what is a Descriptive field, it is time to
explain the first golden rule: Keep these two types of fields away from each other, in
separate tables. Fact fields should be in a table that does not have Descriptive fields in
it and vice versa. Your first question, I guess, is Why?! Well, let’s see.
Let’s assume that we have a table with both types of fields; descriptive fields and also
fact fields, like below;

The first seven fields are descriptive fields, and the last two (SalesAmount and
Order Quantity) are fact fields. Using a model like the above works fine, and you can
slice and dice the data of the fact fields by the descriptive fields. However, what if you
bring in another set of descriptive fields that are not part of this table? For example,
what if you want to slice and dice the data of the fact fields by some fields from the
Product table, such as ProductName, Color, etc.? Then you end up bringing all those into
this table too, and end up with a super large table that has customers, products, and sales
values! What if, later on, you want to bring in the dates of the transactions, sales
promotions, or store details? Each of these brings a whole lot of fields with it.
Say No to a Single Table Including Everything!
One of the other golden rules of data modeling is to say no to a scenario like the above.
Do not bring everything into one large table with hundreds of columns. There
are several reasons for that; I will just point out a few:
1. Maintenance of such a big table would always be a challenge. Every time you
bring in a new set of fields, you have to combine them into this table.
2. The level of detail stored in this table depends on the fields in it. If you
bring in new fields, you need to store more details! This is also a maintenance
problem.
3. Combining is not always possible! What if you also want to bring a budget into
the same table? Most of the time, it is not possible, because you may not have the
budget data at the level of the customer. You may have it only at the level of the
month and product. Then you end up with multiple very large tables.
Maintenance Challenge
Every time you add new fields, your table needs to be combined with the new table to
have everything in one place!

The level of detail is dependent on the fields in the table! If you add more fields, you
need to store more details.
Combining is Not Always Possible
If the level of detail in one or two fact fields is different, then you cannot easily store
them in one table. You have to split it into two;

This can become another big table, adding dates and more details about
the product, etc., and then you need to create a “many-to-many”
relationship between the two tables, which causes heaps of other issues!
So it is not possible to have one table with everything in it, and as you realized, it is
better to decouple the fact fields from the dimension fields, because expanding the
model then becomes much easier and requires far less maintenance.
Dimension Table
Now it is time to go back to the definition of the dimension table;

A dimension table is a table full of Descriptive Fields and zero Fact fields.

This table needs to be related to other tables, those containing Fact fields, to slice and
dice their data. So this table needs a field that is the unique identifier for each row.
In database design practices, this field is called the Primary Key. The primary key is a
single field, or a combination of fields, in the dimension table that can uniquely identify
each row. For example, the Customer table can be identified with the CustomerKey
column, as below;

This is the field that will be used for the one-to-many relationships between the
dimension table and the fact tables.
The relationship between a dimension table and fact tables is, most of the time, one-to-
many from the dimension to the fact table. There are instances of a one-to-one
relationship as well, but not very often.
So, we need to change our definition a bit now;

The dimension table is full of Descriptive fields, zero Fact
fields, and one or more Key fields. Key fields are row identifiers
of the dimension table.

If you have more than one key field (also called a compound key), note that you cannot
create relationships between tables in Power BI using multiple fields, so you need
to use the workaround I mentioned in chapter five to create a single field for the
relationship.
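One simple form of that workaround, sketched here in DAX with hypothetical table and
column names, is a calculated column that concatenates the key fields with a delimiter;
the same combined column is then created on the fact table side, and the relationship is
built on it:

```dax
-- Hypothetical example: assume each DimGeography row is identified by
-- the combination of CountryCode and CityName. Concatenating them with
-- a delimiter produces a single column usable for a relationship.
GeoKey = DimGeography[CountryCode] & "|" & DimGeography[CityName]
```

The delimiter guards against two different combinations of values producing the same
concatenated key.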
When the Primary Key is Not a Primary Key!
The Primary Key is a column (let’s assume it is only one column for this example) that
is the row identifier. However, what if it isn’t!? Do you recall, in your use cases, a
scenario in which something has to be a primary key, but it can’t be? Let me give you an
example: in a reporting model for bank information, we have a table of credit cards.
Each credit card has a unique number, so it can be used as the primary key.
However, if you look in more detail, a credit card will be renewed every few years,
with a new expiry date, but the same number!

This is an example of a field that should somehow be a primary key (or maybe it is
already a primary key in the source operational system), but not in our model. We call
these fields the Alternate Key, Business Key, or the Primary Key of the source system.
And then, we need to create a new field in the Dimension table that can be a real
primary key. For this reason, the primary key of the dimension table is called a
Surrogate Key, and most of the time, it can be generated as an auto-number index value;

There are other scenarios in which you might end up creating an auto-number surrogate
key for your dimension as well. This was just an example to clarify things for you. When
scenarios like this happen, most of the time you also need to keep the Alternate Key
beside the Surrogate Key in the dimension table for slicing and dicing.
Dimension Tables are Wide
Another golden rule in designing dimension tables is to make them wide. Add fields that
are related to the entity; more fields in the dimension table means more power for
slicing and dicing the data. If you have just the three fields of Customer first name, last
name, and gender, then you can only slice and dice data by these three. However, if you
have a customer table with all the columns below, then you will have much more power
for slicing and dicing;

Dimension tables have many fields, and that is why they are Wide. And wider means
more slicing and dicing power.
Dimension Tables are Shallow
Dimension tables are wide, but often, they are also shallow. They do not have many
rows; they often have less than a million rows. However, there are always exceptions.
For example, if Facebook wants to create a customer dimension, it would have hundreds
of millions of rows for sure. A dimension table does not store transactional data; it
stores descriptive information. Transactional data is what makes the number of rows
grow.
Examples of Dimension Tables
Based on what you have learned so far, you can now name some dimension tables in
your scenario. A table might be a dimension in one model and, with some difference in
the set of fields, a fact table in another model. Here are some samples of
dimensions and their fields;
Customer: CustomerKey, First name, Last name, Full name, Age, Gender, Job title, etc.
Product: ProductKey, Product Number (Alternate key), Product Name, Color, Size, etc.
Store: StoreKey, Name, Address, etc.
Summary
In this chapter, you learned that there are two main types of fields: Descriptive and Fact
fields. These two types of fields should not be in the same table. That is why we have
tables with Descriptive fields and call them Dimension tables. Dimension tables also
have key fields, such as the surrogate key, as a row identifier. The relationship between
a dimension table and fact tables is one-to-many. In the next chapter, I will explain what
the fact table is and how it connects to a dimension table.
Chapter 9: Fact Tables

Fact tables are the core of analysis in a data model. In the previous chapter, I explained
what a dimension table is and why we cannot have everything in one big table. In this
chapter, you will learn about the fact table and how it is positioned in a data model;
you will also learn how fact tables and dimension tables relate to each other to build
a proper data model. The examples in this chapter are built using Power BI. However,
all of these concepts can be used regardless of the technology.
What is a Fact Table?
A fact table is a table full of Facts! If you have read the definition of Fact in the
previous chapter, you know that a fact is a numeric field that usually needs to be
aggregated and will be set as the value part of visualizations. Examples of Facts are
Sales Amount, Order Quantity, Profit, Cost, etc. A fact table is a table full of those
fields, so you may think a fact table looks like the below screenshot.
The screenshot above can be a representation of a fact table based on that definition;
however, it lacks something very important. The data in a fact table should be sliced
and diced by the data of dimensions. For example, you should be able to see what the
Sales Amount was for each product category, for each client, in each store, etc. With the
structure of the table above, you cannot do that. The fact table has another set of fields
too: Keys from Dimension tables.
The screenshot below shows a fact table with keys from dimension tables;

As you can see in the above screenshot, a fact table includes two types of fields: fields
from dimension tables (keys, or surrogate keys, from the dimension
tables), and Facts (numeric and aggregatable fields).

A Fact table is a table in the data model which includes Facts and Keys from
dimension tables.

Usually, fact tables are named based on their main entity of analysis. For example, if the
table above is analyzing sales data, then it can be called FactSales, or simply Sales.
The Grain of the Fact Table
In the below screenshot, I have two Sales fact tables, which are slightly different! The
first one has three dimension keys: ProductKey, OrderDateKey, and CustomerKey.
The second one has more dimension keys; it also has PromotionKey and
SalesTerritoryKey.

The first fact table gives me the power to slice and dice data of sales by Product, Date,
or Customer dimension. However, the second fact table, in addition to those
dimensions, will allow me to slice and dice by Promotion and Sales Territory too. The
Grain of a fact table is the level of details stored in the fact table.
The grain for the first fact table is one record per combination of Product, Order Date,
and Customer.
The grain for the second fact table is one record per combination of Product, Order
Date, Customer, Promotion, and Sales Territory.
The more fields you have in the grain of your fact table, the more dimensions you
are connected to, and that means more power for slicing and dicing. On the other hand,
more fields also mean the number of rows will increase, and you will need more
memory to store the data.

If you build your fact table from the lowest grain (the most
detailed list of dimensions), you always get the ability to expand
easily in the future. Although you should keep in mind that it
means your fact table will have more data rows.
Relationship Between Fact Table and Dimension Tables
Every fact table will be related to one or more dimensions in the model. These
dimension tables are the tables whose surrogate key (or primary key) is part of the fact
table. For example, the FactInternetSales in the below data model is related to
DimProduct, DimCustomer, DimDate, and DimSalesTerritory.

A fact table has a “many-to-one” relationship to every dimension. The direction of the
relationship is from the dimension table to the fact table. In other words, every
dimension can slice and dice the data of the fact table.
Star Schema: The Golden Schema for a Data Model
The relationships between the fact table and the dimension tables around it form the
shape of a star. That is why this model is called the Star Schema: the fact table is in the
center, with dimensions around it.
The Star Schema is one of the most important concepts of a data model for reporting.
Based on experience and research, the Star Schema is the best type of data model for
reporting. Star Schema means that the fact table and every dimension around it
have one single direct relationship. If a dimension is related to another dimension, and
the fact is not directly connected to that final dimension, it is not a star schema anymore;
it is called the snowflake model. The discussion of the Star Schema is a big topic by
itself, and I explain it in one of the future chapters. For now, this is enough to understand
what the star schema is.
Different Types of Fact Tables
Fact tables come in different types, although most of the time, we all use one type.
However, in some scenarios, the other types can be very helpful. Let’s check them out.
The types of fact tables that I explain in this chapter are: Transactional, Periodic
Snapshot, Accumulating Snapshot, and Factless fact tables.
Transactional Fact Table
This is the most common type of fact table. When a fact table is built from a
transactional data table, then the outcome is a transactional fact table. In a transactional
fact table, we have one row per transaction. The grain of this fact table is at the
transaction level.

The grain of a transactional fact table is at the transaction level.

An example of a transactional fact table is the FactSales that you have seen in the above
star schema. This table includes one transaction per row;

Periodic Snapshot Fact Table


If you don’t really need all the details in the transactional table, or the amount of data is
so huge that keeping the transactions doesn’t make sense, and you would, at the end of
the day, only query data at the aggregated level, then you can create an aggregated table
based on some period (monthly, quarterly, yearly, etc.). This type of fact table is called
the Periodic Snapshot fact table. The grain of a periodic snapshot fact table is the
desired period plus the other dimensions.

The periodic snapshot fact table is aggregated on the desired period.

For example, in the stock exchange market, values and rates change every
millisecond. You may not need all those details in the data model; you may create
reports on data by minutes or by hours. In that case, you can create a periodic snapshot
fact table at the desired period.
Here is an example of a snapshot fact table that is aggregated at the month level.
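As a hedged sketch of how such an aggregation could be produced inside the model
(table and column names here are illustrative, and in practice the aggregation is often
done upstream in the source or in Power Query), a DAX calculated table can summarize
the transactional fact to the month level:

```dax
-- A monthly periodic snapshot sketch. Assumes FactSales is already
-- related to DimDate and DimProduct; each output row is one product
-- per year/month combination, which is the snapshot grain.
FactSalesMonthly =
SUMMARIZECOLUMNS (
    DimDate[Year],
    DimDate[MonthNumber],
    DimProduct[ProductKey],
    "SalesAmount", SUM ( FactSales[SalesAmount] )
)
```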
Accumulating Snapshot Fact Table
Some snapshots are created through a process. For example, to create a work order,
first, the work order request has to be raised; then it should be sent to the appropriate
department and manager; then the manager should approve or reject it; and depending on
that action, some other steps might occur. In such a case, every row in the accumulating
snapshot fact table will have the details of one work order.

The accumulating snapshot fact table is great for process-oriented analysis, such as
workflows.

The reason this table is called an accumulating snapshot is that part of the data comes
into the table later. At the time of processing the table, we might not have the entire data
for a workflow or process.
The below screenshot is an example of an accumulating snapshot fact table. As you can
see, not all columns of every row are filled; some part of the data might arrive later.
There are columns showing the different stages of the process.

Factless Fact Table: Bridge Table


After the transactional fact table, I can say this is the most common
type of fact table. A factless fact table is a fact table without any facts! It seems strange;
however, it is like that for an important purpose. A factless fact table is used to check
the NOT part of an analysis. For example, in a model like the star schema above with
the sales table, how can you tell that there was a promotion for a product on a day on
which no sales were made for it?

Factless Fact tables are great for analyzing the NOT existence of the data.

A factless fact table doesn’t have any facts in it; it just has key fields from dimensions.
Below is an example of a factless fact table for promotions on products on different
dates;

As you can see, this table doesn’t have any facts in it at all. It is just a relationship
table that relates the three tables of Product, Promotion, and Date. That is why these
tables are also called Bridge Tables.
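To illustrate the kind of “NOT” analysis a bridge table enables, here is a hedged
measure sketch; it assumes a factless FactPromotion table and a FactSales table that
both carry a ProductKey, with names chosen for illustration only:

```dax
-- Counts products that were promoted but had no sales at all, within
-- the current filter context (for example, a selected date).
Promoted Products Without Sales =
COUNTROWS (
    EXCEPT (
        VALUES ( FactPromotion[ProductKey] ),
        VALUES ( FactSales[ProductKey] )
    )
)
```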
Summary
Fact tables store facts and also keys to the dimension tables around them. The
relationship between the fact table and dimension tables is “many-to-one” and allows
the dimension tables to filter the data in the fact table. The relationships between fact
and dimension tables create a star-shaped schema called the star schema. However,
there are lots of details about the Star Schema, which I will talk about in the coming
chapters. You also learned that there are different types of fact tables, such as the
transactional, periodic snapshot, accumulating snapshot, and factless fact tables.
Chapter 10: Do You Need a Date Dimension?

I’ve heard this question many times: “Do you think we need a date dimension?”. Some
years ago, the date dimension was used more; nowadays, less! The main reason
is that some tools use their own built-in date hierarchy. As an
example, Power BI has a built-in date hierarchy that is enabled by default on top of all
the date fields, which gives you a simple hierarchy of year, quarter, month, and day.
Having this hierarchy often raises the question: do I still need a date
dimension, or can I simply use this built-in date hierarchy? In this chapter, I’ll
answer this question and explain it with reasons.
What is the Date Dimension?
The Date Dimension is a table that has one record per day, no more, no less! Depending
on the period used in the business, you can define the start and the end of the date
dimension. For example, your date dimension can start from the 1st of Jan 1980 and end
at the 31st of December 2030. For every year, you will normally have 365 records (one
record per day), except leap years with 366 records. Here is an example screenshot of
date dimension records;
The Date Dimension is not a big dimension, as it would be only ~3,650 records for ten
years, or even ~36,500 rows for 100 years (and you might never want to go beyond
that). The columns will normally be all the descriptive information about the date, such
as the Date itself, year, month, quarter, half-year, day of the month, day of the year, etc.
The Date Dimension will normally be loaded once and used many times after that, so it
shouldn’t be part of your nightly ETL or data load process.
Why Date Dimension?
The Date Dimension is useful for the scenarios mentioned below;
1. It gives you the ability to slice and dice values by many date attributes such as
week number, half-year, day of the year, etc.
2. It makes the analysis consistent.
3. It gives you the ability to do analysis based on public holidays (Easter Monday,
Good Friday, etc.)
4. There are some extended BI tool functions that need to work with a Date
Dimension.

Let’s look at these reasons one by one:


Powerful Slice and Dice
If you have worked with Power BI, you know that there is a built-in date hierarchy,
which gives you fields such as Year, Quarter, Month, and Day. This hierarchy is
normally added automatically to date fields. Here is an example of how it looks;
This built-in hierarchy is a very good help, but what about the times when you want to
slice and dice values based on something other than these four fields? For example,
questions that you cannot answer with this hierarchy are;
1. Weekday analysis of revenue (by Monday, Tuesday, …., Sunday)
2. Which week in the year generates the least revenue or the most revenue?
3. What about Fiscal? What is the revenue for the Fiscal year?
4. And many other questions.

The fact is, when you use the simple hierarchy of Year, Quarter, Month, and Day, you
are limited to slicing and dicing by only these fields. If you want to step beyond that,
then you have to create additional fields to do it. That’s when a date dimension with all
of these fields is handy. Here are some fields that you can slice and dice by in a
Date Dimension (and this is just a very short list; a real date dimension might have
two, three, or four times more fields than this! Yes, more fields means more
power in slicing and dicing).
You can, of course, answer the questions above without a date dimension. For example,
to answer the first question, you would need to add the Weekday Name, and because
you would need to sort it based on the Weekday number, you have to bring that as well.
To answer the second question, you need to bring in the Week number of the year. For
the third question, you need to bring in fiscal columns, and after a while, you will have
heaps of date-related columns in your fact table!
And the worst part is that this is only one fact table (let’s say the Sales fact table). Next
month you’ll bring in the Inventory fact table, and the story begins again with the date
fields in the Inventory fact table. This leads us to the second reason for the date
dimension: Consistency.
Consistency in Analysis
You don’t want to be able to slice and dice your Sales data by the week number of the
year but not have this feature in the Inventory data! If you repeat these columns in all the
fact tables, you will have big fact tables that are not consistent: if you add a new
column somewhere, you have to add it in all the other tables. What about a change in a
calculation? Believe me, you never want to do it.
Date Dimension, on the other hand, is a dimension that is shared between all fact tables.
You add all fields and calculations here, and fact tables are only related to this. This is
a consistent approach. Here is a simple view of a data warehouse schema with a date
dimension shared between two fact tables.

Public Holidays Insight


In most businesses nowadays, insight into public holidays or special events is an
important aspect of reporting and analysis. Some of the analysis in this area looks like;
Were the sales at Easter this year better than last year? (you don’t want to
calculate Easter in Power BI, of course)
How is the revenue on public holidays compared with weekends?
How were the sales for the store on opening day? (special event)
And many other questions.
One of the best use cases of the date dimension is public holidays and special events. I
have previously shown in another blog post how to fetch public holidays live in Power
BI [https://radacad.com/custom-functions-made-easy-in-power-bi-desktop]. Calculating
some of the public holidays is time-consuming and not easy; Easter, for example,
changes every year. You can also have some fields in the date dimension that mention
what the special day is (opening, closing, massive sale, etc.). Your date dimension can
be enhanced a lot with insight from these fields. Here is an example of public holiday
information in a date dimension;

Extended Functions for Time Intelligence


The BI tools in the market have some extended functions that give you insight related to
dates, known as time intelligence functions. For example, in Power BI, you
can leverage heaps of DAX functions, such as TotalYTD to calculate the year to date,
ParallelPeriod to find the period parallel to the current period in the previous year or
quarter, and many other functions. These functions sometimes need to work with a date
dimension; otherwise, they won’t return the correct result!
Yes, you’ve read it correctly: DAX time intelligence functions won’t work if you don’t
have a proper date dimension. A proper date dimension is a table that has a full date
column in one of its fields and has no gaps in the dates. This means that if you are using
a Sales transaction table that has an OrderDate field as the input to DAX time
intelligence functions, and there is no sales transaction for the 1st of January, then the
result of the time intelligence functions won’t be correct! It won’t give you an error, but
it won’t show you the correct result either. This is the worst type of error that can
happen. So having a Date Dimension can be helpful in this kind of scenario as well.
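For example, assuming a custom DimDate table with a contiguous, gap-free Date
column that is related to the fact table, a year-to-date measure would reference the
dimension’s date column rather than the transaction date:

```dax
-- A sketch of year to date over a proper Date dimension; DimDate[Date]
-- has one row per day with no gaps, so the result is correct even on
-- days with no sales transactions.
Sales YTD =
TOTALYTD (
    SUM ( FactInternetSales[SalesAmount] ),
    DimDate[Date]
)
```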
Do I Need a Date Dimension?
Now it is time to answer the question: “Do I need a date dimension?” The answer is:
Yes. Let me answer it this way: obviously, I can build a BI solution without a date
dimension, but would you build a table without a hammer?! Would you build a BI
solution that has some date columns in each fact table that are not consistent with
each other? Would you overlook having special event or public holiday date insights
in your solution? Would you only stick to year, quarter, month, and date slicing and
dicing? Would you like to write all your time intelligence functions yourself?
Obviously, you wouldn’t. Everyone has the same answer when I explain it all.
I have been working with countless BI solutions in my work experience, and there is
always a requirement for a Date Dimension. Sooner or later, you get to the point where
you need to add it. In my opinion, add it at the very beginning, so you don’t get into the
hassle of rework.
How to Build a Date Dimension?
Now that you want to build a date dimension, there are many ways to build it. The Date
Dimension is a dimension table that you normally load once and use always, because
the 1st of Jan 2017 is always the 1st of Jan 2017! The data won’t change, except for
bringing in new holidays or special event dates. I mention some of the ways to build a
date dimension below (there are many more ways to do it if you do a bit of searching on
Google);
Date Dimension T-SQL Script
You can use a written T-SQL script and load the date dimension into a relational
database such as SQL Server, Oracle, MySQL, etc. Here is a T-SQL example
[https://radacad.com/custom-functions-made-easy-in-power-bi-desktop] I wrote a few
years ago.
Power Query Script
I have written a series of blog posts explaining how you can do the whole date
dimension in Power Query. There is a chapter in this part that explains how to create a
date dimension using the Power Query script.
DAX Calendar Functions
You can use some functions in DAX to generate a date dimension very quickly as well.
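As a minimal sketch (the column list here is illustrative, and a real date dimension
would have many more fields), a calculated table combining CALENDAR and
ADDCOLUMNS can generate the date range used in the earlier example:

```dax
-- Generates one row per day from 1980 to 2030 with a few descriptive
-- columns. CALENDARAUTO can be used instead of CALENDAR to derive the
-- range from the dates already present in the model.
Date =
ADDCOLUMNS (
    CALENDAR ( DATE ( 1980, 1, 1 ), DATE ( 2030, 12, 31 ) ),
    "Year", YEAR ( [Date] ),
    "Quarter", "Q" & ROUNDUP ( MONTH ( [Date] ) / 3, 0 ),
    "Month", FORMAT ( [Date], "MMMM" ),
    "Month Number", MONTH ( [Date] ),
    "Day of Week", FORMAT ( [Date], "dddd" ),
    "Day of Week Number", WEEKDAY ( [Date], 2 )
)
```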
Chapter 11: Power BI Date Dimension, Default,
or Custom?

If you have worked with Power BI for some time, you know that there are two types of
Date dimension: Custom and built-in (Default). It is always confusing which
date dimension is good to use and what the difference is between these two
approaches. Also, based on the type of Date dimension you select, your DAX
calculations may differ a little bit. In this chapter, I’ll explain all you need to know
about the default and custom date dimensions in Power BI and help you to choose the
right one for your Power BI solution.
Power BI Default Date Dimension
Power BI’s default date dimension is still a bit of a mystery for some people. Many
people are still not aware that there is a default or built-in date dimension. This date
dimension didn’t exist at the very beginning of Power BI Desktop (which was July
2015); it came a bit late to the party. The purpose of this default date dimension is to
ease the way that time intelligence calculations work in Power BI and make the whole
experience easier for the user. Let’s see how this dimension works.
Power BI Creates a Default Date Dimension for every single date
field in your dataset
By default (you can change the default configuration), Power BI creates a date
dimension for every single date field in your dataset. The date dimension is a generic
date dimension with normal fields such as Year, Month, Quarter, Day, etc. The
configuration in Power BI Desktop that allows the model to create the default date
dimension is here:
In the Power BI Desktop, File menu -> Options and Settings -> Options
In the Options window, under Current File, Data Load; Time Intelligence: Auto
Date/Time
If you have this item enabled, Power BI automatically creates a hidden
date table for each field in the model that has a date or date/time data type. The reason
it is not easy to find the default Date table is that it is HIDDEN! The main reason
for having this table hidden is that users would face too much confusion if they saw a
Date table for each date data type column. Here is a model created with the default
Date dimension:
In the screenshot above, you see that there are three date fields in the FactInternetSales:
DueDate, OrderDate, and ShipDate. There is a date dimension TABLE for each of these
fields that you cannot see here. But if you have the option in the Power BI settings
enabled, then you can see the Date hierarchy under each of these fields, which shows
there is a table behind the scenes.
Why a Date table for every single Date Field?
Well, now that you know there is a default Date dimension, your second question might
be: why does Power BI create it multiple times, once for each date field?! The reason is
that the Date usually acts like a role-playing dimension. Sometimes it might be the Due
Date, sometimes the Ship Date, and sometimes the Order Date. All of these options are
date fields, but their values and behavior are different. If you have read chapter four of
this book, you know that one of the ways to do the role-playing dimension in Power BI
is to create copies of the Date dimension. This is what Power BI does behind the
scenes automatically. Power BI creates a template for the Date table first, and then
copies it for every single date or date/time field in the model. Here is a glance behind
the scenes of the Power BI model:

Note that you cannot see the view above in the Power BI Desktop. You can use tools
like Power BI Helper [https://radacad.com/power-bi-helper] to get to that information,
though. In the screenshot above, you can see that there is a DateTableTemplate, and then
there are three date tables copied from that template.
What does the Default Date table look like?
There are many variations of the Date dimension in the world, and you may think, what
does the default Date table look like? What columns are available there, and what
columns are not? Here is a list of columns:
The view above can be generated with tools such as Power BI Helper. But if you are
interested to see the list of these column names in the Power BI Desktop, One way is to
see it when you write a DAX expression; after typing a dot (.) after the Date or
Date/Time field name, you get the list of fields;

This might have been a mystery for many people when they write a DAX statement:
what is the list of fields that comes after the dot (.) in front of a date field name? Now
you know the answer;
The Date column is NOT a column from the Power BI model
point of view. It is a TABLE, a hidden table, and because of that,
you can choose which column in that table you want to use
within your expression.

Writing Time Intelligence Calculations with the Default Date Dimension
Writing DAX expressions for time intelligence calculations using the default date
dimension is simpler. You just need to use your Date field name plus a “.[Date]” at the
end. This means that you are using the [Date] field of the hidden Date table for the
expression. For example, here is what a year-to-date calculation looks like:
Sales YTD = TOTALYTD(
    SUM(FactInternetSales[SalesAmount]),
    FactInternetSales[OrderDate].[Date])

If you don’t use the “.[Date]”, then you won’t get the correct result, because time
intelligence calculations need a DATE column to work with; with the “.[Date]”, you
choose the date column in the default hidden Date table.
As you can see, the calculation doesn’t work if you do not include the “.[Date]” in the
expression. But if you include it, then all is good. Writing time-intelligence
expressions using the default Date dimension is very easy, as you’ve seen here.
Default Date Dimension has a Built-in Date Hierarchy
One of the main benefits of using the default Date dimension is the built-in Date
hierarchy of Year/Quarter/Month/Day it provides. Whenever you drag in a Date field,
Power BI automatically shows the hierarchy in the visual, because there is a hidden
Date table with the built-in hierarchy behind the scenes.

Default Date Dimension Consumes Memory


Oh yes! Of course, like any other table in the Power BI in-memory structure, every date table consumes memory. But so would your custom date dimension! Whenever you use the role-playing dimension scenario, you consume even more memory. The main difference is that the Power BI default Date dimension is created even if you do not want to do date-based analysis on a date field. For example, even if you don’t use DueDate in your date-based analysis, Power BI still creates a date dimension for it. You cannot stop it for one field. You have to either stop the default creation of the Date dimension for the entire model or use it for the entire model; you cannot customize it per field. But with a custom date dimension, you can.
Custom Date Dimension
You may think: if the default Date dimension is available, why should I have a custom date dimension? Well, there are some reasons for that. Let’s first check how you can use a custom date dimension.
Mark as Date Table
To use your custom Date dimension, you have to mark your table as a Date table. This is a very important step, because then Power BI knows that this is the table to be used for time intelligence calculations. You can, of course, have more than one table marked as a Date table (for the same reason as role-playing dimensions). If you have your custom Date table, here is how to mark it as a Date table: go to the Modeling tab in the Power BI Desktop, choose your custom Date table, and select Mark as Date Table.

You have to select a full date column as the Date column of the Date table as well. Usually, after this change, if you look at the icons of the Date fields under your custom Date table, you will see them displayed differently (without the default Date hierarchy), which shows the table is now successfully marked as a Date table.
Writing Time Intelligence Calculations with Custom Date Dimension
The prerequisite for this step is to mark the table as a Date table, which we have done in the previous step. Now you can write a DAX expression as easily as below:
Sales YTD = TOTALYTD(
SUM(FactInternetSales[SalesAmount]),
DimDate[FullDateAlternateKey])
As you can see, in this expression, we do not need “.[Date]” to get to the date field. Because there is no hidden Date table for this column, you just refer to the column directly to get the Date field. In fact, if you use “.[Date]” here, you get an error, because there is no hidden Date table.
Default or Custom Date Dimension?
Now that you know all about the default Date dimension and the custom Date dimension, it is time for the main question: what is the difference, and when should you choose which? The answer, as in many other situations, is: it depends! If you want generic date-based analysis and you want to build a model quickly and easily, then the default Date dimension is very helpful. But if you want to do a special date-based analysis (for example, public-holiday-based analysis), then a custom Date dimension gives you more power. Let’s look at the differences in detail:
Modeling with the default Date Dimension is Much Easier
It is true that using a custom Date table means taking care of relationships, marking it as a Date table, creating the default hierarchy, etc. If you use a custom Date dimension, you have to spend more time on all these configurations, whereas the default Date table takes care of everything for you.
If you have a Date field, and you are not using it, Remove It!
When you have a date field, Power BI automatically creates a table for it, creates the relationship of that table to the main date field, and the table consumes memory! If you do not intend to use that field, remove it to save memory. This is also important for any other field that you do not use in the model; remove it for better memory consumption. But it is even more important for Date fields, because behind the scenes you will have not just a field, but also a table.
Customized and Complex Analysis is Easier with Custom Date Dimension
While the default Date dimension is easier for generic date-based analysis, the custom Date dimension is a very powerful option when it comes to customized analysis. As an example, let’s say you want to create a date-based analysis based on public holidays. How can you do that with the default Date dimension? Well, the answer is to create a list of public holidays as a table and merge or join it to the date field, which means creating a kind of custom Date dimension anyway. In this blog post [https://radacad.com/custom-functions-made-easy-in-power-bi-desktop], I explained an example of fetching public holidays for a custom Date dimension in Power BI. Another example is when you need more than the default hierarchy: a weekly hierarchy, a financial calendar hierarchy, and many other scenarios. The default Date dimension is good for generic analysis, but not when it comes to more customization.
Common Mistake to Avoid!
One big common mistake that I’ve seen a lot, which comes from the confusion between the default Date dimension and a custom Date table, is seeing both in one model! Here is what I mean: let’s say you want to use a custom Date dimension, you add it to your model, you create the relationship to the date field in the fact table, and then you DO NOT mark it as a Date table! That is where all the mistakes start. When you do not mark it as a Date table, you allow Power BI to create the default date dimension for the date field even under this table (assuming the creation of default date tables is enabled in the options, which is ON by default). As a result, you have a custom Date table, but you also have a default Date dimension for the date field under your custom Date table! It means you are consuming the extra memory twice! And your DAX expressions also become even more wrong, like this:

Sales YTD = TOTALYTD(
SUM(FactInternetSales[SalesAmount]),
DimDate[FullDateAlternateKey].[Date])
This is wrong! You will get the correct result in the visualization, but doing it this way is wrong. If you are going to use the default Date table, then what is the point of adding an extra custom Date dimension? And if you are going to use the custom Date table, then you HAVE TO mark it as a Date table.
Summary
The date dimension and its behavior in Power BI can be confusing if you don’t know about the default Date dimension and how to use your own custom Date dimension. In this chapter, I explained the differences between these two approaches and when to use each.
Chapter 11: Script to Create Date Dimension in
Power BI using Power Query

Sometimes, you want to create a date dimension very fast, and you just want a script to copy and paste. That’s what this chapter is about: here you will find the full script to create the date dimension in Power BI using Power Query, which you can simply copy and paste into your solution.
Columns Included in the Date Dimension
This date dimension (or you might call it a calendar table) includes all the columns
related to the calendar year and financial year as below;
How to use the Script?
Create a new blank query in Power BI:

In the Power Query Editor window, go to the View tab, and click on Advanced Editor.
Copy and paste the entire script there (replacing the existing text in the Advanced Editor window):

Script
You can download the script for the date dimension from the book materials.
Configuration
You need to configure the Date table based on your need. The first few lines are the
configurations that you can set based on your need;
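To give a sense of the structure of such a script, here is a minimal hedged sketch of a Power Query date table (this is not the full script from the book materials; the configuration values and column names are illustrative):

```powerquery
let
    // Configuration: adjust the year range to your needs (illustrative values)
    FromYear = 2018,
    ToYear = 2020,
    StartDate = #date(FromYear, 1, 1),
    EndDate = #date(ToYear, 12, 31),
    DayCount = Duration.Days(EndDate - StartDate) + 1,
    // One row per day
    DateList = List.Dates(StartDate, DayCount, #duration(1, 0, 0, 0)),
    DateTable = Table.FromList(DateList, Splitter.SplitByNothing(), {"Date"}),
    Typed = Table.TransformColumnTypes(DateTable, {{"Date", type date}}),
    // A few typical calendar columns
    AddYear = Table.AddColumn(Typed, "Year", each Date.Year([Date]), Int64.Type),
    AddQuarter = Table.AddColumn(AddYear, "Quarter", each Date.QuarterOfYear([Date]), Int64.Type),
    AddMonth = Table.AddColumn(AddQuarter, "Month", each Date.Month([Date]), Int64.Type),
    AddMonthName = Table.AddColumn(AddMonth, "Month Name", each Date.MonthName([Date]), type text),
    AddDay = Table.AddColumn(AddMonthName, "Day", each Date.Day([Date]), Int64.Type)
in
    AddDay
```

The full script in the book materials adds many more columns (fiscal year, week numbers, etc.) on top of this same pattern.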

Sample Records
Here are some sample records of this table:

Best Practice Suggestion


Because the Date table is needed not only in one Power BI file but in many others, I suggest creating a dataflow entity for the date dimension. I explain that in a later chapter of this book.
Considerations
There are a few things you need to consider if you are using this script;
This Date dimension does NOT include public holiday information. If you wish to get that, use this approach [https://radacad.com/custom-functions-made-easy-in-power-bi-desktop].
This Date dimension does not support scenarios with fiscal weeks. For those scenarios, some changes need to be applied to the script.
If you want to use this date dimension in multiple Power BI files, consider
using a dataflow (as mentioned in the previous step).
An up-to-date version of the script, with more columns in the date dimension, might be available. You can download the latest version from here [https://radacad.com/all-in-one-script-to-create-date-dimension-in-power-bi-using-power-query].
Chapter 12: Script for Creating TIME Table in
Power BI with Hours, Minutes, and Seconds
Buckets

This is a script that you can simply copy and paste into your Power BI solution to get a Time table with all the commonly needed columns.
Why Time Table?
The date dimension gives you the ability to slice and dice your data by different date attributes, such as year, quarter, month, day, fiscal columns, etc. The time dimension, on the other hand, gives you the ability to slice and dice data at the level of hours, minutes, seconds, and buckets of those, such as every 30 minutes or 15 minutes.
The Time table SHOULD NOT be combined with the Date table. The main reason is the huge size of the combined result. Let’s say your date table, which includes one record per day, has ten years of data in it, which means about 3,650 rows. Now, if you have a Time table with a row for every second, that is 24*60*60 = 86,400 rows just for the time table. If you combine the date and time tables, you will have 3,650*86,400 = 315,360,000 rows. 315 million rows is not good for a dimension table. Even if you store one record per minute in your time table, you would still end up with over 5 million rows.
So don’t combine the Date and Time table. These two should be two different tables,
and they both can have a relationship to the fact table.
Columns Included in the Time Dimension
This time dimension (or you might call it a time table) includes all the columns as
below;

How to use the Script?


Create a new blank query in Power BI.
In the Power Query Editor window, go to the View tab, and click on Advanced Editor.
Copy and paste the entire script there (replacing the existing text in the Advanced Editor window):
Script
You can download the script for the Time dimension from the book materials.
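As a rough idea of the pattern behind such a script, here is a hedged sketch of a Power Query time table at the grain of a second (not the full script from the book materials; the bucket logic and column names are illustrative):

```powerquery
let
    // One row per second: 24 * 60 * 60 = 86,400 rows
    TimeList = List.Times(#time(0, 0, 0), 24 * 60 * 60, #duration(0, 0, 0, 1)),
    TimeTable = Table.FromList(TimeList, Splitter.SplitByNothing(), {"Time"}),
    Typed = Table.TransformColumnTypes(TimeTable, {{"Time", type time}}),
    AddHour = Table.AddColumn(Typed, "Hour", each Time.Hour([Time]), Int64.Type),
    AddMinute = Table.AddColumn(AddHour, "Minute", each Time.Minute([Time]), Int64.Type),
    AddSecond = Table.AddColumn(AddMinute, "Second", each Time.Second([Time]), Int64.Type),
    // Example bucket: the start of the 30-minute window each time falls into
    Add30MinBucket = Table.AddColumn(AddSecond, "30 Min Bucket", each
        #time(Time.Hour([Time]), Number.RoundDown(Time.Minute([Time]) / 30) * 30, 0), type time)
in
    Add30MinBucket
```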
Sample Records
Here are some sample records of this table:
Best Practice Suggestion
Because the Time table is needed not only in one Power BI file but in many others, I suggest creating a dataflow entity for the Time table.
Considerations
There are a few things you need to consider if you are using this script;
This Time dimension is at the grain of a second, which means the table has one record per second.
There are some time buckets already created, including hour buckets of 12, 8, 6,
4, 3, and 2, and minute buckets of 30, 15, and 10. I will add more buckets into
this in the future.
If you want to use a Date dimension as well, use the script from the previous chapter to generate it.
An up-to-date version of the script, with more columns in the time dimension, might be available. You can download the latest version from here [https://radacad.com/script-for-creating-time-table-in-power-bi-with-hours-minutes-and-seconds-buckets].

Here is an example of data analyzed by a couple of columns from this Time table;
Chapter 13: Star Schema and How to Build it

You might have heard about dimensional modeling or star schema, but you may have only a vague idea of what it is; alternatively, you might have used it but are not sure you are doing it the right way. In this chapter of the back-to-basics series, I explain what a star schema is, how you can build it, and what the best and worst practices for it are.
Dimensional Modeling
For every BI or reporting system, you have a process of designing your tables and building them based on that design. This process is called dimensional modeling. Some call it data warehouse design, which is the same thing. Dimensional modeling is the process of thinking about and designing the data model, including tables and their relationships. As you see, there is no technology involved in the process of dimensional modeling. It all happens in your head and ends up as sketched diagrams on paper. Dimensional modeling is not the diagram in which tables are connected to each other; it is the process of producing that diagram.
There are multiple methods of building the diagram of tables and their relationships.
Let’s check these, one by one.
Transactional Database diagram
This is a diagram only useful for transactional systems, where operations such as insert, update, delete, and retrieve happen often. In such systems, it is important that the model is designed in a way that minimizes the effort of doing an insert, update, or delete. We are not talking about that method here. In a reporting system, a transactional database design won’t be helpful; it will make reports slow. So what are the designs for a reporting model?
Snowflake Schema
The diagram of tables can be in all shapes; however, there are two big categories when it comes to designing a diagram for reporting systems: snowflake and star schema. Snowflake is when there are many relationships between tables, and when you have to pass through multiple relationships to get from one table to another. Here is an example:
The path from the FactInternetSales table (which is the fact table) to DimProductCategory (which is the dimension table) requires passing through three relationships. More relationships slow down the performance, and they also bring the need for both-directional relationships. The diagram above is a snowflake schema. In a snowflake schema, there are many intermediate tables to get from one table to another. To remind you of what a snowflake looks like, here is a view of that:
Dimension tables are not directly connected to the fact table. They are related through other dimensions in between.
Star Schema
Star Schema is when there is a direct relationship between the fact table and each
dimension. Each fact table can have a connection (or relationship) to multiple
dimensions, so it will come up with a view like this:

You cannot unsee the star in the layout above. The fact table is the heart of the star, and all dimensions are around it as the points of the star. The star schema might not have exactly five points (a pentagram); it might have more or fewer. The important part of this design is not the number of points of the star. It is the direct relationship between the fact table and every dimension.
Star Schema means the fact table in the heart of the star, and a
single relationship to each dimension around it as points of the
star.

If we want to re-design the model you have seen in the snowflake example, it should
look like this:

In this example, the DimProduct includes all details of Subcategory and Category in it
too. And it is directly related to the fact table.
Star Schema Can Include Multiple Fact Tables
A common misunderstanding is that you should have only one fact table in a star schema. That is true in a sense, but the fact is that you can have a combination of star schemas to build a data model. Look at the example below:
We have two fact tables: FactInternetSales and FactResellerSales. They have some shared dimensions in between (DimProduct and DimDate), and they also have their own separate dimensions (DimCustomer for FactInternetSales, and DimReseller for FactResellerSales). The model above is a perfect star schema design, because each fact table still needs only a single relationship to each dimension. A model like the above gives us the ability to filter both fact tables based on the shared dimensions in between;
The Importance of the Star Schema
Star schema is the best way of designing a data model for reporting. You will get the best performance and flexibility using such a model. A lot of modeling challenges and issues in Power BI implementations these days can be resolved with a good star schema design. Star schema is like the conductor of an orchestra. An orchestra without a conductor can play music, like a Power BI implementation without a star schema. However, the music won’t be synchronized, it won’t be the best it can be, and it won’t be what the audience is after. If the Power BI implementation leverages the star schema (like the orchestra with a good conductor), it plays at its full capacity, answers all the requirements, and stuns the audience.
[image reference:
https://i2.wp.com/www.irishtimes.com/polopoly_fs/1.1664308.1390496586!/image/image.jpg_gen/derivatives/b
ox_620_330/image.jpg]

Star Schema for a Data Model is like a Conductor for an Orchestra. The best analytics outcome is possible with the star schema, like the best music outcome with a good conductor.

How to Design Star Schema


Mastering star schema design can take years of experience. There are some fundamental rules to learn at the beginning, and then there are heaps of tips and tricks along the way. There are exceptions that you need to learn how to deal with. There are whole books on how to design a star schema; it is impossible to cover it all in one chapter. However, I can point out some of the fundamental rules that help you get started.
Define Dimension Tables
A design with everything in one huge table is neither flexible nor performant. Dimension tables are descriptive tables that describe the occurrence of an action (which is the fact). Fields from dimension tables are often used as slicers or as the axis of visuals.
Define Fact Tables
Fact tables are the heart of the star schema, and you have to be careful when you work with them. Fact tables are huge, deep tables that can make your data model very big. They are tables that record the occurrence of an action, such as sales, production, etc. The fact table includes fields that play the VALUES part of visuals and calculations, called facts. A fact is a numeric value that can be aggregated.
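For example, facts from the fact table are typically turned into aggregating measures like these (a hedged sketch; the measure names are hypothetical, and the table assumes the AdventureWorks sample used in this book):

```dax
-- Hypothetical measures: facts are numeric values that can be aggregated
Total Sales = SUM ( FactInternetSales[SalesAmount] )
Order Count = COUNTROWS ( FactInternetSales )
```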

Flatten Dimension Tables


It is important that we keep the star schema design, which means a single relationship between the fact and dimension tables. You cannot get there if you have a dimension for every single field. Having a dimension for Category, another for Subcategory, another for Brand, another for Color, and then one for Product details is not good for a star schema design. That way of modeling is good for a transactional database. You have to combine tables as much as possible (as long as they still represent the same entity) to achieve a flattened dimension.

Do NOT Flatten Fact Table


Although flattening dimension tables is good for a star schema design, the same approach doesn’t work for fact tables! The reason is that the result of flattening is a wider table with everything in it. We do that for dimensions because we want one dimension that serves everything for one entity, and to avoid extra relationships between dimensions. Flattening the fact table will lead us into the trap of having everything in one big, huge table. We will lose both flexibility and performance.

Don’t Expect an Action from Dimension Table


Dimension tables are descriptive tables and should have a single-directional one-to-many relationship to a fact table. However, people often come to me and say: I want to filter my dimension data by the fact table, so I need a both-directional relationship. The reality is that you never want to filter something by the fact table, because the fact table is just a table full of numbers! You want to filter it by something that filters the fact table. So, in the end, you want to filter a dimension by another dimension. If the existing fact table doesn’t give you that option, it is one of these two: either the fact table doesn’t include all the facts, and you can add more facts into it to get the desired output; or the current fact table should not be used for this purpose, and you might need another fact table. This is a tricky part of modeling, and I will dedicate a separate chapter to it later on.

No! You Don’t Have Just One Table!


When I teach about star schema, the dimensional model, and fact and dimension tables in my classes, I hear this a lot: “This stuff is interesting; however, in my case, I just have one single table!” My answer is always: “No! You don’t have just one table! Have a closer look!” Then I go through the table with them. Their table is just one table, but it has descriptive fields and facts all in one place. Of course, it will work for that single table. However, analytics requirements don’t end with that single dataset. After using it for a while, end users will come up with new requirements, saying: “OK, nice, now we want that data to be combined with this data,” and that is when the end of the era for that single-table model comes fast. If the author of the model had thought about such a thing at the beginning, it would have been easily extensible by adding more fact and dimension tables. But now, it means re-doing the whole model. So, here I say it again: “No! You don’t have just one table! Have a closer look!”
Summary
In this chapter, you learned what dimensional modeling is, and that a star schema is the best way to design your model. You also learned that a star schema doesn’t mean one single fact table; it can include multiple fact tables. In the end, you learned a few fundamentals of designing a star schema.
Part 3: Star Schema in Action

Now that you have learned about the principles of data modeling and star schema, you are ready to see some sample models created using those principles.
Chapter 14: Combining Dimension Tables in
Power BI using Power Query

Is it good to have too many dimension tables? Can you combine some of those tables to build one flattened dimension table? How much should you flatten? Should you end up with one huge table including everything? In this chapter, I answer all of these questions and explain the scenarios for combining dimensions.
Sample Dataset
The dataset for this model is the AdventureWorksDW2012.xlsx Excel file. For this
example, we need these tables: DimProduct, DimProductCategory,
DimProductSubcategory, and FactInternetSales.
In the dataset, we have three tables related to Product information; DimProduct has the
details of every single product;
DimProductCategory has information about all categories of products;

DimProductSubCategory is the second level of the hierarchy, where we have subcategories under each category;
As you can see, every record in the subcategory table belongs to a category, or, let’s say, has a value in its ProductCategoryKey column, which is then the key to find the higher-level category in the DimProductCategory table. If we want to design a hierarchy of products, this is how it would look:

Design Challenge
Let’s say we load all of the tables above (plus the FactInternetSales table, which is our transactional sales table) into a Power BI model. This is how the relationship diagram looks:
Note that Power BI automatically creates a couple of these relationships, but you need to create the one highlighted in yellow yourself: connect ProductSubcategoryKey from DimProduct to ProductSubcategoryKey in the DimProductSubcategory table.

The Need for a Both-Directional Relationship, Sometimes
The direction of the filters, as you can see in the diagram above, is from category to subcategory, then to product, and finally to the fact table. If we filter data in the fact table using a column from any of these three tables, slicing and dicing works perfectly; higher levels of the hierarchy filter lower levels of it easily, as below.
However, what about the other way around: we want to filter the higher level of the hierarchy using the lower level! For example, I want to know which product category Mountain Bikes belong to.

As you can see, the visual above isn’t helpful. The product subcategory cannot filter the DimProductCategory table, and the reason is obvious: the relationship between these tables is single-directional, from category to subcategory.
Sometimes, depending on the requirements, the design above leads to the need for both-directional relationships, which is not recommended.

Creating a Hierarchy of all three tables


Let’s say you want to make the report design consistent and create a hierarchy of all three of these fields (category, subcategory, and product), and then use it in all visuals.
Note that it is possible to create a hierarchy of fields directly in each visual by dragging and dropping fields after each other. However, creating the hierarchy in the model makes this process more consistent, and then you can just drag the hierarchy into every visual you want.
The challenge is that you cannot create a model hierarchy between fields that are not in the same table! You may need to create calculated columns in the Product table using the RELATED DAX function to achieve this. That is a lot of work for something really simple.

You cannot create model hierarchies between fields that come from multiple tables.
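If you did want to build the hierarchy without flattening the tables in Power Query, the RELATED workaround mentioned above might look like this (a hedged sketch; RELATED follows the chain of many-to-one relationships, and the column names assume the AdventureWorks tables in this example):

```dax
-- Hypothetical calculated columns in DimProduct, pulling names from the related tables
Subcategory = RELATED ( DimProductSubcategory[EnglishProductSubcategoryName] )
Category = RELATED ( DimProductCategory[EnglishProductCategoryName] )
```

Flattening in Power Query, as shown later in this chapter, avoids the need for these extra columns and relationships altogether.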

Extra Relationships! Why?!


The next challenge of the existing model is that we have so many relationships for something really simple. In this model so far, we have three relationships. In real-world scenarios, you don’t have only four tables; you will have hundreds of tables, and if you follow this approach, you will have hundreds of relationships! Relationships cost processing power when it comes to slicing and dicing. You won’t notice it in small models with a few tables, but you will see the difference when the model grows. If you don’t think much about your data model and just keep adding tables and relationships to it, you will soon end up with a model much more complicated than this:

The Model is Confusing


When you have more tables in your model, it becomes more confusing on the reporting side. There are too many tables carrying product information! Why such confusion for such a simple model?

Where is this design concept coming from?

So the design above has many challenges. Where is this design concept coming from? The design that you see, with one table per attribute or function, comes from transactional database design. In the world of transactional databases, it is important to design the database in a way that makes CRUD (Create, Read, Update, and Delete) operations easy and fast. One of the best ways of doing that is to separate each entity into its own table: category as a table, subcategory as another table, the color of the product as a table, brand as another table, and so on. In such a design, you will probably have more than ten tables storing different pieces of information about the product itself. That design is great for transactional systems, but it does not work for reporting systems. You need a different design for reporting.
Flattening Dimension Tables
In a reporting data model, your approach should be to design a star schema with a fair number of dimension tables. Your dimension tables can include everything about their entity. For example, if you have a Product dimension, it can include the color, brand, size, product name, subcategory, and category of the product, all in one table. This way of designing avoids all the challenges and issues I mentioned earlier in this chapter, and many others too.
So the solution to our design challenge is to flatten the Product table by combining DimProductCategory and DimProductSubCategory with it. Let’s see how it is possible;
Combining or Flattening tables into one Dimension
There are different methods for this flattening process: you can flatten using T-SQL if you have a relational database as a source, you can use DAX, or, alternatively, you can use Power Query, which is the method I explain here.
Go to Edit Queries in the Power BI Desktop;

In the Power Query Editor window, click on the DimProduct table, and from the Combine section, select Merge Queries;

In the Merge Queries window, check that the DimProduct is the first table, and then
select ProductSubcategoryKey from it, then select DimProductSubcategory as the
second table, and also the ProductSubcategoryKey column in there too, and then click
on OK.

Merge is an operation which flattens two tables based on matching field(s). I have written in full detail about what Merge is and how it differs from Append [https://radacad.com/append-vs-merge-in-power-bi-and-power-query], and also about the different join kinds in merge operations [https://radacad.com/choose-the-right-merge-join-type-in-power-bi]. In this operation, the default join kind works for us, but make sure to read about the difference between the join kinds and their sample outputs here [https://radacad.com/choose-the-right-merge-join-type-in-power-bi].
The Merge operation will create a new column in the DimProduct table (at the end of all columns), which can then be expanded into the details coming from DimProductSubcategory. This way, we are combining product and subcategory together;

I only selected EnglishProductSubcategoryName and ProductCategoryKey (the latter is needed to merge with DimProductCategory at the next step). Here is the result;

As you can see, the two columns above are now part of the Product table.
We do the Merge operation one more time, this time selecting DimProductCategory as the second table and using the ProductCategoryKey in both tables to merge.

We then expand it to the details of DimProductCategory, which in this case is only the category name column.
Finally, we will have all the category and subcategory details in DimProduct, as below;
You can see that after the second merge, I also deleted the ProductCategoryKey column, which is not needed anymore.
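The UI steps above generate Power Query code behind the scenes; here is a hedged sketch of what the resulting M might look like (step names are illustrative, and the column names assume the tables in this example):

```powerquery
let
    Source = DimProduct,
    // Merge with the subcategory table on ProductSubcategoryKey (left outer join)
    MergedSub = Table.NestedJoin(Source, {"ProductSubcategoryKey"},
        DimProductSubcategory, {"ProductSubcategoryKey"}, "Sub", JoinKind.LeftOuter),
    ExpandedSub = Table.ExpandTableColumn(MergedSub, "Sub",
        {"EnglishProductSubcategoryName", "ProductCategoryKey"}),
    // Merge with the category table on ProductCategoryKey
    MergedCat = Table.NestedJoin(ExpandedSub, {"ProductCategoryKey"},
        DimProductCategory, {"ProductCategoryKey"}, "Cat", JoinKind.LeftOuter),
    ExpandedCat = Table.ExpandTableColumn(MergedCat, "Cat", {"EnglishProductCategoryName"}),
    // The key column is no longer needed after the second merge
    RemovedKey = Table.RemoveColumns(ExpandedCat, {"ProductCategoryKey"})
in
    RemovedKey
```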

Do not load intermediate tables


It is now important to set the “Enable load” property of the two intermediate tables (DimProductCategory and DimProductSubcategory) to unchecked. These two tables do not need to be loaded into the Power BI model directly. They feed their data into DimProduct, and that is the only table we need in the model. To learn more about the Enable Load option and the performance tips around it, read my article here [https://radacad.com/performance-tip-for-power-bi-enable-load-sucks-memory-up].

Now, if you load the data into the Power BI model, you will have a simple model such
as below;
You can easily get slicing and dicing working correctly;

You can also build a model hierarchy and use it everywhere you want in the report;
How about other tables? Should I flatten all dimensions into one?
The method above was helpful, so you might think: let’s do that for the other tables. Why not combine customers and products together, as an example? Well, flattening has a golden rule: flatten only as long as the entities have a meaningful relationship with each other! The product category and the product have a meaningful relationship: every product belongs to a category. However, products and customers don’t have a relationship unless sales transactions are made! We already have a sales transaction table, which is the heart of this model; flattening dimension tables into it would make it huge, and would also reduce our flexibility if we later want to connect another table to product or customer. So in this scenario, we will not flatten customers and products. We will keep them as their own dimensions, each with a relationship to the fact table, building a star schema.
Summary
In summary, flattening dimension tables avoids a lot of future challenges in the reporting system. It is recommended to flatten attributes into one big dimension per entity. The Product dimension can hold all the information about color, size, brand, product number, category, etc., and the Customer dimension can hold everything about the customer’s job title, education, age, etc. In this chapter, you learned some of the challenges that arise when dimensions are not flattened, and how you can use the Power Query Merge command to combine them.
Chapter 15: Creating a Shared Dimension in
Power BI Using Power Query

What is a shared dimension, and why do you need one in your Power BI model? In this chapter, I will explain how it can prevent many issues, including the need for a both-directional or many-to-many relationship.
Sample Dataset
To follow the example in this chapter, use the Excel file named “Shared dimension data source.xlsx”. In this sample dataset, we have three tables as below;
The Inventory table shows the inventory details on each day in each warehouse.
The Sales table shows sales transactions.

The Manufacturing table shows summarized information about the cost of producing each product on each date.
Design Challenge
When you load the three tables above in a model, you can see that they are not related to
each other.

Many-to-many Relationship Issue


If you try to create the relationship yourself based on Product, for example, between the two tables Inventory and Manufacturing, you get a message pop up about the need for a many-to-many relationship!
Don’t click on creating the relationship. A many-to-many relationship is not an ideal type of relationship to use. The reason the relationship can only be created as many-to-many is that the Product column does not have unique values in either of these tables. In fact, there is no single product table with a unique list of products, so there is no table that can act as the “one” side of a one-to-many relationship.
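To make this concrete, here is a tiny sketch (in plain Python, with made-up sample values, not anything Power BI runs) of the uniqueness check that decides whether a column can sit on the “one” side of a relationship:

```python
# Neither table's Product column is unique, so neither table can be the
# "one" side of a one-to-many relationship; the modeling tool falls back
# to suggesting many-to-many. Sample values are made up.
inventory_products = ["Bike", "Bike", "Helmet"]        # one entry per inventory row
manufacturing_products = ["Bike", "Helmet", "Helmet"]  # one entry per manufacturing row

def is_unique(values):
    # A column qualifies for the "one" side only if no value repeats.
    return len(values) == len(set(values))

print(is_unique(inventory_products))      # False -> cannot be the "one" side
print(is_unique(manufacturing_products))  # False -> cannot be the "one" side
```

A shared Product dimension, built later in this chapter, is exactly the table for which `is_unique` would return True.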

When neither of the two tables on the two sides of a relationship has unique values in the relationship field, a many-to-many relationship will be suggested. However, it is not recommended to use it. Read the rest of this chapter to learn how to fix it using a shared dimension.

Both-directional Relationship Issue


Another issue usually happens when you do have a unique list of values, but you still want to slice and dice based on both tables. Let’s say you want to create the relationship between the Inventory table and the Manufacturing table, but this time based on the Date field.
Because we want to slice and dice the Inventory data by dates selected from the Manufacturing table, and vice versa, the relationship needs to be both-directional.

A both-directional relationship is usually needed when you want to use fields from both tables for slicing and dicing. A both-directional relationship has a significant effect on performance and is not recommended. Read the rest of this chapter to learn how it can be avoided using a shared dimension.

Another issue with the both-directional relationship is that you cannot apply it to all
relationships in your model because it might create a circular reference scenario!
Master List Does Not Exist!
The third design issue with the three tables above is that there is no master list! Each table has some products, but no table necessarily has all products. Likewise, each table has some dates, but no table necessarily has all dates (Power BI creates an auto date dimension that can resolve this issue for Date fields only, but what about other fields such as Product?).
To explain the issue, I’ve created both-directional relationships between all three tables to make sure they all filter each other. All of the relationships are based on the Date fields.

Then I created a slicer from the Date field of the Sales table, and one table visual from each of the three tables. The date slicer should be able to filter all three tables based on the Date selection;
If I select a value in the date slicer, it filters all three tables (because of the both-directional relationships). However, if you look closely, the two dates highlighted in the above screenshot, and some other dates in the Inventory and Manufacturing tables, don’t exist in the slicer. That is because the date slicer comes from the Sales table, and the Sales table doesn’t have those dates in it! We would have the same challenge with a Product slicer if we added one.
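The underlying problem is simple to state: a slicer only offers the values present in its source table, so any value that exists in the other tables but not in that source can never be selected. Sketched with made-up dates (plain Python, purely illustrative):

```python
# A slicer sourced from the Sales table can only show dates that appear
# in Sales; dates unique to Inventory or Manufacturing are unreachable.
sales_dates = {"2020-01-01", "2020-01-02"}
inventory_dates = {"2020-01-01", "2020-01-03"}
manufacturing_dates = {"2020-01-02", "2020-01-04"}

all_dates = sales_dates | inventory_dates | manufacturing_dates
unreachable = all_dates - sales_dates   # values the slicer can never select
print(sorted(unreachable))  # ['2020-01-03', '2020-01-04']
```

A shared Date dimension is the union (`all_dates` here), which is why sourcing the slicer from it makes every date reachable.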

If you use a field as a slicer that doesn’t have all possible values
in it, then it cannot show the right data from all tables. It is not
recommended to design it this way. Read the rest of the chapter
to learn how a shared dimension can fix this challenge.

Shared Dimension: Solution


I just mentioned three of the challenges you might have with a design like the above. In reality, you won’t have just three tables; you will have many more, and many more challenges with such a design. The best practice for designing such a model is to create a shared dimension: a dimension that is shared between multiple fact tables.
Now, in the design above, what are our dimensions? Date and Product. What are the fact tables? Sales, Inventory, and Manufacturing. The challenge is that there is no dimension table; the dimension fields live inside the fact tables, and this creates inconsistency in the design. What you should do is build Date and Product as two separate tables. Because these tables will slice and dice all of the fact tables and will be related to all of them, we call them shared dimensions.
A design sketch of the tables above with shared dimensions would look like this:

Creating Shared Dimension


Now that you know what a shared dimension is and how it can be helpful, let’s see how we can add it to our design. You can build the shared dimension in many ways: using DAX calculated tables, using T-SQL (if sourced from a database system), or in Power Query. Because Power Query works regardless of the data source, and because data transformation steps are better done in Power Query than in DAX, I am going to show you how to do it in Power Query.
Go to Edit Queries in the Power BI Desktop;
Prepare sub-tables
In the Power Query Editor window, right-click on the Inventory table, and create a reference from the query;
Reference creates a copy of the existing query, pointing to the existing query, which can now have extra steps of its own. In the new query, right-click on the Product column and remove all other columns.
The new query should now look like this:

Right-click on the Inventory (2) table and uncheck the Enable Load option for it. This saves memory by avoiding loading extra tables into the Power BI Desktop model.
Do the same process now for the other two tables, Manufacturing and Sales;
create a reference from each table
only keep the Product column and remove the other columns
uncheck Enable Load on the new query

You should now have the new three tables with one Product column only in each:
Set all column names to be the same
The next step is to make sure the column names are exactly the same. Because we are going to append the three tables, different names would create extra columns. Names should be an exact match, and remember that Power Query is case-sensitive; “product” is different from “Product” in the Power Query world. In our sample model, the Inventory and Manufacturing tables have the column named Product, but in the Sales table, it is called Product Name; rename it to Product.

Append all three tables


Append all three tables together to create one table with all Product values in it. If you would like to learn more about Append, read my article here [https://radacad.com/append-vs-merge-in-power-bi-and-power-query].
Then, in the Append command window, select “Three or more tables,” and add all the new tables;

The output of the append is one table including all Product values; you can rename this query to Product.
Because this is a table you want loaded into Power BI Desktop, make sure its Enable Load is checked. This table is our master list, including all product values. However, it contains duplicate values that have to be removed.
Remove Duplicates
A dimension should have a unique list of values, so we need to remove duplicates for
the key field here.
Before using Remove Duplicates, make sure to read this article [https://radacad.com/remove-duplicate-doesnt-work-in-power-query-for-power-bi-here-is-the-solution] about important tips you need to know first. In a nutshell, because Power Query is case-sensitive, and because trailing spaces and other special characters in text values may cause duplicates to survive, this is how you remove duplicates in a few steps;
Clean transformation
Trim transformation
Transform to Upper Case
Remove Duplicates
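Putting the append and the cleanup steps together: conceptually, the shared dimension is the union of the key column from every fact table, normalized before duplicates are removed (because Power Query compares text case-sensitively). A plain-Python sketch of that pipeline, with made-up product names; the `clean` helper only approximates what Power Query’s Clean transformation does (stripping control characters):

```python
import unicodedata

# Build the Product shared dimension: append the Product column from
# all three fact tables, then Clean -> Trim -> Upper -> Remove Duplicates.
inventory = ["Bike", "helmet "]        # note the trailing space
sales = ["BIKE", "Gloves"]             # note the different casing
manufacturing = ["Helmet", "Gloves"]

appended = inventory + sales + manufacturing           # Append step

def clean(text):
    # Drop control characters (an approximation of Power Query's Clean).
    return "".join(ch for ch in text if unicodedata.category(ch)[0] != "C")

normalized = [clean(p).strip().upper() for p in appended]  # Clean, Trim, Upper

product_dim = sorted(set(normalized))                  # Remove Duplicates
print(product_dim)  # ['BIKE', 'GLOVES', 'HELMET']
```

Without the normalization, "Bike", "BIKE", and "helmet " would all survive deduplication as separate rows, which is exactly the trap the article above warns about.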
Now you have your Product shared dimension ready! Repeat this process for any other
shared dimensions with the relevant fields. However, for the Date table, we would do it
differently.
Date Dimension
You learned in an earlier chapter how to create a date dimension using Power Query
script.
Best Practice Design: Star Schema and Shared Dimensions
After loading the tables above, you can create single-directional one-to-many relationships from the Product and Date tables to all the fact tables. This is the final design based on our example;
The above design uses two shared dimensions and avoids all the challenges mentioned;
It doesn’t need both-directional relationships
It doesn’t need many-to-many relationships
Product and Date tables are master tables which can be the source of any
slicing and dicing
To make sure that incorrect fields won’t be used for slicing and dicing, hide the Date and Product columns in all three fact tables as below;
This solution can now have proper reporting capabilities as below;
Summary
Nothing is worse than a bad data model design. A bad design forces you into relationships that decrease performance, it makes you write a lot of unnecessary DAX expressions to compensate for the wrong design, and at the end of the day, it performs slowly. In this example, you learned one of the most fundamental principles of designing Power BI data models. Using shared dimensions in your model avoids both-directional and many-to-many relationships. You learned how easy it is to create such a dimension. This method can be used in all of your Power BI data models.
Chapter 16: Combine Tables or Create
Relationships?

Despite all these chapters about the importance of star schema and the design principles of fact and dimension tables, you may still ask: “When is it good to combine tables, versus loading them into Power BI and creating a relationship between them?” This chapter is going to answer that question.
Relationships
In Power BI models, you can load tables into the model and create relationships
between tables. The sample below is an example of that type of relationship;
Combining tables
You can also combine some of those tables together and have a simpler model with
fewer relationships, like the example below;
To combine tables with each other, you can use the Merge [https://radacad.com/append-
vs-merge-in-power-bi-and-power-query] transformation in Power Query.
Relationship or Combined?
So, the question for this chapter: should you combine the Product, ProductCategory, and ProductSubcategory tables into one Product dimension? Or should you leave them as they are, and create relationships between the tables in Power BI? Let’s talk about the pros and cons of each method.
Performance
One of the distinguishing features of a reporting system is its performance. If you build a reporting system, and loading a report takes five minutes, then what is the benefit of the report over doing the work in the operational system? It would have taken about that long in the operational system anyway!
When you build a reporting system, you should aim for reports that load fast. Extra relationships don’t help with performance; the DAX queries behind the scenes become longer.
If I have a visualization like this:
The DAX query generated by this visual is different when tables are combined or when
they are separately connected using relationships.
This is the DAX query when the product tables are separate and connected with relationships:
// DAX Query
DEFINE
  VAR __DS0Core =
    SELECTCOLUMNS(
      KEEPFILTERS(
        FILTER(
          KEEPFILTERS(
            SUMMARIZECOLUMNS(
              'DimProductCategory'[EnglishProductCategoryName],
              'DimProductSubcategory'[EnglishProductSubcategoryName],
              'DimProduct'[EnglishProductName],
              "CountRowsDimProduct", CALCULATE(COUNTROWS('DimProduct'))
            )
          ),
          OR(
            OR(
              NOT(ISBLANK('DimProductCategory'[EnglishProductCategoryName])),
              NOT(ISBLANK('DimProductSubcategory'[EnglishProductSubcategoryName]))
            ),
            NOT(ISBLANK('DimProduct'[EnglishProductName]))
          )
        )
      ),
      "'DimProductCategory'[EnglishProductCategoryName]",
        'DimProductCategory'[EnglishProductCategoryName],
      "'DimProductSubcategory'[EnglishProductSubcategoryName]",
        'DimProductSubcategory'[EnglishProductSubcategoryName],
      "'DimProduct'[EnglishProductName]", 'DimProduct'[EnglishProductName]
    )
  VAR __DS0PrimaryWindowed =
    TOPN(
      501,
      __DS0Core,
      'DimProductCategory'[EnglishProductCategoryName],
      1,
      'DimProductSubcategory'[EnglishProductSubcategoryName],
      1,
      'DimProduct'[EnglishProductName],
      1
    )
EVALUATE
  __DS0PrimaryWindowed
ORDER BY
  'DimProductCategory'[EnglishProductCategoryName],
  'DimProductSubcategory'[EnglishProductSubcategoryName],
  'DimProduct'[EnglishProductName]
If tables are combined, the code is like below:
// DAX Query
DEFINE
  VAR __DS0Core =
    SUMMARIZE(
      'DimProduct',
      'DimProduct'[EnglishProductCategoryName],
      'DimProduct'[EnglishProductSubcategoryName],
      'DimProduct'[EnglishProductName]
    )
  VAR __DS0PrimaryWindowed =
    TOPN(
      501,
      __DS0Core,
      'DimProduct'[EnglishProductCategoryName],
      1,
      'DimProduct'[EnglishProductSubcategoryName],
      1,
      'DimProduct'[EnglishProductName],
      1
    )
EVALUATE
  __DS0PrimaryWindowed
ORDER BY
  'DimProduct'[EnglishProductCategoryName],
  'DimProduct'[EnglishProductSubcategoryName],
  'DimProduct'[EnglishProductName]

The second query is much simpler, and it performs faster when you have many large tables.
Simplicity vs. a Mess
Another big determinant of a good reporting model is how easy it is to understand for the people building visualizations. If a data visualizer faces 500 inter-related tables in a model, he or she will have a hard time understanding it.
Having fewer tables and relationships always helps keep the model simple to understand.
The complexity of the relationship
Relationships come with a cost of complexity. You might sometimes need a both-directional relationship, or you might run into the problem of an inactive relationship and have to resolve it. Fewer unnecessary relationships mean fewer problems of this type.

Avoid this trap: A big table including everything


The items above might lead you to think that a model with no relationships at all is the best. No, it is not. If you combine everything into one table, you will have other problems.
When everything is in one big table, changes become really hard. Every new table has to be combined into it. What if there is a table that you want to use in multiple Power BI files? How would you reproduce the steps that generate that table when everything is part of one big table?
Star-Schema
Here is the answer to the question of this chapter. You should not have tons of
relationships, and you should not create one big table with everything in it. So what is
the solution then? The answer is to design a totally different schema. Design a new
layout of tables and relationships. Design something that is different from the operating
system’s database. Design something that helps to get the best of both worlds.
This model has many names; dimensional model, star schema, data warehouse, etc.
Through the process of creating this model, you combine some of the tables
(dimensions) and keep some of them to be connected through the relationships
(dimension to fact table relationships).
Build it from your existing tables
What if you don’t have a reference table that you can use as a dimension? What if there
are only mapping tables in your data source? Many have faced the same challenges. If
you don’t have a reference table, you can create them from your mapping tables.
Here is an example, in which I explained how you could build reference (dimension)
tables;
Just because you can, it doesn’t mean you should
Just because you can load all the tables into the model and then create relationships, it doesn’t mean that is the best thing to do. Just because you can merge all the tables using Power Query, it doesn’t mean you should do it.
The process of data modeling is not based on CAN DOs; it is based on SHOULD DOs.
Power BI modeling starts with a pen and paper
It is very important to start your modeling with a pen and paper. Yes, no tools or services are needed, just a pen and paper. You need to draw the dimensional model, the layout of the tables. You need to list the fields in your dimension and fact tables. All of this is much better done with a pen and paper. You can then take the next step of implementing it in Power BI.
A good model
A good model has to consider many things. It starts with the requirements: how are you going to design the database diagram of a reporting system when you don’t know what values should be shown in the report?
A good model also requires an understanding of the existing data and how that data can be transformed into the star schema model.
Don’t build your model on the rule of “this is the format I get the data in, so I can’t really do much about it.” That is wrong! You need to transform that format.
My report works just fine; why bother?
Why bother if your report is running just fine? Or if it doesn’t have any performance problems? Do you really need to think about the model if you have only a few tables, and everything works as it should? Why spend time on something the users won’t see?
Yes, you need to work on your model. If you didn’t do it from the beginning of your implementation, do it now. Your model gets bigger and bigger over time. You will have more tables, more relationships, bigger challenges.
Building a proper data model is like building the base for a 100-story building. Would you first build the 100 stories and then come back to fortify the base? Of course not. You should do it as soon as possible, or the whole building might collapse at some point.
Your model, your world
Your Power BI data model is your world. It is up to you to decide which world you want to live in. Do you prefer a model so complicated that every time someone asks for a change, it ends up taking hours and often days of work because it wasn’t designed the way it should be? Or do you prefer a model that is high-performing, simple, easy to change, and simple to integrate? That is your choice; your model, your world.
Chapter 17: What Fields to Hide in Your Power
BI Solution?

There are some ways to tidy up a Power BI solution, and one of them is hiding fields from the report view. This option, although simple, has a significant impact on making your Power BI solution tidy, clean, and easier to maintain. However, you might wonder which fields should be hidden in Power BI. In this chapter, I will explain how to hide a field in the report view, and then which fields are good candidates for this option.
Hide in Report View
When you have too many fields in the Power BI model, the report tab (or visualization tab) becomes a nightmare to navigate. You might look for a field and find multiple instances of it, or, on the other hand, find fields that are not important for visualization purposes anyway. Here is an example;
There are two ProductKey fields, which confuses users as to which one to use for the visualization. There are also many key columns in the below report, which don’t really make sense to the user: what is a key column, and what is it used for?
When a field is not needed in visualizations, we can simply hide it from the report view. This option is available with just a right-click on the field and choosing Hide in Report View.
This option is available in all tabs in Power BI Desktop (Report, Data, or Model tab).
However, the label for that option in some tabs is “Hide in report view”; in some others
is just “Hide.” When you use that option, you would still see that field in the Model and
Data tab, but greyed out as below:

However, you won’t see that hidden field in the Report tab anymore:
This is how you can use the Hide option to clean up your model.
Hidden Field can be used in a Calculation
It is important to know that a hidden field can still be used in a calculation. So if you don’t use the field directly, you can hide it and use the calculation generated from it in the visualization.
Hiding Multiple Fields
You can also easily hide multiple fields in Power BI Desktop. The best place to do that
is to go into the Model tab. Then select all fields that you want to hide (using Ctrl or
Shift with selection), and then in the Properties tab beside it, turn the Is Hidden property
to On.
This option was not previously available, so you had to use Shift+F10 to select multiple
items and then set this property for all of them, but luckily the new Model tab is capable
of doing that.
What Fields Have to be Hidden?
Now that you know what hiding fields is, and how to hide one or multiple fields, let’s see which fields are good candidates to be hidden. Let’s first start with fields that should NOT be hidden!
Do NOT Hide
Obviously, if you are using a field in any visualization, then you should not hide it!
That brings us to the conclusion that if you do not use a field in visualization, then you
can hide it. Well, there is a big exception:
Consider Remove before hiding
If you have a field that you do not use in any visualization, but that you also don’t use in any calculations, relationships, or anything at all in the Power BI report, then that field should not be hidden; it should be removed from the model. You can use Power Query to remove any unnecessary fields. If there is a field that you do not use in the model at all, don’t load it into the model: every field consumes memory, and making fields hidden won’t help the performance of your model.
Hide These Fields
If there is a field that you need to bring into the model for any reason, but that won’t be used in visualization, then it is a good candidate to be hidden. Here are some fields of that type:
Fields used in a Relationship
Fields used to Sort Other Fields
Fields used in Hierarchy
Fields used in DAX Calculations, but not in visuals directly
Fields used in a Relationship
In the example below; DimProduct and FactInternetSales are related to each other using
ProductKey:
We cannot remove the ProductKey from this model because we need it to connect the two tables. However, for users, the ProductKey is unnecessary, because they will use other Product columns such as Product Number, Name, Size, Color, etc. So the ProductKey fields are good candidates to be hidden from the report view (in both tables).
Fields used to Sort Other Fields
For example, in the below Date table, the EnglishMonthName column is sorted by the
MonthNumberOfYear column;
The MonthNumberOfYear column is needed in the model for sorting, but most probably not for a visual itself, so it is a good candidate to be hidden.
Fields used in Hierarchy
When you have a hierarchy of fields, most of the time you don’t need the individual fields outside it, because you can drag and drop them individually from the hierarchy if you want. Leaving them unhidden will confuse users in the report view:
Fields used in DAX Calculations, but not in visuals directly
There are many times that you don’t use a field directly in a visual, but you create
variations of calculations from it, and then use those calculations in visuals in Power
BI. Those fields are good candidates to be hidden.
Tools Can Help: Power BI Helper
Power BI Helper [https://radacad.com/power-bi-helper] is a FREE tool that can help you determine which fields are used in the report’s visuals; you can then hide the rest, which helps tidy up your model.
Summary
Hiding fields in Power BI is very simple but effective for cleaning up your data model. In this chapter, you learned how to hide one or multiple fields in Power BI, and which fields are good candidates to be hidden. Now it’s time to go and check your Power BI model to see if you have any fields that need to be hidden. You can always use a tool such as Power BI Helper to find them more easily.
Chapter 18: Build Your First Star Schema Model
in Action

Dimension and fact tables are the two essential parts of building a data model, and relating them in the form of a star schema is the best practice of modeling. However, these are all theories until you experience them in a real-world example. In this chapter, I want to put everything you learned in the past few chapters into practice and show you how to design a star schema for a dataset. I will be using a movie dataset for this example.
Sample Dataset
The sample dataset used here comes from two sources;
The movies’ sales information is fetched from the Box Office Mojo website [https://www.boxofficemojo.com/chart/top_lifetime_gross/?area=XWW].
The movies’ rating information is fetched from the IMDB [https://www.imdb.com/chart/top/].
If you are interested in how to do the data mashup and preparation for this dataset, read my blog here [https://radacad.com/get-started-with-power-query-movies-data-mash-up].
However, you don’t need to do the data preparation steps. This dataset is ready to use. Start with the Power BI file in the book materials named “Movies.pbix.”
The Movies Sales information is a data table like this:

Not Everything in One Table


Let’s start the modeling. The Movies Sales table, as you see in the above screenshot, has everything in one table. Obviously, you can load it into Power BI, start slicing and dicing and building visuals based on it, and it would work fine. However, that is where all the mistakes start. Say you load this table as is into Power BI, and after a few months, you realize that you also have another table somewhere, such as the IMDB rating table below (this table is called Movies Rating);
Most of the time, the problem first appears when you want to slice and dice both tables by one field. Let’s say you want to see the average rating of movies in a year, and also see the total sales of movies in that year. Because the values are in two different tables, you need to create a relationship. And for creating the relationship, you need a table with unique values of that field; otherwise, you might create a many-to-many relationship, which might also need a both-directional relationship. You get into a lot of trouble that would not happen if you had shared dimensions.
The process above is complicated enough with just one field between two tables. Think about what happens if you also want the movie title as a common slicing and dicing field. What if there is also a third table? This can easily become a headache, and all of that frustration happened because we loaded the initial table as is, without doing any modeling on it. So, it is fair to say again: there is no single table of data. You have to build a model for it.

There is no all-in-one single-table model. You have to build the model for it.

Fact Table
Let’s break our Movies Sales table into facts and dimensions. For reference, here is the Movies Sales table again;

Facts are the numeric values that might be aggregated at some stage. The facts in this table are Lifetime Gross (sales) and Rank. Rank can be a fact, but it can also be a dimension attribute; let’s leave it in the fact table for now. The fact table is usually built based on the existing main table we have, so we can call the Movies Sales table FactMoviesSales.
FactMoviesSales will have these facts in it:
Lifetime Gross
Rank
Dimension Tables
On the other hand, there are fields that we use as the slicer, filter, or axis of our visuals, such as the Year, the Studio, and the Title of the movie. Now that you know what the descriptive fields are, you can derive dimensions from them. However, that does not mean putting all of those fields into one single dimension, nor putting every single field into its own dimension. Both are wrong. You should find out which fields together build a dimension.
The descriptive fields are:
Year
Title
Studio

If a movie were made across multiple years, there would be multiple instances of the movie, one per year. On the other hand, the year of production is part of the movie’s attributes anyway. So we need a dimension table with these fields in it. Please note that you don’t always get all descriptive fields combined into one dimension; it depends on the scenario.
To build the Movie dimension, create another copy of the fact table. This can be done using Duplicate (not Reference, because we might need to merge this table back into the fact table later on).
The new Movie dimension, which I’m going to call just Movie, will have only the descriptive fields. You can remove all other columns.

You also need to make sure that your dimension table doesn’t have duplicate entries. So select the combination of all three fields, and then Remove Duplicates.
Note that in Power Query, Remove Duplicates is sensitive to upper case and lower case, and also to characters such as spaces at the beginning or end of text values.
After doing all these actions, you have the Movie dimension ready. However, there is still one more step left.
still one more step left.
Linking Dimension(s) and Fact Tables
When you bring multiple tables into Power BI, you will need to create a relationship
between those, and the relationship can only be created based on one field in each table.
In our case, we have three fields to connect to (Title, Studio, and Year). We need to
create a single field for the connection. There are two methods to do that; one is to
create a Key column for the dimension table; this can be an auto number or Index.
Another is to create a concatenated field. Because Power BI, at the end of the day, will
apply compression on the fields, there is not much difference between the two methods
in this example. However, if you have a really long text field, where concatenation could
cause issues, then the key column can be a better approach.
Concatenating fields is easy, so I’ll show you the Key Column method instead. I add a
new Index column to the Movie table;
This column is going to be what we call a Surrogate Key in the dimension table; let’s
call it MovieKey.
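The dimension-building steps above can be sketched in M as follows. This is a minimal sketch, assuming the fact query is named FactMoviesSales and the descriptive fields are Title, Studio, and Year:

```m
let
    // Start from the fact query (a duplicate, not a reference)
    Source = FactMoviesSales,
    // Keep only the descriptive fields
    Descriptive = Table.SelectColumns(Source, {"Title", "Studio", "Year"}),
    // One row per unique combination (remember: this is case-sensitive)
    Distinct = Table.Distinct(Descriptive),
    // Add the surrogate key, starting at 1
    Movie = Table.AddIndexColumn(Distinct, "MovieKey", 1, 1)
in
    Movie
```

In the Power Query UI, these correspond to the Choose Columns, Remove Duplicates, and Add Index Column steps.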
The next step would be bringing the MovieKey into the FactMoviesSales table too. For
that, we use a Merge from the FactMoviesSales table to the Movie table, based on the
three fields of Title, Studio, and Year (in the same order in both tables).
After the Merge, we can expand the merged column and fetch the MovieKey from the
Movie table. And then, you can remove the fields you used for the join from the
FactMoviesSales table. This is what your FactMoviesSales will look like;
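As a sketch, the merge described above might look like this in M (query and column names are assumptions from this example; I remove all three join fields at the end):

```m
let
    Source = FactMoviesSales,
    // Merge to the Movie dimension on the three shared fields (same order in both tables)
    Merged = Table.NestedJoin(
        Source, {"Title", "Studio", "Year"},
        Movie, {"Title", "Studio", "Year"},
        "MovieTable", JoinKind.LeftOuter),
    // Expand only the surrogate key from the joined table
    Expanded = Table.ExpandTableColumn(Merged, "MovieTable", {"MovieKey"}),
    // The join fields are no longer needed in the fact table
    Result = Table.RemoveColumns(Expanded, {"Title", "Studio", "Year"})
in
    Result
```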
Note that there are a lot of details that I skipped in this example to simplify the
process. For example, you have to take care of all of these:
Making sure you have unique combinations. If you get multiple records from the
Movie table for one row in the fact table, then you need to investigate why, and
you might need to change the combination of fields in the dimension table then.
If there are blank/null values in the MovieKey column, you need to investigate
why it didn’t find a matching row.
When you combine this with other tables, you need to make sure that the Movie
table is maintained as a master list, which is a big topic for itself.

Now you can load these two tables into Power BI and create a relationship between
them based on the MovieKey. As you notice in the below screenshot, MovieKey is
hidden because it is a technical field, and the user doesn’t need to know about it at all.
Make sure the relationship between the fact table and the dimension tables is always
one-to-many and single-directional, from the dimension table to the fact table.
So here you go, now you have a star schema. It is at the moment just one fact table and
one dimension, but it is a star schema nonetheless.
Adding More Tables
The main benefit of the Star Schema is visible when you add more tables into that. For
example, now if I bring the Movies Rating table, which looks like below:
Then we can connect it to the Movie dimension. However, because the Movie
dimension is not yet a master list and doesn’t have all movies in it, you might need to
merge the list of movies into there and create one master list. I explained a bit about a
process like this in the chapter about the shared dimension in this book.
Bringing the MovieKey in the FactMoviesRating would make the table like the below
screenshot;
Loading this table into Power BI as well will result in another fact table related to the
Movie dimension.
You can then add more dimensions and fact tables later into this model, and extend the
model to be always a star schema. Now with the power of star schema and shared
dimensions, I can easily build visualizations like below without the need for bi-
directional or many-to-many relationships or extra calculations.

Summary
Building a star schema design is not an easy job; It will take time. Probably 70 percent
of your time in developing a Power BI solution is spent on building your data model
using Power Query. This chapter was a quick run through some of the principles and
concepts that you have learned in the previous set of chapters in the Back to basics of
Power BI Modeling. As you have noticed, It is not possible to explain every single step
in full detail. It might end up being ten or more chapters for such a small example.
However, I explained the main topics and the order of things you need to follow to build
your model in the right way.
Chapter 19: How to Use Time and Date
Dimensions in a Power BI Model

You have learned about date and time dimensions separately. You may wonder how they
will be used together in a data model. In this chapter, I’ll explain that.
Date Dimension
The date dimension or calendar dimension is one of the most common tables in a
reporting data model. This is a table with one row per date with all different date
attributes.
Time Dimension
Because combining the date and time with each other in one table would make it too big,
the time dimension is a separate table with one record per time span required for the analysis.
Time and Date Tables in a Data Model
The time and the date tables should not be related to each other; their relationships
should be made with the fact table. Below, you can see that the fact table has its data
in a full date and time column (not separated);
For connecting this table to the time and date tables, you have to separate date and time,
which you can do that easier in the Power Query Editor.
Separate Date and Time
Go to transform data;

In the Power Query Editor, click on Add Column, and then Under Date, Select Date
Only.
Do a similar thing, this time with Time, and Time Only;

Now that you have the date and time columns separately, you can remove the main date
and time column;
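The same separation can be sketched in M. The query and column names here are assumptions based on this example:

```m
let
    // Assumed name of the fact query with a combined date/time column
    Source = FactTransactions,
    // Add a date-only column from the full date/time column
    WithDate = Table.AddColumn(Source, "Date", each DateTime.Date([TransactionDateTime]), type date),
    // Add a time-only column from the same column
    WithTime = Table.AddColumn(WithDate, "Time", each DateTime.Time([TransactionDateTime]), type time),
    // Remove the original combined column
    Result = Table.RemoveColumns(WithTime, {"TransactionDateTime"})
in
    Result
```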
now you can close and apply the Power Query Editor window, and create the
relationship between this table and the Date and Time dimensions;

Having both date and time in one axis


Another question I got in the comments was: using this approach, can we have the
date and time in one axis? Yes, you can. You just need to drag them into the axis one
after the other. This forms a hierarchy that you can expand into.
In the example below, I have both date and time from the model above demonstrated in
a single axis;
Chapter 20: Budget vs. Actual: Zero Complexity
Model in Power BI

What if we want to have budgets and actuals in the same data model? What if there are
two tables with different granularity? That can also be solved with a zero complexity
model. In this chapter, I’m going to explain through an example, how two tables with
different granularity (like budget and actual) can be modeled easily using Power BI.
Sample Dataset
The sample dataset used here is the AdventureWorksDW2012.xlsx Excel file.
Sales Star Schema
A fact table is a table that contains records of actions happening; it keeps numeric and
aggregatable data in it. An example of a fact table is the FactResellerSales table in the
model below, which is connected to DimProduct, DimDate, DimSalesTerritory, and
DimEmployee.
To create this model;
Get data from AdventureWorksDW2012.xlsx Excel file data source from above
Select all the tables mentioned,
Remove the relationship between DimSalesTerritory and DimEmployee
Set the relationship between FactResellerSales and DimEmployee as Active
Create a relationship between OrderDateKey in the FactResellerSales table
and the DateKey in the DimDate table

The model above is a perfect representation of a star schema model. This model can
easily answer all questions regarding sales. However, it seems difficult to keep thinking
in star schema terms when you want to bring in another table with a different granularity.
Sales Quota: Different Granularity
For this example, instead of budget data, I have Sales Quota, which is at a different
granularity. The concept and principles remain the same: we have a table with a
different granularity, which we want to connect to this model. The Sales Quota has the
quota of sales for each employee in each quarter. The granularity of this table is per
employee and per quarter, while the granularity of the sales table is per employee, per
day, and per combination of SalesTerritory and Product also.
If you get the FactSalesQuota from the AdventureWorksDW2012.xlsx Excel file, you
will notice that there is a Date column and a DateKey column, which I have removed in
my example below, because they would make it confusing to see that the table holds
quarterly data (we will calculate the date key later on in this chapter)
Here is what the data in the Sales Quota table looks like (FactSalesQuota in the
AdventureWorksDW2012.xlsx Excel file):

Challenges of Two Different Granularities


If you want to build the star schema for Sales Quota and Sales (similar to Actual vs.
Budget), then the challenge would be;
How to connect the Sales Quota to the date table?
If I create a Quarter dimension, then how to connect the Quarter dimension to
the date dimension? Does this create a snowflake scenario?

I’m going to explain an easy method to solve it, and then it will be a proper star schema
again. Let’s check it out.
Connect SalesQuota to the Date Table
Although the sales quota information is quarterly based, you can still connect it to the
date dimension. This would avoid creating an extra dimension for the quarter and the
snowflake connection between the quarter dimension and the date dimension. The only
thing to consider is that you have to pick a specific date in each quarter as your
default value. For example, we can consider the first day of each quarter as our date
value.
To achieve this purpose, we need to create a column in the SalesQuota table which has
the DateKey in it (the DateKey used in my sample is in the format YYYYMMDD, so it
can be something like 20190101, 20190401, etc.). This is what we are going to
build in this part:
To create that field, you can easily do these steps:
1. Create month number based on the quarter number
2. Create the first date of that month as a custom column

Your budget data might already be at the month level; in that case, you can skip part 1
and jump straight into part 2.
To create the first month of the quarter from the quarter number, you can follow a simple
calculation of ((<quarter> - 1) * 3) + 1, as below:

The calculation above is done in the Power Query Editor.


The second part is to create a DateKey based on it;
To do this part, first, you need to make sure the month number is a two-character-length
field. You can do that in three steps: 1. convert the data type to text; 2. add a prefix
of “0” to this column; 3. extract the right two characters of this column.
Note that if your date key is a full date field, then you need to
take a different approach.
Convert the data type to text:
Then add a prefix to the column:

and add zero as the prefix.


Then extract the last two characters

The final result would be:


Now that we’ve got everything needed, the last step is to create a custom column which
is the concatenation of Year, Month, and the day; (Make sure that before this step, you
changed the data type of CalendarYear to Text)

And then change the data type of the DateKey to the Whole Number; you can also
remove these columns: CalendarYear, CalendarQuarter, and First Month of the Quarter
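Parts 1 and 2 can be sketched together in M as below. The CalendarYear and CalendarQuarter column names come from this example; here I use Text.PadStart as a shortcut for the prefix-and-extract steps described above:

```m
let
    Source = FactSalesQuota,
    // Part 1: first month of the quarter = ((quarter - 1) * 3) + 1
    WithMonth = Table.AddColumn(Source, "First Month of the Quarter",
        each (([CalendarQuarter] - 1) * 3) + 1),
    // Part 2: build a YYYYMMDD key from the year, a two-digit month, and day 01
    WithKey = Table.AddColumn(WithMonth, "DateKey",
        each Number.ToText([CalendarYear]) &
             Text.PadStart(Number.ToText([First Month of the Quarter]), 2, "0") &
             "01"),
    // Change the key to a whole number so it matches the date dimension's key
    Result = Table.TransformColumnTypes(WithKey, {{"DateKey", Int64.Type}})
in
    Result
```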
I haven’t explained the details of each step, because your data might not be at quarter
level, and the date key that you use might have a different format. You might even use the
Merge option in Power Query to connect this to an aggregated version of the date
dimension, based on year and quarter, and then fetch the first of each quarter from the
expand column. No matter how you do it, The main point of all these steps is to make
sure you can connect your table to the date dimension without needing an extra
dimension.
Star Schema Everywhere
After adding the DateKey to the Sales Quota table, you can connect it to the date
dimension and also to the employee table. As you can see, we still have a star schema
everywhere. Note that you can have multiple fact tables and multiple star schemas in
one model.
There is no need for a bi-directional relationship or connecting a dimension to another
dimension. This model can answer your budget vs. actual questions easily!
Sample Analysis
To show you some sample analysis of the Sales Quota vs. the Reseller Sales in the
scenario above, I have created some visual reports. All these visuals work without
needing any complicated DAX calculations or bi-directional relationships. All of that
is possible because the data model is set up right!
Simple DAX Measures
I only created two measures for this calculation, and they are very simple measures. Bud vs Act:
Bud vs Act =
var Act=SUM(FactResellerSales[SalesAmount])
var Bud=SUM(FactSalesQuota[SalesAmountQuota])
return
Bud-Act
and Bud vs Act %:
Bud vs Act % =
var Bud=SUM(FactSalesQuota[SalesAmountQuota])
var Act=SUM(FactResellerSales[SalesAmount])
return
DIVIDE(Act, Bud)

Sample Reports
And here are my reports:

By Employee:
And the details which are a drill through from the other two pages;

Benefits of this Model


I showed you a fully-star-schema model design for Budget vs. Actual in this chapter.
Benefits of this model are basically all benefits of the star schema, including but not
limited to:
There are no extra relationships. Direct single relationship between each fact
table and each dimension connected to it.
There is no need for a bi-directional relationship.
There is no need for a many-to-many relationship.
There is no need for complicated DAX calculations.
Fast performing model that answers the budget vs. actual requirements.
Improvements
There are still things that can be improved in this model. For example, if you want to
analyze data at a lower granularity than what you have, you need some extra
consideration. For example, if you want to analyze budget vs. actual on a daily basis
(notice that the budget data is quarterly, not daily), then you would need some more
calculations.
Talking about improvements and extensions of the use of star schema leads to the topic
of design patterns in Power BI. And that topic is a big topic, which I am going to write
about in another book.
Summary
Star schema is not just for unrealistic practice models; it works perfectly in real-world
scenarios. My expertise is to come up with the proper model in all types of scenarios.
When you have the proper model, your reporting solution will work fast, it would be
scalable, and it would not have much difficulty in it.
Chapter 21: One Dimension Filters Another
Dimension in Power BI

I often get this question after I explain the star schema: both of my dimensions are
filtering my fact table values, but how can I get one dimension to filter another
dimension? For example, how can I get only the list of products that have been
purchased by a given customer? In this chapter, I’ll explain how you can do that
with the star schema.
Sample dataset
The sample dataset for this example has only three tables below:
DimCustomer and DimProduct both have a one-to-many single directional relationship
to the FactInternetSales. This is a simple star-schema model.
We also have a report like below;

In the report above, with the selection of the FullName (from the DimCustomer table)
slicer, or Color (from the DimProduct table), the table visual gets filtered. But what
about other scenarios?
Only showing Products that this customer purchased
What if your requirement is to filter the Color slicer based on only products that the
selected customer has purchased? In the above report, the Product slicer doesn’t get
filtered by the selection of the Customer slicer.
Why Doesn’t Filtering Happen Automatically?
You may ask why the product slicer doesn’t get filtered. The reason is that the direction
of the relationship is from the Product table to the fact table, which means only the
Product table can filter the fact table’s data, not the other way around.

This is normally the case in a star schema: the fact table is in the middle and gets
filtered by all dimensions, but the dimensions cannot filter each other.
How to Filter One Dimension By Another?
If you have a requirement that one dimension should get filtered by another
dimension, there are two ways to implement it. One is to use a bi-directional
relationship (which I don’t recommend). Another is to use measures for filtering.
Why not a bi-directional relationship?
A bi-directional relationship comes with two major problems. The first problem is
its significant performance impact. The second problem is that you cannot apply
bi-directional relationships everywhere; they bring circular references and ambiguity
into your data model.
Using a Filter Measure
Another option is to use measures as filters, which is what I call here Filter
Measures. You can create a measure like this:
Sales = SUM(FactInternetSales[SalesAmount])
This is a very simple measure that calculates the sum of the SalesAmount from the fact
table.
You can use this measure in the Product slicer (the Color slicer) as a visual-level filter
and set the value to be greater than zero. This means: only show the product colors for
which there are sales, based on the current selection of the other slicers in the report.

This means that now the Customer slicer can filter the product slicer, without having a
bi-directional relationship in the data model.
You can use this measure also to filter the customer slicer based on the product slicer if
you want it the other way around.
This method is not just for slicers; it can work for any other visuals too.
If a simple expression like above doesn’t work, you can take one step further and create
a measure using the CrossFilter function such as below;
Filter Product =
CALCULATE(
    SUM(FactInternetSales[SalesAmount]),
    CROSSFILTER(
        FactInternetSales[ProductKey],
        DimProduct[ProductKey],
        Both
    )
)
However, using a measure like above normally is not necessary, because as you have
seen, the simple measure works like a charm.
Summary
In summary, the simple trick is to create a measure that calculates a value from the fact
table. Then use that measure as the visual-level filter of the visual, and set the filter to
be greater than zero (or whatever threshold counts as having activity in the fact table).
Part 4: Calculations
This part talks about how calculations fit into Power BI modeling. Should a calculation
be part of the data preparation and built into the structure of the star schema, or should
it be done as a DAX calculation after the modeling? This part takes you through some
real-world examples to understand the concept.
Chapter 22: M or DAX? That is the Question!

What is the main difference between M and DAX? Why can we create a calculated column
in two different places? What are the pros and cons of each? Which one should I use for
creating a profit column? Why can’t I do all of it in only one of them, DAX or M? Why two
different languages? Why is the structure of these two so different? If any of these are
your questions, then you need to read this chapter. In this chapter, I’ll go through the
differences between these two languages and explain why, when, and where to use each.
What is M?
M is the scripting language behind the scenes for Power Query. M is the informal name
of this language; the formal name is Power Query Formula Language, which is really
long, and even Microsoft refers to it as M. M stands for many things, but one of the most
common associations is Mashup, which means this language is capable of data mashup
and transformation. M is a functional language, and the structure of an M script can be
similar to this:
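As a minimal illustrative sketch (the file path, sheet name, and column names here are made up for illustration):

```m
let
    // Each step builds on the result of the previous one
    Source = Excel.Workbook(File.Contents("C:\Data\Sales.xlsx"), null, true),
    SalesSheet = Source{[Item = "Sales", Kind = "Sheet"]}[Data],
    PromotedHeaders = Table.PromoteHeaders(SalesSheet),
    FilteredRows = Table.SelectRows(PromotedHeaders, each [Amount] > 0)
in
    FilteredRows
```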
M has a step-by-step structure. Usually (not always), every line in an M script is a
data transformation step, and each step uses the result of the previous step.
It is usually easy for a programmer to follow the structure of the M language, because it
comes with familiar programming blocks such as let and in, and other programming-language-like
features.
What is DAX?
DAX is the Data Analysis eXpressions language. This is the common language between
SQL Server Analysis Services Tabular, Power BI, and Power Pivot in Excel. DAX is
an expression language, and unlike M, it is very similar to Excel functions. In fact, DAX
has many functions in common with Excel. However, DAX is much more powerful than
Excel formulas in many ways. Here is an example DAX expression:

DAX calculations are built in a way that makes sense mostly to Excel users; normally,
Excel users are very comfortable with this language. Everything goes through functions.
DAX doesn’t have programming blocks in it; it is a combination of function calls,
filters, and expressions.
Example Usage of M
M can be used in many data transformation scenarios. As an example, it can be used to
Pivot or Unpivot Data, or to Group the data based on a number of columns. Here is how
a Pivot/Unpivot can work in Power Query;

Example Usage of DAX?


DAX can be used for many calculations for analyzing data, for example, calculating
Year to Date, calculating a Rolling 12 Months Average, or anything like that. Here is an
example in which, based on the selection criteria in the report and a few simple DAX
expressions, we can build a customer retention analysis with DAX;
Calculated Column Dilemma
The main question of choosing between DAX and M comes from the calculated column
dilemma, in my opinion. You can create many calculated columns in both M or DAX,
and it is confusing; where is the best place to do it, or why there are two different
places to do it. As an example, you can create a full name that is the concatenation of the
FirstName and LastName columns. You can do that in M, and also in DAX. So these
questions come up: Why two different places? Which one is best to use? Can we
always use one language?
To answer this question, I would like to use another example; There are many types of
knives, and you can use almost all of them to cut cheese!

reference: http://forkitchen.blogspot.co.nz/2008/10/what-are-different-types-of-kitchen.html
Almost every knife in the above picture can be used for cutting cheese, except one of
them! So why are there so many knives if they can all cut cheese? The answer is that these are
not knives for cutting cheese! Each knife is good for one special case. For
cutting bread, the bread knife gives you the best result. For cutting a fillet, you normally
need another type of knife. But, as you would agree, for some cases (such as cutting
cheese!) you can use many of these knives. Let’s now go back to the original question;

Why can I create the same calculated column in DAX or M?
These two languages were built independently, each in a way that can handle most
business-related solutions. As a result, there are some use cases that both
languages are capable of handling. As an example, both of these languages can easily be
used to create a concatenated column from two other columns.
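For example, a FullName column (the Customer table and its FirstName and LastName columns are assumed here) can be created as a DAX calculated column like this:

```dax
FullName = Customer[FirstName] & " " & Customer[LastName]
```

The M equivalent would be a custom column step such as Table.AddColumn(Source, "FullName", each [FirstName] & " " & [LastName]).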

Which one is best?


The quick answer is: it depends! It depends on the type of usage. If you want to create a
concatenated column, Power Query (M) is the better option in my view, because that is
normally the ETL part of your BI solution, where you can simply shape your model and
datasets the way you like them to be. But if you want to create something like Year to
Date, you could obviously do that in Power Query and M, but it would be a lot of code, and
you would have to consider many combinations of possibilities to produce a correct result,
while in DAX, you can simply create that with the TOTALYTD function. So the answer
is: there is no best language between these two. The type of usage identifies which one
is best. Normally, any changes to prepare the data for the model are best done in
M, and any analytical calculation on top of the model is best done in DAX.

Two Languages for Two Different Purposes


There are many programming languages in the world; each language has its own pros
and cons. JavaScript is a web scripting language, which is very different from
ASP.NET or PHP. The same thing happens here. When M was born, it was meant to be a
language for data transformation, and it is still that language. DAX was created to
answer business analysis questions.
What Questions Can DAX Answer?
DAX is the analytical engine in Power BI. It is the best language to answer analytical
questions whose results will be different based on the selection criteria in the
report. For example, you might want to calculate the Rolling 12 Months Average of
Sales. It is really hard to calculate that in M, because you have to consider
all the different possibilities: rolling 12 months for each product, for every
customer, for every combination, and so on. However, if you use a DAX calculation for it,
the analytical engine of DAX takes care of all the different combinations selected through the
Filter Context in the report.
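As a sketch of such a measure (the table and column names are assumptions; this computes a rolling 12-month total, which you could divide by the number of months for an average):

```dax
Rolling 12M Sales =
CALCULATE(
    SUM(FactInternetSales[SalesAmount]),
    DATESINPERIOD(
        DimDate[FullDateAlternateKey],      -- the date column of the date dimension
        MAX(DimDate[FullDateAlternateKey]),
        -12,
        MONTH
    )
)
```

The filter context (product, customer, or any other selection) is applied automatically, which is exactly what would be painful to reproduce in M.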
What Questions Can M Answer?
M is the Data Transformation engine in Power BI. You can use M for doing any data
preparation and data transformation before loading that into your model. Instead of
bringing three tables of DimProduct, DimProductSubcategory, and
DimProductCategory, you can merge them all together in Power Query, and create a
single DimProduct including all columns from these tables, and load that into the model.
Loading all three of them into the model and using DAX to relate them with each other means
consuming extra memory for something that is not required to be in the model. M can
simply combine those three tables with each other and, thanks to the step-based
operational structure of M, produce one final data set.
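A sketch of that merge in M; the key and name columns follow common AdventureWorksDW naming, so verify them against your own source:

```m
let
    // Join each product to its subcategory
    JoinedSub = Table.NestedJoin(
        DimProduct, {"ProductSubcategoryKey"},
        DimProductSubcategory, {"ProductSubcategoryKey"},
        "Sub", JoinKind.LeftOuter),
    ExpandedSub = Table.ExpandTableColumn(JoinedSub, "Sub",
        {"EnglishProductSubcategoryName", "ProductCategoryKey"}),
    // Then join to the category
    JoinedCat = Table.NestedJoin(
        ExpandedSub, {"ProductCategoryKey"},
        DimProductCategory, {"ProductCategoryKey"},
        "Cat", JoinKind.LeftOuter),
    ExpandedCat = Table.ExpandTableColumn(JoinedCat, "Cat",
        {"EnglishProductCategoryName"})
in
    ExpandedCat
```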
As a Power BI Developer, Which Language Is Important to Learn?
Both! With no hesitation! M is your ETL language, and DAX is the analytical language.
You cannot live with only one. If you want to be an expert in Power BI, you should be
an expert in both of these languages. There are some cases that one of the languages will
be used more than the other one. However, you will need a very good understanding of
both languages to understand which one is best for which purpose, and easily can use it
in real-world scenarios.
Chapter 23: Scenarios of Using Calculated
Tables in Power BI

The name speaks for itself; these are tables created by calculation. As these are in-
memory tables, their calculation is based on DAX. There are many benefits of using
Calculated tables, such as using them for role-playing dimensions (for example, having
more than one date dimension in a model). There are some DAX functions and
expressions that return a table as a result, and using them as a table in your model
sometimes is really helpful. For example, you might want to create a table for the top 10
customers and then use that as the main source table in many reports. In this chapter, I’ll
explain to you some use cases of calculated tables.

Role-Playing Dimension
The very first functionality that comes to mind when we talk about calculated tables
is the ability to create role-playing dimensions. Role-playing dimensions are dimensions
with the same structure and data rows that play different roles in our data model. For
example, the Date Dimension is a generic dimension. However, in a sales transaction
table, you might have more than one date column to relate to the date dimension. In the
example below, we have three date fields in the FactInternetSales table: Order Date,
Ship Date, and Due Date.

The three fields should be related to three different date dimensions (because the
Tabular model, which Power BI is based on, doesn’t support role-playing dimensions
built in). So what you can do is load the date dimension only once. Here is the example
date dimension loaded in Power BI Desktop:
Now I can create role-playing dimensions by creating a calculated table:
This will create a table in memory for me and allow me to write a definition of the
table.
The language for table definition is DAX. We want an exact copy of the DimDate table
here. So I can simply use ALL function in DAX as below:

Ship Date = ALL(DimDate)


As soon as I type the expression above and press Enter, I’ll see the result underneath it
as data rows, and also a list of columns in the Fields pane. I’ve created my role-playing
dimension as simply as that. Now I can set up the relationship;

For the relationship above, I have also created a Due Date dimension and renamed the
original DimDate to Order Date.
In-Memory Structure, Less Refresh Time
Calculated tables load into memory, so your Power BI file size will increase. However,
you don’t need to read them again from the external data source. Yes, you could instead
create multiple views in the source database and connect to them through the Get Data
section with Power Query; however, their data would need to be populated from the
source database every time a refresh happens (either scheduled or manual).
Without Calculated Tables

Here is an example of three date tables loaded from the external data source:
With Calculated Tables

Here is only one date dimension loaded (for the role-playing dimension example
above):

As you can see, this is much more efficient in terms of reducing the refresh time.
However, the memory consumption would be the same in both methods.
The date dimension was a narrow data set example; you might need role-playing for big
data tables, where calculated tables will save you a lot of data refresh time.
DAX Table Functions
There are some DAX functions that return a table; for example, the ALL function, which I
used in the role-playing sample above. ALL was a simple example of a DAX function
that returns a whole copy of the source table. Let’s have a look at some other
examples and see how it works in other scenarios.
Top 100 Customers as a Calculated Table
There are many cases where a business considers only the top 10 or top 20 customers and
filters the whole dashboard and set of reports down to them. Usually, the main
reason is that the top 10 or 20 customers bring most of the revenue to the business.
Fortunately, there is a TOPN function in DAX that helps us build such
calculations. The TOPN function returns a table. With TOPN, we can choose how
many rows we want in the result set, the grouping to be applied (if there is
any), and the aggregation (if there is any).
In this example, I want to show you how to use a calculated table to generate a list of
top 100 customers. As a business requirement, I want to visualize the total revenue from
the top 100 customers and compare it with the total revenue of the whole business.
There might be different ways to calculate and visualize it, but I want to do it with a
calculated table as a sample scenario.
Summarize
SUMMARIZE is a DAX function that generates a grouped list from a table. SUMMARIZE
works similarly to GROUP BY in T-SQL. So if I want to create a table with CustomerKeys
and their total sales amount, I can write this expression:

Customer Sales =
SUMMARIZE(
    FactInternetSales,
    FactInternetSales[CustomerKey],
    "Total Sales", SUM(FactInternetSales[SalesAmount])
)

Here are details about parameters I passed in the expression above to Summarize
function:
The first parameter is the source table. The FactInternetSales is the source table
that I want the group by (summarize) operation to be applied to it.
The second parameter is the Group-by column. The CustomerKey in the
FactInternetSales table is the column that I want to use as the key for grouping.
The third parameter is the output column name. I named the output calculated
column name as Total Sales.
The fourth parameter is the output column’s calculation. Here I write the
calculation for the output column, which is simply the sum of the sales amount.

As a result, I will have a table with CustomerKey and Total Sales.

TOPN
Now that we have a list of customers with their total sales, it is easy to get the top 100
customers. I can simply use the TOPN function like this to create another calculated
table (I could do this example with only one calculated table instead of two, but I
did it with two tables to help you understand the logic better):
Top 100 Customers = TOPN(100, 'Customer Sales', 'Customer Sales'[Total Sales], DESC)

The expression above means:
The first parameter is the number of rows to return; 100 for the top 100 customers.
The second parameter is the source table. Customer Sales is the source of this operation, from which we want to fetch the top 100 customers.
The third parameter is the Order-By column. I want to order the table based on the Total Sales of each customer.
The fourth parameter is the sort order (ASC or DESC). To get the top 100 customers, I have to order by Total Sales DESC.

and here is the result:


I also renamed the Total Sales column to Top Customer Sales (as you see in the
screenshot above).
Now I can simply build a report in Power BI to show the difference between Total
Sales and Top Customer Sales:
Great, We’ve used calculated tables for getting some insight out of the top 100
customers and compared it with the total.
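As noted above, the same result could also come from a single calculated table. A sketch of that combined expression, using the same tables and columns as the two-step version (the table name here is just illustrative):

```
Top 100 Customers Combined =
TOPN (
    100,
    SUMMARIZE (
        FactInternetSales,
        FactInternetSales[CustomerKey],
        "Total Sales", SUM ( FactInternetSales[Total Sales] )
    ),
    [Total Sales],
    DESC
)
```

Here the SUMMARIZE result is passed directly as the source table of TOPN, so no intermediate calculated table is stored in the model.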
Limitations
As with any other DAX calculation, the very first limitation of calculated tables is memory. You need to have enough memory to use these tables. On the other hand, this limitation is also an advantage, because the in-memory structure makes these calculations really fast.
The other limitation I would like to mention at this stage is: not inheriting formatting. By this, I mean the calculated table doesn't inherit formatting from the source table. In some complex scenarios where the calculation comes from many tables, that might not matter. But consider our simple role-playing example above: if my original date dimension has some formatting configuration, such as setting DateKey to "Do Not Summarize" or some other setting, then I would like to see the same in the calculated table fetched from it.
The formatting applied to the calculated table columns will also be overwritten after each change in the DAX expression.
Chapter 24: Measure vs. Calculated Column

What is the difference between a Measure and a Calculated Column? In what situation should we use each of these? In this chapter, I'm going to explain the difference between a DAX Calculated Column and a Measure.
Read this if you have any of the questions below
This chapter is written for you if you want to understand the difference between
Calculated Column and Measure in DAX, and have any of the below questions;
What is a Calculated Column?
What is a Measure?
When should I write a calculated column or measure?
What is their difference in Performance?
What are operations that I cannot do with these?
And many other questions about the difference between these two types of
calculations in DAX.
What is a calculated column?
A Calculated Column is a column like any other column created in the table. However, the result of a calculated column comes from evaluating a DAX expression. The calculated column leverages a DAX expression that applies to every row in the dataset, and the result will be stored in the new column.
Example: Profit as a calculated column
Consider a table that we have sales and costs information in it. Calculating Profit in
such a table would be simply deducting costs from sales for every row. So this
basically would be a calculated column.

Expression:
Profit = FactInternetSales[SalesAmount] - FactInternetSales[TotalProductCost]

Row by Row Calculation: Row Context


One of the very important concepts about the calculation you apply in a Calculated Column (in the majority of cases, not always) is that the calculation happens one row at a time; in other words, it is a row-by-row calculation. In the table below, you can see the calculation result for every row stored in the new column.
Row-by-row calculation is called Row Context in DAX terminology.
Stored in Memory
A Calculated Column stores its values in memory, like any other column in the table. The calculation happens at refresh time, and the result will be stored in memory. This means that the more calculated columns you have, the more memory you consume, and your refresh time will be longer as well. However, calculations are usually simple, so your refresh time might not be affected too much.
Calculated Column highlights
Based on the above explanations, here are highlights of a calculated column;
Row by row calculation: Row Context (usually, not always)
Stored in the memory (consumes RAM)
calculated at the time of refreshing the report (either scheduled basis, or
manual)
What is a Measure?
A measure is usually a calculation that works on an aggregated level. This aggregation can be as simple as a sum of sales, or a bit more complex, such as calculating the monthly average sales over a rolling 12-month period. Measures have a dynamic nature; they affect a subset of data from one or more tables. Because the subset of data can be changed through the filters applied in the Power BI report, the calculation has to be evaluated dynamically. So Measures are not pre-calculated; they are calculated on the fly when you add them to the report.
Example: Sum of Sales
Measures are usually aggregations. A very simple aggregation we can use as an
example is a sum of sales.
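As a sketch (assuming the same FactInternetSales table used in the earlier examples, with a SalesAmount column), such a measure can be written as:

```
Sum of Sales = SUM ( FactInternetSales[SalesAmount] )
```

Unlike the Profit calculated column earlier, this expression does not produce a value per row; its result depends on whatever filters are active when it is evaluated.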

Aggregation can be done with a number of functions in DAX, such as Sum, SumX,
Average, Calculate, and heaps of other aggregation functions. Now, let’s answer the
most important question:
How to see the Value of the Measure?
Measures are calculated on the fly. This is, in fact, one of the main conceptual differences between a measure and a calculated column. Okay, measure values are calculated on the fly, so how can you see the value? The answer: by putting it into a report!
If I drag the measure above in a report as a card visual, then I would get a result;
When there is no filter applied in the report, this will return the grand total of sales,
$29.36M. However, if I add a slicer in the report, and select a value in it, I’ll see a
different result;

Now the measure calculation only shows me the sum of sales for the year 2007, which
is $9.79M.
Filter Context
A measure evaluates on the fly. If there is a slicer value for 2007, then the calculation will be done on the subset of data for 2007. If there is a table visual somewhere that slices and dices data by the Education category, the result of the measure will take that into account as well. We can then say this:
A measure evaluates its value based on the subset of data selected by filters, slicers, or slicing-and-dicing components of visuals in the report. This filtered dataset is called the Filter Context.

Filter Context basically is a combination of all filters that affect the calculation of
measure. There is much more to talk about when we say filter context. However, this
should be enough to understand the rest of this chapter.
Measures do not consume RAM; they consume CPU
Based on what you’ve learned above, measure calculation is done on the fly. This
means to measure value is not stored in the memory. The measure will not consume
Memory or RAM at all. On the other hand, Measures consume CPU because their
calculation should be done right at the time of visualizing it. If you change a filter or
slicer, the calculation should be done again. Because the response time should be fast,
then this calculation happens by CPU.
What is the side effect?
If you have many measures in your report, and their calculations are also complex, then with every change of a filter or slicer, you end up with a lot of spinning circles, which shows the CPU is working hard to calculate all the values.
Measures highlights
Based on the above explanations, here are highlights of a Measure;
Calculated based on all filters: Filter Context (usually, not always)
Is not Stored and is not pre-calculated
Calculated on the Fly when you put it on a report page when you change a
slicer, filter, or click on a column chart or any other visual to highlight, and it
affects this measure’s value.
Measure consumes the CPU for calculation.
When to use Measure, and when to use Calculated Column
Now that you know about these two types of calculations, we come to the critical question: when to use which? Do you need a Measure for your calculation or a Calculated Column? The answer depends on what you want to calculate. This is an important question that you should be asking yourself when you want to create a new calculation:

Is the calculation row by row? Or is it an aggregation? Is it going to be affected by filter criteria in the report?

If the calculation is a row-by-row calculation (example: Profit = Sales – Cost, or Full Name = First Name & " " & Last Name), then a Calculated Column is what you need.
If the calculation is an aggregation, or it is going to be affected by filter criteria in the report (example: Sum of Sales = Sum(Sales), or Sales Year to Date = TotalYTD(XYZ)), then a Measure is your friend.
Let’s go through some examples;
Example 1: Calculating the age of customers
The age of customers does not change based on filters! It depends on only one thing: the birthdate of the customer. And in the customer table, you usually have the birthdate as a field. So this calculation can simply be a calculated column, which evaluates row by row for every customer.
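A minimal sketch of such a calculated column (the DimCustomer table and BirthDate column names are illustrative assumptions, not taken from a specific model):

```
Customer Age = DATEDIFF ( DimCustomer[BirthDate], TODAY(), YEAR )
```

Because a calculated column is evaluated at refresh time, the stored age only updates when the dataset refreshes, which is usually acceptable for this kind of calculation.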
Example 2: Calculating Sales Year to Date
Year-to-date calculation depends on the filter criteria in the report, and it is also an aggregation. It becomes very complicated to calculate year to date for all variations of fields (per day, per month, per customer, per product, etc.). So this needs to be a Measure.
Every time you put this measure into a report, it calculates based on the filter criteria of
the report;
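A minimal sketch of such a measure (assuming a FactInternetSales table with a SalesAmount column and a related date table named DimDate with a FullDateAlternateKey column; these names are assumptions based on the AdventureWorks-style tables used earlier):

```
Sales Year to Date =
TOTALYTD ( SUM ( FactInternetSales[SalesAmount] ), DimDate[FullDateAlternateKey] )
```

Whether the report slices by day, month, or quarter, this single measure re-evaluates against the current filter context.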

Calculated Column or Power Query?


When it comes to row-by-row calculations, Power Query is a better option in the majority of cases. I have explained before what M and DAX are, and in what scenarios you need to use each. Calculated Columns (in the majority of cases, not always) can be implemented in Power Query instead.
Measure: The Hidden Gem of DAX
You can do a Calculated Column in the majority of cases in Power Query as well, and in fact, in those cases, it is much better to do it in Power Query. This means the hidden gem of DAX is the Measure. Measure calculation is dynamic, on the fly, and based on the filters applied in the report. The dynamic nature of measure calculation makes it the invincible feature of DAX and Power BI. You have seen in the calculation above that the Year to Date value is shown by month. If you simply bring the Day value into the table, then this calculation evaluates on a daily basis and still works perfectly fine.
If you do it at the quarter level, the year-to-date calculation evaluates at the quarter level.
This dynamic nature of measure calculation in DAX is something that you cannot find in many tools. That is why measures are so commonly used in DAX. In fact, 70 percent of your time writing DAX, if not more, is spent writing measures!
Summary: Calculated Column vs. Measure in a nutshell
Let’s wrap up it all and go through a comparison of these two types of calculations in
DAX;
Part 5: Better Data Model

You can take your Power BI data model to the next level if you leverage other
components of the Power BI ecosystem. In this module, you will learn how these
components can make your model even better.
Chapter 25: Move Your Shared Tables to
Dataflow; Build a Consistent Table in Power BI

In this chapter, I’m going to explain the best practice for creating shared tables in detail.
The method described in this chapter ensures that the solution is keeping a single
instance, the one version of the truth, for your Power Query table through Dataflow, and
build a consistent table in your Power BI solutions. Part of the content of this chapter is
copied from Microsoft’s documentation article that I wrote some time ago.
What is Power BI Dataflow?
A Dataflow is a Power Query script running in the cloud and storing the result in Azure Data Lake storage. The output of the dataflow is not dependent on a Power BI dataset and can be stored separately.
Shared Table in Power Query
It often happens that you work with multiple Power BI (*.pbix) files in your day-to-day job, and it also happens that you might need one table to be available in two or even more separate files. Customer, Date, Product, Vendor, etc. are all examples of tables that might be needed in multiple files. You might use the Product table in the Sales.pbix file for sales analysis, and also in the Inventory.pbix file for warehouse inventory analysis. The diagram below shows a Date table that is used in two different *.pbix files.
The date dimension is not the only shared table you may have. Product, Customer, Sales transactions, Vendor, Warehouse, and many other tables might be needed in multiple files and, as a result, can be shared tables. You need to know the right way to handle these tables.

Any Power Query table that you may use in multiple *.pbix file is
a shared table.

Bad Way: Copying


Shared tables between multiple files happen, and it is important how you deal with
them. There is a bad way and a good way. Let’s check the bad way first;
What do you do in those scenarios? Probably copy the Power Query M script from the Advanced Editor of one of the files, and then paste it into the other file as a new blank query. It works, but it brings a whole lot of complications and issues with it. In the example below, I have this table in my Sales.pbix file, and the table is called DimProduct.

DimProduct already includes a lot of applied steps.


Now I want to have the same DimProduct in the Inventory.pbix file. What I will do is to
copy the M script from Advanced Editor of one file, and then paste it into the other file!

It is easy, but here come challenges;


What if you want to edit transformations on this table after some
time?
You need to go to one of them (which you may call the original), apply the changes, then copy and paste again into the other file!
What if there are more than two files?
What if the DimProduct is needed not only in Sales.pbix and Inventory.pbix, but also in BOM.pbix, Customer Retention.pbix, and many other files? Do you copy it across all of these? And you won't miss anything?
There is no Single Version of the Truth for this Code
As soon as you create copies, you will lose the single version of the truth. Now your
code is everywhere, and you have to maintain it everywhere! There is no single place
that you change the code, and all references get fixed automatically. There is no single
version of the truth for your code. As a result, the maintenance of this code is very high.
Good Way: Maintain a Single Table
Now that you know some of the issues of the bad way of copying, let's talk about a good way of implementing it. A good way would be keeping a single copy of that shared table, and of any other shared tables. When you keep one single copy, that copy becomes the source of all other tables. When you change it, all other tables change automatically. Dataflow is not the only way that gives you a single copy. You can do it using other methods too;
Run a separate ETL (extract, transform, and load) process and store the output in a shared data source. For example, using ADF or SSIS to run transformations, and then storing the output in a SQL Server database.
Create a Power Query Custom Connector, which does all the transformations on the table and creates it ready for you.
Use Dataflow.

The first two methods are the BI developer methods of doing things, and the fact is that probably 90 percent of Power BI users are not BI developers. They also involve other tools and services (such as SQL Server, ADF, or SSIS), which you might not have. Dataflow, however, besides being a good solution, is also easy to implement even for a business user with a little knowledge of Power Query. And that is the method we are talking about here.
Dataflow for Shared Table
The highlight of the process is that the dataflow runs the Power Query script and stores the output in Azure Data Lake storage; then, using Get Data from Dataflow in Power BI, we get data sourced from that single table. You can then use that table in as many *.pbix files as needed. If you want to change the transformation, there is only one place to change: the dataflow script of that table. This is a consistent, low-maintenance, and robust solution.
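As an illustration, here is a sketch of what a shared Date table entity in the dataflow could look like in Power Query M (the date range and column set are arbitrary assumptions; adjust them to your business calendar):

```
let
    // Arbitrary range for illustration only
    StartDate = #date(2015, 1, 1),
    EndDate = #date(2025, 12, 31),
    // One row per day between the two boundary dates
    DayCount = Duration.Days(EndDate - StartDate) + 1,
    Dates = List.Dates(StartDate, DayCount, #duration(1, 0, 0, 0)),
    AsTable = Table.FromList(Dates, Splitter.SplitByNothing(), {"Date"}),
    Typed = Table.TransformColumnTypes(AsTable, {{"Date", type date}}),
    AddYear = Table.AddColumn(Typed, "Year", each Date.Year([Date]), Int64.Type),
    AddMonth = Table.AddColumn(AddYear, "Month", each Date.Month([Date]), Int64.Type)
in
    AddMonth
```

Every *.pbix file that needs the Date table then gets data from this single dataflow entity instead of keeping its own copy of the script.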

Now let’s see how you can do that in action;


Create the dataflow with the shared Table
If you already have queries in Power Query, either in Power BI Desktop or in Excel, you might want to migrate those queries into dataflows. The migration process is simple and straightforward. In this section, you'll learn the steps to do so.
Copy queries from the Desktop
Follow these steps to copy queries from Power Query in the desktop tools (such as
Excel or Power BI):
1. In Power BI Desktop, open Power Query Editor by selecting Transform Data in
the Home ribbon.
2. In Excel, this option is under Data > Get Data > Launch Power Query Editor.

If you have folders


If you've organized your queries into folders (which are called groups in Power Query), then copy them using the method below.
In the Queries pane, select the folders you want to migrate to the dataflow by holding the Ctrl key on the keyboard and left-clicking with the mouse. Once you've selected all the folders, copy them all using Ctrl+C.

If you don't have folders


If you aren't using folders in Power Query, then you can select queries using the same approach: holding the Ctrl key on the keyboard and left-clicking with the mouse. Once you've selected all the queries, copy them all using Ctrl+C.
Paste the copied queries into a dataflow
1. Create a dataflow if you don't have one already.

Choose to create a dataflow with adding new entities:


If you don’t have a query already, start with a blank query

Just click on continue to get to the Edit Queries page.


2. Paste the copied folders or queries in the Queries pane of the dataflow's Power
Query Editor using Ctrl+V.
The image below shows an example of copied folders.

Connect the on-premises data gateway


If your data source is an on-premises source, then you need to perform an extra step.
Examples of on-premises sources can be Excel files in a shared folder in a local
domain, or a SQL Server database hosted in an on-premises server.
A dataflow, as a cloud-based service, requires the on-premises data gateway to connect
to the on-premises data source. If the source is an on-premises source, you
should install and configure the gateway for that source system, and then add the data
source for it. Read here to learn all about the on-premises
gateway[https://radacad.com/the-power-bi-gateway-all-you-need-to-know]. Once
you've completed these steps, you can select the on-premises data gateway when
creating the entity in the dataflow.

The gateway isn't needed for data sources residing in the cloud, such as an Azure SQL
database.
Configure connection
In the next step, configure the connection to the data source using the Configure
connection option, enter credentials, or anything else needed to connect to the data
source at this stage.

Verification
If you've done all the steps successfully, you should see a preview of the data in the
Power Query Editor.
Some Power Query Desktop functions require a gateway in Power
Query Online
Some of the functions might require a gateway, even if their source is not on-premises.
Among these are functions such as Web.BrowserContents and Web.Page. If this happens, you
might get an error message indicating which specific function isn't supported. The figure
below shows an example of one of these scenarios.

If a scenario like this happens, you have two options. You can set up the gateway for
that data source, or you need to update the query in the dataflow's Power Query editor
using a set of steps that are supported without the need for the gateway.
Refresh the dataflow entities
After migrating your queries to the dataflow, you must refresh the dataflow to get data loaded into these entities. You can refresh a dataflow manually or configure an automatic refresh based on a schedule of your choice.

Get Data from Power Query Desktop


You can now get data from dataflow entities in Power BI Desktop using the Dataflow or Common Data Service connectors (depending on what type of dataflow you're using, analytical or standard).
Then set up the connection to the dataflow using your Power BI account, and select the table;
You can repeat this process in as many Power BI files as you have. They will all reference the dataflow table.
What are the benefits of this method?
Using Dataflow is much easier than services such as Azure Data Factory, or
tools such as SSIS. You don’t need to be a developer to implement this.
You don’t need any extra licenses or services. It is all part of your Power BI
Pro account. (for some specific types of entities [https://radacad.com/linked-
entities-and-computed-entities-dataflows-in-power-bi-part-4], you might need
premium. However, here is a workaround [https://radacad.com/workaround-
for-computed-entity-in-power-bi-pro-dataflow-in-power-bi]).
You have a single version of the truth for your shared table in the dataflow. Any
changes can be made in the dataflow script now.
The transformation step is now separated from your *.pbix data refresh, and this
can make the refresh faster [https://radacad.com/how-to-use-dataflow-to-make-
the-refresh-of-power-bi-solution-faster].
You can use the table generated by the dataflow in *.pbix files in other
workspaces too.
Summary
Dataflows have many benefits; I have written about only a few of those so far in my chapters. This chapter focused on creating consistent transformation code using Power BI dataflows. You learned a low-maintenance way of creating shared tables between multiple Power BI files. The method explained here doesn't need any extra licenses or services, and is easy enough to be done by anyone with a little bit of Power Query knowledge. I strongly recommend reading more about dataflows in my article here [https://radacad.com/what-are-the-use-cases-of-dataflow-for-you-in-power-bi].
Chapter 26: Power BI Shared Datasets: What is
it? How does it work? And Why should you care?

In this chapter, you will learn about:


What is a shared dataset in Power BI?
How can the shared dataset help in Power BI development?
Where is the place of a shared dataset in the Power BI architecture?
How shared dataset works behind the scene in the Power BI service?
What are Certified and Promoted Datasets?
What is the Dataset in Power BI?
When you create a Power BI report (or let's call it a *.PBIX file), the report has two components (if the data connection mode [https://radacad.com/directquery-live-connection-or-import-data-tough-decision] is import data [https://radacad.com/connection-types-in-power-bi-import-data-or-scheduled-refresh-a-detailed-look-inside-and-out]): a report and a dataset. When you are in the environment of Power BI Desktop, you can't see the separation that easily unless you go to the Task Manager and see the dataset running behind the scenes under the Power BI Desktop task threads.
However, when you publish the PBIX file into the service (the Power BI website), you
can easily see that there are two objects; A report and a Dataset.
The report is the visualization layer of your Power BI implementation
The dataset includes the data, tables, relationships, calculations, and
connection to the data source.

You can schedule the refresh for the dataset and connect to on-premises sources
(through a gateway) or cloud-based sources.

What is a Shared Dataset?


Now that you know about the dataset, let's talk about the shared dataset. A shared dataset is a dataset shared between multiple reports. For a long time, you could create a new report from an existing dataset through the Power BI website. This feature has been available from the early days of Power BI.

Around April 2017, the ability to create a report from Power BI Desktop that points at an existing dataset, with a live connection to it, became available.

In those days, it was called Get Data from Power BI service. Nowadays, this feature is renamed to Get Data from Power BI Dataset.
Multiple reports connect to one shared dataset. When that dataset gets refreshed, all of those reports will have the new data. A shared dataset is one step closer to the multi-developer tenant in the Power BI [https://radacad.com/power-bi-architecture-for-multi-developer-tenant-using-dataflows-and-shared-datasets] environment.
Sharing Datasets Across Multiple Workspaces
For a long time, sharing datasets was only possible inside a workspace. You could not use a dataset from workspace 1 as the source for a report in workspace 2. However, recently, this feature became available, and you can share a dataset even across multiple workspaces. This is an amazing update and changes the way that Power BI development will work in the future.
When you get data from a Power BI dataset through Power BI Desktop, you have the option to select which dataset you want to get data from;
How does Shared Dataset work behind the scene?
When you share a dataset in the same workspace, everything is clear. You have one dataset to schedule refresh for, and multiple reports connected to it. However, when you use a dataset shared from another workspace, you get something that might look a bit different.
Linked Dataset
When you get data from a Power BI dataset, which is in workspace 1, and then save your report in workspace 2, you get something like a copy of your dataset in workspace 2. This might look strange! You might say: this is not a shared dataset; it is a copied dataset. The fact is that what you see is just a link. Power BI will bring a link to that dataset into the new workspace; this link helps you understand when the dataset was last refreshed.
Here is what a linked dataset looks like, so you can see the difference between it and normal datasets.
You cannot refresh a linked dataset manually or on a schedule. The refresh action can only be configured on the main dataset. The linked dataset is just a link, showing you the date and time of the last refresh, and an easier way to generate more reports from that dataset.

A linked dataset is not a copy; it is a link to the original dataset.

Certified and Promoted Datasets


When Power BI developers use the function of getting data from a Power BI dataset, they see all datasets from all workspaces that they have access to. This might be a bit confusing. There might be tons of datasets shared in the environment. The developer ends up with questions such as: Which of these is the one I can use? Which of these are valid to use? Which of these are reconciled and tested? Which of these are reliable to use?
A labeling system is available for Power BI datasets (and Power BI dataflows) that helps in this scenario. You can mark some of the datasets as certified and some as promoted. To set a dataset to a certified state, there should be an approval process that assures the dataset has passed some tests. Through this labeling system, you can clarify which datasets are good to be used as sources and which are not. You can build the concept of Gold, Silver, and Bronze datasets: Gold datasets are fully tested and reconciled, and so on down through the other levels, where a Bronze dataset is one that hasn't been through any testing yet.
To use this labeling system, the creator of the dataset can go to the setting of the dataset;
In the settings, you can set the Endorsement level as below;
As you can see, the Certified option might not be available. The Power BI tenant administrator has the authority to enable that labeling and give access to whoever needs it in the Tenant Settings.
The labeling system helps Power BI developers see the level of certification a dataset has before using it as a shared dataset, and select accordingly;
Shared Dataset in the Power BI Architecture
I have another book about the architecture of Power BI. In one chapter specifically, I explained how Dataflow and shared datasets can play an important role in the multi-developer tenant of a Power BI implementation [https://radacad.com/power-bi-architecture-for-multi-developer-tenant-using-dataflows-and-shared-datasets].
In a nutshell, using the Dataflow makes sure that you can bring the data, well prepared, into a central area, which you can call a centralized data warehouse in Azure Data Lake. And using shared datasets, you can build data marts that can be used by multiple reports. Here is how the architecture works in the diagram view:
Instead of having silos of Power BI reports and files everywhere, you can build an architecture that works best with multiple developers, with less redundancy in the data, the code, and the logic, and an easier maintenance approach. I highly recommend reading the article mentioned above to learn more about this architecture and how the shared dataset is located in it as a key element.
Summary
Using a shared dataset, you can have centralized data models (data marts) that serve multiple reports. You can reduce the maintenance time and the redundancy of the code and the data through this approach. The labeling system of certified and promoted datasets is also a great way of putting some process and governance in place to make sure the shared datasets have been through some process of testing and reconciling.
Chapter 27: Reduce the size of the Power BI file
The size of a Power BI file can be an important factor in the performance of the model.
In this chapter, you will learn a few ways to check what tables and columns are making
your Power BI file big and what to do about them.
Power BI Helper to determine the big columns
Doing proper modeling is always a challenge in Power BI, especially when you have been working on a model for a while. One of the aspects of modeling is to make sure you do not bring in very memory-intensive columns. Power BI Helper uses DMVs to find out the amount of memory that each column occupies in the model.
Here is a Power BI model which has the size of …

We can open it through Power BI Helper [https://radacad.com/power-bi-


helper], and in the Modeling Advice tab, there is a new section that tells us how much
space each column consumes:
It is clear that the columns Prefix and Sales ID in the above list are the most expensive columns, and obviously, this is because of their cardinality. If we remove these columns (considering that they are not required in the model), then here is the result after saving the *.pbix file:

Wow! From 107 MB down to 11 MB, a saving of 90% in size just by removing heavy but unnecessary columns.
This very simple feature can give you an idea about expensive columns for the compression engine of Power BI, and then you can control the model based on that.
Chapter 28: Design Tip for Power BI: Be Careful
of Dates

I have written about the Date dimension in previous chapters, and it makes sense now to explain one of the most common performance challenges that I see in some models. Sometimes, your model size grows significantly, and you have no idea why! In this chapter, I will show you one of the performance issues that might be caused by specific date values, and how to fix it.
Sample Dataset
I have a sample dataset with one table and three rows, as below:
This is not a big table at all, as you can see. However, when I save the *.pbix file, the
size of the file seems a bit crazy!

Yes, you saw it right. You can try it yourself, and you would probably get the same
result. Now let’s see why this happens and how to fix it.
Why? Investigating Using Power BI Helper
If you have a large *.pbix file, you can investigate which columns and tables are
causing the highest storage consumption using Power BI Helper. You can download
Power BI Helper for free from here [https://radacad.com/power-bi-helper]. I opened
the file above in Power BI Helper, and in the Modeling Advice tab, this is what I see:
As you can see in the above output, the Date field in the Date table is the biggest
column in this dataset, taking 150 MB of runtime memory! This is despite the column
having only three distinct values. It seems a bit strange, doesn’t it? Let’s dig into
the reason in more depth.
Why? Default Date Dimension
Power BI creates a default date dimension for every single date field in your dataset.
Values in that date table run from the minimum date value in that field (column) to
the maximum date value, with one row for each day. There is an option to disable the
default date dimension and create your own date dimension instead. However, having the
default date dimension makes many date calculations simpler and easier, which is why
many people use it that way.
There are a few parts of the previous paragraph that you need to read more carefully:

Power BI creates a default date dimension for every single date
field in your dataset.

This means that for the Date field in our sample table, Power BI has created a hidden
table behind the scenes, which we call the default date dimension table.

Values in that date table would be from the minimum date value
in that field (column) to the maximum date value, with one row
for each day.

The minimum value in that column is the 1st of Jan 1990, and the maximum value is the 31st
of Dec 9999! This means that the date dimension contains all days (one row per day)
from the 1st of Jan 1990 to the 31st of Dec 9999. That is about 8,009 years, which makes
almost 3 million rows in the default date table. The date column of that table will have
almost 3 million distinct values, and that is why this column consumes so much memory;
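You can verify that row count yourself. A quick Power Query (M) sketch, assuming the engine’s behavior of one row per day between the minimum and maximum date values:

```m
// Sketch: approximate row count of the default date table,
// one row per day from the minimum to the maximum date value.
let
    MinDate = #date(1990, 1, 1),
    MaxDate = #date(9999, 12, 31),
    NumDays = Duration.Days(MaxDate - MinDate) + 1
in
    NumDays  // roughly 2.9 million
```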
Now that you know why the file size is so big, let’s see how we can fix it with a
simple solution.
Solution: Not Applicable Date Value
The 31st of Dec 9999 makes no sense when it comes to date calculations. This value is
probably coming from an operational system that puts a default value into date fields
(when no date is entered, or to represent a far, inaccessible date) using a date that
is thousands of years in the future.
If you are using the default date dimension, I recommend replacing any value like that
with null. You can do that in Power Query with a simple Replace Values operation. Here
is how to do it:
Open Edit Queries to go to the Power Query Editor window;
Then right-click on the column and choose Replace Values;
Replace the 31st of Dec 9999 with null;

Then click on Close and Apply in the Power Query Editor window. Our data table now
looks like below (null values will be shown as blank in Power BI):
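The Replace Values step that Power Query generates looks roughly like the following M snippet; the column name "Date" and the previous step name "Source" are assumptions based on this sample table:

```m
// Replace the far-future placeholder date with null in the "Date" column
// ("Source" is assumed to be the previous query step)
#"Replaced Value" = Table.ReplaceValue(
    Source,
    #date(9999, 12, 31),
    null,
    Replacer.ReplaceValue,
    {"Date"}
)
```

With the maximum date now a realistic value, the default date dimension shrinks accordingly.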

And now, if we save the *.pbix file again. This is a new size:

The file size shrank from 26 MB to 66 KB! That is almost 400 times smaller, all from
such a simple change. And now all columns are small in their in-memory storage
consumption;
Summary
Thanks to Rui for giving me the idea to write this chapter. As you have seen, the
default date table doesn’t know that the furthest date value you have, in the year
9999, is just a “not applicable” value. The default date table considers it your
biggest date value and builds a whole table based on it, which may cause performance
issues like the one you have seen in this example. Change your “not applicable” date
values to null to fix it very simply.
Book wrap up

Congratulations on finishing the book. I would like to thank you for taking this step
towards having a better data model in Power BI. Although this book is finished, the
journey towards a better data model continues.
To leverage the learnings from this book, I encourage you to start applying the
practices you learned right away in your Power BI implementations. If you have a
concern or a question about a particular scenario, feel free to reach out to me
directly through the RADACAD [https://radacad.com] website; I’d be more than
happy to look into your question.
Wishing you the best
Reza Rad
September 2020
Other books from Reza Rad
Power BI from Rookie to Rock Star
This is a series of four books, over 1200 pages, available for free to download from
here [https://radacad.com/online-book-power-bi-from-rookie-to-rockstar].

Row-Level Security in Power BI


Here [https://www.amazon.com/Row-Level-Security-Power-BI-different-
ebook/dp/B082SFR2J4]
Pro Power BI Architecture
Here [https://www.amazon.com/Pro-Power-Architecture-Deployment-Microsoft-
ebook/dp/B07KQZ1ZDR]
