Module 1 - BI Introduction
Module 1 - BI Introduction
2 Business Statistics
3 Descriptive Analytics
4 Diagnostics Analytics
5 Data Visualization
01 02 03 04
BI Terminology Business Intelligence in Corporates Power Query Data Preparation
1. Data issues
1. Self-service BI & Analytics 1. Get Data
1. BI vs BA 2. Analytics structure and Coordination Model 2. Data formatting
2. Profile Data
2. Technology in BI 3. BI Success (Organization) 3. Data blending
4. BI Success (Individual) 3. Transform Data
3. Data, Analysis, Analytics 4. Data Preparation Checklist
5. The Evolution of Business Intelligence
4. Understanding Data 6. How can data analytics help organizations?
7. BI process
8. Decision making with BI
9. Decision Bias
Lesson 1: Business Intelligence Terminology
Business Intelligence & Business Analytics
Lesson 1: Business Intelligence Terminology
Business Intelligence & Business Analytics
Lesson 1: Business Intelligence Terminology
Technology in Business Intelligence
Lesson 1: Business Intelligence Terminology
Technology in Business Intelligence
Lesson 1: Business Intelligence Terminology
Data, Analysis, Analytics
5 months ago, Bank ABC decreased totally Top 4 reasons due to Attrition in Bank:
10.200 bio. VND of Loan portfolio in Attrition (1) Dissatisfaction about services (50%)
(2) Lower rate in another banks (30%)
(Ending Loan portfolio = Beginning Loan + (3) Change another loan package in the bank(10%)
(4) Death (10%)
New loan – Attrition - Maturity)
Lesson 1: Business Intelligence Terminology
Understanding Data – Categories of Data
Lesson 1: Business Intelligence Terminology
Understanding Data – Structures of Data
String data can be Numeric data are Date/time contains a The Boolean type is Images
declared in a number numbers which can specific date, or a sometimes also called Maps
of different ways be whole numbers, combination of both a logical type and is a Report objects
depending on the such as Integers or date and time conditional flag Sound
character set required numbers with decimal representing either
and the anticipated places true or false
length of the string: Byte
any kind of Integer
characters, Fixed Decimal
alphanumeric, Float
including symbols. Double
Lesson 1: Business Intelligence Terminology
Understanding Data – Data Types Exercise
Lesson 1: Business Intelligence Terminology
Understanding Data – Data Types (Excel vs DAX)
Whole Number A 64 bit (eight-bytes) Numbers that have no decimal places. Integers can be positive or negative numbers, but must be
integer value 1, 2 whole numbers between -9,223,372,036,854,775,808 (-2^63) and 9,223,372,036,854,775,807 (2^63-1).
Decimal Number A 64 bit (eight-bytes) Real numbers are numbers that can have decimal places. Real numbers cover a wide range of
real number 1, 2 values:
Negative values from -1.79E +308 through -2.23E -308
Zero
Positive values from 2.23E -308 through 1.79E + 308
However, the number of significant digits is limited to 15 decimal digits.
TRUE/FALSE Boolean Either a True or False value.
Text String A Unicode character data string. Can be strings, numbers or dates represented in a text format.
Maximum string length is 268,435,456 Unicode characters (256 mega characters) or 536,870,912
bytes.
N/A Blank A blank is a data type in DAX that represents and replaces SQL nulls. You can create a blank by
using the BLANK function, and test for blanks by using the logical function, ISBLANK.
1. BUSSINESS INTELLIGENCE INTRODUCTION
01 02 03 04
BI Terminology Business Intelligence in Corporates Power Query Data Preparation
1. Data issues
1. Self-service BI & Analytics 1. Get Data
1. BI vs BA 2. Analytics structure and Coordination Model 2. Data formatting
2. Profile Data
2. Technology in BI 3. BI Success (Organization) 3. Data blending
4. BI Success (Individual) 3. Transform Data
3. Data, Analysis, Analytics 4. Data Preparation Checklist
5. The Evolution of Business Intelligence
4. Understanding Data 6. How can data analytics help organizations?
7. BI process
8. Decision making with BI
9. Decision Bias
Lesson 2: Business Intelligence in Corporates
Self-service BI & Analytics
ORGANIZATIONS EXIST TO
CREATE VALUE
ORGANIZATIONS HAVE TO BE
Understanding
Agile BI
and turning it into action in order to
achieve a desired business outcome. Agile BI (speed-to-value)
Now, more than ever, business leaders
Data must be relevant need access to the right information at
Information must be meaningful the right time in order to act before
Insight must be actionable decision windows close.
Describing
Defining
Analytical Organizations
Functions: Roles:
– Reporting – Database Analysts
– Ad-hoc Analytics – Data Analysts
– Modeling – Modelers
– Data Scientists
– Etc.
Lesson 2: Business Intelligence in Corporates
2. Analytics structure and Coordination Model
Analytical Organizations
How centralized or decentralized should these
organizations be?
Centralized
Functions
Collaboration
Lesson 2: Business Intelligence in Corporates
2. Analytics structure and Coordination Model
Analytical Organizations –
In a Centralized model, a set of analytical
Centralized activities are accomplished through a
central clearinghouse
Example: An enterprise analytics team
serves the needs of marketing, finance,
operations, customer care, etc. with
respect to reporting, ad-hoc analysis, and
statistical modeling
Key Advantages: Key Disadvantages
– Consistency – Responsiveness
– Optimal management of – Lack of context / expertise
bandwidth & focus on can limit effectiveness in more
enterprise priorities complex tasks
– Maximum efficiency in low- – Requires large group and
level tasks consistent overall loading
Lesson 2: Business Intelligence in Corporates
2. Analytics structure and Coordination Model
Key Disadvantages
Key Advantages:
– Lack of consistency in methods and
– Extremely responsive
sources
– High degree of context and
– Effort & data duplication very likely
expertise attained
– Requires largest overall resources, and
– Efficient localized used of
can be expensive, esp. when
contracting resources
contractors used
Lesson 2: Business Intelligence in Corporates
2. Analytics structure and Coordination Model
Board of director
Planning &
Marketing Analytics Strategy Financial
Analytics Quality Control Sale analytics R&D analytics IT analytics
analytics Manager analytics analytics
Goal analytics
Expanded Marketing
QC Analyst Business Planning Financial
Team Sales Analyst R&D Specialist IT Specialist
Analyst Analyst specialist Analyst
Advanced
Analytics Result Analytis Analytics Result
Specialist
Data engineer
Core Team
Insights Data
Focus
(Declutter) Storytelling
Lesson 2: Business Intelligence in Corporates
7. BI process
Lesson 2: Business Intelligence in Corporates
8. Decision Making with BI
Lesson 2: Business Intelligence in Corporates
9. Decision Bias
There are three main cognitive traps that regularly skew decision making:
• The Confirmation Trap
• The Overconfidence Trap
• The Overfitting Trap
Lesson 2: Business Intelligence in Corporates
9. Decision Bias
Common The Confirmation Trap The Overconfidence Trap The Overfitting Trap
traps
Mistakes This is a sobering prospect for decision makers: • More information=> increase overconfidence without Overfit models generally do a
• Pay more attention to findings that align with improving accuracy (More data in and of itself is not a suspiciously good job of explaining many
our prior beliefs guaranteed solution) nuances of what happened in the past,
• Ignore other facts/ patterns in the data • It prevents us from questioning our methods, but they have great difficulty predicting
motivation, and the way we communicate our findings the future
to others
• It makes it easy to underinvest in data and analysis;
when we feel too confident in our understanding, we
don’t spend enough time or money acquiring more
information or running further analyses.
Tips • Specify in advance the data and analytical • Compare your perfect experiment ideal with your In order to overcome this bias, you need
approaches on which you’ll base your actual data to understand where it might fall short. to discern between the data that matters
decision Identify places where you might be able to close the and the noise around it.
• Actively look for findings that disprove your gap with more data collection or analytical • Split data into train/validation/test
beliefs Techniques • Look for relationships that measure
• Don’t automatically dismiss findings that fall • Asking yourself why your analysis might be wrong, and important effects related to clear and
below your threshold for statistical or practical recommends that you do this for every analysis you logical hypotheses before digging into
significance. perform nuances.
• Assign multiple independent teams to analyze • Ask others with knowledge about the project to • Construct alternative narratives
the data separately, to determine whether imagine its failure a year into the future and to write
the differences are due to error, inconsistent stories of that failure. In doing so, you’ll benefit from the
methods, or bias. wisdom of multiple perspectives, while also providing
• Treat your findings like predictions, and test an opportunity to bring to the surface potential flaws in
them the analysis that you may otherwise overlook.
• Keep track of your predictions and systematically
compare them with what actually happens.
1. BUSSINESS INTELLIGENCE INTRODUCTION
01 02 03 04
BI Terminology Business Intelligence in Corporates Power Query Data Preparation
1. Data issues
1. Self-service BI & Analytics 1. Get Data
1. BI vs BA 2. Analytics structure and Coordination Model 2. Data formatting
2. Profile Data
2. Technology in BI 3. BI Success (Organization) 3. Data blending
4. BI Success (Individual) 3. Transform Data
3. Data, Analysis, Analytics 4. Data Preparation Checklist
5. The Evolution of Business Intelligence
4. Understanding Data 6. How can data analytics help organizations?
7. BI process
8. Decision making with BI
9. Decision Bias
Get and
Transform (M)
(20-25%)
DATA PREPARATION
DATA PREPARATION
DATA PREPARATION
DATA PREPARATION
DATA PREPARATION
DATA PREPARATION
DATA PREPARATION
Power Query is an ETL tool (“a what?” I hear you say). ETL stands for Extract, Transform and Load. Let us look
at each of those words individually to get a better understanding.
•Extract – Data can be extracted from a variety of sources; databases, CSV files, text files, Excel workbooks,
specific cells on the same worksheet, websites and even some PDFs. Basically, if there is data stored
somewhere in a structured or semi-structured format, Power Query can get to it and pull it out.
•Transform – Once the data has been extracted in the previous step, it can be cleaned up (i.e., remove
spaces, split columns, change date formats, fill blanks, find and replace etc) and reshaped (i.e., unpivot,
remove columns etc). When data is extracted from different sources it is unlikely to be consistent, the
transform process is used to make it ready for use.
•Load – Once the data has been extracted and transformed, it needs to be put somewhere so that you
can use it. From an Excel perspective, it can be pushed into a worksheet, a data model, or another query.
To summarize, Power Query takes data from different sources and turns it into something which can be
used. As a tool, this is pretty useful already. But here is the best part. Once the ETL process has been
created, it can be run over and over again with a single click. Which can save hours of work every week.
DATA PREPARATION
DATA PREPARATION
DATA PREPARATION – Get Data
Overview of Supported Data Sources
DATA PREPARATION – Get Data
Two parts:
1.Using data in the same workbook
2.Using data from external files and external workbooks.
Using data in the same workbook
DATA PREPARATION – Transform Data
DATA PREPARATION – Transform Data
DATA PREPARATION – Transform Data
Add/ Modify Columns
In reality, most
datasets will give us
the majority of the
data we want, but
there still might be a
bit of data that's
missing or that
needs to beslightly
modified.
DATA PREPARATION – Transform Data
Insert in Applied Steps
DATA PREPARATION – Transform Data
Move in Applied Steps
DATA PREPARATION – Transform Data
Insert in Applied Steps
DATA PREPARATION – Transform Data
Add/ Modify Columns
DATA PREPARATION – Transform Data
Merge
DATA PREPARATION – Transform Data
Column Data Type and Formatting
3 Basic steps
Power Query (or Get & Transform as it’s known in Excel 2016 and
later)
Ken Puls, Mynda Treacy, Oz du Soliel were all talking about how
amazing Power Query is.
https://docs.microsoft.com/en-us/powerquery-m/
DATA PREPARATION – Transform Data
DATA PREPARATION – Transform Data
Writing Power Query Functions
DATA PREPARATION – Transform Data
Writing Power Query Functions
DATA PREPARATION – Transform Data
Writing Power Query Functions
DATA PREPARATION – Transform Data
Writing Power Query Functions
DATA PREPARATION
Replace
DATA PREPARATION – Transform Data
Sorting Data in Columns and Data Categorization
DATA PREPARATION – Transform Data
Sorting Data in Columns and Data Categorization
DATA PREPARATION – Transform Data
New Tables
01 02 03 04
BI Terminology Business Intelligence in Corporates Power Query Data Preparation
1. Get Data 1. Data issues
1. Self-service BI & Analytics
1. BI vs BA 2. Analytics structure and Coordination Model 2. Profile Data 2. Data formatting
2. Technology in BI 3. BI Success (Organization) 3. Transform Data 3. Data blending
4. BI Success (Individual)
3. Data, Analysis, Analytics 4. Data Preparation Checklist
5. The Evolution of Business Intelligence
4. Understanding Data 6. How can data analytics help organizations?
7. BI process
8. Decision making with BI
9. Decision Bias
Lesson 4: Data Preparation
Why Data Cleansing?
Lesson 4: Data Preparation
Why Data Cleansing?
Lesson 4: Data Preparation
Why Data Cleansing?
Lesson 4: Data Preparation
Data Issues – Types of Data Issues
1. Dirty
Data
Data
Issues
2.
3.Outliers Missing
data
Lesson 4: Data Preparation
Data Issues – Dirty Data
Dirty Data contains some kind of errors in them, or in a format that’s unfriendly or unusable
Lesson 4: Data Preparation
Data Issues – Dirty Data: Extra Characters
Extra characters can be currency symbols, number signs… We’d need to remove these before
changing between field types
Lesson 4: Data Preparation
Data Issues – Dirty Data: Extra Characters (Example)
Lesson 4: Data Preparation
Data Issues – Dirty Data
No: Yes:
Why?
Because it is easier to analyze data when it is stored in its smallest parts
Lesson 4: Data Preparation
Data Issues – Missing data
Real Data
Downward BIAS
Lesson 4: Data Preparation
Data Issues – Solutions for Missing data
SOLUTIONS
1. Deleting Missing Data
2. Imputation
3. Advanced methods
Lesson 4: Data Preparation
Data Issues – Missing data: Deleting Missing Data
Imputation
In statistics, Imputation is the process of
substituting values in the data where the
value are missing (we impute values, we
are making them up). We are creating
fake data in order to develop a model
that makes sense and is as close to
reality as we can get it
Lesson 4: Data Preparation
Data Issues – Missing data: Imputation
Lesson 4: Data Preparation
Data Issues – Missing data: Imputation
Data types
Nominal -- categorical data , gender, type of school.(categories)
Ordinal data-- Ordinal Number is a number that tells the position of something in a list, such as 1st, 2nd, 3rd,
4th, 5th etc.
Interval data -- 6-7,8-9,100-200
Ratio -- Ex: Number of Applications/Number of Loans
Lesson 4: Data Preparation
Data Issues – Missing data: Imputation
Lesson 4: Data Preparation
Data Issues – Missing data: Imputation (Example)
Lesson 4: Data Preparation
Data Issues – Missing data: Advanced methods
If your business and results could be significantly off by using a simpler method,
you might want to explore these options
Identifying outliers in the data helps us understand how vulnerable our model would be to a small
set of observations.
Lesson 4: Data Preparation
Data Issues – Outliers: Identify
Could be correct, but it’s just abnormal then the analysis and Outliers didn’t change the
modeling process SHOULD INCLUDE that data. That said, it is results, the regression line
legitimate to create models without the data as well to compare retained its original lope, then
results, but it should be noted which models do and don’t contain it can be legitimate to
the outliers (2 options: include outliers vs exclude outliers) remove that observation
Lesson 4: Data Preparation
Data Issues – Outliers: Dealing with outliers
4. Truncation NOTE: We see here where age and income are fairly
random with no association between how old a
person is and how much income they have. But the
outlier creates the slope of the line by just being
present... so without outlier 1 (row 10), we have a
steep positive slope, but without outliers 2 (row 14)
and 3 (row 15), we have a negative slope.
In other words without the outlier we wouldn’t really
be able to draw a legitimate line at all, but the
presence of the outlier is what creates the model
effect. In cases such as this, we should definitely
Where we know that a certain value can remove the outlier and investigate other predictor
only be below a given maximum and yet variables.
a value is reported above that
Lesson 4: Data Preparation
Data Formatting
1. Transposing
2. Aggregating Data
3. Cross Tabulation
Lesson 4: Data Preparation
Data Formatting - Transposing
Lesson 4: Data Preparation
Data Formatting – Transposing - Example
Lesson 4: Data Preparation
Data Formatting – Transposing - Example
Power Query
(1) Select columns => Right Click => Unpivot
(2) Select other columns => Right Click => Unpivot other columns
Lesson 4: Data Preparation
Data Formatting - Aggregating Data
Lesson 4: Data Preparation
Data Formatting - Aggregating Data - Example
Lesson 4: Data Preparation
Data Formatting - Cross Tabulation
Lesson 4: Data Preparation
Data Formatting - Cross Tabulation - Example
Lesson 4: Data Preparation
Data Blending
Unioning allows you to take multiple datasets and deal with them as one
Lesson 4: Data Preparation
Data Blending – Joining Datasets
Lesson 4: Data Preparation
Data Blending – Fuzzy Matching
The match threshold is the minimum score achieved by the fuzzy matching for
it to be considered to be a match
Lesson 4: Data Preparation
Data Blending – Fuzzy Matching - Example
Lesson 4: Data Preparation
Data Blending – Fuzzy Matching - Example
Lesson 4: Data Preparation
Data Blending – Spatial Matching
There aren’t fields that can be Gray area: How many customers fall
used to join them together within a store trade area is to match
them and assign a store number to them
Lesson 4: Data Preparation
Data Blending – Spatial Blending - Example
Customer Information
Spatial Data
Lesson 4: Data Preparation
Data Problems Checklist