11/03/2024
CHAPTER 03
Performing the Test Plan and
Analyzing Results
Objectives
• Understand four categories of Data Analytics.
• Describe some descriptive analytics approaches, including summary
statistics and data reduction.
• Explain the diagnostic approach to Data Analytics, including
profiling and clustering.
• Understand predictive analytics, including regression and classification.
• Describe the use of prescriptive analytics, including machine learning and
artificial intelligence.
Contents
• Four main categories of data analytics.
• Descriptive analytics
• Diagnostic analytics
• Predictive analytics
• Prescriptive analytics
Four main categories of data analytics.
• Descriptive analytics are procedures that summarize
existing data to determine what has happened in the past.
• Diagnostic analytics are procedures that explore the current
data to determine why something has happened the way it
has, typically comparing the data to a benchmark.
• Predictive analytics are procedures used to generate a
model that can be used to determine what is likely to happen
in the future.
• Prescriptive analytics are procedures that model data to enable
recommendations for what should be done in the future.
Each stage takes additional effort but provides additional value.
Exhibit 3-1 Four Main Categories of Data Analytics
Descriptive analytics
Descriptive analytics help summarize what
has happened in the past.
• A financial accountant would sum all the sales transactions within
a period to calculate the value for Sales Revenue that appears on
the income statement.
• An analyst would count the number of records in a data extract to
ensure the data are complete before running a more complex analysis.
• An auditor would filter data to limit the scope to transactions that
represent the highest risk.
In all these cases, basic analysis provides an understanding of what has
happened in the past to help decision makers achieve good results and
correct poor results.
Descriptive analytics examples:
• Summary statistics describe a set of data in terms of their location
(mean, median), range (standard deviation, minimum, maximum),
shape (quartile), and size (count).
• Data reduction or filtering is used to reduce the number of
observations to focus on relevant items (that is, highest cost, highest
risk, largest impact, etc.). It does this by taking a large set of data
(perhaps the population) and reducing it to a smaller set that has the
vast majority of the critical information of the larger set.
Summary statistics
• Summary statistics describe the location, spread, shape, and
dependence of a set of observations.
Statistic | Excel formula | Description
Sum | =SUM() | The total value of all numerical values
Mean | =AVERAGE() | The center value; sum of all observations divided by the number of observations
Median | =MEDIAN() | The middle value that divides the top half of the data from the bottom half
Minimum | =MIN() | The smallest value
Maximum | =MAX() | The largest value
Count | =COUNT() | The number of observations
Frequency | =FREQUENCY() | The number of observations in each of a series of numerical or categorical buckets
Standard deviation | =STDEV() | The variability or spread of the data from the mean; a larger standard deviation means a wider spread away from the mean
Quartile | =QUARTILE() | The value that divides a quarter of the data from the rest; indicates skewness of the data
Correlation coefficient | =CORREL() | How closely two datasets are correlated or predictive of one another
Exhibit 3-3 Description of Summary Statistics
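As a rough illustration of the statistics in Exhibit 3-3, the sketch below computes several of them with Python's standard library. The `sales` values are made up for this example, and the mapping to Excel functions in the comments is approximate.

```python
import statistics

# Hypothetical sales amounts, used only for illustration.
sales = [120.0, 80.0, 100.0, 140.0, 60.0]

summary = {
    "sum": sum(sales),                   # comparable to =SUM()
    "mean": statistics.mean(sales),      # comparable to =AVERAGE()
    "median": statistics.median(sales),  # comparable to =MEDIAN()
    "min": min(sales),                   # comparable to =MIN()
    "max": max(sales),                   # comparable to =MAX()
    "count": len(sales),                 # comparable to =COUNT()
    "stdev": statistics.stdev(sales),    # sample standard deviation
}

for name, value in summary.items():
    print(f"{name:>6}: {value:.2f}")
```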
Mean vs Median
Examples Mean Median
5, 3, 9, 7
9, 7, 8, 5, 4
8, 4, 4, 6, 2, 6, 6
7, 3, 5, 7, 1, 17, 13, 7
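One way to check the exercise above is to compute both measures directly. This is a minimal sketch using Python's `statistics` module; the variable names are chosen for this example only.

```python
import statistics

# The four example datasets from the slide.
examples = [
    [5, 3, 9, 7],
    [9, 7, 8, 5, 4],
    [8, 4, 4, 6, 2, 6, 6],
    [7, 3, 5, 7, 1, 17, 13, 7],
]

# Pair each dataset with its (mean, median).
results = [(statistics.mean(d), statistics.median(d)) for d in examples]

for data, (mean, median) in zip(examples, results):
    print(f"{data}: mean={mean:.2f}, median={median}")
```

In the last dataset the large values 13 and 17 pull the mean (7.5) above the median (7), which is the effect the following slides describe.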
Mean vs Median: When to use?
It’s best to use the
Mean to describe the
center of a dataset when
the distribution is mostly
symmetrical and there
are no outliers.
When a distribution is
skewed, the Median does a
better job of describing the
center of the distribution
The Median also does a
better job of capturing the
central location of a
distribution when there are
outliers present in the data
Standard deviation
Examples Variance Std. deviation
10,15,20,25,30
5,10,15,20,20,25,30,30,35,40
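The slide does not say whether the sample or the population formula is intended, so the sketch below reports both for the two example datasets. The sample version divides by n - 1 and the population version divides by n.

```python
import statistics

examples = [
    [10, 15, 20, 25, 30],
    [5, 10, 15, 20, 20, 25, 30, 30, 35, 40],
]

spread = []
for data in examples:
    spread.append({
        "sample_var": statistics.variance(data),       # divides by n - 1
        "sample_sd": statistics.stdev(data),
        "population_var": statistics.pvariance(data),  # divides by n
        "population_sd": statistics.pstdev(data),
    })

for data, s in zip(examples, spread):
    print(data)
    print(f"  sample:     variance={s['sample_var']:.2f}, sd={s['sample_sd']:.2f}")
    print(f"  population: variance={s['population_var']:.2f}, sd={s['population_sd']:.2f}")
```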
Quartile
Examples Q1 Q2 Q3 IQR Min Max
40, 42, 47, 53, 54, 59, 65
5,10,15,20,20,25,30,30,35,40
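Quartile conventions differ between textbooks and tools, so treat the following as one possible way to fill in the table rather than the only answer. It assumes the "inclusive" interpolation method of Python's `statistics.quantiles`; other conventions can give slightly different Q1 and Q3 values.

```python
import statistics

def five_number_summary(data):
    """Return min, Q1, Q2, Q3, IQR, and max for a list of numbers."""
    # Assumption: "inclusive" interpolation between data points.
    q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return {"min": min(data), "q1": q1, "q2": q2, "q3": q3,
            "iqr": q3 - q1, "max": max(data)}

first = five_number_summary([40, 42, 47, 53, 54, 59, 65])
second = five_number_summary([5, 10, 15, 20, 20, 25, 30, 30, 35, 40])
print(first)
print(second)
```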
Data reduction involves the following steps:
• Identify the attribute you would
like to reduce or focus on.
• Filter the results.
• Interpret the results.
• Follow up on results.
Exhibit 3-4 Use Filters to Reduce Data
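The steps above can be sketched in a few lines. The transactions, field names, and the 5,000 cutoff are all hypothetical and exist only to show the identify, filter, and interpret steps.

```python
# Hypothetical transactions; field names are assumptions for illustration.
transactions = [
    {"id": 1, "vendor": "Acme", "amount": 250.00},
    {"id": 2, "vendor": "Birch", "amount": 12_400.00},
    {"id": 3, "vendor": "Cedar", "amount": 980.00},
    {"id": 4, "vendor": "Acme", "amount": 7_150.00},
]

# Step 1: identify the attribute to focus on (here, the amount).
THRESHOLD = 5_000.00  # assumed cutoff for "high value"

# Step 2: filter the results.
high_value = [t for t in transactions if t["amount"] >= THRESHOLD]

# Step 3: interpret, largest items first.
high_value.sort(key=lambda t: t["amount"], reverse=True)
for t in high_value:
    print(f"Transaction {t['id']} ({t['vendor']}): {t['amount']:,.2f}")
```

The remaining step, following up on the results, stays a manual task for the analyst.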
Data reduction
Fuzzy matching locates approximate matches
• Useful for
identifying
relationships in
imperfect data.
Exhibit 3-5 Fuzzy Matching Shows a Likely Match of an
Employee and Vendor
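A small sketch of fuzzy matching, assuming addresses are compared with the similarity ratio from Python's `difflib.SequenceMatcher`. The addresses and the 0.8 cutoff are invented for this example; real tools may use other similarity measures.

```python
from difflib import SequenceMatcher

# Hypothetical employee and vendor addresses.
employees = ["123 Main Street, Springfield", "48 Oak Avenue, Shelbyville"]
vendors = ["123 Main St., Springfield", "900 Industrial Parkway, Capital City"]

def similarity(a, b):
    """Return a score between 0 and 1; 1 means identical text."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

CUTOFF = 0.8  # assumed threshold for a "likely match"

matches = []
for e in employees:
    for v in vendors:
        score = similarity(e, v)
        if score >= CUTOFF:
            matches.append((e, v, round(score, 2)))

for e, v, score in matches:
    print(f"Possible match ({score}): employee '{e}' ~ vendor '{v}'")
```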
Diagnostic analytics
Diagnostic analytics provide insight into why things happened
or how individual data values relate to the general population.
Two common methods of diagnostic analytics are profiling
and clustering.
Other diagnostic methods include similarity matching and
co-occurrence grouping.
Diagnostic analytics methods:
• Profiling identifies the “typical” behavior of an individual,
group, or population by compiling summary statistics about
the data (including mean, standard deviations, etc.) and
comparing individuals to the population.
• Clustering helps identify groups (or clusters) of individuals
(such as customers) that share common underlying
characteristics—in other words, identifying groups of similar
data elements and the underlying drivers of those groups.
• Similarity matching is a grouping technique used to identify
similar individuals based on data known about them.
• Co-occurrence grouping discovers associations between
individuals based on common events, such as transactions
they are involved in.
Profiling
• Profiling involves gaining an understanding of the typical
behavior of an individual, group, or population (or sample).
• Profiling can be used to develop complex models to
predict potential fraud.
• Profiling is done primarily using structured data—data that are
stored in a database or spreadsheet and are readily searchable.
Data profiling typically involves the following steps:
1. Identify the objects or activity you want to profile: What data do you
want to evaluate? Sales transactions? Customer data? Credit limits?
2. Determine the types of profiling you want to perform: What is your
goal? Do you want to set a benchmark for minimum activity? Have you
set a budget that you wish to follow?
3. Set boundaries or thresholds for the activity: This is a benchmark that
may be manually set or automatically set.
4. Interpret the results and monitor the activity and/or generate a list of
exceptions: Here is where dashboards come into play.
5. Follow up on exceptions: A plan should be made to validate, correct, or
identify the causes of the abnormal behavior.
Z-Scores - Standardizing Data for Comparison
z = (x - μ) / σ
Where:
• z = Z-score
• x = the value being evaluated
• μ = the mean
• σ = the standard deviation
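To make the definitions concrete, the sketch below standardizes a made-up list of shipping times. The data and the cutoff of |z| > 2 for flagging outliers are assumptions for this example, and the population standard deviation is used for σ.

```python
import statistics

# Hypothetical days-to-ship values; the last one is unusually large.
days_to_ship = [2, 3, 3, 4, 4, 4, 5, 5, 6, 14]

mu = statistics.mean(days_to_ship)       # the mean
sigma = statistics.pstdev(days_to_ship)  # population standard deviation

# z = (x - mu) / sigma for every observation
z_scores = [(x - mu) / sigma for x in days_to_ship]

# Assumed rule of thumb: flag values more than 2 standard deviations away.
outliers = [x for x, z in zip(days_to_ship, z_scores) if abs(z) > 2]

for x, z in zip(days_to_ship, z_scores):
    print(f"x={x:>2}  z={z:+.2f}")
print("Flagged:", outliers)
```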
Z-scores show spread and outliers.
The higher the Z-score
(farther away from the
mean), the more likely
a customer will have a
delayed shipment
(blue circle).
Exhibit 3-7 Z-Scores Provide an Example of Profiling That Helps Identify Outliers
Box plots (box-and-whisker plots)
• Display the five-number summary of a set of data, including the
minimum, first quartile, median, third quartile, and maximum
• The five-number summary divides the data into sections that each
contain approximately 25% of the data in that set
Box plots show spread and outliers
EXHIBIT 3-8 Box Plots Provide an Example of Profiling That Helps Identify Outliers
(in This Case, Categories with Unusually High Average Days to Ship)
Data profiling in management accounting
Variance analysis
• Internal auditors analyze
travel and entertainment
expenses for violations
of internal controls.
• Managers use profiling
to compare variances
from target ranges.
Exhibit 3-9 Variance Analysis Is an Example of Data Profiling
Data profiling in auditing
Benford’s Law
• In the continuous audit,
an auditor may use
Benford’s Law to
evaluate the frequency
distribution of the first
digits from a large set
of numerical data.
Exhibit 3-10 Benford’s Law Applied to Large Numerical
Data Sets (including Employee Transactions)
Benford’s Law is a diagnostic analytics technique that
compares actual to expected values.
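Under Benford's Law the expected share of values with leading digit d is log10(1 + 1/d), so about 30.1% of values should start with 1. The sketch below compares that expectation with the observed first digits of a small, made-up list of amounts; a real test would need a much larger dataset.

```python
import math
from collections import Counter

def benford_expected(digit):
    """Expected proportion of values whose first digit is `digit` (1-9)."""
    return math.log10(1 + 1 / digit)

def first_digit(value):
    """Return the first nonzero digit of a number, or None for zero."""
    for ch in str(abs(value)):
        if ch in "123456789":
            return int(ch)
    return None

# Hypothetical transaction amounts, for illustration only.
amounts = [1250.00, 187.40, 1999.99, 23.10, 310.00, 1420.75, 96.20, 118.00]

digits = [d for d in map(first_digit, amounts) if d is not None]
counts = Counter(digits)

for d in range(1, 10):
    actual = counts[d] / len(digits)
    print(f"digit {d}: actual={actual:.3f}  expected={benford_expected(d):.3f}")
```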
Cluster analysis shows natural groupings of data.
• Clustering is used to identify
groups of similar data
elements and the underlying
drivers of those groups.
• Clustering algorithms
calculate the distances
between observations and group
the elements that are closest together.
Exhibit 3-11 Clustering Is Used to Find Three Natural
Groupings of Vendors Based on Purchase Activity
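The slides do not name a specific clustering algorithm. As one common choice, the sketch below uses a bare-bones k-means: assign each point to its nearest centroid, move each centroid to the mean of its points, and repeat. The vendor points and the starting centroids are invented for this example.

```python
import math

def kmeans(points, centroids, iterations=10):
    """Tiny k-means sketch: returns (final centroids, clusters)."""
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = []
        for cluster, old in zip(clusters, centroids):
            if cluster:
                new_centroids.append(
                    tuple(sum(c) / len(cluster) for c in zip(*cluster)))
            else:
                new_centroids.append(old)  # leave an empty cluster in place
        if new_centroids == centroids:
            break  # assignments are stable
        centroids = new_centroids
    return centroids, clusters

# Hypothetical vendors as (purchase frequency, average amount), pre-scaled.
vendors = [(1, 2), (2, 1), (1.5, 1.5),
           (8, 8), (9, 9), (8.5, 9.5),
           (1, 9), (2, 10), (1.5, 9.5)]
start = [vendors[0], vendors[3], vendors[6]]  # assumed initial centroids

centroids, clusters = kmeans(vendors, start)
for c, members in zip(centroids, clusters):
    print(f"centroid {c}: {members}")
```

Because distance drives the grouping, attributes on very different scales would normally be standardized first.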
Clustering in auditing
• Internal auditors can use
clustering to identify
groups of transactions
that may indicate risk or
fraud in insurance or
other payments.
Exhibit 3-12 Cluster Analysis of Insurance Payments
Hypothesis Testing for Differences in Groups
• A two-sample t-test for equal means is used to determine
if the difference between the means of two different
populations is significant or not.
• Begin by setting the Null Hypothesis H0 (no relationship)
and the Alternative Hypothesis HA (expected relationship).
• Significance level (α)
• A threshold for how strong the evidence must be before rejecting the null
hypothesis and concluding that the effect is statistically significant.
• The probability of rejecting the null hypothesis when it is true.
• The p-value: a number calculated from a statistical test to
help decide whether to reject the null hypothesis.
• How? Compare the p-value to α: reject the null hypothesis when the
p-value is less than or equal to α.
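The sketch below computes the test statistic for a two-sample t-test with equal variances (the pooled form) on two made-up groups of shipping times. Python's standard library has no t-distribution, so instead of a p-value the example compares |t| with 2.306, the two-tailed critical value for α = 0.05 and 8 degrees of freedom taken from a standard t-table. A statistics package would report the p-value directly.

```python
import math
import statistics

def pooled_t_statistic(a, b):
    """Return (t, degrees of freedom) for an equal-variance two-sample t-test."""
    n1, n2 = len(a), len(b)
    var1, var2 = statistics.variance(a), statistics.variance(b)
    # Pooled variance weights each sample variance by its degrees of freedom.
    pooled = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
    std_error = math.sqrt(pooled * (1 / n1 + 1 / n2))
    t = (statistics.mean(a) - statistics.mean(b)) / std_error
    return t, n1 + n2 - 2

# Hypothetical days-to-ship for two product categories.
category_a = [3, 4, 5, 4, 4]
category_b = [6, 7, 8, 7, 7]

t_stat, df = pooled_t_statistic(category_a, category_b)
CRITICAL_T = 2.306  # t-table value for alpha = 0.05, two-tailed, df = 8

reject_null = abs(t_stat) > CRITICAL_T
print(f"t = {t_stat:.3f}, df = {df}, reject H0: {reject_null}")
```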
EXHIBIT 3-13 T-Test Assessing for Significant Differences in Average Shipping Times across Categories
Predictive analytics
Predictive analytics examples:
• Regression estimates or predicts the numerical value of a
dependent variable based on the slope and intercept of a line
and the value of an independent variable.
• Classification predicts a class or category for a new
observation based on the manual identification of classes
from previous observations.
• Link prediction predicts a relationship between two data
items, such as members of a social media platform.
Regression helps predict expected outcomes.
• Regression is a statistical method that
attempts to determine the strength and
character of the relationship between one
dependent variable (usually denoted by Y)
and a series of other variables (known as
independent variables).
• Regressions allow the accountant to develop
models to predict expected outcomes.
Exhibit 3-14 Regression
Regression analysis involves the following process:
1. Identify the variables that might predict an outcome.
2. Determine the functional form of the relationship
(linear or nonlinear?).
3. Identify the parameters of the model (β, p-value).
4. Evaluate the goodness of fit (R²).
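As a minimal sketch of this process for a single predictor, the code below fits a straight line by ordinary least squares and reports R². The advertising and sales figures are invented, and the p-value for β is left out because it needs a t-distribution that the standard library does not provide.

```python
import statistics

# Hypothetical data: advertising spend (x) and sales volume (y).
ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [2.1, 3.9, 6.2, 7.8, 10.0]

mean_x = statistics.mean(ad_spend)
mean_y = statistics.mean(sales)

# Least-squares estimates for y = intercept + slope * x
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(ad_spend, sales))
sxx = sum((x - mean_x) ** 2 for x in ad_spend)
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Goodness of fit: share of the variation in y explained by the line.
predicted = [intercept + slope * x for x in ad_spend]
ss_res = sum((y - p) ** 2 for y, p in zip(sales, predicted))
ss_tot = sum((y - mean_y) ** 2 for y in sales)
r_squared = 1 - ss_res / ss_tot

print(f"sales = {intercept:.2f} + {slope:.2f} * ad_spend, R^2 = {r_squared:.4f}")
```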
What are some examples of regression?
• In managerial accounting, regression may predict
sales volume:
• Sales volume = f(advertising spending, and economic indicators
such as GDP or inflation)
• In auditing, regression may be used to determine
the appropriateness of allowance accounts:
• Allowance for loan losses amount = f(current aged loans, loan
type, customer loan history, collections success)
Classification predicts which class an
individual belongs to
• Identify the classes you wish to predict.
• Manually classify an existing set of records.
• Select a set of classification models.
• Divide your data into training and testing sets.
• Generate your model.
• Interpret the results and select the “best” model.
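The steps above can be sketched with the simplest possible classifier: a single threshold (a one-split decision tree) on one attribute. The records, labels, and attribute are hypothetical, and the split between training and testing data is fixed rather than random so the example stays repeatable.

```python
# Hypothetical, manually classified records: (days past due, class).
records = [
    (2, "paid"), (5, "paid"), (8, "paid"), (12, "paid"),
    (35, "default"), (40, "default"), (55, "default"), (60, "default"),
    (3, "paid"), (50, "default"),
]

# Divide the data into training and testing sets.
train, test = records[:8], records[8:]

def predict(days, threshold):
    """Predict 'default' when days past due exceeds the threshold."""
    return "default" if days > threshold else "paid"

def accuracy(threshold, data):
    hits = sum(predict(days, threshold) == label for days, label in data)
    return hits / len(data)

# Generate the model: try each training value as the decision boundary.
candidates = sorted(days for days, _ in train)
best_threshold = max(candidates, key=lambda t: accuracy(t, train))

# Interpret the results on data the model has not seen.
test_accuracy = accuracy(best_threshold, test)
print(f"boundary: days > {best_threshold}, test accuracy = {test_accuracy:.0%}")
```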
Classification begins with decision boundaries.
• Training data are existing data
that have been manually
evaluated and assigned a class.
• Test data are existing data
used to evaluate the model.
• Decision trees are used to divide
data into smaller groups.
• Decision boundaries mark
the split between one class
and another.
Exhibit 3-16 Example of Decision Trees and Decision
Boundaries
What else do you need to know about classification?
• Pruning removes branches
from a decision tree to
avoid overfitting the model.
Exhibit 3-17 Illustration of Pruning a Decision Tree
• Linear classifiers are useful for
ranking items rather than simply
predicting class probability.
• These are useful for determining
the important values, such as
valuable customers, or which
transactions are most likely
fraudulent.
Exhibit 3-18 Illustration of Linear Classifiers
• A support vector machine is a
discriminating classifier that
is defined by a separating
hyperplane that works first to
find the widest margin (or
biggest pipe) and then works
to find the middle line.
Exhibit 3-19 Support Vector Machines
Exhibit 3-20 Support Vector Machine Decision Boundaries
How do we evaluate classifiers?
• Try to avoid overfitting, or
models that fit the training
data too closely. They are bad
at predicting a future observation.
Exhibit 3-21 Illustration of Underfitting and
Overfitting the Data with a Predictive Model
• Look for the sweet spot
where we maximize the
accuracy on the testing data.
Exhibit 3-22 Illustration of the Trade-Off between the
Complexity of the Model and the Accuracy of the
Classification
Prescriptive analytics
Once other diagnostic and predictive analyses have
been performed, the decision process can be aided by
rules-based decision support systems or machine learning
models, or the results can be added to an existing artificial
intelligence model to improve future predictions.
Machine learning and artificial intelligence are two forms
of the prescriptive approach to Data Analytics work.
Prescriptive analytics examples:
• Decision support systems are rule-based systems that
gather data and recommend actions based on the input.
• Machine learning and artificial intelligence are learning
models or intelligent agents that adapt to new external data
to recommend a course of action.
DSS use rules to guide the accountant.
• The rules are derived from past
behavior to help guide the
accountant through a process.
• For example, the classification
of leases is based on
evaluating several rules.
Exhibit 3-23 Lease Classification Flowchart
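A rule-based decision support system can be sketched as a list of named rules that are checked in turn. The rules, field names, and the 75% and 90% cutoffs below are illustrative assumptions loosely modeled on common lease-classification criteria; they are not taken from the flowchart in Exhibit 3-23 and are not accounting guidance.

```python
from dataclasses import dataclass

@dataclass
class Lease:
    transfers_ownership: bool
    purchase_option_likely: bool
    lease_term_years: float
    asset_life_years: float
    pv_payments: float   # present value of lease payments
    fair_value: float    # fair value of the leased asset

def classify_lease(lease, term_cutoff=0.75, value_cutoff=0.90):
    """Return (classification, triggered rules). Cutoffs are assumptions."""
    rules = [
        ("ownership transfers to lessee", lease.transfers_ownership),
        ("purchase option likely to be exercised", lease.purchase_option_likely),
        ("term covers most of asset life",
         lease.lease_term_years / lease.asset_life_years >= term_cutoff),
        ("payments cover most of fair value",
         lease.pv_payments / lease.fair_value >= value_cutoff),
    ]
    triggered = [name for name, hit in rules if hit]
    # Any triggered rule recommends a finance lease; otherwise operating.
    return ("finance" if triggered else "operating"), triggered

copier = Lease(False, False, 3, 10, 20_000, 60_000)
truck = Lease(False, False, 8, 10, 45_000, 60_000)

print(classify_lease(copier))
print(classify_lease(truck))
```

Returning the triggered rules alongside the recommendation lets the accountant see why the system reached its answer.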
Machine learning learns from past data to
predict better outcomes.
• What machine learning models have in common is the use of algorithms and
statistical models to generate a previously unknown model that relies on
patterns and inferences.
• For most applications of artificial intelligence models, most companies
will obtain the underlying system from companies like Microsoft,
Amazon, or Google rather than develop it themselves.
• These companies have large datasets to create more accurate
prediction and recommendation engines.
Summary
• In this chapter, we addressed the third and fourth steps of the
IMPACT cycle model: the “P” for “performing test plan” and “A” for
“address and refine results.” That is, how are we going to test or
analyze the data to address a problem we are facing?
• We identified descriptive analytics that help describe what happened
with the data, including summary statistics, data reduction, and filtering.
• We provided examples of diagnostic analytics that help users identify
relationships in the data that uncover why certain events happen through
profiling, clustering, similarity matching, and co-occurrence grouping.
• We introduced some specific models and terminology related to these tools,
including Benford’s law, test and training data, decision trees and boundaries,
linear classifiers, and support vector machines. We identified cases where
models that overfit existing data are not very accurate at predicting the future.
• We explained examples of predictive analytics and introduced some data mining
concepts related to regression, classification, and link prediction that can help
predict future events or values. We discussed prescriptive analytics, including
decision support systems and artificial intelligence, and provided some examples
of how these systems can make recommendations for future actions.