UJ Module Lecture 2 Rev2
03 August 2020
Module Index
• Introduction 4
• Concepts of Reliability 6
• Probability of Failure 10
• Statistical Analysis 17
• Base Methodology and Concepts 18
• Mathematics and Statistical Models 28
• Statistical Methods 29
• Analysis Fundamental Approach – A Quick Recap 37
• Statistical Data Method Considerations 56
• Most used Statistical Methods 62
• Linear Regression 67
• Statistical Classification 69
• Resampling Methods 76
• Sub-set Selection 79
• Shrinkage 83
• Dimension Reduction 85
• Exploratory Factor Analysis (EFA) 88
• Non-Linear Models 91
• Support Vector Machines (SVM) 93
• Unsupervised Learning 96
• Reliability of Scales (Cronbach’s Alpha) 99
2
Module Index
• Reliability Statistics and Mathematics 102
Reliability Statistics 105
Discrete Functions 110
Continuous Functions 112
Series of Events (Point Processes) 118
Dealing with Multiple Reliability Engineering Scenarios 124
Bayesian Methods 128
Monte Carlo Simulation 136
Markov Analysis 143
Petri Nets 150
• Life Data Analysis Concepts 153
Life Data Classification 155
Ranking of Data 157
Confidence Bounds 162
Maximum Likelihood Estimation (MLE) 164
• Managing Variations in Engineering 166
Discrete Variation 170
• Statistical Handbooks & References 177
3
Introduction
5
Concepts of Probability
Source: Basic statistical tools in research and data analysis. Zulfiqar Ali and S Bala Bhaskar 7
Probability is defined in two ways:
Definition 1: If an event can occur in N equally likely ways, and if the event with attribute A can happen in n of
these ways, then the probability of A occurring is P(A) = n/N.
The first definition covers the case of equally likely independent events, such as rolling a die.
Definition 2: If, in an experiment, an event with attribute A occurs n times out of N trials, then as N
becomes large, the probability of event A approaches n/N, that is, P(A) = lim(N→∞) n/N.
The second definition covers typical cases in quality control and reliability. If we test 100 items and find that 30
are defective, we may feel justified in saying that the probability of finding a defective item in our next test is
0.30, or 30%. This probability of 0.30 may be considered as our degree of belief in this outcome, limited by the
size of the sample.
RULES of Probability: See Section 2.4 (pages 22-24) of the prescribed handbook for more information.
Reference Handbook: Probability and Statistics by Example, Volume I (Basic Probability and Statistics). Suhov and Kelbert
8
Example of Rules of Probability: P22-25 in handbook
Tutorial: https://revisionmaths.com/gcse-maths-revision/statistics-handling-data/probability
Probabilities are also affected by whether events are dependent or independent.
• Two events are independent if the outcome of one event does not affect the outcome of the other.
• PoF is established by deterioration models wherein there are two primary methods for graphically
representing the PoF along a curve:
• Degradation Curve - A performance curve, such as the P-F interval.
• Survivor Curve - Usually expressed as a survivor curve, with positive or negative skewness.
• Further refinement of the survivor curves is derived from the following analysis.
• Right-Modal Curve ("R" Curves) with negative skewness
• Left-Modal Curve ("L" Curves) with positive skewness
• Symmetrical ("S" Curve) with no skewness
• Original Modal ("O" Curve) with extreme positive skewness
11
Data required for Probability of Failure Analysis
• In order to establish PoF, several pieces of information are required, including:
• typical service life,
• consumed life and remaining life,
• PF interval,
• FMEA.
• Physics of failure (how does the item fail?)
• Some of this information is empirical and some is statistical in nature.
• Listed below are some of the key elements to establish the PoF for a single asset:
• Service Life
• Consumed Life (measured from the top of a performance curve).
• Remaining Useful Life (measured to the bottom of a performance curve or to functional failure or to complete
failure).
12
It requires an understanding of the Physics of Failure
• The pace at which an asset degrades over
time under normal operating conditions also
plays a role.
• It is an expression of the durability and
robustness of an asset.
13
https://assetinsights.net/Concepts/Peeps_Statistical_Elements_of_Asset_Survivor_Curve.JPG
Left and Right Modal curves (P-F Curve Refinement)
• Key attributes of Right-Modal curves:
• Back-end Loading - These are assets in which the greatest frequency of retirements is after the life
term is reached, which causes the retirement frequency curve to be skewed to the right of the mean. The
majority of the assets in this group will last longer than the average life, but most of them will be retired in a
short period of time after the average life is reached.
• Negative Skewness - The probability distribution has negative skewness with a long tail to the left of the
mean.
• Positive Aging - The longer the asset has been in service, the more likely it is to fail. In other words, the
hazard function increases for larger values of t. This makes intuitive sense: the longer an item is used, the
more it wears, so an item that has been in service for a long time is approaching its breaking point.
14
Survivor Curve – Often used for Asset renewal decisions
15
https://assetinsights.net/Concepts/Peeps_Statistical_Elements_of_Asset_Survivor_Curve.JPG
Risk & Risk Mitigation
• Probability of Failure is also used to classify risk and
consider risk mitigation factors.
• The Probability of Failure (PoF) and risk of such a
failure is directly affected by various factors, including
the following:
• The quality of maintenance applied to the preservation of the
assets.
• Exposure or protection of the assets from the elements and
other loadings.
• Durability of the materials used in the production of the
assets.
• Effective age of the assets relative to their chronological age.
16
Risk prioritisation is closely coupled to FMEA/FMECA analysis (Discussed in Module 6)
Statistical Analysis
• There are no ‘rules of thumb’ when determining sample size for quantitative research.
• A hierarchical scale of increasing precision can be used for observing and recording
the data which is based on categorical, ordinal, interval and ratio scales
• Categorical or nominal variables are unordered. The data are merely classified into categories and cannot be
arranged in any particular order. (E.g. quality of training – good, bad or average)
• Ordinal variables have a clear ordering between the variables. However, the ordered data may not have equal
intervals.
• Interval variables are similar to an ordinal variable, except that the intervals between the values of the interval
variable are equally spaced.
• Ratio scales are similar to interval scales, in that equal differences between scale values have equal
quantitative meaning. However, ratio scales also have a true zero point, which gives them an additional
property.
Source: Basic statistical tools in research and data analysis. Zulfiqar Ali and S Bala Bhaskar 21
Qualitative vs Quantitative Analysis
Although the table above illustrates qualitative and quantitative research as distinct and opposite, in
practice they are often combined or draw on elements from each other. For example, quantitative
surveys can include open-ended questions. Similarly, qualitative responses can be quantified. Qualitative
and quantitative methods can also support each other, both through a triangulation of findings and by
building on each other (e.g., findings from a qualitative study can be used to guide the questions in a
survey).
Source: 6 Methods of data collection and analysis. The Open University. 22
Descriptive vs Inferential Statistics
• Descriptive statistics try to describe the relationship between variables in a sample or population.
• Descriptive statistics provide a summary of data in the form of mean, median and mode.
• Mean (or the arithmetic average) is the sum of all the scores divided by the number of scores.
• Median is defined as the middle of a distribution in a ranked data (with half of the variables in the sample above and half below the median value)
• Mode is the most frequently occurring variable in a distribution.
• The extent to which the observations cluster around a central location is described by the central tendency and the spread towards
the extremes is described by the degree of dispersion.
(Figure: negatively and positively skewed distributions)
• Inferential statistics use a random sample of data taken from a population to describe and make
inferences about the whole population.
• It is valuable when it is not possible to examine each member of an entire population.
• The purpose is to answer or test the hypotheses. A hypothesis is a proposed explanation for a phenomenon. Hypothesis tests are
thus procedures for making rational decisions about the reality of observed effects.
Source: Basic statistical tools in research and data analysis. Zulfiqar Ali and S Bala Bhaskar 23
Parametric and Non-Parametric Tests
• Numerical data (quantitative variables) that are normally distributed are analysed with parametric
tests. Two most basic pre-requisites for parametric statistical analysis are:
• The assumption of normality which specifies that the means of the sample group are normally distributed.
• The assumption of equal variance which specifies that the variances of the samples and of their corresponding population
are equal.
• Parametric tests: The parametric tests assume that the data are on a quantitative (numerical) scale,
with a normal distribution of the underlying population. The samples have the same variance
(homogeneity of variances). The samples are randomly drawn from the population, and the
observations within a group are independent of each other. The commonly used parametric tests are
the t-test, analysis of variance (ANOVA) and repeated-measures ANOVA.
• Non-Parametric: If the distribution of the sample is skewed towards one side or the distribution is
unknown due to the small sample size, non-parametric statistical techniques are used. Non-parametric
tests are used to analyse ordinal and categorical data.
Source: Basic statistical tools in research and data analysis. Zulfiqar Ali and S Bala Bhaskar 24
Parametric and Non-Parametric Test Equivalents
Source: https://www.healthknowledge.org.uk/public-health-textbook/research-methods/1b-statistical-methods/parametric-nonparametric-tests 25
Parametric Tests
• t-test is used to test the null hypothesis that there is no difference between the means of the two groups. It is used in three
circumstances:
• To test if a sample mean (as an estimate of a population mean) differs significantly from a given population mean (this is a one-sample t-test)
• To test if the population means estimated by two independent samples differ significantly (the unpaired t-test).
• To test if the population means estimated by two dependent samples differ significantly (the paired t-test). A usual setting for paired t-test is when
measurements are made on the same subjects before and after a specified intervention.
• The t-test cannot be used for comparison of three or more groups. The purpose of ANOVA is to test if there is any
significant difference between the means of two or more groups. In ANOVA, two variances are studied:
• Between-group variability: The between-group (or effect variance) is the result of the intervention imposed (before-after).
• Within-group variability: The variation that cannot be accounted for in the study design. It is based on random differences present in the data
samples.
• These two estimates of variances are compared using the F-test.
• As with ANOVA, Repeated-Measures ANOVA analyses the equality of means of three or more groups.
• A repeated measure ANOVA is used when all variables of a sample are measured under different conditions or at different points in time. As the
variables are measured from a sample at different points of time, the measurement of the dependent variable is repeated. Using a standard ANOVA
in this case is not appropriate because it fails to model the correlation between the repeated measures (i.e. the data violate the ANOVA assumption of
independence).
• Hence, in the measurement of repeated dependent variables, Repeated-Measures ANOVA should be used.
Source: Basic statistical tools in research and data analysis. Zulfiqar Ali and S Bala Bhaskar 26
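Illustrative sketch (not from the source article): the unpaired t-test and one-way ANOVA above, run in Python with SciPy on made-up measurement data.

```python
from scipy import stats

# Hypothetical samples, e.g. three groups of time-to-repair measurements (hours)
group_a = [112, 98, 105, 120, 101, 99, 110]
group_b = [95, 90, 104, 88, 97, 93, 100]
group_c = [102, 99, 108, 111, 96, 103, 107]

# Unpaired (independent) t-test: do the means of two independent groups differ?
t_stat, p_t = stats.ttest_ind(group_a, group_b)
print(f"t = {t_stat:.2f}, p = {p_t:.3f}")

# One-way ANOVA: is there any significant difference among three or more group means?
f_stat, p_anova = stats.f_oneway(group_a, group_b, group_c)
print(f"F = {f_stat:.2f}, p = {p_anova:.3f}")
```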
Non-Parametric Tests
• When the assumptions of normality are not met and
the sample means are not normally distributed,
parametric tests can lead to erroneous results.
• Non-parametric tests (distribution-free test) are used in
such situations as they do not require the normality
assumption.
• Non-parametric tests may fail to detect a significant
difference when compared with a parametric test. That
is, they usually have less power.
• As is done for the parametric tests, the test statistic is
compared with known values for the sampling
distribution of that statistic and the null hypothesis is
accepted or rejected.
• See source below for brief description of non-
parametric test intent.
Source: Basic statistical tools in research and data analysis. Zulfiqar Ali and S Bala Bhaskar 27
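Illustrative sketch: the non-parametric equivalents of the tests above (Mann-Whitney U in place of the unpaired t-test, Kruskal-Wallis in place of one-way ANOVA), using SciPy on the same kind of made-up data.

```python
from scipy import stats

group_a = [112, 98, 105, 120, 101, 99, 110]
group_b = [95, 90, 104, 88, 97, 93, 100]
group_c = [102, 99, 108, 111, 96, 103, 107]

# Mann-Whitney U test: non-parametric equivalent of the unpaired t-test
u_stat, p_u = stats.mannwhitneyu(group_a, group_b)

# Kruskal-Wallis test: non-parametric equivalent of one-way ANOVA
h_stat, p_h = stats.kruskal(group_a, group_b, group_c)

print(f"Mann-Whitney p = {p_u:.3f}, Kruskal-Wallis p = {p_h:.3f}")
```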
Mathematics and Statistical Models
• Descriptive Statistics
• Frequencies and Percentages
• Means and Standard Deviations (SD)
• Inferential Statistics
• Hypothesis Testing
• Correlation
• T-Tests
• Chi-square
• Logistic Regression
• Prediction
• Confidence Intervals
• Significance Testing
• Regression Analysis: Regression analysis allows modelling the relationship between a dependent variable
and one or more independent variables. In data mining, this technique is used to predict the values, given a
particular dataset.
• Inferential: Aims to test theories about the nature of the world in general (or some part of it) based on
samples of “subjects” taken from the world (or some part of it). That is, use a relatively small sample of data to
say something about a bigger population.
• Predictive: The various types of methods that analyse current and historical facts to make predictions about
future events. In essence, to use the data on some objects to predict values for another object. Accurate
prediction depends heavily on measuring the right variables
• Causal: To find out what happens to one variable when you change another.
31
Statistical Methods most often used
• Mechanistic: Understand the exact changes in variables that lead to changes in other variables for individual
objects. Usually modelled by a deterministic set of equations (physical/engineering science)
• Factor Analysis: Factor analysis is a regression-based data analysis technique used to find an underlying
structure in a set of variables. It involves finding new independent factors (variables) that describe the
patterns and relationships among the original dependent variables.
• Dispersion Analysis: Dispersion analysis is not a common method in data mining, but it still has a role in
analysis. Dispersion is the extent to which a set of data is spread out, and dispersion analysis describes that
spread. Measures of dispersion help data scientists study the variability of the item(s) being researched.
• Discriminant Analysis: Discriminant analysis is one of the most powerful classification techniques in data
mining. The discriminant analysis utilizes variable measurements on different groups of items to underline
points (characteristics) that distinguish the groups.
• Time Series Analysis: Time series data analysis is the process of modelling and explaining time-
dependent series of data points. The goal is to draw all meaningful information (statistics, rules and patterns)
from the shape of data.
Reference Handbook: Time Series Analysis and its Applications. Shumway & Stoffer
32
Modern Statistical Data Analysis Methods used
• Artificial Neural Networks (ANN): Often just called a “neural network”, present a brain metaphor for
information processing. These models are biologically inspired computational models. They consist of an
inter-connected group of artificial neurons and process information using a computation approach.
• Decision Trees: The decision tree is a tree-shaped diagram that represents classification or regression
models. It divides a data set into smaller and smaller sub-datasets (that contain instances with similar values)
while at the same time a related decision tree is continuously developed. The tree is built to show how and
why one choice might lead to (or influence) the next, with the help of the branches.
• Evolutionary Programming: Evolutionary programming in data mining is a common concept that combines
many different types of data analysis using evolutionary algorithms. Most popular of them are: genetic
algorithms, genetic programming, and co-evolutionary algorithms.
• Fuzzy Logic: Fuzzy logic is applied to cope with the uncertainty in data mining problems. Fuzzy logic is an
innovative type of many-valued logic in which the truth values of variables are a real number between 0 and 1.
In this term, the truth value can range between completely true and completely false.
33
Summary of Descriptive and Graphical Statistical Representations
34
Examples of Graphical Statistical Representations
Box Plot
Histogram
Pie/Bar Chart
Line Chart
Investigate the data with summary
statistics and charts first – summary
results and charts especially can show
outliers and patterns.
For continuous normally distributed data,
summarise using means and standard
deviations. If the data is skewed or there
are influential outliers, the median
(middle value) and interquartile range
(Upper quartile – lower quartile) are
more appropriate.
39
Consider “Goodness of Fit” of data set
In analysing statistical data we need to determine how well the data
“fits” an assumed distribution.
The goodness of fit can be tested statistically, to provide a level of significance at which the null hypothesis
(i.e. that the data do fit the assumed distribution) would be rejected.
Goodness-of-fit testing is an extension of significance testing in which
the sample cdf is compared with the assumed true cdf.
A number of methods are available to test how closely a set of data fits an assumed distribution. As with
significance testing, the power of these tests in rejecting incorrect hypotheses varies with the number and
type of data available, and with the assumption being tested.
(Figure: red line is the assumed CDF, blue line is the empirical CDF (ECDF) – example of a Kolmogorov-Smirnov test outcome.
Source: https://en.wikipedia.org/wiki/Kolmogorov%E2%80%93Smirnov_test)
41
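Illustrative sketch: a Kolmogorov-Smirnov goodness-of-fit test in Python with SciPy. The failure times are simulated, and the assumed distribution (exponential with a 500-hour mean) is an arbitrary choice for the example.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
failure_times = rng.exponential(scale=500.0, size=60)   # hypothetical data set

# Null hypothesis: the data follow an exponential distribution with mean 500 h
d_stat, p_val = stats.kstest(failure_times, "expon", args=(0, 500.0))
print(f"D = {d_stat:.3f}, p = {p_val:.3f}")   # large p: no evidence against the assumed fit
```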
Statistical Method Choice for Prediction/Probability Modelling
Source: https://www.healthknowledge.org.uk/public-health-textbook/research-methods/1b-statistical-methods/parametric-nonparametric-tests
42
Assumption of Normality
Examples of very skewed data (i.e. non-normal)
Parametric tests assume that the data follows a particular
distribution; e.g. for t-tests, ANOVA and regression, the
data needs to be normally distributed.
Parametric tests are more powerful than non-parametric
tests, when the assumptions about the distribution of the
data are true. This means that they are more likely to
detect true differences or relationships that exist.
The tests are quite robust to departures from normality,
so the data only needs to be approximately normally
distributed.
Plotting a histogram or QQ plot of the variable of interest
will give an indication of the shape of the distribution.
Histograms should peak in the middle and be
approximately symmetrical about the mean. If data is
normally distributed, the points in QQ plots will be close to
the line.
43
Statistical Tests for Normality
There are statistical tests for normality such as the Shapiro-Wilk and Kolmogorov-Smirnov tests, but for
small sample sizes (n < 20) the tests are unlikely to detect non-normality, and for larger sample sizes (n >
50) the tests can be too sensitive. They are also sensitive to outliers, so use histograms (for large
samples) or QQ plots (for small samples).
44
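Illustrative sketch: checking the assumption of normality with a Shapiro-Wilk test and a QQ plot in Python (SciPy/matplotlib), on a made-up sample.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

rng = np.random.default_rng(0)
sample = rng.normal(loc=50, scale=5, size=30)     # hypothetical measurements

w_stat, p_val = stats.shapiro(sample)             # p > 0.05: no evidence of non-normality
print(f"W = {w_stat:.3f}, p = {p_val:.3f}")

stats.probplot(sample, dist="norm", plot=plt)     # QQ plot: points close to the line => approx. normal
plt.show()
```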
Other Considerations
Dependent vs Independent: For most tests, it is assumed that the observations are independent. That is,
the results for one subject are not affected by another.
Examples of data which is not independent are
• repeated measures on the same subject (use the specific tests for this type of experiment) and
• observations over time (check the Durbin Watson test for regression).
Another situation where observations are not independent is when subjects are nested within groups with
a common influence e.g. children within classes who may be influenced by the teacher (use multilevel
modelling to include class as an extra RANDOM factor).
Time series analysis and multilevel modelling allows for non-independent measurements over time but
are much more complex analysis techniques.
https://www.expii.com/t/dependent-and-independent-variables
If you have a small sample, or if you don’t know the population standard deviation (which in most real-life
cases is true), then the 95% confidence interval is found using the t-distribution rather than the normal
distribution. (Image Source: WUSTL.EDU)
https://greatbrook.com/survey-statistical-confidence-how-many-is-enough/ 47
Confidence Interval Sample
The diagram below shows the confidence intervals for 27 samples of babies taken from the same
population. The actual population mean (which is not normally known) is 7.73 lbs. Two of the confidence
intervals do not contain the population mean (they do not overlap 7.73 lbs).
48
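Illustrative sketch: a 95% confidence interval for a mean based on the t-distribution (small sample, unknown population standard deviation). The birth weights below are made up and are not the 27 samples shown in the figure.

```python
import numpy as np
from scipy import stats

weights = np.array([7.1, 8.2, 6.9, 7.8, 8.0, 7.5, 7.3, 8.4, 6.8, 7.9])   # lbs, hypothetical

mean = weights.mean()
sem = stats.sem(weights)                                   # standard error of the mean
lo, hi = stats.t.interval(0.95, df=len(weights) - 1, loc=mean, scale=sem)
print(f"mean = {mean:.2f} lbs, 95% CI = ({lo:.2f}, {hi:.2f})")
```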
Other Considerations
Data Sample Size: The larger the sample size, the more likely a significant result is, so for small sample
sizes a huge difference is needed to conclude a significant difference.
For large sample sizes, small differences may be significant but then it is required to check if the difference
is meaningful.
Effect Size: An effect size is a measure of the strength or magnitude of the effect of an independent
variable on a dependent variable which helps assess whether a statistically significant result is
meaningful.
For example, for a t-test, the absolute effect size is just the difference between the two groups. A
standardised effect size involves variability and can then be compared to industry standards.
49
Other Considerations
Measure of Variance: A measure of variability is a summary statistic that represents the amount of
dispersion in a dataset. While a measure of central tendency describes the typical value, measures of
variability define how far away the data points tend to fall from the centre. We talk about variability in the
context of a distribution of values. A low dispersion indicates that the data points tend to be clustered
tightly around the centre. High dispersion signifies that they tend to fall further away.
Variance is the average squared difference of the values from the mean. Unlike the range and interquartile
range, the variance includes every value in the calculation by comparing each value to the mean. To
calculate this statistic, you calculate the squared difference between each data point and the mean,
sum them, and then divide by the number of observations.
51
Statistical Hypothesis Testing
• Key terms:
• NULL HYPOTHESIS (H0) is a statement about the population & sample data used to decide whether to reject that statement or not. Typically the statement
is that there is no difference between groups or association between variables.
• ALTERNATIVE HYPOTHESIS (H1) is often the research question and varies depending on whether the test is one or two tailed.
• SIGNIFICANCE LEVEL: The probability of rejecting the null hypothesis when it is true, (also known as a Type I error). This is decided by the individual but
is normally set at 5% (0.05) which means that there is a 1 in 20 chance of rejecting the null hypothesis when it is true.
• TEST STATISTIC is a value calculated from a sample to decide whether to accept or reject the null (H0) and varies between tests. The test statistic
compares differences between the samples or between observed and expected values when the null hypothesis is true.
• P-VALUE: the probability of obtaining a test statistic at least as extreme as ours if the null is true and there really is no difference or association in the
population of interest. P-values are calculated using different probability distributions depending on the test. A significant result is when the p-value is less
than the chosen level of significance (usually 0.05).
• Many significance test techniques have been developed for dealing with the many types of situation
which can be encountered.
• Z-Test (test for differences in Means)
• χ² (Chi-square) Test for Significance
• F Test (Test for Differences in Variances and Variance Ratio Test)
52
See pages 53 – 54 of handbook for examples
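Illustrative sketch: a χ² test of independence in Python with SciPy, on a hypothetical 2x2 table of counts (failed/survived items under two maintenance regimes).

```python
from scipy.stats import chi2_contingency

table = [[30, 70],    # regime A: 30 failed, 70 survived
         [18, 82]]    # regime B: 18 failed, 82 survived

chi2, p_val, dof, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.2f}, dof = {dof}, p = {p_val:.3f}")   # p < 0.05 => reject H0 of independence
```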
Hypothesis Testing Approach
• A statistical hypothesis is an assumption
made by the researcher about the data of
the population collected for any
experiment.
• Statistical Hypothesis Testing can be
categorized into two types :
• Null Hypothesis – Hypothesis testing is carried out
in order to test the validity of a claim or assumption
that is made about the larger population. This claim
that involves attributes of the trial is known as the
Null Hypothesis. The null hypothesis is
denoted by H0.
• Alternative Hypothesis – An alternative hypothesis
would be considered valid if the null hypothesis is
fallacious. The evidence that is present in the trial
is the data and the statistical
computations that accompany it. The alternative
hypothesis is denoted by H1 or Ha.
https://data-flair.training/blogs/hypothesis-testing-in-r/ 53
Hypothesis Example
Members of a jury have to decide whether a person is guilty or innocent based on evidence presented to
them. If a court case was a hypothesis test, the jury consider the likelihood of innocence given the
evidence and if there’s less than a 5% chance that the person is innocent they reject the statement of
innocence. The null can only be rejected if there is enough evidence to disprove it and the jury do not
know whether the person is really guilty or innocent so they may make a mistake.
Null: The person is innocent
Alternative: The person is not innocent.
54
Type I and Type II Hypothesis Errors
A statistically significant result cannot prove that a research hypothesis is correct (as this implies 100%
certainty).
Because a p-value is based on probabilities, there is always a chance of making an incorrect conclusion
regarding accepting or rejecting the null hypothesis (H0).
Anytime we make a decision using statistics there are four possible outcomes, with two representing
correct decisions and two representing errors.
Mean
The mean is useful in determining the overall trend of a data set or providing a rapid snapshot of the
data.
An advantage of the mean is that it’s very easy and quick to calculate.
Pitfalls:
• Taken alone, the mean is a dangerous tool. In some data sets, the mean is also closely related to
the mode and the median (two other measurements near the average).
• In a data set with a high number of outliers or a skewed distribution, the mean simply doesn’t
provide the accuracy you need for a nuanced decision.
57
Standard Deviation
The standard deviation, often represented with the Greek letter sigma, is the measure of a spread of
data around the mean.
A high standard deviation signifies that data is spread more widely from the mean, whereas a low
standard deviation signals that more data align with the mean.
In a portfolio of data analysis methods, the standard deviation is useful for quickly determining
dispersion of data points.
Pitfall:
• Just like the mean, the standard deviation is deceptive if taken alone. For example, if the data have
a very strange pattern such as a non-normal curve or a large amount of outliers, then the standard
deviation won’t give all the information that may be needed to draw an accurate conclusion.
58
Regression
Regression models the relationships between dependent and explanatory variables, which are
usually charted on a scatterplot.
The regression line also designates whether those relationships are strong or weak.
Regression is commonly taught in statistics courses with applications for science or business in
determining trends over time.
Pitfall:
• Regression is not very nuanced. Sometimes, the outliers on a scatterplot (and the reasons for
them) matter significantly. For example, an outlying data point may represent the input from your
most critical supplier or your highest selling product. The nature of a regression line, however,
tempts you to ignore these outliers. You can have data sets that have the exact same regression
line but include widely different data points.
59
Sample Size Determination
When measuring a large data set or population, like a workforce, it is not always necessary to collect
information from every member of that population – a sample does the job just as well.
The trick is to determine the right size for a sample to be accurate. Using proportion and standard
deviation methods, it is possible to accurately determine the right sample size needed to make the
data collection statistically significant.
Pitfall:
• When studying a new, untested variable in a population, the proportion equations might need to
rely on certain assumptions. However, these assumptions might be completely inaccurate. This
error is then passed along to the sample size determination and then onto the rest of the statistical
data analysis.
60
Hypothesis Testing
Also commonly called t-testing, hypothesis testing assesses if a certain premise is actually true for
your data set or population.
In data analysis and statistics, the result of a hypothesis test is considered statistically significant if
the results couldn’t have happened by random chance. Hypothesis tests are used in everything from
science and research to business and economics.
Pitfalls:
• To be rigorous, hypothesis tests need to watch out for common errors:
• For example, the placebo effect occurs when participants falsely expect a certain result and then perceive
(or actually attain) that result.
• Another common error is the Hawthorne effect (or observer effect), which happens when participants skew
results because they know they are being studied.
61
Most used Statistical Methods
The Most Common Statistical Techniques Used
1. INDEPENDENT T-TEST
2. MANN-WHITNEY TEST
3. PAIRED T-TEST
4. WILCOXON SIGNED RANK TEST
5. ONE-WAY ANOVA
6. KRUSKAL-WALLIS TEST
7. ONE-WAY ANOVA WITH REPEATED MEASURES (WITHIN SUBJECTS)
8. FRIEDMAN TEST
9. TWO-WAY ANOVA
10. CHI-SQUARED TEST
11. ODDS AND RELATIVE RISK
12. CORRELATION
13. PEARSON’S CORRELATION COEFFICIENT
14. RANKED CORRELATION COEFFICIENTS
15. PARTIAL CORRELATION
16. REGRESSION
17. LINEAR
18. LOGISTIC REGRESSION
19. PROPORTIONS TEST (Z-TEST)
20. RELIABILITY
21. Interrater reliability
22. Cohen’s Kappa
23. Intra-class Correlation Coefficient
24. Cronbach's alpha (reliability of scales)
25. PRINCIPAL COMPONENT ANALYSIS (PCA)
26. CLUSTER ANALYSIS
27. HIERARCHICAL CLUSTERING
28. K-MEANS CLUSTERING
2. Classification
3. Resampling Methods
4. Sub-set Selection
5. Shrinkage
6. Dimension Reduction
7. Non-Linear Models
8. Tree-Based Models
https://hbr.org/1971/07/how-to-choose-the-right-forecasting-technique 65
Choosing the Techniques
https://hbr.org/1971/07/how-to-choose-the-right-forecasting-technique 66
Linear Regression
Linear Regression
In statistics, linear regression is a method to predict a target
variable by fitting the best linear relationship between the
dependent and independent variable.
The best fit is done by making sure that the sum of all the
distances between the fitted line and the actual observations at
each point is as small as possible. The fit is “best” in the sense that
no other position of the line would produce a smaller total error.
(Figure: simple linear regression)
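Illustrative sketch: a simple least-squares linear regression in Python with scikit-learn; the x/y values are made up for the example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

x = np.array([1, 2, 3, 4, 5, 6, 7, 8]).reshape(-1, 1)       # independent variable
y = np.array([2.1, 4.3, 5.9, 8.2, 9.8, 12.1, 14.2, 15.9])   # dependent variable

model = LinearRegression().fit(x, y)
print("slope:", model.coef_[0], "intercept:", model.intercept_)
print("R^2:", model.score(x, y))   # proportion of variance explained by the fitted line
```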
Statistical Classification
• Classification is a data mining technique that assigns categories to a collection of data in order to
aid in more accurate predictions and analysis.
• Also sometimes called a Decision Tree (See next slide), classification is one of several methods
intended to make the analysis of very large datasets more effective.
Sub-set Selection
Stepwise Selection: This approach identifies a subset of the p predictors that is believed to be related to the
response. A model is then fitted using least squares on the reduced set of features.
Best-Subset Selection: Here we fit a separate OLS regression for each possible
combination of the p predictors and then look at the resulting model fits.
The algorithm is broken up into 2 stages: (1) fit all models that contain exactly k predictors,
for each k up to the maximum number of predictors; (2) select a single best model using cross-validated
prediction error.
It is important to use testing or validation error, and not training error to assess model fit
because RSS and R² monotonically increase with more variables.
The best approach is to cross-validate and choose the model with the highest R² and lowest
RSS on testing error estimates.
Tutorial: http://online.fliphtml5.com/crpq/bwcj/#p=2
80
https://www.youtube.com/watch?v=Ah9XWzsB2mo
Subset Stepwise Selection Options A - C
Forward Stepwise Selection considers a much
smaller set of models than best-subset selection.
It begins with a model containing no predictors,
then adds predictors to the model one at a time
until all of the predictors are in the model. At
each step, the variable added is the one that
gives the greatest additional improvement to the
fit, and the final model is chosen using
cross-validated prediction error.
Backward Stepwise Selection begins with
all p predictors in the model, then iteratively removes
the least useful predictor one at a time.
Hybrid Methods follow the forward stepwise
approach, however, after adding each new variable,
the method may also remove variables that do not
contribute to the model fit.
Tutorial: http://online.fliphtml5.com/crpq/bwcj/#p=2
82
https://www.youtube.com/watch?v=Ah9XWzsB2mo
https://www.machinelearningplus.com/machine-learning/feature-selection/
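Illustrative sketch: forward stepwise (sequential) selection with scikit-learn's SequentialFeatureSelector on a synthetic data set; the estimator and the number of features to select are arbitrary choices for the example.

```python
from sklearn.datasets import make_regression
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LinearRegression

X, y = make_regression(n_samples=200, n_features=10, n_informative=4, random_state=0)

selector = SequentialFeatureSelector(
    LinearRegression(),
    n_features_to_select=4,   # stop once 4 predictors are in the model
    direction="forward",      # "backward" gives backward stepwise selection
    cv=5,                     # cross-validated scoring guides each addition
)
selector.fit(X, y)
print("selected predictors:", selector.get_support(indices=True))
```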
Shrinkage
Shrinkage (“Regularisation”)
This approach fits a model involving all p predictors, however, the estimated coefficients are shrunken towards
zero relative to the least squares estimates. This shrinkage, aka regularisation has the effect of reducing
variance. Depending on what type of shrinkage is performed, some of the coefficients may be estimated to be
exactly zero. Thus this method also performs variable selection. The two best-known techniques for shrinking
the coefficient estimates towards zero are the ridge regression and the lasso.
Ridge regression is similar to least squares except that the coefficients are estimated by minimizing
a slightly different quantity. Ridge regression, like OLS, seeks coefficient estimates that reduce RSS;
however, the quantity minimised also includes a shrinkage penalty on the size of the coefficients. This penalty
has the effect of shrinking the coefficient estimates towards zero. Without going into the math, it is
useful to know that ridge regression shrinks the features with the smallest column space variance.
Like in principal component analysis, ridge regression projects the data into d-directional space and
then shrinks the coefficients of the low-variance components more than the high variance
components, which are equivalent to the largest and smallest principal components.
Lasso: Ridge regression has at least one disadvantage: it includes all p predictors in the final model.
The penalty term will set many of them close to zero, but never exactly to zero. This isn’t generally a
problem for prediction accuracy, but it can make the model more difficult when interpreting the
results. Lasso overcomes this disadvantage and is capable of forcing some of the coefficients to zero
granted that s is small enough. Since s = 1 results in regular OLS regression, as s approaches 0 the
coefficients shrink towards zero. Thus, Lasso regression also performs variable selection.
More Info: https://www.youtube.com/watch?v=Q81RR3yKn30 and https://www.youtube.com/watch?v=NGf0voTMlcs
84
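Illustrative sketch: ridge and lasso regression in scikit-learn on a synthetic data set. Note how lasso drives some coefficients exactly to zero (variable selection) while ridge only shrinks them; the penalty strengths (alpha) are arbitrary values for the example.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge, Lasso

X, y = make_regression(n_samples=100, n_features=8, n_informative=3,
                       noise=5.0, random_state=0)

ridge = Ridge(alpha=10.0).fit(X, y)
lasso = Lasso(alpha=5.0).fit(X, y)

print("ridge coefficients:", np.round(ridge.coef_, 2))   # all non-zero, shrunken towards zero
print("lasso coefficients:", np.round(lasso.coef_, 2))   # several exactly zero
```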
Dimension Reduction
Dimension Reduction
Partial least squares (PLS) is a supervised alternative to PCR (principal components regression). Like PCR, PLS is a dimension
reduction method, which first identifies a new smaller set of features that are linear combinations of
the original features, then fits a linear model via least squares to the new M features. Yet, unlike PCR,
PLS makes use of the response variable in order to identify the new features.
http://www.statisticshell.com/docs/factor.pdf
87
http://statistics.ats.ucla.edu/stat/spss/output/principal_components.htm
Video: https://www.youtube.com/watch?v=WKEGhyFx0Dg (PLS)
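Illustrative sketch: partial least squares regression with scikit-learn, reducing many predictors to two components chosen with the help of the response variable; the data set and number of components are arbitrary for the example.

```python
from sklearn.cross_decomposition import PLSRegression
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=150, n_features=20, n_informative=5, random_state=0)

pls = PLSRegression(n_components=2).fit(X, y)
print("R^2 on training data:", pls.score(X, y))
print("shape of reduced feature set:", pls.transform(X).shape)   # (150, 2)
```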
Exploratory Factor Analysis (EFA)
Exploratory Factor Analysis (EFA)
• Factor analysis is used to model the interrelationships among items, and
focus primarily on the variance and covariance rather than the mean.
Factor analysis assumes that variance can be partitioned into two types
of variance, common and unique
• Common variance is the amount of variance that is shared among a set
of items. Items that are highly correlated will share a lot of variance.
• Communality (also called h2) is a definition of common variance that ranges
between 0 and 1. Values closer to 1 suggest that extracted factors explain
more of the variance of an individual item.
• Unique variance is any portion of variance that’s not common. There are
two types:
• Specific variance: is variance that is specific to a particular item.
• Error variance: comes from errors of measurement and basically anything
unexplained by common or specific variance.
https://stats.idre.ucla.edu/spss/seminars/introduction-to-factor-analysis/a-practical-introduction-to-factor-analysis/
89
Exploratory Factor Analysis (EFA)
• There are two types of Factor Analysis.
• Exploratory Factor Analysis (EFA) aims to group together and
summarise variables which are correlated and can therefore
identify possible underlying latent variables which cannot be
measured directly
• Confirmatory Factor Analysis (CFA) tests theories about latent
factors. Confirmatory Factor Analysis is performed using additional
SPSS software and is beyond the scope of stats support but EFA
is commonly used in disciplines such as Psychology and can be
found in standard textbooks.
• Exploratory Factor Analysis and Principal Component
Analysis are very similar. The main differences are:
• PCA uses all the variance in the variables analysed, whereas EFA
uses only the common (shared) variance between the variables.
(Figure: EFA (left) and CFA (right))
• EFA aims to identify underlying latent variables (factors) rather
than just reduce the number of variables
http://www.statisticshell.com/docs/factor.pdf
https://www.mailman.columbia.edu/research/population-health-methods/exploratory-factor-analysis 90
http://www.en.globalstatistik.com/exploratory-factor-analysis-vs-confirmatory-factor-analysis/
Video: https://slideplayer.com/slide/7824705/ and https://www.youtube.com/watch?v=Q2JBLuQDUvI
Non-Linear Models
Non-Linear Models
In statistics, nonlinear regression is a form of regression analysis in which observational data are
modelled by a function which is a non-linear combination of the model parameters and depends on
one or more independent variables. The data are fitted by a method of successive approximations.
• A function on the real numbers is called a step function if it can be written as a finite linear
combination of indicator functions of intervals. Informally speaking, a step function is a
piecewise constant function having only finitely many pieces.
• A generalized additive model is a generalized linear model in which the linear predictor
depends linearly on unknown smooth functions of some predictor variables, and interest
focuses on inference about these smooth functions.
More info: https://www.amazon.ca/Nonlinear-Regression-Analysis-Its-Applications/dp/0470139005
92
https://www.analyticsvidhya.com/blog/2015/08/comprehensive-guide-regression/
https://www.youtube.com/watch?v=sKrDYxQ9vTU
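Illustrative sketch: non-linear regression by successive approximation using SciPy's curve_fit, fitting an assumed exponential-decay model y = a*exp(-b*x) + c to noisy synthetic data (the model form and values are illustrative).

```python
import numpy as np
from scipy.optimize import curve_fit

def model(x, a, b, c):
    return a * np.exp(-b * x) + c

x = np.linspace(0, 10, 50)
rng = np.random.default_rng(0)
y = model(x, 5.0, 0.8, 1.0) + rng.normal(0, 0.1, x.size)   # noisy synthetic observations

params, covariance = curve_fit(model, x, y, p0=[1.0, 1.0, 1.0])   # p0 = initial guess
print("estimated a, b, c:", params)
```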
Support Vector Machines (SVM)
Support Vector Machines (SVM)
SVM is a classification technique that is listed under supervised learning models in Machine
Learning. In layman’s terms, it involves finding the hyperplane (line in 2D, plane in 3D and
hyperplane in higher dimensions. More formally, a hyperplane is n-1 dimensional subspace of an n-
dimensional space) that best separates two classes of points with the maximum margin. Essentially,
it is a constrained optimization problem where the margin is maximized subject to the constraint that
it perfectly classifies the data (hard margin).
• The data points that kind of “support” this hyperplane on either sides
are called the “support vectors”. In the picture, the filled blue circle
and the two filled squares are the support vectors.
• For cases where the two classes of data are not linearly separable,
the points are projected to an exploded (higher dimensional) space
where linear separation may be possible.
• A problem involving multiple classes can be broken down into
multiple one-versus-one or one-versus-rest binary classification
problems.
• SVM can be used for linearly separable as well as non-linearly separable data. Linearly separable data calls for a hard margin, whereas non-linearly
separable data requires a soft margin.
• SVMs also support semi-supervised learning, where the data is partly labelled and partly unlabelled. This only
requires adding a condition to the minimisation problem, known as the Transductive SVM.
• Feature Mapping used to be quite a load on the computational complexity of the overall training performance of the model. However, with the
help of Kernel Trick, SVM can carry out the feature mapping using simple dot product.
Disadvantages
• SVM is incapable of handling text structures. This leads to loss of sequential information and thereby, leading to worse performance.
• Vanilla SVM cannot return the probabilistic confidence value that is similar to logistic regression. This does not provide much explanation
as confidence of prediction is important in several applications.
• Choice of the kernel is perhaps the biggest limitation of the support vector machine. Considering so many kernels present, it becomes
difficult to choose the right one for the data.
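Illustrative sketch: an SVM classifier in scikit-learn. A linear kernel suits linearly separable data; the RBF kernel used here implicitly maps the data to a higher-dimensional space (the kernel trick). The data set is synthetic.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
print("support vectors per class:", clf.n_support_)
```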
Reliability of Scales (Cronbach’s Alpha)
• Cronbach’s alpha is computed by correlating the score for each scale item with the total score for each
observation (usually individual survey respondents or test takers), and then comparing that to the
variance for all individual item scores
• The resulting α coefficient of reliability ranges from 0 to 1 in providing this overall assessment of a
measure’s reliability. If all of the scale items are entirely independent from one another (i.e., are not
correlated or share no covariance), then α = 0; and, if all of the items have high co-variances, then α will
approach 1 as the number of items in the scale approaches infinity. In other words, the higher
the α coefficient, the more the items have shared covariance and probably measure the same underlying
concept.
http://www.statisticshell.com/docs/factor.pdf
100
http://statistics.ats.ucla.edu/stat/spss/output/principal_components.htm
https://data.library.virginia.edu/using-and-interpreting-cronbachs-alpha/
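Illustrative sketch: Cronbach's alpha computed directly from its definition, alpha = k/(k-1) * (1 - sum of item variances / variance of the total score), for a made-up matrix of respondents x scale items.

```python
import numpy as np

def cronbach_alpha(items):
    # items: 2-D array, rows = respondents, columns = scale items
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1)
    total_variance = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

scores = np.array([[4, 5, 4, 4],
                   [3, 3, 4, 3],
                   [5, 5, 5, 4],
                   [2, 3, 2, 3],
                   [4, 4, 5, 4]])
print(f"alpha = {cronbach_alpha(scores):.2f}")
```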
Cronbach’s Alpha – Guidelines for acceptability
• α ≥ 0.70 is generally cited as “acceptable”.
• Nunnally’s guidelines
• 0.70 minimum acceptable for exploratory research
• 0.80 minimum acceptable for basic research
• 0.90 or higher for data where there is applied (operational) scenarios
• Assumptions
• Assumes uni-dimensionality (All items measure only single dimension in all data)
• Tested through factor analysis
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.463.428&rep=rep1&type=pdf 101
https://slideplayer.com/slide/5280162/
Reliability Statistics and Mathematics
103
Typical RM Assessment Techniques in Lifecycle Phases
Key
• A – applicable
• L - only very limited data may be available
during this lifecycle phase, reducing
effectiveness of the method.
• M - applicable where modifications or
significant changes to the product or
system design occur.
104
Reliability Statistics
Fundamental Reliability Statistical equations & definitions
• The most common and fundamental statistical equations and definitions* used in reliability
engineering and life data analysis are:
• Random Variables
• Reliability function
• Median Life
• Lifetime distributions
107
https://www.reliableplant.com/Read/18693/reliability-engineering-plant
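For reference, these quantities are linked by the following standard relationships (standard reliability theory, written out here rather than quoted from the handbook), for a random time-to-failure T with cdf F(t) and pdf f(t):

```latex
\begin{align}
  R(t) &= P(T > t) = 1 - F(t)                        && \text{reliability (survivor) function} \\
  f(t) &= \frac{\mathrm{d}F(t)}{\mathrm{d}t}         && \text{failure probability density} \\
  \lambda(t) &= \frac{f(t)}{R(t)}                    && \text{hazard (instantaneous failure) rate} \\
  \mathrm{MTTF} &= \int_0^{\infty} R(t)\,\mathrm{d}t && \text{mean time to failure} \\
  R(t_{50}) &= 0.5                                   && \text{median life } t_{50}
\end{align}
```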
The 3 Categories of Reliability Statistics
• In general, most problems in reliability engineering deal with quantitative measures, such as the
time-to-failure of a component, or qualitative measures, such as whether a component is defective
or non-defective.
• The methods used to quantify reliability are the mathematics of probability and statistics.
• Reliability statistics can be broadly divided into the treatment of discrete functions, continuous functions
and point processes.
• For example, a switch may either work or not work when selected or a pressure vessel may pass or fail a test—these
situations are described by discrete functions. In reliability we are often concerned with two-state discrete systems,
since equipment is in either an operational or a failed state.
• Continuous functions describe those situations which are governed by a continuous variable, such as time or
distance travelled.
• The statistics of point processes are used in relation to repairable systems, when more than one failure can occur in a
time continuum.
108
Types of Reliability Statistics
Reliability statistics are divided into the treatment of:
• Discrete Functions
• Continuous Functions
• Point Processes
109
Discrete Functions
Discrete Functions
• For example, a switch may either work or not work when selected or a pressure vessel may pass or fail
a test—these situations are described by discrete functions.
• In reliability we are often concerned with two-state discrete systems, since equipment is in either an
operational or a failed state.
111
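Illustrative sketch: a two-state (discrete) reliability calculation with the binomial distribution, e.g. the probability that at least 2 of 3 identical redundant pumps work on demand; the per-pump probability is a made-up value.

```python
from scipy.stats import binom

p_work = 0.95    # probability a single pump works on demand (hypothetical)
n_units = 3

# P(at least 2 of 3 work) = 1 - P(0 or 1 work) = 1 - binomial cdf at 1
p_system = 1 - binom.cdf(1, n_units, p_work)
print(f"system success probability = {p_system:.4f}")
```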
Continuous Functions
Continuous Functions
113
Reliability Continuous Distribution Functions
• By far the most widely used ‘model’ of the
nature of variation is the mathematical function
known as the normal (or Gaussian) distribution.
The normal data distribution pattern occurs in
many natural phenomena, such as human
heights, weather patterns, and so on.
• The lognormal distribution is a more versatile
distribution than the normal as it has a range of
shapes, and therefore is often a better fit to
reliability data, such as for populations with
wear-out characteristics. Also, it does not have
the normal distribution’s disadvantage of
extending below zero to –∞.
• The exponential distribution describes the
situation wherein the hazard rate is constant; a
Poisson process generates a constant hazard
rate. It is an important distribution in reliability
work, as it has the same central limiting
relationship to life statistics as the normal
distribution has to non-life statistics.
116
Examples of Continuous Distribution Functions (cdf)
• Normal (or Gaussian)
• Lognormal
• Exponential
• Gamma
• χ² (Chi-square)
• Weibull
• Extreme Value
(Figure: cumulative distribution function for the Exponential distribution)
117
Note: Refer to prescribed handbook, pages 33-41 for description and application examples of each of the function types
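Illustrative sketch: reliability and hazard rate for exponential and Weibull time-to-failure models with SciPy; all parameter values are made up for the example.

```python
import numpy as np
from scipy.stats import expon, weibull_min

t = np.array([100.0, 500.0, 1000.0])       # operating hours

# Exponential: constant hazard rate = 1/scale
exp_model = expon(scale=1000.0)            # MTTF = 1000 h
print("R(t), exponential:", exp_model.sf(t))                        # survivor function = 1 - cdf
print("hazard, exponential:", exp_model.pdf(t) / exp_model.sf(t))   # constant 0.001 per hour

# Weibull: shape parameter > 1 gives an increasing (wear-out) hazard rate
wbl_model = weibull_min(c=2.5, scale=1200.0)
print("R(t), Weibull:", wbl_model.sf(t))
print("hazard, Weibull:", wbl_model.pdf(t) / wbl_model.sf(t))
```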
Series of Events (Point Processes)
Point Processes
• The statistics of point processes are used in relation to repairable systems, when more than one failure can
occur in a time continuum.
• Situations in which discrete events occur randomly in a continuum (e.g. time) cannot be truly represented by a
single continuous distribution function. Failures occurring in repairable systems, aircraft accidents and vehicle
traffic flow past a point are examples of series of discrete events. These situations are called stochastic point
processes. They can be analysed using the statistics of event series (see later slide).
• Homogeneous Poisson process (HPP): A HPP is a stationary point process, since the distribution of the
number of events in an interval of fixed length does not vary, regardless of when (where) the interval is
sampled (e.g. events occur randomly and at a constant average rate).
• Non-homogeneous Poisson process (NHPP): An NHPP is a non-stationary point process (the rate of
occurrence is a function of time), so that the distribution of the number of events in an interval of fixed length
changes as time increases. Typically, the discrete events (e.g. failures) might occur at an increasing or
decreasing rate.
• A HPP describes a sequence of independently and identically exponentially distributed (IIED) random
variables. A NHPP describes a sequence of random variables which is neither independently nor
identically distributed.
119
See handbook, pages 61 – 64 for more info and examples
HPP vs NHPP Example
120
See handbook, pages 61 – 64 for more info and examples
Series of Events Analysis Method
• Trend Analysis (Time Series Analysis)
• Super-imposed Processes: If a number of separate stochastic point processes combine to form an overall
process (for example, the failure processes of individual components (or sockets) in a system).
https://www.slideshare.net/mirkokaempf/from-events-to-networks-time-series-analysis-on-scale 121
Example: https://www.youtube.com/watch?v=ztvQQlGpL6Y
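Illustrative sketch: simulating the failure times of a repairable system as a homogeneous Poisson process, i.e. independent, exponentially distributed inter-arrival times at a constant rate (for an NHPP the rate would instead vary with time). The rate value is made up.

```python
import numpy as np

rng = np.random.default_rng(42)
rate = 0.002                                # failures per hour (constant ROCOF), hypothetical
inter_arrival = rng.exponential(scale=1 / rate, size=10)
failure_times = np.cumsum(inter_arrival)    # cumulative operating time at each failure
print(np.round(failure_times, 1))
```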
Reliability Scenario Modelling - Software
• Scenario-Based Reliability Analysis (SBRA): A reliability model, and a reliability analysis technique
for component-based software.
• Using scenarios of component interactions, a probabilistic model named Component-Dependency
Graph (CDG) is constructed. Based on CDG, a reliability analysis algorithm is developed to analyse
the reliability of the system as a function of reliabilities of its architectural constituents.
• The proposed approach has the following benefits:
• It is used to analyse the impact of variations and uncertainties in the reliability of individual components, subsystems, and
links between components on the overall reliability estimate of the software system. This is particularly useful when the
system is built partially or fully from existing off-the-shelf components.
• It is suitable for analysing the reliability of distributed software systems because it incorporates link and delivery channel
reliabilities.
• The technique is used to identify critical components, interfaces, and subsystems; and to investigate the sensitivity of the
application reliability to these elements.
• The approach is applicable early in the development lifecycle, at the architecture level. Early detection of critical
architecture elements, those that affect the overall reliability of the system the most, is useful in delegating resources in
later development phases.
Example: https://www.researchgate.net/publication/3152734_A_Scenario-Based_Reliability_Analysis_Approach_for_Component-
122
Based_Software/link/0f31752e28167e0e6d000000/download
Component-Dependency Graph (CDG) Example
Example: https://www.researchgate.net/figure/An-example-of-HPU-aware-dependency-graph-A-component-box-describes-a-software-and-
123
its_fig3_262213451
Dealing with Multiple Reliability Engineering Scenarios
Methods Used for multiple Scenarios
• Monte-Carlo Simulation
• Markov Analysis
• Petri-Nets
https://systemreliability.wordpress.com/2017/05/12/accelerated-monte-carlo-system-reliability-analysis-through-machine-learning-based-surrogate-models- 125
of-network-connectivity/
Worst Case Scenario Analysis – complex reliability modelling
Worst-case circuit analysis (WCCA or WCA) is a cost-effective
means of screening a design to ensure with a high degree of
confidence that potential defects and deficiencies are identified and
eliminated prior to and during test, production, and delivery.
It is a quantitative assessment of the equipment performance,
accounting for manufacturing, environmental and aging effects. In
addition to a circuit analysis, a WCCA often includes stress and
derating analysis, failure modes, effects and criticality analysis (FMECA) and
reliability prediction (MTBF).
The specific objective is to verify that the design is robust
enough to provide operation which meets the system
performance specification over design life under worst-case
conditions and tolerances (initial, aging, radiation, temperature,
etc.).
Stress and de-rating analysis is intended to increase reliability by
providing sufficient margin compared to the allowable stress limits. Source: https://en.wikipedia.org/wiki/Worst-case_circuit_analysis
https://www.youtube.com/watch?v=0F0QoMCSKJ4
129
https://betterexplained.com/articles/an-intuitive-and-short-explanation-of-bayes-theorem/
Bayes Rule underpinning Bayesian Methods
• Probabilistic models describe data that can be observed from a system.
• Mathematics of probability theory is used to express all forms of uncertainty and noise associated
with the model.
• Bayes rule of inverse probability creates the ability to infer unknown quantities, make predictions
and adapt models.
Advantages:
• Coherent.
• Conceptually straightforward.
• Modular.
• Often good performance.
Source: Bayesian Modelling. Z Ghahramani, 2012 and https://www.analyticsvidhya.com/blog/2016/06/bayesian-statistics-beginners-simple-english/
130
Reference Book: Bayesian Reliability. MS Hamada et al.
Bayes Theorem
Bayes' theorem comes into effect when multiple mutually exclusive events Ai form an exhaustive set and another
event B is observed alongside them.
Deducing Bayes' equation:
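In standard notation, for mutually exclusive and exhaustive events A_1, ..., A_n and an observed event B with P(B) > 0:

P(A_i \mid B) = \frac{P(B \mid A_i)\, P(A_i)}{\sum_{k=1}^{n} P(B \mid A_k)\, P(A_k)}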
Source: https://www.analyticsvidhya.com/blog/2016/06/bayesian-statistics-beginners-simple-english/
131
Bayesian Inference
• A primary goal of Bayesian inference is summarizing available information about unknown
parameters that define statistical models through the specification of probability density
functions.
• “Unknown parameters that define statistical models” refers to things like failure probabilities or mean system
lifetimes; they are the parameters of interest.
• “Available information” normally comes in the form of test data, experience with related
systems, and engineering judgment.
For more information see Reference Book: Bayesian Reliability. MS Hamada et al.
132
Also see: https://www.analyticsvidhya.com/blog/2016/06/bayesian-statistics-beginners-simple-english/
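As a minimal sketch of how prior information and test data combine, the Python snippet below performs a conjugate Beta-Binomial update of a failure probability; the prior parameters and the test counts are hypothetical. The Beta prior stands in for "engineering judgement", and the posterior mean and credible interval summarize the updated belief after the new tests.

```python
# Sketch: conjugate Beta-Binomial update for a failure probability (hypothetical numbers).
from scipy.stats import beta

a0, b0 = 2, 8             # prior: roughly "2 failures in 10 earlier trials" worth of belief
failures, tests = 3, 30   # new test data
a1, b1 = a0 + failures, b0 + (tests - failures)   # posterior parameters
post = beta(a1, b1)
print(f"posterior mean failure probability ≈ {post.mean():.3f}")
print(f"95% credible interval ≈ ({post.ppf(0.025):.3f}, {post.ppf(0.975):.3f})")
```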
Bayesian Density Function Examples
Prior densities
Sampling densities
Posterior densities
Predictive densities
Training a neural network by maximum likelihood, i.e. settling for a single point estimate of the weights, is for many reasons unsatisfactory. One reason is that it lacks proper theoretical justification from a probabilistic perspective: why maximum
likelihood? Why just point estimates? Using MLE ignores any uncertainty that we may have in the proper weight values. From a practical
standpoint, this type of training is often susceptible to overfitting, as neural networks often are.
One partial fix for this is to introduce regularization. From a Bayesian perspective, this is equivalent to inducing priors on the weights (say
Gaussian distributions if we are using L2 regularization). Optimization in this case is akin to searching for MAP estimators rather than MLE.
Again from a probabilistic perspective, this is not the right thing to do, though it certainly works well in practice.
The correct (i.e., theoretically justifiable) thing to do is posterior inference, though this is very challenging both from a modelling and
computational point of view. BNNs are neural networks that take this approach. In the past this was all but impossible, and we had to resort to
poor approximations such as Laplace’s method (low complexity) or Markov Chain Monte Carlo (MCMC) (long convergence, difficult to
diagnose). However, lately there have been interesting results on using variational inference to do this [1], and this has sparked a great deal of
interest in the area.
BNNs are important in specific settings, especially when uncertainty issues are the biggest concern. Some examples of these cases are
decision making systems, (relatively) smaller data settings, Bayesian Optimization, model-based reinforcement learning and others.
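To make the MLE-versus-MAP distinction concrete, the sketch below uses a plain linear model (not a neural network) so both estimates can be written in closed form: with a Gaussian prior on the weights, the MAP estimate is exactly L2-regularised (ridge) least squares. All data and the regularisation strength are synthetic.

```python
# Sketch: L2-regularised least squares as a MAP estimate under a Gaussian prior on the weights.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
w_true = np.array([1.5, -2.0, 0.5])
y = X @ w_true + rng.normal(scale=0.5, size=50)       # synthetic noisy observations

lambda_ = 1.0                                         # hypothetical regularisation strength
w_mle = np.linalg.lstsq(X, y, rcond=None)[0]          # maximum likelihood (no prior)
w_map = np.linalg.solve(X.T @ X + lambda_ * np.eye(3), X.T @ y)   # ridge / MAP point estimate
print("MLE:", w_mle.round(3), " MAP:", w_map.round(3))
```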
The random variables or inputs are modelled on the basis of probability distributions such as normal,
log normal, etc. Different iterations or simulations are run for generating paths and the outcome is
arrived at by using suitable numerical computations.
Monte Carlo simulation is often the most practical method when a model has uncertain parameters or when
a dynamic, complex system needs to be analysed. It is a probabilistic method for modelling risk in a
system.
The method is used extensively in a wide variety of fields such as physical science, computational
biology, statistics, artificial intelligence, and quantitative finance. It is pertinent to note that Monte
Carlo Simulation provides a probabilistic estimate of the uncertainty in a model. It is never
deterministic.
137
Monte Carlo Methods
• Simple Monte Carlo
• Rejection Sampling
• Importance Sampling
https://www.palisade.com/images3/product/risk/en/Distributions_monteCarloSim.jpg 138
Common probability distributions used in MC
• Normal or “bell curve”: The user simply defines the mean or expected value and a standard deviation to describe the
variation about the mean. Values in the middle near the mean are most likely to occur. It is symmetric and describes many
natural phenomena such as people’s heights. Examples of variables described by normal distributions include inflation rates
and energy prices.
• Lognormal: Values are positively skewed, not symmetric like a normal distribution. It is used to represent values that don’t
go below zero but have unlimited positive potential. Examples of variables described by lognormal distributions include real
estate property values, stock prices, and oil reserves.
• Uniform: All values have an equal chance of occurring, and the user simply defines the minimum and maximum. Examples
of variables that could be uniformly distributed include manufacturing costs or future sales revenues for a new product.
• Triangular: The user defines the minimum, most likely, and maximum values. Values around the most likely are more likely
to occur. Variables that could be described by a triangular distribution include past sales history per unit of time and inventory
levels.
• PERT: The user defines the minimum, most likely, and maximum values, just like the triangular distribution. Values around
the most likely are more likely to occur. However, values between the most likely value and the extremes are more likely to occur than
in the triangular distribution; that is, the extremes are not as emphasized. An example of the use of a PERT distribution is to describe the
duration of a task in a project management model.
• Discrete: The user defines specific values that may occur and the likelihood of each. An example might be the results of a
lawsuit: 20% chance of positive verdict, 30% chance of negative verdict, 40% chance of settlement, and 10% chance of
mistrial.
https://www.palisade.com/risk/monte_carlo_simulation.asp 139
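A minimal sketch of drawing samples from several of the distributions above with NumPy; all parameter values are arbitrary placeholders, and the PERT distribution is omitted because NumPy has no built-in PERT sampler.

```python
# Sketch: sampling common Monte Carlo input distributions with NumPy (arbitrary parameters).
import numpy as np

rng = np.random.default_rng(1)
n = 10_000
samples = {
    "normal":     rng.normal(loc=100.0, scale=15.0, size=n),              # mean and standard deviation
    "lognormal":  rng.lognormal(mean=4.0, sigma=0.3, size=n),             # parameters of the underlying normal
    "uniform":    rng.uniform(low=50.0, high=150.0, size=n),              # minimum and maximum
    "triangular": rng.triangular(left=50.0, mode=90.0, right=150.0, size=n),  # min, most likely, max
    "discrete":   rng.choice(["positive", "negative", "settle", "mistrial"],
                             p=[0.2, 0.3, 0.4, 0.1], size=n),             # outcomes with probabilities
}
for name, s in samples.items():
    print(name, "first three draws:", s[:3])
```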
Methodology
• Monte Carlo simulation performs risk analysis by building models of possible results by substituting a range of
values—a probability distribution—for any factor that has inherent uncertainty. It then calculates results over
and over, each time using a different set of random values from the probability functions. Depending upon the
number of uncertainties and the ranges specified for them, a Monte Carlo simulation could involve thousands
or tens of thousands of recalculations before it is complete. Monte Carlo simulation produces distributions of
possible outcome values.
• By using probability distributions, variables can have different probabilities of different outcomes occurring.
Probability distributions are a much more realistic way of describing uncertainty in variables of a risk analysis.
• During a Monte Carlo simulation, values are sampled at random from the input probability distributions. Each
set of samples is called an iteration, and the resulting outcome from that sample is recorded. Monte Carlo
simulation does this hundreds or thousands of times, and the result is a probability distribution of possible
outcomes. In this way, Monte Carlo simulation provides a much more comprehensive view of what may
happen. It tells you not only what could happen, but how likely it is to happen.
• An enhancement to Monte Carlo simulation is the use of Latin Hypercube sampling, which samples more
accurately from the entire range of distribution functions.
140
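Following the iteration scheme described above, the sketch below runs a simple Monte Carlo risk model in Python. The stress-strength set-up and all distribution parameters are hypothetical; the point is only to illustrate the sampling-and-recording loop.

```python
# Sketch: a minimal Monte Carlo risk model (hypothetical stress-strength example).
# Strength ~ Normal, stress ~ Lognormal; estimate P(stress > strength).
import numpy as np

rng = np.random.default_rng(42)
n = 100_000                                              # number of iterations
strength = rng.normal(loc=50.0, scale=4.0, size=n)       # e.g. MPa, assumed parameters
stress   = rng.lognormal(mean=3.5, sigma=0.2, size=n)    # e.g. MPa, assumed parameters
p_fail = np.mean(stress > strength)                      # fraction of iterations where stress exceeds strength
print(f"estimated probability of failure ≈ {p_fail:.4f}")
```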
Examples of Use
Example: https://www.youtube.com/watch?v=ohfUaDdJyzc
141
https://www.youtube.com/watch?v=0ikk_VEBkJw&vl=en
Advantages of MC
• Probabilistic Results: Results show not only what could happen, but how likely each outcome is.
• Graphical: Because of the data a Monte Carlo simulation generates, it’s easy to create graphs of
different outcomes and their chances of occurrence. This is important for communicating findings
to other stakeholders.
• Sensitivity Analysis: With just a few cases, deterministic analysis makes it difficult to see which
variables impact the outcome the most. In Monte Carlo simulation, it’s easy to see which inputs
had the biggest effect on bottom-line results.
• Scenario Analysis: In deterministic models, it’s very difficult to model different combinations of
values for different inputs to see the effects of truly different scenarios. Using Monte Carlo
simulation, analysts can see exactly which inputs had which values together when certain
outcomes occurred. This is invaluable for pursuing further analysis.
• Correlation of Inputs: In Monte Carlo simulation, it’s possible to model interdependent
relationships between input variables. It’s important for accuracy to represent how, in reality,
when some factors go up, others go up or down accordingly.
https://www.palisade.com/risk/monte_carlo_simulation.asp
142
Markov Analysis
Introduction
• When Markov chains are used in reliability analysis, the process usually represents the various
stages (states) that a system can be in at any given time.
• The states are connected via transitions that represent the probability, or rate, that the system will
move from one state to another during a step, or a given time.
• When using probabilities and discrete steps, the Markov chain is referred to as a discrete Markov chain,
while a Markov chain that uses rates and the continuous time domain is referred to as a continuous Markov
chain.
A plot of the state probabilities against the step number shows the results; from the plot, we can also determine that the
probabilities of being in each state reach steady state after about 6 steps.
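As a minimal sketch of the discrete case, the Python snippet below repeatedly multiplies a state probability vector by a hypothetical two-state transition matrix; the numbers are placeholders rather than the ones behind the plot referred to above.

```python
# Sketch: stepping a discrete Markov chain (hypothetical two-state transition matrix).
import numpy as np

P = np.array([[0.9, 0.1],     # P[i, j] = probability of moving from state i to state j in one step
              [0.6, 0.4]])
p = np.array([1.0, 0.0])      # start in state 0 with certainty
for step in range(1, 9):
    p = p @ P                 # new state probabilities after one more step
    print(f"step {step}: {p.round(4)}")
```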
Because we are no longer performing analysis using fixed probabilities and a fixed step, we are no longer able
to simply multiply a state probability vector with a transition matrix in order to obtain new state probabilities after
a given step.
Continuous Markov chains are often used for system availability/reliability analyses, as they make it possible to
designate one or more states as unavailable states. This allows for the calculation of both the availability and
the reliability of the system.
• Availability is calculated as the mean probability that the system is in a state that is not an unavailable state.
• Reliability is calculated in the same manner as availability, with the additional restriction that all transitions leaving any
unavailable state are considered to have a transition rate of zero.
146
Example
Assume you have a system composed of two generators. The system can be in one of
three states:
• Both generators are operational
• One generator is operational and the other is under repair
• Both generators are under repair. This is an unavailable state.
The system starts in the state in which both generators are operational. We know that
the failure rate of a generator is 1 per 2,000 hours, and the repair rate is 1 per 200 hours.
Therefore:
• The transition rate from the state in which both generators are operational to the state where only one is
operational is 1 per 1,000 hours.
• The transition rate from the state in which one generator is operational to the state where both generators
are operational is 1 per 200 hours.
• The transition rate from the state in which one generator is operational to the state where both generators
are under repair is 1 per 2,000 hours.
• The transition rate from the state in which both generators are under repair to the state where one generator
is operational is 1 per 100 hours.
We would like to know the mean availability of our system after 20,000 hours for all three
states so that we can estimate our output based on time spent at full, half and zero
generator capacity.
From the Mean Probability column, we can see that the system is expected to be fully operational
82.8% of the time, half operational 16.4% of the time, and non-operational 0.8% of the time.
From the Point Probability (Av) column, we can get the point probability of being in a state when all
transitions are considered. From the Point Probability (Rel) column, we can get the point probability of
being in a state if we assume that there is no return from unavailable states, or in other words we are
assuming no repair once the system has entered an unavailable (failed) state. Using the "non-repair"
assumption, there is only an 18.0% chance that the system would still be fully operational, a 3.3%
chance that it would be half operational and a 78.7% chance that it would be non-operational.
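A minimal Python sketch of this two-generator continuous-time Markov chain, using the rates quoted above (state order: both operational, one operational, both under repair). The time grid and the simple averaging over it are illustrative choices; the resulting mean probabilities should land close to the mean-probability figures quoted above, while the no-repair reliability figures would additionally require zeroing the transitions leaving the failed state.

```python
# Sketch: two-generator continuous-time Markov chain from the example above.
# State order: [both operational, one operational, both under repair].
import numpy as np
from scipy.linalg import expm

lam, mu = 1 / 2000, 1 / 200                 # failure and repair rate per generator (per hour)
Q = np.array([[-2 * lam,      2 * lam,     0.0    ],   # both up  -> one up at 1/1,000 per hour
              [      mu, -(mu + lam),      lam    ],   # one up   -> both up (1/200) or both down (1/2,000)
              [     0.0,       2 * mu,  -2 * mu   ]])  # both down -> one up at 1/100 per hour

p0 = np.array([1.0, 0.0, 0.0])              # start with both generators operational
T = 20_000.0
times = np.linspace(0.0, T, 2001)
probs = np.array([p0 @ expm(Q * t) for t in times])    # point probabilities over time

print("point probabilities at 20,000 h:", probs[-1].round(4))
print("approx. mean probabilities     :", probs.mean(axis=0).round(4))
```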
Example: http://reliawiki.org/index.php/Markov_Diagrams
147
Regularly used Markov Chain Methods
• Markov
• Gibbs Sampling
• Metropolis Algorithm
• Metropolis-Hastings Algorithm
• Hybrid Monte Carlo
148
Advantages & Disadvantages of Markov Chain Methods
• Markov analysis has the advantage of being an analytical method which means that the reliability parameters for the
system are calculated in effect by a formula. This has the considerable advantages of speed and accuracy when
producing results. Speed is especially useful when investigating many alternative variations of design or exploring a
range of sensitivities. In contrast, accuracy is vitally important when investigating small design changes or when the
reliability or availability of high-integrity systems is being quantified. Markov analysis has a clear advantage over MCS
in respect of speed and accuracy since MCS requires longer simulation runs to achieve higher accuracy and, unlike
Markov analysis, does not produce an “exact” answer.
• As in the case of applying MCS, Markov analysis requires great care during the model building phase since model
accuracy is all-important in obtaining valid results. The assumptions implicit in Markov models that are associated with
“memory-lessness” and the Exponential distribution to represent times to failure and repair provide additional
constraints to those within MCS. Markov models can therefore become somewhat contrived if these implicit
assumptions do not reflect sufficiently well the characteristics of a system and how it functions in practice. In order to
gain the benefits of speed and accuracy that it can offer, Markov analysis depends to a greater extent on the
experience and judgement of the modeller than MCS. Also, whilst MCS is a safer and more flexible approach, it does
not always offer the speed and accuracy that may be required in particular system studies.
https://egertonconsulting.com/markov-analysis-brief-introduction/?doing_wp_cron=1575035940.5098938941955566406250
149
Petri Nets
Introduction
A Petri net consists of places, transitions, and arcs. Arcs run from a place to a transition or vice versa, never
between places or between transitions. The places from which an arc runs to a transition are called the input
places of the transition; the places to which arcs run from a transition are called the output places of the
transition.
Graphically, places in a Petri net may contain a discrete number of marks called tokens. Any distribution of
tokens over the places will represent a configuration of the net called a marking. In an abstract sense relating to
a Petri net diagram, a transition of a Petri net may fire if it is enabled, i.e. there are sufficient tokens in all of its
input places; when the transition fires, it consumes the required input tokens, and creates tokens in its output
places. A firing is atomic, i.e. a single non-interruptible step.
Unless an execution policy is defined, the execution of Petri nets is nondeterministic: when multiple transitions
are enabled at the same time, they will fire in any order.
Since firing is nondeterministic, and multiple tokens may be present anywhere in the net (even in the same
place), Petri nets are well suited for modelling the concurrent behaviour of distributed systems.
https://en.wikipedia.org/wiki/Petri_net
151
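A minimal sketch of these firing rules in Python; the net itself (places P1-P3, transitions T1 and T2) is an arbitrary toy example, not taken from the sources above. Each pass of the loop picks one enabled transition at random, mirroring the nondeterministic execution described above.

```python
# Sketch: a minimal Petri-net interpreter (places, transitions, tokens, nondeterministic firing).
import random

marking = {"P1": 1, "P2": 1, "P3": 0}           # tokens per place (the current marking)
transitions = {
    "T1": ({"P1": 1, "P2": 1}, {"P3": 1}),      # input places/weights -> output places/weights
    "T2": ({"P3": 1}, {"P1": 1, "P2": 1}),
}

def enabled(t):
    inputs, _ = transitions[t]
    return all(marking[p] >= w for p, w in inputs.items())   # enough tokens in all input places?

def fire(t):
    inputs, outputs = transitions[t]
    for p, w in inputs.items():                 # consume input tokens
        marking[p] -= w
    for p, w in outputs.items():                # create output tokens (atomic step)
        marking[p] += w

for _ in range(4):
    candidates = [t for t in transitions if enabled(t)]
    if not candidates:
        break
    t = random.choice(candidates)               # nondeterministic choice among enabled transitions
    fire(t)
    print(f"fired {t}, marking = {marking}")
```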
More Info: https://slideplayer.com/slide/3289096/
Example
Source: https://www.researchgate.net/figure/Petri-net-model-of-an-XML-firewall-with-one-application-and-one-web-service_fig2_220885110
152
Life Data Analysis Concepts (Weibull Analysis)
Life Data Analysis
• Commonly referred to as Weibull Analysis.
• Reliability Life Data Analysis refers to the study and modelling of observed product lives. Life data
can be lifetimes of products in the marketplace, such as the time the product operated successfully
or the time the product operated before it failed. These lifetimes can be measured in hours, miles,
cycles-to-failure, stress cycles or any other metric with which the life or exposure of a product can
be measured.
• All such data of product lifetimes can be encompassed in the term life data or, more specifically,
product life data. The subsequent analysis and prediction are described as life data analysis.
• Life data analysis requires the practitioner to:
• Gather life data for the product.
• Select a lifetime distribution that will fit the data and model the life of the product.
• Estimate the parameters that will fit the distribution to the data.
• Generate plots and results that estimate the life characteristics of the product, such as the reliability or mean life.
Eta (η) represents the characteristic life of an item, defined as the time at which 63.2% of the population has failed. The shape parameter, beta (β), is the slope of the best-fit line through the data points on a Weibull plot.
http://reliawiki.org/index.php/Introduction_to_Life_Data_Analysis 154
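For reference, the two-parameter Weibull reliability function behind these definitions is, in LaTeX notation:

R(t) = e^{-(t/\eta)^{\beta}}, \qquad R(\eta) = e^{-1} \approx 0.368,

so by the characteristic life η roughly 63.2% of the population has failed, consistent with the note above.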
Life Data Classification
Types of Life Data
• Complete
• Censored
• Right Censored (Suspended)
• Interval Censored
• Left-Censored
Parameter Estimation: In order to fit a statistical model to a life data set, the
analyst estimates the parameters of the life distribution that will make the
function most closely fit the data. The parameters control the scale, shape and
location of the pdf function.
For example, in the 3-parameter Weibull model, the scale
parameter, η, defines where the bulk of the distribution lies. The shape
parameter, β, defines the shape of the distribution, and the location parameter,
γ, defines the location of the distribution in time.
156
Ranking of Data
Ranking of Data
• Mean Rank: Mean ranks are based on the distribution-free model and are used mostly to plot
symmetrical statistical distributions, such as the normal.
• Median Rank: Median ranking is the method most frequently used in probability plotting, particularly if
the data are known not to be normally distributed. Median rank can be defined as the cumulative
percentage of the population represented by a particular data sample with 50% confidence.
• Cumulative Binomial Method for Median Ranks
• Algebraic Approximation of the Median Rank: When neither software nor tables are available, or when
the sample is beyond the range covered by the available tables, the approximation formula known as
Benard’s approximation, MRi ≈ (i - 0.3)/(N + 0.4) for the i-th of N ordered failures, can be used (see the sketch below).
159
Example: https://data-flair.training/blogs/seo-ranking-factors/
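A minimal Python sketch of Benard's approximation for complete data; the sample size of 5 is arbitrary.

```python
# Sketch: Benard's approximation MR_i ≈ (i - 0.3) / (N + 0.4) for median ranks of complete data.
def benard_median_ranks(n: int) -> list:
    """Approximate median rank of the i-th ordered failure out of n, for i = 1..n."""
    return [(i - 0.3) / (n + 0.4) for i in range(1, n + 1)]

print([round(mr, 4) for mr in benard_median_ranks(5)])
# -> [0.1296, 0.3148, 0.5, 0.6852, 0.8704]
```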
Ranking of Censored Data
• Survival analysis is a type of semi-supervised ranking task where the target output (the survival
time) is often right-censored.
• Utilizing this information is a challenge because it is not obvious how to correctly incorporate these
censored examples into a model.
• Three categories of loss functions, namely partial likelihood methods, rank methods, and a new
classification method based on a Wasserstein metric (WM) and the non-parametric Kaplan Meier
estimate of the probability density to impute the labels of censored examples, can take advantage
of this information.
• The proposed method* allows a model to predict the probability distribution of an event.
160
*Example: https://arxiv.org/pdf/1806.01984.pdf
Concept of Rank Regression
• The rank regression method for parameter estimation is also known as the least squares method. It is, in
essence, a more formalized version of the manual probability plotting technique, in that it provides a
mathematical method for fitting a line to plotted failure data points.
• The x-axis coordinates represent the failure times, while the y-axis coordinates represent unreliability
estimates. These unreliability estimates are usually obtained via median ranks, hence the term rank
regression.
• Least squares, or least sum of squares, regression
requires that a straight line be fitted to a set of data
points, such that the sum of the squares of the distance
of the points to the fitted line is minimized.
This minimization can be performed in either the vertical
or horizontal direction. If the regression is on the x-axis,
then the line is fitted so that the horizontal deviations
from the points to the line are minimized. If the
regression is on the y-axis, then this means that the
distance of the vertical deviations from the points to the
line is minimized.
161
https://www.weibull.com/hotwire/issue10/relbasics10.htm
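A minimal Python sketch of rank regression on the y-axis for a two-parameter Weibull model; the failure times are hypothetical and the unreliability estimates come from Benard's approximation described earlier.

```python
# Sketch: rank regression (least squares on Weibull probability-plot coordinates).
import numpy as np

times = np.sort(np.array([105.0, 180.0, 265.0, 340.0, 410.0]))   # hypothetical failure times (hours)
n = len(times)
mr = (np.arange(1, n + 1) - 0.3) / (n + 0.4)     # Benard median ranks (unreliability estimates)

x = np.log(times)                                # x: ln(time)
y = np.log(-np.log(1.0 - mr))                    # y: ln(-ln(1 - median rank)); linear in x for a Weibull model
beta, intercept = np.polyfit(x, y, 1)            # slope of the fitted line is the shape parameter
eta = np.exp(-intercept / beta)                  # scale parameter recovered from the intercept
print(f"beta ≈ {beta:.2f}, eta ≈ {eta:.0f} hours")
```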
Confidence Bounds
Confidence Bounds for Life Data Analysis
• Estimating the precision of an estimate is an important concept in the field of
reliability engineering, leading to the use of confidence intervals.
163
Maximum Likelihood Estimation (MLE)
Maximum Likelihood Estimation (MLE)
• The Maximum likelihood estimation (MLE) method is
considered to be one of the most robust parameter estimation
techniques.
• Maximum likelihood estimation endeavours to find the
most "likely" values of distribution parameters for a set of
data by maximizing the value of what is called the
"likelihood function." This likelihood function is largely
based on the probability density function (pdf) for a given
distribution.
• The graphic gives an example of a likelihood function surface
plot for a two-parameter Weibull distribution. Thus, the "peak"
of the likelihood surface function corresponds to the values of
the parameters that maximize the likelihood function, i.e. the
MLE estimates for the distribution's parameters.
165
More info: https://weibull.com/hotwire/issue9/relbasics9.htm
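A minimal sketch of MLE for a two-parameter Weibull using SciPy's built-in fitter; the failure times are hypothetical and the location parameter is fixed at zero.

```python
# Sketch: maximum likelihood fit of a two-parameter Weibull to hypothetical failure times.
import numpy as np
from scipy.stats import weibull_min

times = np.array([105.0, 180.0, 265.0, 340.0, 410.0, 520.0, 615.0, 790.0])  # hours, illustrative
beta, loc, eta = weibull_min.fit(times, floc=0)    # floc=0 fixes the location -> 2-parameter model
print(f"MLE estimates: shape beta ≈ {beta:.2f}, scale eta ≈ {eta:.0f} hours")
print(f"estimated reliability at 300 h ≈ {weibull_min.sf(300, beta, loc=0, scale=eta):.3f}")
```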
Managing Variations in Engineering
• Variation is inherent in all manufacturing processes, and designers should understand the nature and
extent of possible variation in the parts and processes they specify. They should know how to measure
and control this variation, so that the effects on performance and reliability are minimized.
• Variation also exists in the environments that engineered products must withstand. Temperature,
mechanical stress, vibration spectra, and many other varying factors must be considered.
• Statistical methods provide the means for analysing, understanding and controlling variation (also
known as “Statistical Process Control” (SPC)); this will be covered in detail in later modules.
It is not necessary to apply statistical methods to understand every engineering problem, since many
are purely deterministic or easily solved using past experience or information available in sources such
as data books, specifications, design guides, and in known physical relationships such as Ohm’s law.
169
Discrete Variation
Types of Discrete Variation
• The Binomial Distribution: The binomial distribution describes
a situation in which there are only two outcomes, such as pass
or fail, and the probability remains the same for all trials. (Trials
which give such results are called Bernoulli trials.)
• F-Test
• Chi-square Test
• Non-Parametric Statistics – do not assume that the data or population have any characteristic
structure (numerous procedures – a statistical study in its own right!)
• Correlation (Spearman’s rank)
• Non-parametric Regression
• ANOVA
• Chi-Square
• Resampling methods
172
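A minimal Python sketch of binomial probabilities for pass/fail (Bernoulli) test data; the lot size and the assumed per-unit defect probability are hypothetical.

```python
# Sketch: binomial model for pass/fail test data (hypothetical numbers).
from scipy.stats import binom

n, p = 20, 0.05            # 20 units tested, assumed 5% defect probability per unit
print("P(0 defectives)         =", round(binom.pmf(0, n, p), 4))
print("P(2 or more defectives) =", round(binom.sf(1, n, p), 4))   # sf(1) = P(X >= 2)
```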
Tutorial Video: https://www.youtube.com/watch?v=3bcYLj11uME
Examples
173
https://www.docsity.com/en/probability-cheat-sheet/4176747/
Correlation
174
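As a minimal illustration of Spearman's rank correlation (one of the non-parametric measures listed above), the Python sketch below uses hypothetical paired measurements.

```python
# Sketch: Spearman's rank correlation on hypothetical paired data (a non-parametric measure).
import numpy as np
from scipy.stats import spearmanr

operating_hours = np.array([120, 340, 560, 800, 1020, 1300, 1650, 2000])
wear_depth_mm   = np.array([0.02, 0.05, 0.04, 0.09, 0.12, 0.11, 0.18, 0.25])
rho, p_value = spearmanr(operating_hours, wear_depth_mm)
print(f"Spearman rho ≈ {rho:.2f}, p-value ≈ {p_value:.4f}")
```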
Non-Parametric Inferential Methods
• Methods have been developed for measuring and
comparing statistical variables when no assumption is
made as to the form of the underlying distributions.
These are called non-parametric (or distribution-free)
statistical methods.
• They are only slightly less powerful than parametric
methods in terms of the accuracy of the inferences
derived for assumed normal distributions.
• However, they are more powerful when the
distributions are not normal.
• They are also simple to use. Therefore they can be
very useful in reliability work provided that the data
being analysed are independently and identically
distributed (IID).
178
Statistical Tutorials
• Self-paced statistical learning:
https://lagunita.stanford.edu/courses/HumanitiesSciences/StatLearning/Winter2016/about
http://www.statstutor.ac.uk/
http://onlinestatbook.com/
• Probabilities in Statistics
https://revisionmaths.com/gcse-maths-revision/statistics-handling-data/probability
179
Statistical Software
• You get R for free from http://cran.us.r-project.org/. Typically it installs with a click.
• You get RStudio from http://www.rstudio.com/, also for free, and a similarly easy install.
180