
Reliability Management – Lecture 2

Reliability Mathematics and Modelling

03 August 2020
Module Index
• Introduction 4
• Concepts of Reliability 6
• Probability of Failure 10
• Statistical Analysis 17
• Base Methodology and Concepts 18
• Mathematics and Statistical Models 28
• Statistical Methods 29
• Analysis Fundamental Approach – A Quick Recap 37
• Statistical Data Method Considerations 56
• Most used Statistical Methods 62
• Linear Regression 67
• Statistical Classification 69
• Resampling Methods 76
• Sub-set Selection 79
• Shrinkage 83
• Dimension Reduction 85
• Exploratory Factor Analysis (EFA) 88
• Non-Linear Models 91
• Support Vector Machines (SVM) 93
• Unsupervised Learning 96
• Reliability of Scales (Cronbach’s Alpha) 99

2
Module Index
• Reliability Statistics and Mathematics 102
Reliability Statistics 105
Discrete Functions 110
Continuous Functions 112
Series of Events (Point Processes) 118
Dealing with Multiple Reliability Engineering Scenarios 124
Bayesian Methods 128
Monte Carlo Simulation 136
Markov Analysis 143
Petri Nets 150
• Life Data Analysis Concepts 153
Life Data Classification 155
Ranking of Data 157
Confidence Bounds 162
Maximum Likelihood Estimation (MLE) 164
• Managing Variations in Engineering 166
Discrete Variation 170
• Statistical Handbooks & References 177

3
Introduction

“Reliability is, after all, Engineering in its most practical form.”


JR Schlesinger
Aim of the Module

• Briefly discuss concepts of probability.


• Apply the mathematics relevant to reliability engineering.
• Briefly recap the statistical data analysis base methodology.
• Compare and recommend the most suitable statistical analysis option(s)
for reliability engineering scenarios.
• Briefly discuss the most used Statistical distributions and methods in
Reliability Engineering (e.g. Monte Carlo simulation).
• Discuss how to deal with variations in Engineering.

5
Concepts of Probability

Methods to compare and recommend the most suitable options


Probability
• Probability is the measure of the likelihood that an event will occur. Probability is quantified as a
number between 0 and 1 (where 0 indicates impossibility and 1 indicates certainty).
• In inferential statistics, the term ‘null hypothesis’ (H0 ‘H-naught,’ ‘H-null’) denotes that there is no relationship
(difference) between the population variables in question.
• The alternative hypothesis (H1 or Ha) denotes that a relationship (difference) between the variables is expected to exist.
• The P value (or the calculated probability) is the probability of the observed result occurring by chance if the null
hypothesis is true. The P value is a number between 0 and 1 and is interpreted by researchers in
deciding whether to reject or retain the null hypothesis.
• If the P value is less than the arbitrarily chosen value (known as α or the significance level), the null hypothesis (H0) is
rejected. However, if the null hypothesis (H0) is incorrectly rejected, this is known as a Type I error.

Source: Basic statistical tools in research and data analysis. Zulfiqar Ali and S Bala Bhaskar 7
Probability is defined in two ways
Definition 1: If an event can occur in N equally likely ways, and if the event with attribute A can happen in n of
these ways, then the probability of A occurring is

P(A) = n/N

The first definition covers the case of equally likely independent events, such as rolling a die.

Definition 2: If, in an experiment, an event with attribute A occurs n times out of N experiments, then as N
becomes large, the probability of event A approaches n/N, that is,

P(A) = lim (N→∞) n/N

The second definition covers typical cases in quality control and reliability. If we test 100 items and find that 30
are defective, we may feel justified in saying that the probability of finding a defective item in our next test is
0.30, or 30%. The probability of 0.30 of finding a defective item in our next test may be considered as our
degree of belief in this outcome, limited by the size of the sample.
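As an illustration of the second definition, a minimal Python sketch using the hypothetical 100-item inspection example above:

# Relative-frequency estimate of probability (Definition 2), using the
# hypothetical 100-item inspection example from the text.
n_tested = 100      # N: number of items tested
n_defective = 30    # n: number found defective

p_defective = n_defective / n_tested   # P(A) ~= n/N
print(f"Estimated probability of a defective item: {p_defective:.2f}")  # 0.30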

RULES of Probability: See Section 2.4 (Pages 22-24 of prescribed handbook) for more information

Reference Handbook: Probability and Statistics by Example, Volume I (Basic Probability and Statistics). Suhov and Kelbert
8
Example of Rules of Probability: P22-25 in handbook
Tutorial: https://revisionmaths.com/gcse-maths-revision/statistics-handling-data/probability
Probability is also affected by dependent vs independent events
• Two events are independent if the outcome of one of the events doesn't affect the outcome of the other.

• When the probability of one event depends on another, the events are dependent. This means the result of
a previous event will have an impact on the possible outcome of the next event.

• Probability trees are used to indicate the dependence of events on each other. This is an underlying
principle of Markov Analysis (a minimal sketch of evaluating such a tree is shown below).
Example of probability tree
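A minimal sketch of evaluating a two-stage probability tree with dependent events; the batch sizes and defect counts below are hypothetical:

# Two-stage probability tree with dependent events: drawing 2 items without
# replacement from a batch of 10 containing 3 defectives (hypothetical numbers).
p_def_first = 3 / 10                 # P(defective on 1st draw)
p_def_second_given_def = 2 / 9       # P(defective on 2nd draw | 1st defective)
p_def_second_given_ok = 3 / 9        # P(defective on 2nd draw | 1st OK)

# Multiply along each branch of the tree, then sum the branches of interest.
p_both_defective = p_def_first * p_def_second_given_def
p_exactly_one = (p_def_first * (1 - p_def_second_given_def)
                 + (1 - p_def_first) * p_def_second_given_ok)

print(f"P(both defective)  = {p_both_defective:.3f}")   # ~0.067
print(f"P(exactly one def) = {p_exactly_one:.3f}")      # ~0.467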

Example of Rules of Probability: P22-25 in handbook


9
Tutorial: https://revisionmaths.com/gcse-maths-revision/statistics-handling-data/probability
Probability of Failure
The Probability of Failure
• Probability of Failure (PoF)
• Is the likelihood, based on realistic forecasts, that an asset will reach functional failure ("F") at a point in time (usually
within a particular calendar year), which is expressed along a probability distribution.
• Or the probability that an asset will fail during a particular age interval, given that it survives to enter that age.
• Or the probability that performance will be maintained into the future.

• PoF is established by deterioration models wherein there are two primary methods for graphically
representing the PoF along a curve:
• Degradation Curve - A performance curve, such as the P-F interval.
• Survivor Curve - Usually expressed as a survivor curve, with positive skewness or negative skewness.

• Further refinement of the survivor curves is derived from the following analysis.
• Right-Modal Curve ("R" Curves) with negative skewness
• Left-Modal Curve ("L" Curves) with positive skewness
• Symmetrical ("S" Curve) with no skewness
• Original Modal ("O" Curve) with extreme positive skewness

11
Data required for Probability of Failure Analysis
• In order to establish PoF, several pieces of information are required, including:
• typical service life,
• consumed life and remaining life,
• PF interval,
• FMEA.
• Physics of failure (how does the item fail?)
• Some of this information is empirical and some is statistical in nature.

• Listed below are some of the key elements to establish the PoF for a single asset:
• Service Life
• Consumed Life (measured from the top of a performance curve).
• Remaining Useful Life (measured to the bottom of a performance curve or to functional failure or to complete
failure).

12
It requires an understanding of the Physics of Failure
• The pace at which an asset degrades over
time under normal operating conditions also
plays a role.
• It is an expression of the durability and
robustness of an asset.

13
https://assetinsights.net/Concepts/Peeps_Statistical_Elements_of_Asset_Survivor_Curve.JPG
Left and Right Modal curves (P-F Curve Refinement)
• Key attributes of Right-Modal curves:
• Back-end Loading - These are assets in which the greatest frequency of retirements is after the life
term is reached, which causes the retirement frequency curve to be skewed to the right of the mean. The
majority of the assets in this group will last longer than the average life, but most of them will be retired in a
short period of time after the average life is reached.
• Negative Skewness - The probability distribution has negative skewness with a long tail to the left of the
mean.
• Positive Aging - The longer the asset has been in service, the more likely it is to fail. In other words, the
hazard function increases for larger values of t. This makes intuitive sense, because the longer stuff is
used, the more it wears down. Thus, something that has been in use for a long time will be approaching its
breaking point.

• Key attributes of Left-Modal curves:
• Front-end Loading - These are assets in which the greatest frequency of retirements occurs prior to the
average service life. Though a minority of the assets will be in service for a long time, the majority of the
assets are retired prior to the average life of the group.
• Positive Skewness - A probability distribution curve with positive skewness. The mean is smaller than the
median and mode of the curve.
• Negative Aging - The longer the component has been in service, the less likely it is to fail. Here, the hazard
function decreases for larger values of t. This makes sense when individual components vary in quality:
poorly made components usually fail early, so anything that has been in service for a long time is likely to be
particularly robust and will usually survive even longer (see the sketch after this list).
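Positive and negative aging are often illustrated with the Weibull hazard function, which increases with t when the shape parameter is greater than 1 and decreases when it is less than 1. A minimal Python sketch, with hypothetical parameter values:

# Weibull hazard function h(t) = (beta/eta) * (t/eta)**(beta - 1), a common way
# to illustrate positive aging (beta > 1) and negative aging (beta < 1).
# The parameter values below are hypothetical.

def weibull_hazard(t, beta, eta):
    return (beta / eta) * (t / eta) ** (beta - 1)

eta = 1000.0  # characteristic life (hours)
for beta, label in [(0.8, "negative aging (early failures)"),
                    (1.0, "constant hazard (random failures)"),
                    (3.0, "positive aging (wear-out)")]:
    hazards = [weibull_hazard(t, beta, eta) for t in (100, 500, 2000)]
    trend = ("increasing" if hazards[-1] > hazards[0]
             else "decreasing" if hazards[-1] < hazards[0] else "flat")
    print(f"beta={beta}: {label}, hazard is {trend} with t")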

14
A Survivor Curve – Often used for Asset renewal decisions

The year in which the probability of renewal of an asset is deemed to be highest (i.e. the probability of
failure, PoF, is highest) and the confidence interval is greatest is sometimes referred to as the
"modal year".
15
https://assetinsights.net/Concepts/Peeps_Statistical_Elements_of_Asset_Survivor_Curve.JPG
Risk & Risk Mitigation
• Probability of Failure is also used to classify risk and
consider risk mitigation factors.
• The Probability of Failure (PoF) and risk of such a
failure is directly affected by various factors, including
the following:
• The quality of maintenance applied to the preservation of the
assets.
• Exposure or protection of the assets from the elements and
other loadings.
• Durability of the materials used in the production of the
assets.
• Effective age of the assets relative to their chronological age.

16
Risk prioritisation is closely coupled to FMEA/FMECA analysis (Discussed in Module 6)
Statistical Analysis

Methods to compare and recommend the most suitable options


Statistical data analysis - Base methodology and concepts
Value of Statistical Data analysis
• Statistics is a branch of science that deals with the collection, organisation, analysis of data and drawing
of inferences from the samples to the whole population.
• The statistical methods involved in carrying out a study include planning, designing, collecting
data, analysing, drawing meaningful interpretations and reporting the research findings.
• Statistical analysis gives meaning to otherwise meaningless numbers, thereby breathing life into lifeless
data. The results and inferences are precise only if proper statistical tests are used.
• An understanding of the variables is required, e.g. quantitative vs qualitative variables and the measures of
central tendency.
• Important to consider sample size – using estimation and/or pre-determined size sets. Also vital to
ensure appropriate representation (sampling bias can be a major failing in research design!)
• Understand impact of statistical errors. Improper statistical methods may result in erroneous
conclusions and may even lead to unethical practice.
• Parametric and non-parametric tests used for data analysis (Hypothesis Testing) and validation of
interpretations.
• Triangulation is when a number of different data sources and methods are compared to confirm findings. It can bring
strength to conclusions or identify areas for further work.
Source: Basic statistical tools in research and data analysis. Zulfiqar Ali and S Bala Bhaskar 19
Sample Size
• Calculating the most appropriate sample size is an important step in the research process.
• A larger sample provides a more precise estimate of the ‘real’ situation but the benefits of increased sample size
get smaller as you near the total population.
• Therefore, there is a trade-off between sample precision and considerations of optimal resource
use.

• There are no ‘rules of thumb’ when determining sample size for quantitative research.

• Two main statistics are used to calculate the sample size.


• These are the confidence interval (or margin of error): The confidence interval is the acceptable range in which your
estimate can lie. (For example, if you are carrying out a project which expects to reduce the number of children working
on the street from 75% to 70% you would not want to use a confidence interval of 10% as your estimate would not be
precise enough to detect this change).
• The confidence level: The level of confidence determines how sure you want to be that the actual percentage of samples
chosen, falls within your selected confidence interval. A level of confidence of 95% is commonly used, which means that
there is a 5% chance that the actual percentage of samples chosen will not lie between the confidence interval selected.

Source: 6 Methods of data collection and analysis. The Open University. 20
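A minimal sketch of the standard sample-size formula for a proportion, n = z² p(1 − p) / e², assuming SciPy is available for the z-value (otherwise 1.96 can be hard-coded for 95% confidence); the inputs are hypothetical:

# Sample size for estimating a proportion: n = z^2 * p * (1 - p) / e^2.
# Assumes SciPy is available; the inputs below are hypothetical.
import math
from scipy.stats import norm

confidence_level = 0.95      # desired confidence level
margin_of_error = 0.05       # confidence interval half-width (5%)
p = 0.75                     # expected proportion (use 0.5 if unknown)

z = norm.ppf(1 - (1 - confidence_level) / 2)     # ~1.96 for 95% confidence
n = math.ceil(z**2 * p * (1 - p) / margin_of_error**2)
print(f"Required sample size: {n}")              # ~289 for these inputs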


Variables
• A Variable is a characteristic that varies from one individual member of population or
sample to another individual/sample.
• Variables such as height and weight are measured by some type of scale, convey quantitative information and
is considered quantitative variables.
• Sex and eye colour give qualitative information is considered qualitative variables.

• Quantitative or numerical data are subdivided into discrete and continuous


measurements.
• Discrete numerical data are recorded as a whole number such as 0, 1, 2, 3,… (integer)
• Continuous data can assume any value.

• A hierarchical scale of increasing precision can be used for observing and recording
the data which is based on categorical, ordinal, interval and ratio scales
• Categorical or nominal variables are unordered. The data are merely classified into categories and cannot be
arranged in any particular order. (E.g. quality of training – good, bad or average)
• Ordinal variables have a clear ordering between the variables. However, the ordered data may not have equal
intervals.
• Interval variables are similar to an ordinal variable, except that the intervals between the values of the interval
variable are equally spaced.
• Ratio scales are similar to interval scales, in that equal differences between scale values have equal
quantitative meaning. However, ratio scales also have a true zero point, which gives them an additional
property.

Source: Basic statistical tools in research and data analysis. Zulfiqar Ali and S Bala Bhaskar 21
Qualitative vs Quantitative Analysis

Although the table above illustrates qualitative and quantitative research as distinct and opposite, in
practice they are often combined or draw on elements from each other. For example, quantitative
surveys can include open-ended questions. Similarly, qualitative responses can be quantified. Qualitative
and quantitative methods can also support each other, both through a triangulation of findings and by
building on each other (e.g., findings from a qualitative study can be used to guide the questions in a
survey).
Source: 6 Methods of data collection and analysis. The Open University. 22
Descriptive vs Inferential Statistics
• Descriptive statistics try to describe the relationship between variables in a sample or population.
• Descriptive statistics provide a summary of data in the form of mean, median and mode.
• Mean (or the arithmetic average) is the sum of all the scores divided by the number of scores.
• Median is defined as the middle of a distribution in a ranked data (with half of the variables in the sample above and half below the median value)
• Mode is the most frequently occurring variable in a distribution.
• The extent to which the observations cluster around a central location is described by the central tendency and the spread towards
the extremes is described by the degree of dispersion.

Negatively and positively skewed distributions

• Inferential statistics use a random sample of data taken from a population to describe and make
inferences about the whole population.
• It is valuable when it is not possible to examine each member of an entire population.
• The purpose is to answer or test the hypotheses. A hypothesis is a proposed explanation for a phenomenon. Hypothesis tests are
thus procedures for making rational decisions about the reality of observed effects.

Source: Basic statistical tools in research and data analysis. Zulfiqar Ali and S Bala Bhaskar 23
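A minimal sketch of these descriptive statistics using only the Python standard library; the failure-time sample is hypothetical:

# Descriptive statistics (mean, median, mode, spread) for a hypothetical sample.
import statistics

times_to_failure = [120, 150, 150, 180, 210, 240, 900]  # hours (hypothetical)

print("mean  :", round(statistics.mean(times_to_failure), 1))  # pulled up by the outlier
print("median:", statistics.median(times_to_failure))          # robust to the outlier
print("mode  :", statistics.mode(times_to_failure))            # most frequent value
print("stdev :", round(statistics.stdev(times_to_failure), 1))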
Parametric and Non-Parametric Tests
• Numerical data (quantitative variables) that are normally distributed are analysed with parametric
tests. Two most basic pre-requisites for parametric statistical analysis are:
• The assumption of normality which specifies that the means of the sample group are normally distributed.
• The assumption of equal variance which specifies that the variances of the samples and of their corresponding population
are equal.

• Parametric tests: The parametric tests assume that the data are on a quantitative (numerical) scale,
with a normal distribution of the underlying population. The samples have the same variance
(homogeneity of variances). The samples are randomly drawn from the population, and the
observations within a group are independent of each other. The commonly used parametric tests are
the t-test, analysis of variance (ANOVA) and repeated-measures ANOVA.

• Non-Parametric: If the distribution of the sample is skewed towards one side or the distribution is
unknown due to the small sample size, non-parametric statistical techniques are used. Non-parametric
tests are used to analyse ordinal and categorical data.
Source: Basic statistical tools in research and data analysis. Zulfiqar Ali and S Bala Bhaskar 24
Parametric and Non-Parametric Test Equivalents

Parametric tests are those that make assumptions about the parameters of the
population distribution from which the sample is drawn. This is often the assumption that the
population data are normally distributed.

Non-parametric tests are “distribution-free” and, as such, can be used for non-Normal variables.

Source: https://www.healthknowledge.org.uk/public-health-textbook/research-methods/1b-statistical-methods/parametric-nonparametric-tests 25
Parametric Tests
• t-test is used to test the null hypothesis that there is no difference between the means of the two groups. It is used in three
circumstances:
• To test if a sample mean (as an estimate of a population mean) differs significantly from a given population mean (this is a one-sample t-test)
• To test if the population means estimated by two independent samples differ significantly (the unpaired t-test).
• To test if the population means estimated by two dependent samples differ significantly (the paired t-test). A usual setting for paired t-test is when
measurements are made on the same subjects before and after a specified intervention.

• The t-test cannot be used for comparison of three or more groups. The purpose of ANOVA is to test if there is any
significant difference between the means of two or more groups. In ANOVA, two variances are studied:
• Between-group variability: The between-group (or effect variance) is the result of the intervention imposed (before-after).
• Within-group variability: The variation that cannot be accounted for in the study design. It is based on random differences present in the data
samples.
• These two estimates of variances are compared using the F-test.
• As with ANOVA, Repeated-Measures ANOVA analyses the equality of means of three or more groups.
• A repeated measure ANOVA is used when all variables of a sample are measured under different conditions or at different points in time. As the
variables are measured from a sample at different points of time, the measurement of the dependent variable is repeated. Using a standard ANOVA
in this case is not appropriate because it fails to model the correlation between the repeated measures (i.e. the data violate the ANOVA assumption of
independence).
• Hence, in the measurement of repeated dependent variables, Repeated-Measures ANOVA should be used.

Source: Basic statistical tools in research and data analysis. Zulfiqar Ali and S Bala Bhaskar 26
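A minimal sketch of an unpaired t-test and a one-way ANOVA, assuming SciPy is available; the failure-time samples are hypothetical:

# Parametric tests with SciPy: an unpaired t-test (two groups) and a one-way
# ANOVA (three groups). The samples below are hypothetical.
from scipy import stats

group_a = [102, 110, 98, 105, 99, 108]
group_b = [120, 118, 125, 119, 123, 121]
group_c = [111, 115, 113, 117, 112, 116]

t_stat, p_t = stats.ttest_ind(group_a, group_b)           # unpaired t-test
print(f"t-test: t = {t_stat:.2f}, p = {p_t:.4f}")

f_stat, p_f = stats.f_oneway(group_a, group_b, group_c)   # one-way ANOVA
print(f"ANOVA : F = {f_stat:.2f}, p = {p_f:.4f}")
# If p < 0.05, reject H0 that the group means are equal.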
Non-Parametric Tests
• When the assumptions of normality are not met, and
the sample means are not normally distributed,
parametric tests can lead to erroneous results.
• Non-parametric tests (distribution-free test) are used in
such situations as they do not require the normality
assumption.
• Non-parametric tests may fail to detect a significant
difference when compared with a parametric test. That
is, they usually have less power.
• As is done for the parametric tests, the test statistic is
compared with known values for the sampling
distribution of that statistic and the null hypothesis is
accepted or rejected.
• See source below for brief description of non-
parametric test intent.
Source: Basic statistical tools in research and data analysis. Zulfiqar Ali and S Bala Bhaskar 27
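A minimal sketch of a non-parametric alternative to the unpaired t-test, the Mann-Whitney U test, assuming SciPy; the skewed samples are hypothetical:

# Non-parametric alternative to the unpaired t-test: the Mann-Whitney U test.
from scipy import stats

group_a = [5, 7, 8, 9, 12, 45]      # skewed data, small samples (hypothetical)
group_b = [10, 14, 15, 18, 21, 60]

u_stat, p_value = stats.mannwhitneyu(group_a, group_b, alternative="two-sided")
print(f"Mann-Whitney U = {u_stat:.1f}, p = {p_value:.3f}")
# The test compares ranks rather than raw values, so no normality assumption is needed.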
Mathematics and Statistical Models

Discussion of the most used mathematics and statistical models


Statistical Methods
Statistical Methods
Statistics is a vast subject with many techniques and methods!!! You will end up
having to investigate many techniques before deciding which one is the most appropriate
for the problem at hand (or research study).

• Descriptive Statistics
• Frequencies and Percentages
• Means and Standard Deviations (SD)

• Inferential Statistics
• Hypothesis Testing
• Correlation
• T-Tests
• Chi-square
• Logistic Regression
• Prediction
• Confidence Intervals
• Significance Testing

Descriptive Statistics: https://slideplayer.com/slide/12574080/


Reference Books: Statistical Data Analysis – G Cowan 30
Introduction to Statistical Learning – James, Witten, Hastie and Tibshirani
Statistical Analysis Methods most often used
• Descriptive Analysis: Descriptive analysis is an insight into the past. This statistical technique does exactly
what the name suggests -“Describe”. It looks at data and analyses past events and situations for getting an
idea of how to approach the future.

• Regression Analysis: Regression analysis allows modelling the relationship between a dependent variable
and one or more independent variables. In data mining, this technique is used to predict the values, given a
particular dataset.

• Inferential: Aims to test theories about the nature of the world in general (or some part of it) based on
samples of “subjects” taken from the world (or some part of it). That is, use a relatively small sample of data to
say something about a bigger population.

• Predictive: The various types of methods that analyse current and historical facts to make predictions about
future events. In essence, to use the data on some objects to predict values for another object. Accurate
prediction depends heavily on measuring the right variables.

• Causal: To find out what happens to one variable when you change another.

31
Statistical Methods most often used
• Mechanistic: Understand the exact changes in variables that lead to changes in other variables for individual
objects. Usually modelled by a deterministic set of equations (physical/engineering science)
• Factor Analysis: Factor analysis is a regression-based data analysis technique used to find an underlying
structure in a set of variables. It involves finding new independent factors (variables) that describe the
patterns and relationships among the original dependent variables.
• Dispersion Analysis: Dispersion analysis is not a common method used in data mining, but it still has a role in
analysis. Dispersion is the extent to which a set of data is spread out. The measure of dispersion helps
data scientists to study the variability of the item(s) being researched.
• Discriminant Analysis: Discriminant analysis is one of the most powerful classification techniques in data
mining. The discriminant analysis utilizes variable measurements on different groups of items to underline
points (characteristics) that distinguish the groups.
• Time Series Analysis: Time series data analysis is the process of modelling and explaining time-
dependent series of data points. The goal is to draw all meaningful information (statistics, rules and patterns)
from the shape of data.
Reference Handbook: Time Series Analysis and its Applications. Shumway & Stoffer
32
Modern Statistical Data Analysis Methods used
• Artificial Neural Networks (ANN): Often just called a “neural network”, present a brain metaphor for
information processing. These models are biologically inspired computational models. They consist of an
inter-connected group of artificial neurons and process information using a computation approach.

• Decision Trees: The decision tree is a tree-shaped diagram that represents classification or regression
models. It divides a data set into smaller and smaller sub-datasets (that contain instances with similar values)
while at the same time a related decision tree is continuously developed. The tree is built to show how and
why one choice might lead to (or influence) the next, with the help of the branches.

• Evolutionary Programming: Evolutionary programming in data mining is a common concept that combines
many different types of data analysis using evolutionary algorithms. The most popular of these are genetic
algorithms, genetic programming and co-evolutionary algorithms.

• Fuzzy Logic: Fuzzy logic is applied to cope with the uncertainty in data mining problems. Fuzzy logic is an
innovative type of many-valued logic in which the truth values of variables are real numbers between 0 and 1,
so the truth value can range between completely false and completely true.

33
Summary of Descriptive and Graphical Statistical Representations

34
Examples of Graphical Statistical Representations
Box Plot, Histogram, Pie/Bar Chart, Line Chart, Stacked/Multiple Bar Chart, Means Plot, Scatter Graphs

35
Practical Considerations regarding Statistical Methods
The important points that must be borne in mind when applying statistical methods to engineering
are:
• Real variation is seldom normal.
• The most important variation, as far as reliability is concerned, is usually that observed in the “tails” of the
distribution, where there is inevitably less (or no) data, the data are more uncertain, and this is where
conventional statistical models can be most misleading.
• Variation can change over time, so that the patterns measured at one time might not represent the true
situation at another.
• There might be interaction effects between variables, causing combined effects that can be more significant
than those of individual variations.
• Variation in engineering is usually made or influenced by people. People do not behave in accordance with
any credible mathematical models.
• Most engineering education in statistics covers only the mathematics, and few statisticians understand the
practical aspects of the engineering problems they help to solve. This leads to inappropriate analyses and
conclusions, and to a distrust of statistical methods among engineers.

Source: Practical Reliability Engineering, P.D.T. O’Conner and A. Kleyner 36


Statistical Analysis Fundamental Approach - A Quick Recap
Understand the type of data you will be analysing statistically
1
In order to choose suitable summary
statistics and analysis methods for the
data, it is important to distinguish
between continuous (numerical/ scale)
measurements and categorical
variables.

2
Investigate the data with summary
statistics and charts first – summary
results and charts especially can show
outliers and patterns.
For continuous normally distributed data,
summarise using means and standard
deviations. If the data is skewed or there
are influential outliers, the median
(middle value) and interquartile range
(Upper quartile – lower quartile) are
more appropriate.

Source: Practical Reliability Engineering, P.D.T. O’Conner and A. Kleyner 38


Deciding on the type of Statistical Test
3

39
Consider “Goodness of Fit” of data set
2
In analysing statistical data we need to determine how well the data
“fits” an assumed distribution.
The goodness of fit can be tested statistically, to provide a level of significance at which the null hypothesis
(i.e. that the data do fit the assumed distribution) would be rejected.
Goodness-of-fit testing is an extension of significance testing in which the sample cdf is compared with the
assumed true cdf.
A number of methods are available to test how closely a set of data fits an assumed distribution. As with
significance testing, the power of these tests in rejecting incorrect hypotheses varies with the number and
type of data available, and with the assumption being tested. The most commonly used methods are:

• X2 (Chi-Squared) Goodness-of-Fit
• The Kolmogorov-Smirnov Test

Red line is CDF, blue line is an estimated CDF (ECDF) – Kolmogorov-Smirnov statistical test outcome example.
Source: https://en.wikipedia.org/wiki/Kolmogorov%E2%80%93Smirnov_test

See pages 59 - 61 of handbook for formulae and examples


40
Example: https://www.researchgate.net/publication/280876940_Use_of_Pearson's_Chi-
Square_for_Testing_Equality_of_Percentile_Profiles_across_Multiple_Populations
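A minimal goodness-of-fit sketch assuming SciPy: a Kolmogorov-Smirnov test of a sample against a fitted normal distribution, plus a chi-squared test on hypothetical counts.

# Goodness-of-fit sketch (assumes SciPy).
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
sample = rng.normal(loc=50, scale=5, size=200)     # hypothetical data

# K-S test: compare the sample ECDF with the assumed (fitted) normal CDF.
# Note: estimating the parameters from the data makes the p-value optimistic.
ks_stat, ks_p = stats.kstest(sample, "norm", args=(sample.mean(), sample.std(ddof=1)))
print(f"K-S: D = {ks_stat:.3f}, p = {ks_p:.3f}")

# Chi-squared goodness-of-fit on observed vs expected counts (hypothetical).
observed = [18, 22, 25, 20, 15]
expected = [20, 20, 20, 20, 20]
chi2, chi2_p = stats.chisquare(observed, expected)
print(f"Chi-squared: X2 = {chi2:.2f}, p = {chi2_p:.3f}")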
Statistical Method Choice for Prediction/Probability Modelling
3

41
Statistical Method Choice for Prediction/Probability Modelling
3

Source: https://www.healthknowledge.org.uk/public-health-textbook/research-methods/1b-statistical-methods/parametric-
nonparametric-tests

42
Assumption of Normality
Examples of very skewed data (i.e. non-normal)
4
Parametric tests assume that the data follows a particular
distribution e.g for t-tests, ANOVA and regression, the
data needs to be normally distributed.
Parametric tests are more powerful than non-parametric
tests, when the assumptions about the distribution of the
data are true. This means that they are more likely to
detect true differences or relationships that exist.
The tests are quite robust to departures from normality,
so the data only need to be approximately normally
distributed.
Plotting a histogram or QQ plot of the variable of interest
will give an indication of the shape of the distribution.
Histograms should peak in the middle and be
approximately symmetrical about the mean. If data is
normally distributed, the points in QQ plots will be close to
the line.
43
Statistical Tests for Normality
4 There are statistical tests for normality, such as the Shapiro-Wilk and Kolmogorov-Smirnov tests, but for
small sample sizes (n < 20) the tests are unlikely to detect non-normality, and for larger sample sizes (n >
50) the tests can be too sensitive. They are also sensitive to outliers, so use histograms (for large
samples) or QQ plots (for small samples).

• Non-parametric tests make no assumptions about the distribution of the data.


• Non-parametric techniques are usually based on ranks or signs rather than the actual data and are usually less powerful
than parametric tests.
• Non-parametric tests can also be used when other assumptions are not met e.g. equality of variance.
• Some statisticians believe that non-parametric tests can be used for small sample sets as it is usually very difficult to
assess normality in such cases.
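A minimal sketch of checking the normality assumption with the Shapiro-Wilk test, assuming SciPy; the small sample (with an outlier) is hypothetical:

# Checking the normality assumption (assumes SciPy): Shapiro-Wilk test plus a
# skewness figure for a hypothetical small sample.
from scipy import stats

sample = [4.1, 4.5, 4.8, 5.0, 5.1, 5.3, 5.6, 5.9, 6.2, 9.8]  # note the outlier

w_stat, p_value = stats.shapiro(sample)
print(f"Shapiro-Wilk: W = {w_stat:.3f}, p = {p_value:.3f}")
print(f"skewness = {stats.skew(sample):.2f}")
# p < 0.05 suggests non-normality, but for n < 20 the test has little power,
# so also inspect a histogram or QQ plot as recommended above.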

44
Other Considerations
5
Dependent vs Independent: For most tests, it is assumed that the observations are independent. That is,
the results for one subject are not affected by another.
Examples of data which is not independent are
• repeated measures on the same subject (use the specific tests for this type of experiment) and
• observations over time (check the Durbin Watson test for regression).
Another situation where observations are not independent is when subjects are nested within groups with
a common influence e.g. children within classes who may be influenced by the teacher (use multilevel
modelling to include class as an extra RANDOM factor).
Time series analysis and multilevel modelling allows for non-independent measurements over time but
are much more complex analysis techniques.

https://www.google.co.za/url?sa=i&url=https%3A%2F%2Fwww.expii.com%2Ft%2Fdependent-and-independent-variables

Reference on Confidence Intervals: http://cast.massey.ac.nz/collection_public.html


http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1339793/ 45
Other Considerations
6
Confidence Intervals describe the variability surrounding the sample point estimate (the wider the interval,
the less confident we can be about the estimate of the population mean). In general, all things being
equal, the larger the sample size the better (more precise) the estimate is, as less variation between
sample means is expected.
Confidence intervals give a range of values within which we are confident (in terms of probability) that the
true value of a population parameter lies. A 95% CI is interpreted as 95% of the time the CI would contain
the true value of the population parameter.
The image shows a 95% confidence interval on a normal distribution graph.
The red “tails” are the remaining 5 percent of the interval. Each tail has 2.5
percent (that’s .025 as a decimal). You don’t have to draw a graph when
you’re working with confidence intervals, but it can help you visualize
exactly what you are doing — especially in hypothesis testing. If your
results fall into the red region, then that’s outside of the 95% confidence
level that you, as a researcher, set.

If you have a small sample or if you don’t know the population standard deviation (which in most real-life
cases is true), then you’ll find the 95% Confidence Interval with a t-distribution.

Image Source: WUSTL.EDU

Reference on Confidence Intervals: http://cast.massey.ac.nz/collection_public.html


http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1339793/ 46
https://www.statisticshowto.datasciencecentral.com/probability-and-statistics/confidence-interval/
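A minimal sketch of a 95% confidence interval for a mean using the t-distribution, assuming SciPy; the sample values are hypothetical:

# 95% confidence interval for a mean using the t-distribution (assumes SciPy).
from scipy import stats

sample = [7.2, 7.5, 7.7, 7.9, 8.1, 7.4, 7.6, 7.8]    # hypothetical weights (lbs)
mean = sum(sample) / len(sample)
sem = stats.sem(sample)                               # standard error of the mean

lower, upper = stats.t.interval(0.95, len(sample) - 1, loc=mean, scale=sem)
print(f"mean = {mean:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
# Interpretation: intervals constructed this way contain the true population
# mean 95% of the time over repeated sampling.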
Statistical Confidence
6 Statistical confidence: Confidence is the exact fraction of times the confidence interval will include

the true value, if the experiment is repeated many times.


• The confidence interval is the interval between the upper and lower confidence limits.
• Confidence intervals are used in making an assertion about a population given data from a
sample.
• The SD (Standard Deviation) of the sample means is also called the standard error of the
estimate, and is denoted Sx.

Statistical confidence and engineering confidence must not be confused!
• Statistical confidence takes no account of engineering or process knowledge
or changes which might make sample data unrepresentative.
• Derived statistical confidence values must always be interpreted in the light of
engineering knowledge, which might serve to increase or decrease our
engineering confidence.

https://greatbrook.com/survey-statistical-confidence-how-many-is-enough/ 47
Confidence Interval Sample
6 The diagram below shows the confidence intervals for 27 samples of babies taken from the same
population. The actual population mean (which is not normally known) is 7.73 lbs. Two of the confidence
intervals do not contain the population mean (don’t overlap 7.73 lbs)

Population mean = 7.73

There is a strong relationship between hypothesis testing and confidence intervals.
For example, when carrying out a paired t-test, if the p-value < 0.05, the 95% confidence interval for the
paired differences will not contain 0.
However, a p-value just concludes whether there is significant evidence of a difference or not. The
confidence interval of the difference gives an indication of the size of the difference.

48
Other Considerations
7
Data Sample Size: The larger the sample size, the more likely a significant result is, so for small sample
sizes a huge difference is needed to conclude a significant difference.
For large sample sizes, small differences may be significant but then it is required to check if the difference
is meaningful.

8
Effect Size: An effect size is a measure of the strength or magnitude of the effect of an independent
variable on a dependent variable which helps assess whether a statistically significant result is
meaningful.
For example, for a t-test, the absolute effect size is just the difference between the two groups. A
standardised effect size involves variability and can then be compared to industry standards.
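A minimal sketch of an absolute effect size and a standardised effect size (Cohen's d with a pooled standard deviation, assuming equal group sizes); the data are hypothetical:

# Effect size sketch: absolute difference and Cohen's d (pooled SD).
import statistics

group_a = [102, 110, 98, 105, 99, 108]
group_b = [120, 118, 125, 119, 123, 121]

mean_a, mean_b = statistics.mean(group_a), statistics.mean(group_b)
sd_a, sd_b = statistics.stdev(group_a), statistics.stdev(group_b)
pooled_sd = ((sd_a**2 + sd_b**2) / 2) ** 0.5     # equal group sizes assumed

absolute_effect = mean_b - mean_a
cohens_d = absolute_effect / pooled_sd
print(f"absolute difference = {absolute_effect:.1f}")
print(f"Cohen's d = {cohens_d:.2f}")   # ~0.2 small, ~0.5 medium, ~0.8 large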

49
Other Considerations
9
Measure of Variance: A measure of variability is a summary statistic that represents the amount of
dispersion in a dataset. While a measure of central tendency describes the typical value, measures of
variability define how far away the data points tend to fall from the centre. We talk about variability in the
context of a distribution of values. A low dispersion indicates that the data points tend to be clustered
tightly around the centre. High dispersion signifies that they tend to fall further away.
Variance is the average squared difference of the values from the mean. Unlike range-based measures of
variability, the variance includes all values in the calculation by comparing each value to the mean. To
calculate this statistic, you calculate the squared differences between the data points and the mean,
sum them, and then divide by the number of observations.

Partial eta-squared is a measure of variance. It represents the proportion of


variance in the dependent variable that is explained by the independent variable.
It also represents the effect size statistic.
The effect sizes given in *Cohen (1988) for the interpretation of the absolute
effect sizes are:
η2 = 0.010 is a small association.
η2 = 0.059 is a medium association.
η2 = 0.138 or larger is a large association.

http://www.utstat.toronto.edu/~brunner/oldclass/378f16/readings/CohenPower.pdf (*Cohen Reference) 50


https://statisticsbyjim.com/basics/variability-range-interquartile-variance-standard-deviation/
Hypothesis Testing
10
• Hypothesis testing is an objective method of making decisions or inferences from sample data
(evidence). Sample data is used to choose between two choices i.e. hypotheses or statements
about a population. Typically this is carried out by comparing what we have observed to what
we expected if one of the statements (Null Hypothesis) was true.
• It is often necessary to determine whether observed differences between the statistics of a
sample and prior knowledge of a population, or between two sets of sample statistics, are
statistically significant or due merely to chance.
• Statistical hypothesis testing is similar to confidence estimation, but instead of asking the
question How confident are we that the population parameter value is within the given limits?
(On the assumption that the sample and the population come from the same distribution), we
ask How significant is the deviation of the sample?
• In statistical hypothesis testing, we set up a null hypothesis, that is, that the two sets of
information are derived from the same distribution. We then derive the significance to which
this inference is tenable. As in confidence estimation, the significance we can attach to the
inference will depend upon the size of the sample.

51
Statistical Hypothesis Testing
10
• Key terms:
• NULL HYPOTHESIS (H0) is a statement about the population & sample data used to decide whether to reject that statement or not. Typically the statement
is that there is no difference between groups or association between variables.
• ALTERNATIVE HYPOTHESIS (H1) is often the research question and varies depending on whether the test is one or two tailed.
• SIGNIFICANCE LEVEL: The probability of rejecting the null hypothesis when it is true, (also known as a Type I error). This is decided by the individual but
is normally set at 5% (0.05) which means that there is a 1 in 20 chance of rejecting the null hypothesis when it is true.
• TEST STATISTIC is a value calculated from a sample to decide whether to accept or reject the null (H0) and varies between tests. The test statistic
compares differences between the samples or between observed and expected values when the null hypothesis is true.
• P-VALUE: the probability of obtaining a test statistic at least as extreme as ours if the null is true and there really is no difference or association in the
population of interest. P-values are calculated using different probability distributions depending on the test. A significant result is when the p-value is less
than the chosen level of significance (usually 0.05).

• Many significance test techniques have been developed for dealing with the many types of situation
which can be encountered.
• Z-Test (test for differences in Means)
• X2 Test for Significance
• F Test (Test for Differences in Variances and Variance Ratio Test)

52
See pages 53 – 54 of handbook for examples
Hypothesis Testing Approach
10
• A statistical hypothesis is an assumption
made by the researcher about the data of
the population collected for any
experiment.
• Statistical Hypothesis Testing can be
categorized into two types :
• Null Hypothesis – Hypothesis testing is carried out
in order to test the validity of a claim or assumption
that is made about the larger population. This claim
about the attributes of the trial is known as the
Null Hypothesis, denoted by H0.
• Alternative Hypothesis – An alternative hypothesis
would be considered valid if the null hypothesis is
fallacious. The evidence present in the trial
is basically the data and the statistical
computations that accompany it. The alternative
hypothesis is denoted by H1 or Ha.

https://data-flair.training/blogs/hypothesis-testing-in-r/ 53
Hypothesis Example
10 Members of a jury have to decide whether a person is guilty or innocent based on evidence presented to
them. If a court case was a hypothesis test, the jury consider the likelihood of innocence given the
evidence and if there’s less than a 5% chance that the person is innocent they reject the statement of
innocence. The null can only be rejected if there is enough evidence to disprove it and the jury do not
know whether the person is really guilty or innocent so they may make a mistake.
Null: The person is innocent
Alternative: The person is not innocent.

In reality, the person is actually Guilty


(null false) or Innocent (null true) but
we can only conclude that there is
evidence to suggest that the null is
false or not enough evidence to
suggest it is false.
A Type I error is equivalent to
convicting an innocent person and is
usually set at 5% (the magic 0.05!).

54
Type I and Type II Hypothesis Errors
10 A statistically significant result cannot prove that a research hypothesis is correct (as this implies 100%
certainty).
Because a p-value is based on probabilities, there is always a chance of making an incorrect conclusion
regarding accepting or rejecting the null hypothesis (H0).
Anytime we make a decision using statistics there are four possible outcomes, with two representing
correct decisions and two representing errors.

• A Type I error is also known as a false positive and occurs
when a researcher incorrectly rejects a true null hypothesis.
This means that you report your findings as significant
when in fact they occurred by chance. Reduce the risk of
committing a Type I error by using a lower significance level
(p), but beware of setting the p-level too low, as it will then be
less likely that a true difference is detected if one really exists.
• A Type II error is also known as a false negative and occurs
when a researcher fails to reject a null hypothesis which is
really false. Here a researcher concludes there is not a
significant effect, when actually there really is. Reduce the
risk by ensuring the sample size is large enough to detect a
practical difference when one truly exists.
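A small simulation sketch of the Type I error rate, assuming SciPy: when the null hypothesis is actually true, a test at the 5% significance level should falsely reject it about 5% of the time. Everything else below is hypothetical.

# Simulating the Type I error rate of a t-test at alpha = 0.05 (assumes SciPy).
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
alpha, n_experiments, false_positives = 0.05, 2000, 0

for _ in range(n_experiments):
    # Both samples come from the SAME population, so H0 is actually true.
    a = rng.normal(loc=100, scale=10, size=30)
    b = rng.normal(loc=100, scale=10, size=30)
    if stats.ttest_ind(a, b).pvalue < alpha:
        false_positives += 1

print(f"Observed Type I error rate: {false_positives / n_experiments:.3f}")  # ~0.05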
55
Considerations around Statistical Data Methods

Avoiding some pitfalls


The Mean
The arithmetic mean, more commonly known as “the average,” is the sum of a list of numbers
divided by the number of items on the list.

The mean is useful in determining the overall trend of a data set or providing a rapid snapshot of the
data.

An advantage of the mean is that it’s very easy and quick to calculate.

Pitfalls:
• Taken alone, the mean is a dangerous tool. In some data sets, the mean is also closely related to
the mode and the median (two other measurements near the average).
• In a data set with a high number of outliers or a skewed distribution, the mean simply doesn’t
provide the accuracy you need for a nuanced decision.

57
Standard Deviation
The standard deviation, often represented with the Greek letter sigma, is the measure of a spread of
data around the mean.

A high standard deviation signifies that data is spread more widely from the mean, where a low
standard deviation signals that more data align with the mean.

In a portfolio of data analysis methods, the standard deviation is useful for quickly determining
dispersion of data points.

Pitfall:
• Just like the mean, the standard deviation is deceptive if taken alone. For example, if the data have
a very strange pattern, such as a non-normal curve or a large number of outliers, then the standard
deviation won’t give all the information that may be needed to draw an accurate conclusion.
58
Regression
Regression models the relationships between dependent and explanatory variables, which are
usually charted on a scatterplot.

The regression line also designates whether those relationships are strong or weak.

Regression is commonly taught in statistics courses with applications for science or business in
determining trends over time.

Pitfall:
• Regression is not very nuanced. Sometimes, the outliers on a scatterplot (and the reasons for
them) matter significantly. For example, an outlying data point may represent the input from your
most critical supplier or your highest selling product. The nature of a regression line, however,
tempts you to ignore these outliers. You can have data sets that have the exact same regression
line but include widely different data points.

59
Sample Size Determination

When measuring a large data set or population, like a workforce, it is not always necessary to collect
information from every member of that population – a sample does the job just as well.

The trick is to determine the right size for a sample to be accurate. Using proportion and standard
deviation methods, it is possible to accurately determine the right sample size needed to make the
data collection statistically significant.

Pitfall:
• When studying a new, untested variable in a population, the proportion equations might need to
rely on certain assumptions. However, these assumptions might be completely inaccurate. This
error is then passed along to the sample size determination and then onto the rest of the statistical
data analysis.

60
Hypothesis Testing
Also commonly called t-testing, hypothesis testing assesses if a certain premise is actually true for
your data set or population.

In data analysis and statistics, the result of a hypothesis test is considered statistically significant if
the results couldn’t have happened by random chance. Hypothesis tests are used in everything from
science and research to business and economics.

Pitfalls:
• To be rigorous, hypothesis tests need to watch out for common errors:
• For example, the placebo effect occurs when participants falsely expect a certain result and then perceive
(or actually attain) that result.
• Another common error is the Hawthorne effect (or observer effect), which happens when participants skew
results because they know they are being studied.

61
Most used Statistical Methods
The Most Common Statistical Techniques Used
1. INDEPENDENT T-TEST
2. MANN-WHITNEY TEST
3. PAIRED T-TEST
4. WILCOXON SIGNED RANK TEST
5. ONE-WAY ANOVA
6. KRUSKAL-WALLIS TEST
7. ONE-WAY ANOVA WITH REPEATED MEASURES (WITHIN SUBJECTS)
8. FRIEDMAN TEST
9. TWO-WAY ANOVA
10. CHI-SQUARED TEST
11. ODDS AND RELATIVE RISK
12. CORRELATION
13. PEARSON’S CORRELATION COEFFICIENT
14. RANKED CORRELATION COEFFICIENTS
15. PARTIAL CORRELATION
16. REGRESSION
17. LINEAR
18. LOGISTIC REGRESSION
19. PROPORTIONS TEST (Z-TEST)
20. RELIABILITY
21. Interrater reliability
22. Cohen’s Kappa
23. Intra-class Correlation Coefficient
24. Cronbach's alpha (reliability of scales)
25. PRINCIPAL COMPONENT ANALYSIS (PCA)
26. CLUSTER ANALYSIS
27. HIERARCHICAL CLUSTERING
28. K-MEANS CLUSTERING

Also refer to www.statstutor.ac.uk


63
Quick Reference Guide: The Statistics Tutor’s Quick Guide to Commonly Used Statistical Tests
The 10 Statistical Analysis Techniques
1. Linear Regression

2. Classification

3. Resampling Methods

4. Sub-set Selection

5. Shrinkage

6. Dimension Reduction

7. Non-Linear Models

8. Tree-Based Models

9. Support-Vector Machines (SVM)

10. Unsupervised Learning

Also refer to www.statstutor.ac.uk


64
The Statistics Tutor’s Quick Guide to Commonly Used Statistical Tests
Choosing the Techniques
In general, most problems in reliability
engineering deal with quantitative
measures, such as the time-to-failure of a
component, or qualitative measures, such
as whether a component is defective or
non-defective.

https://hbr.org/1971/07/how-to-choose-the-right-forecasting-technique 65
Choosing the Techniques

https://hbr.org/1971/07/how-to-choose-the-right-forecasting-technique 66
Linear Regression
Linear Regression
In statistics, linear regression is a method to predict a target
variable by fitting the best linear relationship between the
dependent and independent variable.

The best fit is done by making sure that the sum of all the
distances between the shape and the actual observations at
each point is as small as possible. The fit of the shape is
“best” in the sense that no other position would produce less
error given the choice of shape.

Simple Linear Regression

2 major types of linear regression are


• Simple Linear Regression: Uses a single independent variable to
predict a dependent variable by fitting a best linear relationship.
• Multiple Linear Regression: Uses more than one independent
variable to predict a dependent variable by fitting a best linear
relationship.

Multiple Linear Regression

For more info see: https://www.youtube.com/watch?v=ZkjP5RJLQF4 68
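A minimal sketch of simple and multiple linear regression via NumPy least squares; the data points are hypothetical:

# Simple and multiple linear regression with NumPy least squares (hypothetical data).
import numpy as np

# Simple linear regression: y = b0 + b1*x
x = np.array([1, 2, 3, 4, 5, 6], dtype=float)
y = np.array([2.1, 4.3, 6.2, 8.1, 9.8, 12.2])
b1, b0 = np.polyfit(x, y, deg=1)
print(f"simple:   y = {b0:.2f} + {b1:.2f} * x")

# Multiple linear regression: y = b0 + b1*x1 + b2*x2, solved with lstsq.
X = np.column_stack([np.ones(6), x, np.array([3, 1, 4, 2, 5, 3], dtype=float)])
coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)
print(f"multiple: b0 = {coeffs[0]:.2f}, b1 = {coeffs[1]:.2f}, b2 = {coeffs[2]:.2f}")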


Statistical Classification
Classification

• Classification is a data mining technique that assigns categories to a collection of data in order to
aid in more accurate predictions and analysis.

• Also sometimes called a Decision Tree (See next slide), classification is one of several methods
intended to make the analysis of very large datasets more effective.

• 2 major Classification techniques stand out:


• Logistic Regression and
• Discriminant Analysis

More Info: https://www.youtube.com/watch?v=B0TI2q7wgIQ


70
Tree-Based Models
Tree-based methods can be used for both regression and classification problems. These involve stratifying
or segmenting the predictor space into a number of simple regions. Since the set of splitting rules used to
segment the predictor space can be summarized in a tree, these types of approaches are known as decision-
tree methods. The methods below grow multiple trees which are then combined to yield a single consensus
prediction.
Bagging is a way to decrease the variance of your prediction by generating
additional training data from your original dataset, using combinations with
repetitions to produce multisets of the same cardinality/size as your original data.
Increasing the size of your training set in this way cannot improve the model's
predictive power, but it decreases the variance, narrowly tuning the prediction to the expected outcome.
Boosting is an approach that calculates the output using several different models and
then averages the result using a weighted-average approach. By combining the
advantages and pitfalls of these approaches and varying your weighting formula, you
can achieve good predictive power for a wider range of input data, using
different narrowly tuned models.
The Random Forest algorithm is actually very similar to bagging. Also here, you
draw random bootstrap samples of your training set. However, in addition to the
bootstrap samples, you also draw a random subset of features for training the
individual trees; in bagging, you give each tree the full set of features. Due to the
random feature selection, you make the trees more independent of each other
compared to regular bagging, which often results in better predictive performance
(due to better variance-bias trade-offs) and it’s also faster, because each tree learns
only from a subset of features.
More Info: https://www.youtube.com/watch?v=B0TI2q7wgIQ and https://www.youtube.com/watch?v=7VeUPuFGJHk
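A minimal sketch comparing bagged trees with a random forest, assuming scikit-learn is available; the dataset is synthetic:

# Bagged decision trees vs a random forest (assumes scikit-learn; synthetic data).
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

bagging = BaggingClassifier(n_estimators=100, random_state=0)       # full feature set per tree
forest = RandomForestClassifier(n_estimators=100, random_state=0)   # random feature subset per split

for name, model in [("bagging", bagging), ("random forest", forest)]:
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name:13s}: mean CV accuracy = {scores.mean():.3f}")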
71
Logistic Regression
• Logistic Regression is the appropriate regression analysis to conduct when the dependent variable
is dichotomous (binary).
• Like all regression analyses, the logistic regression is a predictive analysis.
• Logistic regression is used to describe data and to explain the relationship between one
dependent binary variable and one or more nominal, ordinal, interval or ratio-level independent
variables.
• Dimensionality reduction using PCA (Principal Component Analysis) or LDA (Linear Discriminant
Analysis) is often applied.
• Types of questions that a logistic regression can examine:
• How does the probability of getting lung cancer (Yes vs No)
change for every additional pound of overweight and for every
pack of cigarettes smoked per day?
• Do body weight, calorie intake, fat intake, and participant age
have an influence on heart attacks (Yes vs No)?

More Info – linear vs logistic regression: https://www.youtube.com/watch?v=OCwZyYH14uw


72
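A minimal illustration of a logistic regression fit, in the spirit of the questions above (not from the slides): the predictor names, the data and the generating model below are made-up assumptions.

```python
# Illustrative sketch (hypothetical data): logistic regression for a binary outcome.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 400
weight = rng.normal(80, 15, n)             # hypothetical predictor 1
packs_per_day = rng.poisson(1.0, n)        # hypothetical predictor 2
# Hypothetical relationship, used here only to generate example labels
logit = -10 + 0.08 * weight + 0.9 * packs_per_day
y = rng.random(n) < 1 / (1 + np.exp(-logit))

X = np.column_stack([weight, packs_per_day])
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LogisticRegression().fit(X_train, y_train)
print("coefficients:", model.coef_, "intercept:", model.intercept_)
print("P(event) for first test case:", model.predict_proba(X_test[:1])[0, 1])
```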
PCA vs LDA
• PCA (Principal Component Analysis): reduces the dimension of the feature space
(“dimensionality reduction”). There are many methods, but most of the techniques fall into one of 2
classes:
• Feature Elimination – Eliminate features, which gives simplicity and maintainability
of data, but you also entirely eliminate any benefits the dropped variables would have
brought.
• Feature Extraction - Combines input variables in a specific way, allowing the
“least important” variables to be dropped while retaining the most valuable
parts of all the variables.

• LDA (Linear Discriminant Analysis) is used a lot in machine learning
algorithms: it applies a linear combination (a mathematical function) of
various data items in order to separately analyse
multiple classes of objects or items.

More Info: https://medium.com/machine-learning-researcher/dimensionality-reduction-pca-and-lda-6be91734f567 73


Video: https://www.youtube.com/watch?v=M4HpyJHPYBY
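A short PCA feature-extraction sketch (not from the slides); the iris dataset and the choice of 2 components are assumed purely for illustration.

```python
# Illustrative sketch (assumed data): PCA as feature extraction / dimensionality reduction.
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)          # 4 original features
X_std = StandardScaler().fit_transform(X)  # scale first: PCA is variance-based

pca = PCA(n_components=2)                  # keep the 2 directions of largest variance
X_reduced = pca.fit_transform(X_std)

print("explained variance ratio:", pca.explained_variance_ratio_)
print("reduced shape:", X_reduced.shape)   # (150, 2) instead of (150, 4)
```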
Discriminant Analysis
• In Discriminant Analysis, 2 or more groups, clusters or populations are known a priori, and one
or more new observations are classified into one of the known populations based on their measured
characteristics.
• Discriminant analysis models the distribution of the predictors X separately in each of the response
classes, and then uses Bayes’ theorem to flip these around into estimates for the probability of the
response category given the value of X.
• Such models can either be linear or quadratic.
• Linear Discriminant Analysis computes “discriminant scores” for each observation to classify what response variable
class it is in. These scores are obtained by finding linear combinations of the independent variables. It assumes that the
observations within each class are drawn from a multivariate Gaussian distribution and the covariance of the predictor
variables are common across all k levels of the response variable Y.
• Quadratic Discriminant Analysis provides an alternative approach. Like LDA, QDA assumes that the observations from
each class of Y are drawn from a Gaussian distribution. However, unlike LDA, QDA assumes that each class has its own
covariance matrix. In other words, the predictor variables are not assumed to have common variance across each of the k
levels in Y.

For more info see: https://www.sciencedirect.com/topics/medicine-and-dentistry/discriminant-analysis 74


Video: https://www.youtube.com/watch?v=1Z7JiTp9Y90
Discriminant Analysis
• Discriminant analysis is a way to build classifiers: that is, the algorithm uses
labelled training data to build a predictive model of group membership which
can then be applied to new cases.
• While regression techniques produce a real value as output, discriminant
analysis produces class labels.
• As with regression, discriminant analysis can be linear, attempting to find a
straight line that separates the data into categories, or it can fit any of a variety
of curves.
• It can be two dimensional or multidimensional; in higher dimensions the
separating line becomes a plane, or more generally a hyperplane.
• Discriminant analysis also outputs an equation that can be used to classify
new examples.
• Discriminant analysis makes the assumptions that the variables are distributed
normally, and that the within-group covariance matrices are equal. However,
discriminant analysis is surprisingly robust to violation of these assumptions,
and is usually a good first choice for classifier development.
[Figure: Discriminant analysis attempts to identify a boundary between groups in the data, which can
then be used to classify new observations. The boundary may be linear or nonlinear; in this example
both a linear and a quadratic boundary are fitted.]

For more info see: https://www.sciencedirect.com/topics/medicine-and-dentistry/discriminant-analysis 75
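The sketch below contrasts LDA and QDA as classifiers; the iris dataset and 5-fold scoring are assumed choices for illustration only.

```python
# Illustrative sketch (assumed data): linear vs quadratic discriminant analysis.
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import (LinearDiscriminantAnalysis,
                                            QuadraticDiscriminantAnalysis)
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

lda = LinearDiscriminantAnalysis()     # assumes a common covariance matrix across classes
qda = QuadraticDiscriminantAnalysis()  # each class gets its own covariance matrix

for name, clf in [("LDA", lda), ("QDA", qda)]:
    print(name, "mean CV accuracy:", cross_val_score(clf, X, y, cv=5).mean().round(3))
```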


Resampling Methods
Resampling
• Resampling is a method that consists of drawing repeated samples from the original data
samples. It is a non-parametric method of statistical inference. In other words, resampling
does not rely on generic distribution tables to compute
approximate probability (p) values.
• Resampling generates a unique sampling distribution on the basis of the actual data.
• It uses experimental methods, rather than analytical methods, to generate the unique sampling
distribution.
• It yields unbiased estimates as it is based on the unbiased samples of all the possible results of the
data studied by the researcher.
• In order to understand the concept of resampling, you should understand the terms Bootstrapping
and Cross-Validation (see next slide).
• For linear models, ordinary least squares is usually the main criterion used to fit them
to the data.

For more info see: https://www.sciencedirect.com/topics/medicine-and-dentistry/discriminant-analysis 77


Video: https://www.youtube.com/watch?v=O_Fj4q8lgmc
Bootstrapping versus Cross-Validation
• Bootstrapping is a technique that helps in many
situations, such as validation of predictive model
performance, ensemble methods, and estimation of the bias
and variance of a model. It works by sampling with
replacement from the original data and taking the “not
chosen” data points as test cases. This can be done several
times, and the average score is used as an
estimate of model performance.
• Cross-validation is a technique for validating model
performance. It is done by splitting the training data into k
parts. The k − 1 parts are taken as a training set and
the “held out” part is used as the test set. This process
is repeated k times with a different held-out part each time. Finally, the average of the
k scores is used as the performance estimate (see the code sketch below).

For more info see: https://www.sciencedirect.com/topics/medicine-and-dentistry/discriminant-analysis 78


Video: https://www.youtube.com/watch?v=O_Fj4q8lgmc and https://www.youtube.com/watch?v=7062skdX05Y
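The sketch referred to above is given here; the synthetic dataset, the number of bootstrap repetitions and the choice of k = 5 are illustrative assumptions.

```python
# Illustrative sketch (assumed data): bootstrap and k-fold cross-validation
# estimates of a model's accuracy.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.utils import resample

X, y = make_classification(n_samples=300, random_state=0)
model = LogisticRegression(max_iter=1000)

# Bootstrap: train on a resample (with replacement), test on the "not chosen" rows.
scores = []
for b in range(200):
    idx = resample(np.arange(len(y)), replace=True, random_state=b)
    oob = np.setdiff1d(np.arange(len(y)), idx)     # out-of-bag ("not chosen") rows
    model.fit(X[idx], y[idx])
    scores.append(model.score(X[oob], y[oob]))
print("bootstrap estimate :", np.mean(scores).round(3))

# k-fold cross-validation: 5 splits, average of the 5 held-out scores.
print("5-fold CV estimate :", cross_val_score(model, X, y, cv=5).mean().round(3))
```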
Subset Selection
Subset Selection
Sometimes incorrectly seen as a special case
of feature extraction, but has a unique set of
methodologies.

Stepwise Selection: This approach identifies a subset of the p predictors that is believed to be related to the
response. A model is then fitted by least squares using the subset of features.

Best-Subset Selection: Here we fit a separate OLS regression for each possible
combination of the p predictors and then look at the resulting model fits.

The algorithm is broken up into 2 stages: (1) for each k up to the maximum model size, fit all models
that contain exactly k predictors and keep the best one; (2) select a single model from these using
cross-validated prediction error.

It is important to use testing or validation error, and not training error to assess model fit
because RSS and R² monotonically increase with more variables.

The best approach is to cross-validate and choose the model with the highest R² and lowest
RSS on testing error estimates.

Tutorial: http://online.fliphtml5.com/crpq/bwcj/#p=2
80
https://www.youtube.com/watch?v=Ah9XWzsB2mo
Subset Stepwise Selection Options A - C
Forward Stepwise Selection considers a much
smaller subset of p-predictors. It begins with a model
containing no predictors, then adds predictors to the
model, one at a time until all of the predictors are in
the model. At each step the variable added is the one
that gives the greatest additional improvement to the
fit, and variables are added until no further variable
improves the model fit as measured by cross-validated
prediction error.
Backward Stepwise Selection begins with
all p predictors in the model, then iteratively removes
the least useful predictor one at a time.
Hybrid Methods follows the forward stepwise
approach, however, after adding each new variable,
the method may also remove variables that do not
contribute to the model fit.

Tutorial: http://online.fliphtml5.com/crpq/bwcj/#p=2 and https://www.youtube.com/watch?v=iM-8jc3CiH8


81
Example: https://www.youtube.com/watch?v=KsFO2lDxMQI
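For illustration, forward and backward stepwise selection can be run with scikit-learn's SequentialFeatureSelector (available from scikit-learn 0.24 onwards); the diabetes dataset and the target of 4 retained features are assumed choices, not from the slides.

```python
# Illustrative sketch (assumed data): forward and backward stepwise selection
# with cross-validated scoring.
from sklearn.datasets import load_diabetes
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LinearRegression

X, y = load_diabetes(return_X_y=True)      # 10 candidate predictors
ols = LinearRegression()

for direction in ("forward", "backward"):
    selector = SequentialFeatureSelector(
        ols, n_features_to_select=4, direction=direction, cv=5)
    selector.fit(X, y)
    print(direction, "selected feature indices:",
          list(selector.get_support(indices=True)))
```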
Some Other Subset Selection Methods
• Boruta is a feature ranking and selection algorithm based on random
forests algorithm. The advantage with Boruta is that it clearly decides if
a variable is important or not and helps to select variables that are
statistically significant.
• Least Absolute Shrinkage and Selection Operator (LASSO) regression
is a type of regularization method that penalizes with the L1-norm. It
basically imposes a cost on having large weights (coefficient values).
It is called L1 regularization because the added cost is proportional
to the absolute value of the weight coefficients. As a result, in the process
of shrinking the coefficients, it eventually reduces the coefficients of
certain unwanted features all the way to zero. That is, it removes the
unneeded variables altogether.
• Recursive feature elimination (rfe) offers a rigorous way to determine
the important variables before you even feed them into a ML algorithm.
• The Information Value can be used to judge how important a given
categorical variable is in explaining the binary Y variable. It goes well
with logistic regression and other classification models that can model
binary variables.
[Figure: This Boruta plot reveals the importance of each of the features.
The columns in green are ‘confirmed’ and the ones in red are not.
There are a couple of blue bars representing ShadowMax and ShadowMin.
They are not actual features, but are used by the Boruta algorithm to
decide if a variable is important or not.]

Tutorial: http://online.fliphtml5.com/crpq/bwcj/#p=2
82
https://www.youtube.com/watch?v=Ah9XWzsB2mo
https://www.machinelearningplus.com/machine-learning/feature-selection/
Shrinkage
Shrinkage (“Regularisation”)
This approach fits a model involving all p predictors; however, the estimated coefficients are shrunken towards
zero relative to the least squares estimates. This shrinkage, aka regularisation, has the effect of reducing
variance. Depending on what type of shrinkage is performed, some of the coefficients may be estimated to be
exactly zero, so this method also performs variable selection. The two best-known techniques for shrinking
the coefficient estimates towards zero are ridge regression and the lasso.
Ridge regression is similar to least squares except that the coefficients are estimated by minimizing
a slightly different quantity. Ridge regression, like OLS, seeks coefficient estimates that reduce RSS,
however they also have a shrinkage penalty when the coefficients come closer to zero. This penalty
has the effect of shrinking the coefficient estimates towards zero. Without going into the math, it is
useful to know that ridge regression shrinks the features with the smallest column space variance.
Like in principal component analysis, ridge regression projects the data into d-directional space and
then shrinks the coefficients of the low-variance components more than the high variance
components, which are equivalent to the largest and smallest principal components.

Lasso: Ridge regression has at least one disadvantage; it includes all p-predictors in the final model.
The penalty term will set many of them close to zero, but never exactly to zero. This isn’t generally a
problem for prediction accuracy, but it can make the model more difficult when interpreting the
results. Lasso overcomes this disadvantage and is capable of forcing some of the coefficients to zero
granted that s is small enough. Since s = 1 results in regular OLS regression, as s approaches 0 the
coefficients shrink towards zero. Thus, Lasso regression also performs variable selection.
More Info: https://www.youtube.com/watch?v=Q81RR3yKn30 and https://www.youtube.com/watch?v=NGf0voTMlcs
84
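A brief sketch of the shrinkage effect (not from the slides), using the diabetes dataset and arbitrary penalty weights as assumptions.

```python
# Illustrative sketch (assumed data and penalty weights): ridge shrinks all
# coefficients towards zero; lasso can drive some of them exactly to zero.
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.preprocessing import StandardScaler

X, y = load_diabetes(return_X_y=True)
X = StandardScaler().fit_transform(X)        # penalised methods need comparable scales

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=10.0).fit(X, y)          # alpha controls the shrinkage penalty
lasso = Lasso(alpha=2.0).fit(X, y)

print("OLS   coefficients:", np.round(ols.coef_, 1))
print("Ridge coefficients:", np.round(ridge.coef_, 1))   # shrunken, none exactly zero
print("Lasso coefficients:", np.round(lasso.coef_, 1))
print("Lasso zeroed-out predictors:", int(np.sum(lasso.coef_ == 0)))
```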
Dimension Reduction
Dimension Reduction

• The curse of dimensionality is the phenomena whereby an increase


in the dimensionality of a data set results in exponentially more data
being required to produce a representative sample of that data set.

• To combat the curse of dimensionality, numerous linear and non-


linear dimensionality reduction techniques have been developed.

• These techniques aim to reduce the number of dimensions


(variables) in a data set through either feature selection or feature
extraction without significant loss of information.

• Feature extraction is the process of transforming the original data set


into a data set with fewer dimensions.

• Two well known, and closely related, feature extraction techniques


are Principal Component Analysis (PCA) and Self Organizing Maps
(SOM).
Self Organizing Map
http://www.statisticshell.com/docs/factor.pdf
86
http://statistics.ats.ucla.edu/stat/spss/output/principal_components.htm
Dimension Reduction
Dimension reduction reduces the problem of estimating p + 1 coefficients to the simple problem of M + 1
coefficients, where M < p. This is attained by computing M different linear combinations, or projections, of the
variables. Then these M projections are used as predictors to fit a linear regression model by least squares. 2
approaches for this task are principal component regression and partial least squares.
Principal Components Regression (PCR) is an approach for deriving a low-dimensional set of
features from a large set of variables.
• The first principal component direction of the data is along which the observations vary the most. In other
words, the first PC is a line that fits as close as possible to the data. One can fit p distinct principal
components.
• The second PC is a linear combination of the variables that is uncorrelated with the first PC, and has the
largest variance subject to this constraint. The idea is that the principal components capture the most
variance in the data using linear combinations of the data in subsequently orthogonal directions. In this way,
we can also combine the effects of correlated variables to get more information out of the available data,
whereas in regular least squares we would have to discard one of the correlated variables.
The PCR method described above involves identifying linear combinations of X that best represent
the predictors. These combinations (directions) are identified in an unsupervised way, since the
response Y is not used to help determine the principal component directions. That is, the response Y
does not supervise the identification of the principal components, thus there is no guarantee that the
directions that best explain the predictors also are the best for predicting the response (even though
that is often assumed).

Partial least squares (PLS) is a supervised alternative to PCR. Like PCR, PLS is a dimension
reduction method, which first identifies a new smaller set of features that are linear combinations of
the original features, then fits a linear model via least squares to the new M features. Yet, unlike PCR,
PLS makes use of the response variable in order to identify the new features.
http://www.statisticshell.com/docs/factor.pdf
87
http://statistics.ats.ucla.edu/stat/spss/output/principal_components.htm
Video: https://www.youtube.com/watch?v=WKEGhyFx0Dg (PLS)
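A hedged comparison of PCR and PLS (not from the slides), assuming the diabetes dataset and M = 5 components purely for illustration.

```python
# Illustrative sketch (assumed data): principal components regression (unsupervised
# projection + OLS) versus partial least squares (supervised projection).
from sklearn.cross_decomposition import PLSRegression
from sklearn.datasets import load_diabetes
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_diabetes(return_X_y=True)
M = 5                                             # assumed number of components to keep

pcr = make_pipeline(StandardScaler(), PCA(n_components=M), LinearRegression())
pls = make_pipeline(StandardScaler(), PLSRegression(n_components=M))

for name, model in [("PCR", pcr), ("PLS", pls)]:
    r2 = cross_val_score(model, X, y, cv=5, scoring="r2").mean()
    print(f"{name}: mean cross-validated R^2 = {r2:.3f}")
```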
Exploratory Factor Analysis (EFA)
Exploratory Factor Analysis (EFA)
• Factor analysis is used to model the interrelationships among items, and
focus primarily on the variance and covariance rather than the mean.
Factor analysis assumes that variance can be partitioned into two types
of variance, common and unique
• Common variance is the amount of variance that is shared among a set
of items. Items that are highly correlated will share a lot of variance.
• Communality (also called h2) is a definition of common variance that ranges
between 0 and 1. Values closer to 1 suggest that extracted factors explain
more of the variance of an individual item.
• Unique variance is any portion of variance that’s not common. There are
two types:
• Specific variance: is variance that is specific to a particular item.
• Error variance: comes from errors of measurement and basically anything
unexplained by common or specific variance.

https://stats.idre.ucla.edu/spss/seminars/introduction-to-factor-analysis/a-practical-introduction-to-factor-analysis/
89
Exploratory Factor Analysis (EFA)
• There are two types of Factor Analysis.
• Exploratory Factor Analysis (EFA) aims to group together and
summarise variables which are correlated and can therefore
identify possible underlying latent variables which cannot be
measured directly
• Confirmatory Factor Analysis (CFA) tests theories about latent
factors. Confirmatory Factor Analysis is performed using additional
SPSS software and is beyond the scope of this module, but EFA
is commonly used in disciplines such as Psychology and can be
found in standard textbooks.
• Exploratory Factor Analysis and Principal Component
Analysis are very similar. The main differences are:
• PCA uses all the variance in the variables analysed whereas EFA
uses only the common (shared) variance between the variables
• EFA aims to identify underlying latent variables (factors) rather
than just reduce the number of variables
[Figure: EFA (left) and CFA (right)]

http://www.statisticshell.com/docs/factor.pdf
https://www.mailman.columbia.edu/research/population-health-methods/exploratory-factor-analysis 90
http://www.en.globalstatistik.com/exploratory-factor-analysis-vs-confirmatory-factor-analysis/
Video: https://slideplayer.com/slide/7824705/ and https://www.youtube.com/watch?v=Q2JBLuQDUvI
Non-Linear Models
Non-Linear Models
In statistics, nonlinear regression is a form of regression analysis in which observational data are
modelled by a function which is a non-linear combination of the model parameters and depends on
one or more independent variables. The data are fitted by a method of successive approximations.
• A function on the real numbers is called a step function if it can be written as a finite linear
combination of indicator functions of intervals. Informally speaking, a step function is a
piecewise constant function having only finitely many pieces.

• A piecewise function is a function which is defined by multiple sub-functions, each sub-


function applying to a certain interval of the main function’s domain. Piecewise is actually a
way of expressing the function, rather than a characteristic of the function itself, but with
additional qualification, it can describe the nature of the function. For example, a piecewise
polynomial function is a function that is a polynomial on each of its sub-domains, but
possibly a different one on each.

• A spline is a special function defined piecewise by polynomials. In computer graphics,


spline refers to a piecewise polynomial parametric curve. Splines are popular curves
because of the simplicity of their construction, their ease and accuracy of evaluation, and
their capacity to approximate complex shapes through curve fitting and interactive curve
design.

• A generalized additive model is a generalized linear model in which the linear predictor
depends linearly on unknown smooth functions of some predictor variables, and interest
focuses on inference about these smooth functions.
More info: https://www.amazon.ca/Nonlinear-Regression-Analysis-Its-Applications/dp/0470139005
92
https://www.analyticsvidhya.com/blog/2015/08/comprehensive-guide-regression/
https://www.youtube.com/watch?v=sKrDYxQ9vTU
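As one illustration of a piecewise polynomial (spline) fit, the sketch below smooths hypothetical noisy data with a cubic smoothing spline; the data and the smoothing factor are assumptions, not from the slides.

```python
# Illustrative sketch (hypothetical noisy data): a cubic smoothing spline as an
# example of a non-linear, piecewise-polynomial model.
import numpy as np
from scipy.interpolate import UnivariateSpline

rng = np.random.default_rng(1)
x = np.linspace(0, 10, 80)
y = np.sin(x) + rng.normal(0, 0.2, x.size)   # assumed noisy response

spline = UnivariateSpline(x, y, k=3, s=2.0)  # k=3: cubic pieces, s: smoothing factor
x_new = np.linspace(0, 10, 5)
print("fitted values :", np.round(spline(x_new), 2))
print("knot positions:", np.round(spline.get_knots(), 2))
```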
Support Vector Machines (SVM)
Support Vector Machines (SVM)
SVM is a classification technique that is listed under supervised learning models in Machine
Learning. In layman’s terms, it involves finding the hyperplane (a line in 2D, a plane in 3D and a
hyperplane in higher dimensions; more formally, a hyperplane is an (n−1)-dimensional subspace of an n-
dimensional space) that best separates two classes of points with the maximum margin. Essentially,
it is a constrained optimization problem where the margin is maximized subject to the constraint that
it perfectly classifies the data (hard margin).

• The data points that kind of “support” this hyperplane on either sides
are called the “support vectors”. In the picture, the filled blue circle
and the two filled squares are the support vectors.
• For cases where the two classes of data are not linearly separable,
the points are projected to an exploded (higher dimensional) space
where linear separation may be possible.
• A problem involving multiple classes can be broken down into
multiple one-versus-one or one-versus-rest binary classification
problems.

More info: https://data-flair.training/blogs/svm-support-vector-machine-tutorial/


94
https://www.youtube.com/watch?v=RKZoJVMr6CU
Support Vector Machines (SVM)
Advantages
• Guaranteed Optimality: Owing to the nature of Convex Optimization, the solution will always be a global minimum, not a local minimum.

• Abundance of Implementations: We can access it conveniently, be it from Python or Matlab.

• SVM can be used for linearly separable as well as non-linearly separable data. Linearly separable data allows a hard margin, whereas non-linearly
separable data requires a soft margin.

• SVMs provide compliance to the semi-supervised learning models. It can be used in areas where the data is labelled as well as unlabelled. It
only requires a condition to the minimization problem which is known as the Transductive SVM.

• Feature Mapping used to be quite a load on the computational complexity of the overall training performance of the model. However, with the
help of Kernel Trick, SVM can carry out the feature mapping using simple dot product.

Disadvantages
• SVM is incapable of handling text structures. This leads to loss of sequential information and thereby, leading to worse performance.

• Vanilla SVM cannot return a probabilistic confidence value in the way logistic regression can. This limits its explanatory value,
as the confidence of a prediction is important in several applications.

• Choice of the kernel is perhaps the biggest limitation of the support vector machine. Considering so many kernels present, it becomes
difficult to choose the right one for the data.

More info: https://data-flair.training/blogs/svm-support-vector-machine-tutorial/


95
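A small SVM sketch on a synthetic, non-linearly separable dataset (the dataset, the kernels compared and the C value are illustrative assumptions).

```python
# Illustrative sketch (assumed data): linear-kernel vs RBF-kernel support vector
# classifiers; the kernel choice handles non-linearly separable data.
from sklearn.datasets import make_moons
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_moons(n_samples=300, noise=0.25, random_state=0)  # not linearly separable

for kernel in ("linear", "rbf"):
    clf = make_pipeline(StandardScaler(), SVC(kernel=kernel, C=1.0))
    score = cross_val_score(clf, X, y, cv=5).mean()
    print(f"{kernel:>6} kernel: mean CV accuracy = {score:.3f}")
```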
Unsupervised Learning
Unsupervised Learning
In supervised learning techniques, the groups are known and the experience provided to the algorithm
is the relationship between actual entities and the group they belong to. Another set of techniques can be used
when the groups (categories) of data are not known – these are called unsupervised, as it is left to the learning
algorithm to figure out patterns in the data provided. Clustering is an example of unsupervised learning in which
different data sets are clustered into groups of closely related items. The most widely used methods are:

• Principal Component Analysis (PCA) helps in


producing low dimensional representation of the
dataset by identifying a set of linear combination of
features which have maximum variance and are
mutually un-correlated. This linear dimensionality
technique could be helpful in understanding latent
interaction between the variable in an unsupervised
setting.
• k-Means clustering: partitions data into k distinct
clusters based on distance to the centroid of a cluster.
• Hierarchical clustering: builds a multi-level hierarchy
of clusters by creating a cluster tree.
97
More Info: https://www.youtube.com/watch?v=IUn8k5zSI6g
Unsupervised Learning

More Info: https://www.youtube.com/watch?v=IUn8k5zSI6g 98


https://www.youtube.com/watch?v=JnnaDNNb380
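An illustrative clustering sketch on synthetic unlabelled data (the dataset and the choice of 3 clusters are assumptions, not from the slides).

```python
# Illustrative sketch (assumed synthetic data): k-means and hierarchical clustering.
from scipy.cluster.hierarchy import fcluster, linkage
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=200, centers=3, random_state=0)   # labels ignored

# k-means: partition the data into k clusters around centroids
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print("k-means cluster sizes     :", [int(sum(kmeans.labels_ == k)) for k in range(3)])

# hierarchical (agglomerative) clustering: build a cluster tree, then cut it
tree = linkage(X, method="ward")
labels = fcluster(tree, t=3, criterion="maxclust")
print("hierarchical cluster sizes:", [int(sum(labels == k)) for k in (1, 2, 3)])
```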
Reliability of Scales (Cronbach’s Alpha)
Reliability of Scales (Cronbach’s Alpha)
• Cronbach’s alpha is a measure used to assess the reliability, or internal consistency, of a set of scale or
test items. In other words, the reliability of any given measurement refers to the extent to which it is a
consistent measure of a concept, and Cronbach’s alpha is one way of measuring the strength of that
consistency.

• Cronbach’s alpha is computed by correlating the score for each scale item with the total score for each
observation (usually individual survey respondents or test takers), and then comparing that to the
variance for all individual item scores

• The resulting α coefficient of reliability ranges from 0 to 1 in providing this overall assessment of a
measure’s reliability. If all of the scale items are entirely independent from one another (i.e., are not
correlated or share no covariance), then α = 0; and, if all of the items have high co-variances, then α will
approach 1 as the number of items in the scale approaches infinity. In other words, the higher
the α coefficient, the more the items have shared covariance and probably measure the same underlying
concept.

http://www.statisticshell.com/docs/factor.pdf
100
http://statistics.ats.ucla.edu/stat/spss/output/principal_components.htm
https://data.library.virginia.edu/using-and-interpreting-cronbachs-alpha/
Cronbach’s Alpha – Guidelines for acceptability
• 70% generally cited as “acceptable”.

• Nunnally’s guidelines
• 0.70 minimum acceptable for exploratory research
• 0.80 minimum acceptable for basic research
• 0.90 or higher for data used in applied (operational) scenarios

• Assumptions
• Assumes uni-dimensionality (all items measure only a single dimension in all data)
• Tested through factor analysis

• Cronbach’s Alpha superseded by Omega


• A measure that overcomes the deficiencies of alpha is coefficient omega, which is based on a one-factor model. In
particular, when the covariance among the items can be approximately accounted for by a one-factor model, the
formulation of coefficient omega closely matches the definition of reliability (McDonald, 1999)
• Example Case Study: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5965544/

http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.463.428&rep=rep1&type=pdf 101
https://slideplayer.com/slide/5280162/
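Cronbach's alpha is straightforward to compute directly from its definition, alpha = (k / (k − 1)) · (1 − sum of item variances / variance of the total score). The item scores below are made up purely to show the calculation.

```python
# Illustrative sketch (made-up item scores): Cronbach's alpha from first principles.
import numpy as np

# rows = respondents, columns = scale items (hypothetical 5-item scale)
scores = np.array([[4, 5, 4, 4, 5],
                   [3, 3, 4, 3, 3],
                   [5, 5, 5, 4, 5],
                   [2, 3, 2, 2, 3],
                   [4, 4, 5, 4, 4],
                   [3, 2, 3, 3, 2]], dtype=float)

k = scores.shape[1]
item_vars = scores.var(axis=0, ddof=1)          # variance of each item
total_var = scores.sum(axis=1).var(ddof=1)      # variance of the total score
alpha = (k / (k - 1)) * (1 - item_vars.sum() / total_var)
print(f"Cronbach's alpha = {alpha:.3f}")
```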
Reliability Statistics and Mathematics

Discussion of the most used mathematics and statistical


models used in Reliability Engineering
Dependability (Reliability) Analysis
• Defining the reliability requirements for sub-systems is
an essential part of the system design scope of work.
The objective of this task is to find the most effective
system architecture to achieve the reliability
requirements.
• The reliability requirements of the overall system are allocated to
subsystems, depending on the complexity of these sub-systems,
and the allocation is usually based on experience with comparable sub-systems.

• If the requirements are not met by the initial design, re-allocation


of reliability values to sub-systems and/or re-design of the sub-
system/system shall be repeated.

• Allocation is also often made on the basis of considerations such


as complexity, criticality, operational profile and environmental
condition.

103
Typical RM Assessment Techniques in Lifecycle Phases

Some reliability engineering techniques


are more effective than others, depending
on the asset lifecycle phase.

The guide here provide a high level


oversight of where RE techniques are
most effective.

Key
• A – applicable
• L - only very limited data may be available
during this lifecycle phase, reducing
effectiveness of the method.
• M - applicable where modifications or
significant changes to the product or
system design occur.

104
Reliability Statistics
Fundamental Reliability Statistical equations & definitions
• The most common and fundamental statistical equations and definitions* used in reliability
engineering and life data analysis are:
• Random Variables

• Probability Density Function (pdf)

• Cumulative Distribution Function (cdf)

• Reliability function

• Conditional Reliability Function

• Failure Rate Function

• Mean Life (MTTF)

• Median Life

• Modal Life (or Mode)

• Lifetime distributions

*For formulae and more info see: http://reliawiki.org/index.php/Basic_Statistical_Background 106
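As a hedged illustration of how these quantities relate, the sketch below evaluates them for an assumed Weibull life distribution (the shape and scale values are arbitrary assumptions).

```python
# Illustrative sketch (assumed Weibull life distribution): pdf f(t), cdf F(t),
# reliability R(t) = 1 - F(t), failure rate h(t) = f(t)/R(t), mean and median life.
from scipy.stats import weibull_min

beta, eta = 1.8, 1000.0                 # assumed shape and scale parameters (hours)
life = weibull_min(beta, scale=eta)

t = 500.0
f = life.pdf(t)                         # probability density at t
F = life.cdf(t)                         # probability of failure by time t
R = life.sf(t)                          # reliability (survival) function
h = f / R                               # failure rate (hazard) function
print(f"f({t})={f:.2e}  F({t})={F:.3f}  R({t})={R:.3f}  h({t})={h:.2e}/h")
print(f"MTTF = {life.mean():.1f} hours, median life = {life.median():.1f} hours")
```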


Typically trying to address failure rates and minimise their impact

107
https://www.reliableplant.com/Read/18693/reliability-engineering-plant
The 3 Categories of Reliability Statistics
• In general, most problems in reliability engineering deal with quantitative measures, such as the
time-to-failure of a component, or qualitative measures, such as whether a component is defective
or non-defective.

• The methods used to quantify reliability are the mathematics of probability and statistics.

• Reliability statistics can be broadly divided into the treatment of discrete functions, continuous functions
and point processes.
• For example, a switch may either work or not work when selected or a pressure vessel may pass or fail a test—these
situations are described by discrete functions. In reliability we are often concerned with two-state discrete systems,
since equipment is in either an operational or a failed state.
• Continuous functions describe those situations which are governed by a continuous variable, such as time or
distance travelled.
• The statistics of point processes are used in relation to repairable systems, when more than one failure can occur in a
time continuum.

108
Types of Reliability Statistics
Reliability statistics are divided into the treatment of:
• Discrete Functions
• Continuous Functions
• Point Processes

• The choice of method will


depend upon the problem and
on the type of data available.

109
Discrete Functions
Discrete Functions
• For example, a switch may either work or not work when selected or a pressure vessel may pass or fail
a test—these situations are described by discrete functions.
• In reliability we are often concerned with two-state discrete systems, since equipment is in either an
operational or a failed state.

111
Continuous Functions
Continuous Functions

• Continuous functions describe those situations which are


governed by a continuous variable, such as time or distance
travelled.
• Electronic equipment would have a reliability function in this class.
E.g. data may show that a certain type of power supply fails at a
constant average rate of once per 107 hours.

• The distinction between discrete and continuous functions is


one of how the problem is treated, and not necessarily of
the physics or mechanics of the situation.
• For example, whether or not a pressure vessel fails a test may be a
function of its age, and its reliability could therefore be treated as a
continuous function.

113
Reliability Continuous Distribution Functions
• By far the most widely used ‘model’ of the
nature of variation is the mathematical function
known as the normal (or Gaussian) distribution.
The normal data distribution pattern occurs in
many natural phenomena, such as human
heights, weather patterns, and so on.
• The lognormal distribution is a more versatile
distribution than the normal as it has a range of
shapes, and therefore is often a better fit to
reliability data, such as for populations with
wear-out characteristics. Also, it does not have
the normal distribution’s disadvantage of
extending below zero to –∞.
• The exponential distribution describes the
situation wherein the hazard rate is constant. A
Poisson process generates a constant hazard
rate. This is an important distribution in reliability
work, as it has the same central limiting
relationship to life statistics as the normal
distribution has to non-life statistics. It describes
the constant hazard rate situation.

Also see: http://reliawiki.org/index.php/Life_Distributions


114
Reliability Continuous Distribution Functions (Cont.)
• In statistical terms the gamma distribution
represents the sum of n exponentially
distributed random variables. The gamma
distribution is a flexible life distribution model
that may offer a good fit to some sets of failure
data.
• The Weibull distribution is arguably the most
popular statistical distribution used by
reliability engineers. It has the great
advantage in reliability work that by
adjusting the distribution parameters it can
be made to fit many life distributions.
• Extreme value statistics are capable of
describing these situations asymptotically.
Extreme value statistics are derived by
considering the lowest or highest values in each
of a series of equal samples. Used for reliability
work where the concern is not with the
distribution of variables which describe the bulk
of the population but only with the extreme
values which can lead to failure.
Also see: http://reliawiki.org/index.php/Life_Distributions
115
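A small sketch of fitting candidate life distributions to failure data (not from the slides); the failure times below are made up for illustration, and maximum likelihood fitting via scipy is only one of several possible estimation approaches.

```python
# Illustrative sketch (made-up failure times): fitting Weibull and lognormal
# models to the same life data and comparing log-likelihoods.
import numpy as np
from scipy.stats import lognorm, weibull_min

failure_times = np.array([212., 348., 410., 525., 630., 745., 890., 1020., 1180., 1390.])

# Location fixed at zero (floc=0) so only shape and scale are estimated.
wb_shape, _, wb_scale = weibull_min.fit(failure_times, floc=0)
ln_shape, _, ln_scale = lognorm.fit(failure_times, floc=0)

wb_ll = weibull_min.logpdf(failure_times, wb_shape, scale=wb_scale).sum()
ln_ll = lognorm.logpdf(failure_times, ln_shape, scale=ln_scale).sum()
print(f"Weibull  : beta={wb_shape:.2f}, eta={wb_scale:.0f}, log-likelihood={wb_ll:.1f}")
print(f"Lognormal: sigma={ln_shape:.2f}, scale={ln_scale:.0f}, log-likelihood={ln_ll:.1f}")
```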
Probability Density Function (pdf)
To describe a pdf we normally consider four aspects:
1 The central tendency, about which the distribution is grouped.
2 The spread, indicating the extent of variation about the central
tendency. (Also referred to as the standard deviation [SD])
3 The “skewness”, indicating the lack of symmetry about the
central tendency. Skewness equal to zero is a characteristic
of a symmetrical distribution. Positive skewness indicates that
the distribution has a longer tail to the right and negative
skewness indicates the opposite.
4 The kurtosis, indicating the ‘peakedness’ of the pdf. In
general terms kurtosis characterizes the relative peakedness
or flatness of a distribution compared to the normal
distribution. Positive kurtosis indicates a relatively peaked
distribution. Negative kurtosis indicates a relatively flat
distribution.

116
Examples of Continuous Distribution Functions
• Normal (or Gaussian)
• Lognormal
• Exponential
• Gamma
• χ² (Chi-Square)
• Weibull
• Extreme Value
[Figures: cumulative distribution functions for the Exponential and Normal distributions; Gamma and Chi-Square distribution plots]

117
Note: Refer to prescribed handbook, pages 33-41 for description and application examples of each of the function types
Series of Events (Point Processes)
Point Processes
• The statistics of point processes are used in relation to repairable systems, when more than one failure can
occur in a time continuum.
• Situations in which discrete events occur randomly in a continuum (e.g. time) cannot be truly represented by a
single continuous distribution function. Failures occurring in repairable systems, aircraft accidents and vehicle
traffic flow past a point are examples of series of discrete events. These situations are called stochastic point
processes. They can be analysed using the statistics of event series (see later slide).
• Homogeneous Poisson process (HPP): A HPP is a stationary point process, since the distribution of the
number of events in an interval of fixed length does not vary, regardless of when (where) the interval is
sampled (e.g. events occur randomly and at a constant average rate).
• Non-homogeneous Poisson process (NHPP): NHPP is where the point process is non-stationary (rate of
occurrence is a function of time), so that the distribution of the number of events in an interval of fixed length
changes as x increases. Typically, the discrete events (e.g. failures) might occur at an increasing or
decreasing rate.
• A HPP describes a sequence of independently and identically exponentially distributed (IIED) random
variables. A NHPP describes a sequence of random variables which is neither independently nor
identically distributed.
119
See handbook, pages 61 – 64 for more info and examples
HHP vs NHPP Example

The homogeneous Poisson process is a model describing


the occurrence of random events in time. The model is
based on the following two assumptions:
• Each event is isolated, that is, no two (or more) events
can occur at the same moment in time.
• Events are generated randomly and independently of
each other with a mean rate ϱ that is uniform in time.

NHPP model is used to determine an appropriate mean


value function to denote the expected number of failures
experienced up to a certain time.
With different assumptions, the model will end up with
different functional forms of the mean value function.

120
See handbook, pages 61 – 64 for more info and examples
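An illustrative simulation of the two processes (the rates, the observation window and the power-law mean value function m(t) = (t/a)^b are all assumptions; the power-law form is one common NHPP model).

```python
# Illustrative sketch: simulating failure times from a HPP (constant rate) and
# from a NHPP with power-law mean value function m(t) = (t/a)**b.
import numpy as np

rng = np.random.default_rng(0)
T = 1000.0                                  # observation window (hours)

# HPP: constant rate rho -> exponentially distributed times between failures
rho = 0.01
hpp, t = [], 0.0
while True:
    t += rng.exponential(1.0 / rho)
    if t > T:
        break
    hpp.append(t)

# NHPP: assumed parameters; b < 1 gives a decreasing rate of occurrence
a, b = 150.0, 0.6
n_events = rng.poisson((T / a) ** b)        # N(T) ~ Poisson(m(T))
u = np.sort(rng.random(n_events))
nhpp = T * u ** (1.0 / b)                   # given N(T), times are ordered samples of m(t)/m(T)

print(f"HPP : {len(hpp)} failures at a constant average rate over (0, {T:.0f}]")
print(f"NHPP: {n_events} failures, concentrated early (decreasing rate)")
```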
Series of Events Analysis Method
• Trend Analysis (Time Series Analysis)
• Superimposed Processes: a number of separate stochastic point processes combine to form an overall
process (for example, the failure processes of individual components (or sockets) in a system).

https://www.slideshare.net/mirkokaempf/from-events-to-networks-time-series-analysis-on-scale 121
Example: https://www.youtube.com/watch?v=ztvQQlGpL6Y
Reliability Scenario Modelling - Software
• Scenario-Based Reliability Analysis (SBRA): A reliability model, and a reliability analysis technique
for component-based software.
• Using scenarios of component interactions, a probabilistic model named Component-Dependency
Graph (CDG) is constructed. Based on CDG, a reliability analysis algorithm is developed to analyse
the reliability of the system as a function of reliabilities of its architectural constituents.
• The proposed approach has the following benefits:
• It is used to analyse the impact of variations and uncertainties in the reliability of individual components, subsystems, and
links between components on the overall reliability estimate of the software system. This is particularly useful when the
system is built partially or fully from existing off-the-shelf components.
• It is suitable for analysing the reliability of distributed software systems because it incorporates link and delivery channel
reliabilities.
• The technique is used to identify critical components, interfaces, and subsystems; and to investigate the sensitivity of the
application reliability to these elements.
• The approach is applicable early in the development lifecycle, at the architecture level. Early detection of critical
architecture elements, those that affect the overall reliability of the system the most, is useful in delegating resources in
later development phases.
Example: https://www.researchgate.net/publication/3152734_A_Scenario-Based_Reliability_Analysis_Approach_for_Component-
122
Based_Software/link/0f31752e28167e0e6d000000/download
Component-Dependency Graph (CDG) Example

Example: https://www.researchgate.net/figure/An-example-of-HPU-aware-dependency-graph-A-component-box-describes-a-software-and-
123
its_fig3_262213451
Dealing with Multiple Reliability Engineering Scenarios
Methods Used for multiple Scenarios
• Monte-Carlo Simulation
• Markov Analysis
• Petri-Nets

A system is a collection of subsystems,


assemblies and/or components arranged in a
specific design in order to achieve desired
functions with acceptable performance and
reliability.
The types of components, their quantities, their
qualities and the manner in which they are
arranged within the system have a direct effect
on the system's reliability.
Therefore, in addition to the reliability of the
components, the relationship between these
components is also considered and decisions as
to the choice of components can be made to
improve or optimize the overall system reliability,
maintainability and/or availability.

https://systemreliability.wordpress.com/2017/05/12/accelerated-monte-carlo-system-reliability-analysis-through-machine-learning-based-surrogate-models- 125
of-network-connectivity/
Worst Case Scenario Analysis – complex reliability modelling
Worst-case circuit analysis (WCCA or WCA) is a cost-effective
means of screening a design to ensure with a high degree of
confidence that potential defects and deficiencies are identified and
eliminated prior to and during test, production, and delivery.
It is a quantitative assessment of the equipment performance,
accounting for manufacturing, environmental and aging effects. In
addition to a circuit analysis, a WCCA often includes stress and
derating analysis, failure modes and effects criticality (FMECA) and
reliability prediction (MTBF).
The specific objective is to verify that the design is robust
enough to provide operation which meets the system
performance specification over design life under worst-case
conditions and tolerances (initial, aging, radiation, temperature,
etc.).
Stress and de-rating analysis is intended to increase reliability by
providing sufficient margin compared to the allowable stress limits. Source: https://en.wikipedia.org/wiki/Worst-case_circuit_analysis

This reduces overstress conditions that may induce failure, and


reduces the rate of stress-induced parameter change over life. It
determines the maximum applied stress to each component in the
system.
https://systemreliability.wordpress.com/2017/05/12/accelerated-monte-carlo-system-reliability-analysis-through-machine-learning-based-surrogate-models- 126
of-network-connectivity/
Example

From Case Study: https://core.ac.uk/download/pdf/1669349.pdf 127


Bayesian Methods
Bayesian Theorem & Model
• Also known as the Evidence or Trust Theorem: Posterior probability is prior probability times the
likelihood.
• Bayesian statistics is a mathematical procedure that applies probabilities to statistical problems. It
provides the tools to update one's beliefs in the light of new evidence (data).
• The cornerstone of Bayesian methods is the notion of subjective probability. Bayesian methods
consider probability to be a subjective assessment of the state of knowledge (also called degree of
belief) about model parameters of interest, given all available evidence.
• In Bayesian reliability analysis, the statistical model consists of two parts: the likelihood function
and the prior distribution.
• The likelihood function is typically constructed from the sampling distribution of the data, defined by the probability density
function assumed for the data. Simply put, the likelihood principle states that all information contained in experimental
data is contained in the sampling density of the observed data.
• A probability density function is used to describe the uncertainty about the parameters used. Before analysing
experimental data, we call the distribution that represents our knowledge about these parameters the prior distribution.

https://www.youtube.com/watch?v=0F0QoMCSKJ4
129
https://betterexplained.com/articles/an-intuitive-and-short-explanation-of-bayes-theorem/
Bayes Rule underpinning Bayesian Methods
• Probabilistic models describe data that can be observed from a system.
• Mathematics of probability theory is used to express all forms of uncertainty and noise associated
with the model.
• Bayes rule of inverse probability creates the ability to infer unknown quantities, make predictions
and adapt models.

Limitations and Criticisms of Bayesian Methods:


• They are subjective.
• It is hard to come up with a prior, the assumptions are usually wrong.
• The closed world assumption: need to consider all possible hypotheses for the data before
observing the data.
• They can be computationally demanding.
• The use of approximations weakens the coherence argument.

Advantages:
• Coherent.
• Conceptually straightforward.
• Modular.
• Often good performance.
Source: Bayesian Modelling. Z Ghahramani, 2012 and https://www.analyticsvidhya.com/blog/2016/06/bayesian-statistics-beginners-simple-english/
130
Reference Book: Bayesian Reliability. MS Hamada et al.
Bayes Theorem
Bayes Theorem comes into effect when multiple events Ai form an exhaustive set with another
event B.
Deducing Bayes Equation:
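The slide presents the derivation as an image; for completeness, the standard statement for an exhaustive set of events A1, …, An is restated here (not reproduced from the image):

```latex
P(A_i \mid B) \;=\; \frac{P(B \mid A_i)\,P(A_i)}{P(B)}
            \;=\; \frac{P(B \mid A_i)\,P(A_i)}{\sum_{j} P(B \mid A_j)\,P(A_j)}
```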

Source: https://www.analyticsvidhya.com/blog/2016/06/bayesian-statistics-beginners-simple-english/
131
Bayesian Inference
• A primary goal of Bayesian inference is summarizing available information about unknown
parameters that define statistical models through the specification of probability density
functions.
• “Unknown parameters that define statistical models” refers to things like failure probabilities or mean system
lifetimes; they are the parameters of interest.

• “Probability density functions” occur in four forms:


• Prior densities (using Prior Belief Distribution function, resulting in a beta distribution)
• Sampling densities or likelihood functions (Using Bernoulli Likelihood Function)
• Posterior densities (Using Bayes Theorem formula)
• Predictive densities.

• An important part of Bayesian inference is the establishment of parameters and models.


• Models are the mathematical formulation of the observed events.
• Parameters are the factors in the models affecting the observed data.

• “Available information” normally comes in the form of test data, experience with related
systems, and engineering judgment.
For more information see Reference Book: Bayesian Reliability. MS Hamada et al 132
Also see: https://www.analyticsvidhya.com/blog/2016/06/bayesian-statistics-beginners-simple-english/
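A minimal conjugate-update sketch consistent with the beta-prior / Bernoulli-likelihood description above; the prior parameters and the test results are made-up assumptions.

```python
# Illustrative sketch (made-up test data): Beta(a, b) prior + Bernoulli/binomial
# likelihood -> Beta posterior for a failure probability.
from scipy.stats import beta

a_prior, b_prior = 2, 8          # assumed prior belief: failure probability around 0.2
n_tests, n_failures = 20, 3      # hypothetical new test data

a_post = a_prior + n_failures
b_post = b_prior + (n_tests - n_failures)
posterior = beta(a_post, b_post)

print(f"posterior mean failure probability = {posterior.mean():.3f}")
lo, hi = posterior.interval(0.95)        # equal-tailed credible interval (not the HDI)
print(f"95% credible interval = ({lo:.3f}, {hi:.3f})")
```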
Bayesian Density Function Examples

Prior densities

Sampling densities

Posterior densities

Predictive densities

References: https://www.researchgate.net/figure/Illustration-of-the-prior-probability-density-function-PDF-test-results-and-predictive_fig2_271704064 133


Bayes Factor and HDI
• Bayes factor is the equivalent of p-value in the Bayesian framework.
• Bayes factor is defined as the ratio of the posterior odds to the prior odds

• To reject a null hypothesis, a BF <1/10 is preferred.


• The immediate benefit of using Bayes Factor instead of p-values are that it is independent of
intentions and sample size.
• The Highest Density Interval (HDI) is formed from the posterior
distribution after observing the new data.
• Since HDI is a probability, the 95% HDI gives the 95% most credible
values.
• It is also guaranteed that 95 % values will lie in this interval unlike the
Confidence Interval

Info on Bayesian Logic: https://www.omegastatistics.com/webinars/


134
Bayesian Neural Network
A Bayesian neural network (BNN) refers to extending standard networks with posterior inference. Standard NN training via optimization is (from
a probabilistic perspective) equivalent to maximum likelihood estimation (MLE) for the weights.

For many reasons this is unsatisfactory. One reason is that it lacks proper theoretical justification from a probabilistic perspective: why maximum
likelihood? Why just point estimates? Using MLE ignores any uncertainty that we may have in the proper weight values. From a practical
standpoint, this type of training is often susceptible to overfitting, as NNs often do.

One partial fix for this is to introduce regularization. From a Bayesian perspective, this is equivalent to inducing priors on the weights (say
Gaussian distributions if we are using L2 regularization). Optimization in this case is akin to searching for MAP estimators rather than MLE.
Again from a probabilistic perspective, this is not the right thing to do, though it certainly works well in practice.

The correct (i.e., theoretically justifiable) thing to do is posterior inference, though this is very challenging both from a modelling and
computational point of view. BNNs are neural networks that take this approach. In the past this was all but impossible, and we had to resort to
poor approximations such as Laplace’s method (low complexity) or Markov Chain Monte Carlo (MCMC) (long convergence, difficult to
diagnose). However, lately there have been interesting results on using variational inference to do this [1], and this has sparked a great deal of
interest in the area.

BNNs are important in specific settings, especially when uncertainty issues are the biggest concern. Some examples of these cases are
decision making systems, (relatively) smaller data settings, Bayesian Optimization, model-based reinforcement learning and others.

Info: https://www.kdnuggets.com/2017/12/what-bayesian-neural-network.html and https://www.kdnuggets.com/2017/11/bayesian-networks-understanding-


135
effects-variables.html
Weight Uncertainty in Neural Networks. May 2015. Charles Blundell et.al.
Monte Carlo Simulation
Introduction
Monte Carlo Simulation is a mathematical technique that generates random variables for modelling
risk or uncertainty of a certain system.

The random variables or inputs are modelled on the basis of probability distributions such as normal,
log normal, etc. Different iterations or simulations are run for generating paths and the outcome is
arrived at by using suitable numerical computations.

Monte Carlo Simulation is the most tenable method used when a model has uncertain parameters or
a dynamic complex system needs to be analysed. It is a probabilistic method for modelling risk in a
system.

The method is used extensively in a wide variety of fields such as physical science, computational
biology, statistics, artificial intelligence, and quantitative finance. It is pertinent to note that Monte
Carlo Simulation provides a probabilistic estimate of the uncertainty in a model. It is never
deterministic.
137
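A hedged sketch of the idea applied to system reliability: the architecture (two units in parallel feeding one unit in series), the failure rates and the mission time are all assumptions for illustration.

```python
# Illustrative sketch (assumed system): Monte Carlo estimate of mission reliability
# for two units in parallel followed by one unit in series, each with an
# exponentially distributed life (constant failure rate).
import numpy as np

rng = np.random.default_rng(42)
mission_time = 1000.0                     # hours
lam_parallel, lam_series = 1e-3, 2e-4     # assumed failure rates (per hour)
n_trials = 100_000

t_a = rng.exponential(1 / lam_parallel, n_trials)   # unit A (parallel branch)
t_b = rng.exponential(1 / lam_parallel, n_trials)   # unit B (parallel branch)
t_c = rng.exponential(1 / lam_series, n_trials)     # unit C (in series)

# System survives the mission if (A or B survives) and C survives.
system_ok = (np.maximum(t_a, t_b) > mission_time) & (t_c > mission_time)
print(f"estimated system reliability at {mission_time:.0f} h: {system_ok.mean():.4f}")
```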
Monte Carlo Methods
• Simple Monte Carlo
• Rejection Sampling
• Importance Sampling

Random variables or inputs are modelled on the basis of probability distributions


as indicated above

https://www.palisade.com/images3/product/risk/en/Distributions_monteCarloSim.jpg 138
Common probability distributions used in MC
• Normal or “bell curve”: The user simply defines the mean or expected value and a standard deviation to describe the
variation about the mean. Values in the middle near the mean are most likely to occur. It is symmetric and describes many
natural phenomena such as people’s heights. Examples of variables described by normal distributions include inflation rates
and energy prices.
• Lognormal: Values are positively skewed, not symmetric like a normal distribution. It is used to represent values that don’t
go below zero but have unlimited positive potential. Examples of variables described by lognormal distributions include real
estate property values, stock prices, and oil reserves.
• Uniform: All values have an equal chance of occurring, and the user simply defines the minimum and maximum. Examples
of variables that could be uniformly distributed include manufacturing costs or future sales revenues for a new product.
• Triangular: The user defines the minimum, most likely, and maximum values. Values around the most likely are more likely
to occur. Variables that could be described by a triangular distribution include past sales history per unit of time and inventory
levels.
• PERT: The user defines the minimum, most likely, and maximum values, just like the triangular distribution. Values around
the most likely are more likely to occur. However values between the most likely and extremes are more likely to occur than
the triangular; that is, the extremes are not as emphasized. An example of the use of a PERT distribution is to describe the
duration of a task in a project management model.
• Discrete: The user defines specific values that may occur and the likelihood of each. An example might be the results of a
lawsuit: 20% chance of positive verdict, 30% chance of negative verdict, 40% chance of settlement, and 10% chance of
mistrial.
https://www.palisade.com/risk/monte_carlo_simulation.asp 139
Methodology
• Monte Carlo simulation performs risk analysis by building models of possible results by substituting a range of
values—a probability distribution—for any factor that has inherent uncertainty. It then calculates results over
and over, each time using a different set of random values from the probability functions. Depending upon the
number of uncertainties and the ranges specified for them, a Monte Carlo simulation could involve thousands
or tens of thousands of recalculations before it is complete. Monte Carlo simulation produces distributions of
possible outcome values.
• By using probability distributions, variables can have different probabilities of different outcomes occurring.
Probability distributions are a much more realistic way of describing uncertainty in variables of a risk analysis.
• During a Monte Carlo simulation, values are sampled at random from the input probability distributions. Each
set of samples is called an iteration, and the resulting outcome from that sample is recorded. Monte Carlo
simulation does this hundreds or thousands of times, and the result is a probability distribution of possible
outcomes. In this way, Monte Carlo simulation provides a much more comprehensive view of what may
happen. It tells you not only what could happen, but how likely it is to happen.
• An enhancement to Monte Carlo simulation is the use of Latin Hypercube sampling, which samples more
accurately from the entire range of distribution functions.

140
Examples of Use

• Financial Modelling
• Design Scenario Modelling
• Weather Forecasting

Example: https://www.youtube.com/watch?v=ohfUaDdJyzc
141
https://www.youtube.com/watch?v=0ikk_VEBkJw&vl=en
Advantages of MC
• Probabilistic Results: Results show not only what could happen, but how likely each outcome is.
• Graphical: Because of the data a Monte Carlo simulation generates, it’s easy to create graphs of
different outcomes and their chances of occurrence. This is important for communicating findings
to other stakeholders.
• Sensitivity Analysis: With just a few cases, deterministic analysis makes it difficult to see which
variables impact the outcome the most. In Monte Carlo simulation, it’s easy to see which inputs
had the biggest effect on bottom-line results.
• Scenario Analysis: In deterministic models, it’s very difficult to model different combinations of
values for different inputs to see the effects of truly different scenarios. Using Monte Carlo
simulation, analysts can see exactly which inputs had which values together when certain
outcomes occurred. This is invaluable for pursuing further analysis.
• Correlation of Inputs: In Monte Carlo simulation, it’s possible to model interdependent
relationships between input variables. It’s important for accuracy to represent how, in reality,
when some factors go up, others go up or down accordingly.

https://www.palisade.com/risk/monte_carlo_simulation.asp
142
Markov Analysis
Introduction
• When Markov chains are used in reliability analysis, the process usually represents the various
stages (states) that a system can be in at any given time.

• The states are connected via transitions that represent the probability, or rate, that the system will
move from one state to another during a step, or a given time.

• When using probabilities and steps the Markov chain is referred to as a discrete Markov chain,
while a Markov chain that uses rate and the time domain is referred to as a continuous Markov
chain.

• It is used across many applications to represent a stochastic process made up of a sequence of
random variables representing the evolution of a system. Events are "chained" or "linked" serially
together through memoryless transitions from one state to another. The term "memoryless" is
used because past events are forgotten, as they are irrelevant; an event or state depends
only on the state or event that immediately preceded it.

More info: https://www.weibull.com/hotwire/issue177/hottopics177.htm


144
Discrete Markov Chains
A discrete Markov chain can be viewed as a Markov chain where at the end of a step, the system
will transition to another state (or remain in the current state), based on fixed probabilities. It is
common to use discrete Markov chains when analysing problems involving general probabilities,
genetics, physics, etc.
Take a system that can be in any one of three states — operational,
standby or offline — at a given time, and starts in the standby state.

After each step:


• If the system is in the operational state, there is a 20% chance that it
moves to the standby state, and a 5% chance that it goes offline.
• If it is in the standby state, there is a 40% chance that it becomes
operational, and a 1% chance that it goes offline.
• If it is in the offline state, there is a 15% chance that it becomes
operational, and a 50% chance that it moves to the standby state.

We want to know the probability that it is offline after 10 steps.

The graph shows the results. From the plot, we can also determine that the
probabilities of being in a state reach steady state after about 6 steps; the short R sketch below reproduces this calculation.

Example: See http://reliawiki.org/index.php/Markov_Diagrams
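The three-state example above can be checked with a few lines of R: encode the transition probabilities in a matrix, start in the standby state, and repeatedly multiply the state vector by the matrix. This is only a sketch of the calculation, not the ReliaWiki tool itself; the stay-in-state (diagonal) probabilities are taken as the remainders implied by the listed transition probabilities.

```r
# Rows = current state, columns = next state: Operational, Standby, Offline
P <- matrix(c(0.75, 0.20, 0.05,   # from Operational (diagonal = remainder, stays put)
              0.40, 0.59, 0.01,   # from Standby
              0.15, 0.50, 0.35),  # from Offline
            nrow = 3, byrow = TRUE,
            dimnames = list(c("Oper", "Standby", "Offline"),
                            c("Oper", "Standby", "Offline")))

state   <- c(Oper = 0, Standby = 1, Offline = 0)   # system starts in standby
history <- matrix(NA, nrow = 10, ncol = 3, dimnames = list(NULL, colnames(P)))

for (step in 1:10) {
  state <- state %*% P            # one discrete step
  history[step, ] <- state
}

history[10, "Offline"]            # probability of being offline after 10 steps
matplot(history, type = "l", xlab = "Step", ylab = "State probability")
```

Plotting the rows of `history` should show the probabilities levelling off after about six steps, consistent with the slide.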


145
Continuous Markov Chains
A continuous Markov chain can be viewed as a Markov chain where the transitions between states are defined
by (constant) transition rates, as opposed to transition probabilities at fixed steps. It is common to use
continuous Markov chains when analysing system reliability/availability problems.

Because we are no longer performing analysis using fixed probabilities and a fixed step, we are no longer able
to simply multiply a state probability vector with a transition matrix in order to obtain new state probabilities after
a given step.

Continuous Markov chains are often used for system availability/reliability analyses, as they allow one or more
states to be designated as unavailable states. This allows for the calculation of both the availability and the
reliability of the system.
• Availability is calculated as the mean probability that the system is in a state that is not an unavailable state.
• Reliability is calculated in the same manner as availability, with the additional restriction that all transitions leaving any
unavailable state are considered to have a transition rate of zero.

146
Example
Assume you have a system composed of two generators. The system can be in one of
three states:
• Both generators are operational
• One generator is operational and the other is under repair
• Both generators are under repair. This is an unavailable state.
The system starts in the state in which both generators are operational. We know that
the failure rate of a generator is 1 per 2,000 hours, and the repair rate is 1 per 200 hours.
Therefore:
• The transition rate from the state in which both generators are operational to the state where only one is
operational is 1 per 1,000 hours.
• The transition rate from the state in which one generator is operational to the state where both generators
are operational is 1 per 200 hours.
• The transition rate from the state in which one generator is operational to the state where both generators
are under repair is 1 per 2,000 hours.
• The transition rate from the state in which both generators are under repair to the state where one generator
is operational is 1 per 100 hours.

We would like to know the mean availability of our system after 20,000 hours for all three
states so that we can estimate our output based on time spent at full, half and zero
generator capacity.

From the Mean Probability column, we can see that the system is expected to be fully operational
82.8% of the time, half operational 16.4% of the time, and non-operational 0.8% of the time.
From the Point Probability (Av) column, we can get the point probability of being in a state when all
transitions are considered. From the Point Probability (Rel) column, we can get the point probability of
being in a state if we assume that there is no return from unavailable states, or in other words we are
assuming no repair once the system has entered an unavailable (failed) state. Using the "non-repair"
assumption, there is only an 18.0% chance that the system would still be fully operational, a 3.3%
chance that it would be half operational and a 78.7% chance that it would be non-operational.

Example: http://reliawiki.org/index.php/Markov_Diagrams
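One rough way to reproduce this kind of result without specialist software is to integrate the state equations dP/dt = P·Q numerically, where Q is the transition-rate matrix built from the rates listed above (each diagonal entry is the negative of its row sum). The R sketch below uses a simple explicit Euler scheme; it illustrates the calculation rather than the ReliaWiki model itself, and its output should only approximate the figures quoted above.

```r
# States: 1 = both generators up, 2 = one up / one in repair, 3 = both in repair (unavailable)
lambda <- 1 / 2000   # failure rate per generator (per hour)
mu     <- 1 / 200    # repair rate per generator (per hour)

Q <- matrix(c(-2*lambda,  2*lambda,        0,
               mu,       -(mu + lambda),   lambda,
               0,         2*mu,           -2*mu),
            nrow = 3, byrow = TRUE)

dt    <- 1                      # step size in hours (small relative to 1/rates)
steps <- 20000 / dt
P     <- c(1, 0, 0)             # start with both generators operational
acc   <- c(0, 0, 0)             # accumulate probabilities to obtain the mean over time

for (i in 1:steps) {
  P   <- P + (P %*% Q) * dt     # explicit Euler step of dP/dt = P Q
  acc <- acc + P
}

mean.prob <- acc / steps        # mean (time-averaged) state probabilities over 20,000 h
round(mean.prob, 3)             # expected to be roughly 0.83, 0.16, 0.01 (cf. the slide)
```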
147
Regularly used Markov Chain Methods
• Markov
• Gibbs Sampling
• Metropolis Algorithm
• Metropolis-Hastings Algorithm
• Hybrid Monte Carlo

148
Advantages & Disadvantages of Markov Chain Methods
• Markov analysis has the advantage of being an analytical method, which means that the reliability parameters for the
system are, in effect, calculated by a formula. This has the considerable advantages of speed and accuracy when
producing results. Speed is especially useful when investigating many alternative variations of a design or exploring a
range of sensitivities. In contrast, accuracy is vitally important when investigating small design changes or when the
reliability or availability of high-integrity systems is being quantified. Markov analysis has a clear advantage over MCS
in respect of speed and accuracy, since MCS requires longer simulation runs to achieve higher accuracy and, unlike
Markov analysis, does not produce an “exact” answer.

• As in the case of applying MCS, Markov analysis requires great care during the model building phase since model
accuracy is all-important in obtaining valid results. The assumptions implicit in Markov models that are associated with
“memory-lessness” and the Exponential distribution to represent times to failure and repair provide additional
constraints to those within MCS. Markov models can therefore become somewhat contrived if these implicit
assumptions do not reflect sufficiently well the characteristics of a system and how it functions in practice. In order to
gain the benefits of speed and accuracy that it can offer, Markov analysis depends to a greater extent on the
experience and judgement of the modeller than MCS. Also, whilst MCS is a safer and more flexible approach, it does
not always offer the speed and accuracy that may be required in particular system studies.

https://egertonconsulting.com/markov-analysis-brief-introduction/?doing_wp_cron=1575035940.5098938941955566406250
149
Petri Nets
Introduction
A Petri net consists of places, transitions, and arcs. Arcs run from a place to a transition or vice versa, never
between places or between transitions. The places from which an arc runs to a transition are called the input
places of the transition; the places to which arcs run from a transition are called the output places of the
transition.

Graphically, places in a Petri net may contain a discrete number of marks called tokens. Any distribution of
tokens over the places will represent a configuration of the net called a marking. In an abstract sense relating to
a Petri net diagram, a transition of a Petri net may fire if it is enabled, i.e. there are sufficient tokens in all of its
input places; when the transition fires, it consumes the required input tokens, and creates tokens in its output
places. A firing is atomic, i.e. a single non-interruptible step.

Unless an execution policy is defined, the execution of Petri nets is nondeterministic: when multiple transitions
are enabled at the same time, they will fire in any order.

Since firing is nondeterministic, and multiple tokens may be present anywhere in the net (even in the same
place), Petri nets are well suited for modelling the concurrent behaviour of distributed systems.
https://en.wikipedia.org/wiki/Petri_net
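The firing rule described above can be expressed very compactly: a transition is enabled when the current marking has at least as many tokens as its input arcs require, and firing subtracts the input tokens and adds the output tokens. A small hedged sketch in R, using an arbitrary two-place, one-transition net invented purely for illustration:

```r
# Marking: tokens currently in each place (hypothetical net, for illustration only)
marking <- c(p1 = 2, p2 = 0)

# Arc weights for one transition t1: consumes one token from p1, produces one in p2
input.arcs  <- c(p1 = 1, p2 = 0)
output.arcs <- c(p1 = 0, p2 = 1)

enabled <- function(m, inp) all(m >= inp)        # sufficient tokens in all input places?
fire    <- function(m, inp, out) m - inp + out   # atomic firing: consume inputs, create outputs

if (enabled(marking, input.arcs)) {
  marking <- fire(marking, input.arcs, output.arcs)
}
marking   # p1 = 1, p2 = 1 after one firing
```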
151
More Info: https://slideplayer.com/slide/3289096/
Example

Source: https://www.researchgate.net/figure/Petri-net-model-of-an-XML-firewall-with-one-application-and-one-web-service_fig2_220885110
152
Life Data Analysis Concepts (Weibull Analysis)
Life Data Analysis
• Commonly referred to as Weibull Analysis.
• Reliability Life Data Analysis refers to the study and modelling of observed product lives. Life data
can be lifetimes of products in the marketplace, such as the time the product operated successfully
or the time the product operated before it failed. These lifetimes can be measured in hours, miles,
cycles-to-failure, stress cycles or any other metric with which the life or exposure of a product can
be measured.
• All such data of product lifetimes can be encompassed in the term life data or, more specifically,
product life data. The subsequent analysis and prediction are described as life data analysis.
• Life data analysis requires the practitioner to:
• Gather life data for the product.
• Select a lifetime distribution that will fit the data and model the life of the product.
• Estimate the parameters that will fit the distribution to the data.
• Generate plots and results that estimate the life characteristics of the product, such as the reliability or mean life.

Note: Eta (η) represents the characteristic life of an item, defined as the time at which 63.2% of the population has failed. The shape parameter, beta (β), is the slope of the best-fit line through the data points on a Weibull plot.
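As a hedged illustration of these four steps in R, the sketch below takes a small set of hypothetical failure times (in hours), fits a two-parameter Weibull distribution by maximum likelihood using MASS::fitdistr, and then computes some life characteristics. The failure times and the time of interest are invented for the example.

```r
library(MASS)                                   # for fitdistr()

# Step 1: gather life data (hypothetical complete failure times, in hours)
times <- c(450, 730, 910, 1100, 1390, 1620, 1910, 2300)

# Steps 2-3: choose a distribution (2-parameter Weibull) and estimate its parameters
fit  <- fitdistr(times, densfun = "weibull")
beta <- unname(fit$estimate["shape"])           # shape parameter (slope on a Weibull plot)
eta  <- unname(fit$estimate["scale"])           # characteristic life (63.2% failed by eta)

# Step 4: estimate life characteristics, e.g. reliability at 1,000 hours and mean life
R.1000    <- exp(-(1000 / eta)^beta)            # Weibull reliability function
mean.life <- eta * gamma(1 + 1/beta)            # mean life of the fitted Weibull
c(beta = beta, eta = eta, R.1000 = R.1000, mean.life = mean.life)
```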

http://reliawiki.org/index.php/Introduction_to_Life_Data_Analysis 154
Life Data Classification
Types of Life Data
• Complete
• Censored
• Right Censored (Suspended)
• Interval Censored
• Left-Censored

Parameter Estimation: In order to fit a statistical model to a life data set, the
analyst estimates the parameters of the life distribution that will make the
function most closely fit the data. The parameters control the scale, shape and
location of the pdf function.
For example, in the 3-parameter Weibull model (shown on the right), the scale
parameter, η , defines where the bulk of the distribution lies. The shape
parameter, β, defines the shape of the distribution and the location parameter,
Ɣ, defines the location of the distribution in time.

156
Ranking of Data
Ranking of Data
• Mean Rank: Mean ranks are based on the distribution-free model and are used mostly to plot
symmetrical statistical distributions, such as the normal.
• Median Rank: Median ranking is the method most frequently used in probability plotting, particularly if
the data are known not to be normally distributed. Median rank can be defined as the cumulative
percentage of the population represented by a particular data sample with 50% confidence.
• Cumulative Binomial Method for Median Ranks
• Algebraic Approximation of the Median Rank: When neither software nor tables are available, or when
the sample is beyond the range covered by the available tables, the approximation formula known as
Benard’s approximation can be used (see the sketch below).
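For a small complete sample, median ranks can be computed either exactly from the cumulative binomial equation (equivalently, the 50th percentile of a Beta(i, N−i+1) distribution) or with Benard's approximation (i − 0.3)/(N + 0.4). A brief R sketch with hypothetical failure times:

```r
# Hypothetical ordered failure times (hours) for a complete sample of N = 6
times <- sort(c(120, 190, 260, 340, 410, 530))
N <- length(times)
i <- 1:N                                    # failure order numbers

benard <- (i - 0.3) / (N + 0.4)             # Benard's approximation to the median rank
exact  <- qbeta(0.5, i, N - i + 1)          # cumulative-binomial (exact) median rank

round(data.frame(time = times, benard = benard, exact = exact), 4)
```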

More info: See prescribed Handbook and https://accendoreliability.com/primer-probability-plots/


158
Example: https://www.youtube.com/watch?v=0cDhXbPyvSE
Ranking of Data – An example of how this is used

159
Example: https://data-flair.training/blogs/seo-ranking-factors/
Ranking of Censored Data
• Survival analysis is a type of semi-supervised ranking task where the target output (the survival
time) is often right-censored.

• Utilizing this information is a challenge because it is not obvious how to correctly incorporate these
censored examples into a model.

• Three categories of loss functions, namely partial likelihood methods, rank methods, and a new
classification method based on a Wasserstein metric (WM) and the non-parametric Kaplan-Meier
estimate of the probability density to impute the labels of censored examples, can take advantage
of this information.

• The proposed method* allows a model to predict the probability distribution of an event.

160
*Example: https://arxiv.org/pdf/1806.01984.pdf
Concept of Rank Regression
• The rank regression method for parameter estimation is also known as the least squares method. It is, in
essence, a more formalized version of the manual probability plotting technique, in that it provides a
mathematical method for fitting a line to plotted failure data points.
• The x-axis coordinates represent the failure times, while the y-axis coordinates represent unreliability
estimates. These unreliability estimates are usually obtained via median ranks, hence the term rank
regression.
• Least squares, or least sum of squares, regression requires that a straight line be fitted to a set of data
points, such that the sum of the squares of the distances of the points to the fitted line is minimized.
This minimization can be performed in either the vertical or the horizontal direction. If the regression is
on the x-axis, the line is fitted so that the horizontal deviations of the points from the line are minimized.
If the regression is on the y-axis, the vertical deviations of the points from the line are minimized.
A short sketch of rank regression on Y follows below.
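Continuing the small data set used in the median-rank sketch above, rank regression on Y for a two-parameter Weibull amounts to fitting a straight line to ln(time) versus ln(−ln(1 − median rank)): the slope of the fitted line estimates β and the intercept yields η. This is a hedged illustration of the idea, with the same hypothetical failure times:

```r
# Same hypothetical data as in the median-rank sketch above
times <- sort(c(120, 190, 260, 340, 410, 530))
N <- length(times); i <- 1:N
F.hat <- (i - 0.3) / (N + 0.4)              # unreliability estimates via median ranks (Benard)

# Linearised 2-parameter Weibull: ln(-ln(1-F)) = beta*ln(t) - beta*ln(eta)
x <- log(times)
y <- log(-log(1 - F.hat))

fit  <- lm(y ~ x)                            # least squares regression on Y
beta <- coef(fit)[2]                         # slope = shape parameter estimate
eta  <- exp(-coef(fit)[1] / beta)            # back out the scale parameter
c(beta = unname(beta), eta = unname(eta))

plot(x, y, xlab = "ln(time)", ylab = "ln(-ln(1 - F))"); abline(fit)
```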
161
https://www.weibull.com/hotwire/issue10/relbasics10.htm
Confidence Bounds
Confidence Bounds for Life Data Analysis
• Estimating the precision of an estimate is an important concept in the field of
reliability engineering, leading to the use of confidence intervals.

• When we use two-sided confidence bounds (or intervals), we are looking at a closed interval where a
certain percentage of the population is likely to lie.

• One-sided confidence bounds are essentially an open-ended version of two-sided bounds. A one-sided
bound defines the point where a certain percentage of the population is either higher or lower than the
defined point. This means that there are two types of one-sided bounds: upper and lower. An upper
one-sided bound defines a point that a certain percentage of the population is less than. Conversely, a
lower one-sided bound defines a point that a specified percentage of the population is greater than.

163
Maximum Likelihood Estimation (MLE)
Maximum Likelihood Estimation (MLE)
• The Maximum likelihood estimation (MLE) method is
considered to be one of the most robust parameter estimation
techniques.
• Maximum likelihood estimation endeavours to find the
most "likely" values of distribution parameters for a set of
data by maximizing the value of what is called the
"likelihood function." This likelihood function is largely
based on the probability density function (pdf) for a given
distribution.
• The graphic gives an example of a likelihood function surface
plot for a two-parameter Weibull distribution. Thus, the "peak"
of the likelihood surface function corresponds to the values of
the parameters that maximize the likelihood function, i.e. the
MLE estimates for the distribution's parameters.

165
More info: https://weibull.com/hotwire/issue9/relbasics9.htm
Managing Variations in Engineering

Methods to compare and recommend the most suitable options


Variation
• Reliability is influenced by variability in parameter values, such as the resistance of resistors, material
properties, or the dimensions of parts.

• Variation is inherent in all manufacturing processes, and designers should understand the nature and
extent of possible variation in the parts and processes they specify. They should know how to measure
and control this variation, so that the effects on performance and reliability are minimized.

• Variation also exists in the environments that engineered products must withstand. Temperature,
mechanical stress, vibration spectra, and many other varying factors must be considered.

• Statistical methods provide the means for analysing, understanding and controlling variation (also
known as Statistical Process Control, or SPC), which will be covered in detail in later modules.

It is not necessary to apply statistical methods to understand every engineering problem, since many
are purely deterministic or easily solved using past experience or information available in sources such
as data books, specifications, design guides, and in known physical relationships such as Ohm’s law.

Statistical Process Control is handled in Module 6.


167
SPC Introductory Reference: http://www.cimt.org.uk/projects/mepres/alevel/fstats_ch8.pdf
Variation in Engineering
• Every practical engineering design must take account of the effects of the variation inherent in
parameters, environments, and processes. Variation and its effects can be considered in three
categories:
• Deterministic, or causal, which is the case when the relationship between a parameter and its effect
is known, and we can use theoretical or empirical formulae; for example, we can use Ohm’s law to
calculate the effect of a resistance change on the performance of a voltage divider. No statistical
methods are required. The effects of variation are calculated by inserting the expected range of values
into the formulae.
• Functional, which includes relationships such as the effect of a change of operating procedure,
human mistakes, calibration errors, and so on. There are no theoretical formulae. In principle these
can be allowed for, but often are not, and the cause and effect relationships are not always easy to
identify or quantify.
• Random. These are the effects of the inherent variability of processes and operating conditions. They
can be considered to be the variations that are left unexplained when all deterministic and functional
causes have been eliminated.
168
Continuous Variation
• The variation of parameters in engineering applications
(machined dimensions, material strengths, transistor gains,
resistor values, temperatures, etc.) are conventionally
described in two ways.
• The first, and the simplest, is to state maximum and minimum values, or tolerances. This provides no
information on the nature, or shape, of the actual distribution of values. However, in many practical
cases, this is sufficient information for the creation of manufacturable, reliable designs.
• The second approach is to describe the nature of the variation, using data
derived from measurements.

• It should be noted that variation can also be progressive, for example due to wear, material fatigue,
changes in lubricating properties, or electrical parameter drift.

169
Discrete Variation
Types of Discrete Variation
• The Binomial Distribution: The binomial distribution describes
a situation in which there are only two outcomes, such as pass
or fail, and the probability remains the same for all trials. (Trials
which give such results are called Bernoulli trials.)

• The Poisson Distribution: If events are Poisson-distributed, they occur at a constant average rate,
with only one of the two outcomes countable, for example, the number of failures in a given time or
the number of defects in a length of wire.

See handbook, pages 48 to 51 for examples and formulae
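In R, the two distributions map onto the dbinom/pbinom and dpois/ppois families. A brief hedged example with invented numbers: the probability of at most 2 defective items in a batch of 50 when each item independently fails with probability 0.03, and the probability of more than 4 failures in 1,000 hours when failures occur at a constant rate of 0.002 per hour.

```r
# Binomial: number of defectives in n = 50 Bernoulli trials, p = 0.03 (assumed values)
pbinom(2, size = 50, prob = 0.03)        # P(at most 2 defectives)
dbinom(0, size = 50, prob = 0.03)        # P(no defectives at all)

# Poisson: failures occurring at a constant rate of 0.002 per hour over 1,000 hours
lambda <- 0.002 * 1000                   # expected number of failures in the interval
1 - ppois(4, lambda)                     # P(more than 4 failures)
dpois(0, lambda)                         # P(no failures), equal to exp(-lambda)
```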


https://en.wikipedia.org/wiki/Poisson_distribution 171
https://en.wikipedia.org/wiki/Binomial_distribution
Parametric vs. Non-Parametric Binomials
• Parametric Statistics – used to make inferences about population parameters
• T-test or Anova

• F-Test

• Chi-square Test

• Non-Parametric Statistics – do not assume that the data or population have any characteristic
structure (numerous procedures – a statistical study in its own right!)
• Correlation (Spearman’s rank)

• Non-parametric Regression

• ANOVA

• Chi-Square

• Resampling methods

172
Tutorial Video: https://www.youtube.com/watch?v=3bcYLj11uME
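As a small hedged illustration of the distinction, the same two samples can be compared with a parametric t-test (which assumes approximately normal populations) and with its non-parametric counterpart, the Wilcoxon rank-sum test; Spearman's rank correlation is the distribution-free analogue of Pearson's correlation. The data below are simulated purely for the example.

```r
set.seed(7)
a <- rnorm(20, mean = 100, sd = 10)      # hypothetical measurements, group A
b <- rnorm(20, mean = 108, sd = 10)      # hypothetical measurements, group B

t.test(a, b)                 # parametric: assumes (approximately) normal populations
wilcox.test(a, b)            # non-parametric: makes no distributional assumption

x <- runif(30); y <- x^2 + rnorm(30, sd = 0.05)
cor.test(x, y, method = "pearson")       # parametric correlation
cor.test(x, y, method = "spearman")      # Spearman's rank correlation (non-parametric)
```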
Examples

173
https://www.docsity.com/en/probability-cheat-sheet/4176747/
Correlation

174
Non-Parametric Inferential Methods
• Methods have been developed for measuring and
comparing statistical variables when no assumption is
made as to the form of the underlying distributions.
These are called non-parametric (or distribution-free)
statistical methods.
• They are only slightly less powerful than parametric
methods in terms of the accuracy of the inferences
derived for assumed normal distributions.
• However, they are more powerful when the
distributions are not normal.
• They are also simple to use. Therefore they can be
very useful in reliability work provided that the data
being analysed are independently and identically
distributed (IID).

See pages 57 – 59 in handbook for methods and examples


https://www.slideshare.net/AnchalGarg8/non-parametric-statistics-70145722 175
Types of Methods
• Comparison of Median Values
• The Weighted Sign Test
• Tests for Variance (See Prescribed Book, Chapter 11)
• Reliability Estimates (See Prescribed Book, Chapter 13)

Procedures for estimating reliability: https://slideplayer.com/slide/3587749/


176
Statistical Handbooks & References
Some good Statistical reference handbooks
• Statistical Data Analysis. Glen Cowan. (1998)
• Practical Statistics for Data Scientists. Andrew Bruce and Peter Bruce. May 2017
• Time Series Analysis and Its Applications: With R Examples. 4th Edition. Shumway & Stoffer.
• Doing Bayesian Data Analysis. A tutorial with R, JAGS and Stan.
• An Introduction to Statistical Learning - with Applications in R by Gareth James, Daniela Witten, Trevor
Hastie and Robert Tibshirani
• Probability and Statistics by Example: Basic Probability and Statistics, Volume 1. Yuri Suhov. (2005)
• CAST: http://cast.massey.ac.nz/collection_public.html A collection of computer assisted statistics
textbooks including core statistics e-books
• 100 Statistical Tests. Gopal Kanji.

178
Statistical Tutorials
• Self-paced statistical learning:
https://lagunita.stanford.edu/courses/HumanitiesSciences/StatLearning/Winter2016/about
http://www.statstutor.ac.uk/
http://onlinestatbook.com/

• Probabilities in Statistics
https://revisionmaths.com/gcse-maths-revision/statistics-handling-data/probability

• APPLICATION OF STATISTICAL METHODS


https://ecstep.com/statistical-tests/
https://slideplayer.com/slide/3615703/
https://www.yumpu.com/en/document/read/46734003/statistical-concepts-and-statistical-describing-of-data
https://www.yumpu.com/en/document/read/25974564/overview-of-basic-concepts-in-statistics-and-probability-samsi

179
Statistical Software
• You get R for free from http://cran.us.r-project.org/. Typically it installs with a click.
• You get RStudio from http://www.rstudio.com/, also for free, and a similarly easy install.

180