Ogunnaike
Random Phenomena
Fundamentals and Engineering Applications of Probability & Statistics
I frame no hypothesis; for whatever is not deduced from the phenomenon is to be called a hypothesis; and hypotheses, whether
metaphysical or physical, whether of occult qualities or mechanical, have no place in experimental philosophy.
Sir Isaac Newton (1642–1727)
In Memoriam
1931–2005
Some who only search for silver and gold
Soon find what they cannot hold;
You searched after God's own heart,
and left behind, too soon, your pilgrim's chart
Preface
In writing this book, I have been particularly cognizant of these basic facts
of 21st-century science and engineering. And yet, while most scientists and engineers are well trained in problem formulation and problem solving when all
the entities involved are considered deterministic in character, many remain
uncomfortable with problems involving random variations if such problems
cannot be idealized and reduced to the more familiar deterministic types.
Even after going through the usual one-semester course in Engineering Statistics, the discomfort persists. Of all the reasons for this circumstance, the most
compelling is this: most of these students tend to perceive their training in
statistics more as a set of instructions on what to do and how to do it than as
training in the fundamental principles of random phenomena. Such students are
then uncomfortable when they encounter problems that are not quite similar
to those covered in class; they lack the fundamentals needed to attack new and unfamiliar problems. The purpose of this book is to address this issue directly by
presenting basic fundamental principles, methods, and tools for formulating
and solving engineering problems that involve randomly varying phenomena.
The premise is that by emphasizing fundamentals and basic principles, and
then illustrating these with examples, the reader will be better equipped to
deal with a range of problems wider than that explicitly covered in the book.
This important point is expanded further in Chapter 0.
The core of statistics is presented in Part IV (Chapters 12–20). Chapter
12 lays the foundation with an introduction to the concepts and ideas behind
statistics, before the coverage begins in earnest in Chapter 13 with sampling
theory, continuing with statistical inference (estimation and hypothesis testing) in Chapters 14 and 15, and regression analysis in Chapter 16. Chapter
17 introduces the important but oft-neglected issue of probability model validation, while Chapter 18, on nonparametric methods, extends the ideas of
Chapters 14 and 15 to those cases where the usual probability model assumptions (mostly the normality assumption) are invalid. Chapter 19 presents an
overview treatment of design of experiments. The third and final set of case
studies is presented in Chapter 20 to illustrate the application of various aspects of statistics to real-life problems.
Part V (Chapters 21–23) showcases the application of probability and
statistics with a hand-selected set of special topics: reliability and life testing
in Chapter 21, quality assurance and control in Chapter 22, and multivariate
analysis in Chapter 23. Each has its roots in probability and statistics, but all
have evolved into bona fide subject matters in their own right.
Key Features
Before presenting suggestions on how to cover the material for various audiences, I think it is important to point out some of the key features of the
textbook.
1. Approach. This book takes a more fundamental, first-principles approach to the issue of dealing with random variability and uncertainty in
engineering problems. As a result, for example, the treatment of probability
distributions for random variables (Chapters 8–10) is based on a derivation of
each model from phenomenological mechanisms, allowing the reader to see the
subterraneous roots from which these probability models sprang. The reader is
then able to see, for instance, how the Poisson model arises either as a limiting
case of the binomial random variable, or from the phenomenon of observing,
in finite-sized intervals of time or space, rare events with low probabilities of
occurrence; or how the Gaussian model arises from an accumulation of small
random perturbations.
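The limiting relationship just mentioned is easy to check numerically. The short sketch below is purely illustrative and not part of the text: it assumes Python (the book's own examples use MINITAB), and the choices λ = 2 and k = 3 are arbitrary. It compares the binomial probability P(X = 3) with the corresponding Poisson probability as n grows while np = λ is held fixed.

```python
# Illustrative sketch: the Poisson pmf as the limit of the binomial pmf
# when n grows large and p shrinks with np = lambda held fixed.
from math import comb, exp, factorial

lam = 2.0      # fixed mean, lambda = n * p (arbitrary choice for illustration)
k = 3          # compare P(X = 3) under both models

for n in (10, 100, 10_000):
    p = lam / n
    binomial_pk = comb(n, k) * p**k * (1 - p)**(n - k)
    poisson_pk = exp(-lam) * lam**k / factorial(k)
    print(f"n = {n:>6}:  binomial = {binomial_pk:.6f}   poisson = {poisson_pk:.6f}")
```

As n increases, the two probabilities agree to more and more decimal places, which is the sense in which the Poisson model arises as a limiting case of the binomial model.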
2. Examples and Case Studies. The fundamental approach noted above
is integrated with practical applications, in the form of a generous number
of examples, but also with the inclusion of three chapter-length application
case studies, one each for probability, probability distributions, and statistics.
In addition to the usual traditional staples, many of the in-chapter examples
have been drawn from non-traditional applications in molecular biology (e.g.,
DNA replication origin distributions; gene expression data, etc.), from finance
and business, and from population demographics.
3. Computers, Computer Software, On-line Resources. As discussed
further in the Appendix, the availability of computers has transformed the
teaching and learning of probability and statistics. Statistical software packages are now so widely available that many of what used to be staples of
traditional probability and statistics textbooks (tricks for carrying out various computations, approximation techniques, and especially printed statistical
tables) are now essentially obsolete. All the examples in this book were carried out with MINITAB, and I fully expect each student and instructor to have
access to one such statistical package. In this book, therefore, we depart from
tradition and do not include any statistical tables. Instead, we have included
in the Appendix a compilation of useful information about some popular software packages, on-line electronic versions of statistical tables, and a few other
on-line resources such as on-line electronic statistics handbooks and websites
with data sets.
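As a concrete instance of why printed tables are no longer needed, the critical values one formerly looked up can be computed on demand in any modern statistical package. The sketch below is illustrative only; it assumes Python with scipy, which is not the package used in this book, and any comparable package (including MINITAB) provides the same calculations through its own interface.

```python
# Illustrative sketch: computing "table" values directly from software.
from scipy import stats

# Two-sided 95% critical value of the t-distribution with 9 degrees of freedom
# (the 97.5th percentile), as a printed t-table would list it.
t_crit = stats.t.ppf(0.975, df=9)

# Upper 5% point of the chi-squared distribution with 9 degrees of freedom.
chi2_crit = stats.chi2.ppf(0.95, df=9)

print(round(t_crit, 3), round(chi2_crit, 3))   # roughly 2.262 and 16.919
```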
4. Questions, Exercises, Application Problems, Projects. No one feels
truly confident about a subject matter without having tackled (and solved!)
some problems; a useful textbook ought to provide a good selection that
offers a broad range of challenges. Here is what is available in this book:
Review Questions: Found at the end of each chapter (with the exception
of the chapters on case studies), these are short, specific questions designed to test the reader's basic comprehension. If you can answer all the
review questions at the end of each chapter, you know and understand
the material; if not, revisit the relevant portion and rectify the revealed
deficiency.
Exercises: These are designed to provide the opportunity to master the mechanics behind a single concept. Some may therefore be purely mechanical, in the sense of requiring basic computations; some may require filling in steps deliberately left as an exercise to the reader; some may
have the flavor of an application; but the focus is usually a single aspect
of a topic covered in the text, or a straightforward extension thereof.
Application Problems: These are more substantial practical problems whose
solutions usually require integrating various concepts (some obvious,
some not) and deploying the appropriate set of tools. Many of these are
drawn from the literature and involve real applications and actual data
sets. In such cases, the references are provided, and the reader may wish
to consult some of them for additional background and perspective, if
necessary.
Project assignments: These allow deeper exploration of a few selected issues
covered in a chapter, mostly as a way of extending the coverage and
also to provide opportunities for creativity. By definition, they involve
a significant amount of work and also require report-writing. This book
offers a total of nine such projects. They are a good way for students to
learn how to plan, design, and execute projects and to develop writing
and reporting skills. (Each graduate student that has taken the CHEG
604 and CHEG 867 courses at the University of Delaware has had to do
a term project of this type.)
5. Data Sets. All the data sets used in each chapter, whether in the chapter
itself, in an example, or in the exercises or application problems, are made
available on-line and on CD.
Suggested Coverage
Of the three categories mentioned earlier, a methodical coverage of the entire textbook is only possible for Category I, in a two-semester undergraduate
sequence. For this group, the following is one possible approach to dividing
the material up into instruction modules for each semester:
First Semester
Module 1 (Foundations): Chapters 0–2.
Module 2 (Probability): Chapters 3, 4, 5 and 7.
Module 3 (Probability Models): Chapter 8, first pass (omit detailed derivations and Section 8.7.2); Chapter 9, first pass (omit detailed derivations); and Chapter 11, first pass (cover Sections 11.4 and 11.5 selectively; omit Section 11.6).
Module 4 (Introduction to Statistics/Sampling): Chapters 12 and 13.
Module 5 (Statistical Inference): Chapter 14, first pass (omit Section 14.6); Chapter 15, first pass (omit Sections 15.8 and 15.9); Chapter 16, first pass (omit Sections 16.4.3, 16.4.4, and 16.5.2); and Chapter 17.
Module 6 (Design of Experiments): Chapter 19, first pass (cover Sections 19.3–19.4 lightly; omit Section 19.10) and Chapter 20.
Second Semester
Module 7 (Probability and Models): Chapter 6 (with ad hoc reference
to Chapters 4 and 5); Chapters 8 and 9, second pass (include details omitted in the
first semester); and Chapter 10.
Module 8 (Statistical Inference): Chapter 14, second pass (Bayesian estimation, Section 14.6); Chapter 15, second pass (Sections 15.8 and 15.9); Chapter 16, second pass (Sections
16.4.3, 16.4.4, and 16.5.2); and Chapter 18.
Module 9 (Applications): Select one of Chapters 21, 22 or 23. (For chemical engineers, and anyone planning to work in the manufacturing industry, I recommend Chapter 22.)
With this as a basic template, other variations can be designed as appropriate.
For example, those who can only afford one semester (Category II) may
adopt the first-semester suggestion above, to which I recommend adding Chapter 22 at the end.
The beginning graduate one-semester course (Category III) may also be
based on the first-semester suggestion above, but with the following additional
recommendations: (i) cover all the recommended chapters fully; (ii) add
Chapter 23 on multivariate analysis; and (iii) in lieu of a final examination,
assign at least one, possibly two, of the nine projects.
This will make for a hectic semester, but graduate students should be able
to handle the workload.
A second, perhaps more straightforward, recommendation for a two-semester sequence is to devote the first semester to Probability (Chapters
0–11), and the second to Statistics (Chapters 12–20) along with one of the
three application chapters.
Acknowledgments
Pulling off a project of this magnitude requires the support and generous
assistance of many colleagues, students, and family. Their genuine words of encouragement and the occasional (innocent and not-so-innocent) inquiry about
the status of the book all contributed to making sure that this potentially
endless project was actually finished. At the risk of leaving someone out, I feel
some deserve particular mention. I begin with, in alphabetical order, Marc
Birtwistle, Ketan Detroja, Claudio Gelmi (Chile), Mary McDonald, Vinay
Prasad (Alberta, Canada), Paul Taylor (AIMS, Muizenberg, South Africa),
and Carissa Young. These are colleagues, former and current students, and
postdocs, who patiently waded through many versions of various chapters,
offered invaluable comments and caught many of the manuscript errors, typographical and otherwise. It is a safe bet that the manuscript still contains
a random number of these errors (few and Poisson distributed, I hope!), but
whatever errors remain are my responsibility. I encourage readers to let me
know of the ones they find.
I wish to thank my University of Delaware colleagues, Antony Beris and
especially Dion Vlachos, with whom I often shared the responsibility of teaching CHEG 867 to beginning graduate students. Their insight into what the
statistics component of the course should contain was invaluable (as were the
occasional Greek lessons!). Of my other colleagues, I want to thank Dennis
Williams of Basel, for his interest and comments, and then single out former
fellow "DuPonters" Mike Piovoso, whose fingerprint is recognizable on the
illustrative example of Chapter 23, Rafi Sela, now a Six-Sigma Master Black
Belt, Mike Deaton of James Madison University, and Ron Pearson, whose
near-encyclopedic knowledge never ceases to amaze me. Many of the ideas,
problems and approaches evident in this book arose from those discussions
and collaborations of many years ago. Of my other academic colleagues, I
wish to thank Carl Laird of Texas A & M for reading some of the chapters,
Joe Qin of USC for various suggestions, and Jim Rawlings of Wisconsin, with
whom I have carried on a long-running discussion about probability and estimation because of his own interests and expertise in this area. David Bacon
and John MacGregor, pioneers in the application of statistics and probability in chemical engineering, deserve my thanks for their early encouragement
about the project and for providing the occasional commentary. I also wish to
take this opportunity to acknowledge the influence and encouragement of my
chemical engineering mentor, Harmon Ray. I learned more from Harmon than
he probably knew he was teaching me. Much of what is in this book carries
an echo of his voice and reflects the "Wisconsin tradition."
List of Figures
9.15 Two uniform distributions over different ranges, (0,1) and (2,10). Since the total area under the pdf must be 1, the narrower pdf is proportionately longer than the wider one.
9.16 Two F-distribution plots for different values of ν1, the first degree of freedom, but the same value of ν2. Note how the mode shifts to the right as ν1 increases.
9.17 Three t-distribution plots for degrees-of-freedom values ν = 5, 10, 100. Note the symmetrical shape and the heavier tail for smaller values of ν.
9.18 A comparison of the t-distribution with ν = 5 with the standard normal N(0,1) distribution. Note the similarity as well as the t-distribution's comparatively heavier tail.
9.19 A comparison of the t-distribution with ν = 50 with the standard normal N(0,1) distribution. The two distributions are practically indistinguishable.
9.20 A comparison of the standard Cauchy distribution with the standard normal N(0,1) distribution. Note the general similarities as well as the Cauchy distribution's substantially heavier tail.
9.21 Common probability distributions and connections among them.
11.15 Relative sensitivity of the binomial model derived n to errors in estimates of p, as a function of p.
12.1 Relating the tools of Probability, Statistics and Design of Experiments to the concepts of Population and Sample.
12.2 Bar chart of welding injuries from Table 12.1.
12.3 Bar chart of welding injuries arranged in decreasing order of number of injuries.
12.4 Pareto chart of welding injuries.
12.5 Pie chart of welding injuries.
12.6 Bar chart of frozen ready meals sold in France in 2002.
12.7 Pie chart of frozen ready meals sold in France in 2002.
12.8 Histogram for YA data of Chapter 1.
12.9 Frequency polygon of YA data of Chapter 1.
12.10 Frequency polygon of YB data of Chapter 1.
12.11 Box plot of the chemical process yield data YA, YB of Chapter 1.
12.12 Box plot of random N(0,1) data: original set, and with added outlier.
12.13 Box plot of raisins dispensed by five different machines.
12.14 Scatter plot of cranial circumference versus finger length: the plot shows no real relationship between these variables.
12.15 Scatter plot of city gas mileage versus highway gas mileage for various two-seater automobiles: the plot shows a strong positive linear relationship, but no causality is implied.
12.16 Scatter plot of highway gas mileage versus engine capacity for various two-seater automobiles: the plot shows a negative linear relationship. Note the two unusually high mileage values associated with engine capacities 7.0 and 8.4 liters, identified as belonging to the Chevrolet Corvette and the Dodge Viper, respectively.
12.17 Scatter plot of highway gas mileage versus number of cylinders for various two-seater automobiles: the plot shows a negative linear relationship.
12.18 Scatter plot of US population every ten years since the 1790 census versus census year: the plot shows a strong non-linear trend, with very little scatter, indicative of systematic, approximately exponential growth.
12.19 Scatter plot of Y1 and X1 from Anscombe data set 1.
12.20 Scatter plot of Y2 and X2 from Anscombe data set 2.
12.21 Scatter plot of Y3 and X3 from Anscombe data set 3.
12.22 Scatter plot of Y4 and X4 from Anscombe data set 4.
13.3 Sampling distribution of the mean diameter of ball bearings in Example 13.4, used to compute P(|X̄ − 10| ≥ 0.14) = P(|T| ≥ 0.62).
13.5 Sampling distribution for the two variances of ball bearing diameters in Example 13.6, used to compute P(F ≥ 1.41) + P(F ≤ 0.709).
15.1 A distribution for the null hypothesis, H0, in terms of the test statistic, QT, where the shaded rejection region, QT > q, indicates a significance level, α.
15.6 Box plot for Method A scores including the null hypothesis mean, H0: μ = 75, shown along with the sample average, x̄, and the 95% confidence interval based on the t-distribution with 9 degrees of freedom. Note how the upper bound of the 95% confidence interval lies to the left of, and does not touch, the postulated H0 value.
15.7 Box plot for Method B scores including the null hypothesis mean, H0: μ = 75, shown along with the sample average, x̄, and the 95% confidence interval based on the t-distribution with 9 degrees of freedom. Note how the 95% confidence interval includes the postulated H0 value.
15.8 Box plot of differences between the "Before" and "After" weights, including a 95% confidence interval for the mean difference, and the hypothesized H0 point, δ0 = 0.
15.9 Box plot of the "Before" and "After" weights including individual data means. Notice the wide range of each data set.
15.10 A plot of the "Before" and "After" weights for each patient. Note how one data sequence is almost perfectly correlated with the other; in addition, note the relatively large variability intrinsic in each data set compared to the difference between each point.
15.12 β and power values for the hypothesis test of Fig 15.11 with Ha: N(2.5, 1). Top: β; Bottom: Power = (1 − β).
15.13 Rejection regions for one-sided tests of a single variance of a normal population, at a significance level of α = 0.05, based on n = 10 samples. The distribution is χ²(9). Top: for Ha: σ² > σ0², indicating rejection of H0 if c² > χ²0.05(9) = 16.9; Bottom: for Ha: σ² < σ0², indicating rejection of H0 if c² < χ²0.95(9) = 3.33.
15.15 Rejection regions for the two-sided test of the equality of the variances of the process A and process B yield data, i.e., H0: σA² = σB², at a significance level of α = 0.05, based on n = 50 samples each. The distribution is F(49, 49), with the rejection region shaded; since the test statistic, f = 0.27, falls within the rejection region to the left, we reject H0 in favor of Ha.
16.3 The Gaussian assumption regarding variability around the true regression line, giving rise to N(0, σ²) errors: the 6 points represent the data at x1, x2, ..., x6; the solid straight line is the true regression line, which passes through the mean of the sequence of the indicated Gaussian distributions.
16.4 The fitted straight line to the Density versus Ethanol Weight % data: the additional terms included in the graph, S, R-Sq and R-Sq(adj), are discussed later.
16.5 The fitted regression line to the Density versus Ethanol Weight % data (solid line) along with the 95% confidence interval (dashed line). The confidence interval is narrowest at x = x̄ and widens for values further away from x̄.
16.6 The fitted straight line to the Cranial circumference versus Finger length data. Note how the data points are widely scattered around the fitted regression line. (The additional terms included in the graph, S, R-Sq and R-Sq(adj), are discussed later.)
16.7 The fitted straight line to the Highway MPG versus Engine Capacity data of Table 12.5 (leaving out the two inconsistent data points) along with the 95% confidence interval (long dashed line) and the 95% prediction interval (short dashed line). (Again, the additional terms included in the graph, S, R-Sq and R-Sq(adj), are discussed later.)
16.13 Modeling the dependence of the boiling points (BP) of hydrocarbon compounds in Table 16.1 on the number of carbon atoms in the compound. Top: fitted cubic curve of BP versus n, the number of carbon atoms; Bottom: standardized residuals versus fitted value. There appears to be little or no systematic structure left in the residuals, suggesting that the cubic model provides an adequate description of the data.
17.1 Probability plots for safety data postulated to be exponentially distributed, each showing (a) rank-ordered data; (b) theoretical fitted cumulative probability distribution line along with associated 95% confidence intervals; (c) a list of summary statistics, including the p-value associated with a formal goodness-of-fit test. The indication from the p-values is that there is no evidence to reject H0; therefore the model appears to be adequate.
17.4 Normal probability plot for the residuals of the regression analysis of the dependence of thermal conductivity, k, on Temperature in Example 16.5. The postulated model, a two-parameter regression model with Gaussian distributed zero-mean errors, appears valid.
17.5 Chi-squared test results for inclusions data and a postulated Poisson model. Top panel: bar chart of "Expected" and "Observed" frequencies, which shows how well the model prediction matches observed data; Bottom panel: bar chart of contributions to the Chi-squared statistic, showing that the group of 3 or more inclusions is responsible for the largest model-observation discrepancy, by a wide margin.
18.2 Probability plot of interspike intervals data with postulated Gamma model and Anderson-Darling test for the pyramidal tract cell of a monkey. Top panel: when awake (PT-W); Bottom panel: when asleep (PT-S). The p-values for the A-D tests indicate no evidence to reject the null hypothesis.
19.3 Normal probability plots of the residuals from the one-way classification ANOVA model in Example 19.1. Top panel: plot obtained directly from the ANOVA analysis, which does not provide any test statistic or significance level; Bottom panel: subsequent goodness-of-fit test carried out on saved residuals; note the high p-value associated with the A-D test.
19.6 2² factorial design for factors A and B showing the four experimental points; − represents low values, + represents high values for each factor.
19.8 Normal probability plot for the effects, using Lenth's method to identify A, D and AD as significant.
19.9 Normal probability plot for the residuals of the Etch rate model in Eq (19.46), obtained upon projection of the experimental data to retain only the significant terms A, Gap (x1), D, Power (x2), and the interaction AD, Gap*Power (x1 x2).
19.11 The 3-factor Box-Behnken response surface design and its constituent parts: X1, X2: 2² factorial points moved to the center of X3 to give the darker shaded circles at the edge-centers of the X3 axes; X2, X3: 2² factorial points moved to the center of X1 to give the lighter shaded circles at the edge-centers of the X1 axes; X1, X3: 2² factorial points moved to the center of X2 to give the solid circles at the edge-centers of the X2 axes; the center point, open circle.
20.1 Chi-squared test results for Prussian army death-by-horse-kicks data and a postulated Poisson model. Top panel: bar chart of "Expected" and "Observed" frequencies; Bottom panel: bar chart of contributions to the Chi-squared statistic.
20.2 Initial prior distribution, a Gamma(2, 0.5), used to obtain a Bayesian estimate for the Poisson mean number of deaths per unit-year parameter.
20.3 Recursive Bayesian estimates using yearly data sequentially, compared with the standard maximum likelihood estimate, 0.61 (dashed line).
20.4 Final posterior distribution (dashed line) along with initial prior distribution (solid line).
20.5 Quadratic regression model fit to US Population data, along with both the 95% confidence interval and the 95% prediction interval.
20.6 Standardized residuals from the regression model fit to US Population data. Top panel: residuals versus observation order; Bottom panel: normal probability plot. Note the left-over pattern indicative of serial correlation, and the unusual observations identified for the 1940 and 1950 census years in the top panel; note also the general deviation of the residuals from the theoretical normal probability distribution line in the bottom panel.
20.7 Percent average relative population growth rate in the US for each census year from 1800-2000, divided into three equal 70-year periods. Period 1: 1800-1860; Period 2: 1870-1930; Period 3: 1940-2000.
20.8 Normal probability plot for the residuals from the ANOVA model for percent average relative population growth rate versus Period, with Period 1: 1800-1860; Period 2: 1870-1930; Period 3: 1940-2000.
20.9 Standardized residual plots for the Yield response surface model: versus fitted value, and normal probability plot.
20.10 Standardized residual plots for the Adhesion response surface model: versus fitted value, and normal probability plot.
20.11 Response surface and contour plots for Yield as a function of Additive and Temperature (with Time held at 60.00).
20.12 Response surface and contour plots for Adhesion as a function of Additive and Temperature (with Time held at 60.00).
20.13 Overlaid contours for Yield and Adhesion showing the feasible region for the desired optimum. The planted flag indicates the optimum values of the responses, along with the corresponding settings of the factors Additive and Temperature (with Time held at 60.00) that achieve this optimum.
20.14 Schematic diagram of folded helicopter prototype.
20.15 Paper helicopter prototype.
21.3 Sampling-analyzer system: basic configuration.
21.4 Sampling-analyzer system: configuration with redundant solenoid valve.
21.5 Fluid flow system with a cross link.
21.6 Typical failure rate (hazard function) curve showing the classic three distinct characteristic periods in the lifetime distributions of a population of items.
22.1 OC curve for a lot size of 1000, sample size of 32 and acceptance number of 3: AQL is the acceptance quality level; RQL is the rejection quality level.
22.2 OC curve for a lot size of 1000, generated for a sampling plan for an AQL = 0.004 and an RQL = 0.02, leading to a required sample size of 333 and acceptance number of 3. Compare with the OC curve in Fig 22.1.
22.3 A generic SPC chart for the generic process variable Y, indicating a sixth data point that is out of limits.
22.4 The X-bar chart for the average length measurements for 6-inch nails, determined from samples of three measurements obtained every 5 mins.
22.5 The S-chart for the 6-inch nails process data of Example 22.2.
22.6 The combination Xbar-R chart for the 6-inch nails process data of Example 22.2.
22.7 The combination I-MR chart for the Mooney viscosity data.
22.8 P-chart for the data on defective mechanical pencils: note the 9th observation that is outside the UCL.
22.9 C-chart for the inclusions data presented in Chapter 1, Table 1.2, and discussed in subsequent chapters: note the 33rd observation that is outside the UCL; otherwise, the process appears to be operating in statistical control.
22.10 Time series plot of the original Mooney viscosity data of Fig 22.7 and Table 22.2, and of the shifted version showing a step increase of 0.7 after sample 15.
22.11 I-chart for the shifted Mooney viscosity data. Even with σ = 0.5, it is not sensitive enough to detect the step change of 0.7 introduced after sample 15.
22.12 Two one-sided CUSUM charts for the shifted Mooney viscosity data. The upper chart uses dots; the lower chart uses diamonds; the non-conforming points are represented with the squares. With the same σ = 0.5, the step change of 0.7 introduced after sample 15 is identified after sample 18. Compare with the I-chart in Fig 22.11.
22.13 Two one-sided CUSUM charts for the original Mooney viscosity data, using the same characteristics as those in Fig 22.12. The upper chart uses dots; the lower chart uses diamonds; there are no non-conforming points.
22.14 EWMA chart for the shifted Mooney viscosity data, with w = 0.2. Note the staircase shape of the control limits for the earlier data points. With the same σ = 0.5, the step change of 0.7 introduced after sample 15 is detected after sample 18. The non-conforming points are represented with the squares. Compare with the I-chart in Fig 22.11 and the CUSUM charts in Fig 22.12.
22.15 The EWMA chart for the original Mooney viscosity data, using the same characteristics as in Fig 22.14. There are no non-conforming points.
23.1 Examples of the bivariate Gaussian distribution where the two random variables are uncorrelated (ρ = 0) and strongly positively correlated (ρ = 0.9).
23.5 Plot of the scores and loading for the second principal component. The distinct trend indicated in the scores should be interpreted along with the loadings, by comparison to the full original data set in Fig 23.2.
23.6 Scores and loading plots for the first two components. Top panel: scores plot indicates a quadratic relationship between the two scores t1 and t2; Bottom panel: loading vector plot indicates that, in the new set of coordinates, the original variables contain mostly "pure" components PC1 and PC2, indicated by a distinctive North/South and West/East alignment of the data vectors, with like variables clustered together according to the nature of the component contributions. Compare to the full original data set in Fig 23.2.
23.8 Control limits for Q and T² for process data represented with two principal components.
List of Tables
10.1 Summary of maximum entropy probability models
11.1 Theoretical distribution of probabilities of possible outcomes of an IVF treatment
11.2 Elsner, et al., data of outcomes of a 42-month IVF treatment study
11.3 Binomial model prediction of Elsner, et al., data in Table 11.2
11.4 Elsner data stratified by age, indicating variability in the probability of success estimates
11.5 Stratified binomial model prediction of Elsner, et al., data
12.1 Number and Type of injuries incurred by welders in the USA from 1980-1989
12.2 Frozen Ready meals in France, in 2002
12.3 Group classification and frequencies for YA data
12.4 Number of raisins dispensed into trial-sized Raisin Bran cereal boxes
12.5 Gasoline mileage ratings for a collection of two-seater automobiles
12.6 Descriptive statistics for yield data sets YA and YB
12.7 The Anscombe data set 1
12.8 The Anscombe data sets 2, 3, and 4
16.3 Density and weight percent of ethanol in ethanol-water mixture: model fit and residual errors
16.4 Cranial circumference and finger lengths
16.5 ANOVA Table for Testing Significance of Regression
16.6 Thermal conductivity measurements at various temperatures for a metal
16.7 Laboratory experimental data on Yield
17.1 Table of values for safety data probability plot
18.1 A professor's teaching evaluation scores organized by student type
18.2 Interspike intervals data
18.3 Summary of Selected Nonparametric Tests and their Characteristics
19.1 Data table for typical single-factor experiment
19.2 One-Way Classification ANOVA Table
19.3 Data table for typical single-factor, two-way classification experiment
19.4 Two-Way Classification ANOVA Table
19.5 Data table for typical two-factor experiment
19.6 Two-factor ANOVA Table
Contents
0 Prelude
   0.1 Approach Philosophy
   0.2 Four basic principles
   0.3 Summary and Conclusions

I Foundations

II Probability

3 Fundamentals of Probability Theory
   3.1 Building Blocks
   3.2 Operations
      3.2.1 Events, Sets and Set Operations
      3.2.2 Set Functions
      3.2.3 Probability Set Function
      3.2.4 Final considerations
   3.3 Probability
      3.3.1 The Calculus of Probability
      3.3.2 Implications
   3.4 Conditional Probability
      3.4.1 Illustrating the Concept
      3.4.2 Formalizing the Concept
      3.4.3 Total Probability
      3.4.4 Bayes' Rule
   3.5 Independence
   3.6 Summary and Conclusions

5 Multidimensional Random Variables
   5.1 Introduction and Definitions
      5.1.1 Perspectives
      5.1.2 2-Dimensional (Bivariate) Random Variables
      5.1.3 Higher-Dimensional (Multivariate) Random Variables
   5.2 Distributions of Several Random Variables
      5.2.1 Joint Distributions
      5.2.2 Marginal Distributions
      5.2.3 Conditional Distributions
      5.2.4 General Extensions
   5.3 Distributional Characteristics of Jointly Distributed Random Variables
      5.3.1 Expectations
      5.3.2 Covariance and Correlation
      5.3.3 Independence
   5.4 Summary and Conclusions

6 Random Variable Transformations
   6.1 Introduction and Problem Definition
   6.2 Single Variable Transformations
      6.2.1 Discrete Case
      6.2.2 Continuous Case
      6.2.3 General Continuous Case
      6.2.4 Random Variable Sums
   6.3 Bivariate Transformations
   6.4 General Multivariate Transformations
      6.4.1 Square Transformations
      6.4.2 Non-Square Transformations
      6.4.3 Non-Monotone Transformations
   6.5 Summary and Conclusions

III Distributions

   11.4.2 Binomial Model versus Clinical Data
   11.5 Problem Solution: Model-based IVF Optimization and Analysis
      11.5.1 Optimization
      11.5.2 Model-based Analysis
      11.5.3 Patient Categorization and Theoretical Analysis of Treatment Outcomes
   11.6 Sensitivity Analysis
      11.6.1 General Discussion
      11.6.2 Theoretical Sensitivity Analysis
   11.7 Summary and Conclusions
      11.7.1 Final Wrap-up
      11.7.2 Conclusions and Perspectives on Previous Studies and Guidelines

IV Statistics

12 Introduction to Statistics
   12.1 From Probability to Statistics
      12.1.1 Random Phenomena and Finite Data Sets
      12.1.2 Finite Data Sets and Statistical Analysis
      12.1.3 Probability, Statistics and Design of Experiments
      12.1.4 Statistical Analysis
   12.2 Variable and Data Types
   12.3 Graphical Methods of Descriptive Statistics
      12.3.1 Bar Charts and Pie Charts
      12.3.2 Frequency Distributions
      12.3.3 Box Plots
      12.3.4 Scatter Plots
   12.4 Numerical Descriptions
      12.4.1 Theoretical Measures of Central Tendency
      12.4.2 Measures of Central Tendency: Sample Equivalents
      12.4.3 Measures of Variability
      12.4.4 Supplementing Numerics with Graphics
   12.5 Summary and Conclusions

13 Sampling
   13.1 Introductory Concepts
      13.1.1 The Random Sample
      13.1.2 The "Statistic" and its Distribution
   13.2 The Distribution of Functions of Random Variables
      13.2.1 General Overview
      13.2.2 Some Important Sampling Distribution Results
   13.3 Sampling Distribution of The Mean
      13.3.1 Underlying Probability Distribution Known
      13.3.2 Underlying Probability Distribution Unknown
      13.3.3 Limiting Distribution of the Mean
      13.3.4 σ Unknown
   13.4 Sampling Distribution of the Variance
   13.5 Summary and Conclusions

14 Estimation
   14.1 Introductory Concepts
      14.1.1 An Illustration
      14.1.2 Problem Definition and Key Concepts
   14.2 Criteria for Selecting Estimators
      14.2.1 Unbiasedness
      14.2.2 Efficiency
      14.2.3 Consistency
   14.3 Point Estimation Methods
      14.3.1 Method of Moments
      14.3.2 Maximum Likelihood
   14.4 Precision of Point Estimates
   14.5 Interval Estimates
      14.5.1 General Principles
      14.5.2 Mean of a Normal Population; σ Known
      14.5.3 Mean of a Normal Population; σ Unknown
      14.5.4 Variance of a Normal Population
      14.5.5 Difference of Two Normal Populations' Means
      14.5.6 Interval Estimates for Parameters from other Populations
   14.6 Bayesian Estimation
      14.6.1 Background
      14.6.2 Basic Concept
      14.6.3 Bayesian Estimation Results
      14.6.4 A Simple Illustration
      14.6.5 Discussion
   14.7 Summary and Conclusions

15 Hypothesis Testing
   15.1 Introduction
   15.2 Basic Concepts
      15.2.1 Terminology and Definitions
      15.2.2 General Procedure
   15.3 Concerning Single Mean of a Normal Population
      15.3.1 σ Known; the "z-test"
      15.3.2 σ Unknown; the "t-test"
      15.3.3 Confidence Intervals and Hypothesis Tests
   15.4 Concerning Two Normal Population Means
      15.4.1 Population Standard Deviations Known
      15.4.2 Population Standard Deviations Unknown
      15.4.3 Paired Differences
   15.5 Determining β, Power, and Sample Size
      15.5.1 β and Power
      15.5.2 Sample Size
      15.5.3 β and Power for Lower-Tailed and Two-Sided Tests
      15.5.4 General Power and Sample Size Considerations
   15.6 Concerning Variances of Normal Populations
      15.6.1 Single Variance
      15.6.2 Two Variances
   15.7 Concerning Proportions
      15.7.1 Single Population Proportion
      15.7.2 Two Population Proportions
   15.8 Concerning Non-Gaussian Populations
      15.8.1 Large Sample Test for Means
      15.8.2 Small Sample Tests
   15.9 Likelihood Ratio Tests
      15.9.1 General Principles
      15.9.2 Special Cases
      15.9.3 Asymptotic Distribution for Λ
   15.10 Discussion
   15.11 Summary and Conclusions

16 Regression Analysis
   16.1 Introductory Concepts
      16.1.1 Dependent and Independent Variables
      16.1.2 The Principle of Least Squares
   16.2 Simple Linear Regression
      16.2.1 One-Parameter Model
      16.2.2 Two-Parameter Model
      16.2.3 Properties of OLS Estimators
      16.2.4 Confidence Intervals
      16.2.5 Hypothesis Testing
      16.2.6 Prediction and Prediction Intervals
      16.2.7 Coefficient of Determination and the F-Test
      16.2.8 Relation to the Correlation Coefficient
      16.2.9 Mean-Centered Model
      16.2.10 Residual Analysis
   16.3 "Intrinsically" Linear Regression
      16.3.1 Linearity in Regression Models
      16.3.2 Variable Transformations
   16.4 Multiple Linear Regression
      16.4.1 General Least Squares
      16.4.2 Matrix Methods
      16.4.3 Some Important Special Cases
      16.4.4 Recursive Least Squares
   16.5 Polynomial Regression
      16.5.1 General Considerations
      16.5.2 Orthogonal Polynomial Regression
   16.6 Summary and Conclusions

18 Nonparametric Methods
   18.1 Introduction
   18.2 Single Population
      18.2.1 One-Sample Sign Test
      18.2.2 One-Sample Wilcoxon Signed Rank Test
   18.3 Two Populations
      18.3.1 Two-Sample Paired Test
      18.3.2 Mann-Whitney-Wilcoxon Test
   18.4 Probability Model Validation
      18.4.1 The Kolmogorov-Smirnov Test
      18.4.2 The Anderson-Darling Test
   18.5 A Comprehensive Illustrative Example
      18.5.1 Probability Model Postulate and Validation
      18.5.2 Mann-Whitney-Wilcoxon Test
   18.6 Summary and Conclusions

19 Design of Experiments
   19.1 Introductory Concepts
      19.1.1 Experimental Studies and Design
      19.1.2 Phases of Efficient Experimental Studies
      19.1.3 Problem Definition and Terminology
   19.2 Analysis of Variance
   19.3 Single Factor Experiments
      19.3.1 One-Way Classification
      19.3.2 Kruskal-Wallis Nonparametric Test
      19.3.3 Two-Way Classification
      19.3.4 Other Extensions
   19.4 Two-Factor Experiments
   19.5 General Multi-factor Experiments
   19.6 2^k Factorial Experiments and Design
      19.6.1 Overview
      19.6.2 Design and Analysis
      19.6.3 Procedure
      19.6.4 Closing Remarks
   19.7 Screening Designs: Fractional Factorial
      19.7.1 Rationale
      19.7.2 Illustrating the Mechanics
      19.7.3 General characteristics
      19.7.4 Design and Analysis
      19.7.5 A Practical Illustrative Example
   19.8 Screening Designs: Plackett-Burman
      19.8.1 Primary Characteristics
      19.8.2 Design and Analysis
   19.9 Response Surface Designs
      19.9.1 Characteristics
      19.9.2 Response Surface Designs
      19.9.3 Design and Analysis
   19.10 Introduction to Optimal Designs
      19.10.1 Background
      19.10.2 "Alphabetic" Optimal Designs
   19.11 Summary and Conclusions

V Applications

21 Reliability and Life Testing
   21.1 Introduction
   21.2 System Reliability
      21.2.1 Simple Systems
      21.2.2 Complex Systems
   21.3 System Lifetime and Failure-Time Distributions
      21.3.1 Characterizing Time-to-Failure
      21.3.2 Probability Models for Distribution of Failure Times
   21.4 The Exponential Reliability Model
      21.4.1 Component Characteristics
      21.4.2 Series Configuration
      21.4.3 Parallel Configuration
      21.4.4 m-of-n Parallel Systems
   21.5 The Weibull Reliability Model
   21.6 Life Testing
      21.6.1 The Exponential Model
      21.6.2 The Weibull Model
   21.7 Summary and Conclusions

   23.1.4 Hotelling's T-Squared Distribution
   23.1.5 The Wilks Lambda Distribution
   23.1.6 The Dirichlet Distribution
   23.2 Multivariate Data Analysis
   23.3 Principal Components Analysis
      23.3.1 Basic Principles of PCA
      23.3.2 Main Characteristics of PCA
      23.3.3 Illustrative example
      23.3.4 Other Applications of PCA
   23.4 Summary and Conclusions

Appendix

Index
Chapter 0
Prelude
0.1 Approach Philosophy
0.2 Four basic principles
0.3 Summary and Conclusions
From weather forecasts and life insurance premiums for non-smokers to clinical
tests of experimental drugs and defect rates in manufacturing facilities, and
in numerous other ways, randomly varying phenomena exert a subtle but pervasive influence on everyday life. In most cases, one can be blissfully ignorant of the true implications of the presence of such phenomena without consequence. In science and engineering, however, the influence of randomly varying phenomena can be such that even apparently simple problems can become dramatically complicated by the presence of random variability, demanding special methods and analysis tools for obtaining valid and useful solutions.
The primary aim of this book is to provide the reader with the basic fundamental principles, methods, and tools for formulating and solving engineering problems involving randomly varying phenomena.
Since this aim can be achieved in several different ways, this chapter is devoted to presenting this book's approach philosophy.
0.1 Approach Philosophy
randomly varying phenomena of one sort or another; and the vast majority of
such problems cannot always be idealized and reduced to the more familiar
deterministic types without destroying the very essence of the problem. For
example, in determining which of two catalysts A or B provides the greater
yield in a chemical manufacturing process, it is well-known that the respective yields YA and YB, as observed experimentally, are randomly varying quantities. Chapter 1 presents a full-scale discussion of this problem. For now, we simply note that with catalyst A, fifty different experiments performed under essentially identical conditions will result in fifty different values (realizations) for YA. Similarly for catalyst B, one obtains fifty distinct values for YB from fifty different experiments replicated under identical conditions. The first 10
experimental data points for this example are shown in the table below.
YA % YB %
74.04 75.75
75.29 68.41
75.62 74.19
75.91 68.10
77.21 68.10
75.07 69.23
74.23 70.14
74.92 69.22
76.57 74.17
77.77 70.23
Observe that because of the variability inherent in the data, some of the YA
values are greater than some of the YB values; but the converse is also true:
some YB values are greater than some YA values. So how does one determine
reliably, and confidently, which catalyst (if any) really provides the greater yield? Clearly, special methods and analysis tools are required for handling this apparently simple problem: the deterministic idealization of comparing a single observed value of YA (say the first entry, 74.04) with a corresponding
single observed value of YB (in this case 75.75) is incapable of producing a
valid answer. The primary essence of this problem is the variability inherent
in the data which masks the fact that one catalyst does in fact provide the
greater yield.
This book takes a more fundamental, first-principles approach to the issue of dealing with random variability and uncertainty in engineering problems. This is in contrast to the typical engineering statistics approach on the one hand, or the problem-specific approach on the other. With the former approach, most of the emphasis is on how to use certain popular statistical techniques to solve some of the most commonly encountered engineering problems, with little or no discussion of why the techniques are effective. With the
latter approach, a particular topic (say Design of Experiments) is selected
and dealt with in depth, and the appropriate statistical tools are presented
and discussed within the context of the specic problem at the core of the
selected topic. By definition, such an approach excludes all other topics that
may be of practical interest, opting to make up in depth what it gives up in
breadth.
The approach taken in this book is based on the premise that emphasizing fundamentals and basic principles, and then illustrating
these with examples, equips the reader with the means of dealing
with a range of problems wider than that explicitly covered in the
book.
0.2 Four basic principles
1. If characterized properly, random phenomena are subject to rigorous mathematical analysis in much the same manner as deterministic phenomena.
Random phenomena are so-called because they show no apparent regularity, appearing to occur haphazardly, totally at random; the observed variations do not seem to obey any discernible rational laws and therefore appear to
be entirely unpredictable. However, the unpredictable irregularities of the individual observations (or, more generally, the detail) of random phenomena
in fact co-exist with surprisingly predictable ensemble, or aggregate, behavior. This fact makes rigorous analysis possible; it also provides the basis for
employing the concept and calculus of probability to develop a systematic
framework for characterizing random phenomena in terms of probability distribution functions.
The first order of business is therefore to seek to understand random phenomena and to develop techniques for characterizing them appropriately. Part
I, titled FOUNDATIONS: Understanding Random Variability, and Part II,
titled PROBABILITY: Characterizing Random Variability, are devoted to
these respective tasks. Ultimately, probability, and the probability distribution function, are introduced as the theoretical constructs for efficiently describing our knowledge of the real-world phenomena in question.
2. By focusing on the underlying phenomenological mechanisms, it is possible
to develop appropriate theoretical characterizations of random phenomena in
terms of ideal models of the observed variability.
Within the probabilistic framework, the ensemble, or aggregate behavior
them. Clearly then, the sheer vastness of the subject matter of engineering
applications of probability and statistics renders completely unreasonable any
hope of comprehensive coverage in a single introductory text.
Nevertheless, how probability and statistics are employed in practice to
deal successfully with various problems created by random variability and
uncertainty can be discussed in such a way as to equip the student with
the tools needed to approach, with confidence, other problems that are not
addressed explicitly in this book.
Part V, titled APPLICATIONS: Dealing with Random Variability in Practice, consists of three chapters each devoted to a specific application topic of importance in engineering practice. Entire books have been written, and entire courses taught, on each of the topics to which we will devote only one chapter; the coverage is therefore designed to be more illustrative than comprehensive, providing the basis for absorbing and employing more efficiently,
the more extensive material presented in these other books or courses.
0.3 Summary and Conclusions
This chapter has been primarily concerned with setting forth this book's approach to presenting the fundamentals and engineering applications of probability and statistics. The four basic principles on which the more fundamental, first-principles approach is based were presented, providing the rationale for
the scope and organization of the material to be presented in the rest of the
book.
The approach is designed to produce the following result:
A course of study based on this book should provide the reader with
a reasonable fundamental understanding of random phenomena, a
working knowledge of how to model and analyze such phenomena,
and facility with using probability and statistics to cope with random variability and uncertainty in some key engineering problems.
The book should also prepare the student to absorb and employ the material presented in more problem-specific courses such as Design of Experiments, Time Series Analysis, Regression Analysis, Statistical Process Control, etc., a bit more efficiently.
Part I
Foundations
Understanding Random Variability
Chapter 1
Two Motivating Examples
1.1 The Yield Improvement Problem
1.1.1 The Problem
TABLE 1.1: Yield Data for Process A versus Process B

YA %    YA %    YB %    YB %
74.04 75.29 75.75 68.41
75.63 75.92 74.19 68.10
77.21 75.07 68.10 69.23
74.23 74.92 70.14 69.23
76.58 77.77 74.17 70.24
75.05 74.90 70.09 71.91
75.69 75.31 72.63 78.41
75.19 77.93 71.16 73.37
75.37 74.78 70.27 73.64
74.47 72.99 75.82 74.42
73.99 73.32 72.14 78.49
74.90 74.88 74.88 76.33
75.78 79.07 70.89 71.07
75.09 73.87 72.39 72.04
73.88 74.23 74.94 70.02
76.98 74.85 75.64 74.62
75.80 75.22 75.70 67.33
77.53 73.99 72.49 71.71
72.30 76.56 69.98 72.90
77.25 78.31 70.15 70.14
75.06 76.06 74.09 68.78
74.82 75.28 72.91 72.49
76.67 74.39 75.40 76.47
76.79 77.57 69.38 75.47
75.85 77.31 71.37 74.12
3. If yes, is YA − YB > 2?
Clearly, making the proper decision hinges on our ability to answer these
questions with confidence.
1.1.2
Observe that the real essence of the problem is random variability: if each
experiment had resulted in the same single, constant number for YA and another for YB , the problem would be deterministic in character, and each of
the 3 associated questions would be trivial to answer. Instead, the random
phenomena inherent in the experimental determination of the true process
yields have been manifested in the observed variability, so that we are uncertain about the true values of YA and YB , making it not quite as trivial to
solve the problem.
The sources of variability in this case can be shown to include the measurement procedure, the measurement device itself, raw materials, and process
conditions. The observed variability is therefore intrinsic to the problem and
cannot be idealized away. There is no other way to solve this problem rationally without dealing directly with the random variability.
Next, note that YA and YB data (observations) take on values on a continuous scale i.e. yield values are real and can be located anywhere on the
real line, as opposed to quantities that can take on integer values only (as is
the case with the second example discussed later). The variables YA and YB
are therefore said to be continuous, and this example illustrates decision-making under uncertainty when the random phenomena in question involve
continuous variables.
The main issues with this problem are as follows:
1. Characterization: How should the quantities YA and YB be characterized
so that the questions raised above can be answered properly?
2. Quantification: Are there such things as true values of the quantities YA and YB? If so, how should these true values be best quantified?
3. Application: How should the characterization and quantification of YA
and YB be used to answer the 3 questions raised above?
1.1.3
Before outlining procedures for solving this problem, it is helpful to entertain some notions that the intuition of a good scientist or engineer will
suggest. For instance, the concept of the arithmetic average of a collection
of n data points, x1, x2, x3, . . . , xn, defined by:

\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i     (1.1)
is well-known to all scientists and engineers, and the intuitive notion of employing this single computed value to represent the data set is almost instinctive.
It seems reasonable therefore to consider representing YA with the computed
average obtained from the data, i.e. ȳA = 75.52, and similarly, representing YB with ȳB = 72.47. We may now observe right away that ȳA > ȳB, which now seems to suggest not only that YA > YB, but since ȳA − ȳB = 3.05, that the difference in fact exceeds the threshold of 2%.
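As a minimal computational sketch (in Python), the arithmetic average of Eq (1.1) can be evaluated directly; the snippet below uses only the ten preview observations tabulated above, so its averages will differ somewhat from ȳA = 75.52 and ȳB = 72.47, which are computed from all 50 observations in Table 1.1.

# A minimal sketch: arithmetic averages (Eq (1.1)) of the ten preview observations.
# The book's values (75.52 and 72.47) are based on all 50 observations in Table 1.1.
yA = [74.04, 75.29, 75.62, 75.91, 77.21, 75.07, 74.23, 74.92, 76.57, 77.77]
yB = [75.75, 68.41, 74.19, 68.10, 68.10, 69.23, 70.14, 69.22, 74.17, 70.23]

def mean(data):
    """Arithmetic average, as in Eq (1.1)."""
    return sum(data) / len(data)

ybar_A, ybar_B = mean(yA), mean(yB)
print(f"ybar_A = {ybar_A:.2f}, ybar_B = {ybar_B:.2f}, difference = {ybar_A - ybar_B:.2f}")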
As intuitively appealing as these arguments might be, they raise some
important additional questions:
1. The variability of individual values of the data yAi around the average
value ȳA = 75.52 is noticeable; that of yBi around the average value ȳB = 72.47 even more so. How confident then are we about the arguments presented above, and in the implied recommendation to prefer process A to B, based as they are on the computed averages? (For example, there are some 8 values of yBi > ȳA; what should we make of this fact?)
2. Will it (or should it) matter that
72.30 < yAi < 79.07
67.33 < yBi < 78.41
(1.2)
so that the observed data are seen to vary over a range of yield values
that is 11.08 units wide for process B as opposed to 6.77 for A? The
averages give no indication of these extents of variability.
3. More fundamentally, is it always a good idea to work with averages? How
reasonable is it to characterize the entire data set with the average?
4. If new sets of data are gathered, the new averages computed from them
will almost surely dier from the corresponding values computed from
the current set of data shown here. Observe therefore that the computed
averages ȳA and ȳB are themselves clearly subject to random variability.
How can we then be sure that using averages oers any advantages,
since, like the original data, these averages are also not free from random
variability?
5. How were the data themselves collected? What does it mean concretely
that the 50 experiments were carefully performed? Is it possible that
the experimental protocols used may have impaired our ability to answer
the questions posed above adequately? Conversely, are there protocols
that are particularly calibrated to improve our ability to answer these
questions adequately?
Obviously therefore there is a lot more to dealing with this example problem
than merely using the intuitively appealing notion of averages.
Let us now consider a second, different but somewhat complementary,
example.
TABLE 1.2: Number of inclusions on sixty 1-sq meter glass sheets

0  1  1  1  0  0  1  0  2  2
2  0  2  2  3  2  0  0  2  0
1  2  0  1  0  1  0  0  1  1
1  1  5  2  0  0  1  4  1  1
2  1  0  0  1  1  0  0  1  1
1  0  0  2  4  0  1  1  0  1

1.2
1.3
Even though the two illustrative problems presented above are different in
so many ways (one involves continuous variables, the other a discrete variable;
one is concerned with comparing two entities to each other, the other pits a
single set of data against a design target), the systematic approach to solving
such problems provided by probability and statistics applies to both in a
unified way. The fundamental issues at stake may be stated as follows:
In light of its defining characteristics of intrinsic variability, how should randomly varying quantities be characterized and quantified precisely in order to facilitate the solution of practical problems
involving them?
TABLE 1.3: Group classification and frequencies for YA data (from the proposed process)

YA group       Frequency   Relative Frequency
71.51-72.50        1            0.02
72.51-73.50        2            0.04
73.51-74.50        9            0.18
74.51-75.50       17            0.34
75.51-76.50        7            0.14
76.51-77.50        8            0.16
77.51-78.50        5            0.10
78.51-79.50        1            0.02
TOTAL             50            1.00
What now follows is a somewhat informal examination of the ideas and concepts behind these time-tested techniques. The purpose is to motivate and
provide context for the more formal discussions in upcoming chapters.
1.3.1
Let us revisit the example data sets and consider the following alternative
approach to the data representation. Instead of focusing on individual observations as presented in the tables of raw data, what if we sub-divided the
observations into small groups (called bins) and re-organized the raw data
in terms of how frequently members of each group occur? One possible result
is shown in Tables 1.3 and 1.4 respectively for process A and process B. (A different bin size will lead to a slightly different group classification but the
principles remain the same.)
This reclassification indicates, for instance, that for YA, there is only one
observation between 71.51 and 72.50 (the actual number is 72.30), but there
are 17 observations between 74.51 and 75.50; for YB on the other hand, 3
observations fall in the [67.51-68.50] group whereas there are 8 observations
between 69.51 and 70.50. The relative frequency column indicates what
proportion of the original 50 data points are found in each group. A plot of
this reorganization of the data, known as the histogram, is shown in Figure 1.1 for YA and Figure 1.2 for YB.
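A minimal computational sketch of this regrouping (Python, with the numpy library; the bin boundaries of Table 1.3 are assumed as shown) is:

import numpy as np

# YA data from Table 1.1 (process A)
yA = [74.04, 75.29, 75.63, 75.92, 77.21, 75.07, 74.23, 74.92, 76.58, 77.77,
      75.05, 74.90, 75.69, 75.31, 75.19, 77.93, 75.37, 74.78, 74.47, 72.99,
      73.99, 73.32, 74.90, 74.88, 75.78, 79.07, 75.09, 73.87, 73.88, 74.23,
      76.98, 74.85, 75.80, 75.22, 77.53, 73.99, 72.30, 76.56, 77.25, 78.31,
      75.06, 76.06, 74.82, 75.28, 76.67, 74.39, 76.79, 77.57, 75.85, 77.31]

# bin edges chosen to reproduce the groups 71.51-72.50, 72.51-73.50, ..., 78.51-79.50
edges = np.arange(71.505, 79.51, 1.0)
freq, _ = np.histogram(yA, bins=edges)
for lo, hi, f in zip(edges[:-1], edges[1:], freq):
    print(f"{lo + 0.005:.2f}-{hi - 0.005:.2f}: frequency {f:2d}, relative frequency {f / len(yA):.2f}")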
The histogram, a term first used by Pearson in 1895, is a graphical representation of data from a group-classification and frequency-of-occurrence
perspective. Each bar represents a distinct group (or class) within the data
set, with the bar height proportional to the group frequency. Because this
graphical representation provides a picture of how the data are distributed
in terms of the frequency of occurrence of each group (how much each group
TABLE 1.4: Group classification and frequencies for YB data (from the incumbent process)

YB group       Frequency   Relative Frequency
66.51-67.50        1            0.02
67.51-68.50        3            0.06
68.51-69.50        4            0.08
69.51-70.50        8            0.16
70.51-71.50        4            0.08
71.51-72.50        7            0.14
72.51-73.50        4            0.08
73.51-74.50        6            0.12
74.51-75.50        5            0.10
75.51-76.50        6            0.12
76.51-77.50        0            0.00
77.51-78.50        2            0.04
78.51-79.50        0            0.00
TOTAL             50            1.00
FIGURE 1.1: Histogram of the YA data (frequency versus YA).
FIGURE 1.2: Histogram of the YB data (frequency versus YB).
TABLE 1.5: Group classification and frequencies for the inclusions data

X       Frequency   Relative Frequency
0           22           0.367
1           23           0.383
2           11           0.183
3            1           0.017
4            2           0.033
5            1           0.017
6            0           0.000
TOTAL       60           1.000
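The frequency table itself can be generated mechanically; the sketch below (Python) tallies the 60 inclusion counts of Table 1.2, flattened here into one list.

from collections import Counter

# the 60 inclusion counts of Table 1.2, flattened into a single list
inclusions = [0, 1, 1, 2, 0, 2, 1, 2, 0, 1, 1, 5, 2, 1, 0, 1, 0, 0,
              1, 2, 1, 2, 0, 2, 0, 3, 0, 0, 1, 4, 0, 2, 1, 0, 1, 0,
              1, 0, 0, 1, 0, 1, 0, 0, 0, 4, 0, 1, 2, 2, 1, 1, 1, 0,
              2, 0, 1, 1, 1, 1]

counts = Counter(inclusions)
n = len(inclusions)
for x in range(7):
    print(f"x = {x}: frequency {counts.get(x, 0):2d}, relative frequency {counts.get(x, 0) / n:.3f}")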
FIGURE 1.3: Histogram of the inclusions data (frequency versus number of inclusions).
5% of the glass sheets (3 out of 60) have more than 3 inclusions, the remaining
95% have 3 or fewer; 93.3% (56 out of 60) have 2 or fewer inclusions. The
important point is that such quantitative characteristics of the data variability
(made possible by the histogram) are potentially useful for answering practical
questions about what one can reasonably expect from this process.
1.3.2
Theoretical Distributions
How can the benets of the histogram be consolidated into a useful tool
for quantitative analysis of randomly varying phenomena? The answer: by appealing to a fundamental axiom of random phenomena: that conceptually, as
more observations are made, the shape of the data histogram stabilizes, and
tends to the form of the theoretical distribution that characterizes the random
phenomenon in question, in the limit as the total number of observations approaches infinity. It is important to note that this concept does not necessarily require that an infinite number of observations actually be obtained in practice, even if this were possible. The essence of the concept is that an underlying theoretical distribution exists for which the frequency distribution represented by the histogram is but a finite sample approximation; that the underlying theoretical distribution is an ideal model of the particular phenomenon responsible for generating the finite number of observations contained in the current data set; and hence that this theoretical distribution provides a reasonable mathematical characterization of the random phenomenon.
As we show later, these theoretical distributions may be derived from first principles given sufficient knowledge regarding the underlying random phenomena. And, as the brief informal examination of the illustrative histograms
above indicates, these theoretical distributions can be used for various things.
For example, even though we have not yet provided any concrete definition of the term probability, neither have we given any concrete justifications of
its usage in this context, still from the discussion in the previous section, the
reader can intuitively attest to the reasonableness of the following statements:
the probability that YA ≥ 74.5 is 0.76; or the probability that YB ≥ 74.5 is 0.26; or the probability that X ≤ 1 is 0.75. Parts II and III are
devoted to establishing these ideas more concretely and more precisely.
A Preview
It turns out that the theoretical distribution for each yield data set is:

f(y|\mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(y-\mu)^2}{2\sigma^2}}; \quad -\infty < y < \infty     (1.3)

which, when superimposed on each histogram, is shown in Fig 1.4 for YA, and Fig 1.5 for YB, when the indicated characteristic parameters are specified as μ = 75.52, σ = 1.43 for YA, and μ = 72.47, σ = 2.76 for YB.
Similarly, the theoretical distribution for the inclusions data is:

f(x|\lambda) = \frac{e^{-\lambda}\lambda^{x}}{x!}; \quad x = 0, 1, 2, \ldots     (1.4)

where the characteristic parameter λ = 1.02 is the average number of inclusions in each glass sheet. In similar fashion to Eq (1.3), it also provides a theoretical characterization and quantification of the random phenomenon responsible for the variability observed in the inclusions data. From it we are able, for example, to compute the theoretical probabilities of observing 0, 1, 2, . . ., inclusions in any one glass sheet manufactured by this process. A plot of this theoretical probability distribution function is shown in Fig 1.6 (compare with the histogram in Fig 1.3).
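As a brief illustrative sketch (Python), the model of Eq (1.4) with λ = 1.02 can be evaluated and set beside the relative frequencies of Table 1.5:

from math import exp, factorial

lam = 1.02
relative_freq = {0: 0.367, 1: 0.383, 2: 0.183, 3: 0.017, 4: 0.033, 5: 0.017, 6: 0.000}

def poisson_pmf(x, lam):
    """f(x | lambda) = exp(-lambda) * lambda**x / x!, as in Eq (1.4)."""
    return exp(-lam) * lam ** x / factorial(x)

for x, rf in relative_freq.items():
    print(f"x = {x}: model f(x) = {poisson_pmf(x, lam):.3f}, data relative frequency = {rf:.3f}")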
The full detail of precisely what all this means is discussed in subsequent chapters; for now, this current brief preview serves the purpose of simply indicating how the expressions in Eqs (1.3) and (1.4) provide a theoretical means of characterizing (and quantifying) the random phenomenon involved respectively in the yield data and in the inclusions data. Expressions such as these are called probability distribution functions (pdfs) and they provide the basis for rational analysis of random variability via the concept of probability.
Precisely what this concept of probability is, how it gives rise to pdfs, and how pdfs are used to solve practical problems and provide answers to the sorts of questions posed by these illustrative examples, constitute the primary focus of the remaining chapters in the book.
At this point, it is best to defer the rest of the discussion until when we revisit these two problems at appropriate places in upcoming chapters where we show that:
FIGURE 1.4: Histogram of the YA data with superimposed theoretical normal distribution (Mean 75.52, StDev 1.432, N = 50).
FIGURE 1.5: Histogram of the YB data with superimposed theoretical normal distribution (Mean 72.47, StDev 2.764, N = 50).
FIGURE 1.6: Theoretical probability distribution function for a Poisson random variable with parameter = 1.02. Compare with the inclusions data histogram in Fig 1.3
1.4 Summary and Conclusions
REVIEW QUESTIONS
1. What decision is to be made in the yield improvement problem of Section 1.1?
2. What are the economic factors to be taken into consideration in deciding what
to do with the yield improvement problem?
3. What is the essence of the yield improvement problem as discussed in Section
1.1?
4. What are some of the sources of variability associated with the process yields?
5. Why are the yield variables, YA and YB , continuous variables?
6. What single value is suggested as intuitive for representing a collection of n
data points, x1 , x2 , . . . , xn ?
7. What are some of the issues raised by entertaining the idea of representing the
yield data sets with the arithmetic averages yA and yB ?
8. Why is the number of inclusions found on each glass sheet a discrete variable?
9. What are some sources of variability associated with the glass manufacturing process which may ultimately be responsible for the variability observed in the number
of inclusions?
10. What is a frequency distribution and how is it obtained from raw data?
11. Why will bin size affect the appearance of a frequency distribution?
12. What is a histogram and how is it obtained from data?
13. What is the primary advantage of a histogram over a table of raw data?
EXERCISES
Section 1.1
1.1 The variance of a collection of n data points, y1, y2, . . . , yn, is defined as:

s^2 = \frac{\sum_{i=1}^{n}(y_i - \bar{y})^2}{n-1}     (1.5)

where \bar{y} is the arithmetic average of the data set. From the yield data in Table 1.1,
obtain the variances s2A and s2B for the YA and YB data sets, respectively. Which is
greater, s2A or s2B ?
1.2 Even though the data sets in Table 1.1 were not generated in pairs, obtain the 50 differences,

d_i = y_{Ai} - y_{Bi}; \quad i = 1, 2, \ldots, 50,     (1.6)

for corresponding values of YA and YB as presented in this table. Obtain a histogram of di and compute the arithmetic average,

\bar{d} = \frac{1}{n}\sum_{i=1}^{n} d_i.     (1.7)
What do these results suggest about the possibility that YA may be greater than YB ?
1.3 A set of theoretical results to be established later (see Chapter 4 Exercises) states that, for di and d̄ defined in Eq (1.7), and variance s2 defined in Exercise 1.1,

\bar{d} = \bar{y}_A - \bar{y}_B     (1.8)

s_d^2 = s_A^2 + s_B^2     (1.9)
two sets of histograms with the corresponding histograms in Figs 1.1 and 1.2.
1.7 From the frequency distribution in Table 1.3 and the values computed for the average, ȳA, and variance, s2A, of the yield data set, YA, determine the percentage of the data contained in the interval ȳA ± 1.96sA, where sA is the positive square root
of the variance, s2A .
1.8 Repeat Exercise 1.7 for the YB data in Table 1.4. Determine the percentage of
the data contained in the interval ȳB ± 1.96sB.
1.9 From Table 1.5 determine the value of x such that only 5% of the data exceeds
this value.
1.10 Using μ = 75.52 and σ = 1.43, compute theoretical values of the function in Eq (1.3) at the center points of the frequency groups for the YA data in Table 1.3;
i.e., for y = 72, 73, . . . , 79. Compare these theoretical values with the corresponding
relative frequency values.
1.11 Repeat Exercise 1.10 for YB data and Table 1.4.
1.12 Using λ = 1.02, compute theoretical values of the function f(x|λ) in Eq (1.4) at x = 0, 1, 2, . . . , 6 and compare with the corresponding relative frequency values in
Table 1.5.
APPLICATION PROBLEMS
1.13 The data set in the table below is the time (in months) from receipt to publication (sometimes known as time-to-publication) of 85 papers published in the January
2004 issue of a leading chemical engineering research journal.
19.2   9.0  17.2   8.2   4.5  13.5  20.7   7.9  19.5   8.8
18.7   7.4   9.7  13.7   8.1   8.4  10.8  15.1   5.3  12.0
 3.0  18.5   5.8   6.8  14.5   3.3  11.1  16.4   7.3   7.4
 7.3   5.2  10.2   3.1   9.6  12.9  17.3   6.0  24.3  21.3
19.3   2.5   9.1   8.1   9.8  15.4  15.7   8.2   8.8   7.2
12.8   4.2   4.2   7.8   9.5   3.9   8.7   5.9   5.3   1.8
10.1  10.0  18.7   5.6   3.3   7.3  11.3   2.9   5.4  15.2
 8.0  11.7  17.2   4.0   3.8   7.4   5.3  10.6  15.2  11.5
 5.9  20.1  12.2  12.0   8.8
(i) Generate a histogram of this data set. Comment on the shape of this histogram
and why, from the nature of the variable in question, such a shape may not be
surprising.
(ii) From the histogram of the data, what is the most popular time-to-publication,
and what fraction of the papers took longer than this to publish?
1.14 Refer to Problem 1.13. Let each raw data entry in the data table be xi .
(i) Generate a set of 85 sample average publication times, yi, from 20 consecutive times as follows:

y_1 = \frac{1}{20}\sum_{i=1}^{20} x_i     (1.10)

y_2 = \frac{1}{20}\sum_{i=2}^{21} x_i     (1.11)

y_3 = \frac{1}{20}\sum_{i=3}^{22} x_i     (1.12)

\vdots

y_j = \frac{1}{20}\sum_{i=j}^{20+(j-1)} x_i     (1.13)

For values of j > 66, yj should be obtained by replacing x86, x87, x88, . . . , which do not exist, with x1, x2, x3, . . . , respectively (i.e., for these purposes treat the given xi data like a circular array). Plot the histogram for this generated yi data and compare the shape of this histogram with that of the original xi data.
(ii) Repeat part (i) above, this time for zi data generated from:

z_j = \frac{1}{20}\sum_{i=j}^{20+(j-1)} y_i     (1.14)

for j = 1, 2, . . . , 85. Compare the histogram of the zi data with that of the yi data and comment on the effect of averaging on the shape of the data histograms.
1.15 The data shown in the table below is a four-year record of the number of
recordable safety incidents occurring at a plant site each month.
1  0  2  0  0  1  2  1  0  0  0  0
0  1  1  0  2  0  2  0  2  0  0  0
0  0  1  0  0  0  2  0  0  0  1  1
1  0  1  0  0  0  0  0  1  1  0  1
(i) Find the average number of safety incidents per month and the associated variance. Construct a frequency table of the data and plot a histogram.
(ii) From the frequency table and the histogram, what can you say about the
chances of obtaining each of the following observations, where x represents the
number of observed safety incidents per month: x = 0, x = 1, x = 2, x = 3, x = 4
and x = 5?
(iii) Consider the postulate that a reasonable model for this phenomenon is:
f(x) = \frac{e^{-0.5}(0.5)^{x}}{x!}     (1.15)
 1   272   263        11   215   206
 2   319   313        12   245   235
 3   253   251        13   248   237
 4   325   312        14   364   350
 5   236   227        15   301   288
 6   233   227        16   203   195
 7   300   290        17   197   193
 8   260   251        18   217   216
 9   268   262        19   210   202
10   276   263        20   223   214
x     yO: Total no. of older patients        yY: Total no. of younger patients
      (out of 100) with pregnancy outcome x   (out of 100) with pregnancy outcome x
0                  32                                       8
1                  41                                      25
2                  21                                      35
3                   5                                      23
4                   1                                       8
5                   0                                       1
The data shows x, the number of live births per delivered pregnancy, along
with how many in each group had the pregnancy outcome of x. For example, the
first entry indicates that the IVF treatment was unsuccessful for 32 of the older
patients, with the corresponding number being 8 for the younger patients; 41
older patients delivered singletons, compared with 25 for the younger patients; 21
older patients and 35 younger patients each delivered twins; etc. Obtain a relative
frequency distribution for these data sets and plot the corresponding histograms.
Determine the average number of live births per delivered pregnancy for each group
and compare these values. Comment on whether or not these data sets indicate that
the outcomes of the IVF treatments are different for these two groups.
Chapter 2
Random Phenomena, Variability and
Uncertainty
When John Stuart Mill stated in his 1862 book, A System of Logic: Ratiocinative and Inductive, that "...the very events which in their own nature appear most capricious and uncertain, and which in any individual case no attainable degree of knowledge would enable us to foresee, occur, when considerable numbers are taken into account, with a degree of regularity approaching to mathematical ...", he was merely articulating, astutely for the time, the then-radical, but now well-accepted, concept that randomness in scientific observation is not a synonym for disorder; it is order of a different kind. The more familiar kind of order informs determinism: the concept that, with sufficient
mechanistic knowledge, all physical phenomena are entirely predictable and
thus describable by precise mathematical equations. But even classical physics,
that archetypal deterministic science, had to make room for this other kind
2.1
2.1.1
(2.2)
The rate of heat loss is determined precisely and consistently for any given
specific values of each entity on the right hand side of this equation.
The concept of determinism, that the phenomenon in question is precisely determinable in every relevant detail, is central to much of science and engineering and has proven quite useful in analyzing real systems, and in solving practical problems, whether it is computing the trajectory of rockets for
y_{Ai} = \eta_A + \epsilon_{Ai}     (2.3)

y_{Bi} = \eta_B + \epsilon_{Bi}     (2.4)

with ηA and ηB representing the true but unknown yields obtainable from processes A and B respectively, and εAi and εBi representing the superimposed randomly varying component: the sources of the random variability evident in each observation yAi and yBi. Identical values of ηA do not produce identical values of yAi in Eq (2.3); neither will identical values of ηB produce identical
values of yBi in Eq (2.4). In the second case of the glass process and the
number of inclusions per square meter, the idealization is:
x_i = \lambda + \epsilon_i     (2.5)

where λ is the true number of inclusions associated with the process and εi is the superimposed random component responsible for the observed randomness in the actual number of inclusions xi found on each individual glass sheet upon inspection.
These two perspectives, determinism and randomness, are thus two
opposite idealizations of natural phenomena, the former when deterministic
aspects of the phenomenon are considered to be overwhelmingly dominant
over any random components, the latter case when the random components
are dominant and central to the problem. The principles behind each conceptual idealization, and the analysis technique appropriate to each, are now
elucidated with a chemical engineering illustration.
2.1.2
FIGURE 2.1: The plug flow reactor (PFR): a fluid element and a bolus of dye C0 δ(t) entering with the stream flowing at F m3/s; reactor length l m, cross-sectional area A m2.

\tau = \frac{lA}{F} \; \mathrm{secs}     (2.7)

for each dye element to traverse the reactor. Hence, τ, the residence time for an ideal plug flow reactor (PFR) is a deterministic quantity because its value is exactly and precisely determinable from Eq (2.7) given F, A and l.
Keep in mind that the determinism that informs this analysis of the PFR
FIGURE 2.2: The continuous stirred tank reactor (CSTR): a bolus of dye C0 δ(t) entering with the stream flowing at F m3/s into a well-stirred tank of volume V m3.
residence time arises directly as a consequence of the central plug flow idealization. Any departures from such idealization, especially the presence of significant axial dispersion (leading to a non-flat fluid velocity profile), will
result in dye molecules no longer arriving at the outlet at precisely the same
time.
Randomness and the CSTR
With the continuous stirred tank reactor (CSTR), the reactant stream
continuously flows into a tank that is vigorously stirred to ensure uniform
mixing of its content, while the product is continuously withdrawn from the
outlet (see Fig 2.2). The assumptions (idealizations) in this case are:
the reactor tank has a fixed, constant volume, V m3;
the contents of the tank are perfectly mixed.
Once again, let us consider that a bolus of red dye of concentration C0
moles/m3 is instantaneously injected into the inlet stream at time t = 0; and
again, ask: how much time does each molecule of red dye spend in the reactor?
Unlike with the plug flow reactor, observe that it is impossible to answer this question a-priori, or precisely: because of the vigorous stirring of the reactor
content, some dye molecules will exit almost instantaneously; others will stay
longer, some for a very long time. In fact, it can be shown that theoretically,
0 < θ < ∞. Hence in this case, θ, the residence time, is a randomly varying quantity that can take on a range of values from 0 to ∞; it cannot therefore be
adequately characterized as a single number. Notwithstanding, as all chemical
engineers know, the random phenomenon of residence times for ideal CSTRs
can, and has been, analyzed systematically (see for example, Hill, 19771).
1 C.G. Hill, Jr, An Introduction to Chemical Engineering Kinetics and Reactor Design,
Wiley, NY, 1977, pp 388-396.
V \frac{dC}{dt} = F C_{in} - F C     (2.8)

where Cin is the dye concentration in the inlet stream. If we define the parameter τ as

\tau = \frac{V}{F}     (2.9)

and note that the introduction of a bolus of dye of concentration C0 at t = 0 implies:

C_{in} = C_0 \delta(t)     (2.10)

where δ(t) is the Dirac delta function, then Eq (2.8) becomes:

\tau \frac{dC}{dt} = -C + C_0 \delta(t)     (2.11)

whose solution is

C(t) = \frac{C_0}{\tau} e^{-t/\tau}     (2.12)

f(\theta) = \frac{1}{\tau} e^{-\theta/\tau}     (2.14)

recognizable to all chemical engineers as the familiar exponential instantaneous residence time distribution function for the ideal CSTR. The reader
FIGURE 2.3: Instantaneous residence time distribution function for the CSTR (with τ = 5).
should take good note of this expression: it shows up a few more times and
in various guises in subsequent chapters. For now, let us observe that, even
though (a) the residence time for a CSTR, θ, exhibits random variability, potentially able to take on values between 0 and ∞ (and is therefore not describable by a single value); so that (b) it is therefore impossible to determine with absolute certainty precisely when any individual dye molecule will leave the reactor; even so (c) the function, f(θ), shown in Eq (2.14), mathematically
characterizes the behavior of the entire ensemble of dye molecules, but in a
way that requires some explanation:
1. It represents how the residence times of fluid particles in the well-mixed CSTR are distributed over the range of possible values 0 < θ < ∞
(see Fig 2.3).
2. This distribution of residence times is a well-defined, well-characterized
function, but it is not a description of the precise amount of time a particular individual dye molecule will spend in the reactor; rather it is a
description of how many (or what fraction) of the entire collection of
dye molecules will spend what amount of time in the reactor. For example, in broad terms, it indicates that a good fraction of the molecules
have relatively short residence times, exiting the reactor quickly; a much
smaller but non-zero fraction have relatively long residence times. It can
also provide more precise statements as follows.
3. From this expression (Eq (2.14)), we can determine the fraction of dye
molecules that have remained in the reactor for an amount of time less
than or equal to some time t, (i.e. molecules exiting the reactor with
age less than or equal to t): we do this by integrating f(θ) with respect to θ, as follows, to obtain

F(t) = \int_0^{t} \frac{1}{\tau} e^{-\theta/\tau}\, d\theta = 1 - e^{-t/\tau}     (2.15)

from which we see that F(0), the fraction of dye molecules with age less than or equal to zero is exactly zero: indicating the intuitively obvious that, no matter how vigorous the mixing, each dye molecule spends at least a finite, non-zero, amount of time in the reactor (no molecule exits instantaneously upon entry).
On the other hand, F(∞) = 1, since

F(\infty) = \int_0^{\infty} \frac{1}{\tau} e^{-\theta/\tau}\, d\theta = 1     (2.16)

again indicating the obvious: if we wait long enough, all dye molecules will eventually exit the reactor as t → ∞. In other words, the fraction of molecules exiting the reactor with age less than ∞ is exactly 1.
4. Since the fraction of molecules that will have remained in the reactor for an amount of time less than or equal to t is F(t), and the fraction that will have remained in the reactor for less than or equal to t + Δt is F(t + Δt), then the fraction with residence time in the infinitesimal interval between t and (t + Δt) is given by:

F(t + \Delta t) - F(t) = \int_{t}^{t+\Delta t} \frac{1}{\tau} e^{-\theta/\tau}\, d\theta     (2.17)
(2.18)
5. And finally, the average residence time may be determined from the expression in Eq (2.14) (and Eq (2.16)) as:

\bar{\theta} = \frac{\int_0^{\infty} \theta\, \frac{1}{\tau} e^{-\theta/\tau}\, d\theta}{\int_0^{\infty} \frac{1}{\tau} e^{-\theta/\tau}\, d\theta} = \frac{\int_0^{\infty} \theta\, \frac{1}{\tau} e^{-\theta/\tau}\, d\theta}{1} = \tau     (2.19)

where the numerator integral is evaluated via integration by parts. Observe from the definition of τ above (in Eq (2.9)) that this result makes perfect sense, strictly from the physics of the problem: particles in a stream flowing at the rate F m3/s through a well-mixed reactor of volume V m3 will spend an average of V/F = τ seconds in the reactor.
We now observe in conclusion two important points: (i) even though at no
point in the preceding discussion have we made any overt or explicit appeal
2.2
2.2.1
In such diverse areas as actuarial science, biology, chemical reactors, demography, economics, finance, genetics, human mortality, manufacturing quality assurance, polymer chemistry, etc., one repeatedly encounters a surprisingly common theme whereby phenomena which, on an individual level, appear entirely unpredictable, are well-characterized as ensembles (as demonstrated above with residence time distribution in CSTRs). For example, as
far back as 1662, in a study widely considered to be the genesis of population
demographics and of modern actuarial science by which insurance premiums
are determined today, the British haberdasher, John Graunt (1620-1674), had
observed that the number of deaths and the age at death in London were surprisingly predictable for the entire population even though it was impossible to
predict which individual would die when and in what manner. Similarly, while
the number of monomer molecules linked together in any polymer molecule
chain varies considerably, how many chains of a certain length a batch of
polymer product contains can be characterized fairly predictably.
Such natural phenomena noted above have come to be known as Random
Mass Phenomena, with the following defining characteristics:
1. Individual observations appear irregular because it is not possible to
predict each one with certainty; but
2. The ensemble or aggregate of all possible outcomes is regular, well-characterized and determinable;
3. The underlying phenomenological mechanisms accounting for the nature and occurrence of the specific observations determine the character of the ensemble;
4. Such phenomenological mechanisms may be known mechanistically (as
was the case with the CSTR), or its manifestation may only be deter-
mined from data (as was the case with John Graunt's mortality tables
of 1662).
2.2.2
While ensemble characterizations provide a means of dealing systematically with random mass phenomena, many practical problems still involve
making decisions about specic, inherently unpredictable, outcomes. For example, the insurance company still has to decide what premium to charge each
individual on a person-by-person basis. When decisions must be made about
specic outcomes of random mass phenomena, uncertainty is an inevitable
consequence of the inherent variability. Furthermore, the extent or degree
of variability directly aects the degree of uncertainty: tighter clustering of
possible outcomes implies less uncertainty, whereas a broader distribution of
possible outcomes implies more uncertainty. The most useful mathematical
characterization of ensembles must therefore permit not only systematic analysis, but also a rational quantification of the degree of variability inherent in
the ensemble, and the resulting uncertainty associated with each individual
observation as a result.
2.2.3
Embedded in these questions are the following affiliated questions that arise as a consequence: (a) how was {x_i}_{i=1}^{n} obtained in (1); will the procedure for
obtaining the data aect how well we can answer question 1? (b) how was
f (x) determined in (2)?
Subsequent chapters are devoted to dealing with these fundamental problems systematically and in greater detail.
2.3 Introducing Probability
2.3.1 Basic Concepts
f(x|\lambda) = \frac{e^{-\lambda}\lambda^{x}}{x!}; \quad x = 0, 1, 2, \ldots     (2.20)
TABLE 2.1: Computed probabilities of occurrence of various numbers of inclusions for λ = 2 in Eq (2.20)

x = No. of inclusions    f(x) = prob of occurrence
0                              0.135
1                              0.271
2                              0.271
3                              0.180
4                              0.090
5                              0.036
...                              ...
8                              0.001
9                              0.000
2.3.2
Interpreting Probability
is assigned to indicate the degree of uncertainty associated with the occurrence of a particular outcome. As with temperature the conceptual quantity,
how a numerical value is determined for the probability of the occurrence
of a particular outcome under any specic circumstance depends on the circumstance itself. To carry the analogy with temperature a bit further: while
a thermometer capable of determining temperature to within half a degree
will suce in one case, a more precise device, such as a thermocouple, may
be required in another case, and an optical pyrometer for yet another case.
Whatever the case, under no circumstance should the device employed to determine its numerical value usurp the role of, or become the surrogate for,
temperature the quantity. This is important in properly interpreting probability, the conceptual entity: how an appropriate value is to be determined
for probability, an important practical problem in its own right, should not
be confused with the quantity itself.
With these ideas in mind, let us now consider several standard perspectives
of probability that have evolved over the years. These are best understood as
various techniques for how numerical values are determined rather than what
probability is.
Classical (A-Priori) Probability
Consider a random phenomenon for which the total number of possible
outcomes is known to be N , all of which are equally likely; of these, let NA
be the number of outcomes in which A is observed (i.e. outcomes that are
favorable to A). Then according to the classical (or a-priori) perspective, the probability of the occurrence of outcome A is defined as
P(A) = \frac{N_A}{N}     (2.21)
For example, in tossing a single perfect die once, the probability of observing
a 3 is, according to this viewpoint, evaluated as 1/6, since the total number of
possible outcomes is 6 of which only 1 is favorable to the desired observation
of 3. Similarly, if B is the outcome that one observes an odd number of dots,
then P (B) = 3/6 = 0.5.
Observe that according to this view, no experiments have been performed
yet; the formulation is based entirely on an a-priori enumeration of N and
NA . However, this intuitively appealing perspective is not always applicable:
What if all the outcomes are not equally likely?
How about random phenomena whose outcomes cannot be characterized
as cleanly in this fashion, say, for example, the prospect of a newly purchased refrigerator lasting for 25 years without repair? or the prospect
of snow falling on a specific April day in Wisconsin?
What Eq. (2.21) provides is an intuitively appealing (and theoretically sound)
means of determining an appropriate value for P (A); but it is restricted only
to those circumstances where the random phenomenon in question is characterized in such a way that N and NA are natural and easy to identify.
Relative Frequency (A-Posteriori) Probability
On the opposite end of the spectrum from the a-priori perspective is the following alternative: consider an experiment that is repeated n times under identical conditions, where the outcomes involving A have been observed to occur nA times. Then, a-posteriori, the probability of the occurrence of outcome A is defined as

P(A) = \lim_{n \to \infty} \frac{n_A}{n}     (2.22)
The appeal of this viewpoint is not so much that it is just as intuitive as the
previous one, but that it is also empirical, making no assumptions about equal
likelihood of outcomes. It is based on the actual performance of experiments
and the actual a-posteriori observation of the relative frequency of occurrences
of the desired outcome. This perspective provides a prevalent interpretation
of probability as the theoretical value of long range relative frequencies. In
fact, this is what motivates the notion of the theoretical distribution as the
limiting form to which the empirical frequency distribution tends with the
acquisition of increasing amounts of data.
However, this perspective also suers from some limitations:
How many trials, n, is sufficient for Eq (2.22) to be useful in practice?
How about random phenomena for which the desired outcome does not lend itself to repetitive experimentation under identical conditions, say, for example, the prospect of snow falling on a specific April day in Wisconsin? or the prospect of your favorite team winning the basketball championship next year?
Once again, these limitations arise primarily because Eq (2.22) is simply just another means of determining an appropriate value for P(A) that happens to be valid only when the random phenomenon is such that the indicated repeated experimentation is not only possible and convenient, but for which, in practice, truncating after a sufficiently large number of trials to produce a
nite approximation presents no conceptual dilemma. For example, after tossing a coin 500 times and obtaining 251 heads, declaring that the probability
of obtaining a head upon a single toss as 0.5 presents no conceptual dilemma
whatsoever.
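The long-run behavior behind Eq (2.22) is easy to mimic; the sketch below (Python) simulates tosses of a fair die and tracks the relative frequency of obtaining a 3, which should settle near the a-priori value 1/6 as n grows (the seed and trial counts are arbitrary choices made only for illustration).

import random

random.seed(1)                      # arbitrary seed, for a repeatable run
n_A, trials = 0, 0
for n in (10, 100, 1000, 10000, 100000):
    while trials < n:
        if random.randint(1, 6) == 3:
            n_A += 1
        trials += 1
    print(f"n = {n:6d}: n_A/n = {n_A / n:.4f}   (a-priori value {1/6:.4f})")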
Subjective Probability
There is yet another alternative perspective whereby P (A) is taken simply
as a measure of the degree of (personal) belief associated with the postulate
that A will occur, the value having been assigned subjectively by the individual concerned, akin to betting odds. Thus, for example, in rolling a perfect
die, the probability of obtaining a 3 is assigned strictly on the basis of what the
47
2.4
Beginning with the next chapter, Part II is devoted to an axiomatic treatment of probability, including basic elements of probability theory, random
variables, and probability distribution functions, within the context of a comprehensive framework for systematically analyzing random phenomena.
The central conceptual elements of this framework are: (i) a formal representation of uncertain outcomes with the random variable, X; and (ii) the
mathematical characterization of this random variable by the probability distribution function (pdf), f (x). How the probabilities are distributed over the
entire aggregate collection of all possible outcomes, expressed in terms of the
random variable, X, is contained in this pdf. The following is a procedure for
problem-solving within this framework:
1. Problem Formulation: Define and formulate the problem appropriately.
Examine the random phenomenon in question, determine the random
variable(s), and assemble all available information about the underlying
mechanisms;
2. Model Development : Identify, postulate, or develop an appropriate ideal
model of the relevant random variability in the form of the probability
distribution function f (x);
3. Problem Solution: Use the model to solve the relevant problem (analysis,
prediction, inference, estimation, etc.);
4. Results validation: Analyze and validate the result and, if necessary,
return to any of the preceding steps as appropriate.
2.5 Summary and Conclusions
REVIEW QUESTIONS
1. If not a synonym for disorder, then what is randomness in scientific observation?
2. What is the concept of determinism?
3. Why are the expressions in Eqs (2.1) and (2.2) considered deterministic?
4. What is an example phenomenon that had to be ignored in order to obtain the
deterministic expressions in Eq (2.1)? And what is an example phenomenon that
had to be ignored in order to obtain the deterministic expressions in Eq (2.2)?
5. What are the main characteristics of randomness as described in Subsection
2.1.1?
6. Compare and contrast determinism and randomness as two opposite idealizations
of natural phenomena.
7. Which idealized phenomenon does residence time in a plug flow reactor (PFR)
represent?
8. What is the central plug flow idealization in a plug flow reactor, and how will departures from such idealization affect the residence time in the reactor?
9. Which idealized phenomenon does residence time in a continuous stirred-tank
reactor (CSTR) represent?
10. On what principle is the mathematical model in Eq (2.8) based?
11. What does the expression in Eq (2.14) represent?
12. What observation by John Graunt is widely considered to be the genesis of
population demographics and of modern actuarial science?
13. What are the defining characteristics of random mass phenomena?
EXERCISES
Section 2.1
2.1 Solve Eq (2.11) explicitly to confirm the result in Eq (2.12).
2.2 Plot the expression in Eq (2.15) as a function of the scaled time variable, t* = t/τ; determine the percentage of dye molecules with age less than or equal to the mean residence time, τ.
2.3 Show that

\int_0^{\infty} \theta\, \frac{1}{\tau} e^{-\theta/\tau}\, d\theta = \tau     (2.23)
2.4 The two functions

f(x) = \frac{1}{3\sqrt{2\pi}}\, e^{-x^2/18}; \quad -\infty < x < \infty     (2.24)

and

f(y) = \frac{1}{\sqrt{2\pi}}\, e^{-y^2/2}; \quad -\infty < y < \infty     (2.25)

represent how the occurrences of all the possible outcomes of the two randomly varying, continuous variables, X and Y, are distributed. Plot these two distribution
functions on the same graph. Which of these variables has a higher degree of uncertainty associated with the determination of any particular outcome? Why?
2.5 When a fair coin is tossed 4 times, it is postulated that the probability of
obtaining x heads is given by the probability distribution function:
f(x) = \frac{4!}{x!(4-x)!}\,(0.5)^{4}     (2.26)
APPLICATION PROBLEMS
2.7 For each of the following two-reactor configurations:
(a) two plug flow reactors in series where the length of reactor 1 is l1 m, and that of reactor 2 is l2 m, but both have the same uniform cross-sectional area A m2;
(b) two continuous stirred tank reactors with volumes V1 and V2 m3;
(c) the PFR in Fig 2.1 followed by the CSTR in Fig 2.2;
given that the flow rate through each reactor ensemble is constant at F m3/s, obtain the residence time, τ, or the residence time distribution, f(θ), as appropriate. Make any assumption you deem appropriate about the concentrations C1(t) and C2(t) in the first and second reactors, respectively.
2.8 In the summer of 1943 during World War II, a total of 365 warships were attacked
by Kamikaze pilots: 180 took evasive action and 60 of these were hit; the remaining
185 counterattacked, of which 62 were hit. Using a relative frequency interpretation
and invoking any other assumption you deem necessary, determine the probability
that any attacked warship will be hit regardless of tactical response. Also determine
the probability that a warship taking evasive action will be hit and the probability
that a counterattacking warship will be hit. Compare these three probabilities and
discuss what this implies regarding choosing an appropriate tactical response. (A
full discussion of this problem is contained in Chapter 7.)
2.9 Two American National Football League (NFL) teams, A and B, with respective Win-Loss records 9-6 and 12-3 after 15 weeks, are preparing to face each other
in the 16th and nal game of the regular season.
(i) From a relative frequency perspective of probability, use the supplied information
(and any other assumption you deem necessary) to compute the probability of Team
A winning any generic game, and also of Team B winning any generic game.
52
Random Phenomena
(ii) When the two teams play each other, upon the presupposition that past record
is the best indicator of a team's chances of winning a new game, determine reasonable
probability that team B wins, assuming that this game does not end up in a tie.
Note that for this particular case,
P (A) + P (B) = 1
(2.27)
Part II
Probability
Chapter 3
Fundamentals of Probability Theory
3.1 Building Blocks   58
3.2 Operations   60
3.2.1 Events, Sets and Set Operations   61
3.2.2 Set Functions   64
3.2.3 Probability Set Function   67
3.2.4 Final considerations   68
3.3 Probability   69
3.3.1 The Calculus of Probability   69
3.3.2 Implications   71
3.4 Conditional Probability   72
3.4.1 Illustrating the Concept   72
3.4.2 Formalizing the Concept   73
3.4.3 Total Probability   74
3.4.4 Bayes' Rule   76
3.5 Independence   77
3.6 Summary and Conclusions   78
REVIEW QUESTIONS   79
EXERCISES   80
APPLICATION PROBLEMS   84
The paradox of randomly varying phenomena, that the aggregate ensemble behavior of unpredictable, irregular, individual observations is stable and regular, provides a basis for developing a systematic analysis approach.
Such an approach requires temporarily abandoning the futile task of predicting individual outcomes and instead focussing on characterizing the aggregate
ensemble in a mathematically appropriate manner. The central element is a
machinery for determining the mathematical probability of the occurrence
of each outcome and for quantifying the uncertainty associated with any attempts at predicting the intrinsically unpredictable individual outcomes. How
this probability machinery is assembled from a set of simple building blocks
and mathematical operations is presented in this chapter, along with the basic concepts required for its subsequent use for systematic analysis of random phenomena.
3.1
Building Blocks
The set B = {TTT} consists of the only outcome involving the occurrence of 3 tails; it therefore represents the event that 3 tails are observed.

The set C = {HHH, HHT, HTH, THH} consists of the outcomes involving the occurrence of at least 2 heads; it represents the event that at least 2 heads are observed.

Similarly, the set D = {HHH} represents the event that 3 heads are observed.
A simple or elementary event is one that consists of one and only one
outcome of the experiment; i.e. a set with only one element. Thus, in Example 3.2, set B and set D are examples of elementary events. Any other event
consisting of more than one outcome is a complex or compound event. Sets A
and C in Example 3.2 are compound events. (One must be careful to distinguish between the set and its elements. The set B in Example 3.2 contains
one element, TTT, but the set is not the same as the element. Thus, even
though the elementary event consists of a single outcome, one is not the same
as the other).
Elementary events possess an important property that is crucial to the
development of probability theory:
An experiment conducted once produces one and only one outcome;
The elementary event consists of only one outcome;
One and only one elementary event can occur for every experimental
trial;
Therefore:
Simple (elementary) events are mutually exclusive.
In Example 3.2, sets B and D represent elementary events; observe that if one
occurs, the other one cannot. Compound events do not have this property. In
this same example, observe that if, after a trial, the outcome is HTH (a tail
sandwiched between two heads), event A has occurred (we have observed
precisely 2 heads), but so has event C, which requires observing 2 or more
heads. In the language of sets, the element HTH belongs to both set A and set C.
An elementary event therefore consists of a single outcome and cannot be decomposed into a simpler event; a compound event, on the other
hand, consists of a collection of more than one outcome and can therefore be
composed from several simple events.
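As an informal computational illustration (not part of the original text), the short Python sketch below enumerates the eight outcomes of the three coin-toss experiment and the events A, B, C and D of Example 3.2; it confirms that the elementary events B and D cannot occur together, whereas the compound events A and C share the outcome HTH.

    from itertools import product

    # Sample space for tossing a coin three times: all 8 ordered outcomes
    omega = {"".join(t) for t in product("HT", repeat=3)}

    A = {w for w in omega if w.count("H") == 2}   # exactly 2 heads (compound event)
    B = {"TTT"}                                    # 3 tails (elementary event)
    C = {w for w in omega if w.count("H") >= 2}   # at least 2 heads (compound event)
    D = {"HHH"}                                    # 3 heads (elementary event)

    print(B & D)                    # set(): elementary events are mutually exclusive
    print("HTH" in A, "HTH" in C)   # True True: compound events can overlap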
3.2 Operations

3.2.1 Events, Sets and Set Operations
We earlier defined the sample space Ω as a set whose elements are all the possible outcomes of an experiment. Events are also sets, but they consist of only certain elements from Ω that share a common attribute; every event is thus a subset of Ω.

Of all the subsets of Ω, there are two special ones with important connotations: ∅, the empty set consisting of no elements at all, and Ω itself. In the language of events, the former represents the impossible event, while the latter represents the certain event.

Since they are sets, events are amenable to analysis using precisely the same algebra of set operations (union, intersection and complement) which we now briefly review.
1. Union: A ∪ B represents the set of elements that are either in A or in B. In general,

A_1 ∪ A_2 ∪ A_3 ∪ ... ∪ A_k = ⋃_{i=1}^{k} A_i          (3.2)

is the set of elements that are in at least one of the k sets, {A_i}_{i=1}^{k}.
2. Intersection: A ∩ B represents the set of elements that are in both A and B. In general,

A_1 ∩ A_2 ∩ A_3 ∩ ... ∩ A_k = ⋂_{i=1}^{k} A_i          (3.3)

is the set of elements that are common to all the k sets, {A_i}_{i=1}^{k}.
To discuss the third set operation requires two special sets: the universal set (or universe), typically designated Ω, and the null (or empty) set, typically designated ∅. The universal set consists of all possible elements of interest, while the null set contains no elements. (We have just recently introduced such sets above, but in the specific context of the sample space of an experiment; the current discussion is general and not restricted to the analysis of randomly varying phenomena and their associated sample spaces.)

These sets have the special properties that for any set A,

A ∪ ∅ = A          (3.4)
A ∩ ∅ = ∅          (3.5)
A ∪ Ω = Ω          (3.6)
A ∩ Ω = A          (3.7)

3. Complement: Aᶜ, the complement of the set A, is the set of all elements of Ω that are not contained in A. Complements have the following properties:

A ∩ Aᶜ = ∅;  A ∪ Aᶜ = Ω          (3.8)
(Aᶜ)ᶜ = A          (3.9)
(A ∪ B)ᶜ = Aᶜ ∩ Bᶜ          (3.10)
(A ∩ B)ᶜ = Aᶜ ∪ Bᶜ          (3.11)
The following table presents some information about the nature of subsets of Ω, interpreted in the language of events.

TABLE 3.1:

Set          Event represented
Ω            Certain event
∅            Impossible event
Aᶜ           Non-occurrence of event A
A ∪ B        Event A or B
A ∩ B        Events A and B

Note in particular that if A ∩ B = ∅, A and B are said to be disjoint sets (with no elements in common); in the language of events, this implies that event A occurring together with event B is impossible. Under these circumstances, events A and B are said to be mutually exclusive.
Example 3.3 PRACTICAL ILLUSTRATION OF SETS AND
EVENTS
Samples from various batches of a polymer resin manufactured at a plant
site are tested in a quality control laboratory before release for sale. The
result of the tests allows the manufacturer to classify the product into
the following 3 categories:
1. Meets or exceeds quality requirement; Assign #1; approve for sale
as 1st quality.
2. Barely misses quality requirement; Assign #2; approve for sale as
2nd grade at a lower price.
3. Fails completely to meet quality requirement; Assign #3; reject as
poor grade and send back to be incinerated.
Identify the experiment, outcome, trial, sample space and the events
associated with this practical problem.
Solution:
1. Experiment: Take a sample of polymer resin and carry out the
prescribed product quality test.
2. Trial: Each trial involves taking a representative sample from each
polymer resin batch and testing it as prescribed.
3. Outcomes: The assignment of a number 1, 2, or 3 depending on
how the result of the test compares to the product quality requirements.
4. Sample space: The set Ω = {1, 2, 3} containing all possible outcomes.
5. Events: The subsets of the sample space are identified as follows:
E0 = {}; E1 = {1}; E2 = {2}; E3 = {3}; E4 = {1, 2}; E5 = {1, 3}; E6 = {2, 3}; E7 = {1, 2, 3}. Note that there are 8 in all. In general, a set with n distinct elements will have 2ⁿ subsets.
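The 2ⁿ count can be checked computationally; the following sketch (Python, added here for illustration; the helper name power_set is not from the text) generates every subset of Ω = {1, 2, 3}.

    from itertools import chain, combinations

    def power_set(s):
        """Return all subsets of s, from the empty set to s itself."""
        items = list(s)
        return list(chain.from_iterable(combinations(items, r) for r in range(len(items) + 1)))

    subsets = power_set({1, 2, 3})
    print(len(subsets))   # 8 = 2**3, matching the events E0 through E7 above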
In particular, the compound events can be composed from the elementary events E1, E2 and E3 as follows:

E4 = E1 ∪ E2          (3.12)
E5 = E1 ∪ E3          (3.13)
E6 = E2 ∪ E3          (3.14)
E7 = E1 ∪ E2 ∪ E3          (3.15)
TABLE 3.2: Attributes of the 15 students in attendance: Allison, Ben, Chrissy, Daoud, Evan, Fouad, Gopalan, Helmut, Ioannis, Jim, Katie, Larry, Moe, Nathan, Olu.
3.2.2 Set Functions

A function F(.), defined on the subsets of Ω such that it assigns one and only one real number to each subset of Ω, is known as a set function. By this definition, no one subset can be assigned more than one number by a set function. The following examples illustrate the concept.

Example 3.6 SET FUNCTIONS DEFINED ON THE SET OF STUDENTS IN A CLASSROOM
The following table shows a list of attributes associated with 15 students in attendance on a particular day in a 600-level course offered at the University of Delaware. Let set A be the subset of female students and B, the subset of male students. Obtain the real number assigned by the following set functions:
1. N(A), the total number of female students in class;
2. N(Ω), the total number of students in class;
3. M(B), the sum total amount of money carried by the male students;
4. H(A), the average height (in inches) of the female students;
5. Y⁺(B), the maximum age, in years, of the male students.
Solution:
1. N(A) = 3;
2. N(Ω) = 15;
3. M(B) = $293.00;
4. H(A) = 67 ins.
so that there are 46 parts that are either defective or from the old batch.
3.2.3 Probability Set Function

Let P(.) be an additive set function defined on all subsets of Ω, the sample space of all the possible outcomes of an experiment, such that:
1. P(A) ≥ 0 for every A ⊂ Ω;
2. P(Ω) = 1;
3. P(A ∪ B) = P(A) + P(B) for all mutually exclusive events A and B;
then P(.) is a probability set function.
Remarkably, these three simple rules (axioms), due to Kolmogorov, are sufficient to develop the mathematical theory of probability. The following are important properties of P(.) arising from these axioms:
1. To each event A, it assigns a non-negative number, P(A), its probability;
2. To the certain event Ω, it assigns unit probability;
3. The probability that either one or the other of two mutually exclusive events A, B will occur is the sum of the probabilities that each event will occur.

The following corollaries are important consequences of the foregoing three axioms:

Corollary 1. P(Aᶜ) = 1 − P(A).
The probability of non-occurrence of A is 1 minus the probability of its occurrence. Equivalently, the probabilities of the occurrence of an event and of its non-occurrence add up to 1. This follows from the fact that

Ω = A ∪ Aᶜ;          (3.20)

that A and Aᶜ are disjoint sets; that P(.) is an additive set function; and that P(Ω) = 1.

Corollary 2. P(∅) = 0.
The probability of an impossible event occurring is zero. This follows from the fact that ∅ = Ωᶜ and from Corollary 1 above.

Corollary 3. A ⊂ B ⟹ P(A) ≤ P(B).
If A is a subset of B, then the probability of occurrence of A is less than, or equal to, the probability of the occurrence of B. This follows from the fact that, under these conditions, B can be represented as the union of 2 disjoint sets:

B = A ∪ (B ∩ Aᶜ)          (3.21)
so that, by the additivity of P(.),

P(B) = P(A) + P(B ∩ Aᶜ)          (3.22)

and since P(B ∩ Aᶜ) ≥ 0, it follows that

P(B) ≥ P(A)          (3.23)
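A quick numerical check of these corollaries may be helpful. The sketch below uses a small, made-up finite sample space (the probabilities are illustrative and not from the text) and verifies Corollaries 1 to 3 directly.

    # Illustrative finite sample space with an additive probability set function
    omega = {"d1", "d2", "d3", "d4"}
    p = {"d1": 0.4, "d2": 0.3, "d3": 0.2, "d4": 0.1}   # sums to 1

    def P(event):
        """Add up the probabilities of the elementary outcomes in the event."""
        return sum(p[w] for w in event)

    A = {"d1", "d2"}
    B = {"d1", "d2", "d3"}        # A is a subset of B
    A_c = omega - A               # complement of A

    print(abs(P(A_c) - (1 - P(A))) < 1e-12)   # Corollary 1: P(A^c) = 1 - P(A)
    print(P(set()) == 0)                       # Corollary 2: P(empty set) = 0
    print(P(A) <= P(B))                        # Corollary 3: A subset of B implies P(A) <= P(B)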
3.2.4 Final considerations

Thus far, in assembling the machinery for dealing with random phenomena by characterizing the aggregate ensemble of all possible outcomes, we have encountered the sample space Ω, whose elements are all the possible outcomes of an experiment; we have presented events as collections of these outcomes (and hence subsets of Ω); and finally P(.), the probability set function defined on subsets of Ω, allows the axiomatic definition of the probability of an event. What we need next is a method for actually obtaining any particular probability P(A) once the event A has been defined. Before we can do this, however, for completeness, a set of final considerations are in order.

Even though, as presented, events are subsets of Ω, not all subsets of Ω are events. There are all sorts of subtle mathematical reasons for this, including the (somewhat unsettling) case in which Ω consists of infinitely many elements, as is the case when the outcome is a continuous entity and can therefore take on values on the real line. In this case, clearly, Ω is the set of all real numbers. A careful treatment of these issues requires the introduction of Borel fields (see, for example, Kingman and Taylor, 1966¹). This is necessary because, as the reader may have anticipated, the calculus of probability requires making use of set operations, unions and intersections, as well as sequences and limits of events. As a result, it is important that sets resulting from such operations are themselves events. This is strictly true of Borel fields.

Nevertheless, for all practical purposes, and most practical applications, it is often not necessary to distinguish between the subsets of Ω and genuine events. For the reader willing to accept on faith the end result, the probability

¹ Kingman, J.F.C. and Taylor, S.J., Introduction to the Theory of Measure and Probability, Cambridge University Press, 1966.
3.3 Probability

3.3.1 The Calculus of Probability

Once the sample space Ω for any random experiment has been specified and the events (subsets of the sample space) identified, the following is the procedure for determining the probability of any event A, based on the important property that elementary events are mutually exclusive. Let Ω consist of the N outcomes d_1, d_2, ..., d_N, and let p_i be the probability assigned to the elementary event {d_i}. Then, for an event made up of, say, the outcomes d_1, d_2 and d_4,

A = {d_1, d_2, d_4}          (3.24)

the mutual exclusivity of the elementary events implies that

P(A) = p_1 + p_2 + p_4          (3.25)

and for

B = {d_3, d_5, ..., d_N}          (3.26)

then

P(B) = 1 − p_1 − p_2 − p_4          (3.27)

The following examples illustrate how probabilities p_i may be assigned to elementary events.
Example 3.8 ASSIGNMENTS FOR EQUIPROBABLE OUTCOMES
The experiment of tossing a coin 3 times and recording the observed number of heads and tails was considered in Examples 3.1 and 3.2. There the sample space was obtained as:

Ω = {HHH, HHT, HTH, THH, HTT, THT, TTH, TTT},          (3.28)

a set with 8 elements that comprise all the possible outcomes of the experiment. Several events associated with this experiment were identified in Example 3.2.
If there is no reason for any one of the 8 possible outcomes to be any more likely to occur than any other one, the outcomes are said to be equiprobable and we assign a probability of 1/8 to each one. This gives rise to the following equiprobable assignment of probability to the 8 elementary events:
P(E1) = P{HHH} = 1/8
P(E2) = P{HHT} = 1/8
P(E3) = P{HTH} = 1/8
   ⋮
P(E7) = P{TTH} = 1/8
P(E8) = P{TTT} = 1/8          (3.29)

Note that

Σ_{i=1}^{8} P(E_i) = Σ_{i=1}^{8} p_i = 1          (3.30)

The event A identified in Example 3.2 (the event that exactly 2 heads are observed) consists of three elementary events, E2, E3 and E4, so that

A = E2 ∪ E3 ∪ E4,          (3.32)

and therefore

P(A) = P(E2) + P(E3) + P(E4) = 3/8          (3.33)
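The same computation can be carried out directly by enumeration; the short Python sketch below (illustrative only) assigns 1/8 to each outcome and recovers P(A) = 3/8.

    from itertools import product

    omega = ["".join(t) for t in product("HT", repeat=3)]
    p = {w: 1.0 / len(omega) for w in omega}     # equiprobable assignment: 1/8 each

    # Event A: exactly 2 heads, the union of the three elementary events E2, E3, E4
    A = [w for w in omega if w.count("H") == 2]
    print(sum(p[w] for w in A))                  # 0.375 = 3/8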
Other means of probability assignment are possible, as illustrated by the following example.

Example 3.9 ALTERNATIVE ASSIGNMENTS FROM A-PRIORI KNOWLEDGE
Consider the manufacturing example discussed in Examples 3.3 and 3.4.
Suppose that prior experience with the manufacturing process indicates that the outcomes are not equally likely, and that the probabilities of the elementary events are instead assigned as follows:

P(E1) = 0.75          (3.34)
P(E2) = 0.15          (3.35)
P(E3) = 0.10          (3.36)

It then follows, from the additivity of the probability set function, that

P(E4) = P(E1) + P(E2) = 0.90          (3.37)
P(E5) = P(E1) + P(E3) = 0.85          (3.38)
P(E6) = P(E2) + P(E3) = 0.25          (3.39)

3.3.2 Implications
3.4 Conditional Probability

3.4.1 Illustrating the Concept
Consider a chemical engineering thermodynamics class consisting of 50 total students, of which 38 are undergraduates and the rest are graduate students. Of the 12 graduate students, 8 are chemistry students; of the 38 undergraduates, 10 are chemistry students. We may define the following sets:

Ω, the (universal) set of all students (50 elements);
G, the set of graduate students (12 elements);
C, the set of chemistry students (18 elements).

Note that the set G ∩ C, the set of graduate chemistry students, contains 8 elements. (See Fig 3.2.)

We are interested in the following problem: select a student at random; given that the choice results in a chemistry student, what is the probability that she/he is a graduate student? This is a problem of finding the probability of the occurrence of an event conditioned upon the prior occurrence of another one.
3.4.2 Formalizing the Concept

Note that the unconditional probability of any event A can itself be viewed as a probability conditioned on the certain event Ω, since

P(A) = P(A ∩ Ω)/P(Ω) = P(A)/1          (3.41)
Returning now to the previous illustration, we see that the required quantity is P(G|C), and by definition,

P(G|C) = P(G ∩ C)/P(C) = (8/50)/(18/50) = 8/18          (3.42)
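The arithmetic behind Eq (3.42) is easy to mirror in code; the following fragment (added for illustration, not part of the text) computes P(G|C) from the class counts given above.

    n_total = 50         # all students
    n_grad_chem = 8      # students in both G and C
    n_chem = 18          # students in C

    P_G_and_C = n_grad_chem / n_total
    P_C = n_chem / n_total
    print(P_G_and_C / P_C)    # 0.444..., i.e. 8/18 as in Eq (3.42)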
3.4.3 Total Probability

It is possible to obtain total probabilities when only conditional probabilities are available. We now present some very important results relating conditional probabilities to total probability.
Consider events A and B, not necessarily disjoint. From the Venn diagram in Fig 3.4, we may write A as the union of 2 disjoint sets as follows:

A = (A ∩ B) ∪ (A ∩ Bᶜ)          (3.45)

In words, this expression states that the points in A are made up of two groups: the points in A that are also in B, and the points in A that are not in B. And because the two sets are disjoint, so that the events they represent are mutually exclusive, we have:

P(A) = P(A ∩ B) + P(A ∩ Bᶜ)          (3.46)
More generally, let B_1, B_2, B_3, ..., B_k be a collection of k mutually exclusive (disjoint) sets which together make up the sample space, i.e.,

Ω = B_1 ∪ B_2 ∪ B_3 ∪ ... ∪ B_k = ⋃_{i=1}^{k} B_i          (3.49)

so that

A = A ∩ Ω = ⋃_{i=1}^{k} (A ∩ B_i)          (3.50)

which is a partitioning of the set A as a union of k disjoint sets (see Fig 3.5). As a result,

P(A) = P(A ∩ B_1) + P(A ∩ B_2) + ... + P(A ∩ B_k)          (3.51)

but since

P(A ∩ B_i) = P(A|B_i)P(B_i)          (3.52)

we immediately obtain

P(A) = P(A|B_1)P(B_1) + P(A|B_2)P(B_2) + ... + P(A|B_k)P(B_k)          (3.53)

Thus:

P(A) = Σ_{i=1}^{k} P(A|B_i)P(B_i)          (3.54)
an expression that is sometimes referred to as the Theorem of Total Probability, used to compute the total probability P(A) from P(A|B_i) and P(B_i).
The following example provides an illustration.
Example 3.10 TOTAL PROBABILITY
A company manufactures light bulbs of 3 different types (T1, T2, T3), some of which are defective right from the factory. From experience with the manufacturing process, it is known that the fraction of defective Type 1 bulbs is 0.1; Types 2 and 3 have respective defective fractions of 1/15 and 0.2.
A batch of 200 bulbs was sent to a quality control laboratory for testing: 100 Type 1, 75 Type 2, and 25 Type 3. What is the probability of finding a defective bulb?
Solution:
The supplied information may be summarized as follows. The prior conditional probabilities of defectiveness are

P(D|T1) = 0.1;  P(D|T2) = 1/15;  P(D|T3) = 0.2          (3.55)

and the number of bulbs of each type in the batch is

n(T1) = 100;  n(T2) = 75;  n(T3) = 25          (3.56)

Assuming equiprobable outcomes, this number distribution immediately implies the following:

P(T1) = 100/200 = 0.5;  P(T2) = 0.375;  P(T3) = 0.125          (3.57)

From the theorem of total probability, Eq (3.54), therefore,

P(D) = 0.1(0.5) + (1/15)(0.375) + 0.2(0.125) = 0.1          (3.58)
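A sketch of this total-probability calculation in Python (illustrative; the dictionary names are not from the text) is shown below; it reproduces the value obtained above.

    # Theorem of total probability, Eq (3.54), applied to Example 3.10
    P_D_given_T = {"T1": 0.1, "T2": 1/15, "T3": 0.2}    # defective fractions
    P_T = {"T1": 100/200, "T2": 75/200, "T3": 25/200}   # bulb-type probabilities

    P_D = sum(P_D_given_T[t] * P_T[t] for t in P_T)
    print(P_D)    # 0.1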
3.4.4 Bayes' Rule

With the sets B_1, B_2, ..., B_k and A as defined above, consider now the reverse conditional probability, P(B_i|A). By the definition of conditional probability,

P(B_i|A) = P(B_i ∩ A)/P(A)          (3.59)
but

P(B_i ∩ A) = P(A ∩ B_i) = P(A|B_i)P(B_i)          (3.60)

which, when substituted into (3.59), gives rise to a very important result:

P(B_i|A) = P(A|B_i)P(B_i) / Σ_{i=1}^{k} P(A|B_i)P(B_i)          (3.61)

This famous result, due to the Revd. Thomas Bayes (1763), is known as Bayes' Rule, and we will encounter it again in subsequent chapters. For now, it is an expression that can be used to compute the (unknown) a-posteriori probability P(B_i|A) of the events B_i from the a-priori probabilities P(B_i) and the (known) conditional probabilities P(A|B_i). It indicates that the unknown a-posteriori probability is proportional to the product of the a-priori probability and the known conditional probability we wish to reverse; the constant of proportionality is the reciprocal of the total probability of event A.
This result is the basis of an alternative approach to data analysis (discussed in Section 14.6 of Chapter 14) wherein available prior information is incorporated in a systematic fashion into the analysis of experimental data.
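As a small illustration of Eq (3.61) (a sketch added here, not from the text), the light-bulb data of Example 3.10 can be "reversed": given that a bulb is found defective, the posterior probabilities of it being each type are computed as follows.

    # Bayes' Rule, Eq (3.61), applied to the data of Example 3.10
    P_D_given_T = {"T1": 0.1, "T2": 1/15, "T3": 0.2}
    P_T = {"T1": 0.5, "T2": 0.375, "T3": 0.125}

    P_D = sum(P_D_given_T[t] * P_T[t] for t in P_T)               # total probability of a defect
    posterior = {t: P_D_given_T[t] * P_T[t] / P_D for t in P_T}   # a-posteriori probabilities
    print(posterior)   # {'T1': 0.5, 'T2': 0.25, 'T3': 0.25}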
3.5 Independence

For two events A and B, the conditional probability P(A|B) was defined earlier in Section 3.4.2. In general, this conditional probability will be different from the unconditional probability P(A), indicating that the knowledge that B has occurred affects the probability of the occurrence of A.
However, when the occurrence of B has no effect on the occurrence of A, then the events A and B are said to be independent and

P(A|B) = P(A)          (3.62)

so that the conditional and unconditional probabilities are identical. This will occur when

P(A ∩ B)/P(B) = P(A)          (3.63)

so that

P(A ∩ B) = P(A)P(B)          (3.64)

Thus, when events A and B are independent, the probability of the two events happening concurrently is the product of the probabilities of each one occurring by itself. Note that the expression in Eq (3.64) is symmetric in A and B, so that if A is independent of B, then B is also independent of A.
This is another in the collection of very important results used in the analysis of randomly varying phenomena. The concept extends directly to more than two events; for example, three events A, B and C are mutually independent if all of the following conditions hold:

P(A ∩ B) = P(A)P(B)          (3.65)
P(B ∩ C) = P(B)P(C)          (3.66)
P(A ∩ C) = P(A)P(C)          (3.67)
P(A ∩ B ∩ C) = P(A)P(B)P(C)          (3.68)
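The product condition in Eq (3.64) is easy to test by enumeration. The sketch below uses a two-dice example of my own choosing (not from the text): with two fair dice, the event "the first die is even" and the event "the sum is 7" turn out to be independent.

    from itertools import product

    omega = list(product(range(1, 7), repeat=2))    # 36 equally likely outcomes
    P = lambda event: len(event) / len(omega)

    A = [w for w in omega if w[0] % 2 == 0]         # first die shows an even number
    B = [w for w in omega if sum(w) == 7]           # the sum is 7
    AB = [w for w in A if w in B]                   # both occur

    print(P(AB), P(A) * P(B))    # 0.0833... and 0.0833...: Eq (3.64) holds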
3.6 Summary and Conclusions

This chapter has been primarily concerned with assembling the machinery of probability from the building blocks of events in the sample space Ω, the collection of all possible randomly varying outcomes of an experiment. We have seen how the probability of an event A arises naturally from the probability set function, an additive set function defined on the set Ω that satisfies the three axioms of Kolmogorov.

Having established the concept of probability and how the probability of any subset of Ω can be computed, a straightforward extension to special events restricted to conditioning sets in Ω led to the related concept of conditional probability. The idea of total probability, the result known as Bayes' rule, and especially the concept of independence all arise naturally from conditional probability and have profound consequences for random phenomena analysis that cannot be fully appreciated until much later.

We note in closing that the presentation of probability in this chapter (especially as a tool for solving problems involving randomly varying phenomena) is still quite rudimentary because the development is not quite complete yet. The final step in the development of the probability machinery, undertaken primarily in the next chapter, requires the introduction of the random variable, X, from which the analysis tool, the probability distribution function, f(x), emerges and is fully characterized.

Here are some of the main points of the chapter again:

Events, as subsets of the sample space, Ω, can be elementary (simple) or compound (complex); if elementary, then they are mutually exclusive; if compound, then they can be composed from several simple events.
The conditional probability of event A given event B is defined as P(A|B) = P(A ∩ B)/P(B).
REVIEW QUESTIONS
1. What are the five basic building blocks of probability theory as presented in Section 3.1? Define each one.
2. What is a simple (or elementary) event and how is it different from a complex (or compound) event?
3. Why are elementary events mutually exclusive?
4. What is the relationship between events and the sample space?
5. In the language of events, what does the empty set, ∅, represent? What does the entire sample space, Ω, represent?
6. Given two sets A and B, in the language of events, what do the following sets represent: Aᶜ; A ∪ B; and A ∩ B?
7. What does it mean that two events A and B are mutually exclusive?
8. What is a set function in general and what is an additive set function in particular?
9. What are the three fundamental properties of a probability set function (also known as Kolmogorov's axioms)?
10. How is the probability of any event A determined from the elementary events in Ω?
11. For any two sets A and B, what is the definition of P(A|B), the conditional probability of A given B? If the two sets are disjoint such that A ∩ B = ∅, in words, what does P(A|B) mean in this case?
12. How does one obtain total probability from partial (i.e., conditional) probabilities?
13. What is Bayes' rule and what is it used for?
14. Given P(A|B_i) and P(B_i), how does one reverse the probability to determine P(B_i|A)?
15. What does it mean for two events A and B to be independent?
16. What is P(A ∩ B) when two events A and B are (i) mutually exclusive, and (ii) independent?
EXERCISES
Section 3.1
3.1 Two dice, one black with white dots and the other white with black dots, are tossed once, simultaneously, and the number of dots shown on each die's top face after it comes to rest is recorded as an ordered pair (n_B, n_W), where n_B is the number on the black die, and n_W the number on the white die.
(i) Identify the experiment, what constitutes a trial, the outcomes, and the sample space.
(ii) If the sum of the numbers on the two dice is S, i.e.,

S = n_B + n_W,          (3.69)
disapprove, so that the outcome of one such opinion sample is the ordered triplet (n_0, n_1, n_2). Write mathematical expressions in terms of the numbers n_0, n_1, and n_2 for the following events:
(i) A = {Unanimous support for the policy}; and Aᶜ, the complement of A.
(ii) B = {More students disapprove than approve}; and Bᶜ.
(iii) C = {More students are indifferent than approve};
(iv) D = {The majority of students are indifferent}.
Section 3.2
3.3 Given the following two sets A and B:

A = {x : x = 1, 3, 5, 7, ...}          (3.70)
B = {x : x = 0, 2, 4, 6, ...}          (3.71)

find A ∪ B and A ∩ B.

3.4 Let A_k = {x : 1/(k + 1) ≤ x ≤ 1} for k = 1, 2, 3, .... Find the set B defined by:

B = A_1 ∪ A_2 ∪ A_3 ∪ ... = ⋃_{i=1}^{∞} A_i          (3.72)
3.5 For sets A, B, C, subsets of the universal set Ω, establish the following identities:

(A ∪ B)ᶜ = Aᶜ ∩ Bᶜ          (3.73)
(A ∩ B)ᶜ = Aᶜ ∪ Bᶜ          (3.74)
A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C)          (3.75)
A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C)          (3.76)

3.6 For every pair of sets A, B, subsets of the sample space Ω upon which the probability set function P(.) has been defined, prove that:

P(A ∪ B) = P(A) + P(B) − P(A ∩ B)          (3.77)
3.7 In a certain engineering research and development company, apart from the support staff, who number 25, all other employees are either engineers or statisticians or both. The total number of employees (including the support staff) is 100. Of these, 50 are engineers, and 40 are statisticians; the number of employees that are both engineers and statisticians is not given. Find the probability that an employee chosen at random is not one of those classified as being both an engineer and a statistician.
Section 3.3
3.8 For every set A, let the set function Q(.) be defined as follows:

Q(A) = Σ_{x∈A} f(x)          (3.78)

where

f(x) = (1/3)(2/3)^x;  x = 0, 1, 2, ...          (3.79)
evaluate P(A), P(Aᶜ), and P(A ∪ Aᶜ).
3.10 For the experiment of rolling two dice, one black with white dots and the other white with black dots, once, simultaneously, presented in Exercise 3.1, first obtain Ω, the sample space, and, by assigning equal probability to each of the outcomes, determine the probability of the following events:
(i) A = {n_B + n_W = 7}, i.e. the sum is 7;
(ii) B = {n_B < n_W};
(iii) Bᶜ, the complement of B;
(iv) C = {n_B = n_W}, i.e. the two dice show the same number;
(v) D = {n_B + n_W = 5 or 9}.

3.11 A black velvet bag contains three red balls and three green balls. Each experiment involves drawing two balls at once, simultaneously, and recording their colors, R for red, and G for green.
(i) Obtain the sample space, assuming that balls of the same color are indistinguishable.
(ii) Upon assigning equal probability to each element in the sample space, determine the probability of drawing two balls of different colors.
(iii) If the balls are distinguishable and numbered from 1 to 6, and if the two balls are drawn sequentially, not simultaneously, now obtain the sample space and from this determine the probability of drawing two balls of different colors.
3.12 An experiment is performed by selecting a card from an ordinary deck of 52 playing cards. The outcome, ω, is the type of card chosen, classified as: Ace, King, Queen, Jack, and others. The random variable X(ω) assigns the number 4 to the outcome if ω is an Ace; X(ω) = 3 if the outcome is a King; X(ω) = 2 if the outcome is a Queen, and X(ω) = 1 if the outcome is a Jack; X(ω) = 0 for all other outcomes.
(i) What is the space V of this random variable?
(ii) If the probability set function P(ω) defined on the subsets of the original sample space Ω assigns a probability 1/52 to each of these outcomes, describe the probability set function P_X(A) induced on all the subsets of the space V by this random variable.
(iii) Describe a physical (scientific or engineering) problem for which the above would be a good surrogate model.

3.13 Obtain the sample space, Ω, for the experiment involving tossing a fair coin 4 times. Upon assigning equal probability to each outcome, determine the probabilities of obtaining 0, 1, 2, 3, or 4 heads. Confirm that your result is consistent with the
postulate that the probability model for this phenomenon is given by the probability distribution function:

f(x) = [n!/(x!(n − x)!)] p^x (1 − p)^(n−x)          (3.81)

where f(x) is the probability of obtaining x heads in n = 4 tosses, and p = 1/2 is the probability of obtaining a head in a single toss of the coin. (See Chapter 8.)

3.14 In the fall of 2007, k students born in 1989 attended an all-freshman introductory general engineering class at the University of Delaware. Confirm that if p is the probability that at least two of the students have the same birthday, then:

1 − p = 365! / [(365 − k)! (365)^k]          (3.82)

Show that for a class with 23 or more students born in 1989, the probability of at least 2 students sharing the same birthday is more than 1/2, i.e., if k ≥ 23 then p > 1/2.
Sections 3.4 and 3.5
3.15 Six simple events, with probabilities P(E1) = 0.11; P(E2) = P(E5) = 0.20; P(E3) = 0.25; P(E4) = 0.09; P(E6) = 0.15, constitute the entire set of outcomes of an experiment. The following events are of interest:

A = {E1, E2};  B = {E2, E3, E4};  C = {E5, E6};  D = {E1, E2, E5}

Determine the following probabilities:
(i) P(A), P(B), P(C), P(D);
(ii) P(A ∪ B), P(A ∩ B); P(A ∪ D), P(A ∩ D); P(B ∪ C), P(B ∩ C);
(iii) P(B|A), P(A|B); P(B|C), P(D|C).
Which of the events A, B, C and D are mutually exclusive?
3.16 Assuming that giving birth to a boy or a girl is equally likely, and further, that no multiple births have occurred, first determine the probability of a family having
three boys in a row. Now consider the conjecture (based on empirical data) that, for
a family that has already had two boys in a row, the probability of having a third
boy is 0.8. Under these conditions, what is now the probability of a family having
three boys in a row?
3.17 As a follow-up to the concept of independence of two events A and B, event A is said to be attracted to event B if

P(A|B) > P(A)          (3.83)

and repelled by B if

P(A|B) < P(A)          (3.84)

(Of course, when P(A|B) = P(A), the two events have been previously identified as independent.) Establish the result that if B attracts A, then: (i) A attracts B
APPLICATION PROBLEMS
3.23 Patients suffering from manic depression and other similar disorders are sometimes treated with lithium, but the dosage must be monitored carefully because lithium toxicity, which is often fatal, can be difficult to diagnose. A new assay used to determine lithium concentration in blood samples is being promoted as a reliable way to diagnose lithium toxicity because the assay result is purported to correlate very strongly with toxicity.
A careful study of the relationship between this blood assay and lithium toxicity in 150 patients yielded the results summarized in Table 3.3. Here A⁺ indicates high lithium concentration in the blood assay and A⁻ indicates low lithium concentration; L⁺ indicates confirmed lithium toxicity and L⁻ indicates no lithium toxicity.
(i) From these data, compute the following probabilities regarding the lithium toxicity status of a patient chosen at random:
TABLE 3.3: Lithium toxicity study results

                Lithium Toxicity
Assay           L⁺        L⁻        Total
A⁺              30        17        47
A⁻              21        82        103
Total           51        99        150
1. P(L⁺), the probability that the patient has lithium toxicity (regardless of the blood assay result);
2. P(L⁺|A⁺), the conditional probability that the patient has lithium toxicity given that the blood assay result indicates high lithium concentration. What does this value indicate about the potential benefit of having this assay result available?
3. P(L⁺|A⁻), the conditional probability that the patient has lithium toxicity given that the blood assay result indicates low lithium concentration. What does this value indicate about the potential for missed diagnoses?
(ii) Compute the following probabilities regarding the blood lithium assay:
1. P(A⁺), the (total) probability of observing high lithium blood concentration (regardless of actual lithium toxicity status);
2. P(A⁺|L⁺), the conditional probability that the blood assay result indicates high lithium concentration given that the patient indeed has lithium toxicity. Why do you think that this quantity is referred to as the sensitivity of the assay, and what does the computed value indicate about the sensitivity of the particular assay in this study?
3. From information about P(L⁺) (as the prior probability of lithium toxicity), along with the just-computed values of P(A⁺) and P(A⁺|L⁺) as the relevant assay results, now use Bayes' Rule to compute P(L⁺|A⁺) as the posterior probability of lithium toxicity after obtaining assay data, even though it has already been computed directly in (i) above.
3.24 An experimental crystallizer produces five different polymorphs of the same crystal via mechanisms that are currently not well understood. Types 1, 2 and 3 are approved for pharmaceutical application A; Types 2, 3 and 4 for a different application B; Type 5 is mostly unstable and has no known application. How much of each type is made in any batch varies randomly, but with the current operating procedure, 30% of the total product made by the crystallizer in a month is of Type 1; 20% is of Type 2, with the same percentage for Types 3 and 4; and 10% is of Type 5. Assuming that the polymorphs can be separated without loss,
(i) Determine the probability of making product in a month that can be used for application A;
(ii) Given a batch ready to be shipped for application B, what is the probability that any crystal selected at random is of Type 2? What is the probability that it is of Type 3 or Type 4? State any assumptions you may need to make.
(iii) What is the probability that an order change to one for application A can be filled from a batch ready to be shipped for application B?
(iv) What is the converse probability that an order change to one for application B can be filled given a batch that is ready to be shipped for application A?
3.25 A test for a relatively rare disease involves taking from the patient an appropriate tissue sample which is then assessed for abnormality. A few sources of error are associated with this test. First, there is a small, but non-zero, probability, s, that the tissue sampling procedure will miss abnormal cells, primarily because these cells (at least in the earlier stages), being relatively few in number, are randomly distributed in the tissue and tend not to cluster. In addition, during the examination of the tissue sample itself, there is a probability, f, of failing to identify an abnormality when present; and a probability, m, of misclassifying a perfectly normal cell as abnormal.
If the proportion of the population with this disease who are subjected to this test is D,
(i) In terms of the given parameters, determine the probability that the test result is correct. (Hint: first compute the probability that the test result is incorrect, keeping in mind that the test may identify an abnormal cell incorrectly as normal, or a normal cell as abnormal.)
(ii) Determine the probability of a false positive (i.e., returning an abnormality result when none exists).
(iii) Determine the probability of a false negative (i.e., failing to identify an abnormality that is present).

3.26 Repeat Problem 3.25 for the specific values of s = 0.1; f = 0.05; m = 0.1, for a population in which 2% have the disease. A program sponsored by the Center for Disease Control (CDC) is to be aimed at reducing the number of false positives and/or false negatives by reducing one of the three probabilities s, f, and m. Which of these parameters would you recommend and why?
3.27 A manufacturer of flat-screen TVs purchases pre-cut glass sheets from three different manufacturers, M1, M2 and M3, whose products are characterized in the TV manufacturer's incoming material quality control lab as premier grade, Q1, acceptable grade, Q2, or marginal grade, Q3, on the basis of objective, measurable quality criteria, such as inclusions, warp, etc. Incoming glass sheets deemed unacceptable are rejected and returned to the manufacturer. An incoming batch of 425 accepted sheets has been classified by an automatic classifying system as shown in the table below.

                            Quality
Manufacturer    Premier Q1    Acceptable Q2    Marginal Q3    Total
M1              110           25               15             150
M2              150           33               2              185
M3              76            13               1              90
(ii) Determine the probability that it is of premier grade given that it is from manufacturer M1; also determine the probability that it is of premier grade given that it is from either manufacturer M2 or M3.
(iii) Determine the probability that it is from manufacturer M3 given that it is of
marginal grade; also determine the probability that it is from manufacturer M2
given that it is of acceptable grade.
3.28 In a 1984 report², the IRS published the information shown in the following table regarding 89.9 million federal tax returns it received, the income bracket of the filers, and the percentage audited.

Income Bracket           Number of filers (millions)    Percent Audited
Below $10,000            31.4                            0.34
$10,000 - $24,999        30.7                            0.92
$25,000 - $49,999        22.2                            2.05
$50,000 and above        5.5                             4.00

(i) Determine the probability that a tax filer selected at random from this population would be audited.
(ii) Determine the probability that a tax filer selected at random is in the $25,000 - $49,999 income bracket and was audited.
(iii) If we know that a tax filer selected at random was audited, determine the probability that this person belongs in the $50,000 and above income bracket.
2 Annual Report of Commissioner and Chief Counsel, Internal Revenue Service, U.S.
Department of Treasury, 1984, p 60.
Chapter 4
Random Variables and Distributions
Even though the machinery of probability as presented thus far can already be used to solve some practical problems, its development is far from complete. In particular, with a sample space of raw outcomes that can be anything from attributes and numbers to letters and other sundry objects, this most basic form of probability will be quite tedious and inefficient in dealing with general random phenomena. This chapter and the next one are devoted to completing the development of the machinery of probability with the introduction of the concept of the random variable, from which arises the probability distribution function, an efficient mathematical form for representing the ensemble behavior of general random phenomena. The emergence, properties and characteristics of the probability distribution function are discussed extensively in this chapter for single-dimensional random variables; the discussion is generalized to multi-dimensional random variables in the next chapter.
4.1
4.1.1
In general, the sample space Ω presented thus far may be quite tedious to describe and inefficient to analyze mathematically if its elements are not numbers. To facilitate mathematical analysis, it is desirable to find a means of converting this sample space into one with real numbers. This is achieved via the vehicle of the random variable, defined as follows:

A random variable X is a function that assigns to each element ω of the sample space Ω one, and only one, real number, X(ω) = x.

Upon the introduction of this entity, X, the following happens (see Fig 4.1):

1. Ω is mapped onto V, i.e.

V = {x : X(ω) = x, ω ∈ Ω}          (4.1)

so that V is the set of all values x generated from X(ω) = x for all elements ω in the sample space Ω;

2. The probability set function encountered before, P, defined on Ω, gives rise to another probability set function, P_X, defined on V and induced by X. P_X is therefore often referred to as an induced probability set function.

The role of P_X in V is identical to that of P in Ω. Thus, for any arbitrary subset A of V, P_X(A) is the probability of event A occurring.

The primary question of practical importance may now be stated as follows: How does one find P_X(A) in the new setting created by the introduction of the random variable X, given the original sample space Ω, and the original probability set function P defined on it?

The answer is to go back to what we know, i.e., to find that set A*
FIGURE 4.1: The original sample space, Ω, and the corresponding space V induced by the random variable X
which corresponds to the set of values of ω in Ω that are mapped by X into A, i.e.

A* = {ω : ω ∈ Ω and X(ω) ∈ A}          (4.2)

Such a set A* is called the pre-image of A, that set on the original sample space from which A is obtained when X is applied on its elements (see Fig 4.1). We now simply define

P_X(A) = P(A*)          (4.3)

since, by definition of A*,

P{X(ω) ∈ A} = P{ω ∈ A*}          (4.4)

from where we see how X induces P_X(.) from the known P(.). It is easy to show that the induced P_X is an authentic probability set function in the spirit of Kolmogorov's axioms.

Remarks:
1. The random variable is X; the value it takes is the real number x. The one is a completely different entity from the other.
2. The expression P(X = x) will be used to indicate the probability that the application of the random variable X results in an outcome ω with assigned value x; or, more simply, the probability that the random variable X takes on a particular value x. As such, X = x should not be confused with the familiar arithmetic statement of equality or equivalence.
3. In many instances, the starting point is the space V and not the tedious sample space Ω, with P_X(.) already defined, so that there is no further need for reference to a P(.) defined on Ω.
For the three coin-toss experiment of Example 4.1, with X defined as the total number of tails, the random variable space is

V = {0, 1, 2, 3}          (4.7)

since these are all the possible values that X can take.
(2) To obtain P_X(A), first we find A*, the pre-image of A in Ω. In this case,

A* = {ω_5, ω_6, ω_7}          (4.8)

so that, upon recalling the probability set function P(.) generated in Chapter 3 on the assumption of equiprobable outcomes, we obtain P(A*) = 3/8; hence,

P_X(A) = P(A*) = 3/8          (4.9)
The next two examples illustrate sample spaces that occur naturally in the
form of V .
Example 4.2 SAMPLE SPACE FOR SINGLE DIE TOSS EXPERIMENT
Consider an experiment in which a single die is thrown and the outcome is the number that shows up on the die's top face when it comes to rest. Obtain the sample space of all possible outcomes.
Solution:
The required sample space is the set {1, 2, 3, 4, 5, 6}, since this set of numbers is an exhaustive collection of all the possible outcomes of this experiment. Observe that this is a set of real numbers, so that it is already in the form of V. We can therefore define a probability set function directly on it, with no further need to obtain a separate V and an induced P_X(.).
As an exercise (see Exercise 4.7), the reader should compute the probability P_X(A) of the event A that X = 7, assuming equiprobable outcomes for each die toss.
4.1.2 Practical Considerations
Rigor and precision are intrinsic to mathematics and mathematical analysis; without the former, the latter simply cannot exist. Such is the case with the mathematical concept of the random variable as we have just presented it: rigor demands that X be specified in this manner, as a function through whose agency each element of the sample space of an experiment becomes associated with an unambiguous numerical value. As illustrated in Fig 4.1, X therefore appears as a mapping from one space, Ω, that can contain all sorts of raw objects, into one that is more conducive to mathematical analysis, V, containing only real numbers. Such a formal definition of the random variable tends to appear stiff, and almost sterile; and those encountering it for the first time may be unsure of what it really means in practice.

As a practical matter, the random variable may be considered (informally) as an experimental outcome whose numerical value is subject to random variations with each exact replicate performance (trial) of the experiment. Thus, for example, with the three coin-toss experiment discussed earlier, by specifying the outcome of interest as the total number of tails observed, we see right away that the implied random variable can take on the numerical values 0, 1, 2, or 3, even though the raw outcomes will consist of T's and H's; also, what value the random variable takes is subject to random variation each time the experiment is performed. In the same manner, we see that in attempting to determine the temperature of an equilibrium mixture of ice and water, the observed temperature measurement in °C takes on numerical values that vary randomly around the number 0.
4.1.3
observe that the random variable space V in this case is given by:

V = {x : 0 ≤ x ≤ 1}.          (4.14)
Note that the two component random variables X1 and X2 are not independent, since their sum, X1 + X2, by virtue of the experiment, is constrained to always equal 3.
What is noted briefly here for two dimensions can be generalized to n dimensions, and the next chapter is devoted entirely to a discussion of multi-dimensional random variables.
4.2 Distributions

4.2.1 Discrete Random Variables
Let us return once more to Example 4.1 and, this time, for each element of V, compute P(X = x), and denote this by f(x); i.e.

f(x) = P(X = x)          (4.17)

In this case,

f(0) = P(X = 0) = 1/8          (4.18)
f(1) = P(X = 1) = 3/8          (4.19)
Likewise,
f (2) = P (X = 2) = 3/8
(4.20)
f (3) = P (X = 3) = 1/8
(4.21)
This function, f (x), indicates how the probabilities are distributed over the
entire random variable space.
Of importance also is a different, but related, function, F(x), defined as:

F(x) = P(X ≤ x)          (4.22)

the probability that the random variable X takes on values less than or equal to x. For the specific example under consideration, we have F(0) = P(X ≤ 0) = 1/8. As for F(1) = P(X ≤ 1), since the event A = {X ≤ 1} consists of two mutually exclusive elementary events A0 = {X = 0} and A1 = {X = 1}, it then follows that:

F(1) = P(X ≤ 1) = P(X = 0) + P(X = 1) = 1/8 + 3/8 = 4/8          (4.23)
Similarly,

F(2) = P(X ≤ 2) = 7/8          (4.24)
F(3) = P(X ≤ 3) = 8/8 = 1          (4.25)

These results are summarized, along with f(x), in Table 4.1.
TABLE 4.1: f(x) and F(x) for the three coin-toss experiment of Example 4.1

x    f(x)    F(x)
0    1/8     1/8
1    3/8     4/8
2    3/8     7/8
3    1/8     8/8
The function, f (x), is referred to as the probability distribution function
(pdf), or sometimes as the probability mass function; F (x) is known as the
cumulative distribution function, or sometimes simply as the distribution function.
Note, once again, that X can assume only a finite number of discrete values, in this case, 0, 1, 2, or 3; it is therefore a discrete random variable,
and both f (x) and F (x) are discrete functions. As shown in Fig 4.2, f (x) is
characterized by non-zero spikes at values of x = 0, 1, 2 and 3, and F (x) by
the indicated staircase form.
97
1.0
0.35
0.8
0.30
F(x)
f(x)
0.6
0.25
0.4
0.20
0.2
0.15
0.0
0.10
0.0
0.5
1.0
1.5
x
2.0
2.5
3.0
2
x
FIGURE 4.2: Probability distribution function, f (x), and cumulative distribution function, F (x), for 3-coin toss experiment of Example 4.1
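The staircase values of F(x) are simply running sums of f(x); a minimal sketch of this relationship (added here for illustration, not from the text) is:

    # Build F(x) from f(x) for the three coin-toss random variable of Table 4.1
    f = {0: 1/8, 1: 3/8, 2: 3/8, 3: 1/8}

    F, running = {}, 0.0
    for x in sorted(f):
        running += f[x]     # cumulative sum of the pdf
        F[x] = running

    print(F)    # {0: 0.125, 1: 0.5, 2: 0.875, 3: 1.0}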
Let x_0 = 0, x_1 = 1, x_2 = 2, x_3 = 3; then

P(X = x_i) = f(x_i) for i = 0, 1, 2, 3          (4.26)

with

f(x_i) = 1/8 for x_0 = 0;  3/8 for x_1 = 1;  3/8 for x_2 = 2;  1/8 for x_3 = 3          (4.27)

and the two functions in Table 4.1 are related explicitly according to the following expression:

F(x_i) = Σ_{j=0}^{i} f(x_j)          (4.28)

We may now also note the following about the function f(x_i):

f(x_i) > 0 for all x_i;   Σ_{i=0}^{3} f(x_i) = 1
These ideas may now be generalized beyond the specic example used above.
Definition: Let there exist a sample space Ω (along with a probability set function, P, defined on its subsets), and a random variable X, with an attendant random variable space V. A function f defined on V such that:
1. f(x) ≥ 0, for all x ∈ V;
2. Σ_x f(x) = 1, for x ∈ V;
3. P_X(A) = Σ_{x∈A} f(x), for A ⊂ V (and when A contains the single element x_i, P_X(X = x_i) = f(x_i));
is called the probability distribution function of the discrete random variable X.
4.2.2 Continuous Random Variables
Definition: The function f defined on the space V (whose elements consist of segments of the real line) such that:
1. f(x) ≥ 0, for all x ∈ V;
2. f has at most a finite number of discontinuities in every finite interval;
3. the (Riemann) integral ∫_V f(x)dx = 1;
4. P_X(A) = ∫_A f(x)dx, for A ⊂ V;
is called a probability density function of the continuous random variable X.

(The second point above, unnecessary for the discrete case, is a mathematical fine point needed to safeguard against pathological situations where the
from where we may now observe that when F(x) possesses a derivative,

dF(x)/dx = f(x)          (4.30)

This f(x) is the continuous counterpart of the discrete f(x) encountered earlier; but rather than express the probability that X takes on a particular point value x_i (as in the discrete case), the continuous f(x) expresses a measure of the probability that X lies in the infinitesimal interval between x_i and x_i + dx. Observe, from item 4 in the definition given above, that:

P(x_i ≤ X ≤ x_i + dx) = ∫_{x_i}^{x_i+dx} f(x)dx          (4.31)

Furthermore,

F(x + dx) = P(X ≤ x + dx) = P(X ≤ x) + P(x ≤ X ≤ x + dx)          (4.32)
          = F(x) + P(x ≤ X ≤ x + dx)          (4.33)

and therefore:

P(x ≤ X ≤ x + dx) = F(x + dx) − F(x)          (4.34)

which, upon introducing Eq (4.31) for the LHS, dividing by dx, and taking limits as dx → 0, yields:

lim_{dx→0} [F(x + dx) − F(x)]/dx = dF(x)/dx = f(x)          (4.35)

establishing Eq (4.30).
In general, we can use Eq (4.29) to establish that, for any arbitrary b ≥ a,

P(a ≤ X ≤ b) = ∫_a^b f(x)dx          (4.36)
For the sake of completeness, we note that F(x), the cumulative distribution function, is actually the more fundamental function for determining probabilities. This is because, regardless of whether X is continuous or discrete, F(.) can be used to determine all desired probabilities. Observe from the foregoing discussion that the expression

P(a_1 < X ≤ a_2) = F(a_2) − F(a_1)          (4.37)
4.2.3
We have now seen that the pdf f(x) (or equivalently, the cdf F(x)) is the function that indicates how the probabilities of occurrence of various outcomes and events arising from the random phenomenon in question are distributed over the entire space of the associated random variable X.

Let us return once more to the three coin-toss example: we understand that the random phenomenon in question is such that we cannot predict, a-priori, the specific outcome of each experiment; but from the ensemble aggregate of all possible outcomes, we have been able to characterize, with f(x), the behavior of an associated random variable of interest, X, the total number of tails obtained in the experiment. (Note that other random variables could also be defined for this experiment: for example, the total number of heads, or the number of tosses until the appearance of the first head, etc.) What Table 4.1 provides is a complete description of the probability of occurrence for the entire collection of all possible events associated with this random variable, a description that can now be used to analyze the particular random phenomenon of the total number of tails observed when a coin is tossed three times.

For instance, the pdf f(x) indicates that, even though we cannot predict a specific outcome precisely, we now know that after each experiment, observing no tails (X = 0) is just as likely as observing all tails (X = 3), each with a probability of 1/8. Also, observing two tails is just as likely as observing one tail, each with a probability of 3/8, so that this latter group of events is three times as likely as the former group of events. Note the symmetry of the distribution of probabilities indicated by f(x) for this particular random phenomenon.

It turns out that these specific results can be generalized for the class of random phenomena to which the three coin-toss example belongs, a class characterized by the following features:
f(x) = [n!/(x!(n − x)!)] p^x (1 − p)^(n−x);  x = 0, 1, 2, ..., n          (4.38)

The results in Table 4.1 are obtained for the special case n = 3; p = 0.5.
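This special case is easy to verify numerically; the sketch below (illustrative, using Python's math.comb) evaluates Eq (4.38) for n = 3 and p = 0.5 and reproduces the f(x) column of Table 4.1.

    from math import comb

    def f(x, n, p):
        """Eq (4.38): probability of exactly x 'successes' in n trials."""
        return comb(n, x) * p**x * (1 - p)**(n - x)

    print([f(x, 3, 0.5) for x in range(4)])   # [0.125, 0.375, 0.375, 0.125]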
Such functions as these provide convenient and compact mathematical representations of the desired ensemble behavior of random variables; they constitute the centerpiece of the probabilistic framework, the fundamental tool used for analyzing random phenomena.
We have, in fact, already encountered in earlier chapters several actual pdfs for some real-world random variables. For example, we had stated in Chapter 1 (thus far without justification) that the continuous random variable representing the yield obtained from the example manufacturing processes has the pdf:

f(x) = [1/(σ√(2π))] e^(−(x−μ)²/(2σ²));  −∞ < x < ∞          (4.39)

We are able to use this pdf to compute the probabilities of obtaining yields in various intervals on the real line for the two contemplated processes, once the parameters μ and σ are specified for each process.

We had also stated in Chapter 1 that, for the (discrete) random variable X representing the number of inclusions found on the manufactured glass sheet, the pdf is:

f(x) = (e^(−λ) λ^x)/x!;  x = 0, 1, 2, ...          (4.40)

from which, again, given a specific value for the parameter λ, we are able to compute the probabilities of finding any given number of inclusions on any selected glass sheet. And in Chapter 2, we showed, using chemical engineering principles, that the pdf for the (continuous) random variable X, representing the residence time in an ideal CSTR, is given by:

f(x) = (1/τ) e^(−x/τ);  0 < x < ∞          (4.41)
These pdfs are all ideal models of the random variability associated with each of the random variables in question; they make possible rigorous and precise mathematical analyses of the ensemble behavior of the respective random phenomena. Such mathematical representations are systematically derived for actual, specific real-world phenomena of practical importance in Part III, where the resulting pdfs are also discussed and analyzed extensively.
The rest of this chapter is devoted to taking a deeper look at the fundamental characteristics and general properties of the pdf, f(x), for single-dimensional random variables; the next chapter is devoted to a parallel treatment for multi-dimensional random variables.
4.3 Mathematical Expectation

We begin our investigations into the fundamental characteristics of a random variable, X, and its pdf, f(x), with one of the most important: the mathematical expectation, or expected value. As will soon become clear, the concept of expectations of random variables (or functions of random variables) is of significant practical importance; but before giving a formal definition, we first provide a motivation and an illustration of the concept.
4.3.1
Consider a game where each turn involves a player drawing a ball at random from a black velvet bag containing 9 balls, identical in every way except
that 5 are red, 3 are blue and one is green. The player receives $1.00 for
drawing a red ball, $4.00 for a blue ball, and $10.00 for the green ball, but
each turn at the game costs $4.00 to play. The question is: Is this game worth
playing?
The primary issue, of course, is the random variation in the color of the
drawn ball each time the game is played. Even though simple and somewhat
artificial, this example provides a perfect illustration of how to solve problems
involving random phenomena using the probabilistic framework.
To arrive at a rational decision regarding whether to play this game or
not, we proceed as follows, noting rst the following characteristics of the
phenomenon in question:
Experiment : Draw a ball at random from a bag containing 9 balls composed as given above; note the color of the drawn ball, then replace the
ball;
Outcome: The color of the drawn ball: R = Red; B = Blue; G = Green.
Probabilistic Model Development
TABLE 4.2: The pdf f(x) for the ball-drawing game

x     f(x)
1     5/9
4     3/9
10    1/9
From the problem definition, we see that the sample space is given by:

Ω = {R, R, R, R, R, B, B, B, G}          (4.42)
The random variable, X, is clearly the monetary value assigned to the outcome
of each draw; i.e. in terms of the formal denition, X assigns the real number
1 to R, 4 to B, and 10 to G. (Informally, we could just as easily say that X is
the amount of money received upon each draw.) The random variable space
V is therefore given by:
V = {1, 4, 10}
(4.43)
And now, since there is no reason to think otherwise, we assume that each outcome is equally probable, in which case the probability distribution for the random variable X is obtained as follows:

P_X(X = 1) = P(R) = 5/9          (4.44)
P_X(X = 4) = P(B) = 3/9          (4.45)
P_X(X = 10) = P(G) = 1/9          (4.46)

so that f(x), the pdf for this discrete random variable, is as shown in Table 4.2, or, mathematically, as:

f(x_i) = 5/9 for x_1 = 1;  3/9 for x_2 = 4;  1/9 for x_3 = 10;  0 otherwise          (4.47)
This is an ideal model of the random phenomenon underlying this game; it
will now be used to analyze the problem and to decide rationally whether to
play the game or not.
Using the Model
We begin by observing that this is a case where it is possible to repeat the
experiment a large number of times; in fact, this is precisely what the person
setting up the game wants each player to do: play the game repeatedly! Thus,
if the game is played a very large number of times, say n, it is reasonable from
the model to expect 5n/9 red ball draws, 3n/9 blue ball draws, and n/9 green
ball draws; the corresponding financial returns will be $(5n/9), $(4 × 3n/9), and $(10 × n/9), respectively, in each case.
Observe now that after n turns at the game, we would expect the total financial returns in dollars, say R_n, to be:

R_n = 1 × (5n/9) + 4 × (3n/9) + 10 × (n/9) = 3n          (4.48)
TABLE 4.3: Expected returns from the ball-drawing game

Ball Color    Expected # of times drawn (after n trials)    Financial returns per draw    Expected financial returns (after n trials)
Red           5n/9                                          $1                            $5n/9
Blue          3n/9                                          $4                            $12n/9
Green         n/9                                           $10                           $10n/9
Total                                                                                     $3n
In the meantime, the total cost C_n, the amount of money, in dollars, paid out to play the game, would have been 4n. On the basis of these calculations, therefore, the expected net gain (in dollars) after n trials, G_n, is given by

G_n = R_n − C_n = −n          (4.49)

indicating a net loss of $n, so that the rational decision is not to play the game. (The house always wins!)
Eq (4.48) implies that the expected return per draw will be:

R_n/n = 1 × (5/9) + 4 × (3/9) + 10 × (1/9) = 3,          (4.50)

which may be written in general as

R_n/n = Σ_i x_i f(x_i)          (4.51)
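A sketch of this expected-return computation (not from the text) follows; it also shows the expected net change per turn once the $4.00 cost of playing is included.

    # Expected return per draw, Eq (4.50)/(4.51), for the ball-drawing game
    f = {1: 5/9, 4: 3/9, 10: 1/9}     # pdf from Table 4.2 / Eq (4.47)

    expected_return = sum(x * fx for x, fx in f.items())
    cost_per_turn = 4.0
    print(expected_return, expected_return - cost_per_turn)   # 3.0 and -1.0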
4.3.2
(1) For the three coin-toss experiment of Example 4.1, for instance, E(X) = 0(1/8) + 1(3/8) + 2(3/8) + 3(1/8) = 1.5, indicating that with this experiment, the expected, or average, number of tails per toss is 1.5, which makes perfect sense.
(2) The expected financial return for the ball-draw game is obtained formally from Eq (4.47) as:

E(X) = (1 × 5/9 + 4 × 3/9 + 10 × 1/9) = 3.0          (4.58)
(1) Find the expected value of the continuous random variable X whose pdf is given by:

f(x) = x/2 for 0 < x < 2;  0 otherwise          (4.59)

(2) Find the expected value of the random variable, X, the residence time in a CSTR, whose pdf f(x) is given in Eq (4.41).
Solution:
(1) First, we observe that Eq (4.59) is a legitimate pdf because

∫_0^2 f(x)dx = ∫_0^2 (x/2)dx = [x²/4]_0^2 = 1          (4.60)

and, by definition,

E(X) = ∫_0^2 x f(x)dx = ∫_0^2 (x²/2)dx = [x³/6]_0^2 = 4/3          (4.61)

(2) In the case of the residence time,

E(X) = ∫_0^∞ x (1/τ)e^(−x/τ)dx = (1/τ) ∫_0^∞ x e^(−x/τ)dx          (4.62)
An important property of the mathematical expectation of a random variable X is that for any function of this random variable, say G(X),

E[G(X)] = Σ_i G(x_i)f(x_i)   for discrete X
E[G(X)] = ∫ G(x)f(x)dx       for continuous X          (4.64)
In particular, if

G(X) = c_1 X + c_2          (4.65)

where c_1 and c_2 are constants, then from Eq (4.64) above, in the discrete case, we have that:

E(c_1 X + c_2) = Σ_i (c_1 x_i + c_2)f(x_i) = c_1 Σ_i x_i f(x_i) + c_2 Σ_i f(x_i) = c_1 E(X) + c_2          (4.66)

so that:

E(c_1 X + c_2) = c_1 E(X) + c_2          (4.67)
4.4 Characterizing Distributions

4.4.1 Moments of a Distribution
Consider the function

G(X) = X^k          (4.68)

for any integer k. The expectation of this function is known as the kth (ordinary) moment of the random variable X (or, equivalently, the kth (ordinary) moment of the pdf, f(x)), defined by:

m_k = E[X^k]          (4.69)

In particular, the first ordinary moment is

m_1 = E[X]          (4.70)

Thus, the expected value of X, E(X), is also the same as the first (ordinary) moment of X; it is commonly known as the mean value of X and is denoted by μ.
Similarly, consider the function

G(X) = (X − a)^k    (4.71)

for any constant value a and integer k. The expectation of this function is known as the k-th moment of the random variable X about the point a (or, equivalently, the k-th moment of the pdf, f(x), about the point a). Of particular interest are the moments about the mean value μ, defined by:

μ_k = E[(X − μ)^k]    (4.72)
known as the central moments of the random variable X (or of the pdf, f (x)).
Observe from here that μ_0 = 1 and μ_1 = 0, always, regardless of X or μ; these therefore provide no particularly useful information regarding the characteristics of any particular X. However, provided that the conditions of absolute
convergence and absolute integrability hold, the higher central moments exist
and do in fact provide very useful information about the random variable X
and its distribution.
Second Central Moment: Variance
Observe from above that the quantity
μ_2 = E[(X − μ)²]    (4.73)

is the lowest central moment of the random variable X that contains any meaningful information about the average deviation of a random variable from its mean value. It is called the variance of X and is sometimes represented as σ²(X); its positive square root, σ, is the standard deviation of X. The ratio of the standard deviation to the mean,

Cv = σ/μ    (4.80)

known as the coefficient of variation, provides a dimensionless measure of the relative amount of variability displayed by the random variable.
Third Central Moment: Skewness

The third central moment,

μ_3 = E[(X − μ)³]    (4.81)

is a measure of the asymmetry of the distribution about its mean and is known as the skewness; its scaled, dimensionless version, γ_3 = μ_3/σ³, known as the coefficient of skewness, is often the more commonly used measure precisely because it is dimensionless. For a perfectly symmetric distribution, negative deviations from the mean exactly counterbalance positive deviations, and both μ_3 and γ_3 vanish.
When there are more values of X to the left of the mean than to the right (i.e., when negative deviations from the mean dominate), μ_3 < 0 (as is γ_3), and the distribution is said to skew left or is negatively skewed. Such
distributions will have long left tails, as illustrated in Fig 4.3. An example
random variable with this characteristic is the gasoline-mileage (in miles per
gallon) of cars in the US. While many cars get relatively high gas-mileage,
there remains a few classes of cars (SUVs, Hummers, etc) with gas-mileage
much worse than the ensemble average. It is this latter class that contribute
to the long left tail.
On the other hand, when there are more values of X to the right of the
mean than to the left, so that positive deviations from the mean dominate, both μ_3 and γ_3 are positive, and the distribution is said to skew right
or is positively skewed. As one would expect, such distributions will have
long right tails (see Fig 4.4). An example of this class of random variables
is the household income/net-worth in the US. While the vast majority of
household incomes/net-worth are moderate, the few truly super-rich whose
incomes/net-worth are a few orders of magnitude larger than the ensemble
average are responsible for the long right tail.

FIGURE 4.3: A negatively skewed distribution, showing a long left tail

FIGURE 4.4: A positively skewed distribution, showing a long right tail
Fourth Central Moment: Kurtosis

The fourth central moment, μ_4 = E[(X − μ)⁴], or its scaled, dimensionless version,

γ_4 = μ_4/σ⁴    (4.83)

technically known as the coefficient of kurtosis, is simply called the kurtosis. Either quantity is a measure of how peaked or flat a probability distribution is. A high-kurtosis random variable has a distribution with a sharper peak and thicker tails; a low-kurtosis random variable, on the other hand, has a distribution with a more rounded, flatter peak, with broader shoulders.

FIGURE 4.5: Distributions with reference kurtosis (solid line) and mild kurtosis (dashed line)
For reasons discussed later, the value γ_4 = 3 is the accepted normal reference for kurtosis, so that distributions for which γ_4 < 3 are said to be platykurtic (mildly peaked) while those for which γ_4 > 3 are said to be leptokurtic (sharply peaked). Figures 4.5 and 4.6 show a reference distribution with kurtosis γ_4 = 3, in the solid lines, compared to a distribution with mild kurtosis (actually γ_4 = 1.8) (dashed line in Fig 4.5), and a distribution with high kurtosis (dashed line in Fig 4.6).
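For readers who prefer to see these descriptors in action, the following sketch estimates the mean, coefficient of variation, and the coefficients of skewness and kurtosis from a large random sample of an exponential distribution (chosen here only because it is conveniently right-skewed; the sample size and scale are arbitrary).

```python
# Sample-based estimates of Cv, gamma_3 (skewness), and gamma_4 (kurtosis).
import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(scale=2.0, size=200_000)    # a right-skewed example

mu = x.mean()
mu2 = np.mean((x - mu) ** 2)     # variance (second central moment)
mu3 = np.mean((x - mu) ** 3)     # third central moment
mu4 = np.mean((x - mu) ** 4)     # fourth central moment
sigma = np.sqrt(mu2)

print("Cv      :", sigma / mu)           # coefficient of variation
print("gamma_3 :", mu3 / sigma**3)       # about 2 (skews right)
print("gamma_4 :", mu4 / sigma**4)       # about 9 (leptokurtic, > 3)
```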
Practical Applications
Of course, it is possible to compute as many moments (ordinary or central) of a random variable as desired; in practice, however, only the first few are routinely used to characterize a distribution.
FIGURE 4.6: Distributions with reference kurtosis (solid line) and high kurtosis (dashed line)
Finally, we note that moments of a random variable are not merely interesting theoretical characteristics; they have signicant practical applications.
For example, polymers, being macromolecules with non-uniform molecular
weights (because random events occurring during the manufacturing process
ensure that polymer molecules grow to varying sizes) are primarily characterized by their molecular weight distributions (MWDs). Not surprisingly, therefore, the performance of a polymeric material depends critically on its MWD:
for instance, with most elastomers, a narrow distribution (very low second
central moments) is associated with poor processing but superior mechanical
properties.
MWDs are so important in polymer chemistry and engineering that a wide
variety of analytical techniques have been developed for experimental determination of the MWD and the following special molecular weight averages
that are in common use:
1. Mn , the number average molecular weight, is the ratio of the rst (ordinary) moment to the zeroth ordinary moment. (In polymer applications,
the MWD, unlike a pdf f (x), is not normalized to sum or integrate to
1. The zeroth moment of the MWD is therefore not 1; it is the total
number of molecules present in the sample of interest.)
2. Mw , the weight average molecular weight, is the ratio of the second
moment to the rst moment; and
3. Mz , the so-called z average molecular weight, is the ratio of the third
moment to the second.
One other important practical characteristic of the polymeric material is its
polydispersity index, PDI, the ratio of Mw to Mn . A measure of the breadth
of the MWD, it is always > 1 and approximately 2 for most linear polymers;
for highly branched polymers, it can be as high as 20 or even higher.
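The three averages and the PDI defined above follow directly from the moments of the (unnormalized) MWD. The sketch below uses an entirely hypothetical discrete MWD, N[i] molecules of molecular weight M[i]; the numbers are made up for illustration and are not data from the text.

```python
# Mn, Mw, Mz and PDI as ratios of successive ordinary moments of the MWD.
import numpy as np

M = np.array([10_000., 20_000., 50_000., 100_000., 200_000.])  # molecular weights
N = np.array([500., 1200., 2000., 800., 100.])                  # number of molecules

def moment(k):
    return np.sum(N * M**k)      # k-th ordinary moment (zeroth = total molecules)

Mn = moment(1) / moment(0)       # number-average molecular weight
Mw = moment(2) / moment(1)       # weight-average molecular weight
Mz = moment(3) / moment(2)       # z-average molecular weight
PDI = Mw / Mn                    # polydispersity index, always > 1

print(Mn, Mw, Mz, PDI)
```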
What is true of polymers is also true of particulate products such as granulated sugar, or fertilizer granules sold in bags. These products are made up
of particles with non-uniform sizes and are characterized by their particle size
distributions. The behavior of these products, whether it is their ow characteristics, or how they dissolve in solution, are determined by the moments of
these distributions.
4.4.2 Moment Generating Function

When G(X) in Eq (4.64) is given as G(X) = e^{tX}, where t is a real-valued variable, the resulting expectation, MX(t) = E[e^{tX}], is known as the moment generating function (MGF) of the random variable X; i.e.

MX(t) = Σ_i e^{t xi} f(xi);       for discrete X    (4.85)

MX(t) = ∫ e^{tx} f(x) dx;         for continuous X    (4.86)
Differentiating MX(t) with respect to t gives

d MX(t)/dt = E[X e^{tX}]    (4.88)

so that, for t = 0,

MX′(0) = E[X] = m_1    (4.89)

Differentiating once more,

d/dt E[X e^{tX}] = E[X² e^{tX}]    (4.90)

so that

MX″(0) = E[X²] = m_2    (4.91)

and, in general,

MX^{(n)}(0) = E[X^n] = m_n    (4.92)
(4.92)
X2 2 X3 3
t +
t +
2
3!
(4.93)
Clearly, this innite series converges only under certain conditions. For those
random variables, X, for which the series does not converge, MX (t) does not
exist; but when it exists, this series converges, and by repeated dierentiation
of Eq (4.93) with respect to t, followed by taking expectations, we are then
able to establish the result in Eq (4.92).
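A small symbolic sketch makes Eq (4.92) concrete. For the CSTR residence-time pdf f(x) = (1/τ)e^{−x/τ}, the MGF has the standard closed form 1/(1 − τt), valid for t < 1/τ (this closed form is assumed here for illustration; the text's own Example 4.7 is not reproduced). Differentiating at t = 0 recovers the ordinary moments.

```python
# Ordinary moments of the CSTR residence time obtained from its MGF.
import sympy as sp

t, tau = sp.symbols('t tau', positive=True)
M = 1 / (1 - tau * t)                    # MGF of f(x) = (1/tau) exp(-x/tau)

m1 = sp.diff(M, t, 1).subs(t, 0)         # first ordinary moment  -> tau
m2 = sp.diff(M, t, 2).subs(t, 0)         # second ordinary moment -> 2*tau**2
m3 = sp.diff(M, t, 3).subs(t, 0)         # third ordinary moment  -> 6*tau**3
print(m1, m2, m3)

variance = sp.simplify(m2 - m1**2)       # tau**2
print(variance)
```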
The following are some important properties of the MGF.
1. Uniqueness: The MGF, MX (t), does not exist for all random variables,
X; but when it exists, it uniquely determines the distribution, so that if
two random variables have the same MGF, they have the same distribution. Conversely, random variables with dierent MGFs have dierent
distributions.
4.4.3
Characteristic Function
As alluded to above, the MGF does not exist for all random variables, a
fact that sometimes limits its usefulness. However, a similarly dened function,
the characteristic function, shares all the properties of the MGF but does not
suer from this primary limitation: it exists for all random variables.
When G(X) in Eq (4.64) is given as:

G(X) = e^{jtX}    (4.100)

where j is the complex variable √(−1), then the function of the real-valued variable t defined as

φ_X(t) = E[e^{jtX}]    (4.101)
i.e.

φ_X(t) = Σ_i e^{jt xi} f(xi),      for discrete X
         ∫ e^{jtx} f(x) dx,        for continuous X          (4.102)

Observe that, since

e^{jtX} = cos(tX) + j sin(tX)    (4.103)

it follows that

|e^{jtX}| = √(cos²(tX) + sin²(tX)) = 1    (4.104)

so that E|e^{jtX}| = 1 < ∞, always, regardless of X, with the direct implication that φ_X(t) = E(e^{jtX}) always exists for all random variables. Thus,
anything one would have typically used the MGF for (e.g., for deriving limit
theorems in advanced courses in probability), one can always substitute the
CF when the MGF does not exist.
The reader familiar with Laplace transforms and Fourier transforms will
probably have noticed the similarities between the former and the MGF (see
Eq (4.86)), and between the latter and the CF (see Eq (4.102)). Furthermore, the relationship between these two probability functions is also reminiscent of the relationship between the two transforms: not all functions have Laplace
transforms; the Fourier transform, on the other hand, does not suer such
limitations.
We now state, without proof, that given the expression for the characteristic function in Eq (4.102), there is a corresponding inversion formula whereby f(x) is recovered from φ_X(t), given as follows:

f(x) = lim_{b→∞} (1/2b) ∫_{−b}^{b} e^{−jtx} φ_X(t) dt,     for discrete X
       (1/2π) ∫_{−∞}^{∞} e^{−jtx} φ_X(t) dt,               for continuous X          (4.105)
In fact, the two sets of equations, Eqs (4.102) and (4.105), are formal Fourier
transform pairs precisely as in other engineering applications of the theory of
Fourier transforms. These transform pairs are extremely useful in obtaining
the pdfs of functions of random variables, most especially sums of random
variables. As with classic engineering applications of the Fourier (and Laplace)
transform, the characteristic functions of the functions of independent random
variables in question are obtained rst, being easier to obtain directly than
the pdfs; the inversion formula is subsequently invoked to recover the desired
pdfs. This strategy is employed at appropriate places in upcoming chapters.
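As a simple illustration of Eq (4.102), the characteristic function of the CSTR residence-time pdf can be evaluated by direct numerical integration and compared against 1/(1 − jτt), its known closed form (assumed here purely as a check; τ = 5 is an arbitrary value).

```python
# Characteristic function of f(x) = (1/tau) exp(-x/tau) by numerical integration.
import numpy as np
from scipy.integrate import quad

tau = 5.0
f = lambda x: (1 / tau) * np.exp(-x / tau)

def cf(t):
    re, _ = quad(lambda x: np.cos(t * x) * f(x), 0, np.inf)
    im, _ = quad(lambda x: np.sin(t * x) * f(x), 0, np.inf)
    return re + 1j * im

for t in (0.0, 0.1, 0.5):
    print(cf(t), 1 / (1 - 1j * tau * t))   # the two columns should agree
```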
4.4.4
Apart from the mean, variance and other higher moments noted above,
there are other characteristic attributes of importance.
The mode, x*, is the value of the random variable at which the pdf attains a (local) maximum, as illustrated in Fig 4.7.

FIGURE 4.7: A distribution with its mode located at x* = 1

The median, x_m, is the value of the random variable for which

F(x_m) = ∫_{−∞}^{x_m} f(x) dx = 0.5    (4.109)

(For the discrete random variable, replace the integral above with appropriate
sums.) Observe therefore that the median, x_m, divides the total range of the random variable into two parts with equal probability.

FIGURE 4.8: The cdf of a continuous random variable X showing the lower and upper quartiles and the median
For a symmetric unimodal distribution, the mean, mode and median coincide; they are dierent for asymmetric (skewed) distributions.
Quartiles
The concept of a median, which divides the cdf at the 50% point, can be
extended to other values indicative of other fractional sectioning o of the
cdf. Thus, by referring to the median as x0.5 , or x50 , we are able to dene, in
the same spirit, the following values of the random variable, x0.25 and x0.75
(or, in terms of percentages, x25 and x75 respectively) as follows:
F(x_{0.25}) = 0.25    (4.110)

the value of X below which lies one quarter of the population, and

F(x_{0.75}) = 0.75    (4.111)

the value of X below which lies three quarters of the population. These values
are known respectively as the lower and upper quartiles of the distribution
because, along with the median x0.5 , these values divide the population into
four quarters, each part with equal probability.
These concepts are illustrated in Fig 4.8 where the lower quartile is located
at x = 1.02; the median at x = 1.58 and the upper quartile at x = 2.14. Thus,
for this particular example, P(X < 1.02) = 0.25; P(1.02 < X < 1.58) = 0.25; P(1.58 < X < 2.14) = 0.25; and P(X > 2.14) = 0.25.
There is nothing restricting us to dividing the population in halves (median) or in quarters (quartiles); in general, for any 0 < q < 1, the q-th quantile is defined as that value x_q of the random variable for which

F(x_q) = ∫_{−∞}^{x_q} f(x) dx = q    (4.112)
for a continuous random variable (with the integral replaced by the appropriate sum for the discrete random variable).
This quantity is sometimes dened instead in terms of percentiles, in which
case, the q th quantile is simply the 100q percentile. Thus, the median is equivalently the half quantile, the 50th percentile, or the second quartile.
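In practice, Eq (4.112) is simply an equation to be inverted for x_q. The sketch below does this numerically for the pdf f(x) = x/2 on (0, 2) used earlier, for which F(x) = x²/4 and the exact q-th quantile is 2√q, so the numerical inversion can be checked directly.

```python
# Quantiles by numerical inversion of the cdf F(x) = x^2/4 on (0, 2).
from scipy.optimize import brentq

F = lambda x: x**2 / 4.0

def quantile(q):
    return brentq(lambda x: F(x) - q, 0.0, 2.0)

for q in (0.25, 0.5, 0.75):
    print(q, quantile(q), 2 * q**0.5)   # lower quartile, median, upper quartile
```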
4.4.5
Entropy
For a discrete random variable X with pdf f(xi), the expectation

H(X) = E[−log₂ f(X)] = −Σ_i f(xi) log₂ f(xi)    (4.115)

(with the sum replaced by an integral for a continuous random variable) is known as the entropy of the random variable, or its mean information content.
Chapter 10 explores how to use the concept of information and entropy to
develop appropriate probability models for practical problems in science and
engineering.
4.4.6
Probability Bounds
We now know that the pdf f (x) of a random variable contains all the
information about it to enable us compute the probabilities of occurrence of
various outcomes of interest. As valuable as this is, there are times when all
we need are bounds on probabilities, not exact values. We now discuss some
of the most important results regarding bounds on probabilities that can be
determined for any general random variable, X, without specific reference to any particular pdf. These results are very useful in analyzing the behavior of
random phenomena and have practical implications in determining values of
unknown population parameters.
We begin with a general lemma from which we then derive two important
results.
Lemma: For any random variable X, any nonnegative function G(X), and any constant c > 0,

P[G(X) ≥ c] ≤ E[G(X)]/c    (4.116)
There are several different ways of proving this result; one of the most direct is shown below.

Proof: By definition,

E[G(X)] = ∫_{−∞}^{∞} G(x) f(x) dx    (4.117)

If we now divide the real line −∞ < x < ∞ into two mutually exclusive regions, A = {x : G(x) ≥ c} and B = {x : G(x) < c}, i.e. A is that region on the real line where G(x) ≥ c, and B is what is left, then Eq (4.117) becomes:

E[G(X)] = ∫_A G(x) f(x) dx + ∫_B G(x) f(x) dx    (4.118)

Because G(X) is nonnegative, the second integral is itself nonnegative, so that

E[G(X)] ≥ ∫_A G(x) f(x) dx ≥ c ∫_A f(x) dx    (4.119)

where the last inequality arises because, for all x ∈ A (the region over which we are integrating), G(x) ≥ c, with the net result that:

E[G(X)] ≥ c P[G(X) ≥ c]    (4.120)
This remarkable result holds for all random variables, X, and for any nonnegative functions of the random variable, G(X). Two specic cases of G(X)
give rise to results of special interest.
Markov's Inequality

When G(X) = X, Eq (4.116) immediately becomes:

P(X ≥ c) ≤ E(X)/c    (4.122)
a result known as Markovs inequality. It allows us to place bounds on probabilities when only the mean value of a random variable is known. For example,
if the average number of inclusions on glass sheets manufactured in a specic
site is known to be 2, then according to Markovs inequality, the probability
of nding a glass sheet containing 5 or more inclusions at this manufacturing
site can never exceed 2/5. Thus if glass sheets containing 5 or more inclusions
are considered unsaleable, without reference to any specic probability model
of the random phenomenon in question, the plant manager concerned about
making unsaleable product can, by appealing to Markovs inequality, be sure
that things will never be worse than 2 in 5 unsaleable products.
It is truly remarkable, of course, that such statements can be made at all;
but in fact, this inequality is actually quite conservative. As one would expect,
with an appropriate probability model, one can be even more precise. (Table
2.1 in Chapter 2 in fact shows that the actual probability of obtaining 5 or
more inclusions on glass sheets manufactured at this site is 0.053, nowhere
close to the upper limit of 0.4 given by Markovs inequality.)
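The glass-sheet illustration is easy to put in numbers. Markov's inequality bounds P(X ≥ 5) by E(X)/5 = 0.4; assuming, as the reference to Table 2.1 suggests, a Poisson model with mean 2 for the inclusions, the actual tail probability is about 0.053.

```python
# Markov bound versus the actual tail probability under a Poisson(2) model.
from scipy.stats import poisson

mean_inclusions = 2
c = 5

markov_bound = mean_inclusions / c
actual = 1 - poisson.cdf(c - 1, mean_inclusions)   # P(X >= 5) under Poisson(2)
print(markov_bound, actual)                        # 0.4 versus about 0.053
```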
Chebychev's Inequality

Now let G(X) = (X − μ)², and c = k²σ², where μ is the mean value of X, and σ² is the variance, i.e. σ² = E[(X − μ)²]. In this case, Eq (4.116) becomes

P[(X − μ)² ≥ k²σ²] ≤ 1/k²    (4.123)

or, equivalently,

P(|X − μ| ≥ kσ) ≤ 1/k²,    (4.124)

a result known as Chebychev's inequality.
4.5 Special Derived Probability Functions
4.5.1
Survival Function
The survival function, S(x), is the probability that the random variable
X exceeds the specic value x; in lifetime applications, this translates to the
probability that the object of study survives beyond the value x, i.e.
S(x) = P(X > x)    (4.126)

It is therefore related to the cdf, F(x), according to:

S(x) = 1 − F(x)    (4.127)
Example 4.8 SURVIVAL FUNCTION OF A CONTINUOUS RANDOM VARIABLE
Find the survival function S(x), for the random variable, X, the residence time in a CSTR, whose pdf is given in Eq (4.41). This function
directly provides the probability that any particular dye molecule survives in the CSTR beyond a time x.
Solution:
Observe rst that this random variable is continuous and non-negative
so that the desired S(x) does in fact exist. The required S(x) is given
by

S(x) = ∫_x^∞ (1/τ)e^{−u/τ} du = e^{−x/τ}    (4.128)
We could equally well have arrived at the result by noting that the cdf
F (x) for this random variable is given by:
F(x) = 1 − e^{−x/τ}    (4.129)
4.5.2
Hazard Function
A function is needed that quantifies the risk of imminent failure at x, given survival up to x; the hazard function, h(x), defined as

h(x) = f(x)/S(x) = f(x)/[1 − F(x)]    (4.130)
provides just such a function. It does for future failure what f (x) does for
lifetimes in general. Recall that by denition, because X is continuous, f (x)
provides the (unconditional) probability of a lifetime in the innitesimal interval {xi < X < xi + dx} as f (xi )dx; in the same manner, the probability of
failure occurring in that same interval, given that the object of study survived
until the beginning of the current time interval, xi , is given by h(xi )dx. In
general,

h(x) dx = f(x) dx / S(x) = P(x < X < x + dx) / P(X > x)    (4.131)
so that, from the denition of conditional probability given in Chapter 3,
h(x)dx is seen as equivalent to P(x < X < x + dx | X > x). h(x) is therefore sometimes referred to as the death rate or failure rate at x of those surviving until x (i.e. of those at risk at x); it describes how the risk of failure
changes with age.
Example 4.9 HAZARD FUNCTION OF A CONTINUOUS
RANDOM VARIABLE
Find the hazard function h(x), for the random variable, X, the residence
time in a CSTR.
Solution:
From the given pdf and the survival function obtained in Example 4.8
above, the required function h(x) is given by,
h(x) = [(1/τ)e^{−x/τ}] / e^{−x/τ} = 1/τ    (4.132)
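The survival and hazard functions of Examples 4.8 and 4.9 are simple enough to tabulate directly, as in the sketch below (τ = 30 minutes is an illustrative value only); the point of the exercise is that the exponential residence time has a constant hazard, 1/τ, at every age.

```python
# Survival and hazard functions for the CSTR residence time.
import numpy as np

tau = 30.0
f = lambda x: (1 / tau) * np.exp(-x / tau)   # pdf
S = lambda x: np.exp(-x / tau)               # survival function, Eq (4.128)
h = lambda x: f(x) / S(x)                    # hazard function, Eq (4.130)

for x in (0.0, 15.0, 30.0, 60.0):
    print(x, S(x), h(x))                     # h(x) is constant at 1/tau
```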
4.5.3 Cumulative Hazard Function

Analogous to the cdf, F(x), the cumulative hazard function, H(x), is defined as:

H(x) = ∫_0^x h(u) du    (4.133)
It can be shown that H(x) is related to the more well-known F(x) according to

F(x) = 1 − e^{−H(x)}    (4.134)

and that the relationship between S(x) and H(x) is given by:

S(x) = e^{−H(x)}    (4.135)

or, conversely,

H(x) = −ln[S(x)]    (4.136)
4.6 Summary and Conclusions
We are now in a position to look back at this chapter and observe, with
some perspective, how the introduction of the seemingly innocuous random
variable, X, has profoundly aected the analysis of randomly varying phenomena in a manner analogous to how the introduction of the unknown
quantity, x, transformed algebra and the solution of algebraic problems. We
have seen how the random variable, X, maps the sometimes awkward and
tedious sample space, Ω, into a space of real numbers; how this in turn leads
to the emergence of f (x), the probability distribution function (pdf); and
how f (x) has essentially supplanted and replaced the probability set function,
P (A), the probability analysis tool in place at the end of Chapter 3.
The full signicance of the role of f (x) in random phenomena analysis may
not be completely obvious now, but it will become more so as we progress in
our studies. So far, we have used it to characterize the random variable in
terms of its mathematical expectation, and the expectation of various other
functions of the random variable. And this has led, among other things, to our
rst encounter with the mean, variance, skewness and kurtosis, of a random
variable, important descriptors of data that we are sure to encounter again
later (in Chapter 12 and beyond).
Despite initial appearances, every single topic discussed in this chapter
nds useful application in later chapters. In the meantime, we have taken
pains to try and breathe some practical life into many of these typically dry
and formal denitions and mathematical functions. But if some, especially
the moment generating function, the characteristic function, and entropy, still
appear to be of dubious practical consequence, such lingering doubts will be
dispelled completely by Chapters 6, 8, 9 and 10. Similarly, the probability
bounds (especially Chebyshevs inequality) will be employed in Chapter 8,
and the special functions of Section 4.5 will be used extensively in their more
natural setting in Chapter 23.
The task of building an ecient machinery for random phenomena analysis, which began in Chapter 3, is now almost complete. But before the generic
pdf, f (x), introduced and characterized in this chapter begins to take on
specic, distinct personalities for various random phenomena, some residual
issues remain to be addressed in order to complete the development of the
probability machinery. Specically, the discussion in this chapter will be extended to higher dimensions in Chapter 5, and the characteristics of functions
of random variables will be explored in Chapter 6. Chapter 7 is devoted to
two application case studies that put the complete set of discussions in Part
II in perspective.
Here are some of the main points of the chapter again.
Formally, the random variable, X (discrete or continuous), assigns to each element ω ∈ Ω one and only one real number, X(ω) = x, thereby mapping Ω onto a new space, V; informally, it is an experimental outcome whose numerical value is subject to random variations with each exact replicate trial of the experiment.
The introduction of the random variable, X, leads directly to the emergence of f (x), the probability distribution function; it represents how the
probabilities of occurrence of all the possible outcomes of the random
experiment of interest are distributed over the entire random variable
space, and is a direct extension of P (A).
For a continuous random variable, the pdf and the cdf are related according to dF(x)/dx = f(x).
The expected value, E(X), exists only when Σ_i |xi| f(xi) < ∞ (absolute convergence) for discrete X, or when ∫ |x| f(x) dx < ∞ (absolute integrability) for continuous X.
REVIEW QUESTIONS
1. Why is the raw sample space, , often tedious to describe and inecient to analyze mathematically?
2. Through what means is the general sample space converted into a space with real
numbers?
3. Formally, what is a random variable?
4. What two mathematical transformations occur as a consequence of the formal
introduction of the random variable, X?
5. How is the induced probability set function, PX, related to the probability set function, P, defined on Ω?
6. What is the pre-image, A , of the set A?
7. What is the relationship between the random variable, X, and the associated real
number, x? What does the expression, P (X = x) indicate?
8. When does the sample space, Ω, naturally occur in the form of the random variable space, V?
9. Informally, what is a random variable?
10. What is the dierence between a discrete random variable and a continuous one?
11. What is the pdf, f (x), and what does it represent for the random variable, X?
12. What is the relationship between the pdf, f (xi ), and the cdf, F (xi ), for a discrete random variable, X?
13. What is the relationship between the pdf, f (x), and the cdf, F (x), for a continuous random variable, X?
14. Dene mathematically the expected value, E(X), for a discrete random variable
and for a continuous one.
15. What conditions must be satised for E(X) to exist?
16. Is E(X) a random variable and does it have units?
17. What is the relationship between the expected value, E(X), and the mean value,
of a random variable (or equivalently, of its distribution)?
18. Distinguish between ordinary moments and central moments of a random variable.
19. What are the common names by which the second, third and fourth central
moments of a random variable are known?
20. What is Cv , the coecient of variation of a random variable?
21. What is the distinguishing characteristic of a skewed distribution (positive or
negative)?
22. Give an example each of a negatively skewed and a positively skewed randomly
varying phenomenon.
23. What do the mean, variance, skewness, and kurtosis tell us about the distribution of the random variable in question?
24. What do Mn , Mw , and Mz represent for a polymer material?
25. What is the polydispersity index of a polymer and what does it indicate about
the molecular weight distribution?
26. Dene the moment generating function (MGF) of a random variable, X. Why
is it called by this name?
27. What is the uniqueness property of the MGF?
28. Dene the characteristic function of a random variable, X. What distinguishes
it from the MGF?
29. How are the MGF and characteristic function (CF) of a random variable related
to the Laplace and Fourier transforms?
30. Dene the mode, median, quartiles and percentiles of a random variable.
31. Within the context of this chapter, what is Entropy?
32. Dene Markovs inequality. It allows us to place probability bounds when what
is known about the random variable?
33. Dene Chebychevs inequality.
34. Which probability bound is sharper, the one provided by Markovs inequality
or the one provided by Chebychevs?
35. What are the dening characteristics of those random variables for which the special probability functions, the survival and hazard functions, are applicable? These
functions are used predominantly in studying what types of phenomena?
36. Dene the survival function, S(x). How is it related to the cdf, F (x)?
37. Dene the hazard function, h(x). How is it related to the pdf, f (x)?
38. Dene the cumulative hazard function, H(x). How is it related to the cdf, F (x),
and the survival function, S(x)?
EXERCISES
Section 4.1
4.1 Consider a family that plans to have a total of three children; assuming that
they will not have any twins, generate the sample space, , for the possible outcomes. By dening the random variable, X as the total number of female children
born to this family, obtain the corresponding random variable space, V . Given that
this particular family is genetically predisposed to having boys, with a probability,
p = 0.75 of giving birth to a boy, obtain the probability that this family will have
three boys and compare it to the probability of having other combinations.
4.2 Revisit Example 4.1 in the text, and this time, instead of tossing a coin three
times, it is tossed 4 times. Generate the sample space, ; and using the same denition of X as the total number of tails, obtain the random variable space, V , and
compute anew the probability of A, the event that X = 2.
4.3 Given the spaces and V for the double dice toss experiment in Example 4.3
in the text,
(i) Compute the probability of the event A that X = 7;
(ii) If B is the event that X = 6, and C the event that X = 10 or X = 11, compute
P (B) and P (C).
Section 4.2
4.4 Revisit Example 4.3 in the text on the double dice toss experiment and obtain
the complete pdf f (x) for the entire random variable space. Also obtain the cdf,
F (x). Plot both distribution functions.
4.5 Given the following probability distribution function for a discrete random variable, X,
x       1      2      3      4      5
f(x)    0.10   0.25   0.30   0.25   0.10
x k
n
; x = 1, 2, . . . , n
(4.137)
(4.138)
(i) First obtain the value of the constant, c, required for this to be a legitimate pdf,
and then obtain an expression for the cdf F (x).
(ii) Obtain P (X 1/2) and P (X 1/2).
(iii) Obtain the value xm such that
P (X xm ) = P (X xm )
(4.139)
4.8 From the distribution of residence times in an ideal CSTR is given in Eq (4.41),
determine, for a reactor with average residence time, = 30 mins, the probability
that a reactant molecule (i) spends less than 30 mins in the reactor; (ii) spends more
than 30 mins in the reactor; (iii) spends less than (30 ln 2) mins in the reactor; and
(iv) spends more than (30 ln 2) mins in the reactor.
Section 4.3
4.9 Determine E(X) for the discrete random variable in Exercise 4.5; for the continuous random variable in Exercise 4.6; and establish that E(X) for the residence
time distribution in Eq (4.41) is , thereby justifying why this parameter is known
as the mean residence time.
4.10 (Adapted from Stirzaker, 20031 ) Show that E(X) exists for the discrete random
variable, X, with the pdf:
f(x) = 4/[x(x + 1)(x + 2)];  x = 1, 2, . . .    (4.140)
while E(X) does not exist for the discrete random variable with the pdf

f(x) = 1/[x(x + 1)];  x = 1, 2, . . .    (4.141)
4.11 Establish that E(X) = 1/p for a random variable X whose pdf is
f(x) = p(1 − p)^{x−1};  x = 1, 2, 3, . . .    (4.142)

You may find it useful to note that

Σ_{x=1}^{∞} p(1 − p)^{x−1} = 1    (4.143)
4.12 From the denition of the mathematical expectation function, E(.), establish
that for the random variable, X, discrete or continuous:
E[k1 g1 (X) + k2 g2 (X)] = k1 E[g1 (X)] + k2 E[g2 (X)],
(4.144)
(4.145)
E(X) E(Y )
i.e., Z
X Y
(4.147)
and that
V ar(Z) = V ar(X) + V ar(Y )
(4.148)
when E[(X X )(Y Y )] = 0 (i.e., when X and Y are independent: see Chapter 5).
4.14 Given that the pdf of a certain discrete random variable X is:
f(x) = λ^x e^{−λ} / x!;  x = 0, 1, 2, . . .    (4.149)

show that

Σ_{x=0}^{∞} f(x) = 1    (4.150)

and obtain E(X) and Var(X).
4.15 Obtain the variance and skewness of the discrete random variable in Exercise
4.5 and for the continuous random variable in Exercise 4.6. Which random variables
distribution is skewed and which is symmetric?
4.16 From the formal denitions of the moment generating function, establish Eqns
(4.95) and (4.96).
4.17 Given the pdf for the residence time for two identical CSTRs in series as
f(x) = (1/τ²) x e^{−x/τ}    (4.153)
(i) obtain the MGF for this pdf and compare it with that derived in Example 4.7 in
the text. From this comparison, what would you conjecture to be the MGF for the
distribution of residence times for n identical CSTRs in series?
(ii) Obtain the characteristic function for the pdf in Eq (4.41) for the single CSTR
and also for the pdf in Eq (4.153) for two CSTRs. Compare the two characteristic
functions and conjecture what the corresponding characteristic function will be for
the distribution of residence times for n identical CSTRs in series.
4.18 Given that M (t) is the moment generating function of a random variable,
dene the psi-function, (t), as:
(t) = ln M (t)
(4.154)
(i) Prove that (0) = , and (0) = 2 , where each prime
indicates dierentiation
with respect to t; and E(X) = , is the mean of the random variable, and 2 is the
variance, dened by 2 = V ar(X) = E[(X )2 ].
(ii) Given the pdf of a discrete random variable X as:
f (x) =
x e
; x = 0, 1, 2, . . .
x!
obtain its (t) function and show, using the results in (i) above, that the mean and
variance of this pdf are identical.
4.19 The pdf for the yield data discussed in Chapter 1 was postulated as
f(y) = (1/(σ√(2π))) e^{−(y−μ)²/(2σ²)};  −∞ < y < ∞    (4.155)
If we are given that is the mean, rst establish that the mode is also , and then
use the fact that the distribution is perfectly symmetric about to establish that
median is also , hence conrming that for this distribution, the mean, mode and
median coincide.
4.20 Given the pdf:
1 1
; < x <
(4.156)
1 + x2
nd the mode and the median and show that they coincide. For extra credit:
Establish that = E(X) does not exist.
f (x) =
4.21 Compute the median and the other quartiles for the random variable whose
pdf is given as:
x 0<x<2
f (x) =
(4.157)
0 otherwise
4.22 Given the binary random variable, X, that takes the value 1 with probability
p, and the value 0 with probability (1 p), so that its pdf is given by
1 p x = 0;
p
x = 1;
(4.158)
f (x) =
0
elsewhere.
obtain an expression for the entropy H(X) and show that it is maximized when
p = 0.5, taking on the value H (X) = 1 at this point.
Section 4.5
4.23 First show that the cumulative hazard function, H(x), for the random variable,
X, the residence time in a CSTR is the linear function,
H(x) = x/τ    (4.159)
(4.161)
and from here obtain the pdf, f (y), for this random variable.
4.24 Given the pdf for the residence time for two identical CSTRs in series in Exercise 4.17, Eq (4.153), determine the survival function, S(x), and the hazard function,
h(x). Compare them to the corresponding results obtained for the single CSTR in
Example 4.8 and Example 4.9 in the text.
APPLICATION PROBLEMS
4.25 Before an automobile parts manufacturer takes full delivery of polymer resins
made by a supplier in a reactive extrusion process, a sample is processed and the
performance is tested for Toughness. The batch is either accepted (if the processed
samples Toughness equals or exceeds 140 J/m3 ) or it is rejected. As a result of
process and raw material variability, the acceptance/rejection status of each batch
varies randomly. If the supplier sends four batches weekly to the parts manufacturer, and each batch is made independently on the extrusion process, so that the
ultimate fate of one batch is independent of the fate of any other batch, dene X
as the random variable representing the number of acceptable batches a week and
answer the following questions:
(i) Obtain the sample space, , and the corresponding random variable space, V .
(ii) First, assume equal probability of acceptance and rejection, and obtain the the
pdf, f (x), for the entire sample space. If, for long term protability it is necessary
that at least 3 batches be acceptable per week, what is the probability that the
supplier will remain protable?
4.26 Revisit Problem 4.25 above and consider that after an extensive process and
control system improvement project, the probability of acceptance of a single batch
is improved to 0.8; obtain the new pdf, f (x). If the revenue from a single acceptable
batch is $20,000, but every rejected batch costs the supplier $8,000 in retrieval and
incineration fees, which will be deducted from the revenue, what is the expected net
revenue per week under the current circumstances?
4.27 A gas station situated on a back country road has only one gasoline pump
and one attendant and, on average, receives = 3 (cars/hour). The average rate at
which this lone attendant services the cars is (cars/hour). It can be shown that
the total number of cars at this gas station at any time (i.e. the one currently being
served, and those waiting in line to be served) is the random variable X with the
following pdf:
x
; x = 0, 1, 2, . . .
(4.162)
f (x) = 1
(i) Show that so long as < , the probability that the line at the gas station is
innitely long is zero.
(ii) Find the value of required so that the expected value of the total number of
Percent of Population
with income level, x
4
13
17
20
16
12
7
4
3
2
1
1
1 x/
e
;x > 0
(4.163)
no more than 15% would have to be replaced before the warranty period expires.
Find xw .
(ii) In planning for the second generation toaster, design engineers wish to set a target value to aim for ( = 2 ) such that 85% of the second generation chips survive
beyond 3 years. Determine 2 and interpret your results in terms of the implied
fold increase in mean life-span from the rst to the second generation of chips.
4.30 The probability of a single transferred embryo resulting in a live birth in an
in-vitro fertilization treatment, p, is given as 0.5 for a younger patient and 0.2 for
an older patient. When n = 5 embryos are transferred in a single treatment, it is
also known that if X is the total number of live births resulting from this treatment,
then E(X) = 2.5 for the younger patient and E(X) = 1 for the older patient, and
the associated variance, V ar(X) = 1.25 for the younger and V ar(X) = 0.8 for the
older.
(i) Use Markovs inequality and Chebyshevs inequality to obtain bounds on the
probability of each patient giving birth to quadruplets or a quintuplets at the end
of the treatment.
(ii) These bounds are known to be quite conservative, but to determine just how
conservative , compute the actual probabilities of the stated events for each patient
given that an appropriate pdf for X is
f(x) = [5!/(x!(5 − x)!)] p^x (1 − p)^{5−x}    (4.164)
where p is as given above. Compare the actual probabilities with the Markov and
Chebychev bounds and identify which bound is sharper.
4.31 The following data table, obtained from the United States Life Tables 196971,
(published in 1973 by the National Center for Health Statistics) shows the probability of survival until the age of 65 for individuals of the given age2 .
Age, y    Prob. of survival to age 65
0         0.72
10        0.74
20        0.74
30        0.75
35        0.76
40        0.77
45        0.79
50        0.81
55        0.85
60        0.90
The data should be interpreted as follows: the probability that all newborns, and
children up to the age of ten survive until 65 years of age is 0.72; for those older
than 10 and up to 20 years, the probability of survival to 65 years is 0.74, and so on.
Assuming that the data is still valid in 1975, a community cooperative wishes to
2 More up-to-date versions, available, for example, in National Vital Statistics Reports,
Vol. 56, No. 9, December 28, 2007 contain far more detailed information.
set up a life insurance program that year whereby each participant pays a relatively
small annual premium, $, and, in the event of death before 65 years, a one-time
death gratuity payment of $ is made to the participants designated beneciary.
If the participant survives beyond 65 years, nothing is paid. If the cooperative is
to realize a xed, modest expected revenue, $RE = $30, per year, per participant,
over the duration of his/her participation (mostly to cover administrative and other
costs) provide answers to the following questions:
(i) For a policy based on a xed annual premium of $90 for all participants, and
age-dependent payout, determine values for (y), the published payout for a person
of age y that dies before age 65, for all values of y listed in this table.
(ii) For a policy based instead on a xed death payout of $8, 000, and age-dependent
annual premiums, determine values for (y), the published annual premium to be
collected each year from a participant of age y.
(iii) If it becomes necessary to increase the expected revenue by 50% as a result
of increased administrative and overhead costs, determine the eect on each of the
policies in (i) and (ii) above.
(iv) If by 1990, the probabilities of survival have increased across the board by 0.05,
determine the eect on each of the policies in (i) and (ii).
Chapter 5
Multidimensional Random Variables
When the outcome of interest in an experiment is not one, but two or more
variables simultaneously, additional issues arise that are not fully addressed
by the probability machinery as it stands at the end of the last chapter. The
concept of the random variable, restricted as it currently is to the single, onedimensional random variable X, needs to be extended to higher dimensions;
and doing so is the sole objective of this chapter. With the introduction of a
few new concepts, new varieties of the probability distribution function (pdf)
emerge along with new variations on familiar results; together, they expand
and supplement what we already know about random variables and bring to
a conclusion the discussion we started in Chapter 4.
5.1

5.1.1
5.1.2
As with the single random variable case, associated with this twodimensional random variable is a space, V , and a probability set function
PX induced by X = (X1 , X2 ), where V is dened as:
V = {(x1, x2) : X1(ω) = x1, X2(ω) = x2; ω ∈ Ω}    (5.1)
(5.1)
The most important point to note at this point is that the random variable
space V involves X1 and X2 simultaneously; it is not merely a union of separate spaces V1 for X1 and V2 for X2 .
An example of a bivariate random variable was presented in Example 4.4
in Chapter 4; here is another.
Example 5.1 BIVARIATE RANDOM VARIABLE AND INDUCED PROBABILITY FUNCTION FOR COIN TOSS EXPERIMENT
Consider an experiment involving tossing a coin 2 times and recording
the number of observed heads and tails: (1) Obtain the sample space ;
and (2) Dene X as a two-dimensional random variable (X1 , X2 ) where
X1 is the number of heads obtained in the rst toss, and X2 is the number of heads obtained in the second toss. Obtain the new space V . (3)
Assuming equiprobable outcomes, obtain the induced probability PX .
Solution:
(1) From the nature of the experiment, the required sample space, , is
given by
= {HH, HT, T H, T T }
(5.2)
consisting of all 4 possible outcomes, which may be represented respectively, as i ; i = 1, 2, 3, 4, so that
= {1 , 2 , 3 , 4 }.
(5.3)
(5.4)
since these are all the possible values that the two-dimensional X can
take.
(3) This is a case where there is a direct one-to-one mapping between
the 4 elements of the original sample space and the induced random
variables space V ; as such, for equiprobable outcomes, we obtain,
PX(1, 1) = PX(1, 0) = PX(0, 1) = PX(0, 0) = 1/4    (5.5)
In making sense of the formal denition given here for the bivariate (2dimensional) random variable, the reader should keep in mind the practical
considerations presented in Chapter 4 for the single random variable. The same
issues there apply here. In a practical sense, the bivariate random variable
may be considered simply, if informally, as an experimental outcome with two
components, each with numerical values that are subject to random variations
with exact replicate performance of the experiment.
For example, consider a polymer used for packaging applications, for which
the quality measurements of interest are melt index (indicative of the molecular weight distribution), and density (indicative of co-polymer composition). With each performance of lab analysis on samples taken from the manufacturing process, the values obtained for each of these quantities are subject
to random variations. Without worrying so much about the original sample
space or the induced one, we may consider the packaging polymer quality characteristics directly as the two-dimensional random variable whose components
are melt index (as X1 ), and density (as X2 ).
We now note that it is fairly common for many textbooks to use X and Y
to represent bivariate random variables. We choose to use X1 and X2 because
it oers a notational convenience that facilitates generalization to n > 2.
5.1.3

5.2

5.2.1
f(x1, x2) = 1/4,  x1 = 1, x2 = 1
            1/4,  x1 = 1, x2 = 0
            1/4,  x1 = 0, x2 = 1
            1/4,  x1 = 0, x2 = 0
            0,    otherwise          (5.7)
showing how the probabilities are distributed over the 2-dimensional random variable space, V . Once again, we note the following about the function
f (x1 , x2 ):
1. f(x1, x2) > 0 for all (x1, x2) ∈ V
2. Σ_{x1} Σ_{x2} f(x1, x2) = 1
These results are direct extensions of the axiomatic statements given earlier
for the discrete single random variable pdf.
The probability that both X1 < x1 and X2 < x2 is given by the cumulative
distribution function,
F (x1 , x2 ) = P (X1 < x1 , X2 < x2 )
(5.8)
and the mixed second partial derivative,

f(x1, x2) = ∂²F(x1, x2)/∂x1∂x2    (5.9)

is called the joint probability density function for the continuous two-dimensional random variables X1 and X2. As with the discrete case, the formal
properties of the continuous joint pdf are:
1. f (x1 , x2 ) 0; x1 , x2 V ;
2. f has at most a nite number of discontinuities in every nite interval
in V ;
3. The double integral, ∫_{x1} ∫_{x2} f(x1, x2) dx1 dx2 = 1;
4. PX(A) = ∫∫_A f(x1, x2) dx1 dx2; for A ⊂ V
Thus,

P(a1 ≤ X1 ≤ a2; b1 ≤ X2 ≤ b2) = ∫_{b1}^{b2} ∫_{a1}^{a2} f(x1, x2) dx1 dx2    (5.10)
Example 5.2 JOINT PDF AND THE RELIABILITY OF A TEMPERATURE CONTROL SYSTEM
The lifetime (in years) of the control hardware electronics of a commercial polymer reactor's temperature control system, X1, and the lifetime of the control valve on the cooling water line, X2, have the joint pdf:

f(x1, x2) = (1/50) e^{−(0.2x1 + 0.1x2)},   0 < x1 < ∞, 0 < x2 < ∞
            0,                              elsewhere          (5.11)
(1) Establish that this is a legitimate pdf; and (2) obtain the probability
that the system lasts more than two years; (3) obtain the probability
that the electronic component functions for more than 5 years and the
control valve for more than 10 years.
Solution:
(1) If this is a legitimate joint pdf, then the following should hold:
∫_0^∞ ∫_0^∞ f(x1, x2) dx1 dx2 = 1    (5.12)

In this case,

∫_0^∞ ∫_0^∞ (1/50) e^{−(0.2x1+0.1x2)} dx1 dx2 = (1/50) [−5e^{−0.2x1}]_0^∞ [−10e^{−0.1x2}]_0^∞ = 1    (5.13)

as required.
(2) Since the system survives beyond two years only if both components do, the required probability is

P(X1 > 2; X2 > 2) = ∫_2^∞ ∫_2^∞ (1/50) e^{−(0.2x1+0.1x2)} dx1 dx2 = e^{−0.4} × e^{−0.2} = 0.549    (5.15)
Thus, the probability that the system lasts beyond the first two years is 0.549.
(3) The required probability, P(X1 > 5; X2 > 10), is obtained as:

P(X1 > 5; X2 > 10) = ∫_{10}^∞ ∫_5^∞ (1/50) e^{−(0.2x1+0.1x2)} dx1 dx2 = [−e^{−0.2x1}]_5^∞ × [−e^{−0.1x2}]_{10}^∞ = (0.368)² = 0.135    (5.16)
5.2.2
Marginal Distributions
Consider the joint pdf f (x1 , x2 ) for the 2-dimensional random variable
(X1 , X2 ); it represents how probabilities are jointly distributed over the entire
(X1 , X2 ) plane in the random variable space. Were we to integrate over the
entire range of X2 (or sum over the entire range in the discrete case), what is
left is the following function of x1 in the continuous case:
f1(x1) = ∫_{−∞}^{∞} f(x1, x2) dx2    (5.17)

or, in the discrete case, the sum over the entire range of X2:

f1(x1) = Σ_{x2} f(x1, x2)    (5.18)
This function, f1 (x1 ), characterizes the behavior of X1 alone, by itself, regardless of what is going on with X2 .
Observe that, if one wishes to determine P (a1 < X1 < a2 ) with X2 taking
any value, by denition, this probability is determined as:
P(a1 < X1 < a2) = ∫_{a1}^{a2} [∫_{−∞}^{∞} f(x1, x2) dx2] dx1    (5.19)
an expression that is reminiscent of probability computations for single random variable pdfs.
Similarly, the function characterizing X2 alone is:

f2(x2) = ∫_{−∞}^{∞} f(x1, x2) dx1,    (5.21)

obtained by integrating out X1 from the joint pdf of X1 and X2; or, in the discrete case, it is:

f2(x2) = Σ_{x1} f(x1, x2)    (5.22)
These pdfs, f1 (x1 ) and f2 (x2 ), respectively represent the probabilistic characteristics of each random variable X1 and X2 considered in isolation, as opposed to f (x1 , x2 ) that represents the joint probabilistic characteristics when
considered together. The formal definitions are given as follows: the marginal pdfs of the bivariate random variable (X1, X2) with joint pdf f(x1, x2) are defined, for continuous random variables, as the functions:

f1(x1) = ∫_{−∞}^{∞} f(x1, x2) dx2    (5.23)

and

f2(x2) = ∫_{−∞}^{∞} f(x1, x2) dx1    (5.24)

and, for discrete random variables, as the functions:

f1(x1) = Σ_{x2} f(x1, x2)    (5.25)

and

f2(x2) = Σ_{x1} f(x1, x2)    (5.26)
Each marginal pdf possesses all the usual properties of pdfs (nonnegativity, and summing or integrating to 1 over the range of the surviving variable), with the integrals replaced with sums for the discrete case. An illustrative example follows.
Example 5.3 MARGINAL DISTRIBUTIONS OF CONTINUOUS BIVARIATE RANDOM VARIABLE
Find the marginal distributions of the joint pdfs given in Example 5.2
for characterizing the reliability of the commercial polymer reactors
temperature control system. Recall that the component random variables are X1 , the lifetimes (in years) of the control hardware electronics,
and X2 , the lifetime of the control valve on the cooling water line; the
joint pdf is as given in Eq (5.11):
f(x1, x2) = (1/50) e^{−(0.2x1+0.1x2)},   0 < x1 < ∞, 0 < x2 < ∞
            0,                            elsewhere
Solution:
(1) For this continuous bivariate random variable, we have from Eq (5.17) that:

f1(x1) = ∫_0^∞ (1/50) e^{−(0.2x1+0.1x2)} dx2 = (1/50) e^{−0.2x1} ∫_0^∞ e^{−0.1x2} dx2 = (1/5) e^{−0.2x1}    (5.27)
Similarly, from Eq (5.21), we have,

f2(x2) = ∫_0^∞ (1/50) e^{−(0.2x1+0.1x2)} dx1 = (1/50) e^{−0.1x2} ∫_0^∞ e^{−0.2x1} dx1 = (1/10) e^{−0.1x2}    (5.28)
As an exercise, the reader should conrm that each of these marginal distributions is a legitimate pdf in its own right.
These ideas extend directly to n > 2 random variables whose joint pdf
is given by f (x1 , x2 , , xn ). There will be n separate marginal distributions
fi (xi ); i = 1, 2, , n, each obtained by integrating (or summing) out every
other random variable except the one in question, i.e.,
f1(x1) = ∫ ∫ ··· ∫ f(x1, x2, . . . , xn) dx2 dx3 · · · dxn    (5.30)
It is important to note that when n > 2, marginal distributions themselves
can be multivariate. For example, f12 (x1 , x2 ) is what is left of the joint pdf
f (x1 , x2 , , xn ) after integrating (or summing) over the remaining (n 2)
variables; it is a bivariate pdf of the two surviving random variables of interest.
The concepts are simple and carry over directly; however, the notation can
become quite confusing if one is not careful. We shall return to this point a
bit later in this chapter.
5.2.3
Conditional Distributions
If the joint pdf f (x1 , x2 ) of a bivariate random variable provides a description of how the two component random variables vary jointly; and if the
marginal distributions f1 (x1 ) and f2 (x2 ) describe how each random variable
behaves by itself, in isolation, without regard to the other; there remains yet
one more characteristic of importance: a description of how X1 behaves for
given specic values of X2 , and vice versa, how X2 behaves for specic values
of X1 (i.e., the probability distribution of X1 conditioned upon X2 taking on
specic values, and vice versa). Such conditional distributions are dened
as follows:
f(x1|x2) = f(x1, x2) / f2(x2);  f2(x2) > 0    (5.31)

f(x2|x1) = f(x1, x2) / f1(x1);  f1(x1) > 0    (5.32)
The similarity between these equations and the expression for conditional
probabilities of events dened as sets, as given in Eq (3.40) of Chapter 3
P(A|B) = P(A ∩ B) / P(B)    (5.33)
the numerator of which is recognized from Eq (5.21) as the marginal distribution of X2 so that:
∫_{−∞}^{∞} f(x1|x2) dx1 = f2(x2)/f2(x2) = 1    (5.35)
The same result holds for f (x2 |x1 ) in Eq (5.32) when integrated with respect
of x2 ; and, by replacing the integrals with sums, we obtain identical results
for the discrete case.
Example 5.4 CONDITIONAL DISTRIBUTIONS OF CONTINUOUS BIVARIATE RANDOM VARIABLE
Find the conditional distributions of the 2-dimensional random variables given in Example 5.2 for the reliability of a temperature control
system.
Solution:
Recall from the previous examples that the joint pdf is:
f(x1, x2) = (1/50) e^{−(0.2x1+0.1x2)},   0 < x1 < ∞, 0 < x2 < ∞
            0,                            elsewhere
Recalling the result obtained in Example 5.3 for the marginal pdfs
f1 (x1 ) and f2 (x2 ), the desired conditional pdfs are given as follows:
f(x1|x2) = [(1/50) e^{−(0.2x1+0.1x2)}] / [(1/10) e^{−0.1x2}] = (1/5) e^{−0.2x1}    (5.36)

f(x2|x1) = [(1/50) e^{−(0.2x1+0.1x2)}] / [(1/5) e^{−0.2x1}] = (1/10) e^{−0.1x2}    (5.37)
The reader may have noticed two things about this specic example: (i)
f (x1 |x2 ) is entirely a function of x1 alone, containing no x2 whose value is to
be xed; the same is true for f (x2 |x1 ) which is entirely a function of x2 , with
no dependence on x1 . (ii) In fact, not only is f (x1 |x2 ) a function of x1 alone;
it is precisely the same function as the unconditional marginal pdf f1 (x1 )
obtained earlier. The same is obtained for f (x2 |x1 ), which also turns out to
FIGURE 5.1: Graph of the joint pdf for the 2-dimensional random variable of Example
5.5
be the same as the unconditional marginal pdf f2 (x2 ) obtained earlier. Such
circumstances do not always occur for all 2-dimensional random variables, as
the next example shows; but the special cases where f (x1 |x2 ) = f1 (x1 ) and
f (x2 |x1 ) = f2 (x2 ) are indicative of a special relationship between the two
random variables X1 and X2 , as discussed later in this chapter.
Example 5.5 CONDITIONAL DISTRIBUTIONS OF ANOTHER CONTINUOUS BIVARIATE RANDOM VARIABLE
Find the conditional distributions of the 2-dimensional random variables whose joint pdf is given as follows:
f(x1, x2) = x1 − x2,   1 < x1 < 2, 0 < x2 < 1
            0,         elsewhere          (5.38)
shown graphically in Fig 5.1.
Solution:
To nd the conditional distributions, we must rst nd the marginal
distributions. (As an exercise, the reader may want to conrm that this
joint pdf is a legitimate pdf.) These marginal distributions are obtained
as follows:

f1(x1) = ∫_0^1 (x1 − x2) dx2 = [x1 x2 − x2²/2]_0^1    (5.39)

which simplifies to give:

f1(x1) = x1 − 0.5,   1 < x1 < 2
         0,          elsewhere          (5.40)
Similarly,

f2(x2) = ∫_1^2 (x1 − x2) dx1 = [x1²/2 − x1 x2]_1^2    (5.41)

which simplifies to give:

f2(x2) = 1.5 − x2,   0 < x2 < 1
         0,          elsewhere          (5.42)

Again the reader may want to confirm that these marginal pdfs are legitimate pdfs.
With these marginal pdfs in hand, we can now determine the required conditional distributions as follows:
f(x1|x2) = (x1 − x2) / (1.5 − x2);   1 < x1 < 2    (5.43)

and

f(x2|x1) = (x1 − x2) / (x1 − 0.5);   0 < x2 < 1    (5.44)
(The reader should be careful to note that we did not explicitly impose the restrictive conditions x2 ≠ 1.5 and x1 ≠ 0.5 in the expressions given above so as to exclude the respective singularity points for f(x1|x2) and for f(x2|x1). This is because the original space over which the joint distribution f(x1, x2) was defined, V = {(x1, x2) : 1 < x1 < 2; 0 < x2 < 1}, already excludes these otherwise troublesome points.)
Observe now that these conditional distributions show mutual dependence of x1 and x2 , unlike in Example 5.4. In particular, say for
x2 = 1 (the rightmost edge of the x2 -axis of the plane in Fig 5.1), the
conditional pdf f (x1 |x2 ) becomes:
f(x1|x2 = 1) = 2(x1 − 1);   1 < x1 < 2    (5.45)
(5.45)
whereas, for x2 = 0 (the leftmost edge of the x2 -axis of the plane in Fig
5.1), this conditional pdf becomes
f(x1|x2 = 0) = 2x1/3;   1 < x1 < 2    (5.46)
(5.46)
Similar arguments can be made for f (x2 |x1 ) and are left as an exercise
for the reader.
The following example provides a comprehensive illustration of these distributions specically for a discrete bivariate random variable.
Example 5.6 DISTRIBUTIONS OF DISCRETE BIVARIATE
RANDOM VARIABLE
An Apple computer store in a small town stocks only three types of
hardware components: low-end, mid-level and high-end, selling
respectively for $1600, $2000 and $2400; it also only stocks two types
of monitors, the 20-inch type, selling for $600, and the 23-inch type,
selling for $900. With X1 as the price of the hardware component and X2 as the price of the monitor involved in each sale, the joint pdf, f(x1, x2), is given in Table 5.1.

TABLE 5.1: Joint pdf for computer store sales

              X2
X1            $600    $900
$1600         0.30    0.25
$2000         0.20    0.10
$2400         0.10    0.05
(1) Show that f (x1 , x2 ) is a legitimate pdf and nd the sales combination (x1 , x2 ) with the highest probability, and the one with the lowest
probability.
(2) Obtain the marginal pdfs f1 (x1 ) and f2 (x2 ), and from these
compute P (X1 = $2000), regardless of X2 , (i.e., the probability of selling
a mid-level hardware component regardless of the monitor paired with
it). Also obtain P (X2 = $900) regardless of X1 , (i.e., the probability of
selling a 23-inch monitor, regardless of the hardware component with
which it is paired).
(3) Obtain the conditional pdfs f (x1 |x2 ) and f (x2 |x1 ) and determine
the highest value for each conditional probability; describe in words
what each means.
Solution:
(1) If f (x1 , x2 ) is a legitimate pdf, then it must hold that
Σ_{x1} Σ_{x2} f(x1, x2) = 1    (5.47)
From the joint pdf shown in the table, this amounts to adding up all
the 6 entries, a simple arithmetic exercise that yields the desired result.
The combination with the highest probability is seen to be X1 = $1600; X2 = $600, since P(X1 = $1600; X2 = $600) = 0.3; i.e., the probability is highest (at 0.3) that any customer chosen at random would have purchased the low-end hardware (for $1600) and the 20-inch monitor (for $600). The lowest probability of 0.05 is associated
with X1 = $2400 and X2 = $900, i.e., the combination of a high-end
hardware component and a 23-inch monitor.
(2) By definition, the marginal pdf f1(x1) is given by:

f1(x1) = Σ_{x2} f(x1, x2)    (5.48)
152
Random Phenomena
so that, from the table, f1 (1600) = 0.3 + 0.25 = 0.55; similarly,
f1 (2000) = 0.30 and f1 (2400) = 0.15. In the same manner, the values for f2 (x2 ) are obtained as f2 (600) = 0.30 + 0.20 + 0.10 = 0.60,
and f2 (900) = 0.4. These values are combined with the original joint
pdf into a new Table 5.2 to provide a visual representation of the relationship between these distributions. The required probabilities are
TABLE 5.2: Joint and marginal pdfs for computer store sales

              X2
X1            $600    $900    f1(x1)
$1600         0.30    0.25    0.55
$2000         0.20    0.10    0.30
$2400         0.10    0.05    0.15
f2(x2)        0.60    0.40    1.00

P(X1 = $2000) = f1(2000) = 0.30    (5.49)

P(X2 = $900) = f2(900) = 0.40    (5.50)
(3) By definition, the conditional pdfs are:

f(x1|x2) = f(x1, x2)/f2(x2);  and  f(x2|x1) = f(x1, x2)/f1(x1)    (5.51)
and upon carrying out the indicated divisions using the numbers contained in Table 5.2, we obtain the result shown in Table 5.3 for f (x1 |x2 ),
and in Table 5.4 for f (x2 |x1 ). From these tables, we obtain the highest conditional probability for f (x1 |x2 ) as 0.625, corresponding to the
probability of a customer buying the low end hardware component
(X1 = $1600) conditioned upon having bought the 23-inch monitor
(X2 = $900); i.e., in the entire population of those who bought the 23inch monitor, the probability is highest at 0.625 that a low-end hardware
component was purchased to go along with the monitor. When the conditioning variable is the hardware component, the highest conditional
probability f (x2 |x1 ) is a tie at 0.667 for customers buying the 20-inch
monitor (X2 = $600) conditioned upon buying the mid-range hardware
(X1 = $2000), and those buying the high-end hardware (X1 = $2400).
TABLE 5.3: Conditional pdf f(x1|x2) for computer store sales

              X2
X1            $600    $900
$1600         0.500   0.625
$2000         0.333   0.250
$2400         0.167   0.125

TABLE 5.4: Conditional pdf f(x2|x1) for computer store sales

              X2
X1            $600    $900
$1600         0.545   0.455
$2000         0.667   0.333
$2400         0.667   0.333
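Every number in Example 5.6 follows mechanically from the joint table, and a few lines of code reproduce them; the sketch below sums Table 5.1 across rows and columns to get the marginals and performs the divisions of Eq (5.51) to get the conditionals.

```python
# Marginal and conditional pdfs from the joint pdf of Table 5.1.
import numpy as np

x1 = [1600, 2000, 2400]                  # hardware prices (rows)
x2 = [600, 900]                          # monitor prices (columns)
joint = np.array([[0.30, 0.25],
                  [0.20, 0.10],
                  [0.10, 0.05]])

f1 = joint.sum(axis=1)                   # marginal of X1: [0.55, 0.30, 0.15]
f2 = joint.sum(axis=0)                   # marginal of X2: [0.60, 0.40]

cond_x1_given_x2 = joint / f2            # f(x1|x2): each column sums to 1
cond_x2_given_x1 = joint / f1[:, None]   # f(x2|x1): each row sums to 1

print(f1, f2)
print(cond_x1_given_x2)                  # largest entry 0.625 at (x1=$1600 | x2=$900)
print(cond_x2_given_x1)                  # 0.667 ties for x1=$2000 and $2400 at x2=$600
```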
5.2.4
General Extensions
f (x1 , x2 , x3 , x4 , x5 )
f23 (x2 , x3 )
(5.52)
where f23 (x2 , x3 ) is the bivariate joint marginal pdf for cholesterol level. We
see therefore that the principles transfer quite directly, and, when dealing with
specic cases in practice (as we have just done), there is usually no confusion.
The challenge is how to generalize without confusion.
To present the results in a general fashion and avoid confusion requires
adopting a dierent notation: using the vector X to represent the entire collection of random variables, i.e., X = (X1 , X2 , , Xn ), and then partitioning
this into three distinct vectors: X , the variables of interest (X4 , X5 in the
Avandia example given above); Y, the conditioning variables (X2 , X3 in the
Avandia example), and Z, the remaining variables, if any. With this notation,
we now have
f (x , y, z)
(5.53)
f (x |y) =
fy (y)
as the most general multivariate conditional distribution.
5.3
The concepts of mathematical expectation and moments used to characterize the distribution of single random variables in Chapter 4 can be extended to
multivariate, jointly distributed random variables. Even though we now have
many more versions of pdfs to consider (joint, marginal and conditional), the
primary notions remain the same.
5.3.1
Expectations
Given any function U(X) = U(X1, X2, . . . , Xn) of the n-dimensional random variable X, its mathematical expectation is defined, for continuous X, as:

E[U(X)] = ∫ ∫ ··· ∫ U(x1, x2, . . . , xn) f(x1, x2, . . . , xn) dx1 dx2 · · · dxn    (5.54)

a direct extension of the single variable definition. The discrete counterpart
is:
E[U (X)] =
U (x1 , x2 , , xn )f (x1 , x2 , , xn )
(5.55)
x1
x2
xn
E[U(X1, X2)] = E[X2 − X1] = 5    (5.59)
The immediate implication is that the expected lifetime differential favors the control valve (lifetime X2), so that the control hardware electronic component is expected to fail first, with the control valve expected
to outlast it by 5 years.
Example 5.8 EXPECTATIONS OF DISCRETE BIVARIATE
RANDOM VARIABLE
From the joint pdf given in Example 5.6 for the Apple computer store
sales, obtain the expected revenue from each recorded sale.
Solution:
Recall that for this problem, the random variables of interest are X1 ,
the cost of the computer hardware component, and X2 , the cost of the
monitor in each recorded sale. The appropriate function U (X1 , X2 ), in
this case is
U(X1, X2) = X1 + X2    (5.60)
the total amount of money realized on each sale. By the definition of
expectations for the discrete bivariate random variable, we have
E[U(X1, X2)] = Σ_{x1} Σ_{x2} (x1 + x2) f(x1, x2)    (5.61)
             = $2,560    (5.62)
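As a quick numerical check (a sketch, not part of the text; the joint pdf values are those of Table 5.2), the expected revenue per recorded sale follows directly:

# Sketch: E[X1 + X2] for the computer store joint pdf
joint = {(1600, 600): 0.30, (1600, 900): 0.25,
         (2000, 600): 0.20, (2000, 900): 0.10,
         (2400, 600): 0.10, (2400, 900): 0.05}

expected_revenue = sum((x1 + x2) * p for (x1, x2), p in joint.items())
print(expected_revenue)  # 2560.0 dollars per recorded sale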
In the special case where U(X) = e^{(t1X1 + t2X2)}, the expectation E[U(X)] is the joint moment generating function, M(t1, t2), for the bivariate random variable X = (X1, X2), defined by:

M(t1, t2) = E[e^{(t1X1 + t2X2)}] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} e^{(t1x1 + t2x2)} f(x1, x2) dx1 dx2;
or
M(t1, t2) = Σ_{x1} Σ_{x2} e^{(t1x1 + t2x2)} f(x1, x2);    (5.63)

for the continuous and the discrete cases, respectively; an expression that generalizes directly to the n-dimensional random variable.
Marginal Expectations
Recall that for the general n-dimensional random variable X = (X1, X2, ..., Xn), the single-variable marginal distribution fi(xi) is the distribution of the component random variable Xi alone, as if the others did not exist. It is therefore similar to the single random variable pdf dealt with extensively in Chapter 4. As such, the marginal expectation of U(Xi) is precisely as defined in Chapter 4, i.e.,

E[U(Xi)] = ∫ U(xi) fi(xi) dxi    (5.64)

for the continuous case, with the discrete counterpart:

E[U(Xi)] = Σ_{xi} U(xi) fi(xi)    (5.65)
M1(t1) = M(t1, 0) = E[e^{t1X1}]    (5.68)
M2(t2) = M(0, t2) = E[e^{t2X2}]    (5.69)

are, respectively, the marginal MGFs for f1(x1) and for f2(x2).
Keep in mind that in the general case, marginal distributions can be multivariate; in this case, the context of the problem at hand will make clear what
such a joint-marginal distribution will look like after the remaining variables
have been integrated out.
Conditional Expectations
As in the discussion about conditional distributions, it is best to deal with the bivariate conditional expectations first. For the bivariate random variable X = (X1, X2), the conditional expectation E[U(X1)|X2] (i.e., the expectation of the function U(X1) conditioned upon X2 = x2) is obtained from the conditional distribution as follows:
E[U(X1)|X2] = ∫_{−∞}^{∞} U(x1) f(x1|x2) dx1;   continuous X
            = Σ_{x1} U(x1) f(x1|x2);   discrete X    (5.70)

with a corresponding expression for E[U(X2)|X1] based on the conditional distribution f(x2|x1). In particular, when U(X1) = X1 (or, U(X2) = X2), the result is the conditional mean defined by:

μX1|x2 = E[X1|X2] = ∫_{−∞}^{∞} x1 f(x1|x2) dx1;   continuous X
                  = Σ_{x1} x1 f(x1|x2);   discrete X    (5.71)

Similarly, the conditional variance,

σ²X1|x2 = E[(X1 − μX1|x2)²|X2]    (5.72)

is obtained as:

σ²X1|x2 = ∫_{−∞}^{∞} (x1 − μX1|x2)² f(x1|x2) dx1;   continuous X
        = Σ_{x1} (x1 − μX1|x2)² f(x1|x2);   discrete X    (5.73)
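For a concrete discrete illustration (a sketch, not from the text, using the conditional pdf values of Table 5.3), the conditional mean E[X1|X2 = 900] is obtained directly:

# Sketch: conditional mean E[X1 | X2 = 900] from the conditional pdf f(x1|x2) of Table 5.3
f_x1_given_900 = {1600: 0.625, 2000: 0.250, 2400: 0.125}

cond_mean = sum(x1 * p for x1, p in f_x1_given_900.items())
print(cond_mean)  # 1800.0: expected hardware cost among buyers of the 23-inch monitor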
5.3.2 Covariance and Correlation

The covariance of the two random variables X1 and X2 is defined as:

σ12 = Cov(X1, X2) = E[(X1 − μX1)(X2 − μX2)]    (5.75)

The scaled quantity,

ρ = σ12 / (σ1 σ2)    (5.76)

where σ1 and σ2 are the positive square roots of the respective marginal variances of X1 and X2, is known as the correlation coefficient, with the attractive property that:

−1 ≤ ρ ≤ 1    (5.77)
The most important points to note about the covariance, σ12, or the correlation coefficient, ρ, are as follows:

1. σ12 will be positive if values of X1 > μX1 are generally associated with values of X2 > μX2, or when values of X1 < μX1 tend to be associated with values of X2 < μX2. Such variables are said to be positively correlated and ρ will be positive (ρ > 0), with the strength of the correlation indicated by the absolute value of ρ: weakly correlated variables will have low values close to zero while strongly correlated variables will have values close to 1. (See Fig 5.2.) For perfectly positively correlated variables, ρ = 1.

2. The reverse is the case when σ12 is negative: for such variables, values of X1 > μX1 appear preferentially together with values of X2 < μX2, or else values of X1 < μX1 tend to be associated more with values of X2 > μX2. In this case, the variables are said to be negatively correlated and ρ will be negative (ρ < 0); once again, with the strength of correlation indicated by the absolute value of ρ. (See Fig 5.3.) For perfectly negatively correlated variables, ρ = −1.

3. If the behavior of X1 has little or no bearing with that of X2, as one might expect, σ12 and ρ will tend to be close to zero (see Fig 5.4); and when the two random variables are completely independent of each other, then both σ12 and ρ will be exactly zero.
This last point brings up the concept of stochastic independence.
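Before moving on, the short Python sketch below (illustrative only, not from the text; it reuses the Table 5.2 joint pdf) computes σ12 and ρ for the computer store example. The mild negative value indicates a weak tendency for buyers of more expensive hardware to pair it with the cheaper monitor.

# Sketch: covariance and correlation coefficient for the computer store joint pdf
from math import sqrt

joint = {(1600, 600): 0.30, (1600, 900): 0.25,
         (2000, 600): 0.20, (2000, 900): 0.10,
         (2400, 600): 0.10, (2400, 900): 0.05}

E = lambda g: sum(g(x1, x2) * p for (x1, x2), p in joint.items())
mu1, mu2 = E(lambda x1, x2: x1), E(lambda x1, x2: x2)   # 1840, 720
cov = E(lambda x1, x2: (x1 - mu1) * (x2 - mu2))         # -4800
var1 = E(lambda x1, x2: (x1 - mu1) ** 2)                # 86400
var2 = E(lambda x1, x2: (x2 - mu2) ** 2)                # 21600
rho = cov / (sqrt(var1) * sqrt(var2))
print(cov, round(rho, 3))  # -4800.0 -0.111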
5.3.3
Independence
FIGURE 5.2: Scatter plot of two positively correlated variables, X1 and X2

FIGURE 5.3: Scatter plot of two negatively correlated variables, X1 and X2

FIGURE 5.4: Scatter plot of two essentially uncorrelated variables, X1 and X2
f(x2|x1) = f(x1, x2) / f1(x1)    (5.78)
f(x2|x1) = f2(x2)    (5.79)
f(x1, x2) = f1(x1) f2(x2)    (5.80)
which, first of all, is item 3 in the definition above, but just as importantly, when substituted into the numerator of the expression in Eq (5.31), i.e.,

f(x1|x2) = f(x1, x2) / f2(x2)    (5.81)
For the two-coin toss experiment of Example 5.1, for instance, the marginal pdfs are obtained as:

f1(x1) = f(x1, 0) + f(x1, 1)    (5.82)
f2(x2) = f(0, x2) + f(1, x2)    (5.83)
f1(x1) = 1/2,  x1 = 0;   1/2,  x1 = 1;   0,  otherwise    (5.84)
f2(x2) = 1/2,  x2 = 0;   1/2,  x2 = 1;   0,  otherwise    (5.85)

If we now tabulate the joint pdf and the marginal pdfs, we obtain the result in Table 5.5.

TABLE 5.5: Joint and marginal pdfs for the two-coin toss problem of Example 5.1

            X2 = 0   X2 = 1   f1(x1)
X1 = 0       1/4      1/4      1/2
X1 = 1       1/4      1/4      1/2
f2(x2)       1/2      1/2       1

It is now clear that for all x1 and x2,
f(x1, x2) = f1(x1) f2(x2)    (5.86)

confirming that X1 and X2 are indeed independent. It then follows that

σ12 = 0    (5.88)
ρ = 0    (5.89)

since, by definition,

σ12 = E[(X1 − μX1)·(X2 − μX2)]    (5.90)

which, for independent random variables, factors into E[X1 − μX1]·E[X2 − μX2] = 0.    (5.91)
5.4 Summary and Conclusions
The primary objective of this chapter was to extend the ideas presented in
Chapter 4 for the single random variable to the multidimensional case, where
the outcome of interest involves two or more random variables simultaneously.
With such higher-dimensional random variables, it became necessary to introduce a new variety of pdfs different from, but still related to, the familiar one
encountered in Chapter 4: the joint pdf to characterize joint variation among
the variables; the marginal pdfs to characterize individual behavior of each
variable in isolation from others; and the conditional pdfs, to characterize the
behavior of one random variable conditioned upon fixing the others at prespecified values. This new array of pdfs provides the full set of mathematical
tools for characterizing various aspects of multivariate random variables much
as the f (x) of Chapter 4 did for single random variables.
The possibility of two or more random variables co-varying simultaneously, which was not of concern with single random variables, led to the introduction of two additional and related quantities, co-variance and correlation,
with which one quanties the mutual dependence of two random variables.
This in turn led to the important concept of stochastic independence, that
one random variable is entirely unaffected by another. As we shall see in subsequent chapters, when dealing with multiple random variables, the analysis
of joint behavior is considerably simplied if the random variables in question
are independent. We shall therefore have cause to recall some of the results of
this chapter at that time.
Here are some of the main points of this chapter again.
A multivariate random variable is defined in the same manner as a single
random variable, but the associated space, V , is higher-dimensional;
The joint pdf of a bivariate random variable, f (x1 , x2 ), shows how
the probabilities are distributed over the two-dimensional random variable space; the joint cdf, F (x1 , x2 ), represents the probability, P (X1 <
x1 ; X2 < x2 ); they both extend directly to higher-dimensional random
variables.
In addition to the joint pdf, two other pdfs are needed to characterize
multi-dimensional random variables fully:
Marginal pdf : fi (xi ) characterizes the individual behavior of each
random variable, Xi , by itself, regardless of the others;
Conditional pdf : f (xi |xj ) characterizes the behavior of Xi conditioned upon Xj taking on specific values.
These pdfs can be used to obtain such random variable characteristics
as joint, marginal and conditional expectations.
The covariance of two random variables, X1 and X2, defined as

σ12 = E[(X1 − μX1)(X2 − μX2)]

(where μX1 and μX2 are the respective marginal expectations), provides a measure of the mutual dependence of variations in X1 and X2. The related correlation coefficient, the scaled quantity:

ρ = σ12 / (σ1 σ2)

(where σ1 and σ2 are the positive square roots of the respective marginal variances of X1 and X2), has the property that −1 ≤ ρ ≤ 1, with |ρ| indicating the strength of the mutual dependence, and the sign indicating the direction (negative or positive).
Two random variables, X1 and X2 , are independent if the behavior of
one has no bearing on the behavior of the other; more formally,
f (x1 |x2 ) = f1 (x1 ); f (x2 |x1 ) = f2 (x2 );
so that,
f(x1, x2) = f1(x1) f2(x2)
REVIEW QUESTIONS
1. What characteristic of the Avandia clinical test makes it relevant to the discussion of this chapter?
2. How many random variables at a time can the probability machinery of Chapter
4 deal with?
3. In dealing with several random variables simultaneously, what are some of the
questions to be considered that were not of concern when dealing with single random
variables in Chapter 4?
4. Define a bivariate random variable formally.
5. Informally, what is a bivariate random variable?
6. Define a multivariate random variable formally.
7. State the axiomatic definition of the joint pdf of a discrete bivariate random variable and of its continuous counterpart.
8. What is the general relationship between the cdf, F(x1, x2), of a continuous bivariate random variable and its pdf, f(x1, x2)? What conditions must be satisfied for this relationship to exist?
9. Define the marginal distributions, f1(x1) and f2(x2), for a two-dimensional random variable with a joint pdf f(x1, x2).
10. Do marginal pdfs possess the usual properties of pdfs or are they different?
11. Given a bivariate joint pdf, f(x1, x2), define the conditional pdfs, f(x1|x2) and f(x2|x1).
12. In what way is the definition of a conditional pdf similar to the conditional probability of events A and B defined on a sample space, Ω?
13. Define the expectation, E[U(X1, X2)], for a bivariate random variable. Extend this to an n-dimensional (multivariate) random variable.
14. Define the marginal expectation, E[U(Xi)], for a bivariate random variable. Extend this to an n-dimensional (multivariate) random variable.
15. Define the conditional expectations, E[U(X1)|X2] and E[U(X2)|X1], for a bivariate random variable.
16. Given two random variables, X1 and X2, define their covariance.
17. What is the relationship between covariance and the correlation coefficient?
18. What does a negative correlation coefficient indicate about the relationship between two random variables, X1 and X2? What does a positive correlation coefficient indicate?
19. If the behavior of the random variable, X1, has little bearing on that of X2, how will this manifest in the value of the correlation coefficient, ρ?
20. When the correlation coefficient of two random variables, X1 and X2, is such that |ρ| ≈ 1, what does this indicate about the random variables?
21. What does it mean that two random variables, X1 and X2, are stochastically independent?
22. If two random variables are independent, what is the value of their covariance, and of their correlation coefficient?
23. When dealing with n > 2 random variables, what is the difference between pairwise stochastic independence and mutual stochastic independence? Does one always imply the other?
EXERCISES
Sections 5.1 and 5.2
5.1 Revisit Example 5.1 in the text and define the two-dimensional random variable
(X1 , X2 ) as follows: X1 is the total number of heads, and X2 is the total number of tails. Obtain the space, V , and determine the complete pdf, f (x1 , x2 ), for
x1 = 0, 1, 2; x2 = 0, 1, 2, assuming equiprobable outcomes in the original sample
space.
5.2 The two-dimensional random variable (X1 , X2 ) has the following joint pdf:
f(1, 1) = 1/4;    f(2, 1) = 3/8;
f(1, 2) = 1/8;    f(2, 2) = 1/8;
f(1, 3) = 1/16;   f(2, 3) = 1/16

(i) Determine the following probabilities: (a) P(X1 ≤ X2); (b) P(X1 + X2 = 4); (c) P(|X2 − X1| = 1); (d) P(X1 + X2 is even).
(ii) Obtain the joint cumulative distribution function, F (x1 , x2 ).
5.3 In a game of chess, one player either wins, W , loses, L, or draws, D (either by
mutual agreement with the opponent, or as a result of a stalemate). Consider a
player participating in a two-game, pre-tournament qualication series:
(i) Obtain the sample space, Ω.
(ii) Dene the two-dimensional random variable (X1 , X2 ) where X1 is the total
number of wins, and X2 is the total number of draws. Obtain V and, assuming
equiprobable outcomes in the original sample space, determine the complete joint
pdf, f (x1 , x2 ).
(iii) If the player is awarded 3 points for a win, 1 point for a draw and no point for a
loss, dene the random variable Y as the total number of points assigned to a player
at the end of the two-game preliminary round. If a player needs at least 4 points to
qualify, determine the probability of qualifying.
5.4 Revisit Exercise 5.3 above but this time consider three players: Suzie, the superior player for whom the probability of winning a game, pW = 0.75, the probability of
drawing, pD = 0.2 and the probability of losing, pL = 0.05; Meredith, the mediocre
player for whom pW = 0.5; pD = 0.3; PL = 0.2; and Paula, the poor player, for
whom pW = 0.2; pD = 0.3; PL = 0.5. Determine the complete joint pdf for each
player, fS (x1 , x2 ), for Suzie, fM (x1 , x2 ), for Meredith, and fP (x1 , x2 ), for Paula;
and from these, determine for each player, the probability that she qualies for the
tournament.
5.5 The continuous random variables X1 and X2 have the joint pdf
f(x1, x2) = c x1 x2 (1 − x2);  0 < x1 < 2; 0 < x2 < 1
          = 0;  elsewhere    (5.93)
5.7 The two-dimensional random variable (X1, X2) has the joint pdf, f(x1, x2), shown in the following table:

            x2 = 0   x2 = 1   x2 = 2
x1 = 0        0        0       1/4
x1 = 1        0       1/2       0
x1 = 2       1/4       0        0
(i) Obtain the marginal pdfs, f1 (x1 ) and f2 (x2 ), and determine whether or not X1
and X2 are independent.
(ii) Obtain the conditional pdfs f (x1 |x2 ) and f (x2 |x1 ). Describe in words what these
results imply in terms of the original experiments and these random variables.
(iii) It is conjectured that this joint pdf is for an experiment involving tossing a fair
coin twice, with X1 as the total number of heads, and X2 as the total number of
tails. Are the foregoing results consistent with this conjecture? Explain.
5.8 Given the joint pdf:
f(x1, x2) = c e^{−(x1 + x2)};  0 < x1 < x2 < ∞
          = 0;  elsewhere    (5.94)
First obtain c, then obtain the marginal pdfs f1 (x1 ) and f2 (x2 ), and hence determine
whether or not X1 and X2 are independent.
5.9 If the range of validity of the joint pdf in Exercise 5.8 and Eq (5.94) are modified
to 0 < x1 < and 0 < x2 < , obtain c and the marginal pdf, and then determine
whether or not these random variables are now independent.
Section 5.3
5.10 Revisit Exercise 5.3. From the joint pdf determine
(i) E[U (X1 , X2 ) = X1 + X2 ].
(ii) E[U (X1 , X2 ) = 3X1 + X2 ]. Use this result to determine if the player will be
expected to qualify or not.
5.11 For each of the three players in Exercise 5.4,
(i) Determine the marginal pdfs, f1 (x1 ) and f2 (x2 ) and the marginal means
X1 , X2 .
(ii) Determine E[U (X1 , X2 ) = 3X1 + X2 ] and use the result to determine which of
the three players, if any, will be expected to qualify for the tournament.
5.12 Determine the covariance and correlation coefficient for the two random variables whose joint pdf, f(x1, x2), is given in the table in Exercise 5.7.
5.13 For each of the three chess players in Exercise 5.4, Suzie, Meredith, and Paula,
and from the joint pdf of each player's performance at the pre-tournament qualifying games, determine the covariance and correlation coefficients for each player. Discuss
what these results imply in terms of the relationship between wins and draws for
each player.
5.14 The joint pdf for two random variables X and Y is given as:
f(x, y) = x + y;  0 < x < 1; 0 < y < 1
        = 0;  elsewhere    (5.95)

(i) Obtain f(x|y) and f(y|x) and show that these two random variables are not independent.
(ii) Obtain the covariance, σXY, and the correlation coefficient, ρ. Comment on the
strength of the correlation between these two random variables.
APPLICATION PROBLEMS
5.15 Refer to Application Problem 3.23 in Chapter 3, where the relationship between
a blood assay used to determine lithium concentration in blood samples and lithium
toxicity in 150 patients was presented in a table reproduced here for ease of reference.
              Lithium Toxicity
Assay       L+      L−     Total
A+          30      17       47
A−          21      82      103
Total       51      99      150
(i) In general, consider the assay result as the random variable Y having two possible
outcomes y1 = A+, and y2 = A−; and consider the true lithium toxicity status as the random variable X also having two possible outcomes x1 = L+, and x2 = L−. Now consider that the relative frequencies (or proportions) indicated in
the data table can be approximately considered as close enough to true probabilities;
convert the data table to a table of joint probability distribution f (x, y). What is
the probability that the test method will produce the right result?
(ii) From the table of the joint pdf, compute the following probabilities and explain what they mean in words in terms of the problem at hand: f (y2 |x2 ); f (y1 |x2 );
f (y2 |x1 ).
5.16 The reliability of the temperature control system for a commercial, highly
exothermic polymer reactor presented in Example 5.2 in the text is known to depend
on the lifetimes (in years) of the control hardware electronics, X1 , and of the control
valve on the cooling water line, X2 ; the joint pdf is:
f(x1, x2) = (1/50) e^{−(0.2x1 + 0.1x2)};  0 < x1 < ∞; 0 < x2 < ∞
          = 0;  elsewhere
(i) Determine the probability that the control valve outlasts the control hardware
electronics.
(ii) Determine the converse probability that the controller hardware electronics outlast the control valve.
(iii) If a component is replaced every time it fails, how frequently can one expect to
replace the control valve, and how frequently can one expect to replace the controller
hardware electronics?
(iv) If it costs $20,000 to replace the control hardware electronics and $10,000 to
replace the control valve, how much should be budgeted over the next 20 years for
keeping the control system functioning, assuming all other characteristics remain
essentially the same over this period?
5.17 In a major bio-vaccine research company, it is inevitable that workers are exposed to some hazardous, but highly treatable, disease-causing agents. According to papers filed with the Safety and Hazards Authorities of the state in which the facility is located, the treatment provided is tailored to the worker's age (the variable, X: 0 if younger than 30 years; 1 if 31 years or older), and location in the facility (a surrogate for virulence of the proprietary strains used in various parts of the facility, represented by the variable Y = 1, 2, 3 or 4). The composition of the 2,500 employees at the company's research headquarters is shown in the table below:
                    Location
Age           1       2       3       4
< 30          6%     20%     13%     10%
≥ 31         17%     14%     12%      8%
(i) If a worker is infected at random so that the outcome is the bivariate random
variable (X, Y ) where X has two outcomes, and Y has four, obtain the pdf f (x, y)
from the given data (assuming each worker in each location has an equal chance of
infection); and determine the marginal pdfs f1 (x) and f2 (y).
(ii) What is the probability that a worker in need of treatment was infected in
location 3 or 4 given that he/she is < 30 years old?
(iii) If the cost of treating each infected worker (in dollars per year) is given by
the expression
C = 1500 − 100Y + 500X    (5.96)
how much should the company expect to spend per worker every year, assuming the
worker composition remains the same year after year?
5.18 A non-destructive quality control test on a military weapon system correctly detects a flaw in the central electronic guidance subunit if one exists, or correctly accepts the system as fully functional if no flaw exists, 85% of the time; it incorrectly identifies a flaw when one does not exist (a false positive), 5% of the time, and incorrectly fails to detect a flaw when one exists (a false negative), 10% of the time.
When the test is repeated 5 times under mostly identical conditions, if X1 is the
number of times the test is correct, and X2 is the number of times it registers a false
positive, the joint pdf of these two random variables is given as:
f(x1, x2) = [120 / (x1! x2! (5 − x1 − x2)!)] 0.85^{x1} 0.05^{x2} 0.10^{5 − x1 − x2}    (5.97)
(i) Why is no consideration given in the expression in Eq (5.97) to the third random
variable, X3 , the number of times the test registers a false negative?
(ii) From Eq (5.97), generate a 5 × 5 table of f (x1 , x2 ) for all the possible outcomes
and from this obtain the marginal pdfs, f1 (x1 ) and f2 (x2 ). Are these two random
variables independent?
(iii) Determine the expected number of correct test results regardless of the other
results; also determine the expected value of false positives regardless of other results.
(iv) What is the expected number of the total number of correct results and false
positives? Is this value the same as the sum of the expected values obtained in (iii)?
Explain.
Chapter 6
Random Variable Transformations
Many problems of practical interest involve a random variable Y that is defined as a function of another random variable X, say according to Y = φ(X),
so that the characteristics of the one arise directly from those of the other via
the indicated transformation. In particular, if we already know the probability
distribution function for X as fX (x), it will be helpful to know how to determine the corresponding distribution function for Y . This chapter presents
techniques for characterizing functions of random variables, and the results,
important in their own right, become particularly useful in Part III where
probability models are derived for random phenomena of importance in engineering and science.
6.1
Given a random variable X with known pdf fX(x) and the transformation:

Y = φ(X)    (6.1)

the problem is to determine the corresponding pdf, fY(y), of Y. More generally, for n random variables and m transformations:

Y1 = φ1(X1, X2, ..., Xn);
Y2 = φ2(X1, X2, ..., Xn);
...
Ym = φm(X1, X2, ..., Xn)    (6.2)

the problem is to determine the joint pdf of Y1, Y2, ..., Ym.
As demonstrated in later chapters, these results are extremely useful in deriving probability models for more complicated random variables from the
probability models of simpler ones.
6.2 Single Variable Transformations

Consider the single random variable X with pdf fX(x), and the transformation:

Y = φ(X)    (6.3)

where φ is such that the inverse transformation:

X = φ^{−1}(Y) = ψ(Y)    (6.4)
exists and is also one-to-one. The procedure for obtaining fY (y) given fX (x)
is highly dependent on the nature of the random variable in question, being
more straightforward for the discrete case than for the continuous.
6.2.1 Discrete Case

If X is a discrete random variable, then Y = φ(X) is also discrete, and its pdf follows directly from that of X as:

fY(y) = P(Y = y) = P[X = ψ(y)] = fX[ψ(y)]    (6.5)
We illustrate this straightforward result first with the following simple example.
Example 6.1 LINEAR TRANSFORMATION OF A POISSON
RANDOM VARIABLE
As discussed in more detail in Part III, the discrete random variable X
having the following pdf:
fX (x) =
x e
; x = 0, 1, 2, 3, . . .
x!
(6.6)
P (Y = y) = P (X = y/2)
y/2 e
; y = 0, 2, 4, 6, . . .
(y/2)!
(6.9)
y/2 e
; y = 0, 2, 4, 6, . . .
(y/2)!
(6.10)
A Practical Application
The number of times, X, that each cell in a cell culture divides in a time
interval of length, t, is a random variable whose specic value depends on
many factors both intrinsic (e.g. individual cell characteristics) and extrinsic
(e.g., the culture environment); it is well modeled by the Poisson pdf:

fX(x) = (λt)^x e^{−λt} / x!;  x = 0, 1, 2, 3, ...    (6.11)

If each division produces two daughter cells, a cell that has divided X times will have become

Y = 2^X    (6.12)

cells by the end of the interval. The inverse transformation, X = log2 Y, is one-to-one, so that

P(Y = y) = P(X = log2 y)    (6.14)

and therefore:

fY(y) = e^{−λt} (λt)^{log2 y} / (log2 y)!;  y = 1, 2, 4, 8, ...    (6.15)

so that the complete pdf for Y is:

fY(y) = e^{−λt} (λt)^{log2 y} / (log2 y)!;  y = 1, 2, 4, 8, ...
      = 0;  otherwise    (6.18)
It is possible to confirm that the pdf obtained in Eq (6.18) for Y, the number of cells in the culture after a time interval t, is a valid pdf for which:

Σ_y fY(y) = 1    (6.19)
Σ_y fY(y) = e^{−λt} [1 + λt + (λt)²/2! + (λt)³/3! + ...] = e^{−λt} e^{λt} = 1    (6.20)
The mean number of cells in the culture after time t, E[Y ], can be shown (see
end-of-chapter Exercise 6.2) to be:
E[Y] = e^{λt}    (6.21)
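A small simulation (a sketch under the assumption X ~ Poisson(λt) and Y = 2^X, as above; not part of the text, and the value of λt is an arbitrary illustrative choice) confirms both the form of fY(y) and the expected value E[Y] = e^{λt}:

# Sketch: simulate the cell-division model X ~ Poisson(lam_t), Y = 2**X
import numpy as np

rng = np.random.default_rng(0)
lam_t = 1.2                                 # assumed value of the product lambda*t
x = rng.poisson(lam_t, size=200_000)
y = 2.0 ** x

print(y.mean(), np.exp(lam_t))              # sample mean of Y vs. theoretical e^(lam*t)
print((y == 4).mean(),                      # empirical P(Y = 4)
      np.exp(-lam_t) * lam_t**2 / 2)        # fY(4) = e^{-lam t}(lam t)^2 / 2!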
6.2.2
Continuous Case
If the transformation is strictly monotonically increasing, then:

FY(y) = P(Y ≤ y) = P[X ≤ ψ(y)] = FX[ψ(y)]    (6.24)

so that, upon differentiation,

fY(y) = dFY(y)/dy = d/dy{FX[ψ(y)]} = fX[ψ(y)] d/dy{ψ(y)}    (6.25)
with the derivative on the RHS positive for a strictly monotonic increasing
function. It can be shown that if φ were monotonically decreasing, the expression in (6.24) will yield:
fY(y) = −fX[ψ(y)] d/dy{ψ(y)}    (6.26)
with the derivative on the RHS as a negative quantity. Both results may be
combined into one as
fY(y) = fX[ψ(y)] |d/dy{ψ(y)}|    (6.27)
as presented in Eq (6.23). Let us illustrate this with another example.
Example 6.2 LOG TRANSFORMATION OF A UNIFORM
RANDOM VARIABLE
The random variable X with the following pdf:
fX(x) = 1;  0 < x < 1
      = 0;  otherwise    (6.28)
is identified in Part III as the uniform random variable. Determine the pdf for the random variable Y obtained via the transformation:

Y = −β ln X    (6.29)

Solution:
The transformation is one-to-one, maps VX = {0 < x < 1} onto VY = {0 < y < ∞}, and the inverse transformation is given by:

X = ψ(y) = e^{−y/β};  0 < y < ∞    (6.30)

with

dψ(y)/dy = −(1/β) e^{−y/β}    (6.31)

so that, from Eq (6.27),

fY(y) = (1/β) e^{−y/β};  0 < y < ∞
      = 0;  otherwise    (6.32)
These two random variables and their corresponding models are discussed
more fully in Part III.
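The result in Example 6.2 underlies the familiar inverse-transform method of generating exponentially distributed random numbers from uniform ones. The sketch below is illustrative only (β = 2 is an arbitrary choice, not from the text):

# Sketch: Y = -beta*ln(X), with X ~ Uniform(0,1), is exponential with mean beta
import numpy as np

rng = np.random.default_rng(1)
beta = 2.0
x = rng.uniform(0.0, 1.0, size=100_000)
y = -beta * np.log(x)

print(y.mean())                               # ~2.0, the exponential mean beta
print((y > 3.0).mean(), np.exp(-3.0 / beta))  # tail probability vs. e^{-y/beta}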
6.2.3
When the transformation Y = φ(X) is not one-to-one but has a countable number of roots, i.e., when the inverse transformation yields

xi = ψi(y);  i = 1, 2, ..., k    (6.33)

the result in Eq (6.27) generalizes to:

fY(y) = Σ_{i=1}^{k} fX[ψi(y)] |d/dy{ψi(y)}|    (6.34)

or, with Ji = d/dy{ψi(y)} as the Jacobian associated with the i-th root,

fY(y) = Σ_{i=1}^{k} fX[ψi(y)] |Ji|    (6.35)
Example 6.3: The random variable X has the pdf:

fX(x) = (1/√(2π)) e^{−x²/2};  −∞ < x < ∞    (6.36)
Determine the pdf for the random variable Y obtained via the transformation:
Y = X²    (6.37)
Solution:
Observe that this transformation, which maps the space VX = {−∞ < x < ∞} onto VY = {0 < y < ∞}, is not one-to-one; for all y > 0 there are two x's corresponding to each y, since the inverse transformation is given by:

x = ±√y    (6.38)

The transformation thus has 2 roots for x:

x1 = ψ1(y) = +√y;  x2 = ψ2(y) = −√y    (6.39)

so that, from Eq (6.35), with |Ji| = 1/(2√y) for each root,

fY(y) = fX(√y)·[1/(2√y)] + fX(−√y)·[1/(2√y)]    (6.40)

which, upon introducing Eq (6.36), gives:

fY(y) = (1/√(2π)) e^{−y/2} y^{−1/2};  0 < y < ∞    (6.41)
6.2.4
Let us consider first the case where the random variable transformation involves the sum of two independent random variables, i.e.,

Y = φ(X1, X2) = X1 + X2    (6.42)
where f1 (x1 ) and f2 (x2 ), are, respectively, the known pdfs of the random
variables X1 and X2 . Two approaches are typically employed in finding the
desired fY (y):
The cumulative distribution function approach;
The characteristic function approach.
FIGURE 6.1: Region of interest, VY, for computing the cdf of the random variable Y defined as a sum of 2 independent random variables X1 and X2

From the definition of the cdf,

FY(y) = P(Y ≤ y) = ∫∫_{VY} f(x1, x2) dx1 dx2    (6.43)
where f (x1 , x2 ) is the joint pdf of X1 and X2 , and, most importantly, the
region over which the double integration is being carried out, VY , is given by:
VY = {(x1 , x2 ) : x1 + x2 y}
(6.44)
as shown in Fig 6.1. Observe from this gure that the integration may be
carried out several dierent ways: if we integrate rst with respect to x1 , the
limits go from until we reach the line, at which point x1 = y x2 ; we then
integrate with respect to x2 from to . In this case, Eq (6.43) becomes:
FY(y) = ∫_{−∞}^{∞} ∫_{−∞}^{y−x2} f(x1, x2) dx1 dx2    (6.45)

Differentiating with respect to y gives:

fY(y) = ∫_{−∞}^{∞} f(y − x2, x2) dx2    (6.46)

which, by the independence of X1 and X2, becomes:

fY(y) = ∫_{−∞}^{∞} f1(y − x2) f2(x2) dx2    (6.47)

If, instead, the integration in Eq (6.43) had been done first with respect to x2 and then with respect to x1, the resulting differentiation would have resulted in the alternative, and entirely equivalent, expression:

fY(y) = ∫_{−∞}^{∞} f2(y − x1) f1(x1) dx1    (6.48)
Integrals of this nature are known as convolutions of the functions f1 (x1 ) and
f2 (x2 ) and this is as far as we can go with a general discussion.
Thus, we have the general result that the pdf of the random variable
Y obtained as a sum of two independent random variables X1 and X2 is a
convolution of the two contributing pdfs f1 (x1 ) and f2 (x2 ) as shown in Eqs
(6.47) and (6.48).
Let us illustrate this with a classic example.
Example 6.4 THE SUM OF TWO EXPONENTIAL RANDOM VARIABLES
Given two stochastically independent random variables X1 and X2 with
pdfs:
f1(x1) = (1/β) e^{−x1/β};  0 < x1 < ∞    (6.49)
f2(x2) = (1/β) e^{−x2/β};  0 < x2 < ∞    (6.50)

determine the pdf of their sum, Y = X1 + X2.
Solution:
From Eq (6.47), and noting that the integrand is nonzero only for 0 < x2 < y,

fY(y) = ∫_0^y (1/β) e^{−(y−x2)/β} (1/β) e^{−x2/β} dx2    (6.51)

which simplifies to:

fY(y) = (1/β²) y e^{−y/β};  0 < y < ∞    (6.53)
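The convolution result of Example 6.4 is easy to check by simulation. The sketch below is not from the text (β = 2 is an arbitrary choice); it compares an empirical probability with the same probability computed from the gamma(2, β) cdf, P(Y ≤ y) = 1 − (1 + y/β)e^{−y/β}, which corresponds to the pdf in Eq (6.53):

# Sketch: sum of two independent exponential(beta) variables vs. the gamma(2, beta) law
import numpy as np

rng = np.random.default_rng(2)
beta = 2.0
y = rng.exponential(beta, 100_000) + rng.exponential(beta, 100_000)

emp = ((y > 2.0) & (y < 3.0)).mean()                       # empirical P(2 < Y < 3)
cdf = lambda t: 1.0 - (1.0 + t / beta) * np.exp(-t / beta) # gamma(2, beta) cdf
print(emp, cdf(3.0) - cdf(2.0))                            # both close to ~0.178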
Observe that the result presented above for the sum of two random variables extends directly to the sum of more than two random variables by successive additions. However, this procedure becomes rapidly more tedious as we
must carry out repeated convolution integrals over increasingly more complex
regions.
If the random variable Y is defined as the sum of n mutually independent random variables, X1, X2, ..., Xn, each with characteristic function φXi(t), then the characteristic function of the sum is the product:

φY(t) = Π_{i=1}^{n} φXi(t)    (6.54)

The utility of this result lies in the fact that φY(t) is easily obtained from each contributing φXi(t); the desired fY(y) is then recovered from φY(t) either by inspection (when this is obvious), or else by the inversion formula presented in Chapter 4.
Let us illustrate this with the same example used above.
Example 6.5 THE SUM OF TWO EXPONENTIAL RANDOM VARIABLES REVISITED
Using characteristic functions, determine the pdf of the random variable
Y = X1 + X2 , where the pdfs of the two stochastically independent random variables X1 and X2 are as given in Example 6.4 above and their
characteristic functions are given as:
φX1(t) = φX2(t) = 1/(1 − jβt)    (6.57)
Solution:
From Eq (6.54), the required characteristic function for the sum is:
φY(t) = 1/(1 − jβt)²    (6.58)
At this point, anyone familiar with specific random variable pdfs and their characteristic functions will recognize this particular form right away: it is the pdf of a gamma random variable, specifically γ(2, β),
as Chapter 9 shows. However, since we have not yet introduced these
important random variables, their pdfs and characteristic functions (see
Chapter 9), we therefore do not expect the reader to be able to deduce
the pdf corresponding to φY(t) above by inspection. In this case we can
invoke the inversion formula of Chapter 4 to obtain:
fY(y) = (1/(2π)) ∫_{−∞}^{∞} e^{−jyt} φY(t) dt
      = (1/(2π)) ∫_{−∞}^{∞} [e^{−jyt} / (1 − jβt)²] dt    (6.59)
Upon carrying out the indicated integral, we obtain the final result:

fY(y) = (1/β²) y e^{−y/β};  0 < y < ∞    (6.60)
(6.60)
Example 6.6: The random variable X with pdf:

fX(x) = [1/(β^α Γ(α))] e^{−x/β} x^{α−1};  0 < x < ∞    (6.61)

is the gamma random variable, γ(α, β) (see Chapter 9), with characteristic function:

φX(t) = 1/(1 − jβt)^α    (6.62)

Find the pdf of the random variable Y defined as the sum of n independent such random variables, Xi, each with different parameters αi but with the same parameter β.
Solution:
The desired transformation is
Y = Σ_{i=1}^{n} Xi    (6.63)

so that, from Eq (6.54), the characteristic function for Y is:

φY(t) = Π_{i=1}^{n} φXi(t) = 1/(1 − jβt)^{α*}    (6.64)

where α* = Σ_{i=1}^{n} αi. Now, by comparing Eq (6.62) with Eq (6.64), we see immediately the important result that Y is also a gamma random variable, with parameters α* and β. Thus, this sum of gamma random
variables begets another gamma random variable, a result generally
known as the reproductive property of the gamma random variable.
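The reproductive property is also easy to verify numerically. The sketch below is illustrative only (the αi values and β are arbitrary choices, not from the text); it uses scipy's gamma distribution, with shape a = α and scale = β:

# Sketch: sum of independent gamma(alpha_i, beta) variables is gamma(sum(alpha_i), beta)
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
alphas, beta = [0.5, 1.0, 2.5], 1.5          # assumed illustrative parameters
y = sum(rng.gamma(a, beta, size=100_000) for a in alphas)

# Compare empirical and theoretical quantiles of gamma(alpha*, beta), alpha* = 4.0
alpha_star = sum(alphas)
for q in (0.25, 0.5, 0.75):
    print(np.quantile(y, q), stats.gamma.ppf(q, a=alpha_star, scale=beta))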
6.3 Bivariate Transformations
Consider the bivariate random variable X = (X1, X2), with joint pdf fX(x1, x2), and the transformation:

Y1 = φ1(X1, X2);  Y2 = φ2(X1, X2)    (6.65)

or, in vector form,

Y = φ(X)    (6.66)

If the transformation is one-to-one, the inverse transformation:

X1 = ψ1(Y1, Y2);  X2 = ψ2(Y1, Y2)    (6.67)

i.e.,

X = ψ(Y)    (6.68)

exists, and its Jacobian is the determinant:

J = det[ ∂x1/∂y1  ∂x1/∂y2 ; ∂x2/∂y1  ∂x2/∂y2 ]    (6.69)

In this case, the desired joint pdf for Y = (Y1, Y2) is:

fY(y1, y2) = fX[ψ1(y1, y2), ψ2(y1, y2)] |J|;  (y1, y2) in VY    (6.70)
Example 6.7: Given two stochastically independent random variables X1 and X2 with the gamma pdfs:

f1(x1) = [1/Γ(α)] x1^{α−1} e^{−x1};  0 < x1 < ∞    (6.71)
f2(x2) = [1/Γ(β)] x2^{β−1} e^{−x2};  0 < x2 < ∞    (6.72)

Determine both the joint and the marginal pdfs for the two random variables Y1 and Y2 obtained via the transformation:

Y1 = X1 + X2;  Y2 = X1/(X1 + X2)    (6.73)
Solution:
First, by independence, the joint pdf for X1 and X2 is:

fX(x1, x2) = [1/(Γ(α)Γ(β))] x1^{α−1} x2^{β−1} e^{−x1} e^{−x2};  0 < x1 < ∞; 0 < x2 < ∞    (6.74)

Next, observe that the transformation in Eq (6.73) is a one-to-one mapping of VX, the positive quadrant of the x1-x2 plane, onto VY = {(y1, y2); 0 < y1 < ∞, 0 < y2 < 1}; the inverse transformation is given by:

x1 = y1 y2;  x2 = y1(1 − y2)    (6.75)

so that the Jacobian is:

J = det[ y2  y1 ; (1 − y2)  −y1 ] = −y1 y2 − y1(1 − y2) = −y1    (6.76)
Therefore,

fY(y1, y2) = [1/(Γ(α)Γ(β))] (y1 y2)^{α−1} [y1(1 − y2)]^{β−1} e^{−y1} y1;  0 < y1 < ∞; 0 < y2 < 1
           = 0;  otherwise    (6.77)
This may be rearranged to give:
fY(y1, y2) = [1/(Γ(α)Γ(β))] y2^{α−1} (1 − y2)^{β−1} e^{−y1} y1^{α+β−1};  0 < y1 < ∞; 0 < y2 < 1
           = 0;  otherwise    (6.78)
an equation which, apart from the constant, factors out into separate
and distinct functions of y1 and y2 , indicating that the random variables
Y1 and Y2 are independent.
By definition, the marginal pdf for Y2 is obtained by integrating out y1 in Eq (6.78) to obtain:

f2(y2) = [y2^{α−1} (1 − y2)^{β−1} / (Γ(α)Γ(β))] ∫_0^∞ e^{−y1} y1^{α+β−1} dy1    (6.79)

Recognizing the integral as the gamma function, i.e.,

Γ(a) = ∫_0^∞ e^{−y} y^{a−1} dy    (6.80)
we obtain:
f2(y2) = [Γ(α + β)/(Γ(α)Γ(β))] y2^{α−1} (1 − y2)^{β−1};  0 < y2 < 1    (6.81)
Since, by independence,
fY(y1, y2) = f1(y1) f2(y2)    (6.82)

it follows from Eqs (6.78), (6.71) or (6.72), and Eq (6.82) that the marginal pdf for Y1 is given by:

f1(y1) = [1/Γ(α + β)] e^{−y1} y1^{α+β−1};  0 < y1 < ∞    (6.83)
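The key conclusion of Example 6.7 (that Y2 = X1/(X1 + X2) follows the beta pdf of Eq (6.81), independently of Y1) can be checked by simulation. The sketch below is illustrative only (the α, β values are arbitrary choices), and it uses scipy's beta distribution for comparison:

# Sketch: X1 ~ gamma(alpha, 1), X2 ~ gamma(beta, 1)  =>  X1/(X1+X2) ~ Beta(alpha, beta)
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
alpha, beta_p = 2.0, 3.0                       # assumed illustrative parameters
x1 = rng.gamma(alpha, 1.0, size=100_000)
x2 = rng.gamma(beta_p, 1.0, size=100_000)
y2 = x1 / (x1 + x2)

ks = stats.kstest(y2, "beta", args=(alpha, beta_p))
print(y2.mean(), alpha / (alpha + beta_p))     # sample mean vs. alpha/(alpha+beta) = 0.4
print(ks.pvalue)                               # large p-value: consistent with Beta(2, 3)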
6.4

Consider now the general case of n random variables, X1, X2, ..., Xn, transformed according to:

Y1 = φ1(X1, X2, ..., Xn);
Y2 = φ2(X1, X2, ..., Xn);
...
Ym = φm(X1, X2, ..., Xn)    (6.84)

6.4.1 Square Transformations

When m = n and the transformation is one-to-one, the inverse transformation:

x1 = ψ1(y1, y2, ..., yn);
x2 = ψ2(y1, y2, ..., yn);
...
xn = ψn(y1, y2, ..., yn)    (6.85)

exists, with the Jacobian given by the n × n determinant:

J = det[ ∂xi/∂yj ];  i, j = 1, 2, ..., n    (6.86)
And now, as in the bivariate case, it can be shown that for a J that is non-zero everywhere in VY, the desired joint pdf for Y is given by:

fY(y) = fX[ψ(y)] |J|;  y in VY    (6.87)

an expression that is identical in every way to the bivariate result in Eq (6.70) except for the dimensionality, and similar to the single-variable result in Eq (6.23). Thus for the square transformation in which n = m, the required result is a direct generalization of the bivariate result, identical in structure, differing only in dimensionality.
6.4.2
Non-Square Transformations
Example 6.8: Given two stochastically independent random variables X1 and X2 with the standard normal pdfs:

f1(x1) = (1/√(2π)) e^{−x1²/2};  −∞ < x1 < ∞    (6.88)
f2(x2) = (1/√(2π)) e^{−x2²/2};  −∞ < x2 < ∞    (6.89)
determine the pdf of the random variable Y obtained from their sum,
Y = X1 + X2    (6.90)
Solution:
First, observe that even though this is a sum, so that we could invoke
earlier results to handle this problem, Eq (6.90) is also an underdetermined transformation from two dimensions in X1 and X2 to one in Y .
To square the transformation, let the variable in Eq (6.90) now be Y1
and add another one, say Y2 = X1 − X2, to give:

Y1 = X1 + X2;  Y2 = X1 − X2    (6.91)

The inverse transformation is:

x1 = (1/2)(y1 + y2);  x2 = (1/2)(y1 − y2)    (6.92)

with Jacobian J = −1/2, so that |J| = 1/2. By independence, the joint pdf for X1 and X2 is:

fX(x1, x2) = (1/(2π)) e^{−(x1² + x2²)/2}    (6.93)

and since

x1² + x2² = (y1² + y2²)/2    (6.94)

from Eq (6.87), the joint pdf for Y1 and Y2 is obtained as:

fY(y1, y2) = (1/2)·(1/(2π)) e^{−y1²/4} e^{−y2²/4}    (6.95)
And now, either by inspection (this is a product of two clearly identiable, separate and distinct functions of y1 and y2 , indicating that the
two variables are independent), or by integrating out y2 in Eq (6.95),
one easily obtains the required marginal pdf for Y1 as:
f1(y1) = [1/(2√π)] e^{−y1²/4};  −∞ < y1 < ∞    (6.96)
In the next example we derive one more important result and illustrate the
seriousness of the requirement that the Jacobian of the inverse transformation
not vanish anywhere in VY .
Example 6.9: Given two stochastically independent random variables X1 and X2 with the standard normal pdfs:

f1(x1) = (1/√(2π)) e^{−x1²/2};  −∞ < x1 < ∞    (6.97)
f2(x2) = (1/√(2π)) e^{−x2²/2};  −∞ < x2 < ∞    (6.98)
determine the pdf of the random variable Y obtained from their ratio,
Y = X1/X2    (6.99)
Solution:
Again, because this is an underdetermined transformation, we must first augment it with another one, say Y2 = X2, to give:

Y1 = X1/X2;  Y2 = X2    (6.100)
The Jacobian,
of the inverse transformation (x1 = y1 y2; x2 = y2) is:

J = det[ ∂x1/∂y1  ∂x1/∂y2 ; ∂x2/∂y1  ∂x2/∂y2 ] = det[ y2  y1 ; 0  1 ]    (6.101)
  = y2    (6.102)
and, by independence, the joint pdf for X1 and X2 is:

fX(x1, x2) = (1/(2π)) e^{−(x1² + x2²)/2}    (6.103)
from where we now obtain the joint pdf for Y1 and Y2 as:
fY(y1, y2) = (1/(2π)) |y2| e^{−(y1²y2² + y2²)/2};  −∞ < y1 < ∞;  −∞ < y2 < ∞, y2 ≠ 0    (6.104)
The careful reader will notice two things: (i) the expression for fY involves not just y2 , but its absolute value |y2 |; and (ii) that we have
excluded the troublesome point y2 = 0 from the space VY . These two
points are related: to the left of the point y2 = 0, |y2| = −y2; to the right, |y2| = y2, so that these two regions must be treated differently in
evaluating the integral.
To obtain the marginal pdf for y1 we now integrate out y2 in Eq
(6.104) over the appropriate region in VY as follows:
f1(y1) = (1/(2π)) [ ∫_{−∞}^{0} (−y2) e^{−(y1²+1)y2²/2} dy2 + ∫_{0}^{∞} y2 e^{−(y1²+1)y2²/2} dy2 ]    (6.105)
which simplifies to:

f1(y1) = 1/[π(1 + y1²)];  −∞ < y1 < ∞    (6.106)
as the required pdf. It is important to note that in carrying out the
integration implied in (6.105), the nature of the absolute value function, |y2 |, naturally forced us to exclude the point y2 = 0 because it
made it impossible for us to carry out the integration from −∞ to ∞ under a single integral. (Had the integral involved not |y2|, but y2, as an instructive exercise, the reader should try to evaluate the resulting integral from −∞ to ∞. See Exercise 6.9.)
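Eq (6.106) is the standard Cauchy pdf; a quick simulation (a sketch only, not from the text) shows the ratio of two independent standard normals matching it:

# Sketch: the ratio of two independent standard normals has pdf 1/(pi*(1 + y^2))
import numpy as np

rng = np.random.default_rng(5)
x1 = rng.standard_normal(200_000)
x2 = rng.standard_normal(200_000)   # x2 = 0 occurs with probability zero in practice
y1 = x1 / x2

# Compare an empirical probability with the Cauchy cdf F(y) = 1/2 + arctan(y)/pi
emp = ((y1 > -1.0) & (y1 < 1.0)).mean()
print(emp, 2 * np.arctan(1.0) / np.pi)   # both should be close to 0.5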
6.4.3
Non-Monotone Transformations
In general, when the multivariate transformation y = φ(x) may be non-monotone but has a countable number of roots, k, so that, when written as the matrix version of Eq (6.33),

xi = φi^{−1}(y) = ψi(y);  i = 1, 2, 3, ..., k    (6.107)

the required pdf is obtained as:

fY(y) = Σ_{i=1}^{k} fX[ψi(y)] |Ji|    (6.108)
6.5 Summary and Conclusions

We have focussed attention in this chapter on the single problem of determining the pdf, fY(y), of a random variable Y that has been defined as a function of another random variable, X, whose pdf fX(x) is known. As is common with problems of such general construct, the approach used to determine the desired pdf depends on the nature of the random variable, as well as the nature of the problem itself; in this particular case, the problem
REVIEW QUESTIONS
1. State, in mathematical terms, the problem of primary interest in this chapter.
2. What are the results of this chapter useful for?
3. In single variable transformations, where Y = φ(X) is given along with fX(x), and fY(y) is to be determined, what is the difference between the discrete case of this problem and the continuous counterpart?
EXERCISES
6.1 The pdf of a random variable X is given as:
f(x) = p(1 − p)^{x−1};  x = 1, 2, 3, ...    (6.109)

(i) Determine the pdf of the random variable Y defined as:

Y = 1/X    (6.110)
(ii) Given that E(X) = 1/p, obtain E(Y ) and compare it to E(X).
6.2 Given the pdf shown in Eq (6.18) for the transformed variable, Y, i.e.,

fY(y) = e^{−λt} (λt)^{log2 y} / (log2 y)!;  y = 1, 2, 4, 8, ...

show that E[Y] = e^{λt}.

6.3 Given a random variable, X, with the pdf:

fX(x) = (1/β) e^{−x/β};  0 < x < ∞
      = 0;  elsewhere    (6.111)

Determine the pdf for the random variable Y obtained via the transformation

Y = (1/β) e^{−X/β}    (6.112)

Compare this result to the one obtained in Example 6.2 in the text.
Compare this result to the one obtained in Example 6.2 in the text.
6.4 Given a random variable, X, with the following pdf:
fX(x) = (1/2)(x + 1);  −1 < x < 1
      = 0;  elsewhere    (6.113)
(i) Determine the pdf for the random variable Y obtained via the transformation
Y = X²    (6.114)
6.5 Two stochastically independent Poisson random variables, X1 and X2, have the characteristic functions:

φXi(t) = e^{λi(e^{jt} − 1)};  i = 1, 2    (6.116)
(i) Obtain the pdf fY(y) of the random variable Y defined as the sum of these two random variables, i.e.,
Y = X1 + X2
(ii) Extend the result to a sum of n such random variables, i.e.,
Y = X1 + X2 + ... + Xn
with each distribution given in Eq (6.115). Hence, establish that the random variable X also possesses the reproductive property illustrated in Example 6.6 in the text.
(iii) Obtain the pdf fZ(z) of the random variable Z defined as the average of n such random variables, i.e.,
Z = (1/n)(X1 + X2 + ... + Xn)
6.6 In Example 6.3 in the text, it was established that if the random variable X has
the following pdf:
fX(x) = (1/√(2π)) e^{−x²/2};  −∞ < x < ∞    (6.117)

then the pdf for the random variable Y = X² is:

fY(y) = (1/√(2π)) e^{−y/2} y^{−1/2};  0 < y < ∞    (6.118)

whose characteristic function is:

φY(t) = 1/(1 − j2t)^{1/2}    (6.119)

By re-writing √(2π) as 2^{1/2}√π, and √π as Γ(1/2) (or otherwise), obtain the pdf fZ(z) of the random variable defined as:

Z = X1² + X2² + ... + Xr²    (6.120)

where the random variables, Xi, are all mutually stochastically independent, and each has the distribution shown in Eq (6.117).
6.7 Revisit Example 6.8 in the text, but this time, instead of Eq (6.91), use the
following alternative squaring transformation,
Y2 = X2    (6.121)
APPLICATION PROBLEMS
6.10 In a commercial process for manufacturing the extruded polymer film Mylar, each roll of the product is characterized in terms of its gage, the film thickness, X. For a series of rolls that meet the desired mean thickness target of 350 μm, the thickness of a section of film sampled randomly from a particular roll has the pdf

f(x) = [1/(σi √(2π))] exp[−(x − 350)²/(2σi²)]    (6.123)

where σi² is the variance associated with the average thickness for each roll, i. In reality, the product property that is of importance to the end-user is not so much the film thickness, or even the average film thickness, but a roll-to-roll consistency, quantified in terms of a relative thickness variability measure defined as

Y = [(X − 350)/σi]²    (6.124)
Obtain the pdf fY (y) that is used to characterize the roll-to-roll variability observed
in this product quality variable.
6.11 Consider an experimental, electronically controlled, mechanical tennis ball
launcher designed to be used to train tennis players. One such machine is positioned at a fixed launch point, L, located a distance of 1 m from a wall as shown in Fig 6.2. The launch mechanism is programmed to launch the ball in an essentially straight line, at an angle θ that varies randomly according to the pdf:

f(θ) = c;  −π/2 < θ < π/2
     = 0;  elsewhere    (6.125)
where c is a constant. The point of impact on the wall, at a distance y from the
FIGURE 6.2: Schematic diagram of the tennis ball launcher of Problem 6.11
center, will therefore be a random variable whose specific value depends on θ. First show that c = 1/π, and then obtain fY(y).
6.12 The distribution of residence times in a single continuous stirred tank reactor
(CSTR), whose volume is V liters and through which reactants flow at rate F
liters/hr, was established in Chapter 2 as the pdf:
f(x) = (1/τ) e^{−x/τ};  0 < x < ∞    (6.126)

where τ = V/F.
(i) Find the pdf fY (y) of the residence time, Y , in a reactor that is 5 times as large,
given that in this case,
Y = 5X
(6.127)
(ii) Find the pdf fZ (z) of the residence time, Z, in an ensemble of 5 reactors in
series, given that:
Z = X1 + X2 + ... + X5    (6.128)

where each reactor's pdf is as given in Eq (6.126), with parameter, τi; i = 1, 2, ..., 5.
(Hint: Use results of Examples 6.5 and 6.6).
(iii) Show that even if τ1 = τ2 = ... = τ5 = τ for the ensemble of 5 reactors in series,
fZ (z) will still not be the same as fY (y).
6.13 The total number of flaws (dents, scratches, paint blisters, etc.) found on the various sets of doors installed on brand new minivans in an assembly plant is a random variable with the pdf:

f(x) = e^{−λ} λ^x / x!;  x = 0, 1, 2, ...    (6.129)

The value of the pdf parameter, λ, depends on the door in question as follows: λ = 0.5 for the driver and front passenger doors; λ = 0.75 for the two bigger midsection passenger doors, and λ = 1.0 for the fifth, rear trunk/tailgate door. If the total number of flaws per completely assembled minivan is Y, obtain the pdf fY(y) and from it, compute the probability of assembling a minivan with more than a total number of 2 flaws on all its doors.
6.14 Let the fluorescence signals obtained from a test spot and the reference spot on a microarray be the random variables X1 and X2, respectively, with pdfs:

f1(x1) = [1/Γ(α)] x1^{α−1} e^{−x1};  0 < x1 < ∞    (6.130)
f2(x2) = [1/Γ(β)] x2^{β−1} e^{−x2};  0 < x2 < ∞    (6.131)
It is customary to analyze such microarray data in terms of the fold change ratio,
Y = X1/X2    (6.132)
indicative of the fold increase (or decrease) in the signal intensity between test
and reference conditions. Show that the pdf of Y is given by:
f(y) = [Γ(α + β)/(Γ(α)Γ(β))] · y^{α−1}/(1 + y)^{α+β};  y > 0;  α > 0;  β > 0    (6.133)
6.15 The temperature, X (in °C), indicated by a certain temperature transducer is related to the voltage output, V, of the device according to:

X = 0.4V + 100    (6.134)
in a range from 50 to 500 volts and 100 to 250 °C. If the voltage output is subject to random variability around the true value V̄, such that

f(v) = [1/(σV √(2π))] exp[−(v − V̄)²/(2σV²)]    (6.135)

where the mean (i.e., expected) value for voltage, E(V) = V̄, and the variance, Var(V) = σV²,
(i) Show that:

E(X) = 0.4V̄ + 100    (6.136)
Var(X) = 0.16σV²    (6.137)

(ii) In terms of E(X) = μX and Var(X) = σX², obtain an expression for the pdf fX(x) representing the variability propagated to the temperature values.
6.16 Propagation-of-errors studies are concerned with determining how the errors from one variable are transmitted to another when the two variables are related
according to a known expression. When the relationships are linear, it is often possible to obtain complete probability distribution functions for the dependent variable
given the pdf for the independent variable (see Problem 6.15). When the relationships are nonlinear, closed form expressions are not always possible; in terms of
general results, the best one can hope for are approximate expressions for the expected value and variance of the dependent variable, typically in a local region,
upon linearizing the nonlinear expression. The following is an application of these
principles.
One of the best known laws of bioenergetics, Kleiber's law, states that the Resting Energy Expenditure of an animal, Q0 (essentially the animal's metabolic rate, in kcal/day), is proportional to M^{3/4}, where M is the animal's mass (in kg). Specifically for mature homeotherms, the expression is:

Q0 = 70 M^{3/4}    (6.138)

First obtain the approximate linearized expression for Q0 around M = 75 kg, and then determine E(Q0) and Var(Q0) for a population with σM = 12.5 kg under these conditions.
Chapter 7
Application Case Studies I:
Probability
7.1 Introduction
7.2 Mendel and Heredity
    7.2.1 Background and Problem Definition
    7.2.2 Single Trait Experiments and Results
    7.2.3 Single Trait Analysis
          The First Generation Traits
          Probability and The Second Generation Traits
    7.2.4 Multiple Traits and Independence
          Pairwise Experiments
    7.2.5 Subsequent Experiments and Conclusions
7.3 World War II Warship Tactical Response Under Attack
    7.3.1 Background and Problem Definition
    7.3.2 Approach and Results
    7.3.3 Final Comments
7.4 Summary and Conclusions
To many scientists and engineers, a first encounter with the theory of probability in its modern axiomatic form often leaves the impression of a subject matter so abstract and esoteric in nature as to be entirely suited to nothing but the most contrived applications. Nothing could be further from the truth. In reality, the application of probability theory features prominently in many modern fields of study: from finance, economics, sociology and psychology to various branches of physics, chemistry, biology and engineering, providing a perfect illustration of the aphorism that "there is nothing so practical as a good theory."
This chapter showcases the applicability of probability theory through two specific case studies involving real-world problems whose practical importance can hardly be overstated. The first, Mendel's deduction of the laws of heredity (the basis for the modern science of genetics), shows how Mendel employed probability (and the concept of stochastic independence) to establish the principles underlying a phenomenon which, until then, was considered essentially unpredictable and hence not susceptible to systematic analysis.
The second is from a now-declassified US Navy study during World War II and involves decision-making in the face of uncertainty, using past data. It
7.1
Introduction
The elegant, well-established and fruitful tree we now see as modern probability theory has roots that reach back to 16th and 17th century gamblers and the very real (and very practical) need for reliable solutions to numerous gambling problems. Referring to these gambling problems by the somewhat less morally questionable term "problems on games of chance," some of the most famous and most gifted mathematicians of the day devoted considerable energy first to solving specific problems (most notably the Italian mathematician, Cardano, in the 16th century), and later to developing the foundational
basis for systematic mathematical analysis (most notably the Dutch scientist,
Huygens, and the French mathematicians, Pascal and Fermat, in the 17th
century). However, despite subsequent major contributions in the 18th century from the likes of Jakob Bernoulli (1654-1705) and Abraham de Moivre
(1667-1754), it was not until the 19th ce