
Babatunde A. Ogunnaike

Random Phenomena
Fundamentals and Engineering Applications of Probability & Statistics

I frame no hypothesis; for whatever is not deduced from the phenomenon is to be called a hypothesis; and hypotheses, whether metaphysical or physical, whether of occult qualities or mechanical, have no place in experimental philosophy.
Sir Isaac Newton (1642–1727)

In Memoriam

In acknowledgement of the debt of birth I can never repay, I humbly dedicate this book to the memory of my father, my mother, and my statistics mentor at Wisconsin.

Adesijibomi Ogundero Ogunnaike
1922–2002
Some men fly as eagles free
But few with grace to the same degree
as when you rise upward to fly
to avoid sparrows in a crowded sky

Ayoola Oluronke Ogunnaike
1931–2005
Some who only search for silver and gold
Soon find what they cannot hold;
You searched after God's own heart,
and left behind, too soon, your pilgrim's chart

William Gordon Hunter
1937–1986
See what ripples great teachers make
With but one inspiring finger
Touching, once, the young mind's lake

Preface

In an age characterized by the democratization of quantification, where data about every conceivable phenomenon is available somewhere and easily accessible from just about anywhere, it is becoming just as important that the educated person also be conversant with how to handle data, and be able to understand what the data say as well as what they don't say. Of course, this has always been true of scientists and engineers, individuals whose profession requires them to be involved in the acquisition, analysis, interpretation and exploitation of data in one form or another; but it is even more so now. Engineers now work in non-traditional areas ranging from molecular biology to finance; physicists work with material scientists and economists; and the problems to be solved continue to widen in scope, becoming more interdisciplinary as the traditional disciplinary boundaries disappear altogether or are being restructured.

In writing this book, I have been particularly cognizant of these basic facts of 21st century science and engineering. And yet, while most scientists and engineers are well-trained in problem formulation and problem solving when all the entities involved are considered deterministic in character, many remain uncomfortable with problems involving random variations, if such problems cannot be idealized and reduced to the more familiar deterministic types. Even after going through the usual one-semester course in Engineering Statistics, the discomfort persists. Of all the reasons for this circumstance, the most compelling is this: most of these students tend to perceive their training in statistics more as a set of instructions on what to do and how to do it than as a training in fundamental principles of random phenomena. Such students are then uncomfortable when they encounter problems that are not quite similar to those covered in class; they lack the fundamentals to attack new and unfamiliar problems. The purpose of this book is to address this issue directly by presenting basic fundamental principles, methods, and tools for formulating and solving engineering problems that involve randomly varying phenomena. The premise is that by emphasizing fundamentals and basic principles, and then illustrating these with examples, the reader will be better equipped to deal with a range of problems wider than that explicitly covered in the book. This important point is expanded further in Chapter 0.

Scope and Organization


Developing a textbook that will achieve the objective stated above poses the usual challenge of balancing breadth and depth, an optimization problem with no unique solution. But there is also the additional constraint that the curriculum in most programs can usually only accommodate a one-semester course in engineering statistics, if they can find space for it at all. As all teachers of this material know, finding a universally acceptable compromise solution is impossible. What this text offers is enough material for a two-semester introductory sequence in probability and statistics for scientists and engineers, and with it, the flexibility of several options for using the material. We envisage the following three categories, for which more detailed recommendations for coverage will be provided shortly:

Category I: The two-semester undergraduate sequence;
Category II: The traditional one-semester undergraduate course;
Category III: The one-semester beginning graduate course.

The material has been tested and refined over more than a decade, in the classroom (at the University of Delaware; at the African University of Science and Technology (AUST) in Abuja, Nigeria; and at the African Institute for Mathematical Sciences (AIMS) in Muizenberg, South Africa), and in short courses presented to industrial participants at DuPont, W. L. Gore, SIEMENS, the Food and Drug Administration (FDA), and many others through the University of Delaware's Engineering Outreach program.
The book is organized into 5 parts, after a brief prelude in Chapter 0 where the book's organizing principles are expounded. Part I (Chapters 1 and 2) provides foundational material for understanding the fundamental nature of random variability. Part II (Chapters 3–7) focuses on probability. Chapter 3 introduces the fundamentals of probability theory, and Chapters 4 and 5 extend these to the concept of the random variable and its distribution, for the single and the multidimensional random variable. Chapter 6 is devoted to random variable transformations, and Chapter 7 contains the first of a trilogy of case studies, this one devoted to two problems with substantial historical significance.

Part III (Chapters 8–11) is devoted entirely to developing and analyzing probability models for specific random variables. The distinguishing characteristic of the presentation in Chapters 8 and 9, respectively for discrete and continuous random variables, is that each model is developed from underlying phenomenological mechanisms. Chapter 10 introduces the idea of information and entropy as an alternative means of determining appropriate probability models when only partial knowledge is available about the random variable in question. Chapter 11 presents the second case study, on in-vitro fertilization (IVF), as an application of probability models. The chapter illustrates the development, validation, and use of probability modeling on a contemporary problem with significant practical implications.

The core of statistics is presented in Part IV (Chapters 12–20). Chapter 12 lays the foundation with an introduction to the concepts and ideas behind statistics, before the coverage begins in earnest in Chapter 13 with sampling theory, continuing with statistical inference, estimation and hypothesis testing, in Chapters 14 and 15, and regression analysis in Chapter 16. Chapter 17 introduces the important but oft-neglected issue of probability model validation, while Chapter 18, on nonparametric methods, extends the ideas of Chapters 14 and 15 to those cases where the usual probability model assumptions (mostly the normality assumption) are invalid. Chapter 19 presents an overview treatment of design of experiments. The third and final set of case studies is presented in Chapter 20 to illustrate the application of various aspects of statistics to real-life problems.

Part V (Chapters 21–23) showcases the application of probability and statistics with a hand-selected set of special topics: reliability and life testing in Chapter 21, quality assurance and control in Chapter 22, and multivariate analysis in Chapter 23. Each has its roots in probability and statistics, but all have evolved into bona fide subjects in their own right.

Key Features
Before presenting suggestions of how to cover the material for various audiences, I think it is important to point out some of the key features of the
textbook.
1. Approach. This book takes a more fundamental, first-principles approach to the issue of dealing with random variability and uncertainty in engineering problems. As a result, for example, the treatment of probability distributions for random variables (Chapters 8–10) is based on a derivation of each model from phenomenological mechanisms, allowing the reader to see the subterraneous roots from which these probability models sprang. The reader is then able to see, for instance, how the Poisson model arises either as a limiting case of the binomial random variable, or from the phenomenon of observing, in finite-sized intervals of time or space, rare events with low probabilities of occurrence; or how the Gaussian model arises from an accumulation of small random perturbations.
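(For a quick numerical feel for the first of these claims, the short Python sketch below may help; it is an illustration of this preface's editor rather than part of the text, and the particular values of n and p, as well as the use of the scipy library instead of MINITAB, are assumptions of convenience. It simply compares the Binomial(n, p) probabilities with those of a Poisson distribution having the same mean λ = np, and shows the discrepancy shrinking as n grows.)

    # Illustrative sketch: the Poisson pmf as the limit of the Binomial pmf
    # when n grows large while the mean lambda = n*p is held fixed.
    from scipy.stats import binom, poisson

    lam = 2.0                        # fixed mean, lambda = n * p
    for n in (10, 100, 1000):        # increasingly many trials
        p = lam / n
        max_diff = max(abs(binom.pmf(k, n, p) - poisson.pmf(k, lam))
                       for k in range(21))
        print(f"n = {n:4d}: largest difference between the two pmfs = {max_diff:.5f}")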
2. Examples and Case Studies. This fundamental approach, noted above, is integrated with practical applications, not only in the form of a generous number of examples, but also with the inclusion of three chapter-length application case studies, one each for probability, probability distributions, and statistics. In addition to the usual traditional staples, many of the in-chapter examples have been drawn from non-traditional applications in molecular biology (e.g., DNA replication origin distributions, gene expression data), from finance and business, and from population demographics.

3. Computers, Computer Software, On-line Resources. As expanded further in the Appendix, the availability of computers has transformed the teaching and learning of probability and statistics. Statistical software packages are now so widely available that many of what used to be staples of traditional probability and statistics textbooks (tricks for carrying out various computations, approximation techniques, and especially printed statistical tables) are now essentially obsolete. All the examples in this book were carried out with MINITAB, and I fully expect each student and instructor to have access to one such statistical package. In this book, therefore, we depart from tradition and do not include any statistical tables. Instead, we have included in the Appendix a compilation of useful information about some popular software packages, on-line electronic versions of statistical tables, and a few other on-line resources such as on-line electronic statistics handbooks and websites with data sets.
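(As a concrete, purely illustrative example of why printed tables have become unnecessary, the short sketch below reproduces two values that a reader would traditionally look up in a t-table and a chi-squared table. It is not drawn from the book; it uses Python's scipy.stats rather than MINITAB simply because it is easy to show as a scriptable example.)

    # Illustrative sketch: critical values obtained directly from software
    # instead of from printed statistical tables.
    from scipy.stats import t, chi2

    # Upper 2.5% critical value of the t-distribution with 9 degrees of freedom
    print(round(t.ppf(0.975, df=9), 3))     # approximately 2.262

    # Upper 5% critical value of the chi-squared distribution with 9 degrees of freedom
    print(round(chi2.ppf(0.95, df=9), 2))   # approximately 16.92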
4. Questions, Exercises, Application Problems, Projects. No one feels truly confident about a subject matter without having tackled (and solved!) some problems; and a useful textbook ought to provide a good selection that offers a broad range of challenges. Here is what is available in this book:

Review Questions: Found at the end of each chapter (with the exception of the chapters on case studies), these are short, specific questions designed to test the reader's basic comprehension. If you can answer all the review questions at the end of each chapter, you know and understand the material; if not, revisit the relevant portion and rectify the revealed deficiency.

Exercises: These are designed to provide the opportunity to master the mechanics behind a single concept. Some may therefore be purely mechanical, in the sense of requiring basic computations; some may require filling in the steps deliberately left as an exercise to the reader; some may have the flavor of an application; but the focus is usually a single aspect of a topic covered in the text, or a straightforward extension thereof.

Application Problems: These are more substantial practical problems whose solutions usually require integrating various concepts (some obvious, some not) and deploying the appropriate set of tools. Many of these are drawn from the literature and involve real applications and actual data sets. In such cases, the references are provided, and the reader may wish to consult some of them for additional background and perspective, if necessary.

Project assignments: These allow deeper exploration of a few selected issues covered in a chapter, mostly as a way of extending the coverage and also to provide opportunities for creativity. By definition, these involve a significant amount of work and also require report-writing. This book offers a total of nine such projects. They are a good way for students to learn how to plan, design, and execute projects and to develop writing and reporting skills. (Each graduate student that has taken the CHEG 604 and CHEG 867 courses at the University of Delaware has had to do a term project of this type.)

5. Data Sets. All the data sets used in each chapter, whether in the chapter itself, in an example, or in the exercises or application problems, are made available on-line and on CD.

Suggested Coverage
Of the three categories mentioned earlier, a methodical coverage of the entire textbook is only possible for Category I, in a two-semester undergraduate sequence. For this group, the following is one possible approach to dividing the material up into instruction modules for each semester (the superscripts 1 and 2 below distinguish a first, selective pass through a chapter from a second, more complete pass):

First Semester
Module 1 (Foundations): Chapters 0–2.
Module 2 (Probability): Chapters 3, 4, 5 and 7.
Module 3 (Probability Models): Chapter 8¹ (omit detailed derivations and Section 8.7.2), Chapter 9¹ (omit detailed derivations), and Chapter 11¹ (cover Sections 11.4 and 11.5 selectively; omit Section 11.6).
Module 4 (Introduction to Statistics/Sampling): Chapters 12 and 13.
Module 5 (Statistical Inference): Chapter 14¹ (omit Section 14.6), Chapter 15¹ (omit Sections 15.8 and 15.9), Chapter 16¹ (omit Sections 16.4.3, 16.4.4, and 16.5.2), and Chapter 17.
Module 6 (Design of Experiments): Chapter 19¹ (cover Sections 19.3–19.4 lightly; omit Section 19.10) and Chapter 20.

Second Semester
Module 7 (Probability and Models): Chapter 6 (with ad hoc reference to Chapters 4 and 5); Chapters 8² and 9² (include details omitted in the first semester); and Chapter 10.
Module 8 (Statistical Inference): Chapter 14² (Bayesian estimation, Section 14.6), Chapter 15² (Sections 15.8 and 15.9), Chapter 16² (Sections 16.4.3, 16.4.4, and 16.5.2), and Chapter 18.
Module 9 (Applications): Select one of Chapters 21, 22 or 23. (For chemical engineers, and anyone planning to work in the manufacturing industry, I recommend Chapter 22.)

With this as a basic template, other variations can be designed as appropriate. For example, those who can only afford one semester (Category II) may adopt the first-semester suggestion above, to which I recommend adding Chapter 22 at the end.

The beginning graduate one-semester course (Category III) may also be based on the first-semester suggestion above, but with the following additional recommendations: (i) cover all of the recommended chapters fully; (ii) add Chapter 23 on multivariate analysis; and (iii) in lieu of a final examination, assign at least one, possibly two, of the nine projects. This will make for a hectic semester, but graduate students should be able to handle the workload.

A second, perhaps more straightforward, recommendation for a two-semester sequence is to devote the first semester to Probability (Chapters 0–11), and the second to Statistics (Chapters 12–20) along with one of the three application chapters.

Acknowledgments
Pulling off a project of this magnitude requires the support and generous assistance of many colleagues, students, and family. Their genuine words of encouragement and the occasional (innocent and not-so-innocent) inquiry about the status of the book all contributed to making sure that this potentially endless project was actually finished. At the risk of leaving someone out, I feel some deserve particular mention. I begin with, in alphabetical order, Marc Birtwistle, Ketan Detroja, Claudio Gelmi (Chile), Mary McDonald, Vinay Prasad (Alberta, Canada), Paul Taylor (AIMS, Muizenberg, South Africa), and Carissa Young. These are colleagues, former and current students, and postdocs, who patiently waded through many versions of various chapters, offered invaluable comments and caught many of the manuscript errors, typographical and otherwise. It is a safe bet that the manuscript still contains a random number of these errors (few and Poisson distributed, I hope!) but whatever errors remain are my responsibility. I encourage readers to let me know of the ones they find.

I wish to thank my University of Delaware colleagues, Antony Beris and especially Dion Vlachos, with whom I often shared the responsibility of teaching CHEG 867 to beginning graduate students. Their insight into what the statistics component of the course should contain was invaluable (as were the occasional Greek lessons!). Of my other colleagues, I want to thank Dennis Williams of Basel, for his interest and comments, and then single out former fellow DuPonters: Mike Piovoso, whose fingerprint is recognizable on the illustrative example of Chapter 23; Rafi Sela, now a Six-Sigma Master Black Belt; Mike Deaton of James Madison University; and Ron Pearson, whose near-encyclopedic knowledge never ceases to amaze me. Many of the ideas, problems and approaches evident in this book arose from those discussions and collaborations from many years ago. Of my other academic colleagues, I wish to thank Carl Laird of Texas A&M for reading some of the chapters, Joe Qin of USC for various suggestions, and Jim Rawlings of Wisconsin, with whom I have carried on a long-running discussion about probability and estimation because of his own interests and expertise in this area. David Bacon and John MacGregor, pioneers in the application of statistics and probability in chemical engineering, deserve my thanks for their early encouragement about the project and for providing the occasional commentary. I also wish to take this opportunity to acknowledge the influence and encouragement of my chemical engineering mentor, Harmon Ray. I learned more from Harmon than he probably knew he was teaching me. Much of what is in this book carries an echo of his voice and reflects the Wisconsin tradition.

I must not forget my gracious hosts at the École Polytechnique Fédérale de Lausanne (EPFL), Professor Dominique Bonvin (Merci pour tout, mon ami) and Professor Vassily Hatzimanikatis (Efharisto poli, paliofile). Without their generous hospitality during the months from February through July 2009, it is very likely that this project would have dragged on for far longer. I am also grateful to Michael Amrhein of the Laboratoire d'Automatique at EPFL, and his graduate student, Paman Gujral, who both took time to review several chapters and provided additional useful references for Chapter 23. My thanks go to Allison Shatkin and Marsha Pronin of CRC Press/Taylor and Francis for their professionalism in guiding the project through the various phases of the editorial process all the way to production.
And now to family. Many thanks are due to my sons, Damini and Deji, who have had cause to use statistics at various stages of their (still on-going) education: each read and commented on a selected set of chapters. My youngest son, Makinde, still too young to be a proofreader, was nevertheless solicitous of my progress, especially towards the end. More importantly, however, just by showing up when he did, and how, he confirmed to me, without meaning to, that he is a natural-born Bayesian. Finally, the debt of thanks I owe to my wife, Anna, is difficult to express in a few words of prose. She proofread many of the chapter exercises and problems with an incredible eye, and a sensitive ear for the language. But more than that, she knows well what it means to be a "book widow"; without her forbearance, encouragement, and patience, this project would never have been completed.

Babatunde A. Ogunnaike
Newark, Delaware
Lausanne, Switzerland
April 2009

List of Figures

1.1  Histogram for YA data . . . 19
1.2  Histogram for YB data . . . 20
1.3  Histogram of inclusions data . . . 22
1.4  Histogram for YA data with superimposed theoretical distribution . . . 24
1.5  Histogram for YB data with superimposed theoretical distribution . . . 24
1.6  Theoretical probability distribution function for a Poisson random variable with parameter λ = 1.02. Compare with the inclusions data histogram in Fig 1.3 . . . 25

2.1  Schematic diagram of a plug flow reactor (PFR) . . . 36
2.2  Schematic diagram of a continuous stirred tank reactor (CSTR) . . . 37
2.3  Instantaneous residence time distribution function for the CSTR (with τ = 5) . . . 39

3.1  Venn Diagram for Example 3.7 . . . 66
3.2  Venn diagram of students in a thermodynamics class . . . 72
3.3  The role of conditioning Set B in conditional probability . . . 73
3.4  Representing set A as a union of 2 disjoint sets . . . 74
3.5  Partitioned sets for generalizing total probability result . . . 75

4.1  The original sample space, Ω, and the corresponding space V induced by the random variable X . . . 91
4.2  Probability distribution function, f(x), and cumulative distribution function, F(x), for the 3-coin toss experiment of Example 4.1 . . . 97
4.3  Distribution of a negatively skewed random variable . . . 110
4.4  Distribution of a positively skewed random variable . . . 110
4.5  Distributions with reference kurtosis (solid line) and mild kurtosis (dashed line) . . . 111
4.6  Distributions with reference kurtosis (solid line) and high kurtosis (dashed line) . . . 112
4.7  The pdf of a continuous random variable X with a mode at x = 1 . . . 117
4.8  The cdf of a continuous random variable X showing the lower and upper quartiles and the median . . . 118

5.1  Graph of the joint pdf for the 2-dimensional random variable of Example 5.5 . . . 149
5.2  Positively correlated variables: ρ = 0.923 . . . 159
5.3  Negatively correlated variables: ρ = −0.689 . . . 159
5.4  Essentially uncorrelated variables: ρ = 0.085 . . . 160

6.1  Region of interest, VY, for computing the cdf of the random variable Y defined as a sum of 2 independent random variables X1 and X2 . . . 178
6.2  Schematic diagram of the tennis ball launcher of Problem 6.11 . . . 193

9.1  Exponential pdfs for various values of the parameter β . . . 262
9.2  Gamma pdfs for various values of the parameters α and β: Note how with increasing values of α the shape becomes less skewed, and how the breadth of the distribution increases with increasing values of β . . . 267
9.3  Gamma distribution fit to data on inter-origin distances in the budding yeast S. cerevisiae genome . . . 270
9.4  Weibull pdfs for various values of the shape and scale parameters: Note how with increasing values of the shape parameter the distribution becomes less skewed, and how its breadth increases with increasing values of the scale parameter . . . 274
9.5  The Herschel-Maxwell 2-dimensional plane . . . 286
9.6  Gaussian pdfs for various values of the parameters μ and σ: Note the symmetric shapes, how the center of the distribution is determined by μ, and how the shape becomes broader with increasing values of σ . . . 289
9.7  Symmetric tail area probabilities for the standard normal random variable with z = 1.96 and FZ(−1.96) = 0.025 = 1 − FZ(1.96) . . . 291
9.8  Lognormal pdfs for scale parameter α = 0 and various values of the shape parameter β. Note how the shape changes, becoming less skewed as β becomes smaller . . . 295
9.9  Lognormal pdfs for shape parameter β = 1 and various values of the scale parameter α. Note how the shape remains unchanged while the entire distribution is scaled appropriately depending on the value of α . . . 295
9.10  Particle size distribution for the granulation process product: a lognormal distribution with α = 6.8, β = 0.5. The shaded area corresponds to product meeting quality specifications, 350 < X < 1650 microns . . . 298
9.11  Unimodal Beta pdfs when α > 1, β > 1: Note the symmetric shape when α = β, and the skewness determined by the value of α relative to β . . . 304
9.12  U-shaped Beta pdfs when α < 1, β < 1 . . . 304
9.13  Other shapes of the Beta pdfs: It is J-shaped when (α − 1)(β − 1) < 0 and a straight line when α = 2, β = 1 . . . 305
9.14  Theoretical distribution for characterizing fractional microarray intensities for the example gene: The shaded area corresponds to the probability that the gene in question is upregulated . . . 307

9.15  Two uniform distributions over different ranges (0,1) and (2,10). Since the total area under the pdf must be 1, the narrower pdf is proportionately longer than the wider one . . . 309
9.16  Two F distribution plots for different values for ν1, the first degree of freedom, but the same value for ν2. Note how the mode shifts to the right as ν1 increases . . . 311
9.17  Three t-distribution plots for degrees of freedom values ν = 5, 10, 100. Note the symmetrical shape and the heavier tail for smaller values of ν . . . 312
9.18  A comparison of the t-distribution with ν = 5 with the standard normal N(0, 1) distribution. Note the similarity as well as the t-distribution's comparatively heavier tail . . . 313
9.19  A comparison of the t-distribution with ν = 50 with the standard normal N(0, 1) distribution. The two distributions are practically indistinguishable . . . 313
9.20  A comparison of the standard Cauchy distribution with the standard normal N(0, 1) distribution. Note the general similarities as well as the Cauchy distribution's substantially heavier tail . . . 315
9.21  Common probability distributions and connections among them . . . 319

10.1  The entropy function of a Bernoulli random variable . . . 340

11.1  Elsner data versus binomial model prediction . . . 379
11.2  Elsner data (Younger set) versus binomial model prediction . . . 381
11.3  Elsner data (Older set) versus binomial model prediction . . . 382
11.4  Elsner data (Younger set) versus stratified binomial model prediction . . . 383
11.5  Elsner data (Older set) versus stratified binomial model prediction . . . 383
11.6  Complete Elsner data versus stratified binomial model prediction . . . 384
11.7  Optimum number of embryos as a function of p . . . 386
11.8  Surface plot of the probability of a singleton as a function of p and the number of embryos transferred, n . . . 388
11.9  The (maximized) probability of a singleton as a function of p when the optimum integer number of embryos are transferred . . . 388
11.10  Surface plot of the probability of no live birth as a function of p and the number of embryos transferred, n . . . 389
11.11  Surface plot of the probability of multiple births as a function of p and the number of embryos transferred, n . . . 389
11.12  IVF treatment outcome probabilities for good prognosis patients with p = 0.5, as a function of n, the number of embryos transferred . . . 391
11.13  IVF treatment outcome probabilities for medium prognosis patients with p = 0.3, as a function of n, the number of embryos transferred . . . 392
11.14  IVF treatment outcome probabilities for poor prognosis patients with p = 0.18, as a function of n, the number of embryos transferred . . . 393

11.15  Relative sensitivity of the binomial model derived n to errors in estimates of p as a function of p . . . 396

12.1  Relating the tools of Probability, Statistics and Design of Experiments to the concepts of Population and Sample . . . 415
12.2  Bar chart of welding injuries from Table 12.1 . . . 420
12.3  Bar chart of welding injuries arranged in decreasing order of number of injuries . . . 420
12.4  Pareto chart of welding injuries . . . 421
12.5  Pie chart of welding injuries . . . 422
12.6  Bar Chart of frozen ready meals sold in France in 2002 . . . 423
12.7  Pie Chart of frozen ready meals sold in France in 2002 . . . 424
12.8  Histogram for YA data of Chapter 1 . . . 425
12.9  Frequency Polygon of YA data of Chapter 1 . . . 427
12.10  Frequency Polygon of YB data of Chapter 1 . . . 428
12.11  Boxplot of the chemical process yield data YA, YB of Chapter 1 . . . 429
12.12  Boxplot of random N(0,1) data: original set, and with added outlier . . . 430
12.13  Box plot of raisins dispensed by five different machines . . . 431
12.14  Scatter plot of cranial circumference versus finger length: The plot shows no real relationship between these variables . . . 432
12.15  Scatter plot of city gas mileage versus highway gas mileage for various two-seater automobiles: The plot shows a strong positive linear relationship, but no causality is implied . . . 433
12.16  Scatter plot of highway gas mileage versus engine capacity for various two-seater automobiles: The plot shows a negative linear relationship. Note the two unusually high mileage values associated with engine capacities 7.0 and 8.4 liters, identified as belonging to the Chevrolet Corvette and the Dodge Viper, respectively . . . 434
12.17  Scatter plot of highway gas mileage versus number of cylinders for various two-seater automobiles: The plot shows a negative linear relationship . . . 434
12.18  Scatter plot of US population every ten years since the 1790 census versus census year: The plot shows a strong non-linear trend, with very little scatter, indicative of the systematic, approximately exponential growth . . . 435
12.19  Scatter plot of Y1 and X1 from Anscombe data set 1 . . . 444
12.20  Scatter plot of Y2 and X2 from Anscombe data set 2 . . . 445
12.21  Scatter plot of Y3 and X3 from Anscombe data set 3 . . . 445
12.22  Scatter plot of Y4 and X4 from Anscombe data set 4 . . . 446

13.1  Sampling distribution for mean lifetime of DLP lamps in Example 13.3 used to compute P(5100 < X̄ < 5200) = P(0.66 < Z < 1.34) . . . 469
13.2  Sampling distribution for average lifetime of DLP lamps in Example 13.3 used to compute P(X̄ < 5000) = P(Z < 2.67) . . . 470

13.3  Sampling distribution of the mean diameter of ball bearings in Example 13.4 used to compute P(|X̄ − 10| ≥ 0.14) = P(|T| ≥ 0.62) . . . 473
13.4  Sampling distribution for the variance of ball bearing diameters in Example 13.5 used to compute P(S ≥ 1.01) = P(C ≥ 23.93) . . . 475
13.5  Sampling distribution for the two variances of ball bearing diameters in Example 13.6 used to compute P(F ≥ 1.41) + P(F ≤ 0.709) . . . 476

14.1  Sampling distribution for the two estimators U1 and U2: U1 is the more efficient estimator because of its smaller variance . . . 491
14.2  Two-sided tail area probabilities of α/2 for the standard normal sampling distribution . . . 504
14.3  Two-sided tail area probabilities of α/2 = 0.025 for a Chi-squared distribution with 9 degrees of freedom . . . 511
14.4  Sampling distribution with two-sided tail area probabilities of 0.025 for X̄/β, based on a sample of size n = 10 from an exponential population . . . 516
14.5  Sampling distribution with two-sided tail area probabilities of 0.025 for X̄/β, based on a larger sample of size n = 100 from an exponential population . . . 517

15.1  A distribution for the null hypothesis, H0, in terms of the test statistic, QT, where the shaded rejection region, QT > q, indicates a significance level, α . . . 557
15.2  Overlapping distributions for the null hypothesis, H0 (with mean μ0), and alternative hypothesis, Ha (with mean μa), showing Type I and Type II error risks α, β, along with qC, the boundary of the critical region of the test statistic, QT . . . 559
15.3  The standard normal variate z = −zα with tail area probability α. The shaded portion is the rejection region for a lower-tailed test, Ha: μ < μ0 . . . 564
15.4  The standard normal variate z = zα with tail area probability α. The shaded portion is the rejection region for an upper-tailed test, Ha: μ > μ0 . . . 565
15.5  Symmetric standard normal variates z = zα/2 and z = −zα/2 with identical tail area probabilities α/2. The shaded portions show the rejection regions for a two-sided test, Ha: μ ≠ μ0 . . . 565
15.6  Box plot for Method A scores including the null hypothesis mean, H0: μ = 75, shown along with the sample average, x̄, and the 95% confidence interval based on the t-distribution with 9 degrees of freedom. Note how the upper bound of the 95% confidence interval lies to the left of, and does not touch, the postulated H0 value . . . 574
15.7  Box plot for Method B scores including the null hypothesis mean, H0: μ = 75, shown along with the sample average, x̄, and the 95% confidence interval based on the t-distribution with 9 degrees of freedom. Note how the 95% confidence interval includes the postulated H0 value . . . 574
15.8  Box plot of differences between the Before and After weights, including a 95% confidence interval for the mean difference, and the hypothesized H0 point, δ0 = 0 . . . 588
15.9  Box plot of the Before and After weights including individual data means. Notice the wide range of each data set . . . 590
15.10  A plot of the Before and After weights for each patient. Note how one data sequence is almost perfectly correlated with the other; in addition note the relatively large variability intrinsic in each data set compared to the difference between each point . . . 590
15.11  Null and alternative hypotheses distributions for an upper-tailed test based on n = 25 observations, with population standard deviation σ = 4, where the true alternative mean, μa, exceeds the hypothesized one by δ = 2.0. The figure shows a z-shift of (δ√n)/σ = 2.5; and with reference to H0, the critical value z0.05 = 1.65. The area under the H0 curve to the right of the point z = 1.65 is α = 0.05, the significance level; the area under the dashed Ha curve to the left of the point z = 1.65 is β . . . 592
15.12  β and power values for the hypothesis test of Fig 15.11 with Ha ∼ N(2.5, 1). Top: β; Bottom: Power = (1 − β) . . . 594
15.13  Rejection regions for one-sided tests of a single variance of a normal population, at a significance level of α = 0.05, based on n = 10 samples. The distribution is χ²(9); Top: for Ha: σ² > σ0², indicating rejection of H0 if c² > χ²α(9) = 16.9; Bottom: for Ha: σ² < σ0², indicating rejection of H0 if c² < χ²1−α(9) = 3.33 . . . 602
15.14  Rejection regions for the two-sided tests concerning the variance of the process A yield data, H0: σA² = 1.5², based on n = 50 samples, at a significance level of α = 0.05. The distribution is χ²(49), with the rejection region shaded; because the test statistic, c² = 44.63, falls outside of the rejection region, we do not reject H0 . . . 604
15.15  Rejection regions for the two-sided tests of the equality of the variances of the process A and process B yield data, i.e., H0: σA² = σB², at a significance level of α = 0.05, based on n = 50 samples each. The distribution is F(49, 49), with the rejection region shaded; since the test statistic, f = 0.27, falls within the rejection region to the left, we reject H0 in favor of Ha . . . 606

16.1  Boiling point of hydrocarbons in Table 16.1 as a function of the number of carbon atoms in the compound . . . 649
16.2  The true regression line and the zero mean random error εi . . . 654

16.3  The Gaussian assumption regarding variability around the true regression line, giving rise to N(0, σ²): The 6 points represent the data at x1, x2, . . . , x6; the solid straight line is the true regression line which passes through the mean of the sequence of the indicated Gaussian distributions . . . 655
16.4  The fitted straight line to the Density versus Ethanol Weight % data: The additional terms included in the graph, S, R-Sq and R-Sq(adj), are discussed later . . . 659
16.5  The fitted regression line to the Density versus Ethanol Weight % data (solid line) along with the 95% confidence interval (dashed line). The confidence interval is narrowest at x = x̄ and widens for values further away from x̄ . . . 664
16.6  The fitted straight line to the Cranial circumference versus Finger length data. Note how the data points are widely scattered around the fitted regression line. (The additional terms included in the graph, S, R-Sq and R-Sq(adj), are discussed later) . . . 667
16.7  The fitted straight line to the Highway MPG versus Engine Capacity data of Table 12.5 (leaving out the two inconsistent data points) along with the 95% confidence interval (long dashed line) and the 95% prediction interval (short dashed line). (Again, the additional terms included in the graph, S, R-Sq and R-Sq(adj), are discussed later) . . . 670
16.8  Modeling the temperature dependence of thermal conductivity: Top: Fitted straight line to the Thermal conductivity (k) versus Temperature (T °C) data in Table 16.6; Bottom: standardized residuals versus fitted value, ŷi . . . 681
16.9  Modeling the dependence of the boiling points (BP) of hydrocarbon compounds in Table 16.1 on the number of carbon atoms in the compound: Top: Fitted straight line of BP versus n, the number of carbon atoms; Bottom: standardized residuals versus fitted value ŷi. Notice the distinctive quadratic structure left over in the residuals, exposing the linear model's over-estimation at the extremes and the under-estimation in the middle . . . 683
16.10  Catalytic process yield data of Table 16.7 . . . 692
16.11  Catalytic process yield data of Table 16.1. Top: Fitted plane of Yield as a function of Temperature and Pressure; Bottom: standardized residuals versus fitted value ŷi. Nothing appears unusual about these residuals . . . 695
16.12  Modeling the dependence of the boiling points (BP) of hydrocarbon compounds in Table 16.1 on the number of carbon atoms in the compound: Top: Fitted quadratic curve of BP versus n, the number of carbon atoms; Bottom: standardized residuals versus fitted value ŷi. Despite the good fit, the visible systematic structure still left over in the residuals suggests adding one more term to the model . . . 703

16.13  Modeling the dependence of the boiling points (BP) of hydrocarbon compounds in Table 16.1 on the number of carbon atoms in the compound: Top: Fitted cubic curve of BP versus n, the number of carbon atoms; Bottom: standardized residuals versus fitted value ŷi. There appears to be little or no systematic structure left in the residuals, suggesting that the cubic model provides an adequate description of the data . . . 705
16.14  Gram polynomials evaluated at 5 discrete points k = 1, 2, 3, 4, 5; p0 is the constant; p1, the straight line; p2, the quadratic; and p3, the cubic . . . 707

17.1  Probability plots for safety data postulated to be exponentially distributed, each showing (a) rank ordered data; (b) theoretical fitted cumulative probability distribution line along with associated 95% confidence intervals; (c) a list of summary statistics, including the p-value associated with a formal goodness-of-fit test. The indication from the p-values is that there is no evidence to reject H0; therefore the model appears to be adequate . . . 738
17.2  Probability plot for safety data S2 wrongly postulated to be normally distributed. The departure from the linear fit does not appear too severe, but the low/borderline p-value (0.045) objectively compels us to reject H0 at the 0.05 significance level and conclude that the Gaussian model is inadequate for this data . . . 739
17.3  Probability plots for yield data sets YA and YB postulated to be normally distributed. The 95% confidence intervals around the fitted line, along with the indicated p-values, strongly suggest that the distributional assumptions appear to be valid . . . 740
17.4  Normal probability plot for the residuals of the regression analysis of the dependence of thermal conductivity, k, on Temperature in Example 16.5. The postulated model, a two-parameter regression model with Gaussian distributed zero mean errors, appears valid . . . 741
17.5  Chi-Squared test results for inclusions data and a postulated Poisson model. Top panel: Bar chart of Expected and Observed frequencies, which shows how well the model prediction matches observed data; Bottom panel: Bar chart of contributions to the Chi-squared statistic, showing that the group of 3 or more inclusions is responsible for the largest model-observation discrepancy, by a wide margin . . . 744

18.1  Histograms of interspike intervals data with Gamma model fit for the pyramidal tract cell of a monkey. Top panel: when awake (PT-W); Bottom panel: when asleep (PT-S). Note the similarities in the estimated values for the shape parameter α for both sets of data, and the difference between the estimates for β, the scale parameters . . . 774

18.2  Probability plot of interspike intervals data with postulated Gamma model and Anderson-Darling test for the pyramidal tract cell of a monkey. Top panel: when awake (PT-W); Bottom panel: when asleep (PT-S). The p-values for the A-D tests indicate no evidence to reject the null hypothesis . . . 776

19.1  Graphic illustration of the orthogonal vector decomposition of Eq (19.11) . . . 800
19.2  Boxplot of raisins data showing what the ANOVA analysis has confirmed: that there is a significant difference in how the machines dispense raisins . . . 802
19.3  Normal probability plots of the residuals from the one-way classification ANOVA model in Example 19.1. Top panel: Plot obtained directly from the ANOVA analysis, which does not provide any test statistic or significance level; Bottom panel: Subsequent goodness-of-fit test carried out on saved residuals; note the high p-value associated with the A-D test . . . 804
19.4  Graphic illustration of the orthogonal error decomposition of Eq (19.21) with the additional block component, EB . . . 807
19.5  Normal probability plots of the residuals from the two-way classification ANOVA model for investigating tire wear, obtained directly from the ANOVA analysis . . . 810
19.6  2² factorial design for factors A and B showing the four experimental points; − represents low values, + represents high values for each factor . . . 815
19.7  Graphic illustration of folding, where two half-fractions of a 2³ factorial design are combined to recover the full factorial design; each fold costs an additional degree of freedom for analysis . . . 826
19.8  Normal probability plot for the effects, using Lenth's method to identify A, D and AD as significant . . . 830
19.9  Normal probability plot for the residuals of the Etch rate model in Eq (19.46) obtained upon projection of the experimental data to retain only the significant terms A, Gap (x1), D, Power (x2), and the interaction AD, Gap*Power (x1x2) . . . 832
19.10  The 3-factor face-centered cube (FCC) response surface design and its constituent parts: 2³ factorial base, open circles; face center points, lighter shaded circles; center point, darker solid circle . . . 835
19.11  The 3-factor Box-Behnken response surface design and its constituent parts: X1, X2: 2² factorial points moved to the center of X3 to give the darker shaded circles at the edge-centers of the X3 axes; X2, X3: 2² factorial points moved to the center of X1 to give the lighter shaded circles at the edge-centers of the X1 axes; X1, X3: 2² factorial points moved to the center of X2 to give the solid circles at the edge-centers of the X2 axes; the center point, open circle . . . 836

20.1  Chi-Squared test results for Prussian army death by horse kicks data and a postulated Poisson model. Top panel: Bar chart of Expected and Observed frequencies; Bottom panel: Bar chart of contributions to the Chi-squared statistic . . . 861
20.2  Initial prior distribution, a Gamma(2, 0.5), used to obtain a Bayesian estimate for the Poisson mean number of deaths per unit-year parameter . . . 864
20.3  Recursive Bayesian estimates using yearly data sequentially, compared with the standard maximum likelihood estimate, 0.61 (dashed line) . . . 867
20.4  Final posterior distribution (dashed line) along with initial prior distribution (solid line) . . . 868
20.5  Quadratic regression model fit to US Population data along with both the 95% confidence interval and the 95% prediction interval . . . 874
20.6  Standardized residuals from the regression model fit to US Population data. Top panel: Residuals versus observation order; Bottom panel: Normal probability plot. Note the left-over pattern indicative of serial correlation, and the unusual observations identified for the 1940 and 1950 census years in the top panel; note also the general deviation of the residuals from the theoretical normal probability distribution line in the bottom panel . . . 875
20.7  Percent average relative population growth rate in the US for each census year from 1800-2000, divided into three equal 70-year periods. Period 1: 1800-1860; Period 2: 1870-1930; Period 3: 1940-2000 . . . 877
20.8  Normal probability plot for the residuals from the ANOVA model for Percent average relative population growth rate versus Period, with Period 1: 1800-1860; Period 2: 1870-1930; Period 3: 1940-2000 . . . 878
20.9  Standardized residual plots for the Yield response surface model: versus fitted value, and normal probability plot . . . 884
20.10  Standardized residual plots for the Adhesion response surface model: versus fitted value, and normal probability plot . . . 885
20.11  Response surface and contour plots for Yield as a function of Additive and Temperature (with Time held at 60.00) . . . 886
20.12  Response surface and contour plots for Adhesion as a function of Additive and Temperature (with Time held at 60.00) . . . 887
20.13  Overlaid contours for Yield and Adhesion showing the feasible region for the desired optimum. The planted flag indicates the optimum values of the responses along with the corresponding setting of the factors Additive and Temperature (with Time held at 60.00) that achieve this optimum . . . 888
20.14  Schematic diagram of folded helicopter prototype . . . 891
20.15  Paper helicopter prototype . . . 893

21.1  Simple Systems: Series and parallel configuration . . . 902
21.2  A series-parallel arrangement of a 6-component system . . . 902

21.3  Sampling-analyzer system: basic configuration . . . 907
21.4  Sampling-analyzer system: configuration with redundant solenoid valve . . . 907
21.5  Fluid flow system with a cross link . . . 909
21.6  Typical failure rate (hazard function) curve showing the classic three distinct characteristic periods in the lifetime distributions of a population of items . . . 913
21.7  Blood storage system . . . 926
21.8  Nuclear power plant heat exchanger system . . . 927
21.9  Fluid flow system with a cross link (from Fig 21.5) . . . 927
21.10  Fire alarm system with back up . . . 928
21.11  Condenser system for VOCs . . . 928
21.12  Simplified representation of the control structure in the baroreceptor reflex . . . 929

22.1  OC Curve for a lot size of 1000, sample size of 32 and acceptance number of 3: AQL is the acceptance quality level; RQL is the rejection quality level . . . 939
22.2  OC Curve for a lot size of 1000, generated for a sampling plan for an AQL = 0.004 and an RQL = 0.02, leading to a required sample size of 333 and acceptance number of 3. Compare with the OC curve in Fig 22.1 . . . 943
22.3  A generic SPC chart for the generic process variable Y, indicating a sixth data point that is out of limits . . . 946
22.4  The X-bar chart for the average length measurements for 6-inch nails determined from samples of three measurements obtained every 5 mins . . . 948
22.5  The S-chart for the 6-inch nails process data of Example 22.2 . . . 951
22.6  The combination Xbar-R chart for the 6-inch nails process data of Example 22.2 . . . 952
22.7  The combination I-MR chart for the Mooney viscosity data . . . 954
22.8  P-chart for the data on defective mechanical pencils: note the 9th observation that is outside the UCL . . . 956
22.9  C-chart for the inclusions data presented in Chapter 1, Table 1.2, and discussed in subsequent chapters: note the 33rd observation that is outside the UCL; otherwise, the process appears to be operating in statistical control . . . 958
22.10  Time series plot of the original Mooney viscosity data of Fig 22.7 and Table 22.2, and of the shifted version showing a step increase of 0.7 after sample 15 . . . 959
22.11  I-chart for the shifted Mooney viscosity data. Even with σ = 0.5, it is not sensitive enough to detect the step change of 0.7 introduced after sample 15 . . . 960

22.12  Two one-sided CUSUM charts for the shifted Mooney viscosity data. The upper chart uses dots; the lower chart uses diamonds; the non-conforming points are represented with the squares. With the same σ = 0.5, the step change of 0.7 introduced after sample 15 is identified after sample 18. Compare with the I-chart in Fig 22.11 . . . 962
22.13  Two one-sided CUSUM charts for the original Mooney viscosity data using the same characteristics as those in Fig 22.12. The upper chart uses dots; the lower chart uses diamonds; there are no non-conforming points . . . 962
22.14  EWMA chart for the shifted Mooney viscosity data, with w = 0.2. Note the staircase shape of the control limits for the earlier data points. With the same σ = 0.5, the step change of 0.7 introduced after sample 15 is detected after sample 18. The non-conforming points are represented with the squares. Compare with the I-chart in Fig 22.11 and the CUSUM charts in Fig 22.12 . . . 964
22.15  The EWMA chart for the original Mooney viscosity data using the same characteristics as in Fig 22.14. There are no non-conforming points . . . 965

23.1  Examples of the bivariate Gaussian distribution where the two random variables are uncorrelated (ρ = 0) and strongly positively correlated (ρ = 0.9) . . . 981
23.2  Plot of the 16 variables in the illustrative example data set . . . 992
23.3  Scree plot showing that the first two components are the most important . . . 994
23.4  Plot of the scores and loading for the first principal component. The distinct trend indicated in the scores should be interpreted along with the loadings by comparison to the full original data set in Fig 23.2 . . . 995
23.5  Plot of the scores and loading for the second principal component. The distinct trend indicated in the scores should be interpreted along with the loadings by comparison to the full original data set in Fig 23.2 . . . 996
23.6  Scores and loading plots for the first two components. Top panel: Scores plot indicates a quadratic relationship between the two scores t1 and t2; Bottom panel: Loading vector plot indicates that in the new set of coordinates, the original variables contain mostly pure components PC1 and PC2, indicated by a distinctive North/South and West/East alignment of the data vectors, with like variables clustered together according to the nature of the component contributions. Compare to the full original data set in Fig 23.2 . . . 998
23.7  Principal component model for a 3-dimensional data set described by two principal components on a plane, showing a point with a large Q and another with a large T² value . . . 1001
23.8  Control limits for Q and T² for process data represented with two principal components . . . 1001

xxiv

List of Tables

1.1   Yield Data for Process A versus Process B
1.2   Number of inclusions on sixty 1-sq meter glass sheets
1.3   Group classification and frequencies for YA data
1.4   Group classification and frequencies for YB data
1.5   Group classification and frequencies for the inclusions data
2.1   Computed probabilities of occurrence of various number of inclusions for λ = 2
3.1   Subsets and Events
3.2   Class list and attributes
3.3   Lithium toxicity study results
4.1   f(x) and F(x) for the three coin-toss experiments of Example 4.1
4.2   The pdf f(x) for the ball-drawing game
4.3   Summary analysis for the ball-drawing game
5.1   Joint pdf for computer store sales
5.2   Joint and marginal pdfs for computer store sales
5.3   Conditional pdf f(x1|x2) for computer store sales
5.4   Conditional pdf f(x2|x1) for computer store sales
5.5   Joint and marginal pdfs for two-coin toss problem of Example 5.1
7.1   Summary of Mendel's single trait experiment results
7.2   Theoretical distribution of shape-color traits in second generation hybrids
      under the independence assumption
7.3   Theoretical versus experimental results for second generation hybrid plants
7.4   Attacks and hits on US Naval Warships in 1943
8.1   Theoretical versus empirical frequencies for inclusions data
8.2   Summary of probability models for discrete random variables
9.1   Summary of probability models for continuous random variables
10.1  Summary of maximum entropy probability models
11.1  Theoretical distribution of probabilities of possible outcomes of an IVF treatment
11.2  Elsner, et al., data of outcomes of a 42-month IVF treatment study
11.3  Binomial model prediction of Elsner, et al. data in Table 11.2
11.4  Elsner data stratified by age indicating variability in the probability of
      success estimates
11.5  Stratified binomial model prediction of Elsner, et al. data
12.1  Number and Type of injuries incurred by welders in the USA from 1980-1989
12.2  Frozen Ready meals in France, in 2002
12.3  Group classification and frequencies for YA data
12.4  Number of raisins dispensed into trial-sized Raising Bran cereal boxes
12.5  Gasoline mileage ratings for a collection of two-seater automobiles
12.6  Descriptive statistics for yield data sets YA and YB
12.7  The Anscombe data set 1
12.8  The Anscombe data sets 2, 3, and 4
14.1  Summary of estimation results
14.2  Some population parameters and conjugate prior distributions appropriate
      for their Bayesian estimation
15.1  Hypothesis test decisions and risks
15.2  Summary of H0 rejection conditions for the one-sample z-test
15.3  Summary of H0 rejection conditions for the one-sample t-test
15.4  Summary of H0 rejection conditions for the two-sample z-test
15.5  Summary of H0 rejection conditions for the two-sample t-test
15.6  Before and After weights for patients on a supervised weight-loss program
15.7  Summary of H0 rejection conditions for the paired t-test
15.8  Sample size n required to achieve a power of 0.9
15.9  Summary of H0 rejection conditions for the χ²-test
15.10 Summary of H0 rejection conditions for the F-test
15.11 Summary of H0 rejection conditions for the single-proportion z-test
15.12 Summary of Selected Hypothesis Tests and their Characteristics
16.1  Boiling points of a series of hydrocarbons
16.2  Density (in gm/cc) and weight percent of ethanol in ethanol-water mixture
16.3  Density and weight percent of ethanol in ethanol-water mixture: model fit
      and residual errors
16.4  Cranial circumference and finger lengths
16.5  ANOVA Table for Testing Significance of Regression
16.6  Thermal conductivity measurements at various temperatures for a metal
16.7  Laboratory experimental data on Yield
17.1  Table of values for safety data probability plot
18.1  A professor's teaching evaluation scores organized by student type
18.2  Interspike intervals data
18.3  Summary of Selected Nonparametric Tests and their Characteristics
19.1  Data table for typical single-factor experiment
19.2  One-Way Classification ANOVA Table
19.3  Data table for typical single-factor, two-way classification experiment
19.4  Two-Way Classification ANOVA Table
19.5  Data table for typical two-factor experiment
19.6  Two-factor ANOVA Table
20.1  Frequency distribution of Prussian army deaths by horse kicks
20.2  Actual vs Predicted Frequency distribution of Prussian army deaths
20.3  Year-by-Year, Unit-by-Unit breakdown of Prussian army deaths data
20.4  Recursive (yearly) Bayesian estimates of the mean number of deaths per unit-year
20.5  Frequency distribution of bomb hits in greater London during WW II and
      Poisson model prediction
20.6  US Population (to the nearest million) from 1790-2000
20.7  Percent average relative population growth rate for each census year
20.8  Response surface design and experimental results for coating process
21.1  Summary of H0 rejection conditions for the test of hypothesis based on an
      exponential model of component failure-time
22.1  Measured length of samples of 6-inch nails in a manufacturing process
22.2  Hourly Mooney viscosity data
22.3  Number and proportion of defective mechanical pencils

Contents

0   Prelude
    0.1   Approach Philosophy
    0.2   Four basic principles
    0.3   Summary and Conclusions

I   Foundations

1   Two Motivating Examples
    1.1   Yield Improvement in a Chemical Process
          1.1.1   The Problem
          1.1.2   The Essence of the Problem
          1.1.3   Preliminary Intuitive Notions
    1.2   Quality Assurance in a Glass Sheet Manufacturing Process
    1.3   Outline of a Systematic Approach
          1.3.1   Group Classification and Frequency Distributions
          1.3.2   Theoretical Distributions
    1.4   Summary and Conclusions

2   Random Phenomena, Variability and Uncertainty
    2.1   Two Extreme Idealizations of Natural Phenomena
          2.1.1   Introduction
          2.1.2   A Chemical Engineering Illustration
    2.2   Random Mass Phenomena
          2.2.1   Defining Characteristics
          2.2.2   Variability and Uncertainty
          2.2.3   Practical Problems of Interest
    2.3   Introducing Probability
          2.3.1   Basic Concepts
          2.3.2   Interpreting Probability
    2.4   The Probabilistic Framework
    2.5   Summary and Conclusions

II  Probability

3   Fundamentals of Probability Theory
    3.1   Building Blocks
    3.2   Operations
          3.2.1   Events, Sets and Set Operations
          3.2.2   Set Functions
          3.2.3   Probability Set Function
          3.2.4   Final considerations
    3.3   Probability
          3.3.1   The Calculus of Probability
          3.3.2   Implications
    3.4   Conditional Probability
          3.4.1   Illustrating the Concept
          3.4.2   Formalizing the Concept
          3.4.3   Total Probability
          3.4.4   Bayes' Rule
    3.5   Independence
    3.6   Summary and Conclusions

4   Random Variables and Distributions
    4.1   Introduction and Definition
          4.1.1   Mathematical Concept of the Random Variable
          4.1.2   Practical Considerations
          4.1.3   Types of Random Variables
    4.2   Distributions
          4.2.1   Discrete Random Variables
          4.2.2   Continuous Random Variables
          4.2.3   The Probability Distribution Function
    4.3   Mathematical Expectation
          4.3.1   Motivating the Definition
          4.3.2   Definition and Properties
    4.4   Characterizing Distributions
          4.4.1   Moments of a Distribution
          4.4.2   Moment Generating Function
          4.4.3   Characteristic Function
          4.4.4   Additional Distributional Characteristics
          4.4.5   Entropy
          4.4.6   Probability Bounds
    4.5   Special Derived Probability Functions
          4.5.1   Survival Function
          4.5.2   Hazard Function
          4.5.3   Cumulative Hazard Function
    4.6   Summary and Conclusions

5   Multidimensional Random Variables
    5.1   Introduction and Definitions
          5.1.1   Perspectives
          5.1.2   2-Dimensional (Bivariate) Random Variables
          5.1.3   Higher-Dimensional (Multivariate) Random Variables
    5.2   Distributions of Several Random Variables
          5.2.1   Joint Distributions
          5.2.2   Marginal Distributions
          5.2.3   Conditional Distributions
          5.2.4   General Extensions
    5.3   Distributional Characteristics of Jointly Distributed Random Variables
          5.3.1   Expectations
          5.3.2   Covariance and Correlation
          5.3.3   Independence
    5.4   Summary and Conclusions

6   Random Variable Transformations
    6.1   Introduction and Problem Definition
    6.2   Single Variable Transformations
          6.2.1   Discrete Case
          6.2.2   Continuous Case
          6.2.3   General Continuous Case
          6.2.4   Random Variable Sums
    6.3   Bivariate Transformations
    6.4   General Multivariate Transformations
          6.4.1   Square Transformations
          6.4.2   Non-Square Transformations
          6.4.3   Non-Monotone Transformations
    6.5   Summary and Conclusions

7   Application Case Studies I: Probability
    7.1   Introduction
    7.2   Mendel and Heredity
          7.2.1   Background and Problem Definition
          7.2.2   Single Trait Experiments and Results
          7.2.3   Single trait analysis
          7.2.4   Multiple Traits and Independence
          7.2.5   Subsequent Experiments and Conclusions
    7.3   World War II Warship Tactical Response Under Attack
          7.3.1   Background and Problem Definition
          7.3.2   Approach and Results
          7.3.3   Final Comments
    7.4   Summary and Conclusions

III Distributions

8   Ideal Models of Discrete Random Variables
    8.1   Introduction
    8.2   The Discrete Uniform Random Variable
          8.2.1   Basic Characteristics and Model
          8.2.2   Applications
    8.3   The Bernoulli Random Variable
          8.3.1   Basic Characteristics
          8.3.2   Model Development
          8.3.3   Important Mathematical Characteristics
    8.4   The Hypergeometric Random Variable
          8.4.1   Basic Characteristics
          8.4.2   Model Development
          8.4.3   Important Mathematical Characteristics
          8.4.4   Applications
    8.5   The Binomial Random Variable
          8.5.1   Basic Characteristics
          8.5.2   Model Development
          8.5.3   Important Mathematical Characteristics
          8.5.4   Applications
    8.6   Extensions and Special Cases of the Binomial Random Variable
          8.6.1   Trinomial Random Variable
          8.6.2   Multinomial Random Variable
          8.6.3   Negative Binomial Random Variable
          8.6.4   Geometric Random Variable
    8.7   The Poisson Random Variable
          8.7.1   The Limiting Form of a Binomial Random Variable
          8.7.2   First Principles Derivation
          8.7.3   Important Mathematical Characteristics
          8.7.4   Applications
    8.8   Summary and Conclusions

9   Ideal Models of Continuous Random Variables
    9.1   Gamma Family Random Variables
          9.1.1   The Exponential Random Variable
          9.1.2   The Gamma Random Variable
          9.1.3   The Chi-Square Random Variable
          9.1.4   The Weibull Random Variable
          9.1.5   The Generalized Gamma Model
          9.1.6   The Poisson-Gamma Mixture Distribution
    9.2   Gaussian Family Random Variables
          9.2.1   The Gaussian (Normal) Random Variable
          9.2.2   The Standard Normal Random Variable
          9.2.3   The Lognormal Random Variable
          9.2.4   The Rayleigh Random Variable
          9.2.5   The Generalized Gaussian Model
    9.3   Ratio Family Random Variables
          9.3.1   The Beta Random Variable
          9.3.2   Extensions and Special Cases of the Beta Random Variable
          9.3.3   The (Continuous) Uniform Random Variable
          9.3.4   Fisher's F Random Variable
          9.3.5   Student's t Random Variable
          9.3.6   The Cauchy Random Variable
    9.4   Summary and Conclusions

10  Information, Entropy and Probability Models
    10.1  Uncertainty and Information
          10.1.1  Basic Concepts
          10.1.2  Quantifying Information
    10.2  Entropy
          10.2.1  Discrete Random Variables
          10.2.2  Continuous Random Variables
    10.3  Maximum Entropy Principles for Probability Modeling
    10.4  Some Maximum Entropy Models
          10.4.1  Discrete Random Variable; Known Range
          10.4.2  Discrete Random Variable; Known Mean
          10.4.3  Continuous Random Variable; Known Range
          10.4.4  Continuous Random Variable; Known Mean
          10.4.5  Continuous Random Variable; Known Mean and Variance
          10.4.6  Continuous Random Variable; Known Range, Mean and Variance
    10.5  Maximum Entropy Models from General Expectations
          10.5.1  Single Expectations
          10.5.2  Multiple Expectations
    10.6  Summary and Conclusions

11  Application Case Studies II: In-Vitro Fertilization
    11.1  Introduction
    11.2  In-Vitro Fertilization and Multiple Births
          11.2.1  Background and Problem Definition
          11.2.2  Clinical Studies and Recommended Guidelines
    11.3  Probability Modeling and Analysis
          11.3.1  Model Postulate
          11.3.2  Prediction
          11.3.3  Estimation
    11.4  Binomial Model Validation
          11.4.1  Overview and Study Characteristics
          11.4.2  Binomial Model versus Clinical Data
    11.5  Problem Solution: Model-based IVF Optimization and Analysis
          11.5.1  Optimization
          11.5.2  Model-based Analysis
          11.5.3  Patient Categorization and Theoretical Analysis of Treatment Outcomes
    11.6  Sensitivity Analysis
          11.6.1  General Discussion
          11.6.2  Theoretical Sensitivity Analysis
    11.7  Summary and Conclusions
          11.7.1  Final Wrap-up
          11.7.2  Conclusions and Perspectives on Previous Studies and Guidelines

IV  Statistics

12  Introduction to Statistics
    12.1  From Probability to Statistics
          12.1.1  Random Phenomena and Finite Data Sets
          12.1.2  Finite Data Sets and Statistical Analysis
          12.1.3  Probability, Statistics and Design of Experiments
          12.1.4  Statistical Analysis
    12.2  Variable and Data Types
    12.3  Graphical Methods of Descriptive Statistics
          12.3.1  Bar Charts and Pie Charts
          12.3.2  Frequency Distributions
          12.3.3  Box Plots
          12.3.4  Scatter Plots
    12.4  Numerical Descriptions
          12.4.1  Theoretical Measures of Central Tendency
          12.4.2  Measures of Central Tendency: Sample Equivalents
          12.4.3  Measures of Variability
          12.4.4  Supplementing Numerics with Graphics
    12.5  Summary and Conclusions

13  Sampling
    13.1  Introductory Concepts
          13.1.1  The Random Sample
          13.1.2  The Statistic and its Distribution
    13.2  The Distribution of Functions of Random Variables
          13.2.1  General Overview
          13.2.2  Some Important Sampling Distribution Results
    13.3  Sampling Distribution of The Mean
          13.3.1  Underlying Probability Distribution Known
          13.3.2  Underlying Probability Distribution Unknown
          13.3.3  Limiting Distribution of the Mean
          13.3.4  σ Unknown
    13.4  Sampling Distribution of the Variance
    13.5  Summary and Conclusions

14  Estimation
    14.1  Introductory Concepts
          14.1.1  An Illustration
          14.1.2  Problem Definition and Key Concepts
    14.2  Criteria for Selecting Estimators
          14.2.1  Unbiasedness
          14.2.2  Efficiency
          14.2.3  Consistency
    14.3  Point Estimation Methods
          14.3.1  Method of Moments
          14.3.2  Maximum Likelihood
    14.4  Precision of Point Estimates
    14.5  Interval Estimates
          14.5.1  General Principles
          14.5.2  Mean of a Normal Population; σ Known
          14.5.3  Mean of a Normal Population; σ Unknown
          14.5.4  Variance of a Normal Population
          14.5.5  Difference of Two Normal Populations' Means
          14.5.6  Interval Estimates for Parameters from other Populations
    14.6  Bayesian Estimation
          14.6.1  Background
          14.6.2  Basic Concept
          14.6.3  Bayesian Estimation Results
          14.6.4  A Simple Illustration
          14.6.5  Discussion
    14.7  Summary and Conclusions

15  Hypothesis Testing
    15.1  Introduction
    15.2  Basic Concepts
          15.2.1  Terminology and Definitions
          15.2.2  General Procedure
    15.3  Concerning Single Mean of a Normal Population
          15.3.1  σ Known; the z-test
          15.3.2  σ Unknown; the t-test
          15.3.3  Confidence Intervals and Hypothesis Tests
    15.4  Concerning Two Normal Population Means
          15.4.1  Population Standard Deviations Known
          15.4.2  Population Standard Deviations Unknown
          15.4.3  Paired Differences
    15.5  Determining β, Power, and Sample Size
          15.5.1  β and Power
          15.5.2  Sample Size
          15.5.3  β and Power for Lower-Tailed and Two-Sided Tests
          15.5.4  General Power and Sample Size Considerations
    15.6  Concerning Variances of Normal Populations
          15.6.1  Single Variance
          15.6.2  Two Variances
    15.7  Concerning Proportions
          15.7.1  Single Population Proportion
          15.7.2  Two Population Proportions
    15.8  Concerning Non-Gaussian Populations
          15.8.1  Large Sample Test for Means
          15.8.2  Small Sample Tests
    15.9  Likelihood Ratio Tests
          15.9.1  General Principles
          15.9.2  Special Cases
          15.9.3  Asymptotic Distribution for λ
    15.10 Discussion
    15.11 Summary and Conclusions

16  Regression Analysis
    16.1  Introductory Concepts
          16.1.1  Dependent and Independent Variables
          16.1.2  The Principle of Least Squares
    16.2  Simple Linear Regression
          16.2.1  One-Parameter Model
          16.2.2  Two-Parameter Model
          16.2.3  Properties of OLS Estimators
          16.2.4  Confidence Intervals
          16.2.5  Hypothesis Testing
          16.2.6  Prediction and Prediction Intervals
          16.2.7  Coefficient of Determination and the F-Test
          16.2.8  Relation to the Correlation Coefficient
          16.2.9  Mean-Centered Model
          16.2.10 Residual Analysis
    16.3  "Intrinsically" Linear Regression
          16.3.1  Linearity in Regression Models
          16.3.2  Variable Transformations
    16.4  Multiple Linear Regression
          16.4.1  General Least Squares
          16.4.2  Matrix Methods
          16.4.3  Some Important Special Cases
          16.4.4  Recursive Least Squares
    16.5  Polynomial Regression
          16.5.1  General Considerations
          16.5.2  Orthogonal Polynomial Regression
    16.6  Summary and Conclusions

17  Probability Model Validation
    17.1  Introduction
    17.2  Probability Plots
          17.2.1  Basic Principles
          17.2.2  Transformations and Specialized Graph Papers
          17.2.3  Modern Probability Plots
          17.2.4  Applications
    17.3  Chi-Squared Goodness-of-fit Test
          17.3.1  Basic Principles
          17.3.2  Properties and Application
    17.4  Summary and Conclusions

18  Nonparametric Methods
    18.1  Introduction
    18.2  Single Population
          18.2.1  One-Sample Sign Test
          18.2.2  One-Sample Wilcoxon Signed Rank Test
    18.3  Two Populations
          18.3.1  Two-Sample Paired Test
          18.3.2  Mann-Whitney-Wilcoxon Test
    18.4  Probability Model Validation
          18.4.1  The Kolmogorov-Smirnov Test
          18.4.2  The Anderson-Darling Test
    18.5  A Comprehensive Illustrative Example
          18.5.1  Probability Model Postulate and Validation
          18.5.2  Mann-Whitney-Wilcoxon Test
    18.6  Summary and Conclusions

19  Design of Experiments
    19.1  Introductory Concepts
          19.1.1  Experimental Studies and Design
          19.1.2  Phases of Efficient Experimental Studies
          19.1.3  Problem Definition and Terminology
    19.2  Analysis of Variance
    19.3  Single Factor Experiments
          19.3.1  One-Way Classification
          19.3.2  Kruskal-Wallis Nonparametric Test
          19.3.3  Two-Way Classification
          19.3.4  Other Extensions
    19.4  Two-Factor Experiments
    19.5  General Multi-factor Experiments
    19.6  2^k Factorial Experiments and Design
          19.6.1  Overview
          19.6.2  Design and Analysis
          19.6.3  Procedure
          19.6.4  Closing Remarks
    19.7  Screening Designs: Fractional Factorial
          19.7.1  Rationale
          19.7.2  Illustrating the Mechanics
          19.7.3  General characteristics
          19.7.4  Design and Analysis
          19.7.5  A Practical Illustrative Example
    19.8  Screening Designs: Plackett-Burman
          19.8.1  Primary Characteristics
          19.8.2  Design and Analysis
    19.9  Response Surface Designs
          19.9.1  Characteristics
          19.9.2  Response Surface Designs
          19.9.3  Design and Analysis
    19.10 Introduction to Optimal Designs
          19.10.1 Background
          19.10.2 "Alphabetic" Optimal Designs
    19.11 Summary and Conclusions

20  Application Case Studies III: Statistics
    20.1  Introduction
    20.2  Prussian Army Death-by-Horse kicks
          20.2.1  Background and Data
          20.2.2  Parameter Estimation and Model Validation
          20.2.3  Recursive Bayesian Estimation
    20.3  WW II Aerial Bombardment of London
    20.4  US Population Dynamics: 1790-2000
          20.4.1  Background and Data
          20.4.2  "Truncated" Data Modeling and Evaluation
          20.4.3  Full Data Set Modeling and Evaluation
          20.4.4  Hypothesis Testing Concerning Average Population Growth Rate
    20.5  Process Optimization
          20.5.1  Problem Definition and Background
          20.5.2  Experimental Strategy and Results
          20.5.3  Analysis
    20.6  Summary and Conclusions

V   Applications

21  Reliability and Life Testing
    21.1  Introduction
    21.2  System Reliability
          21.2.1  Simple Systems
          21.2.2  Complex Systems
    21.3  System Lifetime and Failure-Time Distributions
          21.3.1  Characterizing Time-to-Failure
          21.3.2  Probability Models for Distribution of Failure Times
    21.4  The Exponential Reliability Model
          21.4.1  Component Characteristics
          21.4.2  Series Configuration
          21.4.3  Parallel Configuration
          21.4.4  m-of-n Parallel Systems
    21.5  The Weibull Reliability Model
    21.6  Life Testing
          21.6.1  The Exponential Model
          21.6.2  The Weibull Model
    21.7  Summary and Conclusions

22  Quality Assurance and Control
    22.1  Introduction
    22.2  Acceptance Sampling
          22.2.1  Basic Principles
          22.2.2  Determining a Sampling Plan
    22.3  Process and Quality Control
          22.3.1  Underlying Philosophy
          22.3.2  Statistical Process Control
          22.3.3  Basic Control Charts
          22.3.4  Enhancements
    22.4  Chemical Process Control
          22.4.1  Preliminary Considerations
          22.4.2  Statistical Process Control (SPC) Perspective
          22.4.3  Engineering/Automatic Process Control (APC) Perspective
          22.4.4  SPC or APC
    22.5  Process and Parameter Design
          22.5.1  Basic Principles
          22.5.2  A Theoretical Rationale
    22.6  Summary and Conclusions

23  Introduction to Multivariate Analysis
    23.1  Multivariate Probability Models
          23.1.1  Introduction
          23.1.2  The Multivariate Normal Distribution
          23.1.3  The Wishart Distribution
          23.1.4  Hotelling's T-Squared Distribution
          23.1.5  The Wilks Lambda Distribution
          23.1.6  The Dirichlet Distribution
    23.2  Multivariate Data Analysis
    23.3  Principal Components Analysis
          23.3.1  Basic Principles of PCA
          23.3.2  Main Characteristics of PCA
          23.3.3  Illustrative example
          23.3.4  Other Applications of PCA
    23.4  Summary and Conclusions

Appendix

Index

Chapter 0
Prelude

0.1  Approach Philosophy
0.2  Four basic principles
0.3  Summary and Conclusions

Rem tene; verba sequentur.
(Grasp the subject; the words will follow.)

                                        Cato the Elder (234-149 BC)

From weather forecasts and life insurance premiums for non-smokers to clinical tests of experimental drugs and defect rates in manufacturing facilities, and in numerous other ways, randomly varying phenomena exert a subtle but pervasive influence on everyday life. In most cases, one can be blissfully ignorant of the true implications of the presence of such phenomena without consequence. In science and engineering, however, the influence of randomly varying phenomena can be such that even apparently simple problems become dramatically complicated by the presence of random variability, demanding special methods and analysis tools for obtaining valid and useful solutions.

The primary aim of this book is to provide the reader with the basic fundamental principles, methods, and tools for formulating and solving engineering problems involving randomly varying phenomena.

Since this aim can be achieved in several different ways, this chapter is devoted to presenting this book's approach philosophy.

0.1  Approach Philosophy

Engineers are typically well-trained in the art of problem formulation and problem solving when all the entities involved are considered deterministic in character. However, many problems of practical importance involve randomly varying phenomena of one sort or another; and the vast majority of such problems cannot always be idealized and reduced to the more familiar deterministic types without destroying the very essence of the problem. For example, in determining which of two catalysts A or B provides the greater yield in a chemical manufacturing process, it is well-known that the respective yields YA and YB, as observed experimentally, are randomly varying quantities. Chapter 1 presents a full-scale discussion of this problem. For now, we simply note that with catalyst A, fifty different experiments performed under essentially identical conditions will result in fifty different values (realizations) for YA. Similarly for catalyst B, one obtains fifty distinct values for YB from fifty different experiments replicated under identical conditions. The first 10 experimental data points for this example are shown in the table below.

      YA %     YB %
      74.04    75.75
      75.29    68.41
      75.62    74.19
      75.91    68.10
      77.21    68.10
      75.07    69.23
      74.23    70.14
      74.92    69.22
      76.57    74.17
      77.77    70.23

Observe that because of the variability inherent in the data, some of the YA values are greater than some of the YB values; but the converse is also true: some YB values are greater than some YA values. So how does one determine, reliably and confidently, which catalyst (if any) really provides the greater yield? Clearly, special methods and analysis tools are required for handling this apparently simple problem: the deterministic idealization of comparing a single observed value of YA (say the first entry, 74.04) with a corresponding single observed value of YB (in this case 75.75) is incapable of producing a valid answer. The primary essence of this problem is the variability inherent in the data, which masks the fact that one catalyst does in fact provide the greater yield.
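The short sketch below is not part of the original text; it is a minimal Python illustration (the variable names are ours) of how thoroughly the two samples above overlap, which is exactly what defeats any comparison based on single observations.

```python
# The ten yield observations per process, transcribed from the table above.
y_a = [74.04, 75.29, 75.62, 75.91, 77.21, 75.07, 74.23, 74.92, 76.57, 77.77]
y_b = [75.75, 68.41, 74.19, 68.10, 68.10, 69.23, 70.14, 69.22, 74.17, 70.23]

# Count how many of the 100 possible (A, B) pairings have the B value larger:
# a comparison of single observations can point either way.
pairs_b_greater = sum(1 for a in y_a for b in y_b if b > a)
print(f"B > A in {pairs_b_greater} of {len(y_a) * len(y_b)} pairings")

# The observed ranges also overlap, so no single pair of values settles it.
print(f"A spans [{min(y_a)}, {max(y_a)}]; B spans [{min(y_b)}, {max(y_b)}]")
```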
This book takes a more fundamental, first-principles approach to the issue of dealing with random variability and uncertainty in engineering problems. This is in contrast to the typical engineering statistics approach on the one hand, or the problem-specific approach on the other. With the former approach, most of the emphasis is on how to use certain popular statistical techniques to solve some of the most commonly encountered engineering problems, with little or no discussion of why the techniques are effective. With the latter approach, a particular topic (say Design of Experiments) is selected and dealt with in depth, and the appropriate statistical tools are presented and discussed within the context of the specific problem at the core of the selected topic. By definition, such an approach excludes all other topics that may be of practical interest, opting to make up in depth what it gives up in breadth.

The approach taken in this book is based on the premise that emphasizing fundamentals and basic principles, and then illustrating these with examples, equips the reader with the means of dealing with a range of problems wider than that explicitly covered in the book.

This approach philosophy is based on the four basic principles discussed next.

0.2  Four basic principles

1. If characterized properly, random phenomena are subject to rigorous mathematical analysis in much the same manner as deterministic phenomena.

Random phenomena are so-called because they show no apparent regularity, appearing to occur haphazardly, totally at random; the observed variations do not seem to obey any discernible rational laws and therefore appear to be entirely unpredictable. However, the unpredictable irregularities of the individual observations (or, more generally, the detail) of random phenomena in fact co-exist with surprisingly predictable ensemble, or aggregate, behavior. This fact makes rigorous analysis possible; it also provides the basis for employing the concept and calculus of probability to develop a systematic framework for characterizing random phenomena in terms of probability distribution functions.

The first order of business is therefore to seek to understand random phenomena and to develop techniques for characterizing them appropriately. Part I, titled FOUNDATIONS: Understanding Random Variability, and Part II, titled PROBABILITY: Characterizing Random Variability, are devoted to these respective tasks. Ultimately, probability, and the probability distribution function, are introduced as the theoretical constructs for efficiently describing our knowledge of the real-world phenomena in question.

2. By focusing on the underlying phenomenological mechanisms, it is possible to develop appropriate theoretical characterizations of random phenomena in terms of ideal models of the observed variability.

Within the probabilistic framework, the ensemble, or aggregate, behavior of the random phenomenon in question is characterized by its probability distribution function. In much the same way that theoretical mathematical models are derived from first principles for deterministic phenomena, it is also possible to derive these theoretical probability distribution functions as ideal models that describe our knowledge of the underlying random phenomena. Part III, titled DISTRIBUTIONS: Modeling Random Variability, is devoted to the important tasks of developing and analyzing ideal probability models for many random phenomena of practical interest. The end result is a collection of probability distribution functions, each derived directly from, and hence explicitly linked to, the underlying random phenomenological mechanisms.
3. The ensemble (or aggregate) characterization provided by ideal probability models can be used successfully to develop the theoretical basis for solving real problems where one is always limited to dealing with an incomplete collection of individual observations, never the entire aggregate.

A key defining characteristic of random phenomena is that specific outcomes or observations cannot be predicted with absolute certainty. With probabilistic analysis, this otherwise impossible task of predicting the unpredictable individual observation or outcome is simply replaced by the analytical task of determining the mathematical probability of its occurrence. In many practical problems involving random phenomena, however, there is no avoiding this impossible task: one is required to deal with, and make decisions about, individual observations, and must therefore confront the inevitable uncertainty that will always be associated with such decisions. Statistical Theory, using the aggregate descriptions of probability theory, provides a rational basis not only for making these predictions and decisions about individual observations with confidence, but also for quantifying the degree of uncertainty associated with such decisions.

Part IV, titled STATISTICS: Quantifying Random Variability, is devoted to elucidating statistical principles and concepts required for dealing effectively with data as collections of individual observations from random phenomena.
4. The usefulness and broad applicability of the fundamental principles, analysis methods, and tools provided by probability and statistics are best illustrated with several actual example topics of engineering applications involving random phenomena.

The manifestations of random phenomena in problems of practical interest are countless, and the range of such problems is itself quite broad: from simple data analysis and experimental designs, to polynomial curve-fitting, and empirical modeling of complex dynamic systems; from quality assurance and control, to state and parameter estimation, and process monitoring and diagnosis, etc. The topical headings under which such problems may be organized (Design of Experiments, Regression Analysis, Time Series Analysis, etc.) are numerous, and many books have been devoted to each one of them. Clearly then, the sheer vastness of the subject matter of engineering applications of probability and statistics renders completely unreasonable any hope of comprehensive coverage in a single introductory text.

Nevertheless, how probability and statistics are employed in practice to deal successfully with various problems created by random variability and uncertainty can be discussed in such a way as to equip the student with the tools needed to approach, with confidence, other problems that are not addressed explicitly in this book.

Part V, titled APPLICATIONS: Dealing with Random Variability in Practice, consists of three chapters, each devoted to a specific application topic of importance in engineering practice. Entire books have been written, and entire courses taught, on each of the topics to which we will devote only one chapter; the coverage is therefore designed to be more illustrative than comprehensive, providing the basis for absorbing and employing more efficiently the more extensive material presented in these other books or courses.

0.3  Summary and Conclusions

This chapter has been primarily concerned with setting forth this book's approach to presenting the fundamentals and engineering applications of probability and statistics. The four basic principles on which the more fundamental, first-principles approach is based were presented, providing the rationale for the scope and organization of the material to be presented in the rest of the book.

The approach is designed to produce the following result:

A course of study based on this book should provide the reader with a reasonable fundamental understanding of random phenomena, a working knowledge of how to model and analyze such phenomena, and facility with using probability and statistics to cope with random variability and uncertainty in some key engineering problems.

The book should also prepare the student to absorb and employ the material presented in more problem-specific courses, such as Design of Experiments, Time Series Analysis, Regression Analysis, Statistical Process Control, etc., a bit more efficiently.


Part I: Foundations
Understanding Random Variability

    I shall light a candle of understanding in thine heart which shall
    not be put out.

                                        Apocrypha: I Esdras 14:25

Chapter 1: Two Motivating Examples
Chapter 2: Random Phenomena, Variability and Uncertainty

Chapter 1
Two Motivating Examples

1.1  Yield Improvement in a Chemical Process
     1.1.1  The Problem
            Mathematical Formulation
     1.1.2  The Essence of the Problem
     1.1.3  Preliminary Intuitive Notions
1.2  Quality Assurance in a Glass Sheet Manufacturing Process
1.3  Outline of a Systematic Approach
     1.3.1  Group Classification and Frequency Distributions
     1.3.2  Theoretical Distributions
            A Preview
1.4  Summary and Conclusions
     REVIEW QUESTIONS
     EXERCISES
     APPLICATION PROBLEMS

And coming events cast their shadows before.
(Lochiel's warning.)

                                        Thomas Campbell (1777-1844)

When random variability is genuinely intrinsic to a problem, uncertainty becomes inevitable, but the problem can still be solved systematically and with confidence. This, the underlying theme of Applied Probability and Statistics, is what this chapter seeks to illustrate with two representative examples. The example problems and the accompanying discussions are intended to serve two main purposes: (i) illustrate the sort of complications caused by the presence of random components in practical problems; and (ii) demonstrate (qualitatively for now) how to solve such problems by formulating them properly and employing appropriate methods and tools. The primary value of this chapter is as a vehicle for placing in context this book's approach to analyzing randomly varying phenomena in engineering and science. It allows us to preview and motivate the key concepts to be developed fully in the remaining chapters.
1.1  Yield Improvement in a Chemical Process

To an engineer or scientist, determining which of two numbers is larger, and by how much, is trivial, in principle requiring no more than the elementary arithmetic operation of subtraction. Identify the two numbers as individual observations from two randomly varying quantities, however, and the character of the problem changes significantly: determining, with any certainty, which of the random quantities is larger, and precisely by how much, now requires more than mere subtraction. This is the case with our first example.

1.1.1  The Problem

A chemical process using catalyst A (process A) is being considered as


an alternative to the incumbent process using a dierent catalyst B (process
B). The decision in favor of one process over the other is to be based on
a comparison of the yield YA obtained from the challenger, and YB from the
incumbent, in conjunction with the following economic considerations :
Achieving target prot objectives for the nished product line requires
a process yield consistently at or above 74.5%.
For process A to be a viable alternative, at the barest minimum, its yield
must be higher than that for process B.
Every 1% yield increase over what is currently achievable with the incumbent process translates to signicant after tax operating income;
however, catalyst A used by the alternative process costs more than
catalyst B used by the incumbent process. Including the additional cost
of the process modications required to implement the new technology,
a shift to catalyst A and the new process will be economically viable only
if the resulting yield increase exceeds 2%.
The result of a series of 50 experiments carefully performed on each
process to determine YA and YB is shown in Table 1.1. Given only the supplied
data, what should the Vice President/General Manager of this business do:
authorize a switch to the new process A or stay with the incumbent process?
Mathematical Formulation
Observe that solving this problem requires nding appropriate answers to
the following mathematical questions:
1. Is YA 74.5, and YB 74.5, consistently?
2. Is YA > YB ?

Two Motivating Examples

TABLE 1.1:

Yield Data
for Process A versus Process B
YA %
YB %
74.04 75.29 75.75 68.41
75.63 75.92 74.19 68.10
77.21 75.07 68.10 69.23
74.23 74.92 70.14 69.23
76.58 77.77 74.17 70.24
75.05 74.90 70.09 71.91
75.69 75.31 72.63 78.41
75.19 77.93 71.16 73.37
75.37 74.78 70.27 73.64
74.47 72.99 75.82 74.42
73.99 73.32 72.14 78.49
74.90 74.88 74.88 76.33
75.78 79.07 70.89 71.07
75.09 73.87 72.39 72.04
73.88 74.23 74.94 70.02
76.98 74.85 75.64 74.62
75.80 75.22 75.70 67.33
77.53 73.99 72.49 71.71
72.30 76.56 69.98 72.90
77.25 78.31 70.15 70.14
75.06 76.06 74.09 68.78
74.82 75.28 72.91 72.49
76.67 74.39 75.40 76.47
76.79 77.57 69.38 75.47
75.85 77.31 71.37 74.12

13

14

Random Phenomena
3. If yes, is YA YB > 2?

Clearly, making the proper decision hinges on our ability to answer these
questions with confidence.

1.1.2 The Essence of the Problem

Observe that the real essence of the problem is random variability: if each
experiment had resulted in the same single, constant number for YA and another for YB , the problem would be deterministic in character, and each of
the 3 associated questions would be trivial to answer. Instead, the random
phenomena inherent in the experimental determination of the true process
yields have been manifested in the observed variability, so that we are uncertain about the true values of YA and YB , making it not quite as trivial to
solve the problem.
The sources of variability in this case can be shown to include the measurement procedure, the measurement device itself, raw materials, and process
conditions. The observed variability is therefore intrinsic to the problem and
cannot be idealized away. There is no other way to solve this problem rationally without dealing directly with the random variability.
Next, note that YA and YB data (observations) take on values on a continuous
scale, i.e., yield values are real and can be located anywhere on the
real line, as opposed to quantities that can take on integer values only (as is
the case with the second example discussed later). The variables YA and YB
are therefore said to be continuous, and this example illustrates decision-making
under uncertainty when the random phenomena in question involve
continuous variables.
The main issues with this problem are as follows:
1. Characterization: How should the quantities YA and YB be characterized
so that the questions raised above can be answered properly?
2. Quantification: Are there such things as true values of the quantities
YA and YB? If so, how should these true values be best quantified?
3. Application: How should the characterization and quantification of YA
and YB be used to answer the 3 questions raised above?

1.1.3 Preliminary Intuitive Notions

Before outlining procedures for solving this problem, it is helpful to entertain
some notions that the intuition of a good scientist or engineer will
suggest. For instance, the concept of the arithmetic average of a collection
of n data points, x1, x2, x3, . . . , xn, defined by:

    \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i     (1.1)

is well-known to all scientists and engineers, and the intuitive notion of employing
this single computed value to represent the data set is almost instinctive.
It seems reasonable therefore to consider representing YA with the computed
average obtained from the data, i.e. ȳA = 75.52, and similarly, representing
YB with ȳB = 72.47. We may now observe right away that ȳA > ȳB, which
now seems to suggest not only that YA > YB, but since ȳA − ȳB = 3.05, that
the difference in fact exceeds the threshold of 2%.
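To make the arithmetic concrete, the short Python sketch below applies Eq (1.1);
the function name and the restriction to the first five observations of each column
of Table 1.1 are ours, purely for illustration (with the full 50-observation data
sets the same computation returns the 75.52 and 72.47 quoted above).

# A minimal sketch of Eq (1.1), applied to a truncated subset of Table 1.1.
# With the full 50 observations per process the results are 75.52 and 72.47.

def average(data):
    """Arithmetic average: the sum of the observations divided by n."""
    return sum(data) / len(data)

y_A = [74.04, 75.63, 77.21, 74.23, 76.58]   # first five YA values, Table 1.1
y_B = [75.75, 74.19, 68.10, 70.14, 74.17]   # first five YB values, Table 1.1

print(average(y_A))                  # approx. 75.54 for this subset
print(average(y_B))                  # approx. 72.47 for this subset
print(average(y_A) - average(y_B))   # the difference, for this subset only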
As intuitively appealing as these arguments might be, they raise some
important additional questions:
1. The variability of individual values of the data yAi around the average
value ȳA = 75.52 is noticeable; that of yBi around the average value ȳB =
72.47 even more so. How confident then are we about the arguments
presented above, and in the implied recommendation to prefer process
A to B, based as they are on the computed averages? (For example,
there are some 8 values of yBi > ȳA; what should we make of this fact?)

2. Will it (or should it) matter that

    72.30 < yAi < 79.07
    67.33 < yBi < 78.41     (1.2)

so that the observed data are seen to vary over a range of yield values
that is 11.08 units wide for process B as opposed to 6.77 for A? The
averages give no indication of these extents of variability.
3. More fundamentally, is it always a good idea to work with averages? How
reasonable is it to characterize the entire data set with the average?
4. If new sets of data are gathered, the new averages computed from them
will almost surely differ from the corresponding values computed from
the current set of data shown here. Observe therefore that the computed
averages ȳA and ȳB are themselves clearly subject to random variability.
How can we then be sure that using averages offers any advantages,
since, like the original data, these averages are also not free from random
variability?

5. How were the data themselves collected? What does it mean concretely
that the 50 experiments were "carefully performed"? Is it possible that
the experimental protocols used may have impaired our ability to answer
the questions posed above adequately? Conversely, are there protocols
that are particularly calibrated to improve our ability to answer these
questions adequately?
Obviously therefore there is a lot more to dealing with this example problem
than merely using the intuitively appealing notion of averages.
Let us now consider a second, different but somewhat complementary,
example.


TABLE 1.2: Number of inclusions on sixty 1-sq meter glass sheets

0  1  1  1  0  0  1  0  2  2
2  0  2  2  3  2  0  0  2  0
1  2  0  1  0  1  0  0  1  1
1  1  5  2  0  0  1  4  1  1
2  1  0  0  1  1  0  0  1  1
1  0  0  2  4  0  1  1  0  1

1.2 Quality Assurance in a Glass Sheet Manufacturing Process

A key measure of product quality in a glass sheet manufacturing process is
the optical attribute known as "inclusions": particulate flaws (of size exceeding
0.5 μm) included in an otherwise perfectly clear glass sheet. While it is
all but inevitable to find inclusions in some products, the best manufacturers
produce remarkably few glass sheets with these imperfections; and even then,
the actual number of inclusions on these imperfect sheets is itself usually very
low, perhaps 1 or 2.
The specific example in question involves a manufacturer of 1 sq. meter
sheets of glass used for various types of building windows. Prior to shipping
a batch of manufactured product to customers, a sample of glass sheets from
the batch is sent to the company's Quality Control (QC) laboratory where an
optical scanning device is used to determine X, the number of inclusions in
each square-meter sheet. The results for 60 samples from a particular batch
are shown in Table 1.2.
This particular set of results caused the supervising QC engineer some
concern for the following reasons:
1. Historically, the manufacturing process hardly ever produces sheets with
more than 3 inclusions per square meter; this batch of 60 has three such
sheets: two with 4 inclusions, and one with 5.
2. Each 1 sq-m sheet with 3 or fewer inclusions is acceptable and can
be sold to customers unconditionally; sheets with 4 inclusions are
marginally acceptable so long as a batch of 1000 sheets does not contain
more than 20 such sheets; a sheet with 5 or more inclusions is unacceptable
and cannot be shipped to customers. All such sheets found by
a customer are sent back to the manufacturer (at the manufacturer's
expense) for a full refund. The specific sheet of this type contained in
this sample of 60 must therefore be found and eliminated.
3. More importantly, the manufacturing process was designed such that,
when operated properly, there will be no more than 3 unacceptable
sheets with 5 or more inclusions in each batch of 1000 sheets. The process
will be uneconomical otherwise.

The question of interest is this: Does the QC engineer have a reason to be
concerned? Or, stated mathematically, if X* is the design value of the number
of inclusions per sq. m. associated with the sheets produced by this process,
is there evidence in this sample data that X > X*, so that steps will have to
be taken to identify the source of this process performance degradation and
then to rectify the problem in order to improve the process performance?
As with the first problem, the primary issue here is also the randomness
associated with the variable of interest, X, the number of inclusions per square
meter of each glass sheet. The value observed for X in the QC lab is a randomly
varying quantity, not fixed and deterministic. In this case, however,
there is little or no contribution to the observed variability from the measurement
device: these particulate inclusions are relatively few in number and
are easily counted without error by the optical device. The variability in raw
material characteristics, and in particular the control system's effectiveness in
maintaining the process conditions at desired values (in the face of inevitable
and unpredictable disturbances to the process) all contribute to whether or
not there are imperfections, how many there are per square meter, and where
they are located on the sheet. Some sheets come out flawless while others end
up with a varying number of inclusions that cannot be predicted precisely
à priori. Thus, once again, the observed variability must be dealt with directly
because it is intrinsic to the problem and cannot be idealized away.
Next, note that the data in Table 1.2, being counts of distinct entities,
take on integer values. The variable X is therefore said to be discrete, so
that this example illustrates decision-making when the random phenomena in
question involve discrete variables.

1.3 Outline of a Systematic Approach

Even though the two illustrative problems presented above are different in
so many ways (one involves continuous variables, the other a discrete variable;
one is concerned with comparing two entities to each other, the other pits a
single set of data against a design target), the systematic approach to solving
such problems provided by probability and statistics applies to both in a
unified way. The fundamental issues at stake may be stated as follows:

    In light of its defining characteristics of intrinsic variability, how
    should randomly varying quantities be characterized and quantified
    precisely in order to facilitate the solution of practical problems
    involving them?


TABLE 1.3: Group classification and frequencies for YA data (from the
proposed process)

                            Relative
YA group      Frequency    Frequency
71.51-72.50        1          0.02
72.51-73.50        2          0.04
73.51-74.50        9          0.18
74.51-75.50       17          0.34
75.51-76.50        7          0.14
76.51-77.50        8          0.16
77.51-78.50        5          0.10
78.51-79.50        1          0.02
TOTAL             50          1.00

What now follows is a somewhat informal examination of the ideas and concepts behind these time-tested techniques. The purpose is to motivate and
provide context for the more formal discussions in upcoming chapters.

1.3.1 Group Classification and Frequency Distributions

Let us revisit the example data sets and consider the following alternative
approach to the data representation. Instead of focusing on individual observations
as presented in the tables of raw data, what if we sub-divided the
observations into small groups (called "bins") and re-organized the raw data
in terms of how frequently members of each group occur? One possible result
is shown in Tables 1.3 and 1.4 respectively for process A and process B. (A
different bin size will lead to a slightly different group classification but the
principles remain the same.)
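A minimal Python sketch of this group classification follows; the bin edges
(71.505, 72.505, . . . , chosen to match the groups of Table 1.3) and the variable
names are our own choices, and the 50 YA values are those of Table 1.1.

# Group classification (frequency distribution) for the YA data of Table 1.1,
# using bins of width 1.0 as in Table 1.3. The bin edges are our choice.

y_A = [74.04, 75.63, 77.21, 74.23, 76.58, 75.05, 75.69, 75.19, 75.37, 74.47,
       73.99, 74.90, 75.78, 75.09, 73.88, 76.98, 75.80, 77.53, 72.30, 77.25,
       75.06, 74.82, 76.67, 76.79, 75.85, 75.29, 75.92, 75.07, 74.92, 77.77,
       74.90, 75.31, 77.93, 74.78, 72.99, 73.32, 74.88, 79.07, 73.87, 74.23,
       74.85, 75.22, 73.99, 76.56, 78.31, 76.06, 75.28, 74.39, 77.57, 77.31]

edges = [71.505 + k for k in range(9)]              # 8 groups of width 1.0
for lo, hi in zip(edges[:-1], edges[1:]):
    freq = sum(1 for y in y_A if lo < y <= hi)      # observations in this group
    rel = freq / len(y_A)                           # relative frequency
    print(f"{lo + 0.005:.2f}-{hi - 0.005:.2f}  {freq:3d}  {rel:.2f}")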
This reclassification indicates, for instance, that for YA, there is only one
observation between 71.51 and 72.50 (the actual number is 72.30), but there
are 17 observations between 74.51 and 75.50; for YB on the other hand, 3
observations fall in the [67.51-68.50] group whereas there are 8 observations
between 69.51 and 70.50. The "relative frequency" column indicates what
proportion of the original 50 data points are found in each group. A plot of
this reorganization of the data, known as the histogram, is shown in Figure
1.1 for YA and Figure 1.2 for YB.
The histogram, a term first used by Pearson in 1895, is a graphical representation
of data from a group-classification and frequency-of-occurrence
perspective. Each bar represents a distinct group (or class) within the data
set, with the bar height proportional to the group frequency. Because this
graphical representation provides a picture of how the data are distributed
in terms of the frequency of occurrence of each group (how much each group


TABLE 1.4: Group classification and frequencies for YB data (from the
incumbent process)

                            Relative
YB group      Frequency    Frequency
66.51-67.50        1          0.02
67.51-68.50        3          0.06
68.51-69.50        4          0.08
69.51-70.50        8          0.16
70.51-71.50        4          0.08
71.51-72.50        7          0.14
72.51-73.50        4          0.08
73.51-74.50        6          0.12
74.51-75.50        5          0.10
75.51-76.50        6          0.12
76.51-77.50        0          0.00
77.51-78.50        2          0.04
78.51-79.50        0          0.00
TOTAL             50          1.00

FIGURE 1.1: Histogram for YA data


FIGURE 1.2: Histogram for YB data

contributes to the data set), it is often referred to as a frequency distribution
of the data.
A key advantage of such a representation is how clearly it portrays the
nature of the variability associated with each variable. For example, we easily
see from Fig 1.1 that the center of action for the YA data is somewhere
around the group whose bar is centered around 75 (i.e. in the interval [74.51,
75.50]). Furthermore, most of the values of YA cluster in the 4 central groups
centered around 74, 75, 76 and 77. In fact, 41 out of the 50 observations, or
82%, fall into these 4 groups; groups further away from the center of action
(to the left as well as to the right) contribute less to the YA data. Similarly,
Fig 1.2 shows that the center of action for the YB data is located somewhere
around the group in the [71.51, 72.50] interval but it is not as sharply defined
as it was with YA . Also the values of YB are more spread out and do not
cluster as tightly around this central group.
The histogram also provides quantitative insight. For example, we see that
38 of the 50 YA observations (or 76%) are greater than 74.51; only 13 out
of the 50 YB observations (or 26%) fall into this category. Also, exactly 0%
of YA observations are less than or equal to 71.50 compared with 20 out
of 50 observations (or a staggering 40%) of YB observations. Thus, if these
data sets can be considered as representative of the overall performance of
each process, then it is reasonable to conclude, for example, that there is a
better chance of obtaining yields greater than 74.50 with process A than with
process B (a 76% chance compared to a 26% chance). Similarly, while it is
highly unlikely that process A will ever return yields less than 71.50, there is
a not-insignificant chance (40%) that the yield obtained from process B will
be less than 71.50. What is thus beginning to emerge are the faint outlines of


TABLE 1.5: Group classification and frequencies for the inclusions data

                         Relative
  X       Frequency     Frequency
  0          22           0.367
  1          23           0.383
  2          11           0.183
  3           1           0.017
  4           2           0.033
  5           1           0.017
  6           0           0.000
TOTAL        60           1.000

a rigorous framework for characterizing and quantifying random variability,
with the histogram providing this first glimpse.
It is important to note that the advantage provided by the histogram
comes at the expense of losing the individuality of each observation. Having
gone from 50 raw observations each to 8 groups for YA , and a slightly larger
12 groups for YB , there is clearly a loss of resolution: the individual identities
of the original observations are no longer visible from the histogram. (For
example, the identities of each of the 17 YA observations that make up the
group in the interval [74.51,75.50] have been melded into that of a single,
monolithic bar in the histogram.) But this is not necessarily a bad thing. As we
demonstrate in upcoming chapters, a fundamental tenet of the probabilistic
approach to dealing with randomly varying phenomena is an abandonment
of the individual observation as the basis for theoretical characterization, in
favor of an ensemble description. For now, it suffices to be able to see from this
example that the clarity with which the histogram portrays data variability
has been achieved by trading off the individual observation's identity for the
ensemble identity of groups. But keep in mind that what the histogram offers
is simply an alternative (albeit more informative) way of representing the same
identical information contained in the data tables.
Let us now return to the second problem. In this case, the group classification
and frequency distribution for the raw inclusions data is shown in
Table 1.5. Let it not be lost on the reader that while the groups for the yield
data sets were created from intervals of finite length, no such quantization is
necessary for the inclusions data since in this case, the variable of interest,
X, is naturally discrete. This fundamental difference between continuous variables
(such as YA and YB) and discrete variables (such as X) will continue to
surface at various stages in subsequent discussions.
The histogram for the inclusions data is shown in Fig 1.3 where several
characteristics are now clear: for example, 75% of the glass sheets (45 out of 60)
are either perfect or have only a single (almost inconsequential) inclusion; only


FIGURE 1.3: Histogram of inclusions data

5% of the glass sheets (3 out of 60) have more than 3 inclusions, the remaining
95% have 3 or fewer; 93.3% (56 out of 60) have 2 or fewer inclusions. The
important point is that such quantitative characteristics of the data variability
(made possible by the histogram) are potentially useful for answering practical
questions about what one can reasonably expect from this process.
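These percentages are easy to verify directly from the raw counts; the short
Python sketch below does so, with the 60 values transcribed from Table 1.2
(the variable names are ours).

from collections import Counter

# The 60 inclusion counts of Table 1.2, transcribed row by row.
x = [0, 1, 1, 1, 0, 0, 1, 0, 2, 2,
     2, 0, 2, 2, 3, 2, 0, 0, 2, 0,
     1, 2, 0, 1, 0, 1, 0, 0, 1, 1,
     1, 1, 5, 2, 0, 0, 1, 4, 1, 1,
     2, 1, 0, 0, 1, 1, 0, 0, 1, 1,
     1, 0, 0, 2, 4, 0, 1, 1, 0, 1]

counts = Counter(x)
n = len(x)
for k in range(7):
    print(k, counts[k], round(counts[k] / n, 3))             # reproduces Table 1.5
print("0 or 1 inclusion:", (counts[0] + counts[1]) / n)      # 45/60 = 0.75
print("more than 3:", (counts[4] + counts[5]) / n)           # 3/60 = 0.05
print("2 or fewer:", (counts[0] + counts[1] + counts[2]) / n)  # 56/60, about 0.933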

1.3.2 Theoretical Distributions

How can the benefits of the histogram be consolidated into a useful tool
for quantitative analysis of randomly varying phenomena? The answer: by appealing
to a fundamental axiom of random phenomena: that conceptually, as
more observations are made, the shape of the data histogram stabilizes, and
tends to the form of the theoretical distribution that characterizes the random
phenomenon in question, in the limit as the total number of observations approaches
infinity. It is important to note that this concept does not necessarily
require that an infinite number of observations actually be obtained in practice,
even if this were possible. The essence of the concept is that an underlying
theoretical distribution exists for which the frequency distribution represented
by the histogram is but a finite sample approximation; that the underlying
theoretical distribution is an ideal model of the particular phenomenon
responsible for generating the finite number of observations contained in the
current data set; and hence that this theoretical distribution provides a reasonable
mathematical characterization of the random phenomenon.
As we show later, these theoretical distributions may be derived from first
principles given sufficient knowledge regarding the underlying random phenomena.
And, as the brief informal examination of the illustrative histograms


above indicates, these theoretical distributions can be used for various things.
For example, even though we have not yet provided any concrete definition
of the term "probability", neither have we given any concrete justifications of
its usage in this context, still from the discussion in the previous section, the
reader can intuitively attest to the reasonableness of the following statements:
"the probability that YA ≥ 74.5 is 0.76"; or "the probability that YB ≥ 74.5
is 0.26"; or "the probability that X ≤ 1 is 0.75". Parts II and III are
devoted to establishing these ideas more concretely and more precisely.
A Preview
It turns out that the theoretical distribution for each yield data set is:

    f(y|\mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{(y-\mu)^2}{2\sigma^2}}; \quad -\infty < y < \infty     (1.3)

which, when superimposed on each histogram, is shown in Fig 1.4 for YA, and
Fig 1.5 for YB, when the indicated characteristic parameters are specified
as μ = 75.52, σ = 1.43 for YA, and μ = 72.47, σ = 2.76 for YB.
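For the reader who wishes to evaluate Eq (1.3) numerically, a brief Python
sketch follows; the parameter values are those quoted above for YA, and
evaluating at the group centers 72, 73, . . . , 79 (as Exercise 1.10 asks) is our
own illustrative choice.

import math

def normal_pdf(y, mu, sigma):
    """Eq (1.3): the Gaussian probability distribution function."""
    return math.exp(-(y - mu)**2 / (2 * sigma**2)) / (sigma * math.sqrt(2 * math.pi))

# Evaluate at the centers of the YA groups of Table 1.3, with mu = 75.52, sigma = 1.43.
for y in range(72, 80):
    print(y, round(normal_pdf(y, 75.52, 1.43), 3))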
Similarly, the theoretical distribution for the inclusions data is:

    f(x|\lambda) = \frac{e^{-\lambda}\lambda^x}{x!}; \quad x = 0, 1, 2, \ldots     (1.4)

where the characteristic parameter λ = 1.02 is the average number of inclusions
in each glass sheet. In similar fashion to Eq (1.3), it also provides
a theoretical characterization and quantification of the random phenomenon
responsible for the variability observed in the inclusions data. From it we
are able, for example, to compute the theoretical probabilities of observing
0, 1, 2, . . ., inclusions in any one glass sheet manufactured by this process. A
plot of this theoretical probability distribution function is shown in Fig 1.6
(compare with the histogram in Fig 1.3).
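Eq (1.4) can likewise be evaluated directly; the sketch below computes f(x|λ)
for x = 0, 1, . . . , 6 with λ = 1.02 and lists alongside the relative frequencies of
Table 1.5 for comparison (essentially Exercise 1.12); the function and variable
names are ours.

import math

def poisson_pdf(x, lam):
    """Eq (1.4): probability of exactly x inclusions, given the parameter lambda."""
    return math.exp(-lam) * lam**x / math.factorial(x)

rel_freq = [0.367, 0.383, 0.183, 0.017, 0.033, 0.017, 0.000]   # from Table 1.5
for x in range(7):
    print(x, round(poisson_pdf(x, 1.02), 3), rel_freq[x])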
The full detail of precisely what all this means is discussed in subsequent
chapters; for now, this current brief preview serves the purpose of simply indicating
how the expressions in Eqs (1.3) and (1.4) provide a theoretical means
of characterizing (and quantifying) the random phenomenon involved respectively
in the yield data and in the inclusions data. Expressions such as these are
called probability distribution functions (pdfs) and they provide the basis
for rational analysis of random variability via the concept of probability.
Precisely what this concept of probability is, how it gives rise to pdfs, and
how pdfs are used to solve practical problems and provide answers to the sorts
of questions posed by these illustrative examples, constitute the primary focus
of the remaining chapters in the book.
At this point, it is best to defer the rest of the discussion until when we
revisit these two problems at appropriate places in upcoming chapters where
we show that:
1. YA indeed may be considered as greater than YB, and in particular,
that YA − YB > 2, up to a specific, quantifiable degree of confidence,





FIGURE 1.4: Histogram for YA data with superimposed theoretical distribution
(Normal: Mean = 75.52, StDev = 1.432, N = 50)

FIGURE 1.5: Histogram for YB data with superimposed theoretical distribution
(Normal: Mean = 72.47, StDev = 2.764, N = 50)

FIGURE 1.6: Theoretical probability distribution function for a Poisson random
variable with parameter λ = 1.02. Compare with the inclusions data histogram
in Fig 1.3

2. There is in fact no evidence in the inclusions data to suggest that the
process has deviated from its design target; i.e., that there is no reason
to believe that X ≠ X*, again up to a specific, quantifiable degree of
confidence.

1.4 Summary and Conclusions

We have introduced two practical problems in this chapter to illustrate
the complications caused by the presence of randomly varying phenomena in
engineering problems. One problem involved determining which of two continuous variables is larger; the other involved determining if a discrete variable
has deviated from its design target. Without the presence of random variability, each problem would ordinarily have been trivial to solve. However, with
intrinsic variability that could not be idealized away, it became clear that special techniques capable of coping explicitly with randomly varying phenomena
would be required to solve these problems satisfactorily. We did not solve the
problems, of course (that is reserved for later); we simply provided an outline of a systematic approach to solving them, which required introducing
some concepts that are to be explored fully later. As a result, the very brief
introduction of the frequency distribution, the graphical histogram, and the
theoretical distribution function was intended to serve merely as a preview of


upcoming detailed discussions concerning how randomly varying phenomena
are analyzed systematically.
Here are some of the main points of the chapter again:

• The presence of random variability often complicates otherwise straightforward
  problems so that specialized solution techniques are required;

• Frequency distributions and histograms provide a particularly informative
  perspective of random variations intrinsic to experimental data;

• The probability distribution function, the theoretical limit to which
  the frequency distribution (and histogram) tends, provides the basis
  for systematic analysis of randomly varying phenomena.

REVIEW QUESTIONS
1. What decision is to be made in the yield improvement problem of Section 1.1?
2. What are the economic factors to be taken into consideration in deciding what
to do with the yield improvement problem?
3. What is the essence of the yield improvement problem as discussed in Section
1.1?
4. What are some of the sources of variability associated with the process yields?
5. Why are the yield variables, YA and YB , continuous variables?
6. What single value is suggested as intuitive for representing a collection of n
data points, x1 , x2 , . . . , xn ?
7. What are some of the issues raised by entertaining the idea of representing the
yield data sets with the arithmetic averages ȳA and ȳB?
8. Why is the number of inclusions found on each glass sheet a discrete variable?
9. What are some sources of variability associated with the glass manufacturing process which may ultimately be responsible for the variability observed in the number
of inclusions?
10. What is a frequency distribution and how is it obtained from raw data?
11. Why will bin size affect the appearance of a frequency distribution?
12. What is a histogram and how is it obtained from data?
13. What is the primary advantage of a histogram over a table of raw data?


14. What is the relationship between a histogram and a theoretical distribution?


15. What are the expressions in Eqs (1.3) and (1.4) called? These equations
provide the basis for what?

EXERCISES
Section 1.1
1.1 The variance of a collection of n data points, y1, y2, . . . , yn, is defined as:

    s^2 = \frac{\sum_{i=1}^{n}(y_i - \bar{y})^2}{n-1}     (1.5)

where ȳ is the arithmetic average of the data set. From the yield data in Table 1.1,
obtain the variances s²A and s²B for the YA and YB data sets, respectively. Which is
greater, s²A or s²B?
1.2 Even though the data sets in Table 1.1 were not generated in pairs, obtain the
50 differences,

    d_i = y_{Ai} - y_{Bi}; \quad i = 1, 2, \ldots, 50,     (1.6)

for corresponding values of YA and YB as presented in this table. Obtain a histogram
of di and compute the arithmetic average,

    \bar{d} = \frac{1}{n}\sum_{i=1}^{n} d_i.     (1.7)

What do these results suggest about the possibility that YA may be greater than YB?
1.3 A set of theoretical results to be established later (see Chapter 4 Exercises) states
that, for di and d̄ defined in Eq (1.7), and variance s² defined in Exercise 1.1,

    \bar{d} = \bar{y}_A - \bar{y}_B     (1.8)
    s_d^2 = s_A^2 + s_B^2     (1.9)

Confirm these results specifically for the data in Table 1.1.


Section 1.2
1.4 From the data in Table 1.2, obtain s²x, the variance of the inclusions.
1.5 The random variable, X, representing the number of inclusions, is purported
to be a Poisson random variable (see Chapter 8). If true, then the average, x̄, and
variance, s²x, are theoretically equal. Compare the values computed for these two
quantities from the data set in Table 1.2. What do these results suggest about the
possibility that X may in fact be a Poisson random variable?
Section 1.3
1.6 Using a bin size of 0.75, obtain relative frequencies for YA and YB data and the
corresponding histograms. Repeat this exercise for a bin size of 2.0. Compare these
two sets of histograms with the corresponding histograms in Figs 1.1 and 1.2.
1.7 From the frequency distribution in Table 1.3 and the values computed for the
average, ȳA, and variance, s²A, of the yield data set, YA, determine the percentage of
the data contained in the interval ȳA ± 1.96sA, where sA is the positive square root
of the variance, s²A.
1.8 Repeat Exercise 1.7 for the YB data in Table 1.4. Determine the percentage of
the data contained in the interval ȳB ± 1.96sB.
1.9 From Table 1.5 determine the value of x such that only 5% of the data exceeds
this value.
1.10 Using μ = 75.52 and σ = 1.43, compute theoretical values of the function in
Eq (1.3) at the center points of the frequency groups for the YA data in Table 1.3;
i.e., for y = 72, 73, . . . , 79. Compare these theoretical values with the corresponding
relative frequency values.
1.11 Repeat Exercise 1.10 for YB data and Table 1.4.
1.12 Using λ = 1.02, compute theoretical values of the function f(x|λ) in Eq (1.4)
at x = 0, 1, 2, . . . , 6 and compare with the corresponding relative frequency values in
Table 1.5.

APPLICATION PROBLEMS
1.13 The data set in the table below is the time (in months) from receipt to publication (sometimes known as time-to-publication) of 85 papers published in the January
2004 issue of a leading chemical engineering research journal.
19.2   15.1    9.6    4.2    5.4
 9.0    5.3   12.9    4.2   15.2
17.2   12.0   17.3    7.8    8.0
 8.2    3.0    6.0    9.5   11.7
 4.5   18.5   24.3    3.9   17.2
13.5    5.8   21.3    8.7    4.0
20.7    6.8   19.3    5.9    3.8
 7.9   14.5    2.5    5.3    7.4
19.5    3.3    9.1    1.8    5.3
 8.8   11.1    8.1   10.1   10.6
18.7   16.4    9.8   10.0   15.2
 7.4    7.3   15.4   18.7   11.5
 9.7    7.4   15.7    5.6    5.9
13.7    7.3    8.2    3.3   20.1
 8.1    5.2    8.8    7.3   12.2
 8.4   10.2    7.2   11.3   12.0
10.8    3.1   12.8    2.9    8.8

(i) Generate a histogram of this data set. Comment on the shape of this histogram
and why, from the nature of the variable in question, such a shape may not be
surprising.
(ii) From the histogram of the data, what is the most popular time-to-publication,
and what fraction of the papers took longer than this to publish?
1.14 Refer to Problem 1.13. Let each raw data entry in the data table be xi.
(i) Generate a set of 85 sample average publication times, yi, from 20 consecutive
times as follows:

    y_1 = \frac{1}{20}\sum_{i=1}^{20} x_i     (1.10)
    y_2 = \frac{1}{20}\sum_{i=2}^{21} x_i     (1.11)
    y_3 = \frac{1}{20}\sum_{i=3}^{22} x_i     (1.12)
    \vdots
    y_j = \frac{1}{20}\sum_{i=j}^{20+(j-1)} x_i     (1.13)

For values of j ≥ 66, yj should be obtained by replacing x86, x87, x88, . . . , which do
not exist, with x1, x2, x3, . . . , respectively (i.e., for these purposes treat the given
xi data like a circular array). Plot the histogram for this generated yi data and
compare the shape of this histogram with that of the original xi data.
(ii) Repeat part (i) above, this time for zi data generated from:

    z_j = \frac{1}{20}\sum_{i=j}^{20+(j-1)} y_i     (1.14)

for j = 1, 2, . . . , 85. Compare the histogram of the zi data with that of the yi data
and comment on the effect of averaging on the shape of the data histograms.
1.15 The data shown in the table below is a four-year record of the number of
recordable safety incidents occurring at a plant site each month.
1  0  0  0  2  2  0  0  0  1  0  1
0  1  0  1  0  0  0  0  0  0  0  1
2  2  0  1  2  0  1  2  1  1  0  0
0  1  0  0  0  0  0  0  1  0  0  1

(i) Find the average number of safety incidents per month and the associated variance. Construct a frequency table of the data and plot a histogram.
(ii) From the frequency table and the histogram, what can you say about the
chances of obtaining each of the following observations, where x represents the
number of observed safety incidents per month: x = 0, x = 1, x = 2, x = 3, x = 4
and x = 5?
(iii) Consider the postulate that a reasonable model for this phenomenon is:

    f(x) = \frac{e^{-0.5}(0.5)^x}{x!}     (1.15)


where f(x) represents the theoretical probability of recording exactly x safety
incidents per month. How well does this model fit the data?
(iv) Assuming that this is a reasonable model, discuss how you would use it to
answer the question: If, over the most recent four-month period, the plant recorded
1, 3, 2, 3 safety incidents respectively, is there evidence that there has been a real
increase in the number of safety incidents?
1.16 The table below shows a record of the before and after weights (in pounds)
of 20 patients enrolled in a clinically-supervised ten-week weight-loss program.
Patient #   Before Wt (lbs)   After Wt (lbs)   Patient #   Before Wt (lbs)   After Wt (lbs)
    1             272              263             11            215              206
    2             319              313             12            245              235
    3             253              251             13            248              237
    4             325              312             14            364              350
    5             236              227             15            301              288
    6             233              227             16            203              195
    7             300              290             17            197              193
    8             260              251             18            217              216
    9             268              262             19            210              202
   10             276              263             20            223              214

Let XB represent the "Before" weight and XA the "After" weight.
(i) Using the same bin size for each data set, obtain histograms for the XB and XA
data and plot both on the same graph. Strictly on the basis of a visual inspection
of these histograms, what can you say about the effectiveness of the weight-loss
program in achieving its objective of assisting patients to lose weight?
(ii) Define the difference variable, D = XB − XA, and from the given data, obtain
and plot a histogram for this variable. Again, strictly from a visual inspection of this
histogram, what can you say about the effectiveness of the weight-loss program?
1.17 The data shown in the following table is from an Assisted Reproductive
Technologies clinic where a cohort of 100 patients under the age of 35 years (the
"Younger" group), and another cohort, 35 years and older (the "Older" group), each
received five embryos in an in-vitro fertilization (IVF) treatment cycle.

   x                    yO                        yY
No. of live      Total no. of older        Total no. of younger
births in a      patients (out of 100)     patients (out of 100)
delivered        with pregnancy            with pregnancy
pregnancy        outcome x                 outcome x
   0                    32                         8
   1                    41                        25
   2                    21                        35
   3                     5                        23
   4                     1                         8
   5                     0                         1

The data shows x, the number of live births per delivered pregnancy, along
with how many in each group had the pregnancy outcome of x. For example, the
first entry indicates that the IVF treatment was unsuccessful for 32 of the "older"
patients, with the corresponding number being 8 for the "younger" patients; 41
"older" patients delivered singletons, compared with 25 for the "younger" patients; 21
"older" patients and 35 "younger" patients each delivered twins; etc. Obtain a relative
frequency distribution for these data sets and plot the corresponding histograms.
Determine the average number of live births per delivered pregnancy for each group
and compare these values. Comment on whether or not these data sets indicate that
the outcomes of the IVF treatments are different for these two groups.


Chapter 2
Random Phenomena, Variability and
Uncertainty

2.1 Two Extreme Idealizations of Natural Phenomena
    2.1.1 Introduction
    2.1.2 A Chemical Engineering Illustration
          Determinism and the PFR
          Randomness and the CSTR
          Theoretical Analysis of the Ideal CSTR's Residence Time
2.2 Random Mass Phenomena
    2.2.1 Defining Characteristics
    2.2.2 Variability and Uncertainty
    2.2.3 Practical Problems of Interest
2.3 Introducing Probability
    2.3.1 Basic Concepts
    2.3.2 Interpreting Probability
          Classical (À-Priori) Probability
          Relative Frequency (À-Posteriori) Probability
          Subjective Probability
2.4 The Probabilistic Framework
2.5 Summary and Conclusions
REVIEW QUESTIONS
EXERCISES
APPLICATION PROBLEMS

Through the great beneficence of Providence,
what is given to be foreseen in the general sphere of masses
escapes us in the confined sphere of individuals.
Joannè-Erhard Valentin-Smith (1796-1891)

When John Stuart Mill stated in his 1862 book, A System of Logic: Ratiocinative
and Inductive, that "...the very events which in their own nature
appear most capricious and uncertain, and which in any individual case no
attainable degree of knowledge would enable us to foresee, occur, when considerable
numbers are taken into account, with a degree of regularity approaching
to mathematical...", he was merely articulating, astutely for the time, the
then-radical, but now well-accepted, concept that randomness in scientific observation
is not a synonym for disorder; it is order of a different kind. The more
familiar kind of order informs determinism: the concept that, with sufficient
mechanistic knowledge, all physical phenomena are entirely predictable and
thus describable by precise mathematical equations. But even classical physics,
that archetypal deterministic science, had to make room for this other kind
of order when quantum physicists of the 1920s discovered that fundamental


particles of nature exhibit irreducible uncertainty (or chance) in their locations, movements and interactions. And today, most contemporary scientists
and engineers are, by training, conditioned to accept both determinism and
randomness as intrinsic aspects of the experiential world. The problem, however, is that to many, the basic characteristics of random phenomena and their
order of a dierent kind still remain somewhat unfamiliar at a fundamental
level.
This chapter is devoted to an expository examination of randomly varying
phenomena. Its primary purpose is to introduce the reader to the central
characteristic of order-in-the-midst-of-variability, and the sort of analysis this
trait permits, before diving headlong into a formal study of probability and
statistics. The premise of this chapter is that a true appreciation of the nature
of randomly varying phenomena at a fundamental level is indispensable to the
sort of clear understanding of probability and statistics that will protect the
diligent reader from all-too-common misapplication pitfalls.

2.1 Two Extreme Idealizations of Natural Phenomena

2.1.1 Introduction

In classical physics, the distance, x, (in meters) traveled in t seconds by


an object launched with an initial velocity, u m/s, and which accelerates at a
m/s², is known to be given by the expression:

    x = ut + \frac{1}{2}at^2     (2.1)

This is a deterministic expression: it consistently and repeatably produces
the same result every time identical values of the variables u, a, and t are
specified. The same is true for the expression used in engineering to determine
Q, the rate of heat loss from a house, say in the middle of winter, when
the total exposed surface area is A m², the inside temperature is Ti K, the
outside temperature is To K, and the combined heat transfer characteristics of
the house walls, insulation, etc., is represented by the so-called overall heat
transfer coefficient U, W/m²K, i.e.

    Q = UA(T_i - T_o)     (2.2)

The rate of heat loss is determined precisely and consistently for any given
specific values of each entity on the right hand side of this equation.
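A trivial sketch makes this concrete; the numerical values of U, A, Ti and To
used below are arbitrary illustrative choices of ours, not values from the text.

def heat_loss(U, A, T_i, T_o):
    """Eq (2.2): rate of heat loss, Q = U*A*(Ti - To)."""
    return U * A * (T_i - T_o)

# Arbitrary illustrative values; identical inputs always give identical output.
print(heat_loss(0.5, 250.0, 293.0, 263.0))   # 3750.0
print(heat_loss(0.5, 250.0, 293.0, 263.0))   # 3750.0 again: deterministic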
The concept of determinism, that the phenomenon in question is precisely
determinable in every relevant detail, is central to much of science and engineering
and has proven quite useful in analyzing real systems, and in solving practical problems, whether it is computing the trajectory of rockets for


launching satellites into orbit, installing appropriate insulation for homes, or
designing chemical reactors. However, any assumption of strict determinism
in nature is implicitly understood as a convenient idealization resulting from
neglecting certain details considered non-essential to the core problem. For example,
the capriciousness of the wind and its various and sundry effects have
been ignored in Eqs (2.1) and (2.2): no significant wind resistance (or assistance)
in the former, negligible convective heat transfer in the latter.
At the other extreme is randomness, where the relevant details of the
phenomenon in question are indeterminable precisely; repeated observations
under identical conditions produce different and randomly varying results;
and the observed random variability is essential to the problem and therefore
cannot be idealized away. Such is the case with the illustrative problems in
Chapter 1 where in one case, the yield obtained from each process may be
idealized as follows:
    y_{Ai} = \theta_A + \epsilon_{Ai}     (2.3)
    y_{Bi} = \theta_B + \epsilon_{Bi}     (2.4)

with θA and θB representing the true but unknown yields obtainable from
processes A and B respectively, and εAi and εBi representing the superimposed
randomly varying component, the sources of the random variability evident
in each observation yAi and yBi. Identical values of θA do not produce identical
values of yAi in Eq (2.3); neither will identical values of θB produce identical
values of yBi in Eq (2.4). In the second case of the glass process and the
number of inclusions per square meter, the idealization is:

    x_i = \lambda + \epsilon_i     (2.5)

where λ is the true number of inclusions associated with the process and εi is
the superimposed random component responsible for the observed randomness
in the actual number of inclusions xi found on each individual glass sheet upon
inspection.
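For concreteness, the small Python sketch below simulates Eq (2.3); the
Gaussian form assumed for εAi, the numerical value assumed for θA, and the
variable names are all ours, purely for illustration, and are not part of the
idealization itself.

import random

theta_A = 75.5            # an assumed "true" yield, for illustration only
random.seed(1)            # fixed seed so the sketch is reproducible

# Eq (2.3): the same theta_A every time, but a different random epsilon_Ai,
# so repeated "experiments" return different observed yields y_Ai.
for i in range(5):
    epsilon_Ai = random.gauss(0.0, 1.5)   # assumed noise model, not from the text
    y_Ai = theta_A + epsilon_Ai
    print(round(y_Ai, 2))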
These two perspectives, determinism and randomness, are thus two
opposite idealizations of natural phenomena, the former when deterministic
aspects of the phenomenon are considered to be overwhelmingly dominant
over any random components, the latter case when the random components
are dominant and central to the problem. The principles behind each conceptual idealization, and the analysis technique appropriate to each, are now
elucidated with a chemical engineering illustration.

2.1.2 A Chemical Engineering Illustration

Residence time, θ, the amount of time a fluid element spends in a chemical
reactor, is an important parameter in the design of chemical reactors. We wish
to consider residence times in two classic reactor configurations: the plug flow
reactor (PFR) and the continuous stirred tank reactor (CSTR).

FIGURE 2.1: Schematic diagram of a plug flow reactor (PFR).


Determinism and the PFR
The plug flow reactor (PFR) is a hollow tube in which reactants that are
introduced at one end react as the fluid elements traverse the length of the
tube and emerge at the other end. The name comes from the idealization that
fluid elements move through as "plugs" with no longitudinal mixing (see Fig
2.1).
The PFR assumptions (idealizations) may be stated as follows:

• the reactor tube (l m long) has a uniform cross-sectional area, A m²;

• fluid elements move in plug flow with a constant velocity, v m/s, so
  that the velocity profile is flat;

• the flow rate through the reactor is constant at F m³/s.

Now consider that at time t = 0 we instantaneously inject a bolus of red dye
of concentration C0 moles/m³ into the inlet stream. The following question
is of interest in the study of residence time distributions in chemical reactor
design:

    How much time does each molecule of red dye spend in the reactor,
    if we could label them all and observe each one at the reactor exit?

Because of the plug flow idealization, each fluid element moves through the
reactor with a constant velocity given by:

    v = \frac{F}{A} \ \mathrm{m/s}     (2.6)
and it will take precisely

    \theta = \frac{l}{v} = \frac{lA}{F} \ \mathrm{secs}     (2.7)

for each dye element to traverse the reactor. Hence, θ, the residence time for
an ideal plug flow reactor (PFR) is a deterministic quantity because its value
is exactly and precisely determinable from Eq (2.7) given F, A and l.
Keep in mind that the determinism that informs this analysis of the PFR



FIGURE 2.2: Schematic diagram of a continuous stirred tank reactor (CSTR).

residence time arises directly as a consequence of the central plug flow idealization.
Any departures from such idealization, especially the presence of
significant axial dispersion (leading to a non-flat fluid velocity profile), will
result in dye molecules no longer arriving at the outlet at precisely the same
time.

Randomness and the CSTR
With the continuous stirred tank reactor (CSTR), the reactant stream
continuously flows into a tank that is vigorously stirred to ensure uniform
mixing of its content, while the product is continuously withdrawn from the
outlet (see Fig 2.2). The assumptions (idealizations) in this case are:

• the reactor tank has a fixed, constant volume, V m³;

• the contents of the tank are perfectly mixed.

Once again, let us consider that a bolus of red dye of concentration C0
moles/m³ is instantaneously injected into the inlet stream at time t = 0; and
again, ask: how much time does each molecule of red dye spend in the reactor?
Unlike with the plug flow reactor, observe that it is impossible to answer this
question à priori, or precisely: because of the vigorous stirring of the reactor
content, some dye molecules will exit almost instantaneously; others will stay
longer, some for a very long time. In fact, it can be shown that theoretically,
0 < θ < ∞. Hence in this case, θ, the residence time, is a randomly varying
quantity that can take on a range of values from 0 to ∞; it cannot therefore be
adequately characterized as a single number. Notwithstanding, as all chemical
engineers know, the random phenomenon of residence times for ideal CSTRs
can, and has been, analyzed systematically (see for example, Hill, 1977¹).

¹ C.G. Hill, Jr, An Introduction to Chemical Engineering Kinetics and Reactor Design,
Wiley, NY, 1977, pp 388-396.


Theoretical Analysis of the Ideal CSTR's Residence Time
Even though based on chemical engineering principles, the results of the
analysis we are about to discuss have fundamental implications for the general
nature of the order present in the midst of random variability encountered in
other applications, and how such order provides the basis for analysis. (As
an added bonus, this analysis also provides a non-probabilistic view of ideas
usually considered the exclusive domain of probability).
By carrying out a material balance around the CSTR (i.e., that the rate
of accumulation of mass within a prescribed volume must equal the difference
between the rate of input and the rate of output), it is possible to develop
a mathematical model for this process as follows: If the volumetric flow rates
into and out of the reactor are equal and given by F m³/s, and if C(t) represents
the molar concentration of the dye in the well-mixed reactor, then by the
assumption of perfect mixing, this will also be the dye concentration at the
exit of the reactor. The material balance equation is:

    V\frac{dC}{dt} = FC_{in} - FC     (2.8)

where Cin is the dye concentration in the inlet stream. If we define the parameter τ as

    \tau = \frac{V}{F}     (2.9)

and note that the introduction of a bolus of dye of concentration C0 at t = 0
implies:

    C_{in} = C_0\delta(t)     (2.10)

where δ(t) is the Dirac delta function, then Eq (2.8) becomes:

    \tau\frac{dC}{dt} = -C + C_0\delta(t)     (2.11)

a simple, linear first order ODE whose solution is:

    C(t) = \frac{C_0}{\tau}e^{-t/\tau}     (2.12)

If we now define as f(θ) the instantaneous fraction of the initial number of
injected dye molecules exiting the reactor at time t = θ (those with residence
time θ), i.e.

    f(\theta) = \frac{C(\theta)}{C_0}     (2.13)

we obtain immediately from Eq (2.12) that

    f(\theta) = \frac{1}{\tau}e^{-\theta/\tau}     (2.14)

recognizable to all chemical engineers as the familiar exponential instantaneous
residence time distribution function for the ideal CSTR. The reader

FIGURE 2.3: Instantaneous residence time distribution function for the CSTR
(with τ = 5).
should take good note of this expression: it shows up a few more times and
in various guises in subsequent chapters. For now, let us observe that, even
though (a) the residence time for a CSTR, θ, exhibits random variability,
potentially able to take on values between 0 and ∞ (and is therefore not describable
by a single value); so that (b) it is therefore impossible to determine
with absolute certainty precisely when any individual dye molecule will leave
the reactor; even so (c) the function, f(θ), shown in Eq (2.14), mathematically
characterizes the behavior of the entire ensemble of dye molecules, but in a
way that requires some explanation:

1. It represents how the residence times of fluid particles in the well-mixed
CSTR are distributed over the range of possible values 0 < θ < ∞
(see Fig 2.3).

2. This distribution of residence times is a well-defined, well-characterized
function, but it is not a description of the precise amount of time a particular
individual dye molecule will spend in the reactor; rather it is a
description of how many (or what fraction) of the entire collection of
dye molecules will spend what amount of time in the reactor. For example,
in broad terms, it indicates that a good fraction of the molecules
have relatively short residence times, exiting the reactor quickly; a much
smaller but non-zero fraction have relatively long residence times. It can
also provide more precise statements as follows.

3. From this expression (Eq (2.14)), we can determine the fraction of dye
molecules that have remained in the reactor for an amount of time less
than or equal to some time t (i.e. molecules exiting the reactor with

age less than or equal to t): we do this by integrating f(θ) with respect
to θ, as follows, to obtain

    F(t) = \int_0^t \frac{1}{\tau}e^{-\theta/\tau}\,d\theta = 1 - e^{-t/\tau}     (2.15)

from which we see that F(0), the fraction of dye molecules with age less
than or equal to zero, is exactly zero: indicating the intuitively obvious
fact that, no matter how vigorous the mixing, each dye molecule spends at
least a finite, non-zero, amount of time in the reactor (no molecule exits
instantaneously upon entry).
On the other hand, F(∞) = 1, since

    F(\infty) = \int_0^\infty \frac{1}{\tau}e^{-\theta/\tau}\,d\theta = 1     (2.16)

again indicating the obvious: if we wait long enough, all dye molecules
will eventually exit the reactor as t → ∞. In other words, the fraction
of molecules exiting the reactor with age less than ∞ is exactly 1.
4. Since the fraction of molecules that will have remained in the reactor
for an amount of time less than or equal to t is F(t), and the fraction
that will have remained in the reactor for less than or equal to t + Δt
is F(t + Δt), then the fraction with residence time in the infinitesimal
interval between t and (t + Δt) is given by:

    F(t + \Delta t) - F(t) = \int_t^{t+\Delta t} \frac{1}{\tau}e^{-\theta/\tau}\,d\theta     (2.17)

which, for very small Δt, simplifies to:

    F(t + \Delta t) - F(t) \approx f(t)\Delta t     (2.18)

5. And finally, the average residence time, θ̄, may be determined from the
expression in Eq (2.14) (and Eq (2.16)) as:

    \bar{\theta} = \frac{\int_0^\infty \theta\,\frac{1}{\tau}e^{-\theta/\tau}\,d\theta}{\int_0^\infty \frac{1}{\tau}e^{-\theta/\tau}\,d\theta} = \tau     (2.19)

where the numerator integral is evaluated via integration by parts. Observe
from the definition of τ above (in Eq (2.9)) that this result makes
perfect sense, strictly from the physics of the problem: particles in a
stream flowing at the rate F m³/s through a well-mixed reactor of volume
V m³ will spend an average of V/F = τ seconds in the reactor. (A brief
numerical check of Eqs (2.15), (2.16) and (2.19) is sketched below.)
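The promised numerical check is sketched here in Python; the value τ = 5
(as in Fig 2.3), the grid, and the simple rectangular-rule integration are our
own illustrative choices.

import math

tau = 5.0                  # tau = V/F; the value used in Fig 2.3

def f(theta):
    """Eq (2.14): instantaneous residence time distribution for the ideal CSTR."""
    return math.exp(-theta / tau) / tau

d = 0.01                                   # integration step
grid = [k * d for k in range(10_000)]      # theta from 0 up to about 20*tau

# Eq (2.15): fraction of molecules with residence time <= t, checked at t = tau
t = tau
print(sum(f(th) * d for th in grid if th < t), 1 - math.exp(-t / tau))  # both about 0.632

# Eq (2.16): the fraction with residence time < "infinity" is (approximately) 1
print(sum(f(th) * d for th in grid))       # about 1.0

# Eq (2.19): the average residence time equals tau
print(sum(th * f(th) * d for th in grid))  # about 5.0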
We now observe in conclusion two important points: (i) even though at no
point in the preceding discussion have we made any overt or explicit appeal


to the concepts of probability, the unmistakable fingerprints of probability are
evident all over (as upcoming chapters demonstrate concretely, but perhaps
already recognizable to those with some familiarity with such concepts); (ii)
nevertheless, this characterizing model in Eq (2.14) was made possible via
first-principles knowledge of the underlying phenomenon. This is a central
characteristic of random phenomena: that appropriate theoretical characterizations
are almost always possible in terms of ideal ensemble models of the
observed variability dictated by the underlying phenomenological mechanism.

2.2 Random Mass Phenomena

2.2.1 Defining Characteristics

In such diverse areas as actuarial science, biology, chemical reactors, demography,
economics, finance, genetics, human mortality, manufacturing quality
assurance, polymer chemistry, etc., one repeatedly encounters a surprisingly
common theme whereby phenomena which, on an individual level, appear
entirely unpredictable, are well-characterized as ensembles (as demonstrated
above with residence time distribution in CSTRs). For example, as
far back as 1662, in a study widely considered to be the genesis of population
demographics and of modern actuarial science by which insurance premiums
are determined today, the British haberdasher, John Graunt (1620-1674), had
observed that the number of deaths and the age at death in London were surprisingly predictable for the entire population even though it was impossible to
predict which individual would die when and in what manner. Similarly, while
the number of monomer molecules linked together in any polymer molecule
chain varies considerably, how many chains of a certain length a batch of
polymer product contains can be characterized fairly predictably.
Such natural phenomena noted above have come to be known as Random
Mass Phenomena, with the following defining characteristics:
1. Individual observations appear irregular because it is not possible to
predict each one with certainty; but
2. The ensemble or aggregate of all possible outcomes is regular, well-characterized
and determinable;

3. The underlying phenomenological mechanisms accounting for the nature
and occurrence of the specific observations determine the character of the ensemble;

4. Such phenomenological mechanisms may be known mechanistically (as
was the case with the CSTR), or their manifestation may only be deter-

mined from data (as was the case with John Graunt's mortality tables
of 1662).

This fortunate circumstance, aggregate predictability amidst individual
irregularities, is why the primary issue with random phenomena analysis boils
down to how to use ensemble descriptions and characterization to carry out
systematic analysis of the behavior of individual observations.

2.2.2 Variability and Uncertainty

While ensemble characterizations provide a means of dealing systematically
with random mass phenomena, many practical problems still involve
making decisions about specific, inherently unpredictable, outcomes. For example,
the insurance company still has to decide what premium to charge each
individual on a person-by-person basis. When decisions must be made about
specific outcomes of random mass phenomena, uncertainty is an inevitable
consequence of the inherent variability. Furthermore, the extent or degree
of variability directly affects the degree of uncertainty: tighter clustering of
possible outcomes implies less uncertainty, whereas a broader distribution of
possible outcomes implies more uncertainty. The most useful mathematical
characterization of ensembles must therefore permit not only systematic analysis,
but also a rational quantification of the degree of variability inherent in
the ensemble, and the resulting uncertainty associated with each individual
observation as a result.

2.2.3 Practical Problems of Interest

Let xi represent individual observations, i = 1, 2, . . . , n, from a random
mass phenomenon; let X be the actual variable of interest, different and distinct
from xi, this latter being merely one out of many other possible realizations
of X. For example, X can be the number of live births delivered by a
patient after a round of in-vitro fertilization treatment, a randomly varying
quantity; whereas xi = 2 (i.e. twins) is the specific outcome observed for a
specific patient after a specific round of treatment. For now, let the aggregate
description we seek be represented as f(x) (see for example, Eq (2.14) for the
CSTR residence time); what this is and how it is obtained is discussed later.
In practice, only data in the form of \{x_i\}_{i=1}^{n} observations is available. The
desired aggregate description, f(x), must be understood in its proper context
as a descriptor of the (possibly infinite) collection of all possible outcomes of
which the observed data is only a sample. The fundamental problems of
random phenomena analysis may now be stated formally as follows:

1. Given \{x_i\}_{i=1}^{n}, what can we say about the complete f(x)?

2. Given f(x), what can we say about the specific xi values (both the observed
\{x_i\}_{i=1}^{n} and the yet unobserved)?


Embedded in these questions are the following affiliated questions that arise as a consequence: (a) how was {x_i}_{i=1}^n obtained in (1); will the procedure for obtaining the data affect how well we can answer question 1? (b) how was f(x) determined in (2)?
Subsequent chapters are devoted to dealing with these fundamental problems systematically and in greater detail.

2.3 Introducing Probability

2.3.1 Basic Concepts

Consider the prototypical random phenomenon for which the individual observation (or outcome) is not known with certainty à-priori, but the complete totality of all possible observations (or outcomes) has been (or can be) compiled. Now consider a framework that assigns to each individual member of this collection of possible outcomes, a real-valued number between 0 and 1 that represents the probability of its occurrence, such that:
1. an outcome that is certain to occur is assigned the number 1;
2. an outcome that is certain not to occur is assigned the number 0;
3. any other outcome falling between these two extremes is assigned a number that reflects the extent or degree of certainty (or uncertainty) associated with its occurrence.
Notice how this represents a shift in focus from the individual outcome itself to the probability of its occurrence. Using precise definitions and terminology, along with tools of set theory, set functions and real analysis, we show in the chapters in Part II how to develop the machinery for the theory of probability, and the emergence of a compact functional form indicating how the probabilities of occurrence are distributed over all possible outcomes. The resulting probability distribution function becomes the primary vehicle for analyzing the behavior of random phenomena.
For example, the phenomenon of inclusions in manufactured glass sheets discussed in Chapter 1 is well-characterized by the following probability distribution function (pdf) which indicates the probability of observing exactly x inclusions on a glass sheet as

f(x) = e^(−λ) λ^x / x! ;  x = 0, 1, 2, . . .      (2.20)

a pdf with a single parameter, λ, characteristic of the manufacturing process used to produce the glass sheets. (As shown later, λ is the mean number of inclusions on a glass sheet.) Even though we do not know precisely how many inclusions will be found on the next glass sheet inspected in the QC lab, given the parameter λ, we can use Eq (9.2) to make statements about the probabilities of individual occurrences. For instance, if λ = 2 for a certain process, Eq (9.2) allows us to state that the probability of finding a perfect glass sheet with no inclusions in the products made by this process (i.e. x = 0) is 0.135; or that the probability of finding 1 inclusion is 0.271, coincidentally the same as the probability of finding 2 inclusions; or that there is a vanishingly small probability of finding 9 or more inclusions in this production facility. The complete set of probabilities computed from Eq (9.2) is shown in Table 2.1.

TABLE 2.1: Computed probabilities of occurrence of various numbers of inclusions for λ = 2 in Eq (9.2)

x = No. of inclusions    f(x), prob. of occurrence
0                        0.135
1                        0.271
2                        0.271
3                        0.180
4                        0.090
5                        0.036
. . .                    . . .
8                        0.001
9                        0.000
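Because Eq (2.20) involves nothing more than an exponential and a factorial, the entries of Table 2.1 are easy to reproduce numerically; the short Python sketch below does so for λ = 2 (the helper name poisson_pmf is purely illustrative, not a reference to any particular software package):

    from math import exp, factorial

    def poisson_pmf(x, lam):
        # probability of observing exactly x inclusions, Eq (2.20)
        return exp(-lam) * lam**x / factorial(x)

    lam = 2.0
    for x in range(10):
        print(x, round(poisson_pmf(x, lam), 3))
    # prints 0.135, 0.271, 0.271, 0.180, 0.090, 0.036, ..., 0.001, 0.000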

2.3.2 Interpreting Probability

There always seems to be a certain amount of debate over the meaning, definition and interpretation of probability. This is perhaps due to a natural predisposition towards confusing a conceptual entity with how a numerical value is determined for it. For example, from a certain perspective, temperature, as a conceptual entity in Thermodynamics, is a real number assigned to an object to indicate its degree of hotness; it is distinct from how its value is determined (by a thermometer, thermocouple, or any other means). The same is true of mass, a quantity assigned in Mechanics to a body to indicate how much matter it contains and how heavy it will be in a gravitational field; or distance, assigned in geometry to indicate the closeness of two points in a geometric space. The practical problem of how to determine numerical values for these quantities, even though important in its own right, is a separate issue entirely.
This is how probability should be understood: it is simply a quantity that is assigned to indicate the degree of uncertainty associated with the occurrence of a particular outcome. As with temperature the conceptual quantity, how a numerical value is determined for the probability of the occurrence of a particular outcome under any specific circumstance depends on the circumstance itself. To carry the analogy with temperature a bit further: while a thermometer capable of determining temperature to within half a degree will suffice in one case, a more precise device, such as a thermocouple, may be required in another case, and an optical pyrometer for yet another case. Whatever the case, under no circumstance should the device employed to determine its numerical value usurp the role of, or become the surrogate for, temperature the quantity. This is important in properly interpreting probability, the conceptual entity: how an appropriate value is to be determined for probability, an important practical problem in its own right, should not be confused with the quantity itself.
With these ideas in mind, let us now consider several standard perspectives of probability that have evolved over the years. These are best understood as various techniques for how numerical values are determined rather than what probability is.

Classical (À-Priori) Probability
Consider a random phenomenon for which the total number of possible outcomes is known to be N, all of which are equally likely; of these, let N_A be the number of outcomes in which A is observed (i.e. outcomes that are favorable to A). Then according to the classical (or à-priori) perspective, the probability of the occurrence of outcome A is defined as

P(A) = N_A / N      (2.21)

For example, in tossing a single perfect die once, the probability of observing a 3 is, according to this viewpoint, evaluated as 1/6, since the total number of possible outcomes is 6 of which only 1 is favorable to the desired observation of 3. Similarly, if B is the outcome that one observes an odd number of dots, then P(B) = 3/6 = 0.5.
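The die example can be mimicked directly by enumeration. The following Python sketch simply counts favorable outcomes and applies Eq (2.21); the variable names are illustrative only:

    from fractions import Fraction

    outcomes = {1, 2, 3, 4, 5, 6}                 # N = 6 equally likely outcomes
    A = {3}                                       # observing a 3
    B = {x for x in outcomes if x % 2 == 1}       # observing an odd number of dots

    def P(event):
        # classical probability, Eq (2.21): N_A / N
        return Fraction(len(event), len(outcomes))

    print(P(A), P(B))                             # 1/6 and 1/2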
Observe that according to this view, no experiments have been performed yet; the formulation is based entirely on an à-priori enumeration of N and N_A. However, this intuitively appealing perspective is not always applicable:
• What if all the outcomes are not equally likely?
• How about random phenomena whose outcomes cannot be characterized as cleanly in this fashion, say, for example, the prospect of a newly purchased refrigerator lasting for 25 years without repair? or the prospect of snow falling on a specific April day in Wisconsin?
What Eq. (2.21) provides is an intuitively appealing (and theoretically sound) means of determining an appropriate value for P(A); but it is restricted only to those circumstances where the random phenomenon in question is characterized in such a way that N and N_A are natural and easy to identify.
Relative Frequency (À-Posteriori) Probability
On the opposite end of the spectrum from the à-priori perspective is the following alternative: consider an experiment that is repeated n times under identical conditions, where the outcomes involving A have been observed to occur n_A times. Then, à-posteriori, the probability of the occurrence of outcome A is defined as

P(A) = lim_{n→∞} (n_A / n)      (2.22)

The appeal of this viewpoint is not so much that it is just as intuitive as the previous one, but that it is also empirical, making no assumptions about equal likelihood of outcomes. It is based on the actual performance of experiments and the actual à-posteriori observation of the relative frequency of occurrences of the desired outcome. This perspective provides a prevalent interpretation of probability as the theoretical value of long range relative frequencies. In fact, this is what motivates the notion of the theoretical distribution as the limiting form to which the empirical frequency distribution tends with the acquisition of increasing amounts of data.
However, this perspective also suffers from some limitations:
• How many trials, n, is sufficient for Eq (2.22) to be useful in practice?
• How about random phenomena for which the desired outcome does not lend itself to repetitive experimentation under identical conditions, say, for example, the prospect of snow falling on a specific April day in Wisconsin? or the prospect of your favorite team winning the basketball championship next year?
Once again, these limitations arise primarily because Eq (2.22) is simply just another means of determining an appropriate value for P(A) that happens to be valid only when the random phenomenon is such that the indicated repeated experimentation is not only possible and convenient, but for which, in practice, truncating after a sufficiently large number of trials to produce a finite approximation presents no conceptual dilemma. For example, after tossing a coin 500 times and obtaining 251 heads, declaring that the probability of obtaining a head upon a single toss is 0.5 presents no conceptual dilemma whatsoever.
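The long-run behavior implied by Eq (2.22) is easy to mimic numerically. The sketch below, a simulation rather than a real experiment, tosses a simulated fair coin and tracks the relative frequency n_A/n as n grows; the seed value is arbitrary:

    import random

    random.seed(1)                      # arbitrary seed, for repeatability only
    heads = 0
    for n in range(1, 501):
        heads += random.random() < 0.5  # one toss of a simulated fair coin
        if n in (10, 50, 100, 500):
            print(n, heads / n)         # relative frequency n_A/n drifts toward 0.5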
Subjective Probability
There is yet another alternative perspective whereby P(A) is taken simply as a measure of the degree of (personal) belief associated with the postulate that A will occur, the value having been assigned subjectively by the individual concerned, akin to betting odds. Thus, for example, in rolling a perfect die, the probability of obtaining a 3 is assigned strictly on the basis of what the individual believes to be the likely odds of obtaining this outcome, without recourse to enumerating equally likely outcomes (the à-priori perspective), or performing the die roll an infinite number of times (the à-posteriori perspective).
The obvious difficulty with this perspective is its subjectivity, so that outcomes that are equally likely (on an objective basis) may end up being assigned different probabilities by different individuals. Nevertheless, for those practical applications where the outcomes cannot be enumerated, and for which the experiment cannot be repeated a large number of times, the subjective allocation of probability may be the only viable option, at least à-priori. As we show later, it is possible to combine this initial subjective declaration with subsequent limited experimentation in order to introduce objective information contained in data in determining appropriate values of the sought probabilities objectively.

2.4 The Probabilistic Framework

Beginning with the next chapter, Part II is devoted to an axiomatic treatment of probability, including basic elements of probability theory, random variables, and probability distribution functions, within the context of a comprehensive framework for systematically analyzing random phenomena.
The central conceptual elements of this framework are: (i) a formal representation of uncertain outcomes with the random variable, X; and (ii) the mathematical characterization of this random variable by the probability distribution function (pdf), f(x). How the probabilities are distributed over the entire aggregate collection of all possible outcomes, expressed in terms of the random variable, X, is contained in this pdf. The following is a procedure for problem-solving within this framework:
1. Problem Formulation: Define and formulate the problem appropriately. Examine the random phenomenon in question, determine the random variable(s), and assemble all available information about the underlying mechanisms;
2. Model Development: Identify, postulate, or develop an appropriate ideal model of the relevant random variability in the form of the probability distribution function f(x);
3. Problem Solution: Use the model to solve the relevant problem (analysis, prediction, inference, estimation, etc.);
4. Results Validation: Analyze and validate the result and, if necessary, return to any of the preceding steps as appropriate.

This problem-solving approach is illustrated throughout the rest of the book, particularly in the chapters devoted to actual case studies.
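As a foretaste of how the four steps play out in practice, the following minimal Python sketch retraces them for the glass-sheet inclusions example of Section 2.3; the definition of a "saleable" sheet as one with at most 2 inclusions is an assumption made purely for illustration:

    from math import exp, factorial

    # Step 1 (Problem formulation): X = number of inclusions on a glass sheet.
    # Step 2 (Model development): postulate the pdf of Eq (2.20) with lambda = 2.
    def f(x, lam=2.0):
        return exp(-lam) * lam**x / factorial(x)

    # Step 3 (Problem solution): probability of a "saleable" sheet,
    # assumed here to mean at most 2 inclusions.
    p_saleable = sum(f(x) for x in range(3))
    print(round(p_saleable, 3))    # about 0.677

    # Step 4 (Results validation): compare this value against the observed
    # fraction of such sheets in QC data before acting on it.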

2.5 Summary and Conclusions

Understanding why, despite appearances, randomly varying phenomena can be subject to analysis of any sort at all is what has occupied our attention in this chapter. Before beginning a formal discussion of random phenomena analysis itself, it was necessary to devote some time to a closer examination of several important foundational issues that are essential to a solid understanding of randomly varying phenomena and their analysis: determinism and randomness; variability and uncertainty; probability and the probabilistic framework for solving problems involving random variability. Using idealized chemical reactors as illustration, we have presented determinism and randomness as two extreme idealizations of natural phenomena. The residence time of a dye molecule in the hollow tube of a plug flow reactor (PFR) was used to demonstrate the ideal deterministic variable whose value is fixed and determinable precisely. At the other end of the spectrum is the length of time the dye molecule spends in a vigorously stirred vessel, the ideal continuous stirred tank reactor (CSTR). This time the variable is random and hence impossible to determine precisely à priori, but it is not haphazard. The mathematical model derived for the distribution of residence times in the CSTR, especially how it was obtained from first principles, provides a preview and a chemical engineering analog of what is to come in Chapters 8 and 9, where models are derived for a wide variety of randomly varying phenomena in similar fashion on the basis of underlying phenomenological mechanisms.
We also examined the characteristics of random mass phenomena, especially highlighting the co-existence of aggregate predictability in the midst of individual irregularities. This order-in-the-midst-of-variability makes possible the use of probability and probability distributions to characterize ensemble behavior mathematically. The subsequent introduction of the concept of probability, while qualitative and informal, is nonetheless important. Among other things, it provided a non-technical setting for dealing with the potentially confusing issue of how to interpret probability. In this regard, it bears reiterating that much confusion can be avoided by remembering to keep the concept of probability (as a quantity between 0 and 1 used to quantify degree of uncertainty) separate from the means by which numerical values are determined for it. It is in this latter sense that the various interpretations of probability (classical, relative frequency, and subjective) are to be understood: these are all various means of determining a specific value for the probability of a specific outcome; and, depending on the situation at hand, one approach is often more appropriate than others.


Here are some of the main points of the chapter again:
• Randomness does not imply disorder; it is order of a different kind, whereby aggregate predictability co-exists with individual irregularity;
• Determinism and randomness are two extreme idealizations of naturally occurring phenomena, and both are equally subject to rigorous analysis;
• The mathematical framework to be employed in the rest of this book is based on probability, the concept of a random variable, X, and its mathematical characterization by the pdf, f(x).

REVIEW QUESTIONS
1. If not a synonym for disorder, then what is randomness in scientific observation?
2. What is the concept of determinism?
3. Why are the expressions in Eqs (16.2) and (2.2) considered deterministic?
4. What is an example phenomenon that had to be ignored in order to obtain the deterministic expressions in Eq (16.2)? And what is an example phenomenon that had to be ignored in order to obtain the deterministic expressions in Eq (2.2)?
5. What are the main characteristics of randomness as described in Subsection 2.1.1?
6. Compare and contrast determinism and randomness as two opposite idealizations of natural phenomena.
7. Which idealized phenomenon does residence time in a plug flow reactor (PFR) represent?
8. What is the central plug flow idealization in a plug flow reactor, and how will departures from such idealization affect the residence time in the reactor?
9. Which idealized phenomenon does residence time in a continuous stirred-tank reactor (CSTR) represent?
10. On what principle is the mathematical model in Eq (2.8) based?
11. What does the expression in Eq (4.41) represent?
12. What observation by John Graunt is widely considered to be the genesis of population demographics and of modern actuarial science?
13. What are the defining characteristics of random mass phenomena?


14. How does inherent variability give rise to uncertainty?
15. What are the fundamental problems of random phenomena analysis as presented in Subsection 2.2.3?
16. What is the primary mathematical vehicle introduced in Subsection 2.3.1 for analyzing the behavior of random phenomena?
17. What is the classical (à-priori) perspective of probability and when is it not applicable?
18. What is the relative frequency (à-posteriori) perspective of probability and what are its limitations?
19. What is the subjective perspective of probability and under what circumstances is it the only viable option for specifying probability in practice?
20. What are the central conceptual elements of the probabilistic framework?
21. What are the four steps in the procedure for problem-solving within the probabilistic framework?

EXERCISES
Section 2.1
2.1 Solve Eq (2.11) explicitly to confirm the result in Eq (2.12).
2.2 Plot the expression in Eq (2.15) as a function of the scaled time variable, t̄ = t/τ; determine the percentage of dye molecules with age less than or equal to the mean residence time, τ.
2.3 Show that

∫₀^∞ θ (1/τ) e^(−θ/τ) dθ = τ      (2.23)

and hence confirm the result in Eq (2.19).


Section 2.2
2.4 The following probability distribution functions:

f(x) = (1/(3√(2π))) e^(−x²/18);  −∞ < x < ∞      (2.24)

and

f(y) = (1/√(2π)) e^(−y²/2);  −∞ < y < ∞      (2.25)

represent how the occurrences of all the possible outcomes of the two randomly varying, continuous variables, X and Y, are distributed. Plot these two distribution functions on the same graph. Which of these variables has a higher degree of uncertainty associated with the determination of any particular outcome? Why?

2.5 When a fair coin is tossed 4 times, it is postulated that the probability of obtaining x heads is given by the probability distribution function:

f(x) = [4! / (x!(4 − x)!)] (0.5)⁴      (2.26)

Determine the probability of obtaining x = 0, 1, 2, . . . , 4 heads. Intuitively, which of these outcomes would you think will be the most likely? Are the results of your computation consistent with your intuition?
Section 2.3
2.6 In tossing a fair coin once, describe the classical (à-priori), relative frequency (à-posteriori), and the subjective perspectives of the probability of obtaining a head.

APPLICATION PROBLEMS
2.7 For each of the following two-reactor configurations:
(a) two plug flow reactors in series where the length of reactor 1 is l1 m, and that of reactor 2 is l2 m, but both have the same uniform cross-sectional area A m²;
(b) two continuous stirred tank reactors with volumes V1 and V2 m³;
(c) the PFR in Fig 2.1 followed by the CSTR in Fig 2.2;
given that the flow rate through each reactor ensemble is constant at F m³/s, obtain the residence time, τ, or the residence time distribution, f(θ), as appropriate. Make any assumption you deem appropriate about the concentration C1(t) and C2(t) in the first and second reactors, respectively.
2.8 In the summer of 1943 during World War II, a total of 365 warships were attacked
by Kamikaze pilots: 180 took evasive action and 60 of these were hit; the remaining
185 counterattacked, of which 62 were hit. Using a relative frequency interpretation
and invoking any other assumption you deem necessary, determine the probability
that any attacked warship will be hit regardless of tactical response. Also determine
the probability that a warship taking evasive action will be hit and the probability
that a counterattacking warship will be hit. Compare these three probabilities and
discuss what this implies regarding choosing an appropriate tactical response. (A
full discussion of this problem is contained in Chapter 7.)
2.9 Two American National Football League (NFL) teams, A and B, with respective Win-Loss records 9-6 and 12-3 after 15 weeks, are preparing to face each other
in the 16th and nal game of the regular season.
(i) From a relative frequency perspective of probability, use the supplied information
(and any other assumption you deem necessary) to compute the probability of Team
A winning any generic game, and also of Team B winning any generic game.


(ii) When the two teams play each other, upon the presupposition that past record is the best indicator of a team's chances of winning a new game, determine reasonable values for P(A), the probability that team A wins the game, and P(B), the probability that team B wins, assuming that this game does not end up in a tie.
Note that for this particular case,

P(A) + P(B) = 1      (2.27)

Part II: Probability
Characterizing Random Variability

Here we have the opportunity of expounding more clearly what has already been said.
René Descartes (1596-1650)

Chapter 3: Fundamentals of Probability Theory
Chapter 4: Random Variables
Chapter 5: Multidimensional Random Variables
Chapter 6: Random Variable Transformations
Chapter 7: Application Case Studies I: Probability

Chapter 3
Fundamentals of Probability Theory

3.1 Building Blocks
3.2 Operations
    3.2.1 Events, Sets and Set Operations
    3.2.2 Set Functions
    3.2.3 Probability Set Function
    3.2.4 Final considerations
3.3 Probability
    3.3.1 The Calculus of Probability
    3.3.2 Implications
3.4 Conditional Probability
    3.4.1 Illustrating the Concept
    3.4.2 Formalizing the Concept
    3.4.3 Total Probability
    3.4.4 Bayes' Rule
3.5 Independence
3.6 Summary and Conclusions
REVIEW QUESTIONS
EXERCISES
APPLICATION PROBLEMS

Before setting out to attack any definite problem
it behooves us first, without making any selection,
to assemble those truths that are obvious
as they present themselves to us
and afterwards, proceeding step by step,
to inquire whether any others can be deduced from these.
René Descartes (1596-1650)

The paradox of randomly varying phenomena, that the aggregate ensemble behavior of unpredictable, irregular, individual observations is stable and regular, provides a basis for developing a systematic analysis approach. Such an approach requires temporarily abandoning the futile task of predicting individual outcomes and instead focussing on characterizing the aggregate ensemble in a mathematically appropriate manner. The central element is a machinery for determining the mathematical probability of the occurrence of each outcome and for quantifying the uncertainty associated with any attempts at predicting the intrinsically unpredictable individual outcomes. How this probability machinery is assembled from a set of simple building blocks and mathematical operations is presented in this chapter, along with the basic concepts required for its subsequent use for systematic analysis of random phenomena. This chapter is therefore devoted to introducing probability in its basic form first, before we begin employing it in subsequent chapters to solve problems involving random phenomena.

3.1 Building Blocks

A formal mathematical theory for studying random phenomena makes use of certain words, concepts, and terminology in a more restricted technical sense than is typically implied by common usage. We begin by providing the definitions of:
• Experiments; Trials; Outcomes
• Sample space; Events
within the context of the machinery of probability theory.
1. Experiment: Any process that generates observable information about the random phenomenon in question.
This could be the familiar sort of experiment in the sciences and engineering (such as the determination of the pH of a solution, the quantification of the effectiveness of a new drug, or the determination of the effect of an additive on gasoline consumption in an automobile engine); it also includes the simple, almost artificial sort, such as tossing a coin or some dice, drawing a marble from a box, or a card from a well-shuffled deck. We will employ such simple conceptual experiments with some regularity because they are simple and easy to conceive mentally, but more importantly because they serve as useful models for many practical, more complex problems, allowing us to focus on the essentials and avoid getting bogged down with unnecessary and potentially distracting details. For example, in inspecting a manufactured lot for defective parts, so long as the result of interest is whether the selected and tested part is defective or not, the real experiment is well-modeled by the toss of an appropriate coin.
2. Outcome: The result of an experiment.
This could be as simple as an attribute, such as the color of a marble drawn from a box, or whether the part drawn from a manufactured lot is defective or not; it could be a discrete quantity such as the number of heads observed after 10 tosses of a coin, or the number of contaminants observed on a silicon wafer; it could also be a continuous quantity such as the temperature of reactants in a chemical reactor, or the concentration of arsenic in a water sample.
3. Trial: A single performance of a well-defined experiment giving rise to an outcome.
Random phenomena are characterized by the fact that each trial of the same experiment performed under identical conditions can potentially produce different outcomes.
Closely associated with the possible outcomes of an experiment and crucial to the development of probability theory are the concepts of the sample space and events.
4. Sample Space: The set of all possible outcomes of an experiment.
If the elements of this set are individual, distinct, countable entities, then the sample space is said to be discrete; if, on the other hand, the elements are a continuum of values, the sample space is said to be continuous.
5. Event: A set of possible outcomes that share a common attribute.
The following examples illustrate these concepts.
Example 3.1 THE BUILDING BLOCKS OF PROBABILITY
In tossing a coin 3 times and recording the number of observed heads and tails, identify the experiment, what each trial entails, the outcomes, and the sample space.
Solution:
1. Experiment: Toss a coin 3 times; record the number of observed heads (each one as an H) and tails (each one as a T);
2. Trial: Each trial involves 3 consecutive tosses of the coin;
3. Outcomes: Any one of the following is a possible outcome: HHH, HHT, HTH, THH, HTT, THT, TTH, TTT.
4. Sample space: The set defined by

Ω = {HHH, HHT, HTH, THH, HTT, THT, TTH, TTT}      (3.1)

consisting of all possible 8 outcomes, is the sample space for this experiment. This is a discrete sample space because there are 8 individual, distinct and countable elements.

Example 3.2 EVENTS ASSOCIATED WITH EXAMPLE 3.1
Identify some events associated with the experiment introduced in Example 3.1.
Solution:
The set A = {HHT, HTH, THH} consists of those outcomes involving the occurrence of exactly two heads; it therefore represents the event that exactly 2 heads are observed when a coin is tossed 3 times.
The set B = {TTT} consists of the only outcome involving the occurrence of 3 tails; it therefore represents the event that 3 tails are observed.
The set C = {HHH, HHT, HTH, THH} consists of the outcomes involving the occurrence of at least 2 heads; it represents the event that at least 2 heads are observed.
Similarly, the set D = {HHH} represents the event that 3 heads are observed.
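For a sample space this small, everything can be enumerated by machine. The Python sketch below builds Ω for the three-toss experiment and the events A, B, C and D of Example 3.2; the names are illustrative only:

    from itertools import product

    omega = {''.join(seq) for seq in product('HT', repeat=3)}   # sample space, Eq (3.1)

    A = {w for w in omega if w.count('H') == 2}   # exactly 2 heads
    B = {'TTT'}                                   # 3 tails
    C = {w for w in omega if w.count('H') >= 2}   # at least 2 heads
    D = {'HHH'}                                   # 3 heads

    print(sorted(omega))
    print(sorted(A), sorted(B), sorted(C), sorted(D))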

A simple or elementary event is one that consists of one and only one outcome of the experiment; i.e. a set with only one element. Thus, in Example 3.2, set B and set D are examples of elementary events. Any other event consisting of more than one outcome is a complex or compound event. Sets A and C in Example 3.2 are compound events. (One must be careful to distinguish between the set and its elements. The set B in Example 3.2 contains one element, TTT, but the set is not the same as the element. Thus, even though the elementary event consists of a single outcome, one is not the same as the other).
Elementary events possess an important property that is crucial to the development of probability theory:
• An experiment conducted once produces one and only one outcome;
• The elementary event consists of only one outcome;
• One and only one elementary event can occur for every experimental trial;
Therefore:
Simple (elementary) events are mutually exclusive.
In Example 3.2, sets B and D represent elementary events; observe that if one occurs, the other one cannot. Compound events do not have this property. In this same example, observe that if, after a trial, the outcome is HTH (a tail sandwiched between two heads), event A has occurred (we have observed precisely 2 heads), but so has event C, which requires observing 2 or more heads. In the language of sets, the element HTH belongs to both set A and set C.
An elementary event therefore consists of a single outcome and cannot be decomposed into a simpler event; a compound event, on the other hand, consists of a collection of more than one outcome and can therefore be composed from several simple events.

3.2 Operations

If rational analysis of random phenomena depends on working with the aggregate ensemble of all possible outcomes, the next step in the assembly of the analytical machinery is a means of operating on the component building blocks identified above. First the outcomes, already represented as events, must be firmly rooted in the mathematical soil of sets so that established basic set operations can be used to operate on events. The same manipulations of standard algebra and the algebra of sets can then be used to obtain algebraic relationships between the events that comprise the aggregate ensemble of the random phenomenon in question. The final step is the definition of functions and accompanying operational rules that allow us to perform functional analysis on the events.

3.2.1 Events, Sets and Set Operations

We earlier defined the sample space as a set whose elements are all the possible outcomes of an experiment. Events are also sets, but they consist of only certain elements from Ω that share a common attribute. Thus,

Events are subsets of the sample space.

Of all the subsets of Ω, there are two special ones with important connotations: ∅, the empty set consisting of no elements at all, and Ω itself. In the language of events, the former represents the impossible event, while the latter represents the certain event.
Since they are sets, events are amenable to analysis using precisely the same algebra of set operations (union, intersection and complement), which we now briefly review.
1. Union: A ∪ B represents the set of elements that are either in A or B. In general,

A1 ∪ A2 ∪ A3 ∪ . . . ∪ A_k = ∪_{i=1}^{k} A_i      (3.2)

is the set of elements that are in at least one of the k sets, {A_i}_{i=1}^{k}.
2. Intersection: A ∩ B represents the set of elements that are in both A and B. In general,

A1 ∩ A2 ∩ A3 ∩ . . . ∩ A_k = ∩_{i=1}^{k} A_i      (3.3)

is the set of elements that are common to all the k sets, {A_i}_{i=1}^{k}.
To discuss the third set operation requires two special sets: the universal set (or universe), typically designated Ω, and the null (or empty) set, typically designated ∅. The universal set consists of all possible elements of interest, while the null set contains no elements. (We have just recently introduced such sets above but in the specific context of the sample space of an experiment; the current discussion is general and not restricted to the analysis of randomly varying phenomena and their associated sample spaces.)
These sets have the special properties that for any set A,

A ∪ Ω = Ω      (3.4)
A ∩ Ω = A      (3.5)

and

A ∪ ∅ = A      (3.6)
A ∩ ∅ = ∅      (3.7)

3. Complement: A*, the complement of set A, is always defined with respect to the universal set Ω; it consists of all the elements of Ω that are not in A. The following are basic relationships associated with the complement operation:

Ω* = ∅;  ∅* = Ω      (3.8)
(A*)* = A      (3.9)
(A ∪ B)* = A* ∩ B*      (3.10)
(A ∩ B)* = A* ∪ B*      (3.11)

with the last two expressions known as DeMorgan's Laws.


The rules of set algebra (similar to those of standard algebra) are as follows:
Commutative law:
AB = BA
AB = BA
Associative law:
(A B) C = A (B C)
(A B) C = A (B C)
Distributive Law:
(A B) C = (A C) (B C)
(A B) C = (A C) (B C)
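These set operations map directly onto the built-in set type of most programming languages. A brief Python sketch, with an arbitrary small universal set chosen purely for illustration, verifies DeMorgan's laws, Eqs (3.10) and (3.11):

    omega = set(range(1, 11))       # an arbitrary universal set
    A = {1, 2, 3, 4}
    B = {3, 4, 5, 6}

    A_star = omega - A              # complement of A with respect to omega

    # DeMorgan's laws, Eqs (3.10) and (3.11)
    print((omega - (A | B)) == ((omega - A) & (omega - B)))   # True
    print((omega - (A & B)) == ((omega - A) | (omega - B)))   # True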


The following table presents some information about the nature of subsets of Ω interpreted in the language of events.

TABLE 3.1: Subsets and Events

Subset     Event
Ω          Certain event
∅          Impossible event
A*         Non-occurrence of event A
A ∪ B      Event A or B
A ∩ B      Events A and B

Note in particular that if A ∩ B = ∅, A and B are said to be disjoint sets (with no elements in common); in the language of events, this implies that event A occurring together with event B is impossible. Under these circumstances, events A and B are said to be mutually exclusive.
Example 3.3 PRACTICAL ILLUSTRATION OF SETS AND EVENTS
Samples from various batches of a polymer resin manufactured at a plant site are tested in a quality control laboratory before release for sale. The result of the tests allows the manufacturer to classify the product into the following 3 categories:
1. Meets or exceeds quality requirement; Assign #1; approve for sale as 1st quality.
2. Barely misses quality requirement; Assign #2; approve for sale as 2nd grade at a lower price.
3. Fails completely to meet quality requirement; Assign #3; reject as poor grade and send back to be incinerated.
Identify the experiment, outcome, trial, sample space and the events associated with this practical problem.
Solution:
1. Experiment: Take a sample of polymer resin and carry out the prescribed product quality test.
2. Trial: Each trial involves taking a representative sample from each polymer resin batch and testing it as prescribed.
3. Outcomes: The assignment of a number 1, 2, or 3 depending on how the result of the test compares to the product quality requirements.
4. Sample space: The set Ω = {1, 2, 3} containing all possible outcomes.
5. Events: The subsets of the sample space are identified as follows: E0 = ∅; E1 = {1}; E2 = {2}; E3 = {3}; E4 = {1, 2}; E5 = {1, 3}; E6 = {2, 3}; E7 = {1, 2, 3}. Note that there are 8 in all. In general, a set with n distinct elements will have 2^n subsets.

Note that this real experiment is identical in spirit to the conceptual experiment in which 3 identical ping-pong balls inscribed with the numbers 1, 2, and 3 are placed in a box, and each trial involves drawing one out and recording the inscribed number found on the chosen ball. Employing the artificial surrogate may sometimes be a useful device to enable us to focus on the essential components of the problem.
Example 3.4 INTERPRETING EVENTS OF EXAMPLE 3.3
Provide a practical interpretation of the events identified in the quality assurance problem of Example 3.3 above.
Solution:
E1 = {1} is the event that the batch is of 1st grade;
E2 = {2} is the event that the batch is of 2nd grade;
E3 = {3} is the event that the batch is rejected as poor grade.
These are elementary events; they are mutually exclusive.
E4 = {1, 2} is the event that the batch is either 1st grade or 2nd grade;
E5 = {1, 3} is the event that the batch is either 1st grade or rejected;
E6 = {2, 3} is the event that the batch is either 2nd grade or rejected.
These events are not elementary and are not mutually exclusive. For instance, if a sample analysis indicates the batch is 1st grade, then the events E1, E4 and E5 have all occurred.
E7 = {1, 2, 3} = Ω is the event that the batch is either 1st grade or 2nd grade, or rejected;
E0 = ∅ is the event that the batch is neither 1st grade nor 2nd grade, nor rejected.
Event E7 is certain to happen: the outcome of the experiment has to be one of these three classifications; there is no other alternative. Event E0, on the other hand, is impossible, for the same reason.
Example 3.5 COMPOUND EVENTS FROM ELEMENTARY EVENTS
Show how the compound events in Examples 3.3 and 3.4 can be composed from (or decomposed into) elementary events.
Solution:
The compound events E4, E5, E6 and E7 are related to the elementary events E1, E2 and E3 as follows:

E4 = E1 ∪ E2      (3.12)
E5 = E1 ∪ E3      (3.13)
E6 = E2 ∪ E3      (3.14)
E7 = E1 ∪ E2 ∪ E3      (3.15)


TABLE 3.2: Class list and attributes

Name      Sex (M or F)   Age (in years)   Amount in wallet (to the nearest $)   Height (in inches)
Allison   F              21               $17.00                                66
Ben       M              23               $15.00                                72
Chrissy   F              23               $26.00                                65
Daoud     M              25               $35.00                                67
Evan      M              22               $27.00                                73
Fouad     M              20               $15.00                                69
Gopalan   M              21               $29.00                                68
Helmut    M              19               $13.00                                71
Ioannis   M              25               $32.00                                70
Jim       M              24               $53.00                                74
Katie     F              22               $41.00                                70
Larry     M              24               $28.00                                72
Moe       M              21               $18.00                                71
Nathan    M              22               $6.00                                 68
Olu       M              26               $23.00                                72

3.2.2 Set Functions

A function F(·), defined on the subsets of Ω such that it assigns one and only one real number to each subset of Ω, is known as a set function. By this definition, no one subset can be assigned more than one number by a set function. The following examples illustrate the concept.
Example 3.6 SET FUNCTIONS DEFINED ON THE SET OF STUDENTS IN A CLASSROOM
Table 3.2 shows a list of attributes associated with 15 students in attendance on a particular day in a 600 level course offered at the University of Delaware. Let set A be the subset of female students and B, the subset of male students. Obtain the real number assigned by the following set functions:
1. N(A), the total number of female students in class;
2. N(Ω), the total number of students in class;
3. M(B), the sum total amount of money carried by the male students;
4. H̄(A), the average height (in inches) of female students;
5. Y⁺(B), the maximum age, in years, of male students.
Solution:
1. N(A) = 3;
2. N(Ω) = 15;
3. M(B) = $293.00;
4. H̄(A) = 67 ins.;
5. Y⁺(B) = 26 years.

A set function Q is said to be additive if for every pair of disjoint subsets A and B of Ω,

Q(A ∪ B) = Q(A) + Q(B)      (3.16)

For example, the set function N(·) in Example 3.6 is an additive set function. Observe that the sets A and B in this example are disjoint; furthermore Ω = A ∪ B. Now, N(Ω) = N(A ∪ B) = 15 while N(A) = 3 and N(B) = 12. Thus for this example,

N(A ∪ B) = N(A) + N(B)      (3.17)

However, H̄(·) is not an additive set function, and neither is Y⁺(·).
In general, when two sets are not disjoint, i.e. when A ∩ B ≠ ∅, so that the intersection is non-empty, it is easy to show (see exercise at the end of the chapter) that if Q(·) is an additive set function,

Q(A ∪ B) = Q(A) + Q(B) − Q(A ∩ B)      (3.18)

Example 3.7 ADDITIVE SET FUNCTION ON NON-DISJOINT SETS
An old batch of spare parts contains 40 parts, of which 3 are defective; a newly manufactured batch of 60 parts was added to make up a consolidated batch of 100 parts, of which a total of 9 are defective. Find the total number of parts that are either defective or from the old batch.
Solution:
If A is the set of defective parts and B is the set of parts from the old batch, and if N(·) is the number of parts in a set, then we seek N(A ∪ B). The Venn diagram in Fig 3.1 shows the distribution of elements in each set.

FIGURE 3.1: Venn Diagram for Example 3.7

From Eq (3.18),

N(A ∪ B) = N(A) + N(B) − N(A ∩ B) = 9 + 40 − 3 = 46      (3.19)

so that there are 46 parts that are either defective or from the old batch.
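The counting in Example 3.7 can be checked with sets directly. In the sketch below the parts are labeled 0-99 and the particular labels chosen for the defectives are an arbitrary assumption; any labeling with N(A ∩ B) = 3 gives the same count:

    old_batch = set(range(40))                    # the 40 parts from the old batch
    defective = {0, 1, 2} | set(range(40, 46))    # 3 old + 6 new defectives (assumed labels)

    n_union = len(defective | old_batch)
    n_formula = len(defective) + len(old_batch) - len(defective & old_batch)
    print(n_union, n_formula)                     # 46 46, as in Eq (3.19)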

3.2.3 Probability Set Function

Let P(·) be an additive set function defined on all subsets of Ω, the sample space of all the possible outcomes of an experiment, such that:
1. P(A) ≥ 0 for every A ⊆ Ω;
2. P(Ω) = 1;
3. P(A ∪ B) = P(A) + P(B) for all mutually exclusive events A and B;
then P(·) is a probability set function.
Remarkably, these three simple rules (axioms), due to Kolmogorov, are sufficient to develop the mathematical theory of probability. The following are important properties of P(·) arising from these axioms.
1. To each event A, it assigns a non-negative number, P(A), its probability;
2. To the certain event Ω, it assigns unit probability;
3. The probability that either one or the other of two mutually exclusive events A, B will occur is the sum of the probabilities that each event will occur.
The following corollaries are important consequences of the foregoing three axioms:
Corollary 1. P(A*) = 1 − P(A).
The probability of non-occurrence of A is 1 minus the probability of its occurrence. Equivalently, the probabilities of the occurrence of an event and of its non-occurrence add up to 1. This follows from the fact that

Ω = A ∪ A*      (3.20)

that A and A* are disjoint sets; that P(·) is an additive set function, and that P(Ω) = 1.
Corollary 2. P(∅) = 0.
The probability of an impossible event occurring is zero. This follows from the fact that ∅ = Ω* and from Corollary 1 above.
Corollary 3. A ⊆ B ⇒ P(A) ≤ P(B).
If A is a subset of B then the probability of occurrence of A is less than, or equal to, the probability of the occurrence of B. This follows from the fact that under these conditions, B can be represented as the union of 2 disjoint sets:

B = A ∪ (B ∩ A*)      (3.21)

and from the additivity of P(·),

P(B) = P(A) + P(B ∩ A*)      (3.22)

so that from the non-negativity of P(·), we obtain

P(B) ≥ P(A)      (3.23)

Corollary 4. 0 ≤ P(A) ≤ 1 for all A ⊆ Ω.
The probability of any realistic event occurring is bounded between zero and 1. This follows directly from the first 2 axioms and from Corollary 3 above.
Corollary 5. P(A ∪ B) = P(A) + P(B) − P(A ∩ B) for any pair of subsets A and B.
This follows directly from the additivity of P(·) and results presented earlier in Eq (3.18).

3.2.4 Final considerations

Thus far, in assembling the machinery for dealing with random phenomena by characterizing the aggregate ensemble of all possible outcomes, we have encountered the sample space Ω, whose elements are all the possible outcomes of an experiment; we have presented events as collections of these outcomes (and hence subsets of Ω); and finally P(·), the probability set function defined on subsets of Ω, allows the axiomatic definition of the probability of an event. What we need next is a method for actually obtaining any particular probability P(A) once the event A has been defined. Before we can do this, however, for completeness, a set of final considerations is in order.
Even though, as presented, events are subsets of Ω, not all subsets of Ω are events. There are all sorts of subtle mathematical reasons for this, including the (somewhat unsettling) case in which Ω consists of infinitely many elements, as is the case when the outcome is a continuous entity and can therefore take on values on the real line. In this case, clearly, Ω is the set of all real numbers. A careful treatment of these issues requires the introduction of Borel fields (see, for example, Kingman and Taylor, 1966, Chapter 11¹). This is necessary because, as the reader may have anticipated, the calculus of probability requires making use of set operations, unions and intersections, as well as sequences and limits of events. As a result, it is important that sets resulting from such operations are themselves events. This is strictly true of Borel fields.
Nevertheless, for all practical purposes, and most practical applications, it is often not necessary to distinguish between the subsets of Ω and genuine events. For the reader willing to accept on faith the end result (the probability distribution function presented fully in Chapters 4 and 5), a lack of detailed knowledge of such subtle, but important, fine points will not constitute a hindrance to the appropriate use of the tool.

¹ Kingman, J.F.C. and Taylor, S.J., Introduction to the Theory of Measure and Probability, Cambridge University Press, 1966.

3.3 Probability

We are now in a position to discuss how to use the machinery we have assembled above to determine the probability of any particular event A.

3.3.1 The Calculus of Probability

Once the sample space for any random experiment has been specified and the events (subsets of the sample space) identified, the following is the procedure for determining the probability of any event A, based on the important property that elementary events are mutually exclusive:
• Assign probabilities to all the elementary events in Ω;
• Determine the probability of any compound event from the probability of the elementary events making up the compound event of interest.
The procedure is particularly straightforward to illustrate for discrete sample spaces with a countable number of elements. For example, if Ω = {d1, d2, . . . , d_N} consists of N outcomes, then there are N elementary events, E_i = {d_i}. To each of these elementary events, we assign the probability p_i (we will discuss shortly how such assignments are made) subject to the constraint that ∑_{i=1}^{N} p_i = 1. From here, if

A = {d1, d2, d4}      (3.24)

and if P(A) represents the probability of event A occurring, then,

P(A) = p1 + p2 + p4      (3.25)

and for

B = {d3, d5, . . . , d_N}      (3.26)

then

P(B) = 1 − p1 − p2 − p4      (3.27)
The following examples illustrate how probabilities p_i may be assigned to elementary events.

Example 3.8 ASSIGNMENTS FOR EQUIPROBABLE OUTCOMES
The experiment of tossing a coin 3 times and recording the observed number of heads and tails was considered in Examples 3.1 and 3.2. There the sample space was obtained in Eq (4.5) as:

Ω = {HHH, HHT, HTH, THH, HTT, THT, TTH, TTT},      (3.28)

a set with 8 elements that comprise all the possible outcomes of the experiment. Several events associated with this experiment were identified in Example 3.2.
If there is no reason for any one of the 8 possible outcomes to be any more likely to occur than any other one, the outcomes are said to be equiprobable and we assign a probability of 1/8 to each one. This gives rise to the following equiprobable assignment of probability to the 8 elementary events:

P(E1) = P{HHH} = 1/8
P(E2) = P{HHT} = 1/8
P(E3) = P{HTH} = 1/8
. . .
P(E7) = P{TTH} = 1/8
P(E8) = P{TTT} = 1/8      (3.29)

Note that

∑_{i=1}^{8} p_i = ∑_{i=1}^{8} P(E_i) = 1      (3.30)

And now because the event

A = {HHT, HTH, THH}      (3.31)

identified in Example 3.2 (the event that exactly 2 heads are observed) consists of three elementary events E2, E3 and E4, so that

A = E2 ∪ E3 ∪ E4,      (3.32)

because these sets are disjoint, we have that

P(A) = P(E2) + P(E3) + P(E4) = 3/8      (3.33)

Similarly, for the events B, C and D identified in Example 3.2, we have, respectively, P(C) = 4/8 = 0.5 and P(B) = P(D) = 1/8.
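The same calculus is easy to mechanize: assign 1/8 to each elementary event and sum over the elementary events making up any compound event of interest. A small Python sketch of this idea (with illustrative names) follows:

    from fractions import Fraction
    from itertools import product

    omega = [''.join(seq) for seq in product('HT', repeat=3)]
    p = {w: Fraction(1, 8) for w in omega}        # equiprobable assignment, Eq (3.29)

    def prob(event):
        # probability of a compound event as a sum over its elementary events
        return sum(p[w] for w in event)

    A = [w for w in omega if w.count('H') == 2]
    C = [w for w in omega if w.count('H') >= 2]
    print(prob(A), prob(C), prob(['TTT']), prob(['HHH']))   # 3/8 1/2 1/8 1/8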

Other means of probability assignment are possible, as illustrated by the following example.
Example 3.9 ALTERNATIVE ASSIGNMENTS FROM À-PRIORI KNOWLEDGE
Consider the manufacturing example discussed in Examples 3.3 and 3.4. Suppose that historically 75% of manufactured batches have been of 1st grade, 15% of grade 2 and the rest rejected. Assuming that nothing has changed in the manufacturing process, use this information to assign probabilities to the elementary events identified in Example 3.3, and determine the probabilities for all the possible events associated with this problem.
Solution:
Recall from Examples 3.3 and 3.4 that the sample space in this case is Ω = {1, 2, 3} containing all 3 possible outcomes; the 3 elementary events are E1 = {1}; E2 = {2}; E3 = {3}; the other events (the remaining subsets of the sample space) had been previously identified as: E0 = ∅; E4 = {1, 2}; E5 = {1, 3}; E6 = {2, 3}; E7 = {1, 2, 3}.
From the provided information, observe that it is entirely reasonable to assign probabilities to the elementary events as follows:

P(E1) = 0.75      (3.34)
P(E2) = 0.15      (3.35)
P(E3) = 0.10      (3.36)

Note that these probabilities sum to 1 as required. From here we may now compute the probabilities for the other events:

P(E4) = P(E1) + P(E2) = 0.9      (3.37)
P(E5) = P(E1) + P(E3) = 0.85      (3.38)
P(E6) = P(E2) + P(E3) = 0.25      (3.39)

For completeness, we note that P(E0) = 0; and P(E7) = 1.

3.3.2 Implications

It is worth spending a few moments to reflect on the results obtained from this last example.
The premise is that the manufacturing process is subject to many sources of variability so that despite having an objective of maintaining consistent product quality, its product may still fall into any one of the three quality grade levels in an unpredictable manner. Nevertheless, even though the particular grade (outcome) of any particular tested sample (experiment) is uncertain and unpredictable, this example shows us how we can determine the probability of the occurrence of the entire collection of all possible events. First, the more obvious elementary events: for example, the probability that a sample will be grade 1 is 0.75. Even the less obvious complex events have also been characterized. For example, if we are interested in the probability of making any money at all on what is currently being manufactured, this is the event E4 (producing saleable grade 1 or 2 material); the answer is 0.9. The probability of not making grade 1 material is 0.25 (the non-occurrence of event E1, or equivalently the event E6).
With this example, what we have actually done is to construct a model of
how the probability of the occurrence of events is distributed over the entire
collection of all possible events. In subsequent chapters, we make extensive use
of the mechanism illustrated here in developing probability models for complex
random phenomena, proceeding from the probability of elementary events
and employing the calculus of probability to obtain the required probability
distribution expressions.

3.4 Conditional Probability

3.4.1 Illustrating the Concept

Consider a chemical engineering thermodynamics class consisting of 50 total students, of which 38 are undergraduates and the rest are graduate students. Of the 12 graduate students, 8 are chemistry students; of the 38 undergraduates, 10 are chemistry students. We may define the following sets:
• Ω, the (universal) set of all students (50 elements);
• G, the set of graduate students (12 elements);
• C, the set of chemistry students (18 elements).
Note that the set G ∩ C, the set of graduate chemistry students, contains 8 elements. (See Fig 3.2.)

FIGURE 3.2: Venn diagram of students in a thermodynamics class

We are interested in the following problem: select a student at random; given that the choice results in a chemistry student, what is the probability that she/he is a graduate student? This is a problem of finding the probability of the occurrence of an event conditioned upon the prior occurrence of another one.

FIGURE 3.3: The role of conditioning Set B in conditional probability


In this particular case, the total number of students in the chemistry group is 18, of which 8 are graduates. The required probability is thus precisely that of choosing one of the 8 graduate students out of all the possible 18 chemistry students; and, assuming equiprobable outcomes, this probability is 8/18. (Note also from the definition of the sets above, that P(C) = 18/50 and P(G ∩ C) = 8/50.)
We may now formalize the just illustrated concept as follows.

3.4.2 Formalizing the Concept

For two sets A and B, the conditional probability of A given B, denoted P(A|B), is defined as

P(A|B) = P(A ∩ B) / P(B)      (3.40)

where P(B) > 0.
Observe how the set B now plays the role that Ω played in unconditional probability (see Fig 3.3); in other words, the process of conditioning restricts the set of relevant outcomes to B. In this sense, P(A) is really P(A|Ω), which, according to Eq. (5.33), may be written as

P(A|Ω) = P(A ∩ Ω) / P(Ω) = P(A) / 1      (3.41)

Returning now to the previous illustration, we see that the required quantity is P(G|C), and by definition,

P(G|C) = P(G ∩ C) / P(C) = (8/50) / (18/50) = 8/18      (3.42)

as obtained previously. The unconditional probability P(G) is 12/50.
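Numerically, Eq (3.40) amounts to a single division of two relative counts, as the following illustrative Python fragment for the thermodynamics-class example shows:

    from fractions import Fraction

    n_total, n_C, n_G_and_C = 50, 18, 8      # class data of Section 3.4.1

    P_C = Fraction(n_C, n_total)
    P_G_and_C = Fraction(n_G_and_C, n_total)
    P_G_given_C = P_G_and_C / P_C            # Eq (3.40)
    print(P_G_given_C)                       # 4/9, i.e. 8/18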


The conditional probability P(A|B) possesses all the required properties of a probability set function defined on subsets of B:
1. 0 ≤ P(A|B) ≤ 1;
2. P(B|B) = 1;
3. P(A1 ∪ A2|B) = P(A1|B) + P(A2|B) for disjoint A1 and A2.
The following identities are easily derived from the denition given above for
P (A|B):
P (A B) =

P (B)P (A|B); P (B) > 0

(3.43)

P (A)P (B|A); P (A) > 0

(3.44)

Conditional probability is a particularly important concept in science and


engineering applications because we often have available to us some `
a-priori
knowledge about a phenomenon; the required probabilities then become conditioned upon the available information.

3.4.3 Total Probability

It is possible to obtain total probabilities when only conditional probabilities are available. We now present some very important results relating conditional probabilities to total probability.
Consider events A and B, not necessarily disjoint. From the Venn diagram in Fig 3.4, we may write A as the union of 2 disjoint sets as follows:

A = (A ∩ B) ∪ (A ∩ B*)      (3.45)

FIGURE 3.4: Representing set A as a union of 2 disjoint sets

In words, this expression states that the points in A are made up of two groups: the points in A that are also in B, and the points in A that are not in B. And because the two sets are disjoint, so that the events they represent are mutually exclusive, we have:

P(A) = P(A ∩ B) + P(A ∩ B*)      (3.46)

and from the definition of conditional probability, we obtain:

P(A) = P(A|B)P(B) + P(A|B*)P(B*)      (3.47)

or, alternatively,

P(A) = P(A|B)P(B) + P(A|B*)[1 − P(B)]      (3.48)

This powerful result states that the (unconditional, or total) probability of
an event A is a weighted average of two partial (or conditional) probabilities:
the probability conditioned on the occurrence of B and the probability
conditioned upon the non-occurrence of B; the weights, naturally, are the
respective probabilities of the conditioning events.
This may be generalized as follows: First we partition Ω into a union of k
disjoint sets:

Ω = B1 ∪ B2 ∪ B3 ∪ . . . ∪ Bk = ⋃_{i=1}^{k} Bi          (3.49)

For any A that is an arbitrary subset of Ω, observe that

A = (A ∩ B1) ∪ (A ∩ B2) ∪ (A ∩ B3) ∪ . . . ∪ (A ∩ Bk)          (3.50)

which is a partitioning of the set A as a union of k disjoint sets (See Fig 3.5).
As a result,

P(A) = P(A ∩ B1) + P(A ∩ B2) + . . . + P(A ∩ Bk)          (3.51)

but since

P(A ∩ Bi) = P(A|Bi)P(Bi)          (3.52)

we immediately obtain

P(A) = P(A|B1)P(B1) + P(A|B2)P(B2) + . . . + P(A|Bk)P(Bk)          (3.53)

Thus:

P(A) = Σ_{i=1}^{k} P(A|Bi)P(Bi)          (3.54)

an expression that is sometimes referred to as the Theorem of Total Probability; it is used to compute the total probability P(A) from P(A|Bi) and P(Bi).
The following example provides an illustration.
Example 3.10 TOTAL PROBABILITY
A company manufactures light bulbs of 3 different types (T1, T2, T3),
some of which are defective right from the factory. From experience
with the manufacturing process, it is known that the fraction of defective
Type 1 bulbs is 0.1; Types 2 and 3 have respective defective fractions
of 1/15 and 0.2.
A batch of 200 bulbs was sent to a quality control laboratory for
testing: 100 Type 1, 75 Type 2, and 25 Type 3. What is the probability
of finding a defective bulb?
Solution:
The supplied information may be summarized as follows: Prior conditional probabilities of defectiveness,
P(D|T1) = 0.1; P(D|T2) = 1/15; P(D|T3) = 0.2          (3.55)

and the distribution of numbers of bulb types in the test batch:


N(T1) = 100; N(T2) = 75; N(T3) = 25          (3.56)

Assuming equiprobable outcomes, this number distribution immediately implies the following:
P(T1) = 100/200 = 0.5; P(T2) = 0.375; P(T3) = 0.125          (3.57)

From the expression for total probability in Eq. (3.53), we have:

P(D) = P(D|T1)P(T1) + P(D|T2)P(T2) + P(D|T3)P(T3) = 0.1          (3.58)
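The arithmetic in Eq. (3.58) is easily checked with a few lines of code. The sketch below (Python, our choice; it is not part of the original example) computes the total probability of a defective bulb directly from Eq. (3.54) using the defect fractions and batch composition given above.

# A minimal sketch of the total probability computation in Example 3.10.
p_defective_given_type = [0.1, 1/15, 0.2]   # P(D|T1), P(D|T2), P(D|T3)
n_bulbs_of_type = [100, 75, 25]             # batch composition
n_total = sum(n_bulbs_of_type)
p_type = [n / n_total for n in n_bulbs_of_type]          # P(T1), P(T2), P(T3)

# Theorem of Total Probability, Eq. (3.54):
p_defective = sum(pd * pt for pd, pt in zip(p_defective_given_type, p_type))
print(p_defective)   # 0.1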

3.4.4  Bayes' Rule

A question of practical importance in many applications is:

Given P (A|Bi ) and P (Bi ), how can we obtain P (Bi |A)?

In other words, how can we reverse probabilities?


The total probability expression we have just derived provides a way to
answer this question. Note from the definition of conditional probability that:

P(Bi|A) = P(Bi ∩ A)/P(A)          (3.59)


but

P(Bi ∩ A) = P(A ∩ Bi) = P(A|Bi)P(Bi)          (3.60)

which, when substituted into (3.59), gives rise to a very important result:

P(Bi|A) = P(A|Bi)P(Bi) / Σ_{i=1}^{k} P(A|Bi)P(Bi)          (3.61)

This famous result, due to the Revd. Thomas Bayes (1763), is known as
Bayes' Rule and we will encounter it again in subsequent chapters. For now,
it is an expression that can be used to compute the (unknown) a-posteriori
probability P(Bi|A) of events Bi from the a-priori probabilities P(Bi) and
the (known) conditional probabilities P(A|Bi). It indicates that the unknown
a-posteriori probability is proportional to the product of the a-priori
probability and the known conditional probability we wish to reverse; the
constant of proportionality is the reciprocal of the total probability of event A.
This result is the basis of an alternative approach to data analysis (discussed
in Section 14.6 of Chapter 14) wherein available prior information is
incorporated in a systematic fashion into the analysis of experimental data.
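To make the probability reversal concrete, the sketch below (again Python, and again an illustration of ours rather than part of the text) extends Example 3.10: given that a bulb drawn from the batch is found defective, it computes the posterior probabilities that the bulb is of Type 1, 2 or 3 using Eq. (3.61).

# A minimal sketch of Bayes' Rule, Eq. (3.61), applied to Example 3.10:
# reverse P(D|Ti) to obtain P(Ti|D).
p_defect_given_type = [0.1, 1/15, 0.2]      # P(D|T1), P(D|T2), P(D|T3)
p_type = [0.5, 0.375, 0.125]                # P(T1), P(T2), P(T3)

# Denominator: total probability of a defective bulb, Eq. (3.54)
p_defect = sum(pd * pt for pd, pt in zip(p_defect_given_type, p_type))

# Posterior probabilities P(Ti|D)
p_type_given_defect = [pd * pt / p_defect
                       for pd, pt in zip(p_defect_given_type, p_type)]
print(p_type_given_defect)   # [0.5, 0.25, 0.25]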

3.5  Independence

For two events A and B, the conditional probability P(A|B) was defined
earlier in Eq. (3.40). In general, this conditional probability will be different
from the unconditional probability P(A), indicating that the knowledge that
B has occurred affects the probability of the occurrence of A.
However, when the occurrence of B has no effect on the occurrence of A,
then the events A and B are said to be independent and

P(A|B) = P(A)          (3.62)

so that the conditional and unconditional probabilities are identical. This will
occur when

P(A ∩ B)/P(B) = P(A)          (3.63)

so that

P(A ∩ B) = P(A)P(B)          (3.64)

Thus, when events A and B are independent, the probability of the two events
happening concurrently is the product of the probabilities of each one occurring by itself. Note that the expression in Eq.(3.64) is symmetric in A and B
so that if A is independent of B, then B is also independent of A.
This is another in the collection of very important results used in the
development of probability models. We already encountered the first one: that
when two events A and B are mutually exclusive, P(A or B) = P(A ∪ B) =
P(A) + P(B). Under these circumstances, P(A ∩ B) = 0, since the event A
occurring together with event B is impossible when A and B are mutually
exclusive. Eq. (3.64) is the complementary result: that when two events are
independent, P(A and B) = P(A ∩ B) = P(A)P(B).
Extended to three events, the result states that the events A, B, C are
independent if all of the following conditions hold:
P(A ∩ B) = P(A)P(B)          (3.65)
P(B ∩ C) = P(B)P(C)          (3.66)
P(A ∩ C) = P(A)P(C)          (3.67)
P(A ∩ B ∩ C) = P(A)P(B)P(C)          (3.68)

implying more than just pairwise independence.
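As a simple illustration of the defining condition in Eq. (3.64), the following sketch (Python; our own construction, not from the text) enumerates the 36 equally likely outcomes of a double dice toss and checks numerically that the events A = {black die shows an even number} and B = {white die shows 5 or 6} are independent.

# A minimal sketch checking Eq. (3.64) by enumeration over two fair dice.
from fractions import Fraction

outcomes = [(b, w) for b in range(1, 7) for w in range(1, 7)]
p = Fraction(1, len(outcomes))                 # probability of each outcome

A = {o for o in outcomes if o[0] % 2 == 0}     # black die shows an even number
B = {o for o in outcomes if o[1] > 4}          # white die shows 5 or 6

P_A = len(A) * p
P_B = len(B) * p
P_AB = len(A & B) * p
print(P_AB == P_A * P_B)    # True: A and B are independent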

3.6  Summary and Conclusions

This chapter has been primarily concerned with assembling the machinery
of probability from the building blocks of events in the sample space, Ω, the
collection of all possible randomly varying outcomes of an experiment.
We have seen how the probability of an event A arises naturally from the
probability set function, an additive set function defined on the set Ω that
satisfies the three axioms of Kolmogorov.
Having established the concept of probability and how the probability of
any subset of Ω can be computed, a straightforward extension to special events
restricted to conditioning sets in Ω led to the related concept of conditional
probability. The idea of total probability, the result known as Bayes' rule, and
especially the concept of independence all arise naturally from conditional
probability and have profound consequences for random phenomena analysis
that cannot be fully appreciated until much later.
We note in closing that the presentation of probability in this chapter
(especially as a tool for solving problems involving randomly varying phenomena)
is still quite rudimentary because the development is not quite complete yet.
The final step in the development of the probability machinery, undertaken
primarily in the next chapter, requires the introduction of the random variable,
X, from which the analysis tool, the probability distribution function,
f(x), emerges and is fully characterized.
Here are some of the main points of the chapter again:
• Events, as subsets of the sample space, Ω, can be elementary (simple) or
compound (complex); if elementary, then they are mutually exclusive; if
compound, then they can be composed from several simple events.


• Once probabilities have been assigned to all elementary events in Ω, then
P(A), the probability of any other subset A of Ω, can be determined on
the basis of the probability set function P(.) defined on all subsets of Ω
according to the three axioms of Kolmogorov:
1. P(A) ≥ 0 for every A ⊂ Ω;
2. P(Ω) = 1;
3. P(A ∪ B) = P(A) + P(B) for all mutually exclusive events A and
B.
• Conditional Probability: For any two events A and B in Ω, the conditional
probability P(A|B) is given by

P(A|B) = P(A ∩ B)/P(B)
• Total Probability: Given conditional (partial) probabilities P(A|Bi), and
P(Bi) for each conditioning set, the unconditional (total) probability of
A is given by

P(A) = Σ_{i=1}^{k} P(A|Bi)P(Bi)

• Mutual Exclusivity: Two events A and B are mutually exclusive if
P(A ∪ B) = P(A) + P(B),
in which case P(A ∩ B) = 0.

• Independence: Two events A and B are independent if
P(A ∩ B) = P(A)P(B)

REVIEW QUESTIONS
1. What are the five basic building blocks of probability theory as presented in Section 3.1? Define each one.
2. What is a simple (or elementary) event and how is it different from a complex
(or compound) event?
3. Why are elementary events mutually exclusive?
4. What is the relationship between events and the sample space?
5. In the language of events, what does the empty set, ∅, represent? What does the
entire sample space, Ω, represent?


6. Given two sets A and B, in the language of events, what do the following sets
represent: A*; A ∪ B; and A ∩ B?
7. What does it mean that two events A and B are mutually exclusive?
8. What is a set function in general and what is an additive set function in particular?
9. What are the three fundamental properties of a probability set function (also
known as Kolmogorov's axioms)?
10. How is the probability of any event A determined from the elementary events
in Ω?
11. For any two sets A and B, what is the definition of P(A|B), the conditional
probability of A given B? If the two sets are disjoint such that A ∩ B = ∅, in words,
what does P (A|B) mean in this case?
12. How does one obtain total probability from partial (i.e., conditional) probabilities?
13. What is Bayes' rule and what is it used for?
14. Given P (A|Bi ) and P (Bi ), how does one reverse the probability to determine
P (Bi |A)?
15. What does it mean for two events A and B to be independent?
16. What is P(A ∩ B) when two events A and B are (i) mutually exclusive, and (ii)
independent?

EXERCISES
Section 3.1
3.1 When two dice (one black with white dots, the other white with black dots)
are tossed once, simultaneously, and the numbers of dots shown on each die's top
face after coming to rest are recorded as an ordered pair (nB, nW), where nB is the
number on the black die, and nW the number on the white die,
(i) identify the experiment, what constitutes a trial, the outcomes, and the sample
space.
(ii) If the sum of the numbers on the two dice is S, i.e.,
S = nB + nW,          (3.69)

enumerate all the simple events associated with the observation S = 7.


3.2 In an opinion poll, 20 individuals selected at random from a group of college
students are asked to indicate which of three options (approve, disapprove, indifferent)
best matches their individual opinions of a new campus policy. Let n0 be the
number of indifferent students, n1 the number that approve, and n2 the number that


disapprove, so that the outcome of one such opinion sample is the ordered triplet
(n0 , n1 , n2 ). Write mathematical expressions in terms of the numbers n0 , n1 , and n2
for the following events:
(i) A = {Unanimous support for the policy}; and A*, the complement of A.
(ii) B = {More students disapprove than approve}; and B*.
(iii) C = {More students are indifferent than approve};
(iv) D = {The majority of students are indifferent}.
Section 3.2
3.3 Given the following two sets A and B:

A = {x : x = 1, 3, 5, 7, . . .}          (3.70)
B = {x : x = 0, 2, 4, 6, . . .}          (3.71)

find A ∪ B and A ∩ B.
3.4 Let Ak = {x : 1/(k + 1) ≤ x ≤ 1} for k = 1, 2, 3, . . .. Find the set B defined by:

B = A1 ∪ A2 ∪ A3 ∪ . . . = ⋃_{i=1}^{∞} Ai          (3.72)

3.5 For sets A, B, C, subsets of the universal set Ω, establish the following identities:

(A ∪ B)* = A* ∩ B*          (3.73)
(A ∩ B)* = A* ∪ B*          (3.74)
A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C)          (3.75)
A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C)          (3.76)

3.6 For every pair of sets A, B, subsets of the sample space Ω upon which the
probability set function P(.) has been defined, prove that:

P(A ∪ B) = P(A) + P(B) − P(A ∩ B)          (3.77)

3.7 In a certain engineering research and development company, apart from the support staff, which number 25, all other employees are either engineers or statisticians
or both. The total number of employees (including the support staff) is 100. Of
these, 50 are engineers, and 40 are statisticians; the number of employees that are
both engineers and statisticians is not given. Find the probability that an employee
chosen at random is not one of those classied as being both an engineer and a
statistician.
Section 3.3
3.8 For every set A, let the set function Q(.) be defined as follows:

Q(A) = Σ_{x∈A} f(x)          (3.78)

where

f(x) = (1/3)(2/3)^x;  x = 0, 1, 2, . . .          (3.79)


If A1 = {x : x = 0, 1, 2, 3} and A2 = {x : x = 0, 1, 2, 3, . . .}, find Q(A1) and Q(A2).

3.9 Let the sample space for a certain experiment be Ω = {ω : 0 < ω < ∞}. Let
A represent the event A = {ω : 4 < ω < ∞}. If the probability set function
P(A) is defined for any subset A of the sample space according to:

P(A) = ∫_A e^{−x} dx          (3.80)

evaluate P(A), P(A*), and P(A ∪ A*).
3.10 For the experiment of rolling two dice (one black with white dots, the other
white with black dots) once, simultaneously, presented in Exercise 3.1, first obtain
Ω, the sample space, and, by assigning equal probability to each of the outcomes,
determine the probability of the following events:
(i) A = {nB + nW = 7}, i.e. the sum is 7;
(ii) B = {nB < nW };
(iii) B*, the complement of B;
(iv) C = {nB = nW }, i.e. the two dice show the same number;
(v) D = {nB + nW = 5 or 9}.
3.11 A black velvet bag contains three red balls and three green balls. Each experiment involves drawing two balls at once, simultaneously, and recording their colors,
R for red, and G for green.
(i) Obtain the sample space, assuming that balls of the same color are indistinguishable.
(ii) Upon assigning equal probability to each element in the sample space, determine
the probability of drawing two balls of different colors.
(iii) If the balls are distinguishable and numbered from 1 to 6, and if the two balls
are drawn sequentially, not simultaneously, now obtain the sample space and from
this determine the probability of drawing two balls of different colors.
3.12 An experiment is performed by selecting a card from an ordinary deck of 52
playing cards. The outcome, ω, is the type of card chosen, classified as: "Ace",
"King", "Queen", "Jack", and "others". The random variable X(ω) assigns the
number 4 to the outcome if ω is an "Ace"; X(ω) = 3 if the outcome is a "King";
X(ω) = 2 if the outcome is a "Queen", and X(ω) = 1 if the outcome is a "Jack";
X(ω) = 0 for all other outcomes.
(i) What is the space V of this random variable?
(ii) If the probability set function P(.) defined on the subsets of the original sample
space assigns a probability 1/52 to each of these outcomes, describe the induced
probability set function PX(A) induced on all the subsets of the space V by this
random variable.
(iii) Describe a physical (scientific or engineering) problem for which the above would
be a good surrogate model.
3.13 Obtain the sample space, Ω, for the experiment involving tossing a fair coin 4
times. Upon assigning equal probability to each outcome, determine the probabilities
of obtaining 0, 1, 2, 3, or 4 heads. Confirm that your result is consistent with the
postulate that the probability model for this phenomenon is given by the probability
distribution function:

f(x) = [n!/(x!(n − x)!)] p^x (1 − p)^{n−x}          (3.81)

where f(x) is the probability of obtaining x heads in n = 4 tosses, and p = 1/2 is the
probability of obtaining a head in a single toss of the coin. (See Chapter 8.)
3.14 In the fall of 2007, k students born in 1989 attended an all-freshman introductory general engineering class at the University of Delaware. Confirm that if p is the
probability that at least two of the students have the same birthday, then:

1 − p = 365! / [(365 − k)! (365)^k]          (3.82)

Show that for a class with 23 or more students born in 1989, the probability of at
least 2 students sharing the same birthday is more than 1/2, i.e., if k ≥ 23 then
p > 1/2.
Sections 3.4 and 3.5
3.15 Six simple events, with probabilities P (E1 ) = 0.11; P (E2 ) = P (E5 ) =
0.20; P (E3 ) = 0.25; P (E4 ) = 0.09; P (E6 ) = 0.15, constitute the entire set of outcomes of an experiment. The following events are of interest:
A = {E1 , E2 }; B = {E2 , E3 , E4 }; C = {E5 , E6 }; D = {E1 , E2 , E5 }
Determine the following probabilities:
(i) P (A), P (B), P (C), P (D);
(ii) P(A ∪ B), P(A ∩ B); P(A ∪ D), P(A ∩ D); P(B ∪ C), P(B ∩ C);
(iii) P (B|A), P (A|B); P (B|C), P (D|C)
Which of the events A, B, C and D are mutually exclusive?
3.16 Assuming that giving birth to a boy or a girl is equally likely, and further, that
no multiple births have occurred, first, determine the probability of a family having
three boys in a row. Now consider the conjecture (based on empirical data) that, for
a family that has already had two boys in a row, the probability of having a third
boy is 0.8. Under these conditions, what is now the probability of a family having
three boys in a row?
3.17 As a follow-up to the concept of independence of two events A and B,
Event A is said to be "attracted to" event B if

P(A|B) > P(A)          (3.83)

Event A is said to be "repelled by" event B if

P(A|B) < P(A)          (3.84)

(Of course, when P(A|B) = P(A), the two events have been previously identified
as independent.) Establish the result that if B attracts A, then: (i) A attracts B
(mutual attraction); and (ii) B* repels A.

3.18 Show that if A and B are independent, then A and B* are also independent.

3.19 Show that for two events A and B, P(A ∩ B|A ∪ B) ≤ P(A ∩ B|A). State the
condition for equality.
3.20 An exchange student from Switzerland, who is male, has been assigned to be
your partner in an introductory psychology class. As part of a class assignment, he
responds to your question about his family by stating only that he comes from a
family of two children, without specifying whether he is the older or the younger.
What is the probability that his sibling is female? Assume equal probability of having a boy or girl. Why does this result seem counterintuitive at first?
3.21 A system consisting of two components A and B that are connected in series functions if both of them function. If P (A), the probability that component A
functions, is 0.99, and the probability that component B functions is 0.90, find the
probability that this series system functions, assuming that whether one component
functions or not is independent of the status of the other component. If these components are connected in parallel, the system fails (i.e., will not function) only if
both components fail. Assuming independence, determine the probability that the
parallel system functions. Which probability is higher and why is it reasonable to
expect a higher probability from the system in question?
3.22 The functioning status of a complex system that consists of several components
arranged in series and parallel and with cross-links (i.e., whether the system functions
or not) can be determined from the status of a keystone component, Ck . If the
probability that the keystone component for a particular system functions is given
as P(Ck) = 0.9 and the probability that the system functions when the keystone
functions, P(S|Ck), is given as 0.9, with the complementary probability that the
system functions when the keystone does not function, P(S|Ck*), given as 0.8, find
the unconditional probability, P (S), that the system functions.

APPLICATION PROBLEMS
3.23 Patients suering from manic depression and other similar disorders are sometimes treated with lithium, but the dosage must be monitored carefully because
lithium toxicity, which is often fatal, can be dicult to diagnose. A new assay used
to determine lithium concentration in blood samples is being promoted as a reliable
way to diagnose lithium toxicity because the assay result is purported to correlate
very strongly with toxicity.
A careful study of the relationship between this blood assay and lithium toxicity
in 150 patients yielded results summarized in Table 3.3. Here A+ indicates high
lithium concentrations in the blood assay and A− indicates low lithium concentration; L+ indicates confirmed lithium toxicity and L− indicates no lithium toxicity.
(i) From these data, compute the following probabilities regarding the lithium toxicity status of a patient chosen at random:


TABLE 3.3: Lithium toxicity study results

         Lithium Toxicity
Assay     L+     L−    Total
A+        30     17      47
A−        21     82     103
Total     51     99     150

1. P (L+ ), the probability that the patient has lithium toxicity (regardless of the
blood assay result);
2. P (L+ |A+ ), the conditional probability that the patient has lithium toxicity
given that the blood assay result indicates high lithium concentration. What
does this value indicate about the potential benet of having this assay result
available?
3. P(L+|A−), the conditional probability that the patient has lithium toxicity
given that the blood assay result indicates low lithium concentration. What
does this value indicate about the potential for missed diagnoses?
(ii) Compute the following probabilities regarding the blood lithium assay:
1. P (A+ ), the (total) probability of observing high lithium blood concentration
(regardless of actual lithium toxicity status);
2. P (A+ |L+ ) the conditional probability that the blood assay result indicates
high lithium concentration given that the patient indeed has lithium toxicity.
Why do you think that this quantity is referred to as the sensitivity of the
assay, and what does the computed value indicate about the sensitivity of
the particular assay in this study?
3. From information about P (L+ ) (as the prior probability of lithium toxicity)
along with the just computed values of P (A+ ) and P (A+ |L+ ) as the relevant
assay results, now use Bayes' Rule to compute P(L+|A+) as the posterior
probability of lithium toxicity after obtaining assay data, even though it has
already been computed directly in (i) above.
3.24 An experimental crystallizer produces five different polymorphs of the same
crystal via mechanisms that are currently not well-understood. Types 1, 2 and 3
are approved for pharmaceutical application A; Types 2, 3 and 4 for a different
application B; Type 5 is mostly unstable and has no known application. How much
of each type is made in any batch varies randomly, but with the current operating
procedure, 30% of the total product made by the crystallizer in a month is of Type
1; 20% is of Type 2, with the same percentage of Types 3 and 4; and 10% is of Type
5. Assuming that the polymorphs can be separated without loss,
(i) Determine the probability of making product in a month that can be used for
application A;
(ii) Given a batch ready to be shipped for application B, what is the probability
that any crystal selected at random is of Type 2? What is the probability that it is
of Type 3 or Type 4? State any assumptions you may need to make.


(iii) What is the probability that an order change to one for application A can be
filled from a batch ready to be shipped for application B?
(iv) What is the converse probability that an order change to one for application B
can be filled given a batch that is ready to be shipped for application A?
3.25 A test for a relatively rare disease involves taking from the patient an appropriate tissue sample which is then assessed for abnormality. A few sources of error
are associated with this test. First, there is a small, but non-zero probability, s ,
that the tissue sampling procedure will miss abnormal cells primarily because these
cells (at least in the earlier stages) being relatively few in number, are randomly distributed in the tissue and tend not to cluster. In addition, during the examination
of the tissue sample itself, there is a probability, f , of failing to identify an abnormality when present; and a probability, m , of misclassifying a perfectly normal cell
as abnormal.
If the proportion of the population with this disease who are subjected to this
test is D ,
(i) In terms of the given parameters, determine the probability that the test result is
correct. (Hint: first compute the probability that the test result is incorrect, keeping
in mind that the test may identify an abnormal cell incorrectly as normal, or a
normal cell as abnormal.)
(ii) Determine the probability of a false positive (i.e., returning an abnormality result
when none exists).
(iii) Determine the probability of a false negative (i.e., failing to identify an abnormality that is present).
3.26 Repeat Problem 3.25 for the specic values of s = 0.1; f = 0.05; m = 0.1
for a population in which 2% have the disease. A program sponsored by the Center
for Disease Control (CDC) is to be aimed at reducing the number of false positives
and/or false negatives by reducing one of the three probabilities s , f , and m .
Which of these parameters would you recommend and why?
3.27 A manufacturer of flat-screen TVs purchases pre-cut glass sheets from three
different manufacturers, M1, M2 and M3, whose products are characterized in the
TV manufacturer's incoming material quality control lab as premier grade, Q1,
acceptable grade, Q2, and marginal grade, Q3, on the basis of objective, measurable quality criteria, such as inclusions, warp, etc. Incoming glass sheets deemed
unacceptable are rejected and returned to the manufacturer. An incoming batch of
425 accepted sheets has been classified by an automatic classifying system as shown
in the table below.

                            Quality
Manufacturer   Premier Q1   Acceptable Q2   Marginal Q3   Total
M1                110             25             15        150
M2                150             33              2        185
M3                 76             13              1         90

If a sheet is selected at random from this batch,


(i) Determine the probability that it is of premier grade; also determine the probability that it is not of marginal grade.


(ii) Determine the probability that it is of premier grade given that it is from
manufacturer M1; also determine the probability that it is of premier grade given
that it is from either manufacturer M2 or M3 .
(iii) Determine the probability that it is from manufacturer M3 given that it is of
marginal grade; also determine the probability that it is from manufacturer M2
given that it is of acceptable grade.
3.28 In a 1984 report², the IRS published the information shown in the following
table regarding 89.9 million federal tax returns it received, the income bracket of
the filers, and the percentage audited.

Income Bracket        Number of filers (millions)    Percent Audited
Below $10,000                   31.4                      0.34
$10,000 – $24,999               30.7                      0.92
$25,000 – $49,999               22.2                      2.05
$50,000 and above                5.5                      4.00

(i) Determine the probability that a tax filer selected at random from this population
would be audited.
(ii) Determine the probability that a tax filer selected at random is in the
$25,000 – $49,999 income bracket and was audited.
(iii) If we know that a tax filer selected at random was audited, determine the
probability that this person belongs in the $50,000 and above income bracket.

2 Annual Report of Commissioner and Chief Counsel, Internal Revenue Service, U.S.
Department of Treasury, 1984, p 60.


Chapter 4
Random Variables and Distributions

4.1   Introduction and Definition
      4.1.1  Mathematical Concept of the Random Variable
      4.1.2  Practical Considerations
      4.1.3  Types of Random Variables
4.2   Distributions
      4.2.1  Discrete Random Variables
      4.2.2  Continuous Random Variables
      4.2.3  The Probability Distribution Function
4.3   Mathematical Expectation
      4.3.1  Motivating the Definition
      4.3.2  Definition and Properties
4.4   Characterizing Distributions
      4.4.1  Moments of a Distribution
      4.4.2  Moment Generating Function
      4.4.3  Characteristic Function
      4.4.4  Additional Distributional Characteristics
      4.4.5  Entropy
      4.4.6  Probability Bounds
4.5   Special Derived Probability Functions
      4.5.1  Survival Function
      4.5.2  Hazard Function
      4.5.3  Cumulative Hazard Function
4.6   Summary and Conclusions
      REVIEW QUESTIONS
      EXERCISES
      APPLICATION PROBLEMS

An idea, in the highest sense of that word,


cannot be conveyed but by a symbol.
S. T. Coleridge (1772-1834)

Even though the machinery of probability as presented thus far can already be
used to solve some practical problems, its development is far from complete.
In particular, with a sample space of raw outcomes that can be anything from
attributes and numbers, to letters and other sundry objects, this most basic
form of probability will be quite tedious and inefficient in dealing with general
random phenomena. This chapter and the next one are devoted to completing
the development of the machinery of probability with the introduction of the
concept of the random variable, from which arises the probability distribution
function, an efficient mathematical form for representing the ensemble
behavior of general random phenomena. The emergence, properties and characteristics of the probability distribution function are discussed extensively in

this chapter for single dimensional random variables; the discussion is generalized to multi-dimensional random variables in the next chapter.

4.1  Introduction and Definition

4.1.1  Mathematical Concept of the Random Variable

In general, the sample space Ω presented thus far may be quite tedious
to describe and inefficient to analyze mathematically if its elements are not
numbers. To facilitate mathematical analysis, it is desirable to find a means
of converting this sample space into one with real numbers. This is achieved
via the vehicle of the random variable, defined as follows:

Definition: Given a random experiment with a sample space Ω,
let there be a function X, which assigns to each element ω ∈ Ω,
one and only one real number X(ω) = x. This function, X, is
called a random variable.

Upon the introduction of this entity, X, the following happens (See Fig
4.1):
1. Ω is mapped onto V, i.e.

V = {x : X(ω) = x, ω ∈ Ω}          (4.1)

so that V is the set of all values x generated from X(ω) = x for all
elements ω in the sample space Ω;

2. The probability set function encountered before, P, defined on Ω, gives
rise to another probability set function, PX, defined on V and induced
by X. PX is therefore often referred to as an induced probability set
function.

The role of PX in V is identical to that of P in Ω. Thus, for any arbitrary
subset A of V, PX(A) is the probability of event A occurring.
The primary question of practical importance may now be stated as follows:
How does one find PX(A) in the new setting created by the introduction
of the random variable X, given the original sample space Ω, and the original
probability set function P defined on it?
The answer is to go back to what we know, i.e., to find that set A*

FIGURE 4.1: The original sample space, Ω, and the corresponding space V induced
by the random variable X

which corresponds to the set of values of ω in Ω that are mapped by X into
A, i.e.

A* = {ω : ω ∈ Ω and X(ω) ∈ A}          (4.2)

Such a set A* is called the pre-image of A, that set on the original sample
space from which A is obtained when X is applied on its elements (see Fig
4.1). We now simply define

PX(A) = P(A*)          (4.3)

since, by definition of A*,

P{X(ω) ∈ A} = P{ω ∈ A*}          (4.4)

from where we see how X induces PX(.) from the known P(.). It is easy to
show that the induced PX is an authentic probability set function in the spirit
of Kolmogorov's axioms.
Remarks:
1. The random variable is X; the value it takes is the real number x. The
one is a completely different entity from the other.
2. The expression P (X = x) will be used to indicate the probability that
the application of the random variable X results in an outcome with
assigned value x; or, more simply, the probability that the random
variable X takes on a particular value x. As such, X = x should
not be confused with the familiar arithmetic statement of equality or
equivalence.
3. In many instances, the starting point is the space V and not the tedious
sample space Ω, with PX(.) already defined so that there is no further
need for reference to a P(.) defined on Ω.


Let us illustrate these concepts with some examples.


Example 4.1 RANDOM VARIABLE AND INDUCED PROBABILITY FUNCTION FOR COIN TOSS EXPERIMENT
The experiment in Example 3.1 in Chapter 3 involved tossing a coin 3
times and recording the number of observed heads and tails. From the
sample space obtained there, define a random variable X as the total
number of tails obtained in the 3 tosses. (1) Obtain the new space V
and, (2) if A is the event that X = 2, determine the probability of this
event's occurrence.
Solution:
(1) Recall from Example 3.1 that the sample space is given by

Ω = {HHH, HHT, HTH, THH, HTT, THT, TTH, TTT}          (4.5)

consisting of all 8 possible outcomes, represented respectively as ωi; i =
1, 2, . . . , 8, i.e.

Ω = {ω1, ω2, ω3, ω4, ω5, ω6, ω7, ω8}.          (4.6)

This is clearly one of the "tedious" types, not as conveniently amenable
to mathematical manipulation. And now, by definition of X, we see that
X(ω1) = 0; X(ω2) = X(ω3) = X(ω4) = 1; X(ω5) = X(ω6) = X(ω7) =
2; X(ω8) = 3, from where we now obtain the space V as:

V = {0, 1, 2, 3}          (4.7)

since these are all the possible values that X can take.
(2) To obtain PX(A), first we find A*, the pre-image of A in Ω. In this
case,

A* = {ω5, ω6, ω7}          (4.8)

so that upon recalling the probability set function P(.) generated in
Chapter 3 on the assumption of equiprobable outcomes, we obtain
P(A*) = 3/8, hence,

PX(A) = P(A*) = 3/8          (4.9)
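Because the sample space here is small, the induced probability in Eq. (4.9) is easy to verify by direct enumeration. The sketch below (Python; an illustration of ours, not part of the example) builds Ω, applies the random variable X (the number of tails), and computes PX(X = 2).

# A minimal sketch of the induced probability set function of Example 4.1.
from itertools import product

omega = [''.join(t) for t in product('HT', repeat=3)]   # the 8 outcomes of Omega
p = 1 / len(omega)                                       # equiprobable outcomes

def X(outcome):
    """The random variable: total number of tails in the outcome."""
    return outcome.count('T')

# Pre-image of the event A = {X = 2}, and its induced probability
A_star = [w for w in omega if X(w) == 2]
print(A_star, len(A_star) * p)    # ['HTT', 'THT', 'TTH'] 0.375  (= 3/8)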

The next two examples illustrate sample spaces that occur naturally in the
form of V .
Example 4.2 SAMPLE SPACE FOR SINGLE DIE TOSS EXPERIMENT
Consider an experiment in which a single die is thrown and the outcome
is the number that shows up on the die's top face when it comes to rest.
Obtain the sample space of all possible outcomes.
Solution:
The required sample space is the set {1, 2, 3, 4, 5, 6}, since this set
of numbers is an exhaustive collection of all the possible outcomes of
this experiment. Observe that this is a set of real numbers, so that it
is already in the form of V. We can therefore define a probability set
function directly on it, with no further need to obtain a separate V and
an induced PX (.).


Strictly speaking, this last example did involve an implicit application


of a random variable, if we acknowledge that the primitive outcomes for the
die toss experiment are actually dots on the top face of the die. However, by
pre-specifying the outcome as a count of the dots shown on the resting top
face, we simply skipped a step and went straight to the result of the application
of the random variable transforming the dots to the count. The next example
also involves similar die tosses, but in this case, the application of the random
variable is explicit, following the implicit one that automatically produces
numbers as the de-facto outcomes.
Example 4.3 SAMPLE SPACE AND RANDOM VARIABLE
FOR DOUBLE DICE TOSS EXPERIMENT
Consider an experiment in which two dice are thrown at the same time,
and the outcome is an ordered pair of the numbers that show up on
each die's top face after coming to rest. Assume that we are careful to
specify and distinguish a "first" die (black die with white spots) from
the "second" (white die with black spots). (1) Obtain the sample space
of all possible outcomes. (2) Define a random variable X as the sum of
numbers that show up on the two dice; obtain the new space V arising
as a result of the application of this random variable.
Solution:
(1) The original sample space, Ω, of the raw outcomes, is given by

Ω = {(1, 1), (1, 2), . . . , (1, 6); (2, 1), (2, 2), . . . ; . . . , (6, 6)}          (4.10)

a set of the 36 ordered pairs of all possible outcomes (n1, n2), where
n1 is the number showing up on the face of the first die, and n2 is the
number on the second die. (Had it not been possible to distinguish a
"first" die from the "second", outcomes such as (2,1) and (1,2) could
not have been distinguishable, and Ω will contain only 21 elements, the
6 diagonal and one set of the 15 off-diagonal elements of the 6 × 6 matrix
of ordered pairs.)
The elements of this set are clearly real numbers (the 2-dimensional
kind) and are already amenable to mathematical manipulation. The
definition of a random variable X in this case is therefore
not for purposes of converting Ω to a more mathematically convenient
form; the random variable definition is a reflection of what aspect of the
experiment is of interest to us.
(2) By the definition of X, we see that the required space V is:

V = {2, 3, 4, . . . , 12}          (4.11)

a set containing 11 elements, a collection of all the possible values that
X can take in this case.

As an exercise (see Exercise 4.7), the reader should compute the probability
PX (A) of the event A that X = 7, assuming equiprobable outcomes for each
die toss.

4.1.2  Practical Considerations

Rigor and precision are intrinsic to mathematics and mathematical analysis; without the former, the latter simply cannot exist. Such is the case with
the mathematical concept of the random variable as we have just presented
it: rigor demands that X be specified in this manner, as a function through
whose agency each element of the sample space of an experiment becomes
associated with an unambiguous numerical value. As illustrated in Fig 4.1, X
therefore appears as a mapping from one space, Ω, that can contain all sorts
of raw objects, into one that is more conducive to mathematical analysis, V,
containing only real numbers. Such a formal definition of the random variable
tends to appear stiff, and almost sterile; and those encountering it for the first
time may be unsure of what it really means in practice.
As a practical matter, the random variable may be considered (informally)
as an experimental outcome whose numerical value is subject to random variations with each exact replicate performance (trial) of the experiment. Thus,
for example, with the three coin-toss experiment discussed earlier, by specifying the outcome of interest as the total number of tails observed, we see
right away that the implied random variable can take on numerical values 0,
1, 2, or 3, even though the raw outcomes will consist of T's and H's; also what
value the random variable takes is subject to random variation each time the
experiment is performed. In the same manner, we see that in attempting to
determine the temperature of an equilibrium mixture of ice and water, the observed temperature measurement in °C takes on numerical values that vary
randomly around the number 0.

4.1.3  Types of Random Variables

A random variable can be either discrete or continuous, as determined


by the nature of the space V . For a discrete random variable, the space V
consists of isolated points, isolated in the sense that, on the real line, every
neighborhood of each point contains no other point of V. For instance, in
Example 4.1 above, the random variable X can only take values 0, 1, 2, or 3;
it is therefore a discrete random variable.
On the other hand, the space V associated with a continuous random
variable consists of an interval of the real line, or, in higher dimensions, a set
of intervals. For example, let Ω be defined as:

Ω = {ω : −1 ≤ ω ≤ 1}          (4.12)

If we define a random variable X as:

X(ω) = 1 − |ω|,          (4.13)

observe that the random variable space V in this case is given by:

V = {x : 0 ≤ x ≤ 1}.          (4.14)


This is an example of a continuous random variable.


Random variables can also be defined in higher dimensions. For example,
given a sample space Ω with a probability set function P(.) defined on its
subsets, a two-dimensional random variable is a function defined on Ω which
assigns one and only one ordered number pair (X1(ω), X2(ω)) to each element
ω ∈ Ω. Associated with this random variable is a space V and a probability
set function PX induced by X = (X1, X2), where V is defined as:

V = {(x1, x2) : X1(ω) = x1, X2(ω) = x2; ω ∈ Ω}          (4.15)

The following is a simple example of a two-dimensional random variable.


Example 4.4 A 2-DIMENSIONAL RANDOM VARIABLE
AND ITS SAMPLE SPACE
Revisit Example 4.1 and the problem discussed therein involving tossing a coin 3 times and recording the number of observed heads and
tails; define the following 2-dimensional random variable: X1 = total
number of tails; X2 = total number of heads. Obtain the sample space
V associated with this random variable.
Solution:
The required sample space in this case is:
V = {(0, 3), (1, 2), (2, 1), (3, 0)}.          (4.16)

Note that the two component random variables X1 and X2 are not
independent since their sum, X1 + X2 , by virtue of the experiment, is
constrained to equal 3 always.

What is noted briefly here for two dimensions can be generalized to n dimensions, and the next chapter is devoted entirely to a discussion of multi-dimensional random variables.

4.2  Distributions

4.2.1  Discrete Random Variables

Let us return once more to Example 4.1 and, this time, for each element
of V , compute P (X = x), and denote this by f (x); i.e.
f(x) = P(X = x)          (4.17)

Observe that P(X = 0) = P(Ω0) where Ω0 = {ω1}, so that

f(0) = P(X = 0) = 1/8          (4.18)

Similarly, we obtain P(X = 1) = P(Ω1) where Ω1 = {ω2, ω3, ω4}, so that:

f(1) = P(X = 1) = 3/8          (4.19)


Likewise,

f(2) = P(X = 2) = 3/8          (4.20)
f(3) = P(X = 3) = 1/8          (4.21)

This function, f(x), indicates how the probabilities are distributed over the
entire random variable space.
Of importance also is a different, but related, function, F(x), defined as:

F(x) = P(X ≤ x)          (4.22)

the probability that the random variable X takes on values less than or equal
to x. For the specific example under consideration, we have: F(0) = P(X ≤
0) = 1/8. As for F(1) = P(X ≤ 1), since the event A = {X ≤ 1} consists of
two mutually exclusive elementary events A0 = {X = 0} and A1 = {X = 1},
it then follows that:

F(1) = P(X ≤ 1) = P(X = 0) + P(X = 1) = 1/8 + 3/8 = 4/8          (4.23)

By similar arguments, we obtain:

F(2) = P(X ≤ 2) = P(X = 0) + P(X = 1) + P(X = 2) = 7/8          (4.24)
F(3) = P(X ≤ 3) = P(X = 0) + P(X = 1) + P(X = 2) + P(X = 3) = 8/8          (4.25)

These results are tabulated in Table 4.1.

TABLE 4.1: f(x) and F(x) for the three coin-toss experiment of Example 4.1

x     f(x)    F(x)
0     1/8     1/8
1     3/8     4/8
2     3/8     7/8
3     1/8     8/8
The function, f (x), is referred to as the probability distribution function
(pdf), or sometimes as the probability mass function; F (x) is known as the
cumulative distribution function, or sometimes simply as the distribution function.
Note, once again, that X can assume only a finite number of discrete
values, in this case, 0, 1, 2, or 3; it is therefore a discrete random variable,
and both f (x) and F (x) are discrete functions. As shown in Fig 4.2, f (x) is
characterized by non-zero spikes at values of x = 0, 1, 2 and 3, and F (x) by
the indicated staircase form.

FIGURE 4.2: Probability distribution function, f(x), and cumulative distribution function, F(x), for the 3-coin toss experiment of Example 4.1

Let x0 = 0, x1 = 1, x2 = 2, x3 = 3; then

P(X = xi) = f(xi) for i = 0, 1, 2, 3          (4.26)

with f(xi) given explicitly as:

f(xi) = 1/8;  x0 = 0
        3/8;  x1 = 1
        3/8;  x2 = 2          (4.27)
        1/8;  x3 = 3

and the two functions in Table 4.1 are related explicitly according to the
following expression:

F(xi) = Σ_{j=0}^{i} f(xj)          (4.28)

We may now also note the following about the function f(xi):

f(xi) > 0, ∀xi;  and  Σ_{i=0}^{3} f(xi) = 1
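Eq. (4.28) is simply a running (cumulative) sum of the pdf values. The short sketch below (Python; our own illustration, not part of the text) rebuilds Table 4.1 from f(x) in exactly this way.

# A minimal sketch: obtain the cumulative distribution F(x) of Table 4.1
# from the pdf f(x) of the three coin-toss experiment via Eq. (4.28).
from fractions import Fraction
from itertools import accumulate

x_values = [0, 1, 2, 3]
f = [Fraction(1, 8), Fraction(3, 8), Fraction(3, 8), Fraction(1, 8)]
F = list(accumulate(f))            # running sum: F(x_i) = sum of f(x_j) for j <= i

for x, fx, Fx in zip(x_values, f, F):
    print(x, fx, Fx)               # reproduces Table 4.1: 0 1/8 1/8, 1 3/8 4/8, ...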

These ideas may now be generalized beyond the specific example used above.

Definition: Let there exist a sample space Ω (along with a probability set function, P, defined on its subsets), and a random variable X,
with an attendant random variable space V: a function f
defined on V such that:
1. f(x) ≥ 0; ∀x ∈ V;
2. Σ_x f(x) = 1; x ∈ V;
3. PX(A) = Σ_{x∈A} f(x); for A ⊂ V (and when A contains the
single element xi, PX(X = xi) = f(xi))
is called a probability distribution function of the random variable
X.

Upon comparing these formal statements regarding f(x) to the 3 axioms
of Kolmogorov (regarding the probability set function P defined on Ω) given
earlier in Chapter 3, we readily see that these are the same concepts extended
from Ω to V for the random variable X.

4.2.2  Continuous Random Variables

For the continuous random variable X, because it takes on a continuum
of values, not discrete points as with the discrete counterpart, the concepts
presented above are modified as follows, primarily by replacing sums with
integrals:

Definition: The function f defined on the space V (whose elements consist of segments of the real line) such that:
1. f(x) ≥ 0; ∀x ∈ V;
2. f has at most a finite number of discontinuities in every finite
interval;
3. The (Riemann) integral, ∫_V f(x)dx = 1;
4. PX(A) = ∫_A f(x)dx; for A ⊂ V
is called a probability density function of the continuous random
variable X.

(The second point above, unnecessary for the discrete case, is a mathematical fine point needed to safeguard against pathological situations where the
probability measure becomes undefined; it is hardly ever an issue in most
practical applications.)
practical applications.)
In this case, the expression for the cumulative distribution function, F(x),
corresponding to that in Eq (4.28), is:

F(xi) = P(X ≤ xi) = ∫_{−∞}^{xi} f(x)dx          (4.29)

from where we may now observe that when F(x) possesses a derivative,

dF(x)/dx = f(x)          (4.30)

This f(x) is the continuous counterpart of the discrete f(x) encountered
earlier; but rather than express the probability that X takes on a particular
point value xi (as in the discrete case), the continuous f(x) expresses a measure of
the probability that X lies in the infinitesimal interval between xi and xi + dx.
Observe, from item 4 in the definition given above, that:

P(xi ≤ X ≤ xi + dx) = ∫_{xi}^{xi+dx} f(x)dx ≈ f(xi)dx          (4.31)

for a very small interval size dx.


In general, because the event A = {X ≤ x + dx} can be decomposed into
2 mutually exclusive events B = {X ≤ x} and C = {x ≤ X ≤ x + dx}, so
that:

{X ≤ x + dx} = {X ≤ x} ∪ {x ≤ X ≤ x + dx},          (4.32)

we see that:

P(X ≤ x + dx) = P(X ≤ x) + P(x ≤ X ≤ x + dx)
   F(x + dx) = F(x) + P(x ≤ X ≤ x + dx)          (4.33)

and therefore:

P(x ≤ X ≤ x + dx) = F(x + dx) − F(x)          (4.34)

which, upon introducing Eq (4.31) for the LHS, dividing by dx, and taking
limits as dx → 0, yields:

lim_{dx→0} [F(x + dx) − F(x)]/dx = dF(x)/dx = f(x)          (4.35)

establishing Eq (4.30).
In general, we can use Eq (4.29) to establish that, for any arbitrary b ≥ a,

P(a ≤ X ≤ b) = ∫_a^b f(x)dx = F(b) − F(a).          (4.36)


For the sake of completeness, we note that F (x), the cumulative distribution function, is actually the more fundamental function for determining
probabilities. This is because, regardless of whether X is continuous or discrete, F (.) can be used to determine all desired probabilities. Observe from
the foregoing discussion that the expression
P(a1 < X ≤ a2) = F(a2) − F(a1)          (4.37)

will hold true whether X is continuous or discrete.


From now on in this book, we will simply talk of the probability distribution function (or pdf for short) for all random variables X (continuous or
discrete) and mean by this, probability distribution function if X is discrete,
and probability density function if X is continuous and expect that the context
will make clear what we mean.

4.2.3  The Probability Distribution Function

We have now seen that the pdf f (x) (or equivalently, the cdf F (x)) is the
function that indicates how the probabilities of occurrence of various outcomes
and events arising from the random phenomenon in question are distributed
over the entire space of the associated random variable X.
Let us return once more to the three coin-toss example: we understand
that the random phenomenon in question is such that we cannot predict,
a-priori, the specific outcome of each experiment; but from the ensemble aggregate of all possible outcomes, we have been able to characterize, with f(x), the
behavior of an associated random variable of interest, X, the total number
of tails obtained in the experiment. (Note that other random variables could
also be defined for this experiment: for example, the total number of heads,
or the number of tosses until the appearance of the first head, etc.) What
Table 4.1 provides is a complete description of the probability of occurrence
for the entire collection of all possible events associated with this random
variable, a description that can now be used to analyze the particular random phenomenon of the total number of tails observed when a coin is tossed
three times.
For instance, the pdf f (x) indicates that, even though we cannot predict a
specific outcome precisely, we now know that after each experiment, observing
no tails (X = 0) is just as likely as observing all tails (X = 3), each with
a probability of 1/8. Also, observing two tails is just as likely as observing
one tail, each with a probability of 3/8, so that this latter group of events is
three times as likely as the former group of events. Note the symmetry of
the distribution of probabilities indicated by f (x) for this particular random
phenomenon.
It turns out that these specific results can be generalized for the class of
random phenomena to which the three coin-toss example belongs, a class
characterized by the following features:


1. each experiment involves n identical trials (e.g. coin tosses, or number


of fertilized embryos transferred in an in-vitro fertilization (IVF) treatment cycle, etc), and each trial can produce only two mutually exclusive
outcomes: S (success) or F (failure);
2. the probability of success in each trial is p; and,
3. the outcome of interest, X, is the total number of successes observed (e.g. tails
in coin tosses, live births in IVF, etc).
As we show a bit later, the pdf characterizing this family of random phenomena is given by:
f(x) = [n!/(x!(n − x)!)] p^x (1 − p)^{n−x};  x = 0, 1, 2, . . . , n          (4.38)

The results in Table 4.1 are obtained for the special case n = 3; p = 0.5.
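For the record, Eq. (4.38) is easy to evaluate directly. The following sketch (Python; our own check, not part of the text) computes f(x) for n = 3 and p = 0.5 and recovers the f(x) entries of Table 4.1.

# A minimal sketch evaluating the pdf of Eq. (4.38) for n = 3, p = 0.5.
from math import comb

def f(x, n=3, p=0.5):
    """Probability of observing exactly x successes in n trials, Eq. (4.38)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

print([f(x) for x in range(4)])   # [0.125, 0.375, 0.375, 0.125] = 1/8, 3/8, 3/8, 1/8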
Such functions as these provide convenient and compact mathematical
representations of the desired ensemble behavior of random variables; they
constitute the centerpiece of the probabilistic framework, the fundamental
tool used for analyzing random phenomena.
We have, in fact, already encountered in earlier chapters, several actual
pdfs for some real-world random variables. For example, we had stated in
Chapter 1 (thus far without justification) that the continuous random variable
representing the yield obtained from the example manufacturing processes has
the pdf:

f(x) = (1/(σ√(2π))) e^{−(x−μ)²/(2σ²)};  −∞ < x < ∞          (4.39)

We are able to use this pdf to compute the probabilities of obtaining yields
in various intervals on the real line for the two contemplated processes, once
the parameters μ and σ are specified for each process.
We had also stated in Chapter 1 that, for the (discrete) random variable
X representing the number of inclusions found on the manufactured glass
sheet, the pdf is:

f(x) = (e^{−λ} λ^x)/x!;  x = 0, 1, 2, . . .          (4.40)

from which, again, given a specific value for the parameter λ, we are able to
compute the probabilities of finding any given number of inclusions on any
selected glass sheet. And in Chapter 2, we showed, using chemical engineering
principles, that the pdf for the (continuous) random variable X, representing
the residence time in an ideal CSTR, is given by:

f(x) = (1/τ) e^{−x/τ};  0 < x < ∞          (4.41)

an expression that is used in practice for certain aspects of chemical reactor
design.
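Once a pdf like Eq. (4.41) is specified, probabilities over intervals follow from Eq. (4.36). As a hedged illustration (Python; the mean residence time value is an arbitrary choice of ours, and SciPy is assumed to be available), the sketch below computes the probability that the residence time lies between two limits, both from the closed-form F(b) − F(a) and by numerical quadrature.

# A minimal sketch: P(a <= X <= b) for the exponential residence-time pdf of
# Eq. (4.41), using F(b) - F(a) and checking against numerical integration.
from math import exp
from scipy.integrate import quad   # SciPy assumed available

tau = 5.0                                      # assumed mean residence time (arbitrary units)
f = lambda x: (1.0 / tau) * exp(-x / tau)      # pdf, Eq. (4.41)
F = lambda x: 1.0 - exp(-x / tau)              # corresponding cdf

a, b = 2.0, 10.0
print(F(b) - F(a))                             # closed-form probability
print(quad(f, a, b)[0])                        # numerical check: same value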


These pdfs are all ideal models of the random variability associated with
each of the random variables in question; they make possible rigorous and
precise mathematical analyses of the ensemble behavior of the respective random phenomena. Such mathematical representations are systematically derived for actual, specific real-world phenomena of practical importance in Part
III, where the resulting pdfs are also discussed and analyzed extensively.
The rest of this chapter is devoted to taking a deeper look at the fundamental characteristics and general properties of the pdf, f(x), for single-dimensional random variables; the next chapter is devoted to a parallel treatment for multi-dimensional random variables.

4.3  Mathematical Expectation

We begin our investigations into the fundamental characteristics of a random variable, X, and its pdf, f (x), with one of the most important: the
mathematical expectation or expected value. As will soon become clear,
the concept of expectations of random variables (or functions of random variables) is of significant practical importance; but before giving a formal definition, we first provide a motivation and an illustration of the concept.

4.3.1  Motivating the Definition

Consider a game where each turn involves a player drawing a ball at random from a black velvet bag containing 9 balls, identical in every way except
that 5 are red, 3 are blue and one is green. The player receives $1.00 for
drawing a red ball, $4.00 for a blue ball, and $10.00 for the green ball, but
each turn at the game costs $4.00 to play. The question is: Is this game worth
playing?
The primary issue, of course, is the random variation in the color of the
drawn ball each time the game is played. Even though simple and somewhat
artificial, this example provides a perfect illustration of how to solve problems
involving random phenomena using the probabilistic framework.
To arrive at a rational decision regarding whether to play this game or
not, we proceed as follows, noting first the following characteristics of the
phenomenon in question:
Experiment : Draw a ball at random from a bag containing 9 balls composed as given above; note the color of the drawn ball, then replace the
ball;
Outcome: The color of the drawn ball: R = Red; B = Blue; G = Green.
Probabilistic Model Development


TABLE 4.2: The pdf f(x) for the ball-drawing game

x      f(x)
1      5/9
4      3/9
10     1/9

From the problem definition, we see that the sample space Ω is given by:

Ω = {R, R, R, R, R, B, B, B, G}          (4.42)

The random variable, X, is clearly the monetary value assigned to the outcome
of each draw; i.e. in terms of the formal definition, X assigns the real number
1 to R, 4 to B, and 10 to G. (Informally, we could just as easily say that X is
the amount of money received upon each draw.) The random variable space
V is therefore given by:

V = {1, 4, 10}          (4.43)

And now, since there is no reason to think otherwise, we assume that each
outcome is equally probable, in which case the probability distribution for the
random variable X is obtained as follows:

PX(X = 1) = P(R) = 5/9          (4.44)
PX(X = 4) = P(B) = 3/9          (4.45)
PX(X = 10) = P(G) = 1/9          (4.46)

so that f(x), the pdf for this discrete random variable, is as shown in
Table 4.2, or, mathematically, as:

f(xi) = 5/9;  x1 = 1
        3/9;  x2 = 4
        1/9;  x3 = 10          (4.47)
        0;    otherwise
This is an ideal model of the random phenomenon underlying this game; it
will now be used to analyze the problem and to decide rationally whether to
play the game or not.
Using the Model
We begin by observing that this is a case where it is possible to repeat the
experiment a large number of times; in fact, this is precisely what the person
setting up the game wants each player to do: play the game repeatedly! Thus,
if the game is played a very large number of times, say n, it is reasonable from
the model to expect 5n/9 red ball draws, 3n/9 blue ball draws, and n/9 green


ball draws; the corresponding financial returns will be $(5n/9), $(4 × 3n/9), and $(10 × n/9), respectively, in each case.
Observe now that after n turns at the game, we would expect the total financial returns in dollars, say R_n, to be:

R_n = 1(5n/9) + 4(3n/9) + 10(n/9) = 3n                                    (4.48)

These results are summarized in Table 4.3.

TABLE 4.3: Summary analysis for the ball-drawing game

    Ball     Expected # of times      Financial returns   Expected financial returns
    Color    drawn (after n trials)   per draw            (after n trials)
    Red      5n/9                     1                   $5n/9
    Blue     3n/9                     4                   $12n/9
    Green    n/9                      10                  $10n/9
    Total                                                 3n

In the meantime, the total cost, C_n, the amount of money, in dollars, paid out to play the game, would have been 4n. On the basis of these calculations, therefore, the expected net gain (in dollars) after n trials, G_n, is given by

G_n = R_n - C_n = -n                                                      (4.49)

indicating a net loss of $n, so that the rational decision is not to play the game. (The house always wins!)
Eq (4.48) implies that the expected return per draw will be:

R_n/n = 1(5/9) + 4(3/9) + 10(1/9) = 3,                                    (4.50)

a sum of all possible values of the random variable X, weighted by their corresponding probabilities, i.e. from Eq (4.47),

R_n/n = Σ_{i=1}^{3} x_i f(x_i)                                            (4.51)

This quantity is known as the expected value, or the mathematical expectation, of the random variable X: a weighted average of the values taken by X, with the respective probabilities of obtaining each value as the weights.
We are now in a position to provide a formal definition of the mathematical expectation.

4.3.2 Definition and Properties

The expected value, or mathematical expectation, of a random variable, denoted by E(X), is defined for a discrete random variable as:

E(X) = Σ_i x_i f(x_i)                                                     (4.52)

and for a continuous random variable,

E(X) = ∫_{-∞}^{∞} x f(x) dx                                               (4.53)

provided that the following conditions hold:

Σ_i |x_i| f(x_i) < ∞                                                      (4.54)

(known as absolute convergence) for discrete X, and

∫_{-∞}^{∞} |x| f(x) dx < ∞                                                (4.55)

(absolute integrability) for continuous X. If these conditions are not satisfied, then E(X) does not exist for X.
If the pdf is interpreted as an assignment of weights to point values of a
discrete random variable, or intervals of a continuous random variable, then
observe that E(X) is that value of the random variable that is the center of
gravity of the distribution.
Some important points to note about the mathematical expectation:
1. E(X) is not a random variable; it is an exactly defined real number;
2. When X has units, E(X) has the same units as X;
3. E(X) is often called the mean value of the random variable (or equivalently, of its distribution f(x)), represented as μ(X) or simply μ; thus:

E(X) = μ(X)                                                               (4.56)

Example 4.5 EXPECTED VALUE OF TWO DISCRETE RANDOM VARIABLES
(1) Find the expected value of the random variable, X, the total number of tails observed in the three coin-toss experiment, whose pdf f(x) is given in Table 4.1 and in Eq (4.27).
(2) Find the expected value of the random variable, X, the financial returns on the ball-draw game, whose pdf f(x) is given in Eq (4.47).
Solution:
(1) From the definition of E(X), we have in this case,

E(X) = (0 × 1/8 + 1 × 3/8 + 2 × 3/8 + 3 × 1/8) = 1.5                      (4.57)

indicating that with this experiment, the expected, or average, number of tails per toss is 1.5, which makes perfect sense.
(2) The expected financial return for the ball-draw game is obtained formally from Eq (4.47) as:

E(X) = (1 × 5/9 + 4 × 3/9 + 10 × 1/9) = 3.0                               (4.58)

as we had obtained earlier.
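The arithmetic in these examples is easy to verify numerically. The minimal Python sketch below is purely illustrative (the function name and the 100,000-turn simulation size are arbitrary choices); it evaluates Eq (4.52) for the two pdfs above and then checks the game's expected return by brute-force simulation of repeated draws.

```python
import random

def expected_value(pdf):
    """E(X) = sum_i x_i f(x_i) for a discrete pdf given as {x: f(x)} (Eq (4.52))."""
    return sum(x * p for x, p in pdf.items())

# pdf of Eq (4.47): winnings per draw in the ball-drawing game
game_pdf = {1: 5/9, 4: 3/9, 10: 1/9}
# pdf of Table 4.1 / Eq (4.27): number of tails in three coin tosses
tails_pdf = {0: 1/8, 1: 3/8, 2: 3/8, 3: 1/8}

print(expected_value(tails_pdf))   # 1.5, as in Eq (4.57)
print(expected_value(game_pdf))    # 3.0, as in Eq (4.58)

# Monte Carlo check: simulate n turns at the game and average the returns
n = 100_000
balls = ['R'] * 5 + ['B'] * 3 + ['G']
payoff = {'R': 1, 'B': 4, 'G': 10}
avg_return = sum(payoff[random.choice(balls)] for _ in range(n)) / n
print(avg_return)                  # close to 3; net gain per turn = 3 - 4 < 0
```

Increasing the number of simulated turns brings the simulated average ever closer to the theoretical value of 3, a connection between relative frequencies and probabilities that is taken up formally in Chapter 8.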


Example 4.6 EXPECTED VALUE OF TWO CONTINUOUS RANDOM VARIABLES
(1) Find the expected value of the random variable, X, whose pdf f(x) is given by:

f(x) = { (1/2)x,   0 < x < 2
         0,        otherwise                                              (4.59)

(2) Find the expected value of the random variable, X, the residence time in a CSTR, whose pdf f(x) is given in Eq (4.41).
Solution:
(1) First, we observe that Eq (4.59) is a legitimate pdf because

∫_0^2 f(x) dx = ∫_0^2 (1/2)x dx = (1/4)x² |_0^2 = 1                       (4.60)

and, by definition,

E(X) = ∫_0^2 x f(x) dx = (1/2) ∫_0^2 x² dx = (1/6)x³ |_0^2 = 4/3          (4.61)

(2) In the case of the residence time,

E(X) = ∫_{-∞}^{∞} x (1/τ) e^{-x/τ} dx = ∫_0^∞ (1/τ) x e^{-x/τ} dx         (4.62)

since the random variable X, residence time, takes no negative values. Upon integrating the RHS by parts, we obtain,

E(X) = [-x e^{-x/τ}]_0^∞ + ∫_0^∞ e^{-x/τ} dx = 0 - [τ e^{-x/τ}]_0^∞ = τ   (4.63)

indicating that the expected, or average, residence time is the reactor parameter τ, providing justification for why this parameter is known in chemical reactor design as the mean residence time.
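The defining integral of Eq (4.53) can also be evaluated numerically. The sketch below is an illustrative fragment only; it assumes the scipy library is available, and the value τ = 30 is an arbitrary choice. It reproduces E(X) = 4/3 for the pdf of Eq (4.59) and E(X) = τ for the residence-time pdf of Eq (4.41).

```python
import numpy as np
from scipy.integrate import quad

# Eq (4.59): f(x) = x/2 on (0, 2)
f1 = lambda x: 0.5 * x
EX1, _ = quad(lambda x: x * f1(x), 0, 2)
print(EX1)          # 1.3333... = 4/3, as in Eq (4.61)

# Eq (4.41): f(x) = (1/tau) exp(-x/tau) on (0, inf); tau = 30 is an arbitrary choice
tau = 30.0
f2 = lambda x: (1.0 / tau) * np.exp(-x / tau)
EX2, _ = quad(lambda x: x * f2(x), 0, np.inf)
print(EX2)          # 30.0 = tau, the mean residence time (Eq (4.63))
```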

An important property of the mathematical expectation of a random variable X is that for any function of this random variable, say G(X),

E[G(X)] = { Σ_i G(x_i) f(x_i),           for discrete X
            ∫_{-∞}^{∞} G(x) f(x) dx,     for continuous X                 (4.64)

provided that the conditions of absolute convergence and absolute integrability stated earlier for X in Eqs (4.54) and (4.55), respectively, hold for G(X).


In particular, if G(X) is a linear function, say for example,

G(X) = c_1 X + c_2                                                        (4.65)

where c_1 and c_2 are constants, then from Eq (4.64) above, in the discrete case, we have that:

E(c_1 X + c_2) = Σ_i (c_1 x_i + c_2) f(x_i)
               = c_1 Σ_i x_i f(x_i) + c_2 Σ_i f(x_i)
               = c_1 E(X) + c_2                                           (4.66)

so that:

E(c_1 X + c_2) = c_1 E(X) + c_2                                           (4.67)

Thus, treated like an operator, E(.) is a linear operator. Similar arguments follow for the continuous case, replacing sums with appropriate integrals (see end-of-chapter Exercise 4.12).
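A two-line numerical check of this linearity (an illustrative sketch; the game pdf of Eq (4.47) and the constants 2 and 5 are arbitrary choices): E(2X + 5) should equal 2E(X) + 5 = 11.

```python
pdf = {1: 5/9, 4: 3/9, 10: 1/9}                       # Eq (4.47)
E = lambda g: sum(g(x) * p for x, p in pdf.items())   # E[G(X)], discrete form of Eq (4.64)

print(E(lambda x: 2 * x + 5))    # 11.0
print(2 * E(lambda x: x) + 5)    # 11.0, confirming Eq (4.67)
```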

4.4 Characterizing Distributions

One of the primary utilities of the result in Eq (4.64) is for obtaining certain useful characteristics of the pdf f(x) by investigating the expectations of special cases of G(X).

4.4.1 Moments of a Distribution

Consider first the case where G(X) in Eq (4.64) is given as:

G(X) = X^k                                                                (4.68)

for any integer k. The expectation of this function is known as the kth (ordinary) moment of the random variable X (or, equivalently, the kth (ordinary) moment of the pdf, f(x)), defined by:

m_k = E[X^k]                                                              (4.69)

First (Ordinary) Moment: Mean
Observe that m_0 = 1 always for all random variables, X, and provided that E[|X|^k] < ∞, then the other k moments exist; in particular, the first moment

m_1 = E(X) = μ                                                            (4.70)

Thus, the expected value of X, E(X), is also the same as the first (ordinary) moment of X (or, equivalently, of the pdf f(x)).


Central Moments
Next, consider the case where G(X) in Eq (4.64) is given as:

G(X) = (X - a)^k                                                          (4.71)

for any constant value a and integer k. The expectation of this function is known as the kth moment of the random variable X about the point a (or, equivalently, the kth moment of the pdf, f(x), about the point a). Of particular interest are the moments about the mean value μ, defined by:

μ_k = E[(X - μ)^k]                                                        (4.72)

known as the central moments of the random variable X (or of the pdf, f(x)). Observe from here that μ_0 = 1, and μ_1 = 0, always, regardless of X or μ; these therefore provide no particularly useful information regarding the characteristics of any particular X. However, provided that the conditions of absolute convergence and absolute integrability hold, the higher central moments exist and do in fact provide very useful information about the random variable X and its distribution.
Second Central Moment: Variance
Observe from above that the quantity

μ_2 = E[(X - μ)²]                                                         (4.73)

is the lowest central moment of the random variable X that contains any meaningful information about the average deviation of a random variable from its mean value. It is called the variance of X and is sometimes represented as σ²(X). Thus,

μ_2 = E[(X - μ)²] = Var(X) = σ²(X).                                       (4.74)

Note that

σ²(X) = E[(X - μ)²] = E(X² - 2μX + μ²)                                    (4.75)

so that by the linearity of the E[.] operator, we obtain:

σ²(X) = E(X²) - μ² = E(X²) - [E(X)]²                                      (4.76)

or, in terms of the ordinary moments,

σ²(X) = m_2 - μ²                                                          (4.77)

It is easy to verify the following important properties of Var(X):
1. For constant b,

Var(b) = 0                                                                (4.78)

2. For constants a and b,

Var(aX + b) = a² Var(X)                                                   (4.79)

The positive square root of μ_2 is called the standard deviation of X, and is naturally represented by σ; it has the same units as X. The ratio of the standard deviation to the mean value of a random variable, known as the coefficient of variation, C_v, i.e.,

C_v = σ/μ                                                                 (4.80)

provides a dimensionless measure of the relative amount of variability displayed by the random variable.
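A quick numerical illustration of the shortcut formula in Eq (4.76) and the scaling property in Eq (4.79) is sketched below; the game pdf of Eq (4.47) and the constants a = 2, b = 7 are arbitrary illustrative choices.

```python
pdf = {1: 5/9, 4: 3/9, 10: 1/9}                      # Eq (4.47)
E = lambda g: sum(g(x) * p for x, p in pdf.items())

mu = E(lambda x: x)
var = E(lambda x: x**2) - mu**2                      # Eq (4.76): sigma^2 = E(X^2) - mu^2
print(mu, var)                                       # 3.0, 8.0

a, b = 2.0, 7.0                                      # arbitrary constants
var_lin = E(lambda x: (a*x + b)**2) - E(lambda x: a*x + b)**2
print(var_lin, a**2 * var)                           # both 32.0, confirming Eq (4.79)
```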
Third Central Moment: Skewness
The third central moment,

μ_3 = E[(X - μ)³]                                                         (4.81)

is called the skewness of the random variable; it provides information about the relative difference that exists between negative and positive deviations from the mean. It is therefore a measure of asymmetry. The dimensionless quantity

γ_3 = μ_3/σ³                                                              (4.82)

known as the coefficient of skewness, is often the more commonly used measure precisely because it is dimensionless. For a perfectly symmetric distribution, negative deviations from the mean exactly counterbalance positive deviations, and both μ_3 and γ_3 vanish.
When there are more values of X to the left of the mean than to the right (i.e. when negative deviations from the mean dominate), μ_3 < 0 (as is γ_3), and the distribution is said to "skew left" or is negatively skewed. Such distributions will have long left tails, as illustrated in Fig 4.3. An example random variable with this characteristic is the gasoline mileage (in miles per gallon) of cars in the US. While many cars get relatively high gas mileage, there remain a few classes of cars (SUVs, Hummers, etc.) with gas mileage much worse than the ensemble average. It is this latter class that contributes to the long left tail.
On the other hand, when there are more values of X to the right of the mean than to the left, so that positive deviations from the mean dominate, both μ_3 and γ_3 are positive, and the distribution is said to "skew right" or is positively skewed. As one would expect, such distributions will have long right tails (see Fig 4.4). An example of this class of random variables is the household income/net-worth in the US. While the vast majority of household incomes/net-worth are moderate, the few truly super-rich whose incomes/net-worth are a few orders of magnitude larger than the ensemble average contribute to the long right tail.

[FIGURE 4.3: Distribution of a negatively skewed random variable]

[FIGURE 4.4: Distribution of a positively skewed random variable]

[FIGURE 4.5: Distributions with reference kurtosis (solid line) and mild kurtosis (dashed line)]


Fourth Central Moment: Kurtosis
The fourth central moment,

μ_4 = E[(X - μ)⁴]                                                         (4.83)

is called the kurtosis of the random variable. Sometimes, it is the dimensionless version,

γ_4 = μ_4/σ⁴,                                                             (4.84)

technically known as the coefficient of kurtosis, that is simply called the kurtosis. Either quantity is a measure of how "peaked" or "flat" a probability distribution is. A high kurtosis random variable has a distribution with a sharper peak and thicker tails; the low kurtosis random variable, on the other hand, has a distribution with a more rounded, flatter peak, with broader shoulders.
For reasons discussed later, the value γ_4 = 3 is the accepted "normal" reference for kurtosis, so that distributions for which γ_4 < 3 are said to be platykurtic (mildly peaked) while those for which γ_4 > 3 are said to be leptokurtic (sharply peaked). Figures 4.5 and 4.6 show a reference distribution with kurtosis γ_4 = 3, in the solid lines, compared to a distribution with mild kurtosis (actually γ_4 = 1.8) (dashed line in Fig 4.5), and a distribution with high kurtosis (dashed line in Fig 4.6).
[FIGURE 4.6: Distributions with reference kurtosis (solid line) and high kurtosis (dashed line)]

Practical Applications
Of course, it is possible to compute as many moments (ordinary or central) of a distribution as we wish (and we shall shortly present a general expression from which one can generate all such moments); but the four specifically singled out above have been the most useful for characterizing random variables and their distributions in practice. They tell us much about the random variable we are dealing with.
The first (ordinary) moment, m_1 or μ, tells us about the location of the center of gravity (centroid) of the random variable, its "mean value"; and, as we show later, it is a popular candidate for the single value most representative of the ensemble. The second central moment, μ_2 or σ², the variance, tells us how tightly clustered or broadly dispersed the random variable is around its mean. The third central moment, μ_3, the skewness, tells us whether lower extreme values of the random variable are farther to the left of the centroid (the "ensemble average") than the higher extreme values are to the right (as is the case with automobile gas mileage in the US), or vice versa, with higher extreme values significantly farther to the right of the centroid than the lower extreme values (as is the case with household incomes/net worth in the US).
Just like the third central moment tells us how much of the average deviation from the mean is due to infrequent extreme values, the fourth central moment, μ_4 (the kurtosis), tells us how much of the variance is due to infrequent extreme deviations. With sharper peaks and thicker tails, extreme values in the tails contribute more to the variance, and the kurtosis is high (as in Fig 4.6); with flatter peaks and very little in terms of tails (as in Fig 4.5), there will be more contributions to the variance from central values, which naturally show modest deviations from the mean, and very little contribution from the extreme values; the kurtosis will therefore be lower.
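As a concrete, if somewhat artificial, illustration of these four descriptors, the sketch below (illustrative Python; the pdf of Eq (4.47) is reused purely as an example) computes the mean, variance, coefficient of skewness (Eq (4.82)) and coefficient of kurtosis (Eq (4.84)) directly from the definitions.

```python
def central_moment(pdf, k, mu):
    """k-th central moment E[(X - mu)^k] of a discrete pdf {x: f(x)} (Eq (4.72))."""
    return sum(((x - mu) ** k) * p for x, p in pdf.items())

pdf = {1: 5/9, 4: 3/9, 10: 1/9}          # Eq (4.47), for illustration
mu = sum(x * p for x, p in pdf.items())  # first ordinary moment (Eq (4.70))
var = central_moment(pdf, 2, mu)         # variance (Eq (4.73))
sigma = var ** 0.5
gamma3 = central_moment(pdf, 3, mu) / sigma ** 3   # coefficient of skewness (4.82)
gamma4 = central_moment(pdf, 4, mu) / sigma ** 4   # coefficient of kurtosis (4.84)

print(mu, var, gamma3, gamma4)
# mu = 3, var = 8; the positive gamma3 indicates a positively skewed (right-skewed) pdf
```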


Finally, we note that moments of a random variable are not merely interesting theoretical characteristics; they have significant practical applications.
For example, polymers, being macromolecules with non-uniform molecular
weights (because random events occurring during the manufacturing process
ensure that polymer molecules grow to varying sizes) are primarily characterized by their molecular weight distributions (MWDs). Not surprisingly, therefore, the performance of a polymeric material depends critically on its MWD:
for instance, with most elastomers, a narrow distribution (very low second
central moments) is associated with poor processing but superior mechanical
properties.
MWDs are so important in polymer chemistry and engineering that a wide
variety of analytical techniques have been developed for experimental determination of the MWD and the following special molecular weight averages
that are in common use:
1. Mn, the number average molecular weight, is the ratio of the first (ordinary) moment to the zeroth ordinary moment. (In polymer applications,
the MWD, unlike a pdf f (x), is not normalized to sum or integrate to
1. The zeroth moment of the MWD is therefore not 1; it is the total
number of molecules present in the sample of interest.)
2. Mw , the weight average molecular weight, is the ratio of the second
moment to the first moment; and
3. Mz , the so-called z average molecular weight, is the ratio of the third
moment to the second.
One other important practical characteristic of the polymeric material is its
polydispersity index, PDI, the ratio of Mw to Mn . A measure of the breadth
of the MWD, it is always > 1 and approximately 2 for most linear polymers;
for highly branched polymers, it can be as high as 20 or even higher.
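A minimal sketch of how these averages follow from the moments of an (unnormalized) MWD is given below; the Python code and the three-point distribution of chain masses are entirely hypothetical and are used only to make the ratios concrete.

```python
# Hypothetical (illustrative) MWD: n_i molecules with molecular weight M_i
M = [10_000.0, 50_000.0, 120_000.0]   # molecular weights (g/mol)
n = [500.0, 300.0, 200.0]             # number of molecules at each weight

def raw_moment(k):
    """k-th (unnormalized) moment of the MWD: sum_i n_i * M_i^k."""
    return sum(ni * Mi ** k for ni, Mi in zip(n, M))

Mn = raw_moment(1) / raw_moment(0)    # number average molecular weight
Mw = raw_moment(2) / raw_moment(1)    # weight average molecular weight
Mz = raw_moment(3) / raw_moment(2)    # z average molecular weight
PDI = Mw / Mn                         # polydispersity index, always >= 1

print(Mn, Mw, Mz, PDI)
```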
What is true of polymers is also true of particulate products such as granulated sugar, or fertilizer granules sold in bags. These products are made up
of particles with non-uniform sizes and are characterized by their particle size
distributions. The behavior of these products, whether it is their flow characteristics or how they dissolve in solution, is determined by the moments of
these distributions.

4.4.2 Moment Generating Function

When G(X) in Eq (4.64) is given as:

G(X) = e^{tX}                                                             (4.85)

the expectation of this function, when it exists, is the function:

M_X(t) = { Σ_i e^{tx_i} f(x_i),            for discrete X
           ∫_{-∞}^{∞} e^{tx} f(x) dx,      for continuous X               (4.86)


a function of the real-valued variable, t, known as the moment generating function (MGF) of X. M_X(t) is so called because all the (ordinary) moments of X can be generated from it as follows:
By definition,

M_X(t) = E[e^{tX}]                                                        (4.87)

and by differentiating with respect to t, we obtain,

M'_X(t) = d/dt E[e^{tX}] = E[d/dt (e^{tX})] = E[X e^{tX}]                 (4.88)

(The indicated swapping of the order of the differentiation and expectation operators is allowed under conditions that essentially imply the existence of the moments.) From here we easily obtain, for t = 0, that:

M'_X(0) = E(X) = m_1                                                      (4.89)

the first (ordinary) moment. Similarly, by differentiating once more, we obtain:

M''_X(t) = d/dt E[X e^{tX}] = E[X² e^{tX}]                                (4.90)

so that, for t = 0,

M''_X(0) = E[X²] = m_2                                                    (4.91)

and in general, after n such differentiations, we obtain

M_X^{(n)}(0) = E[X^n] = m_n                                               (4.92)

Now, it is also possible to establish this result by considering the following Taylor series expansion about the point t = 0,

e^{tX} = 1 + Xt + (X²/2!) t² + (X³/3!) t³ + ...                           (4.93)

Clearly, this infinite series converges only under certain conditions. For those random variables, X, for which the series does not converge, M_X(t) does not exist; but when it exists, this series converges, and by repeated differentiation of Eq (4.93) with respect to t, followed by taking expectations, we are then able to establish the result in Eq (4.92).
The following are some important properties of the MGF.
1. Uniqueness: The MGF, M_X(t), does not exist for all random variables, X; but when it exists, it uniquely determines the distribution, so that if two random variables have the same MGF, they have the same distribution. Conversely, random variables with different MGFs have different
distributions.

2. Linear Transformations: If two random variables Y and X are related according to the linear expression:

Y = aX + b                                                                (4.94)

for constant a and b, then:

M_Y(t) = e^{bt} M_X(at)                                                   (4.95)

3. Independent Sums: For independent random variables X and Y with respective MGFs M_X(t), and M_Y(t), the MGF of their sum Z = X + Y is:

M_Z(t) = M_{X+Y}(t) = M_X(t) M_Y(t)                                       (4.96)
Example 4.7 MOMENT GENERATING FUNCTION OF A CONTINUOUS RANDOM VARIABLE
Find the MGF, M_X(t), for the random variable, X, the residence time in a CSTR, whose pdf is given in Eq (4.41).
Solution:
In this case, the required M_X(t) is given by:

M_X(t) = E[e^{tX}] = ∫_0^∞ (1/τ) e^{tx} e^{-x/τ} dx = ∫_0^∞ (1/τ) e^{-(1-τt)x/τ} dx     (4.97)

Upon integrating the RHS appropriately, we obtain,

M_X(t) = [-(1/(1-τt)) e^{-(1-τt)x/τ}]_0^∞                                 (4.98)
       = 1/(1 - τt)                                                       (4.99)

From here, one easily obtains: m_1 = τ; m_2 = 2τ², ..., m_k = k!τ^k.
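The differentiations in Eqs (4.87)-(4.92) can also be carried out symbolically. The sketch below is illustrative only and assumes the sympy library; it takes the MGF of Eq (4.99) as given and checks that its derivatives at t = 0 reproduce the ordinary moments E[X^k] computed directly from the pdf, i.e. m_k = k!τ^k.

```python
import sympy as sp

x, t, tau = sp.symbols('x t tau', positive=True)
f = (1 / tau) * sp.exp(-x / tau)              # pdf of Eq (4.41)

# MGF of Eq (4.99), taken as given; valid for t < 1/tau
M = 1 / (1 - tau * t)

# Check Eq (4.92): the k-th derivative of M at t = 0 should equal E[X^k]
for k in range(1, 4):
    mk_from_mgf = sp.diff(M, t, k).subs(t, 0)
    mk_from_pdf = sp.integrate(x**k * f, (x, 0, sp.oo))
    print(k, sp.simplify(mk_from_mgf), sp.simplify(mk_from_pdf))
    # 1: tau, tau;  2: 2*tau**2, 2*tau**2;  3: 6*tau**3, 6*tau**3
```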

4.4.3 Characteristic Function

As alluded to above, the MGF does not exist for all random variables, a fact that sometimes limits its usefulness. However, a similarly defined function, the characteristic function, shares all the properties of the MGF but does not suffer from this primary limitation: it exists for all random variables.
When G(X) in Eq (4.64) is given as:

G(X) = e^{jtX}                                                            (4.100)

where j is the complex variable √(-1), then the function of the real-valued variable t defined as,

φ_X(t) = E[e^{jtX}]                                                       (4.101)


i.e.

φ_X(t) = { Σ_i e^{jtx_i} f(x_i),           for discrete X
           ∫_{-∞}^{∞} e^{jtx} f(x) dx,     for continuous X               (4.102)

is known as the characteristic function (CF) of the random variable X. Because of the definition of the complex exponential, whereby

e^{jtX} = cos(tX) + j sin(tX)                                             (4.103)

observe that

|e^{jtX}| = √(cos²(tX) + sin²(tX)) = 1                                    (4.104)

so that E[|e^{jtX}|] = 1 < ∞, always, regardless of X, with the direct implication that φ_X(t) = E(e^{jtX}) always exists for all random variables. Thus, anything one would have typically used the MGF for (e.g., for deriving limit theorems in advanced courses in probability), one can always substitute the CF when the MGF does not exist.
The reader familiar with Laplace transforms and Fourier transforms will probably have noticed the similarities between the former and the MGF (see Eq (4.86)), and between the latter and the CF (see Eq (4.102)). Furthermore, the relationship between these two probability functions is also reminiscent of the relationship between the two transforms: not all functions have Laplace transforms; the Fourier transform, on the other hand, does not suffer such limitations.
We now state, without proof, that given the expression for the characteristic function in Eq (4.102), there is a corresponding inversion formula whereby f(x) is recovered from φ_X(t), given as follows:

f(x) = { lim_{b→∞} (1/2b) ∫_{-b}^{b} e^{-jtx} φ_X(t) dt,     for discrete X
         (1/2π) ∫_{-∞}^{∞} e^{-jtx} φ_X(t) dt,               for continuous X      (4.105)

In fact, the two sets of equations, Eqs (4.102) and (4.105), are formal Fourier transform pairs, precisely as in other engineering applications of the theory of Fourier transforms. These transform pairs are extremely useful in obtaining the pdfs of functions of random variables, most especially sums of random variables. As with classic engineering applications of the Fourier (and Laplace) transform, the characteristic functions of the functions of independent random variables in question are obtained first, being easier to obtain directly than the pdfs; the inversion formula is subsequently invoked to recover the desired pdfs. This strategy is employed at appropriate places in upcoming chapters.

4.4.4 Additional Distributional Characteristics

Apart from the mean, variance and other higher moments noted above, there are other characteristic attributes of importance.

[FIGURE 4.7: The pdf of a continuous random variable X with a mode at x* = 1]


Mode
The mode, x*, of a distribution is that value of the random variable for which the pdf achieves a (local) maximum. For a discrete random variable, it is the value of X that possesses the maximum probability (the "most popular" value); i.e.

arg max_x {P(X = x)} = x*                                                 (4.106)

For a continuous random variable with a differentiable pdf, it is the value of x for which

df(x)/dx = 0;    d²f(x)/dx² < 0                                           (4.107)

as shown in Fig 4.7. A pdf having only one such maximum value is said to be unimodal; if more than one such maximum value exists, the distribution is said to be multimodal.
Median
The median of a distribution is that mid-point value x_m for which the cumulative distribution is exactly 1/2, i.e.

F(x_m) = P(X < x_m) = P(X > x_m) = 0.5                                    (4.108)

For a continuous random variable, x_m is the value for which

∫_{-∞}^{x_m} f(x) dx = ∫_{x_m}^{∞} f(x) dx = 0.5                          (4.109)

(For the discrete random variable, replace the integral above with appropriate sums.) Observe therefore that the median, x_m, divides the total range of the random variable into two parts with equal probability.

[FIGURE 4.8: The cdf of a continuous random variable X showing the lower and upper quartiles and the median]
For a symmetric unimodal distribution, the mean, mode and median coincide; they are different for asymmetric (skewed) distributions.
Quartiles
The concept of a median, which divides the cdf at the 50% point, can be extended to other values indicative of other fractional sectioning off of the cdf. Thus, by referring to the median as x_0.5, or x_50, we are able to define, in the same spirit, the following values of the random variable, x_0.25 and x_0.75 (or, in terms of percentages, x_25 and x_75 respectively) as follows:

F(x_0.25) = 0.25                                                          (4.110)

that value of X below which a quarter of the population resides; and

F(x_0.75) = 0.75                                                          (4.111)

the value of X below which lies three quarters of the population. These values are known respectively as the lower and upper quartiles of the distribution because, along with the median x_0.5, these values divide the population into four "quarters", each part with equal probability.
These concepts are illustrated in Fig 4.8 where the lower quartile is located at x = 1.02, the median at x = 1.58, and the upper quartile at x = 2.14. Thus, for this particular example, P(X < 1.02) = 0.25; P(1.02 < X < 1.58) = 0.25; P(1.58 < X < 2.14) = 0.25; and P(X > 2.14) = 0.25.

There is nothing restricting us to dividing the population in halves (median) or in quarters (quartiles); in general, for any 0 < q < 1, the qth quantile is defined as that value x_q of the random variable for which

F(x_q) = ∫_{-∞}^{x_q} f(x) dx = q                                         (4.112)

for a continuous random variable (with the integral replaced by the appropriate sum for the discrete random variable).
This quantity is sometimes defined instead in terms of percentiles, in which case, the qth quantile is simply the 100q percentile. Thus, the median is equivalently the half quantile, the 50th percentile, or the second quartile.
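Quantiles of a continuous random variable can be located numerically by solving F(x_q) = q for x_q. The sketch below is illustrative only: it assumes the scipy library, uses the exponential residence-time cdf F(x) = 1 - e^{-x/τ} (see Eq (4.129) later in this chapter), and the choice τ = 2 is arbitrary.

```python
import numpy as np
from scipy.optimize import brentq

tau = 2.0
F = lambda x: 1.0 - np.exp(-x / tau)     # cdf of the residence-time pdf, Eq (4.41)

def quantile(q, lo=0.0, hi=1e3):
    """Solve F(x_q) = q for x_q (Eq (4.112)) by bracketing root-finding."""
    return brentq(lambda x: F(x) - q, lo, hi)

for q in (0.25, 0.5, 0.75):
    print(q, quantile(q))                # exact answer is -tau*ln(1 - q)
```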

4.4.5 Entropy

A concept to be explored more completely in Chapter 10 is concerned with quantifying the information content contained in the statement, X = x, i.e. that the (discrete) random variable X has been observed to take on the specific value x. Whatever this information content is, it will clearly be related to the pdf, f(x); in fact, it has been shown to be defined as:

I[f(x)] = -log_2 f(x)                                                     (4.113)

Now, when G(X) in Eq (4.64) is defined as:

G(X) = -log_2 f(x)                                                        (4.114)

then the expectation in this case is the function H(x), defined as:

H(x) = { -Σ_i f(x_i) log_2 f(x_i),               for discrete X
         -∫_{-∞}^{∞} f(x) log_2 f(x) dx,         for continuous X         (4.115)

known as the entropy of the random variable, or its mean information content. Chapter 10 explores how to use the concept of information and entropy to develop appropriate probability models for practical problems in science and engineering.
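For the discrete case, Eq (4.115) is a one-line computation. The illustrative sketch below evaluates the entropy (in bits) of the ball-game pdf of Eq (4.47) and of a three-outcome pdf with equally likely values; the latter is included only for comparison and shows that the uniform distribution carries the larger mean information content.

```python
import math

def entropy_bits(pdf):
    """H = -sum_i f(x_i) log2 f(x_i), the discrete form of Eq (4.115)."""
    return -sum(p * math.log2(p) for p in pdf.values() if p > 0)

game_pdf = {1: 5/9, 4: 3/9, 10: 1/9}     # Eq (4.47)
uniform3 = {1: 1/3, 4: 1/3, 10: 1/3}     # equally likely outcomes, for comparison

print(entropy_bits(game_pdf))   # about 1.35 bits
print(entropy_bits(uniform3))   # log2(3) = 1.58 bits, the maximum for 3 outcomes
```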

4.4.6 Probability Bounds

We now know that the pdf f(x) of a random variable contains all the information about it to enable us to compute the probabilities of occurrence of various outcomes of interest. As valuable as this is, there are times when all we need are bounds on probabilities, not exact values. We now discuss some of the most important results regarding bounds on probabilities that can be determined for any general random variable, X, without specific reference to any particular pdf. These results are very useful in analyzing the behavior of random phenomena and have practical implications in determining values of unknown population parameters.
We begin with a general lemma from which we then derive two important results.

Lemma: Given a random variable X (with a pdf f(x)), and G(X) a function of this random variable such that G(X) > 0, for an arbitrary constant, c > 0,

P(G(X) ≥ c) ≤ E[G(X)]/c                                                   (4.116)

There are several different ways of proving this result; one of the most direct is shown below.
Proof: By definition,

E[G(X)] = ∫_{-∞}^{∞} G(x) f(x) dx                                         (4.117)

If we now divide the real line -∞ < x < ∞ into two mutually exclusive regions, A = {x : G(x) ≥ c} and B = {x : G(x) < c}, i.e. A is that region on the real line where G(x) ≥ c, and B is what is left, then Eq (4.117) becomes:

E[G(X)] = ∫_A G(x) f(x) dx + ∫_B G(x) f(x) dx                             (4.118)

and since G(X) is non-negative, the second integral is ≥ 0, so that

E[G(X)] ≥ ∫_A G(x) f(x) dx ≥ ∫_A c f(x) dx                                (4.119)

where the last inequality arises because, for all x ∈ A (the region over which we are integrating), G(x) ≥ c, with the net result that:

E[G(X)] ≥ c P(G(X) ≥ c)                                                   (4.120)

because the last integral is, by definition, cP(A). From here, we now obtain

P[G(X) ≥ c] ≤ E[G(X)]/c                                                   (4.121)

as required.

This remarkable result holds for all random variables, X, and for any non-negative functions of the random variable, G(X). Two specific cases of G(X) give rise to results of special interest.
Markov's Inequality
When G(X) = X, Eq (4.116) immediately becomes:

P(X ≥ c) ≤ E(X)/c                                                         (4.122)

a result known as Markov's inequality. It allows us to place bounds on probabilities when only the mean value of a random variable is known. For example, if the average number of inclusions on glass sheets manufactured in a specific site is known to be 2, then according to Markov's inequality, the probability of finding a glass sheet containing 5 or more inclusions at this manufacturing site can never exceed 2/5. Thus, if glass sheets containing 5 or more inclusions are considered unsaleable, without reference to any specific probability model of the random phenomenon in question, the plant manager concerned about making unsaleable product can, by appealing to Markov's inequality, be sure that things will never be worse than 2 in 5 unsaleable products.
It is truly remarkable, of course, that such statements can be made at all; but in fact, this inequality is actually quite conservative. As one would expect, with an appropriate probability model, one can be even more precise. (Table 2.1 in Chapter 2 in fact shows that the actual probability of obtaining 5 or more inclusions on glass sheets manufactured at this site is 0.053, nowhere close to the upper limit of 0.4 given by Markov's inequality.)
Chebychev's Inequality
Now let G(X) = (X - μ)², and c = k²σ², where μ is the mean value of X, and σ² is the variance, i.e. σ² = E[(X - μ)²]. In this case, Eq (4.116) becomes

P[(X - μ)² ≥ k²σ²] ≤ 1/k²                                                 (4.123)

which may be simplified to:

P(|X - μ| ≥ kσ) ≤ 1/k²,                                                   (4.124)

a result known as Chebychev's inequality. The implication is that 1/k² is an upper bound for the probability that any random variable will take on values that deviate from the mean by more than k standard deviations. This is still a rather weak inequality in the sense that in most cases, the indicated probability is far less than 1/k². Nevertheless, the added information of known σ helps sharpen the bounds a bit, when compared to Markov's inequality. For example, if we now add to the glass sheets inclusions information the fact that the variance is 2 (so that σ = √2), then the desired probability P(X ≥ 5) now translates to P(|X - 2| ≥ 3) since μ = 2. In this case, therefore, kσ = 3, and from Chebychev's inequality, we obtain:

P(|X - 2| ≥ 3) ≤ σ²/9 = 2/9                                               (4.125)

an upper bound which, even though still conservative, is nevertheless much sharper than the 2/5 obtained earlier from Markov's inequality.
Chebychev's inequality plays a significant role in Chapter 8 in establishing a fundamental result relating relative frequencies in repeatable experiments to the probabilities of occurrence of events.
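The relative sharpness of the two bounds in the glass-sheet illustration can be checked directly. In the illustrative sketch below, the exact tail probability is computed assuming a Poisson-type model with mean 2 for the number of inclusions; that modeling assumption is consistent with the 0.053 figure quoted above but is not derived here.

```python
import math

mu, var, c = 2.0, 2.0, 5.0

markov = mu / c                               # Eq (4.122): P(X >= 5) <= 2/5
k_sigma = c - mu                              # deviation from the mean, = 3
chebychev = var / k_sigma**2                  # Eq (4.124) with k*sigma = 3: bound = 2/9

# Exact tail probability assuming a Poisson model with mean 2:
# P(X >= 5) = 1 - sum_{x=0}^{4} e^{-2} 2^x / x!
exact = 1.0 - sum(math.exp(-mu) * mu**x / math.factorial(x) for x in range(5))

print(markov, chebychev, exact)   # 0.4, 0.222..., 0.0527 (the 0.053 quoted above)
```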

4.5 Special Derived Probability Functions

In studying phenomena involving lifetimes (of humans and other living organisms, or equipment, or, for that matter, social movements), or more generally in studying the elapsed time until the occurrence of specific events (studies that encompass the related problem of reliability of equipment and systems), the application of probability theory obviously still involves the use of the pdf f(x) and the cdf F(x), but in specialized forms unique to such problems. The following is a discussion of special probability functions, derived from f(x) and F(x), that have been customized for such applications. As a result, these special probability functions are exclusively for random variables that are (a) continuous, and (b) non-negative; they do not exist for random variables that do not satisfy these conditions.

4.5.1 Survival Function

The survival function, S(x), is the probability that the random variable X exceeds the specific value x; in lifetime applications, this translates to the probability that the object of study "survives" beyond the value x, i.e.

S(x) = P(X > x)                                                           (4.126)

From the definition of the cdf, F(x), we see immediately that

S(x) = 1 - F(x)                                                           (4.127)

so that where F(x) is a monotonically increasing function of x that starts at 0 and ends at 1, S(x) is the exact mirror image, monotonically decreasing from 1 to 0.
Example 4.8 SURVIVAL FUNCTION OF A CONTINUOUS RANDOM VARIABLE
Find the survival function S(x), for the random variable, X, the residence time in a CSTR, whose pdf is given in Eq (4.41). This function directly provides the probability that any particular dye molecule "survives" in the CSTR beyond a time x.
Solution:
Observe first that this random variable is continuous and non-negative so that the desired S(x) does in fact exist. The required S(x) is given by

S(x) = ∫_x^∞ (1/τ) e^{-u/τ} du = e^{-x/τ}                                 (4.128)

We could equally well have arrived at the result by noting that the cdf F(x) for this random variable is given by:

F(x) = (1 - e^{-x/τ}).                                                    (4.129)

Note from Eq (4.128) that with increasing x (residence time), survival becomes smaller; i.e. the probability of still finding a dye molecule in the reactor after a time x has elapsed diminishes exponentially with x.

4.5.2 Hazard Function

In reliability and life-testing studies, it is useful to have a means of directly computing the probability of failure in the intervals beyond the current time, x, for entities that have survived thus far; i.e. probabilities of failure conditioned on survival until x. The hazard function, h(x), defined as follows:

h(x) = f(x)/S(x) = f(x)/[1 - F(x)]                                        (4.130)

provides just such a function. It does for future failure what f(x) does for lifetimes in general. Recall that by definition, because X is continuous, f(x) provides the (unconditional) probability of a lifetime in the infinitesimal interval {x_i < X < x_i + dx} as f(x_i)dx; in the same manner, the probability of failure occurring in that same interval, given that the object of study survived until the beginning of the current time interval, x_i, is given by h(x_i)dx. In general,

h(x)dx = f(x)dx/S(x) = P(x < X < x + dx)/P(X > x)                         (4.131)

so that, from the definition of conditional probability given in Chapter 3, h(x)dx is seen as equivalent to P(x < X < x + dx | X > x). h(x) is therefore sometimes referred to as the "death rate" or "failure rate" at x of those surviving until x (i.e. of those "at risk" at x); it describes how the risk of failure changes with age.

Example 4.9 HAZARD FUNCTION OF A CONTINUOUS RANDOM VARIABLE
Find the hazard function h(x), for the random variable, X, the residence time in a CSTR.
Solution:
From the given pdf and the survival function obtained in Example 4.8 above, the required function h(x) is given by,

h(x) = [(1/τ) e^{-x/τ}] / e^{-x/τ} = 1/τ                                  (4.132)

a constant, with the interesting implication that the probability that a dye molecule exits the reactor immediately after time x, given that it had stayed in the reactor until then, is independent of x. Thus, molecules that have "survived" in the reactor until x have the same chance of exiting the reactor immediately after this time as the chance of exiting at any other time in the future: no more, no less. Such a random variable is said to be "memoryless"; how long it lasts beyond the current time does not depend on its current age.
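These relationships are easy to tabulate. The illustrative sketch below (τ = 30 minutes is an arbitrary choice, and the grid of residence times is hypothetical) evaluates f(x), S(x) and h(x) of Eqs (4.41), (4.128) and (4.132) and confirms numerically that the hazard stays constant at 1/τ, the memoryless property just noted.

```python
import numpy as np

tau = 30.0                                   # mean residence time, arbitrary choice
x = np.array([5.0, 15.0, 30.0, 60.0, 120.0]) # residence times (minutes)

f = (1.0 / tau) * np.exp(-x / tau)           # pdf, Eq (4.41)
S = np.exp(-x / tau)                         # survival function, Eq (4.128)
h = f / S                                    # hazard function, Eq (4.130)

for xi, Si, hi in zip(x, S, h):
    print(f"x = {xi:6.1f}  S(x) = {Si:.4f}  h(x) = {hi:.4f}")
# S(x) decays exponentially with x, while h(x) = 1/tau = 0.0333 at every x
```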

4.5.3 Cumulative Hazard Function

Analogous to the cdf, F(x), the cumulative hazard function, H(x), is defined as:

H(x) = ∫_0^x h(u) du                                                      (4.133)

It can be shown that H(x) is related to the more well-known F(x) according to

F(x) = 1 - e^{-H(x)}                                                      (4.134)

and that the relationship between S(x) and H(x) is given by:

S(x) = e^{-H(x)}                                                          (4.135)

or, conversely,

H(x) = -log[S(x)]                                                         (4.136)

4.6

Summary and Conclusions

We are now in a position to look back at this chapter and observe, with
some perspective, how the introduction of the seemingly innocuous random
variable, X, has profoundly aected the analysis of randomly varying phenomena in a manner analogous to how the introduction of the unknown
quantity, x, transformed algebra and the solution of algebraic problems. We
have seen how the random variable, X, maps the sometimes awkward and

Random Variables and Distributions

125

tedious sample space, , into a space of real numbers; how this in turn leads
to the emergence of f (x), the probability distribution function (pdf); and
how f (x) has essentially supplanted and replaced the probability set function,
P (A), the probability analysis tool in place at the end of Chapter 3.
The full significance of the role of f(x) in random phenomena analysis may not be completely obvious now, but it will become more so as we progress in our studies. So far, we have used it to characterize the random variable in terms of its mathematical expectation, and the expectation of various other functions of the random variable. And this has led, among other things, to our first encounter with the mean, variance, skewness and kurtosis of a random variable, important descriptors of data that we are sure to encounter again later (in Chapter 12 and beyond).
Despite initial appearances, every single topic discussed in this chapter finds useful application in later chapters. In the meantime, we have taken pains to try and breathe some practical life into many of these typically dry and formal definitions and mathematical functions. But if some, especially the moment generating function, the characteristic function, and entropy, still appear to be of dubious practical consequence, such lingering doubts will be dispelled completely by Chapters 6, 8, 9 and 10. Similarly, the probability bounds (especially Chebyshev's inequality) will be employed in Chapter 8, and the special functions of Section 4.5 will be used extensively in their more natural setting in Chapter 23.
The task of building an efficient machinery for random phenomena analysis, which began in Chapter 3, is now almost complete. But before the generic pdf, f(x), introduced and characterized in this chapter begins to take on specific, distinct "personalities" for various random phenomena, some residual issues remain to be addressed in order to complete the development of the probability machinery. Specifically, the discussion in this chapter will be extended to higher dimensions in Chapter 5, and the characteristics of functions of random variables will be explored in Chapter 6. Chapter 7 is devoted to two application case studies that put the complete set of discussions in Part II in perspective.
Here are some of the main points of the chapter again.
• Formally, the random variable, X (discrete or continuous), assigns to each element ω ∈ Ω one and only one real number, X(ω) = x, thereby mapping Ω onto a new space, V; informally, it is an experimental outcome whose numerical value is subject to random variations with each exact replicate trial of the experiment.
• The introduction of the random variable, X, leads directly to the emergence of f(x), the probability distribution function; it represents how the probabilities of occurrence of all the possible outcomes of the random experiment of interest are distributed over the entire random variable space, and is a direct extension of P(A).

• The cumulative distribution function (cdf), F(x), is P(X ≤ x); if discrete, F(x_i) = Σ_{j=0}^{i} f(x_j); if continuous, F(x) = ∫_{-∞}^{x} f(x) dx, so that if differentiable, dF(x)/dx = f(x).
• The mathematical expectation of a random variable, E(X), is defined as:

E(X) = { Σ_i x_i f(x_i),            discrete;
         ∫_{-∞}^{∞} x f(x) dx,      continuous

It exists only when Σ_i |x_i| f(x_i) < ∞ (absolute convergence for discrete random variables) or ∫_{-∞}^{∞} |x| f(x) dx < ∞ (absolute integrability for continuous random variables).
• E[G(X)] provides various characterizations of the random variable, X, for various functions G(X):
  - G(X) = (X - μ)^k yields the kth central moment of X;
  - G(X) = e^{tX} and G(X) = e^{jtX} respectively yield the moment generating function (MGF), and the characteristic function (CF), of X;
  - G(X) = -log_2 f(x) yields the entropy of X.
• The mean, μ, indicates the central location or "center of gravity" of the random variable while the variance, skewness and kurtosis indicate the shape of the distribution in relation to the mean. Additional characterization is provided by the mode, where the distribution is maximum, and by the median, which divides the distribution into two equal probability halves; the quartiles, which divide the distribution into four equal probability quarters; or, more generally, the percentiles, which divide the distribution into 100 equal probability portions.
• Lifetimes and related phenomena are more conveniently studied with special probability functions, which include:
  - The survival function, S(x), the probability that X exceeds the value x; by definition, it is related to F(x) according to S(x) = 1 - F(x);
  - The hazard function, h(x), which does for future failure probabilities what f(x) does for lifetime probabilities; and
  - The cumulative hazard function, H(x), which is to the hazard function, h(x), what the cdf F(x) is to the pdf f(x).


REVIEW QUESTIONS
1. Why is the raw sample space, Ω, often tedious to describe and inefficient to analyze mathematically?
2. Through what means is the general sample space converted into a space with real numbers?
3. Formally, what is a random variable?
4. What two mathematical transformations occur as a consequence of the formal introduction of the random variable, X?
5. How is the induced probability set function, P_X, related to the probability set function, P, defined on Ω?
6. What is the pre-image, A , of the set A?
7. What is the relationship between the random variable, X, and the associated real number, x? What does the expression, P(X = x), indicate?
8. When does the sample space, Ω, naturally occur in the form of the random variable space, V?
9. Informally, what is a random variable?
10. What is the difference between a discrete random variable and a continuous one?
11. What is the pdf, f(x), and what does it represent for the random variable, X?
12. What is the relationship between the pdf, f(x_i), and the cdf, F(x_i), for a discrete random variable, X?
13. What is the relationship between the pdf, f(x), and the cdf, F(x), for a continuous random variable, X?
14. Define mathematically the expected value, E(X), for a discrete random variable and for a continuous one.
15. What conditions must be satisfied for E(X) to exist?
16. Is E(X) a random variable and does it have units?
17. What is the relationship between the expected value, E(X), and the mean value, μ, of a random variable (or equivalently, of its distribution)?
18. Distinguish between ordinary moments and central moments of a random variable.


19. What are the common names by which the second, third and fourth central moments of a random variable are known?
20. What is C_v, the coefficient of variation of a random variable?
21. What is the distinguishing characteristic of a skewed distribution (positive or negative)?
22. Give an example each of a negatively skewed and a positively skewed randomly varying phenomenon.
23. What do the mean, variance, skewness, and kurtosis tell us about the distribution of the random variable in question?
24. What do Mn, Mw, and Mz represent for a polymer material?
25. What is the polydispersity index of a polymer and what does it indicate about the molecular weight distribution?
26. Define the moment generating function (MGF) of a random variable, X. Why is it called by this name?
27. What is the uniqueness property of the MGF?
28. Define the characteristic function of a random variable, X. What distinguishes it from the MGF?
29. How are the MGF and characteristic function (CF) of a random variable related to the Laplace and Fourier transforms?
30. Define the mode, median, quartiles and percentiles of a random variable.
31. Within the context of this chapter, what is Entropy?
32. Define Markov's inequality. It allows us to place probability bounds when what is known about the random variable?
33. Define Chebychev's inequality.
34. Which probability bound is sharper, the one provided by Markov's inequality or the one provided by Chebychev's?
35. What are the defining characteristics of those random variables for which the special probability functions, the survival and hazard functions, are applicable? These functions are used predominantly in studying what types of phenomena?
36. Define the survival function, S(x). How is it related to the cdf, F(x)?


37. Define the hazard function, h(x). How is it related to the pdf, f(x)?
38. Define the cumulative hazard function, H(x). How is it related to the cdf, F(x), and the survival function, S(x)?

EXERCISES
Section 4.1
4.1 Consider a family that plans to have a total of three children; assuming that they will not have any twins, generate the sample space, Ω, for the possible outcomes. By defining the random variable, X, as the total number of female children born to this family, obtain the corresponding random variable space, V. Given that this particular family is genetically predisposed to having boys, with a probability, p = 0.75, of giving birth to a boy, obtain the probability that this family will have three boys and compare it to the probability of having other combinations.
4.2 Revisit Example 4.1 in the text, and this time, instead of tossing a coin three times, it is tossed 4 times. Generate the sample space, Ω; and using the same definition of X as the total number of tails, obtain the random variable space, V, and compute anew the probability of A, the event that X = 2.
4.3 Given the spaces Ω and V for the double dice toss experiment in Example 4.3 in the text,
(i) Compute the probability of the event A that X = 7;
(ii) If B is the event that X = 6, and C the event that X = 10 or X = 11, compute
P (B) and P (C).
Section 4.2
4.4 Revisit Example 4.3 in the text on the double dice toss experiment and obtain
the complete pdf f (x) for the entire random variable space. Also obtain the cdf,
F (x). Plot both distribution functions.
4.5 Given the following probability distribution function for a discrete random variable, X,

    x       1      2      3      4      5
    f(x)    0.10   0.25   0.30   0.25   0.10

(i) Obtain the cdf F(x).
(ii) Obtain P(X ≤ 3); P(X < 3); P(X > 3); P(2 ≤ X ≤ 4).
4.6 A particular discrete random variable, X, has the cdf

F(x) = (x/n)^k;  x = 1, 2, ..., n                                         (4.137)

where k and n are constants characteristic of the underlying random phenomenon. Determine f(x), the pdf for this random variable, and, for the specific values k = 2, n = 8, compute and plot f(x) and F(x).


4.7 The random variable, X, has the following pdf:

f(x) = { cx,   0 < x < 1
         0,    otherwise                                                  (4.138)

(i) First obtain the value of the constant, c, required for this to be a legitimate pdf, and then obtain an expression for the cdf F(x).
(ii) Obtain P(X ≤ 1/2) and P(X ≥ 1/2).
(iii) Obtain the value x_m such that

P(X ≤ x_m) = P(X ≥ x_m)                                                   (4.139)

4.8 From the distribution of residence times in an ideal CSTR given in Eq (4.41), determine, for a reactor with average residence time, τ = 30 mins, the probability that a reactant molecule (i) spends less than 30 mins in the reactor; (ii) spends more than 30 mins in the reactor; (iii) spends less than (30 ln 2) mins in the reactor; and (iv) spends more than (30 ln 2) mins in the reactor.
Section 4.3
4.9 Determine E(X) for the discrete random variable in Exercise 4.5; for the continuous random variable in Exercise 4.6; and establish that E(X) for the residence
time distribution in Eq (4.41) is τ, thereby justifying why this parameter is known
as the mean residence time.
4.10 (Adapted from Stirzaker, 2003¹) Show that E(X) exists for the discrete random variable, X, with the pdf:

f(x) = 4/[x(x + 1)(x + 2)];  x = 1, 2, ...                                (4.140)

while E(X) does not exist for the discrete random variable with the pdf

f(x) = 1/[x(x + 1)];  x = 1, 2, ...                                       (4.141)

4.11 Establish that E(X) = 1/p for a random variable X whose pdf is

f(x) = p(1 - p)^{x-1};  x = 1, 2, 3, ...                                  (4.142)

by differentiating with respect to p both sides of the expression:

Σ_{x=1}^{∞} p(1 - p)^{x-1} = 1                                            (4.143)

4.12 From the definition of the mathematical expectation function, E(.), establish that for the random variable, X, discrete or continuous:

E[k_1 g_1(X) + k_2 g_2(X)] = k_1 E[g_1(X)] + k_2 E[g_2(X)],               (4.144)

and that given E(X) = μ,

E[(X - μ)³] = E(X³) - 3μσ² - μ³                                           (4.145)

¹ D. Stirzaker, (2003). Elementary Probability, 2nd Ed., Cambridge University Press, p120.


where σ² is the variance, defined by σ² = Var(X) = E[(X - μ)²].


Section 4.4
4.13 For two random variables X and Y, and a third random variable defined as

Z = X - Y                                                                 (4.146)

show, from the definition of the expectation function, that regardless of whether the random variables are continuous or discrete,

E(Z) = E(X) - E(Y);  i.e., μ_Z = μ_X - μ_Y                                (4.147)

and that

Var(Z) = Var(X) + Var(Y)                                                  (4.148)

when E[(X - μ_X)(Y - μ_Y)] = 0 (i.e., when X and Y are independent: see Chapter 5).
4.14 Given that the pdf of a certain discrete random variable X is:

f(x) = (λ^x e^{-λ})/x!;  x = 0, 1, 2, ...                                 (4.149)

Establish the following results:

Σ_{x=0}^{∞} f(x) = 1                                                      (4.150)

E(X) = λ                                                                  (4.151)

Var(X) = λ                                                                (4.152)

4.15 Obtain the variance and skewness of the discrete random variable in Exercise
4.5 and for the continuous random variable in Exercise 4.6. Which random variable's distribution is skewed and which is symmetric?
4.16 From the formal definitions of the moment generating function, establish Eqns (4.95) and (4.96).
4.17 Given the pdf for the residence time for two identical CSTRs in series as

f(x) = (1/τ²) x e^{-x/τ}                                                  (4.153)

(i) obtain the MGF for this pdf and compare it with that derived in Example 4.7 in the text. From this comparison, what would you conjecture to be the MGF for the distribution of residence times for n identical CSTRs in series?
(ii) Obtain the characteristic function for the pdf in Eq (4.41) for the single CSTR and also for the pdf in Eq (4.153) for two CSTRs. Compare the two characteristic functions and conjecture what the corresponding characteristic function will be for the distribution of residence times for n identical CSTRs in series.
4.18 Given that M(t) is the moment generating function of a random variable, define the psi-function, ψ(t), as:

ψ(t) = ln M(t)                                                            (4.154)


(i) Prove that ψ'(0) = μ, and ψ''(0) = σ², where each prime indicates differentiation with respect to t; and E(X) = μ is the mean of the random variable, and σ² is the variance, defined by σ² = Var(X) = E[(X - μ)²].
(ii) Given the pdf of a discrete random variable X as:

f(x) = (λ^x e^{-λ})/x!;  x = 0, 1, 2, ...

obtain its ψ(t) function and show, using the results in (i) above, that the mean and variance of this pdf are identical.
4.19 The pdf for the yield data discussed in Chapter 1 was postulated as

f(y) = [1/(σ√(2π))] e^{-(y-μ)²/(2σ²)};  -∞ < y < ∞                        (4.155)

If we are given that μ is the mean, first establish that the mode is also μ, and then use the fact that the distribution is perfectly symmetric about μ to establish that the median is also μ, hence confirming that for this distribution, the mean, mode and median coincide.
4.20 Given the pdf:

f(x) = (1/π) · 1/(1 + x²);  -∞ < x < ∞                                    (4.156)

find the mode and the median and show that they coincide. For extra credit: Establish that μ = E(X) does not exist.

4.21 Compute the median and the other quartiles for the random variable whose pdf is given as:

f(x) = { x/2,   0 < x < 2
         0,     otherwise                                                 (4.157)
4.22 Given the binary random variable, X, that takes the value 1 with probability p, and the value 0 with probability (1 - p), so that its pdf is given by

f(x) = { 1 - p,   x = 0;
         p,       x = 1;                                                  (4.158)
         0,       elsewhere,

obtain an expression for the entropy H(X) and show that it is maximized when p = 0.5, taking on the value H*(X) = 1 at this point.
Section 4.5
4.23 First show that the cumulative hazard function, H(x), for the random variable, X, the residence time in a CSTR, is the linear function,

H(x) = ηx                                                                 (4.159)

(where η = 1/τ). Next, for a related random variable, Y, whose cumulative hazard function is given by

H(y) = (ηy)^ζ                                                             (4.160)

where ζ is a constant parameter, show that the corresponding survival function is

S(y) = e^{-(ηy)^ζ}                                                        (4.161)

and from here obtain the pdf, f(y), for this random variable.
4.24 Given the pdf for the residence time for two identical CSTRs in series in Exercise 4.17, Eq (4.153), determine the survival function, S(x), and the hazard function,
h(x). Compare them to the corresponding results obtained for the single CSTR in
Example 4.8 and Example 4.9 in the text.

APPLICATION PROBLEMS
4.25 Before an automobile parts manufacturer takes full delivery of polymer resins made by a supplier in a reactive extrusion process, a sample is processed and the performance is tested for "Toughness." The batch is either accepted (if the processed sample's Toughness equals or exceeds 140 J/m³) or it is rejected. As a result of process and raw material variability, the acceptance/rejection status of each batch varies randomly. If the supplier sends four batches weekly to the parts manufacturer, and each batch is made independently on the extrusion process, so that the ultimate fate of one batch is independent of the fate of any other batch, define X as the random variable representing the number of acceptable batches a week and answer the following questions:
(i) Obtain the sample space, Ω, and the corresponding random variable space, V.
(ii) First, assume equal probability of acceptance and rejection, and obtain the pdf, f(x), for the entire sample space. If, for long term profitability, it is necessary that at least 3 batches be acceptable per week, what is the probability that the supplier will remain profitable?
4.26 Revisit Problem 4.25 above and consider that after an extensive process and
control system improvement project, the probability of acceptance of a single batch
is improved to 0.8; obtain the new pdf, f (x). If the revenue from a single acceptable
batch is $20,000, but every rejected batch costs the supplier $8,000 in retrieval and
incineration fees, which will be deducted from the revenue, what is the expected net
revenue per week under the current circumstances?
4.27 A gas station situated on a back country road has only one gasoline pump and one attendant and, on average, receives λ = 3 (cars/hour). The average rate at which this lone attendant services the cars is μ (cars/hour). It can be shown that the total number of cars at this gas station at any time (i.e. the one currently being served, and those waiting in line to be served) is the random variable X with the following pdf:

f(x) = (1 - λ/μ)(λ/μ)^x;  x = 0, 1, 2, ...                                (4.162)

(i) Show that so long as λ < μ, the probability that the line at the gas station is infinitely long is zero.
(ii) Find the value of μ required so that the expected value of the total number of cars at the station is 2.
(iii) Using the value obtained in (ii), find the probability that there are more than two cars at the station, and also the probability that there are no cars.
4.28 The distribution of income of families in the US in 1979 (in actual dollars uncorrected for inflation) is shown in the table below:

    Income level, x     Percent of Population
    (× $10³)            with income level, x
    0-5                  4
    5-10                13
    10-15               17
    15-20               20
    20-25               16
    25-30               12
    30-35                7
    35-40                4
    40-45                3
    45-50                2
    50-55                1
    > 55                 1

(i) Plot the data histogram and comment on the shape.


(ii) Using the center of the interval to represent each income group, determine the
mean, median, mode; and the variance and skewness for this data set. Comment on
how consistent the numerical values computed for these characteristics are with the
shape of the histogram.
(iii) If the 1979 population is broadly classified according to income into "Lower Class" for income range (in thousands of dollars) 0-15, "Middle Class" for income range 15-50, and "Upper Class" for income range > 50, what is the probability that two people selected at random and sequentially to participate in a survey from the Census Bureau (in preparation for the 1980 census) are (a) both from the "Lower Class", (b) both from the "Middle Class", (c) one from the "Middle Class" and one from the "Upper Class", and (d) both from the "Upper Class"?
(iv) If, in 1979, engineers with at least 3 years of college education (excluding graduate students) constitute approximately 1% of the population (2.2 million out of 223 million) and span the income range from 20-55, determine the probability that an individual selected at random from the population is in the middle class given that he/she is an engineer. Determine the converse, that the person selected at random is an engineer given that he/she is in the middle class.
4.29 Life-testing results on a first generation microprocessor-based (computer-controlled) toaster indicate that X, the life-span (in years) of the central control
chip, is a random variable that is reasonably well-modeled by the pdf:

        f(x) = (1/β) e^{−x/β};  x > 0                                       (4.163)

with β = 6.25. A malfunctioning chip will have to be replaced to restore proper
toaster operation.
(i) The warranty for the chip is to be set at xw years (in whole integers) such that
no more than 15% would have to be replaced before the warranty period expires.
Find xw.
(ii) In planning for the second generation toaster, design engineers wish to set a
target value to aim for (β = β2) such that 85% of the second generation chips survive
beyond 3 years. Determine β2 and interpret your results in terms of the implied
fold increase in mean life-span from the first to the second generation of chips.
4.30 The probability of a single transferred embryo resulting in a live birth in an
in-vitro fertilization treatment, p, is given as 0.5 for a younger patient and 0.2 for
an older patient. When n = 5 embryos are transferred in a single treatment, it is
also known that if X is the total number of live births resulting from this treatment,
then E(X) = 2.5 for the younger patient and E(X) = 1 for the older patient, and
the associated variance, Var(X) = 1.25 for the younger and Var(X) = 0.8 for the
older.
(i) Use Markov's inequality and Chebyshev's inequality to obtain bounds on the
probability of each patient giving birth to quadruplets or quintuplets at the end
of the treatment.
(ii) These bounds are known to be quite conservative, but to determine just how
conservative, compute the actual probabilities of the stated events for each patient
given that an appropriate pdf for X is

        f(x) = [5!/(x!(5 − x)!)] p^x (1 − p)^{5−x}                          (4.164)

where p is as given above. Compare the actual probabilities with the Markov and
Chebyshev bounds and identify which bound is sharper.
4.31 The following data table, obtained from the United States Life Tables 1969–71
(published in 1973 by the National Center for Health Statistics), shows the probability of survival until the age of 65 for individuals of the given age.²

    Age, y     Probability of survival to age 65
     0                    0.72
    10                    0.74
    20                    0.74
    30                    0.75
    35                    0.76
    40                    0.77
    45                    0.79
    50                    0.81
    55                    0.85
    60                    0.90

The data should be interpreted as follows: the probability that all newborns, and
children up to the age of ten, survive until 65 years of age is 0.72; for those older
than 10 and up to 20 years, the probability of survival to 65 years is 0.74, and so on.

    ² More up-to-date versions, available, for example, in National Vital Statistics Reports,
Vol. 56, No. 9, December 28, 2007, contain far more detailed information.

Assuming that the data is still valid in 1975, a community cooperative wishes to
set up a life insurance program that year whereby each participant pays a relatively
small annual premium, $α, and, in the event of death before 65 years, a one-time
death gratuity payment of $β is made to the participant's designated beneficiary.
If the participant survives beyond 65 years, nothing is paid. If the cooperative is
to realize a fixed, modest expected revenue, RE = $30, per year, per participant,
over the duration of his/her participation (mostly to cover administrative and other
costs), provide answers to the following questions:
(i) For a policy based on a fixed annual premium of $90 for all participants, and an
age-dependent payout, determine values for β(y), the published payout for a person
of age y that dies before age 65, for all values of y listed in this table.
(ii) For a policy based instead on a fixed death payout of $8,000, and age-dependent
annual premiums, determine values for α(y), the published annual premium to be
collected each year from a participant of age y.
(iii) If it becomes necessary to increase the expected revenue by 50% as a result
of increased administrative and overhead costs, determine the effect on each of the
policies in (i) and (ii) above.
(iv) If by 1990, the probabilities of survival have increased across the board by 0.05,
determine the effect on each of the policies in (i) and (ii).

Chapter 5
Multidimensional Random Variables

5.1  Introduction and Definitions ............................................  137
     5.1.1  Perspectives .....................................................  138
     5.1.2  2-Dimensional (Bivariate) Random Variables .......................  139
     5.1.3  Higher-Dimensional (Multivariate) Random Variables ...............  140
5.2  Distributions of Several Random Variables ...............................  141
     5.2.1  Joint Distributions ..............................................  141
     5.2.2  Marginal Distributions ...........................................  144
     5.2.3  Conditional Distributions ........................................  147
     5.2.4  General Extensions ...............................................  152
5.3  Distributional Characteristics of Jointly Distributed Random Variables ..  153
     5.3.1  Expectations .....................................................  154
            Marginal Expectations ............................................  155
            Conditional Expectations .........................................  156
     5.3.2  Covariance and Correlation .......................................  157
     5.3.3  Independence .....................................................  158
5.4  Summary and Conclusions .................................................  163
     REVIEW QUESTIONS ........................................................  164
     EXERCISES ...............................................................  166
     APPLICATION PROBLEMS ....................................................  168

Servant of God, well done,
well hast thou fought the better fight,
who single hast maintained,
against revolted multitudes the cause of truth,
in word mightier than they in arms.
John Milton (1608–1674)

When the outcome of interest in an experiment is not one, but two or more
variables simultaneously, additional issues arise that are not fully addressed
by the probability machinery as it stands at the end of the last chapter. The
concept of the random variable, restricted as it currently is to the single, one-dimensional random variable X, needs to be extended to higher dimensions;
and doing so is the sole objective of this chapter. With the introduction of a
few new concepts, new varieties of the probability distribution function (pdf)
emerge along with new variations on familiar results; together, they expand
and supplement what we already know about random variables and bring to
a conclusion the discussion we started in Chapter 4.
5.1  Introduction and Definitions

5.1.1  Perspectives

Consider a clinical study of the additional effects of the Type 2 diabetes
drug, Avandia®, in which a group of 193 patients with type 2 diabetes who
had undergone cardiac bypass surgery were randomly assigned to receive the
drug or a placebo. After one year, the researchers reported that patients taking
Avandia not only had better blood sugar control, they also showed improved
cholesterol levels, fewer signs of inflammation of blood vessels, and lower blood
pressure, compared with those on a placebo.¹
Extracting the desired scientific information accurately and efficiently from
the clinical study data, of course, relies on many principles of probability,
statistical analysis and experimental design, issues that are not of concern
at this moment. For purposes of this chapter's discussion, we restrict our
attention to the basic (but central) fact that for each patient in the study,
the result of interest involves not one but several variables simultaneously,
including: (i) blood sugar level, (ii) cholesterol levels (more specifically, the
low-density lipoprotein, or LDL, version, and the high-density lipoprotein,
or HDL, version), and (iii) blood pressure (more specifically, the systolic and
the diastolic pressures).
This is a real-life example of an experiment whose outcome is intrinsically
multivariate, consisting of several distinct variables, each subject to random
variability. As it currently stands, the probability machinery of Chapter 4 is
only capable of dealing with one single random variable at a time. As such,
we are only able to use it to characterize the variability inherent in each of the
variables of interest one at a time. This raises some important questions that
we did not have to contend with when dealing with single random variables:
1. Do these physiological variables vary jointly or separately? For example,
do patients with high LDL cholesterol levels tend to have high systolic
blood pressures also, or do the levels of one have nothing in common
with the levels of the other?
2. If there is even the remotest possibility that one variable interacts
with another, can we deal with each variable by itself as if the others do
not exist without incurring serious errors?
3. If we accept that until proven otherwise these variables should be considered jointly, how should such joint variabilities be represented?
4. What other aspects of the joint behavior of inter-related random variables provide useful means of characterizing jointly varying random variables?

¹ "Avandia May Slow Atherosclerosis After Bypass Surgery," by Steven Reinberg, US News and World Report, April 1, 2008.


These questions indicate that what we know about random variables from
Chapter 4 must be extended appropriately to enable us to deal with the new
class of issues that arise when multiple random variables must be considered
simultaneously.
The logical place to begin, of course, is with a 2-dimensional (or bivariate)
random variable, before extending the discussion to the general case with
n > 2 variables.

5.1.2  2-Dimensional (Bivariate) Random Variables

The following is a direct extension of the formal definition of the single
random variable given in Chapter 4.

Definition: Bivariate Random Variable.

Given a random experiment with a sample space Ω, and a probability set function P(·) defined on its subsets, let there be a function
X, defined on Ω, which assigns to each element ω ∈ Ω one and
only one ordered number pair (X1(ω), X2(ω)). This function, X,
is called a two-dimensional, or bivariate, random variable.

As with the single random variable case, associated with this two-dimensional random variable is a space, V, and a probability set function
PX induced by X = (X1, X2), where V is defined as:

        V = {(x1, x2) : X1(ω) = x1, X2(ω) = x2; ω ∈ Ω}                      (5.1)

The most important thing to note at this point is that the random variable
space V involves X1 and X2 simultaneously; it is not merely a union of separate spaces V1 for X1 and V2 for X2.
An example of a bivariate random variable was presented in Example 4.4
in Chapter 4; here is another.
Example 5.1 BIVARIATE RANDOM VARIABLE AND INDUCED PROBABILITY FUNCTION FOR COIN TOSS EXPERIMENT
Consider an experiment involving tossing a coin 2 times and recording
the number of observed heads and tails: (1) Obtain the sample space Ω;
and (2) Define X as a two-dimensional random variable (X1, X2) where
X1 is the number of heads obtained in the first toss, and X2 is the number of heads obtained in the second toss. Obtain the new space V. (3)
Assuming equiprobable outcomes, obtain the induced probability PX.

Solution:
(1) From the nature of the experiment, the required sample space, Ω, is
given by
        Ω = {HH, HT, TH, TT}                                                (5.2)
consisting of all 4 possible outcomes, which may be represented respectively as ωi; i = 1, 2, 3, 4, so that
        Ω = {ω1, ω2, ω3, ω4}.                                               (5.3)
(2) By definition of X, we see that X(ω1) = (1, 1); X(ω2) = (1, 0);
X(ω3) = (0, 1); X(ω4) = (0, 0); so that the space V is given by:
        V = {(1, 1); (1, 0); (0, 1); (0, 0)}                                (5.4)
since these are all the possible values that the two-dimensional X can
take.
(3) This is a case where there is a direct one-to-one mapping between
the 4 elements of the original sample space and the induced random
variable space V; as such, for equiprobable outcomes, we obtain,

        PX(1, 1) = 1/4
        PX(1, 0) = 1/4
        PX(0, 1) = 1/4                                                      (5.5)
        PX(0, 0) = 1/4
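A brief computational illustration of this example may be helpful; the following Python sketch (our own illustration, using a dictionary to hold the induced pmf) enumerates the sample space Ω and tallies PX directly:

```python
from itertools import product
from fractions import Fraction

# Sample space for two coin tosses; all four outcomes are equiprobable.
omega = list(product("HT", repeat=2))   # [('H','H'), ('H','T'), ('T','H'), ('T','T')]
p = Fraction(1, len(omega))

# X = (X1, X2): heads on the first toss, heads on the second toss.
f = {}
for w in omega:
    x = (int(w[0] == "H"), int(w[1] == "H"))
    f[x] = f.get(x, Fraction(0)) + p

print(f)   # each of (1,1), (1,0), (0,1), (0,0) receives probability 1/4, as in Eq (5.5)
```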

In making sense of the formal definition given here for the bivariate (2-dimensional) random variable, the reader should keep in mind the practical
considerations presented in Chapter 4 for the single random variable. The same
issues there apply here. In a practical sense, the bivariate random variable
may be considered simply, if informally, as an experimental outcome with two
components, each with numerical values that are subject to random variations
with exact replicate performance of the experiment.
For example, consider a polymer used for packaging applications, for which
the quality measurements of interest are melt index (indicative of the molecular weight distribution) and density (indicative of co-polymer composition).
With each performance of lab analysis on samples taken from the manufacturing
process, the values obtained for each of these quantities are subject
to random variations. Without worrying so much about the original sample
space or the induced one, we may consider the packaging polymer quality characteristics directly as the two-dimensional random variable whose components
are melt index (as X1) and density (as X2).
We now note that it is fairly common for many textbooks to use X and Y
to represent bivariate random variables. We choose to use X1 and X2 because
it offers a notational convenience that facilitates generalization to n > 2.

5.1.3  Higher-Dimensional (Multivariate) Random Variables

The foregoing discussion is generalized to n > 2 as follows.

Definition: Multivariate Random Variable.

Given a random experiment with a sample space Ω, and a probability set function P(·) defined on its subsets, let there be a function
X, defined on Ω, which assigns one and only one n-tuple
(X1(ω), X2(ω), . . . , Xn(ω)) to each element ω ∈ Ω.
This function, X, is called an n-dimensional random variable.

Similarly, associated with this n-dimensional random variable is a space,
V:

    V = {(x1, x2, . . . , xn) : X1(ω) = x1, X2(ω) = x2, . . . , Xn(ω) = xn; ω ∈ Ω}   (5.6)

and a probability set function PX induced by X.
As a practical matter, we may observe, for example, that in the Avandia
study mentioned at the beginning of this chapter, the outcome of interest
for each patient is a continuous, 5-dimensional random variable, X, whose
components are: X1 = blood sugar level; X2 = LDL cholesterol level; X3 =
HDL cholesterol level; X4 = systolic blood pressure; and X5 = diastolic blood
pressure. The specific observed values for each patient will be the quintuple
measurement (x1, x2, x3, x4, x5).
Everything we have discussed above for the bivariate random variable (n =
2) extends directly to the general n.

5.2  Distributions of Several Random Variables

5.2.1  Joint Distributions

The results of Example 5.1 can be written as:

        f(x1, x2) = 1/4;  x1 = 1, x2 = 1
                    1/4;  x1 = 1, x2 = 0
                    1/4;  x1 = 0, x2 = 1                                    (5.7)
                    1/4;  x1 = 0, x2 = 0
                    0;    otherwise

showing how the probabilities are distributed over the 2-dimensional random variable space, V. Once again, we note the following about the function
f(x1, x2):

        f(x1, x2) > 0;  ∀ (x1, x2)
        Σ_{x1} Σ_{x2} f(x1, x2) = 1

We may now generalize beyond this specific example as follows:

Definition: Joint pdf

Let there exist a sample space Ω (along with a probability set
function, P, defined on its subsets), and a random variable X =
(X1, X2), with an attendant random variable space V; a function
f defined on V such that:
1. f(x1, x2) ≥ 0; ∀ (x1, x2) ∈ V;
2. Σ_{x1} Σ_{x2} f(x1, x2) = 1; ∀ (x1, x2) ∈ V;
3. PX(X1 = x1, X2 = x2) = f(x1, x2)
is called the joint probability distribution function of the discrete
two-dimensional random variable X = (X1, X2).

These results are direct extensions of the axiomatic statements given earlier
for the discrete single random variable pdf.
The probability that both X1 < x1 and X2 < x2 is given by the cumulative
distribution function,

        F(x1, x2) = P(X1 < x1, X2 < x2)                                     (5.8)

valid for discrete and continuous random variables. When F is a continuous
function of both x1 and x2 and possesses first partial derivatives, the two-dimensional function,

        f(x1, x2) = ∂²F(x1, x2)/∂x1∂x2                                      (5.9)

is called the joint probability density function for the continuous two-dimensional random variables X1 and X2. As with the discrete case, the formal
properties of the continuous joint pdf are:

1. f(x1, x2) ≥ 0; ∀ (x1, x2) ∈ V;
2. f has at most a finite number of discontinuities in every finite interval
in V;
3. The double integral, ∫_{x1} ∫_{x2} f(x1, x2) dx1 dx2 = 1;
4. PX(A) = ∫∫_A f(x1, x2) dx1 dx2; for A ⊂ V

Thus,

    P(a1 ≤ X1 ≤ a2; b1 ≤ X2 ≤ b2) = ∫_{b1}^{b2} ∫_{a1}^{a2} f(x1, x2) dx1 dx2    (5.10)

These results generalize directly to the multidimensional random variable
X = (X1, X2, . . . , Xn) with a joint pdf f(x1, x2, . . . , xn).
Example 5.2 JOINT PROBABILITY DISTRIBUTION OF
CONTINUOUS BIVARIATE RANDOM VARIABLE
The reliability of the temperature control system for a commercial,
highly exothermic polymer reactor is known to depend on the lifetimes
(in years) of the control hardware electronics, X1, and of the control
valve on the cooling water line, X2. If one component fails, the entire
control system fails. The random phenomenon in question is characterized by the two-dimensional random variable (X1, X2) whose joint
probability distribution is given as:

        f(x1, x2) = (1/50) e^{−(0.2x1 + 0.1x2)};  0 < x1 < ∞; 0 < x2 < ∞
                  = 0                             elsewhere                 (5.11)

(1) Establish that this is a legitimate pdf; (2) obtain the probability
that the system lasts more than two years; (3) obtain the probability
that the electronic component functions for more than 5 years and the
control valve for more than 10 years.
Solution:
(1) If this is a legitimate joint pdf, then the following should hold:

        ∫_0^∞ ∫_0^∞ f(x1, x2) dx1 dx2 = 1                                   (5.12)

In this case, we have:

        ∫_0^∞ ∫_0^∞ (1/50) e^{−(0.2x1 + 0.1x2)} dx1 dx2
            = (1/50) [−5 e^{−0.2x1}]_0^∞ [−10 e^{−0.1x2}]_0^∞ = 1           (5.13)

We therefore conclude that the given joint pdf is legitimate.
(2) For the system to last more than 2 years, both components must
simultaneously last more than 2 years. The required probability is therefore given by:

        P(X1 > 2, X2 > 2) = ∫_2^∞ ∫_2^∞ (1/50) e^{−(0.2x1 + 0.1x2)} dx1 dx2    (5.14)

which, upon carrying out the indicated integration and simplifying, reduces to:

        P(X1 > 2, X2 > 2) = e^{−0.4} × e^{−0.2} = 0.67 × 0.82 = 0.549       (5.15)

Thus, the probability that the system lasts beyond the first two years
is 0.549.
(3) The required probability, P(X1 > 5; X2 > 10), is obtained as:

        P(X1 > 5; X2 > 10) = ∫_{10}^∞ ∫_5^∞ (1/50) e^{−(0.2x1 + 0.1x2)} dx1 dx2
                           = [−e^{−0.2x1}]_5^∞ [−e^{−0.1x2}]_{10}^∞ = (0.368)²
                           = 0.135                                          (5.16)
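The probabilities in Eqs (5.15) and (5.16) can be confirmed numerically. The sketch below assumes SciPy is available; it simply integrates the joint pdf of Eq (5.11) over the indicated regions:

```python
import numpy as np
from scipy import integrate

# Joint pdf of Eq (5.11); dblquad expects the inner variable (x2) listed first.
f = lambda x2, x1: np.exp(-(0.2 * x1 + 0.1 * x2)) / 50.0

# P(X1 > 2, X2 > 2)
p22, _ = integrate.dblquad(f, 2, np.inf, lambda x1: 2, lambda x1: np.inf)

# P(X1 > 5, X2 > 10)
p510, _ = integrate.dblquad(f, 5, np.inf, lambda x1: 10, lambda x1: np.inf)

print(round(p22, 3))    # 0.549, as in Eq (5.15)
print(round(p510, 3))   # 0.135, as in Eq (5.16)
```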

The preceding discussions have established the joint pdf f(x1, x2, . . . , xn)
as the most direct extension of the single-variable pdf f(x) of Chapter 4
to higher-dimensional random variables. However, additional distributions
needed to characterize other aspects of multidimensional random variables
can be derived from these joint pdfs: distributions that we had no need for
in dealing with single random variables. We will discuss these new varieties of
distributions first for the 2-dimensional (bivariate) random variable, and then
extend the discussion to the general n > 2.

5.2.2  Marginal Distributions

Consider the joint pdf f(x1, x2) for the 2-dimensional random variable
(X1, X2); it represents how probabilities are jointly distributed over the entire
(X1, X2) plane in the random variable space. Were we to integrate over the
entire range of X2 (or sum over the entire range in the discrete case), what is
left is the following function of x1 in the continuous case:

        f1(x1) = ∫_{−∞}^{∞} f(x1, x2) dx2                                   (5.17)

or, in the discrete case,

        f1(x1) = Σ_{x2} f(x1, x2)                                           (5.18)

This function, f1(x1), characterizes the behavior of X1 alone, by itself, regardless of what is going on with X2.
Observe that, if one wishes to determine P(a1 < X1 < a2) with X2 taking
any value, by definition, this probability is determined as:

        P(a1 < X1 < a2) = ∫_{a1}^{a2} [ ∫_{−∞}^{∞} f(x1, x2) dx2 ] dx1      (5.19)

But according to Eq (5.17), the term in the parentheses represents f1(x1),
hence:

        P(a1 < X1 < a2) = ∫_{a1}^{a2} f1(x1) dx1                            (5.20)

an expression that is reminiscent of probability computations for single random variable pdfs.

The function in Eq (5.17) is known as the marginal distribution of X1; and
by the same token, the marginal distribution of X2, in the continuous case, is
given by:

        f2(x2) = ∫_{−∞}^{∞} f(x1, x2) dx1                                   (5.21)

obtained by integrating out X1 from the joint pdf of X1 and X2; or, in the
discrete case, it is:

        f2(x2) = Σ_{x1} f(x1, x2)                                           (5.22)

These pdfs, f1(x1) and f2(x2), respectively represent the probabilistic characteristics of each random variable, X1 and X2, considered in isolation, as opposed to f(x1, x2) that represents the joint probabilistic characteristics when
considered together. The formal definitions are given as follows:

Definition: Marginal pdfs

Let X = (X1, X2) be a 2-dimensional random variable with a joint
pdf f(x1, x2); the marginal probability distribution functions of X1
alone, and of X2 alone, are defined as the following functions:

        f1(x1) = ∫_{−∞}^{∞} f(x1, x2) dx2                                   (5.23)

and
        f2(x2) = ∫_{−∞}^{∞} f(x1, x2) dx1                                   (5.24)

for continuous random variables, and, for discrete random variables, as the functions:

        f1(x1) = Σ_{x2} f(x1, x2)                                           (5.25)

and
        f2(x2) = Σ_{x1} f(x1, x2)                                           (5.26)

Each marginal pdf possesses all the usual properties of pdfs, i.e., for continuous random variables,

1. f1(x1) ≥ 0; and f2(x2) ≥ 0
2. ∫ f1(x1) dx1 = 1; and ∫ f2(x2) dx2 = 1
3. P(X1 ∈ A) = ∫_A f1(x1) dx1; and P(X2 ∈ A) = ∫_A f2(x2) dx2

with the integrals replaced by sums for the discrete case. An illustrative example follows.
Example 5.3 MARGINAL DISTRIBUTIONS OF CONTINUOUS BIVARIATE RANDOM VARIABLE
Find the marginal distributions of the joint pdf given in Example 5.2
for characterizing the reliability of the commercial polymer reactor's
temperature control system. Recall that the component random variables are X1, the lifetime (in years) of the control hardware electronics,
and X2, the lifetime of the control valve on the cooling water line; the
joint pdf is as given in Eq (5.11):

        f(x1, x2) = (1/50) e^{−(0.2x1 + 0.1x2)};  0 < x1 < ∞; 0 < x2 < ∞
                  = 0                             elsewhere

Solution:
(1) For this continuous bivariate random variable, we have from Eq
(5.17) that:

        f1(x1) = ∫_0^∞ (1/50) e^{−(0.2x1 + 0.1x2)} dx2
               = (1/50) e^{−0.2x1} ∫_0^∞ e^{−0.1x2} dx2 = (1/5) e^{−0.2x1}      (5.27)

Similarly, from Eq (5.21), we have,

        f2(x2) = ∫_0^∞ (1/50) e^{−(0.2x1 + 0.1x2)} dx1
               = (1/50) e^{−0.1x2} ∫_0^∞ e^{−0.2x1} dx1 = (1/10) e^{−0.1x2}     (5.28)

As an exercise, the reader should confirm that each of these marginal distributions is a legitimate pdf in its own right.
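One way to carry out that confirmation, sketched here with SymPy's symbolic integration (any computer algebra system would serve equally well), is:

```python
import sympy as sp

x1, x2 = sp.symbols("x1 x2", positive=True)
f = sp.exp(-(x1 / 5 + x2 / 10)) / 50        # joint pdf of Eq (5.11)

f1 = sp.integrate(f, (x2, 0, sp.oo))        # marginal of X1: (1/5) exp(-x1/5)
f2 = sp.integrate(f, (x1, 0, sp.oo))        # marginal of X2: (1/10) exp(-x2/10)

print(sp.simplify(f1), sp.integrate(f1, (x1, 0, sp.oo)))   # exp(-x1/5)/5, 1
print(sp.simplify(f2), sp.integrate(f2, (x2, 0, sp.oo)))   # exp(-x2/10)/10, 1
```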
These ideas extend directly to n > 2 random variables whose joint pdf
is given by f(x1, x2, . . . , xn). There will be n separate marginal distributions
fi(xi); i = 1, 2, . . . , n, each obtained by integrating (or summing) out every
other random variable except the one in question, i.e.,

        f1(x1) = ∫ ∫ · · · ∫ f(x1, x2, . . . , xn) dx2 dx3 · · · dxn            (5.29)

or, in general,

        fi(xi) = ∫ ∫ · · · ∫ f(x1, x2, . . . , xn) dx1 dx2 · · · dxi−1 dxi+1 · · · dxn   (5.30)

It is important to note that when n > 2, marginal distributions themselves
can be multivariate. For example, f12(x1, x2) is what is left of the joint pdf
f(x1, x2, . . . , xn) after integrating (or summing) over the remaining (n − 2)
variables; it is a bivariate pdf of the two surviving random variables of interest.
The concepts are simple and carry over directly; however, the notation can
become quite confusing if one is not careful. We shall return to this point a
bit later in this chapter.

5.2.3  Conditional Distributions

If the joint pdf f(x1, x2) of a bivariate random variable provides a description of how the two component random variables vary jointly; and if the
marginal distributions f1(x1) and f2(x2) describe how each random variable
behaves by itself, in isolation, without regard to the other; there remains yet
one more characteristic of importance: a description of how X1 behaves for
given specific values of X2, and vice versa, how X2 behaves for specific values
of X1 (i.e., the probability distribution of X1 conditioned upon X2 taking on
specific values, and vice versa). Such conditional distributions are defined
as follows:

Definition: Conditional pdfs

Let X = (X1, X2) be a 2-dimensional random variable, discrete
or continuous, with a joint pdf, f(x1, x2), along with marginal
distributions f1(x1) and f2(x2); the conditional distribution of X1
given that X2 = x2 is defined as:

        f(x1|x2) = f(x1, x2)/f2(x2);  f2(x2) > 0                            (5.31)

Similarly, the conditional distribution of X2 given that X1 = x1 is
defined as:

        f(x2|x1) = f(x1, x2)/f1(x1);  f1(x1) > 0                            (5.32)

The similarity between these equations and the expression for the conditional
probability of events defined as sets, as given in Eq (3.40) of Chapter 3,

        P(A|B) = P(A ∩ B)/P(B)                                              (5.33)

should not be lost on the reader.


In Eq (5.31), the indicated pdf is a function of x1 , with x2 xed; it is a
straightforward exercise to show that this is a legitimate pdf. Observe that in
the continuous case,


f (x1 , x2 )dx1
(5.34)
f (x1 |x2 )dx1 =
f2 (x2 )

the numerator of which is recognized from Eq (5.21) as the marginal distribution of X2 so that:

f2 (x2 )
=1
(5.35)
f (x1 |x2 )dx1 =
f2 (x2 )

The same result holds for f (x2 |x1 ) in Eq (5.32) when integrated with respect
of x2 ; and, by replacing the integrals with sums, we obtain identical results
for the discrete case.
Example 5.4 CONDITIONAL DISTRIBUTIONS OF CONTINUOUS BIVARIATE RANDOM VARIABLE
Find the conditional distributions of the 2-dimensional random variable given in Example 5.2 for the reliability of a temperature control
system.
Solution:
Recall from the previous examples that the joint pdf is:

        f(x1, x2) = (1/50) e^{−(0.2x1 + 0.1x2)};  0 < x1 < ∞; 0 < x2 < ∞
                  = 0                             elsewhere

Recalling the results obtained in Example 5.3 for the marginal pdfs
f1(x1) and f2(x2), the desired conditional pdfs are given as follows:

        f(x1|x2) = [(1/50) e^{−(0.2x1 + 0.1x2)}] / [(1/10) e^{−0.1x2}] = (1/5) e^{−0.2x1}    (5.36)

and for the complementary conditional pdf f(x2|x1):

        f(x2|x1) = [(1/50) e^{−(0.2x1 + 0.1x2)}] / [(1/5) e^{−0.2x1}] = (1/10) e^{−0.1x2}    (5.37)

The reader may have noticed two things about this specific example: (i)
f(x1|x2) is entirely a function of x1 alone, containing no x2 whose value is to
be fixed; the same is true for f(x2|x1), which is entirely a function of x2, with
no dependence on x1. (ii) In fact, not only is f(x1|x2) a function of x1 alone;
it is precisely the same function as the unconditional marginal pdf f1(x1)
obtained earlier. The same is obtained for f(x2|x1), which also turns out to
be the same as the unconditional marginal pdf f2(x2) obtained earlier. Such
circumstances do not always occur for all 2-dimensional random variables, as
the next example shows; but the special cases where f(x1|x2) = f1(x1) and
f(x2|x1) = f2(x2) are indicative of a special relationship between the two
random variables X1 and X2, as discussed later in this chapter.

FIGURE 5.1: Graph of the joint pdf for the 2-dimensional random variable of Example 5.5
Example 5.5 CONDITIONAL DISTRIBUTIONS OF ANOTHER CONTINUOUS BIVARIATE RANDOM VARIABLE
Find the conditional distributions of the 2-dimensional random variable whose joint pdf is given as follows:

        f(x1, x2) = (x1 − x2);  1 < x1 < 2; 0 < x2 < 1
                  = 0           elsewhere                                   (5.38)

shown graphically in Fig 5.1.
Solution:
To find the conditional distributions, we must first find the marginal
distributions. (As an exercise, the reader may want to confirm that this
joint pdf is a legitimate pdf.) These marginal distributions are obtained
as follows:

        f1(x1) = ∫_0^1 (x1 − x2) dx2 = [x1 x2 − x2²/2]_0^1                  (5.39)

which simplifies to give:

        f1(x1) = (x1 − 0.5);  1 < x1 < 2
               = 0;           elsewhere                                     (5.40)

Similarly,

        f2(x2) = ∫_1^2 (x1 − x2) dx1 = [x1²/2 − x1 x2]_1^2                  (5.41)

which simplifies to give:

        f2(x2) = (1.5 − x2);  0 < x2 < 1
               = 0;           elsewhere                                     (5.42)

Again the reader may want to confirm that these marginal pdfs are
legitimate pdfs.
With these marginal pdfs in hand, we can now determine the required conditional distributions as follows:

        f(x1|x2) = (x1 − x2)/(1.5 − x2);  1 < x1 < 2                        (5.43)

and

        f(x2|x1) = (x1 − x2)/(x1 − 0.5);  0 < x2 < 1                        (5.44)

(The reader should be careful to note that we did not explicitly impose
the restrictive conditions x2 ≠ 1.5 and x1 ≠ 0.5 in the expressions given
above so as to exclude the respective singularity points for f(x1|x2) and
for f(x2|x1). This is because the original space over which the joint
distribution f(x1, x2) was defined, V = {(x1, x2) : 1 < x1 < 2; 0 < x2 <
1}, already excludes these otherwise troublesome points.)
Observe now that these conditional distributions show mutual dependence of x1 and x2, unlike in Example 5.4. In particular, say for
x2 = 1 (the rightmost edge of the x2-axis of the plane in Fig 5.1), the
conditional pdf f(x1|x2) becomes:

        f(x1|x2 = 1) = 2(x1 − 1);  1 < x1 < 2                               (5.45)

whereas, for x2 = 0 (the leftmost edge of the x2-axis of the plane in Fig
5.1), this conditional pdf becomes

        f(x1|x2 = 0) = 2x1/3;  1 < x1 < 2                                   (5.46)

Similar arguments can be made for f(x2|x1) and are left as an exercise
for the reader.
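The algebra in Eqs (5.39)–(5.46) is easily double-checked symbolically; the following SymPy sketch is our own verification, not part of the original example:

```python
import sympy as sp

x1, x2 = sp.symbols("x1 x2")
f = x1 - x2                          # joint pdf of Eq (5.38) on 1 < x1 < 2, 0 < x2 < 1

f1 = sp.integrate(f, (x2, 0, 1))     # marginal of X1: x1 - 1/2, as in Eq (5.40)
f2 = sp.integrate(f, (x1, 1, 2))     # marginal of X2: 3/2 - x2, as in Eq (5.42)

print(f1, f2)
print(sp.integrate(f1, (x1, 1, 2)))          # 1, so f1 is a legitimate pdf
print(sp.simplify((f / f2).subs(x2, 1)))     # 2*x1 - 2, i.e., Eq (5.45)
print(sp.simplify((f / f2).subs(x2, 0)))     # 2*x1/3, i.e., Eq (5.46)
```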

The following example provides a comprehensive illustration of these distributions, specifically for a discrete bivariate random variable.

Example 5.6 DISTRIBUTIONS OF DISCRETE BIVARIATE
RANDOM VARIABLE
An Apple computer store in a small town stocks only three types of
hardware components: "low-end," "mid-level" and "high-end," selling
respectively for $1600, $2000 and $2400; it also only stocks two types
of monitors, the 20-inch type, selling for $600, and the 23-inch type,
selling for $900. An analysis of sales records over a 1-year period (the
prices remained stable over the entire period) is shown in Table 5.1, indicating what fraction of the total sales is due to a particular hardware
component and monitor type. Each recorded sale involves one hardware
component and one monitor: X1 is the selling price of the hardware component; X2 the selling price of the accompanying monitor. The indicated
frequencies of occurrence of each sale combination can be considered
to be representative of the respective probabilities, so that Table 5.1
represents the joint distribution, f(x1, x2).

        TABLE 5.1: Joint pdf for computer store sales
        X1         X2 = $600    X2 = $900
        $1600         0.30         0.25
        $2000         0.20         0.10
        $2400         0.10         0.05

(1) Show that f(x1, x2) is a legitimate pdf and find the sales combination (x1, x2) with the highest probability, and the one with the lowest
probability.
(2) Obtain the marginal pdfs f1(x1) and f2(x2), and from these
compute P(X1 = $2000), regardless of X2 (i.e., the probability of selling
a mid-level hardware component regardless of the monitor paired with
it). Also obtain P(X2 = $900) regardless of X1 (i.e., the probability of
selling a 23-inch monitor, regardless of the hardware component with
which it is paired).
(3) Obtain the conditional pdfs f(x1|x2) and f(x2|x1) and determine
the highest value for each conditional probability; describe in words
what each means.
Solution:
(1) If f(x1, x2) is a legitimate pdf, then it must hold that

        Σ_{x1} Σ_{x2} f(x1, x2) = 1                                         (5.47)

From the joint pdf shown in the table, this amounts to adding up all
the 6 entries, a simple arithmetic exercise that yields the desired result.
The combination with the highest probability is seen to be X1 =
$1600; X2 = $600, since P(X1 = $1600; X2 = $600) = 0.3; i.e., the
probability is highest (at 0.3) that any customer chosen at random
would have purchased the low-end hardware (for $1600) and the 20-inch
monitor (for $600). The lowest probability of 0.05 is associated
with X1 = $2400 and X2 = $900, i.e., the combination of a high-end
hardware component and a 23-inch monitor.
(2) By definition, the marginal pdf f1(x1) is given by:

        f1(x1) = Σ_{x2} f(x1, x2)                                           (5.48)
so that, from the table, f1(1600) = 0.30 + 0.25 = 0.55; similarly,
f1(2000) = 0.30 and f1(2400) = 0.15. In the same manner, the values for f2(x2) are obtained as f2(600) = 0.30 + 0.20 + 0.10 = 0.60,
and f2(900) = 0.40. These values are combined with the original joint
pdf into a new Table 5.2 to provide a visual representation of the relationship between these distributions.

        TABLE 5.2: Joint and marginal pdfs for computer store sales
        X1         X2 = $600    X2 = $900    f1(x1)
        $1600         0.30         0.25       0.55
        $2000         0.20         0.10       0.30
        $2400         0.10         0.05       0.15
        f2(x2)        0.60         0.40      (1.00)

The required probabilities are obtained directly from this table as follows:

        P(X1 = $2000) = f1(2000) = 0.30                                     (5.49)
        P(X2 = $900) = f2(900) = 0.40                                       (5.50)

(3) By definition, the desired conditional pdfs are given as follows:

        f(x1|x2) = f(x1, x2)/f2(x2);  and  f(x2|x1) = f(x1, x2)/f1(x1)      (5.51)

and upon carrying out the indicated divisions using the numbers contained in Table 5.2, we obtain the results shown in Table 5.3 for f(x1|x2),
and in Table 5.4 for f(x2|x1). From these tables, we obtain the highest conditional probability for f(x1|x2) as 0.625, corresponding to the
probability of a customer buying the low-end hardware component
(X1 = $1600) conditioned upon having bought the 23-inch monitor
(X2 = $900); i.e., in the entire population of those who bought the 23-inch
monitor, the probability is highest at 0.625 that a low-end hardware
component was purchased to go along with the monitor. When the conditioning variable is the hardware component, the highest conditional
probability f(x2|x1) is a tie at 0.667 for customers buying the 20-inch
monitor (X2 = $600) conditioned upon buying the mid-range hardware
(X1 = $2000), and those buying the high-end hardware (X1 = $2400).

        TABLE 5.3: Conditional pdf f(x1|x2) for computer store sales
        X1           f(x1|x2 = 600)    f(x1|x2 = 900)
        $1600             0.500             0.625
        $2000             0.333             0.250
        $2400             0.167             0.125
        Sum Total         1.000             1.000

        TABLE 5.4: Conditional pdf f(x2|x1) for computer store sales
                           X2 = $600    X2 = $900    Sum Total
        f(x2|x1 = 1600)       0.545        0.455        1.000
        f(x2|x1 = 2000)       0.667        0.333        1.000
        f(x2|x1 = 2400)       0.667        0.333        1.000
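Tables 5.2–5.4 can also be generated mechanically from the joint pdf. The short NumPy sketch below (the array layout is our own choice) divides the joint table by the appropriate marginal:

```python
import numpy as np

# Joint pdf f(x1, x2) from Table 5.1; rows: x1 = 1600, 2000, 2400; columns: x2 = 600, 900.
f = np.array([[0.30, 0.25],
              [0.20, 0.10],
              [0.10, 0.05]])

f1 = f.sum(axis=1)                  # marginal of X1: [0.55, 0.30, 0.15]
f2 = f.sum(axis=0)                  # marginal of X2: [0.60, 0.40]

f_x1_given_x2 = f / f2              # columns of Table 5.3: f(x1|x2)
f_x2_given_x1 = f / f1[:, None]     # rows of Table 5.4: f(x2|x1)

print(f_x1_given_x2)   # e.g., f(1600|900) = 0.25/0.40 = 0.625
print(f_x2_given_x1)   # e.g., f(600|2000) = 0.20/0.30 = 0.667
```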

5.2.4  General Extensions

As noted in the section on marginal distributions, it is conceptually
straightforward to extend the foregoing ideas and results to the general case
with n > 2. Such a general discussion, however, is susceptible to confusion
primarily because the notation can become muddled very quickly. Observe
that not only can the variables whose conditional distributions we seek be
multivariate, the conditioning variables themselves can also be multivariate
(so that the required marginal distributions are multivariate pdfs); and there
is always the possibility that there will be some variables left over that are
neither of interest, nor in the conditioning set.
To illustrate, consider the 5-dimensional random variable associated with
the Avandia clinical test: the primary point of concern that precipitated this
study was not so much the effectiveness of the drug in controlling blood sugar;
it is the potential adverse side-effect on cardiovascular function. Thus, the researchers may well be concerned with characterizing the pdf for blood pressure,
(X4, X5), conditioned upon cholesterol level, (X2, X3), leaving out X1, blood sugar
level. Note how the variable of interest is bivariate, as is the conditioning variable. In this case, the desired conditional pdf is obtained as:

        f(x4, x5|x2, x3) = f(x1, x2, x3, x4, x5)/f23(x2, x3)                (5.52)

where f23(x2, x3) is the bivariate joint marginal pdf for cholesterol level. We
see therefore that the principles transfer quite directly, and, when dealing with
specific cases in practice (as we have just done), there is usually no confusion.
The challenge is how to generalize without confusion.
To present the results in a general fashion and avoid confusion requires
adopting a different notation: using the vector X to represent the entire collection of random variables, i.e., X = (X1, X2, . . . , Xn), and then partitioning
this into three distinct vectors: X′, the variables of interest ((X4, X5) in the
Avandia example given above); Y, the conditioning variables ((X2, X3) in the
Avandia example); and Z, the remaining variables, if any. With this notation,
we now have

        f(x′|y) = f(x′, y, z)/fy(y)                                         (5.53)

as the most general multivariate conditional distribution.

5.3  Distributional Characteristics of Jointly Distributed Random Variables

The concepts of mathematical expectation and moments used to characterize the distribution of single random variables in Chapter 4 can be extended to
multivariate, jointly distributed random variables. Even though we now have
many more versions of pdfs to consider (joint, marginal and conditional), the
primary notions remain the same.

5.3.1  Expectations

The mathematical expectation of the function U(X) = U(X1, X2, . . . , Xn)
of an n-dimensional continuous random variable with joint pdf f(x1, x2, . . . , xn)
is given by:

    E[U(X)] = ∫ ∫ · · · ∫ U(x1, x2, . . . , xn) f(x1, x2, . . . , xn) dx1 dx2 · · · dxn   (5.54)

a direct extension of the single variable definition. The discrete counterpart
is:

    E[U(X)] = Σ_{x1} Σ_{x2} · · · Σ_{xn} U(x1, x2, . . . , xn) f(x1, x2, . . . , xn)      (5.55)

Example 5.7 EXPECTATIONS OF CONTINUOUS BIVARIATE RANDOM VARIABLE
From the joint pdf given in Example 5.2 for the reliability of the reactor
temperature control system, deduce which component is expected to fail
first and by how long it is expected to be outlasted by the more durable
component.
Solution:
Recalling that the random variables for this system are X1, the lifetime
(in years) of the control hardware electronics, and X2, the lifetime of
the control valve on the cooling water line, observe that the function
U(X1, X2), defined as

        U(X1, X2) = X1 − X2                                                 (5.56)

represents the differential lifetimes of the two components; its expected
value provides the answer to both aspects of this question as follows:
By the definition of expectations,

        E[U(X1, X2)] = (1/50) ∫_0^∞ ∫_0^∞ (x1 − x2) e^{−(0.2x1 + 0.1x2)} dx1 dx2   (5.57)

The indicated integrals can be evaluated several different ways. Expanding
this expression into the difference of two double integrals, as
suggested by the multiplying factor (x1 − x2), and integrating out x2 in
the first and x1 in the second, leads to:

        E[U(X1, X2)] = (1/5) ∫_0^∞ x1 e^{−0.2x1} dx1 − (1/10) ∫_0^∞ x2 e^{−0.1x2} dx2   (5.58)

and upon carrying out the indicated integration by parts, we obtain:

        E(X1 − X2) = 5 − 10 = −5                                            (5.59)

The immediate implication is that the expected lifetime differential favors the control valve (lifetime X2), so that the control hardware electronics
component is expected to fail first, with the control valve expected
to outlast it by 5 years.
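A numerical cross-check of Eq (5.59), sketched here with SciPy (and not part of the original solution), evaluates the double integral of Eq (5.57) directly:

```python
import numpy as np
from scipy import integrate

# Integrand of Eq (5.57): (x1 - x2) f(x1, x2); the inner variable (x2) is listed first.
g = lambda x2, x1: (x1 - x2) * np.exp(-(0.2 * x1 + 0.1 * x2)) / 50.0

diff, _ = integrate.dblquad(g, 0, np.inf, lambda x1: 0, lambda x1: np.inf)
print(round(diff, 2))   # -5.0: the valve is expected to outlast the electronics by 5 years
```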
Example 5.8 EXPECTATIONS OF DISCRETE BIVARIATE
RANDOM VARIABLE
From the joint pdf given in Example 5.6 for the Apple computer store
sales, obtain the expected revenue from each recorded sale.
Solution:
Recall that for this problem, the random variables of interest are X1,
the cost of the computer hardware component, and X2, the cost of the
monitor in each recorded sale. The appropriate function U(X1, X2) in
this case is

        U(X1, X2) = X1 + X2                                                 (5.60)

the total amount of money realized on each sale. By the definition of
expectations for the discrete bivariate random variable, we have

        E[U(X1, X2)] = Σ_{x2} Σ_{x1} (x1 + x2) f(x1, x2)                    (5.61)

From Table 5.1, this is obtained as:

        E(X1 + X2) = Σ_{x1} Σ_{x2} x1 f(x1, x2) + Σ_{x1} Σ_{x2} x2 f(x1, x2)
                   = (0.55 × 1600 + 0.30 × 2000 + 0.15 × 2400)
                     + (0.60 × 600 + 0.40 × 900) = 2560                     (5.62)

so that the required expected revenue from each sale is $2560.

In the special case where U(X) = e^{(t1X1 + t2X2)}, the expectation E[U(X)]
is the joint moment generating function, M(t1, t2), for the bivariate random
variable X = (X1, X2), defined by

    M(t1, t2) = E[e^{(t1X1 + t2X2)}] = ∫ ∫ e^{(t1x1 + t2x2)} f(x1, x2) dx1 dx2
              = Σ_{x1} Σ_{x2} e^{(t1x1 + t2x2)} f(x1, x2)                   (5.63)

for the continuous and the discrete cases, respectively, an expression that
generalizes directly for the n-dimensional random variable.

Marginal Expectations

Recall that for the general n-dimensional random variable X =
(X1, X2, . . . , Xn), the single-variable marginal distribution fi(xi) is the distribution of the component random variable Xi alone, as if the others did not
exist. It is therefore similar to the single random variable pdf dealt with extensively in Chapter 4. As such, the marginal expectation of U(Xi) is precisely
as defined in Chapter 4, i.e.,

        E[U(Xi)] = ∫ U(xi) fi(xi) dxi                                       (5.64)

for the continuous case, and, for the discrete case,

        E[U(Xi)] = Σ_{xi} U(xi) fi(xi)                                      (5.65)

In particular, when U(Xi) = Xi, we obtain the marginal mean μXi, i.e.,

        E(Xi) = μXi = ∫ xi fi(xi) dxi;       continuous Xi
                    = Σ_{xi} xi fi(xi);      discrete Xi                    (5.66)

All the moments (central and ordinary) defined for the single random variable
are precisely the same as the corresponding marginal moments for the multidimensional random variable. In particular, the marginal variance is defined
as

        σ²_{Xi} = E[(Xi − μXi)²] = ∫ (xi − μXi)² fi(xi) dxi;    continuous Xi
                                 = Σ_{xi} (xi − μXi)² fi(xi);   discrete Xi    (5.67)

From the expression given for the joint MGF above in Eq (5.63), observe
that:

        M(t1, 0) = E[e^{t1X1}]                                              (5.68)
        M(0, t2) = E[e^{t2X2}]                                              (5.69)

are, respectively, the marginal MGFs for f1(x1) and for f2(x2).
Keep in mind that in the general case, marginal distributions can be multivariate; in this case, the context of the problem at hand will make clear what
such a joint-marginal distribution will look like after the remaining variables
have been integrated out.
Conditional Expectations

As in the discussion about conditional distributions, it is best to deal with
the bivariate conditional expectations first. For the bivariate random variable
X = (X1, X2), the conditional expectation E[U(X1)|X2] (i.e., the expectation
of the function U(X1) conditioned upon X2 = x2) is obtained from the conditional distribution as follows:

        E[U(X1)|X2] = ∫ U(x1) f(x1|x2) dx1;       continuous X
                    = Σ_{x1} U(x1) f(x1|x2);      discrete X                (5.70)

with a corresponding expression for E[U(X2)|X1] based on the conditional
distribution f(x2|x1). In particular, when U(X1) = X1 (or U(X2) = X2), the
result is the conditional mean defined by:

        E(X1|X2) = μ_{X1|x2} = ∫ x1 f(x1|x2) dx1;   continuous X
                             = Σ_{x1} x1 f(x1|x2);  discrete X              (5.71)

with a matching corresponding expression for μ_{X2|x1}.
Similarly, if

        U(X1) = (X1 − μ_{X1|x2})²                                           (5.72)

we obtain the conditional variance, σ²_{X1|x2} = E[(X1 − μ_{X1|x2})²], as:

        σ²_{X1|x2} = ∫ (x1 − μ_{X1|x2})² f(x1|x2) dx1;    continuous X
                   = Σ_{x1} (x1 − μ_{X1|x2})² f(x1|x2);   discrete X        (5.73)

respectively for the continuous and discrete cases.
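To make the conditional mean of Eq (5.71) concrete, the following SymPy sketch (our own illustration, reusing the joint pdf of Example 5.5) computes E(X1|X2 = x2):

```python
import sympy as sp

x1, x2 = sp.symbols("x1 x2")
f = x1 - x2                                  # joint pdf of Example 5.5
f2 = sp.integrate(f, (x1, 1, 2))             # marginal of X2: 3/2 - x2
cond = f / f2                                # conditional pdf f(x1|x2)

mu = sp.simplify(sp.integrate(x1 * cond, (x1, 1, 2)))
print(mu)                                    # (7/3 - 3*x2/2)/(3/2 - x2), in some equivalent form
print(mu.subs(x2, sp.Rational(1, 2)))        # 19/12, the conditional mean when x2 = 1/2
```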


These concepts can be extended quite directly to general n-dimensional
random variables; but, as noted earlier, one must be careful to avoid confusing
notation.

5.3.2  Covariance and Correlation

Consider the 2-dimensional random variable X = (X1, X2) whose marginal
means are given by μX1 and μX2, and respective marginal variances σ1² and
σ2²; the quantity

        σ12 = E[(X1 − μX1)(X2 − μX2)]                                       (5.74)

is known as the covariance of X1 with respect to X2; it is a measure of the
mutual dependence of variations in X1 and in X2. It is straightforward to
show from Eq (5.74) that

        σ12 = E(X1X2) − μX1 μX2                                             (5.75)

A popular and more frequently used measure of this mutual dependence is
the scaled quantity:

        ρ = σ12/(σ1 σ2)                                                     (5.76)

where σ1 and σ2 are the positive square roots of the respective marginal
variances of X1 and X2. ρ is known as the correlation coefficient, with the
attractive property that

        −1 ≤ ρ ≤ 1                                                          (5.77)

The most important points to note about the covariance, σ12, or the correlation coefficient, ρ, are as follows:

1. σ12 will be positive if values of X1 > μX1 are generally associated with
values of X2 > μX2, or when values of X1 < μX1 tend to be associated with values of X2 < μX2. Such variables are said to be positively
correlated, and ρ will be positive (ρ > 0), with the strength of the correlation indicated by the absolute value of ρ: weakly correlated variables
will have low values close to zero while strongly correlated variables will
have values close to 1. (See Fig 5.2.) For perfectly positively correlated
variables, ρ = 1.

2. The reverse is the case when σ12 is negative: for such variables, values
of X1 > μX1 appear preferentially together with values of X2 < μX2,
or else values of X1 < μX1 tend to be associated more with values of
X2 > μX2. In this case, the variables are said to be negatively correlated,
and ρ will be negative (ρ < 0); once again, with the strength of correlation indicated by the absolute value of ρ. (See Fig 5.3.) For perfectly
negatively correlated variables, ρ = −1.

3. If the behavior of X1 has little or no bearing on that of X2, as one
might expect, σ12 and ρ will tend to be close to zero (see Fig 5.4); and
when the two random variables are completely independent of each
other, then both σ12 and ρ will be exactly zero.
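The sample counterparts of these quantities are easy to experiment with. The sketch below, our own illustration loosely mimicking the kind of data plotted in Figs 5.2–5.4, generates positively correlated, negatively correlated, and essentially uncorrelated pairs and computes the sample ρ with NumPy:

```python
import numpy as np

rng = np.random.default_rng(1)
x1 = rng.uniform(1, 5, size=200)
noise = rng.normal(0, 2, size=200)

x2_pos = 5 * x1 + noise               # varies with x1: positively correlated
x2_neg = 30 - 5 * x1 + noise          # varies against x1: negatively correlated
x2_ind = rng.normal(20, 5, size=200)  # generated without reference to x1

for y in (x2_pos, x2_neg, x2_ind):
    print(round(np.corrcoef(x1, y)[0, 1], 3))   # rho near +1, near -1, and near 0, respectively
```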
This last point brings up the concept of stochastic independence.

5.3.3  Independence

Consider a situation where electronic component parts manufactured at
two different plant sites are labeled 1 for plant site 1, and 2 for the other.
After combining these parts into one lot, each part is drawn at random and
tested: if found defective, it is labeled 0; otherwise it is labeled 1. Now
consider the 2-dimensional random variable X = (X1, X2) where X1 is the
location of the manufacturing site (1 or 2), and X2 is the after-test status
of the electronic component part (0 or 1). If after many such draws and tests,
we discover that whether or not the part is defective has absolutely nothing
to do with where it was manufactured (i.e., a defective part is just as likely
to come from one plant site as the other), we say that X1 is independent of
X2. A formal definition now follows:

FIGURE 5.2: Positively correlated variables: ρ = 0.923

FIGURE 5.3: Negatively correlated variables: ρ = −0.689

FIGURE 5.4: Essentially uncorrelated variables: ρ = 0.085

Definition: Stochastic Independence

Let X = (X1, X2) be a 2-dimensional random variable, discrete or
continuous; X1 and X2 are independent if the following conditions
hold:
1. f(x2|x1) = f2(x2);
2. f(x1|x2) = f1(x1); and
3. f(x1, x2) = f1(x1) f2(x2)

The first point indicates that the distribution of X2 conditional on X1 is
identical to the unconditional (or marginal) distribution of X2. In other words,
conditioning on X1 has no effect on the distribution of X2, indicating that X2
is independent of X1. However, this very fact (that X2 is independent of X1)
also immediately implies the converse: that X1 is independent of X2 (i.e., that
the independence in this case is mutual). To establish this, we note that, by
definition, in Eq (5.32),

        f(x2|x1) = f(x1, x2)/f1(x1)                                         (5.78)

However, when X2 is independent of X1,

        f(x2|x1) = f2(x2)                                                   (5.79)

i.e., point 1 above holds; as a consequence, by replacing f(x2|x1) in Eq (5.78)
above with f2(x2), we obtain:

        f(x1, x2) = f1(x1) f2(x2)                                           (5.80)

which, first of all, is item 3 in the definition above, but just as importantly,
when substituted into the numerator of the expression in Eq (5.31), i.e.,

        f(x1|x2) = f(x1, x2)/f2(x2)

when the conditioning is now on X2, reduces this equation to

        f(x1|x2) = f1(x1)                                                   (5.81)

which is item number 2 above, indicating that X1 is also independent
of X2. The two variables are therefore said to be mutually stochastically
independent.
Let us now return to a point made earlier after Example 5.4. There we
noted that the distributional characteristics of the random variables X1, the
lifetime (in years) of the control hardware electronics, and X2, the lifetime
of the control valve on the cooling water line, were such that they satisfied
conditions now recognizable as the ones given in points 1 and 2 above. It is
therefore now clear that the special relationship between these two random
variables alluded to back then is that they are stochastically independent.
Note that the joint pdf, f(x1, x2), for this system is a product of the two
marginal pdfs, as in condition 3 above. This is not the case for the random
variables in Example 5.5.
The following example takes us back to yet another example encountered
earlier.
Example 5.9 INDEPENDENCE OF TWO DISCRETE RANDOM VARIABLES
Return to the two-coin toss experiment discussed in Example 5.1. From
the joint pdf obtained for this bivariate random variable (given in Eq
(5.7)), show that the two random variables, X1 (the number of heads
obtained in the first toss) and X2 (the number of heads obtained in the
second toss), are independent.
Solution:
By definition, and from the results in that example, the marginal distributions are obtained as follows:

        f1(x1) = Σ_{x2} f(x1, x2) = f(x1, 0) + f(x1, 1)                     (5.82)

so that f1(0) = 1/2; f1(1) = 1/2. Similarly,

        f2(x2) = Σ_{x1} f(x1, x2) = f(0, x2) + f(1, x2) = 1/2               (5.83)

so that f2(0) = 1/2; f2(1) = 1/2; i.e.,

        f1(x1) = 1/2;  x1 = 0
                 1/2;  x1 = 1                                               (5.84)
                 0;    otherwise

        f2(x2) = 1/2;  x2 = 0
                 1/2;  x2 = 1                                               (5.85)
                 0;    otherwise

If we now tabulate the joint pdf and the marginal pdfs, we obtain the
result in Table 5.5.

        TABLE 5.5: Joint and marginal pdfs for the two-coin toss
        problem of Example 5.1
        X1        X2 = 0    X2 = 1    f1(x1)
        0           1/4       1/4       1/2
        1           1/4       1/4       1/2
        f2(x2)      1/2       1/2        1

It is now clear that for all x1 and x2,

        f(x1, x2) = f1(x1) f2(x2)                                           (5.86)

so that these two random variables are independent.
Of course, we know intuitively that the number of heads obtained
in the first toss should have no effect on the number of heads obtained
in the second toss, but this fact has now been established theoretically.
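The factorization test of Eq (5.86) is trivial to automate; a small Python sketch (ours, not the text's) checks every cell of Table 5.5:

```python
from fractions import Fraction

# Joint pdf of Eq (5.7): f(x1, x2) = 1/4 for x1, x2 in {0, 1}.
f = {(a, b): Fraction(1, 4) for a in (0, 1) for b in (0, 1)}

f1 = {a: sum(p for (i, _), p in f.items() if i == a) for a in (0, 1)}   # marginal of X1
f2 = {b: sum(p for (_, j), p in f.items() if j == b) for b in (0, 1)}   # marginal of X2

independent = all(f[(a, b)] == f1[a] * f2[b] for (a, b) in f)
print(independent)   # True: X1 and X2 are stochastically independent
```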

The concept of independence is central to a great deal of the strategies
for solving problems involving random phenomena. The ideas presented in
this section are therefore used repeatedly in upcoming chapters in developing
models, and in solving many practical problems.
The following is one additional consequence of stochastic independence.
If X1 and X2 are independent, then

        E[U(X1) G(X2)] = E[U(X1)] E[G(X2)]                                  (5.87)

An immediate consequence of this fact is that in this case,

        σ12 = 0                                                             (5.88)
        ρ = 0                                                               (5.89)

since, by definition,

        σ12 = E[(X1 − μX1)(X2 − μX2)]                                       (5.90)

and, by virtue of Eq (5.87), independence implies:

        σ12 = E[(X1 − μX1)] · E[(X2 − μX2)] = 0                             (5.91)

It also follows that ρ = 0, since ρ = σ12/(σ1 σ2).
A note of caution: it is possible for E[U(X1)G(X2)] to equal the product
of expectations, E[U(X1)]E[G(X2)], by chance, without X1 and X2 being independent; however, if X1 and X2 are independent, then Eq (5.87) will hold.
This expression is therefore a necessary but not sufficient condition.
We must exercise care in extending the definition of stochastic independence to the n-dimensional random variable where n > 2. The random variables X1, X2, . . . , Xn are said to be mutually stochastically independent if
and only if,

        f(x1, x2, . . . , xn) = ∏_{i=1}^{n} fi(xi)                          (5.92)

where f(x1, x2, . . . , xn) is the joint pdf, and fi(xi); i = 1, 2, . . . , n, are the
n individual marginal pdfs. On the other hand, these random variables are
pairwise stochastically independent if every pair Xi, Xj; i ≠ j, is stochastically
independent.
Obviously, mutual stochastic independence implies pairwise stochastic independence, but not vice versa.

5.4  Summary and Conclusions

The primary objective of this chapter was to extend the ideas presented in
Chapter 4 for the single random variable to the multidimensional case, where
the outcome of interest involves two or more random variables simultaneously.
With such higher-dimensional random variables, it became necessary to introduce a new variety of pdfs different from, but still related to, the familiar one
encountered in Chapter 4: the joint pdf, to characterize joint variation among
the variables; the marginal pdfs, to characterize the individual behavior of each
variable in isolation from the others; and the conditional pdfs, to characterize
the behavior of one random variable conditioned upon fixing the others at pre-specified values. This new array of pdfs provides the full set of mathematical
tools for characterizing various aspects of multivariate random variables, much
as the f(x) of Chapter 4 did for single random variables.
The possibility of two or more random variables co-varying simultaneously, which was not of concern with single random variables, led to the introduction of two additional and related quantities, covariance and correlation,
with which one quantifies the mutual dependence of two random variables.
This in turn led to the important concept of stochastic independence: that
one random variable is entirely unaffected by another. As we shall see in subsequent chapters, when dealing with multiple random variables, the analysis
of joint behavior is considerably simplified if the random variables in question
are independent. We shall therefore have cause to recall some of the results of
this chapter at that time.
Here are some of the main points of this chapter again.

• A multivariate random variable is defined in the same manner as a single
random variable, but the associated space, V, is higher-dimensional.

• The joint pdf of a bivariate random variable, f(x1, x2), shows how
the probabilities are distributed over the two-dimensional random variable space; the joint cdf, F(x1, x2), represents the probability, P(X1 <
x1; X2 < x2); they both extend directly to higher-dimensional random
variables.

• In addition to the joint pdf, two other pdfs are needed to characterize
multi-dimensional random variables fully:
    – Marginal pdf: fi(xi) characterizes the individual behavior of each
      random variable, Xi, by itself, regardless of the others;
    – Conditional pdf: f(xi|xj) characterizes the behavior of Xi conditioned upon Xj taking on specific values.
  These pdfs can be used to obtain such random variable characteristics
  as joint, marginal and conditional expectations.

• The covariance of two random variables, X1 and X2, defined as

        σ12 = E[(X1 − μX1)(X2 − μX2)]

  (where μX1 and μX2 are the respective marginal expectations), provides
  a measure of the mutual dependence of variations in X1 and X2. The
  related correlation coefficient, the scaled quantity:

        ρ = σ12/(σ1 σ2)

  (where σ1 and σ2 are the positive square roots of the respective marginal
  variances of X1 and X2), has the property that −1 ≤ ρ ≤ 1, with |ρ| indicating the strength of the mutual dependence, and the sign indicating
  the direction (negative or positive).

• Two random variables, X1 and X2, are independent if the behavior of
one has no bearing on the behavior of the other; more formally,

        f(x1|x2) = f1(x1); f(x2|x1) = f2(x2);

  so that,

        f(x1, x2) = f1(x1) f2(x2)


REVIEW QUESTIONS
1. What characteristic of the Avandia clinical test makes it relevant to the discussion of this chapter?
2. How many random variables at a time can the probability machinery of Chapter 4 deal with?
3. In dealing with several random variables simultaneously, what are some of the questions to be considered that were not of concern when dealing with single random variables in Chapter 4?
4. Define a bivariate random variable formally.
5. Informally, what is a bivariate random variable?
6. Define a multivariate random variable formally.
7. State the axiomatic definition of the joint pdf of a discrete bivariate random variable and of its continuous counterpart.
8. What is the general relationship between the cdf, F(x1, x2), of a continuous bivariate random variable and its pdf, f(x1, x2)? What conditions must be satisfied for this relationship to exist?
9. Define the marginal distributions, f1(x1) and f2(x2), for a two-dimensional random variable with a joint pdf f(x1, x2).
10. Do marginal pdfs possess the usual properties of pdfs or are they different?
11. Given a bivariate joint pdf, f(x1, x2), define the conditional pdfs, f(x1|x2) and f(x2|x1).
12. In what way is the definition of a conditional pdf similar to the conditional probability of events A and B defined on a sample space, Ω?
13. Define the expectation, E[U(X1, X2)], for a bivariate random variable. Extend this to an n-dimensional (multivariate) random variable.
14. Define the marginal expectation, E[U(Xi)], for a bivariate random variable. Extend this to an n-dimensional (multivariate) random variable.
15. Define the conditional expectations, E[U(X1)|X2] and E[U(X2)|X1], for a bivariate random variable.
16. Given two random variables, X1 and X2, define their covariance.
17. What is the relationship between covariance and the correlation coefficient?


18. What does a negative correlation coefficient indicate about the relationship between two random variables, X1 and X2? What does a positive correlation coefficient indicate?
19. If the behavior of the random variable X1 has little bearing on that of X2, how will this manifest in the value of the correlation coefficient, ρ?
20. When the correlation coefficient of two random variables, X1 and X2, is such that |ρ| ≈ 1, what does this indicate about the random variables?
21. What does it mean that two random variables, X1 and X2, are stochastically independent?
22. If two random variables are independent, what is the value of their covariance, and of their correlation coefficient?
23. When dealing with n > 2 random variables, what is the difference between pairwise stochastic independence and mutual stochastic independence? Does one always imply the other?

EXERCISES
Sections 5.1 and 5.2
5.1 Revisit Example 5.1 in the text and define the two-dimensional random variable (X1, X2) as follows: X1 is the total number of heads, and X2 is the total number of tails. Obtain the space, V, and determine the complete pdf, f(x1, x2), for x1 = 0, 1, 2; x2 = 0, 1, 2, assuming equiprobable outcomes in the original sample space.
5.2 The two-dimensional random variable (X1, X2) has the following joint pdf:

    f(1, 1) = 1/4;    f(2, 1) = 3/8;
    f(1, 2) = 1/8;    f(2, 2) = 1/8;
    f(1, 3) = 1/16;   f(2, 3) = 1/16

(i) Determine the following probabilities: (a) P(X1 ≥ X2); (b) P(X1 + X2 = 4); (c) P(|X2 − X1| = 1); (d) P(X1 + X2 is even).
(ii) Obtain the joint cumulative distribution function, F(x1, x2).
5.3 In a game of chess, one player either wins, W, loses, L, or draws, D (either by mutual agreement with the opponent, or as a result of a stalemate). Consider a player participating in a two-game, pre-tournament qualification series:
(i) Obtain the sample space, Ω.
(ii) Define the two-dimensional random variable (X1, X2), where X1 is the total number of wins, and X2 is the total number of draws. Obtain V and, assuming equiprobable outcomes in the original sample space, determine the complete joint pdf, f(x1, x2).
(iii) If the player is awarded 3 points for a win, 1 point for a draw and no point for a loss, define the random variable Y as the total number of points assigned to a player


at the end of the two-game preliminary round. If a player needs at least 4 points to
qualify, determine the probability of qualifying.
5.4 Revisit Exercise 5.3 above but this time consider three players: Suzie, the superior player, for whom the probability of winning a game is pW = 0.75, the probability of drawing is pD = 0.2, and the probability of losing is pL = 0.05; Meredith, the mediocre player, for whom pW = 0.5, pD = 0.3, pL = 0.2; and Paula, the poor player, for whom pW = 0.2, pD = 0.3, pL = 0.5. Determine the complete joint pdf for each player: fS(x1, x2) for Suzie, fM(x1, x2) for Meredith, and fP(x1, x2) for Paula; and from these, determine for each player the probability that she qualifies for the tournament.
5.5 The continuous random variables X1 and X2 have the joint pdf

    f(x1, x2) = { c x1 x2 (1 − x2);  0 < x1 < 2; 0 < x2 < 1
               { 0;                 elsewhere                    (5.93)

(i) Find the value of c if this is to be a valid pdf.
(ii) Determine P(1 < x1 < 2; 0.5 < x2 < 1) and P(x1 > 1; x2 < 0.5).
(iii) Determine F(x1, x2).
5.6 Revisit Exercise 5.5.
(i) Obtain the marginal pdfs f1(x1), f2(x2), and the marginal means, μX1, μX2. Are X1 and X2 independent?
(ii) Obtain the conditional pdfs f(x1|x2) and f(x2|x1).
5.7 The joint pdf f(x1, x2) for a two-dimensional random variable is given by the following table:

                 X1
    X2        0      1      2
    0         0      0      1/4
    1         0      1/2    0
    2         1/4    0      0

(i) Obtain the marginal pdfs, f1(x1) and f2(x2), and determine whether or not X1 and X2 are independent.
(ii) Obtain the conditional pdfs f(x1|x2) and f(x2|x1). Describe in words what these results imply in terms of the original experiments and these random variables.
(iii) It is conjectured that this joint pdf is for an experiment involving tossing a fair coin twice, with X1 as the total number of heads, and X2 as the total number of tails. Are the foregoing results consistent with this conjecture? Explain.
5.8 Given the joint pdf:

    f(x1, x2) = { c e^(−(x1 + x2));  0 < x1 < 1; 0 < x2 < 2
               { 0;                 elsewhere                    (5.94)

first obtain c, then obtain the marginal pdfs f1(x1) and f2(x2), and hence determine whether or not X1 and X2 are independent.


5.9 If the ranges of validity of the joint pdf in Exercise 5.8 and Eq (5.94) are modified to 0 < x1 < ∞ and 0 < x2 < ∞, obtain c and the marginal pdfs, and then determine whether or not these random variables are now independent.
Section 5.3
5.10 Revisit Exercise 5.3. From the joint pdf determine
(i) E[U(X1, X2) = X1 + X2].
(ii) E[U(X1, X2) = 3X1 + X2]. Use this result to determine if the player will be expected to qualify or not.
5.11 For each of the three players in Exercise 5.4,
(i) Determine the marginal pdfs, f1(x1) and f2(x2), and the marginal means, μX1, μX2.
(ii) Determine E[U(X1, X2) = 3X1 + X2] and use the result to determine which of the three players, if any, will be expected to qualify for the tournament.
5.12 Determine the covariance and correlation coefficient for the two random variables whose joint pdf, f(x1, x2), is given in the table in Exercise 5.7.
5.13 For each of the three chess players in Exercise 5.4, Suzie, Meredith, and Paula, and from the joint pdf of each player's performance at the pre-tournament qualifying games, determine the covariance and correlation coefficients for each player. Discuss what these results imply in terms of the relationship between wins and draws for each player.
5.14 The joint pdf for two random variables X and Y is given as:

    f(x, y) = { x + y;  0 < x < 1; 0 < y < 1
             { 0;      elsewhere                    (5.95)

(i) Obtain f(x|y) and f(y|x) and show that these two random variables are not independent.
(ii) Obtain the covariance, σXY, and the correlation coefficient, ρ. Comment on the strength of the correlation between these two random variables.

APPLICATION PROBLEMS
5.15 Refer to Application Problem 3.23 in Chapter 3, where the relationship between a blood assay used to determine lithium concentration in blood samples and lithium toxicity in 150 patients was presented in a table reproduced here for ease of reference.

                 Lithium Toxicity
    Assay      L+      L−      Total
    A+         30      17      47
    A−         21      82      103
    Total      51      92      150

A+ indicates high lithium concentration in the blood assay and A− indicates low lithium concentration; L+ indicates confirmed lithium toxicity and L− indicates no lithium toxicity.


(i) In general, consider the assay result as the random variable Y having two possible outcomes, y1 = A+ and y2 = A−; and consider the true lithium toxicity status as the random variable X, also having two possible outcomes, x1 = L+ and x2 = L−. Now consider that the relative frequencies (or proportions) indicated in the data table can be considered approximately equal to the true probabilities; convert the data table to a table of the joint probability distribution f(x, y). What is the probability that the test method will produce the right result?
(ii) From the table of the joint pdf, compute the following probabilities and explain what they mean in words in terms of the problem at hand: f(y2|x2); f(y1|x2); f(y2|x1).
5.16 The reliability of the temperature control system for a commercial, highly exothermic polymer reactor presented in Example 5.2 in the text is known to depend on the lifetimes (in years) of the control hardware electronics, X1, and of the control valve on the cooling water line, X2; the joint pdf is:

    f(x1, x2) = { (1/50) e^(−(0.2 x1 + 0.1 x2));  0 < x1 < ∞; 0 < x2 < ∞
               { 0;                              elsewhere

(i) Determine the probability that the control valve outlasts the control hardware electronics.
(ii) Determine the converse probability that the controller hardware electronics outlast the control valve.
(iii) If a component is replaced every time it fails, how frequently can one expect to replace the control valve, and how frequently can one expect to replace the controller hardware electronics?
(iv) If it costs $20,000 to replace the control hardware electronics and $10,000 to replace the control valve, how much should be budgeted over the next 20 years for keeping the control system functioning, assuming all other characteristics remain essentially the same over this period?
5.17 In a major bio-vaccine research company, it is inevitable that workers are exposed to some hazardous, but highly treatable, disease-causing agents. According to papers filed with the Safety and Hazards Authorities of the state in which the facility is located, the treatment provided is tailored to the worker's age (the variable X: 0 if younger than 30 years; 1 if 31 years or older) and location in the facility (a surrogate for the virulence of the proprietary strains used in various parts of the facility, represented by the variable Y = 1, 2, 3 or 4). The composition of the 2,500 employees at the company's research headquarters is shown in the table below:

                  Location (Y)
    Age         1      2      3      4
    < 30        6%     20%    13%    10%
    ≥ 31        17%    14%    12%    8%

(i) If a worker is infected at random so that the outcome is the bivariate random variable (X, Y), where X has two outcomes and Y has four, obtain the pdf f(x, y) from the given data (assuming each worker in each location has an equal chance of infection); and determine the marginal pdfs f1(x) and f2(y).


(ii) What is the probability that a worker in need of treatment was infected in location 3 or 4, given that he/she is < 30 years old?
(iii) If the cost of treating each infected worker (in dollars per year) is given by the expression

    C = 1500 − 100Y + 500X    (5.96)

how much should the company expect to spend per worker every year, assuming the worker composition remains the same year after year?
5.18 A non-destructive quality control test on a military weapon system correctly detects a flaw in the central electronic guidance subunit if one exists, or correctly accepts the system as fully functional if no flaw exists, 85% of the time; it incorrectly identifies a flaw when one does not exist (a false positive) 5% of the time, and incorrectly fails to detect a flaw when one exists (a false negative) 10% of the time. When the test is repeated 5 times under mostly identical conditions, if X1 is the number of times the test is correct, and X2 is the number of times it registers a false positive, the joint pdf of these two random variables is given as:

    f(x1, x2) = [120/(x1! x2! (5 − x1 − x2)!)] 0.85^(x1) 0.05^(x2) 0.10^(5 − x1 − x2)    (5.97)

(i) Why is no consideration given in the expression in Eq (5.97) to the third random variable, X3, the number of times the test registers a false negative?
(ii) From Eq (5.97), generate a 5 × 5 table of f(x1, x2) for all the possible outcomes and from this obtain the marginal pdfs, f1(x1) and f2(x2). Are these two random variables independent?
(iii) Determine the expected number of correct test results regardless of the other results; also determine the expected value of false positives regardless of other results.
(iv) What is the expected value of the total number of correct results and false positives? Is this value the same as the sum of the expected values obtained in (iii)? Explain.

Chapter 6
Random Variable Transformations

6.1   Introduction and Problem Definition ............................. 171
6.2   Single Variable Transformations ................................. 172
      6.2.1  Discrete Case ........................................... 173
             A Practical Application ................................. 173
      6.2.2  Continuous Case ......................................... 175
      6.2.3  General Continuous Case ................................. 176
      6.2.4  Random Variable Sums .................................... 177
             The Cumulative Distribution Function Approach ........... 177
             The Characteristic Function Approach .................... 179
6.3   Bivariate Transformations ...................................... 181
6.4   General Multivariate Transformations ........................... 184
      6.4.1  Square Transformations .................................. 184
      6.4.2  Non-Square Transformations .............................. 185
      6.4.3  Non-Monotone Transformations ............................ 188
6.5   Summary and Conclusions ........................................ 188
      REVIEW QUESTIONS ............................................... 189
      EXERCISES ...................................................... 190
      APPLICATION PROBLEMS ........................................... 192

From a god to a bull! a heavy descension!


it was Jove's case.
From a prince to a prentice!
a low transformation!
that shall be mine; for in every thing
the purpose must weigh with the folly. Follow me, Ned.
King Henry the Fourth,
William Shakespeare (1564-1616)

Many problems of practical interest involve a random variable Y that is defined as a function of another random variable X, say according to Y = φ(X),
so that the characteristics of the one arise directly from those of the other via
the indicated transformation. In particular, if we already know the probability
distribution function for X as fX(x), it will be helpful to know how to determine the corresponding distribution function for Y. This chapter presents
techniques for characterizing functions of random variables, and the results,
important in their own right, become particularly useful in Part III where
probability models are derived for random phenomena of importance in engineering and science.
6.1  Introduction and Problem Definition

The problem of primary interest to us in this chapter may be stated as follows:

Given a random variable X with pdf fX(x), we are interested in deriving an expression for the corresponding pdf fY(y) for the random variable Y related to X according to:

    Y = φ(X)    (6.1)

More generally, given the n-dimensional random variable X = (X1, X2, ..., Xn) with the joint pdf fX(x), we want to find the corresponding pdf fY(y) for the m-dimensional random variable Y = (Y1, Y2, ..., Ym) when the two are related according to:

    Y1 = φ1(X1, X2, ..., Xn)
    Y2 = φ2(X1, X2, ..., Xn)
    ...
    Ym = φm(X1, X2, ..., Xn)    (6.2)

As demonstrated in later chapters, these results are extremely useful in deriving probability models for more complicated random variables from the
probability models of simpler ones.

6.2  Single Variable Transformations

We begin with the simplest case where Y is a function of a single variable X,

    Y = φ(X);  X ∈ VX    (6.3)

φ is a continuous function that transforms each point x in VX, the space over
which the random variable X is defined, to y, thereby mapping VX onto the
corresponding space VY for the resulting random variable Y. Furthermore, this
transformation is one-to-one in the sense that each point in VX corresponds
to one and only one point in VY. In this case, the inverse transformation,

    X = ψ(Y);  Y ∈ VY    (6.4)
exists and is also one-to-one. The procedure for obtaining fY (y) given fX (x)
is highly dependent on the nature of the random variable in question, being
more straightforward for the discrete case than for the continuous.

6.2.1  Discrete Case

When X is a discrete random variable, we have

    fY(y) = P(Y = y) = P(X = ψ(y)) = fX[ψ(y)];  y ∈ VY    (6.5)

We illustrate this straightforward result first with the following simple example.
Example 6.1  LINEAR TRANSFORMATION OF A POISSON RANDOM VARIABLE
As discussed in more detail in Part III, the discrete random variable X having the following pdf:

    fX(x) = λ^x e^(−λ)/x!;  x = 0, 1, 2, 3, ...    (6.6)

is a Poisson random variable; it provides a useful model of random phenomena involving the occurrence of rare events in a finite interval of length, time or space. Find the pdf fY(y) for the random variable Y related to X according to:

    Y = 2X    (6.7)

Solution:
First we note that the transformation in Eq (6.7) is one-to-one, mapping VX = {0, 1, 2, 3, ...} onto VY = {0, 2, 4, 6, ...}; the inverse transformation is:

    X = Y/2    (6.8)

so that from Eq (6.5) we obtain:

    fY(y) = P(Y = y) = P(X = y/2) = λ^(y/2) e^(−λ)/(y/2)!;  y = 0, 2, 4, 6, ...    (6.9)

Thus, under the transformation Y = 2X and fX(x) as given in Eq (6.6), the desired pdf fY(y) is given by:

    fY(y) = λ^(y/2) e^(−λ)/(y/2)!;  y = 0, 2, 4, 6, ...    (6.10)
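A quick numerical check of this result is straightforward. The following Python sketch (not part of the original text; it assumes NumPy and SciPy are available, and the value of λ is purely illustrative) samples X from a Poisson distribution, applies Y = 2X, and compares the empirical probabilities of Y with Eq (6.10).

    import numpy as np
    from scipy.stats import poisson

    lam = 2.3                            # illustrative Poisson parameter
    rng = np.random.default_rng(0)

    x = rng.poisson(lam, size=200_000)   # X ~ Poisson(lam)
    y = 2 * x                            # the transformation Y = 2X

    # compare empirical and theoretical pmfs of Y at a few even values
    for yv in [0, 2, 4, 6, 8]:
        empirical = np.mean(y == yv)
        theoretical = poisson.pmf(yv // 2, lam)    # f_Y(y) = f_X(y/2)
        print(yv, round(empirical, 4), round(theoretical, 4))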

A Practical Application
The number of times, X, that each cell in a cell culture divides in a time interval of length t is a random variable whose specific value depends on many factors, both intrinsic (e.g. individual cell characteristics) and extrinsic (e.g. media characteristics, temperature, oxygen). As discussed in Chapter 8, the underlying random phenomenon matches well to that of the ideal Poisson random variable, so that if η is the mean rate of division per unit time associated with a particular cell population, the probability distribution of X is given by Eq (6.11):

    fX(x) = (ηt)^x e^(−ηt)/x!;  x = 0, 1, 2, 3, ...    (6.11)

where, in terms of Eq (6.6),

    λ = ηt    (6.12)

In many cases, however, the cell culture characteristic of interest is not so much the number of times that each cell divides as it is the number of cells, Y, in the culture after the passage of a specific amount of time. For each cell in the culture, the relationship between these two random variables is given by:

    Y = 2^X    (6.13)

The problem of interest is now to find fY(y), the pdf of the number of cells in the culture, given fX(x).
As with the simple example given above, note that the transformation in (6.13), even though nonlinear, is one-to-one, mapping VX = {0, 1, 2, 3, ...} onto VY = {1, 2, 4, 8, ...}; the inverse transformation is:

    X = log2 Y    (6.14)

From here, we easily obtain the required fY(y) as:

    fY(y) = λ^(log2 y) e^(−λ)/(log2 y)!;  y = 1, 2, 4, 8, ...    (6.15)

a somewhat unwieldy-looking, but nonetheless valid, pdf that can be simplified a bit by noting that:

    λ^(log2 y) = y^(log2 λ) = y^γ    (6.16)

if we define

    γ = log2 λ    (6.17)

a logarithmic transformation of the Poisson parameter λ. Thus:

    fY(y) = e^(−(2^γ)) y^γ/(log2 y)!;  y = 1, 2, 4, 8, ...    (6.18)

It is possible to confirm that the pdf obtained in Eq (6.18) for Y, the number of cells in the culture after a time interval t, is a valid pdf for which:

    Σ_y fY(y) = 1    (6.19)

since, from Eq (6.18),

    Σ_y fY(y) = e^(−(2^γ)) [1 + 2^γ/1 + 2^(2γ)/2! + 2^(3γ)/3! + ...] = e^(−(2^γ)) e^((2^γ)) = 1    (6.20)

The mean number of cells in the culture after time t, E[Y], can be shown (see end-of-chapter Exercise 6.2) to be:

    E[Y] = e^λ    (6.21)

which should be compared with E[X] = λ.
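This pair of results, E[Y] = e^λ versus E[X] = λ, is easy to verify by simulation. The following Python sketch (not from the original text; NumPy is assumed, and the value of λ is purely illustrative) draws X from a Poisson distribution, forms Y = 2^X, and compares the sample mean of Y with e^λ.

    import numpy as np

    lam = 1.7                            # illustrative Poisson parameter (lambda)
    rng = np.random.default_rng(1)

    x = rng.poisson(lam, size=1_000_000)
    y = 2.0 ** x                         # number of cells after the interval: Y = 2^X

    print(y.mean())          # simulated E[Y]
    print(np.exp(lam))       # theoretical E[Y] = e^lambda (Eq 6.21)
    print(x.mean(), lam)     # for contrast, E[X] = lambda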

6.2.2  Continuous Case

When X is a continuous random variable, things are slightly different. In addition to the inverse transformation given in Eq (6.4), let us define the function:

    J = d/dy [ψ(y)] = ψ'(y)    (6.22)

known as the Jacobian of the inverse transformation. If the transformation is such that it is strictly monotonic (increasing or decreasing), then it can be shown that:

    fY(y) = fX[ψ(y)] |J|;  y ∈ VY    (6.23)

The argument goes as follows: If FY(y) is the cdf for the new variable Y, then

    FY(y) = P(Y ≤ y) = P(φ(X) ≤ y) = P(X ≤ ψ(y)) = FX[ψ(y)]    (6.24)

and by differentiation, we obtain:

    fY(y) = dFY(y)/dy = d/dy {FX[ψ(y)]} = fX[ψ(y)] d/dy{ψ(y)}    (6.25)

with the derivative on the RHS positive for a strictly monotonic increasing function. It can be shown that if φ were monotonically decreasing, the expression in (6.24) will yield:

    fY(y) = −fX[ψ(y)] d/dy{ψ(y)}    (6.26)

with the derivative on the RHS being a negative quantity. Both results may be combined into one as

    fY(y) = fX[ψ(y)] |d/dy{ψ(y)}|    (6.27)

as presented in Eq (6.23). Let us illustrate this with another example.

Example 6.2  LOG TRANSFORMATION OF A UNIFORM RANDOM VARIABLE
The random variable X with the following pdf:

    fX(x) = { 1;  0 < x < 1
            { 0;  otherwise    (6.28)

is identified in Part III as the uniform random variable. Determine the pdf for the random variable Y obtained via the transformation:

    Y = −β ln X    (6.29)

Solution:
The transformation is one-to-one, maps VX = {0 < x < 1} onto VY = {0 < y < ∞}, and the inverse transformation is given by:

    X = ψ(y) = e^(−y/β);  0 < y < ∞    (6.30)

The Jacobian of the inverse transformation is:

    J = ψ'(y) = −(1/β) e^(−y/β)    (6.31)

Thus, from Eq (6.23) or Eq (6.27), we obtain the required pdf as:

    fY(y) = fX[ψ(y)] |J| = { (1/β) e^(−y/β);  0 < y < ∞
                           { 0;               otherwise    (6.32)
These two random variables and their corresponding models are discussed
more fully in Part III.
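This example is, in fact, the classic inverse-transform recipe for generating exponentially distributed samples from uniform ones. The following Python sketch (not from the original text; it assumes NumPy/SciPy, and the value of β is purely illustrative) applies Y = −β ln X to uniform draws and checks the result against the exponential distribution of Eq (6.32).

    import numpy as np
    from scipy.stats import expon, kstest

    beta = 2.0                                  # illustrative scale parameter
    rng = np.random.default_rng(2)

    u = rng.uniform(0.0, 1.0, size=100_000)     # X ~ Uniform(0, 1)
    y = -beta * np.log(u)                       # Y = -beta * ln(X)

    # Y should follow the exponential pdf (1/beta) e^(-y/beta), Eq (6.32)
    print(y.mean(), beta)                       # sample mean vs. theoretical mean beta
    print(kstest(y, expon(scale=beta).cdf))     # Kolmogorov-Smirnov check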

6.2.3  General Continuous Case

When the transformation Y = φ(X) is not strictly monotone, the result given above is modified as follows: Let the function y = φ(x) possess a countable number of roots, xi, represented as functions of y as:

    xi = φi^(−1)(y) = ψi(y);  i = 1, 2, 3, ..., k    (6.33)

with corresponding Jacobians:

    Ji = d/dy {ψi(y)}    (6.34)

then it can be shown that Eq (6.23) (or equivalently Eq (6.27)) becomes:

    fY(y) = Σ_{i=1}^{k} fX[ψi(y)] |Ji|    (6.35)

Let us illustrate with an example.

Example 6.3  THE SQUARE OF A STANDARD NORMAL RANDOM VARIABLE
The random variable X has the following pdf:

    fX(x) = (1/√(2π)) e^(−x²/2);  −∞ < x < ∞    (6.36)

Determine the pdf for the random variable Y obtained via the transformation:

    Y = X²    (6.37)

Solution:
Observe that this transformation, which maps the space VX = {−∞ < x < ∞} onto VY = {0 < y < ∞}, is not one-to-one; for all y > 0 there are two x's corresponding to each y, since the inverse transformation is given by:

    x = ±√y    (6.38)

The transformation thus has 2 roots for x:

    x1 = ψ1(y) = +√y;   x2 = ψ2(y) = −√y    (6.39)

and upon computing the corresponding derivatives, Eq (6.35) becomes

    fY(y) = fX(√y) (y^(−1/2)/2) + fX(−√y) (y^(−1/2)/2)    (6.40)

which simplifies to:

    fY(y) = (1/√(2π)) e^(−y/2) y^(−1/2);  0 < y < ∞    (6.41)

This important result is used later.
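Readers who already know the chi-square distribution may recognize Eq (6.41) as its one-degree-of-freedom form. A short Python sketch (not from the original text; NumPy/SciPy assumed) squares standard normal samples and compares the resulting quantiles with those of the chi-square distribution with one degree of freedom.

    import numpy as np
    from scipy.stats import chi2

    rng = np.random.default_rng(3)
    x = rng.standard_normal(500_000)
    y = x ** 2                                    # Y = X^2

    # Eq (6.41) is the chi-square pdf with 1 degree of freedom; compare quantiles
    for q in [0.25, 0.5, 0.75, 0.95]:
        print(q, round(np.quantile(y, q), 3), round(chi2.ppf(q, df=1), 3))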

6.2.4  Random Variable Sums

Let us consider first the case where the random variable transformation involves the sum of two independent random variables, i.e.,

    Y = φ(X1, X2) = X1 + X2    (6.42)

where f1(x1) and f2(x2) are, respectively, the known pdfs of the random variables X1 and X2. Two approaches are typically employed in finding the desired fY(y):

• The cumulative distribution function approach;
• The characteristic function approach.

FIGURE 6.1: Region of interest, VY = {(x1, x2): x1 + x2 ≤ y}, for computing the cdf of the random variable Y defined as a sum of two independent random variables, X1 and X2.

The Cumulative Distribution Function Approach

This approach requires first obtaining the cdf FY(y) (as argued in Eq (6.24)), from where the desired pdf is obtained by differentiation when Y is continuous. In this case, the cdf FY(y) is obtained as:

    FY(y) = P(Y ≤ y) = ∫∫_{VY} f(x1, x2) dx1 dx2    (6.43)

where f(x1, x2) is the joint pdf of X1 and X2, and, most importantly, the region over which the double integration is being carried out, VY, is given by:

    VY = {(x1, x2) : x1 + x2 ≤ y}    (6.44)

as shown in Fig 6.1. Observe from this figure that the integration may be carried out several different ways: if we integrate first with respect to x1, the limits go from −∞ until we reach the line, at which point x1 = y − x2; we then integrate with respect to x2 from −∞ to ∞. In this case, Eq (6.43) becomes:

    FY(y) = ∫_{−∞}^{∞} ∫_{−∞}^{y−x2} f(x1, x2) dx1 dx2    (6.45)

from where we may differentiate with respect to y to obtain:

    fY(y) = ∫_{−∞}^{∞} f(y − x2, x2) dx2    (6.46)

In particular, if X1 and X2 are independent so that the joint pdf is a product of the individual marginal pdfs, we obtain:

    fY(y) = ∫_{−∞}^{∞} f1(y − x2) f2(x2) dx2    (6.47)

If, instead, the integration in Eq (6.43) had been done first with respect to x2 and then with respect to x1, the resulting differentiation would have resulted in the alternative, and entirely equivalent, expression:

    fY(y) = ∫_{−∞}^{∞} f2(y − x1) f1(x1) dx1    (6.48)

Integrals of this nature are known as convolutions of the functions f1(x1) and f2(x2), and this is as far as we can go with a general discussion.
Thus, we have the general result that the pdf of the random variable Y obtained as a sum of two independent random variables X1 and X2 is a convolution of the two contributing pdfs, f1(x1) and f2(x2), as shown in Eqs (6.47) and (6.48).
Let us illustrate this with a classic example.
Example 6.4  THE SUM OF TWO EXPONENTIAL RANDOM VARIABLES
Given two stochastically independent random variables X1 and X2 with pdfs:

    f1(x1) = (1/β) e^(−x1/β);  0 < x1 < ∞    (6.49)
    f2(x2) = (1/β) e^(−x2/β);  0 < x2 < ∞    (6.50)

determine the pdf of the random variable Y = X1 + X2.

Solution:
In this case, the required pdf is obtained from the convolution:

    fY(y) = (1/β²) ∫ e^(−(y−x2)/β) e^(−x2/β) dx2;  0 < y < ∞    (6.51)

However, because x2 is non-negative, as x1 = y − x2 must also be, the limits on the integral have to be restricted to go from x2 = 0 to x2 = y, so that:

    fY(y) = (1/β²) ∫_{0}^{y} e^(−(y−x2)/β) e^(−x2/β) dx2;  0 < y < ∞    (6.52)

Upon carrying out the indicated integral, we obtain the final result:

    fY(y) = (1/β²) y e^(−y/β);  0 < y < ∞    (6.53)

Observe that the result presented above for the sum of two random variables extends directly to the sum of more than two random variables by successive additions. However, this procedure becomes rapidly more tedious as we
must carry out repeated convolution integrals over increasingly more complex
regions.
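The result in Eq (6.53) can also be confirmed numerically. The following Python sketch (not from the original text; NumPy assumed, with an illustrative value of β) simulates the sum of two independent exponential variables and compares a histogram-based density estimate with Eq (6.53) at a few points.

    import numpy as np

    beta = 1.5                                   # illustrative scale parameter
    rng = np.random.default_rng(4)

    x1 = rng.exponential(scale=beta, size=400_000)
    x2 = rng.exponential(scale=beta, size=400_000)
    y = x1 + x2

    edges = np.linspace(0.0, 10.0, 51)
    hist, _ = np.histogram(y, bins=edges, density=True)
    centers = 0.5 * (edges[:-1] + edges[1:])
    f_theory = centers * np.exp(-centers / beta) / beta**2   # Eq (6.53)
    for i in [2, 10, 25, 40]:
        print(round(centers[i], 2), round(hist[i], 4), round(f_theory[i], 4))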


The Characteristic Function Approach

It is far more convenient to employ the characteristic function to determine the pdf of random variable sums, continuous or discrete. The pertinent result is from a property discussed earlier in Chapter 4: for independent random variables X1 and X2 with respective characteristic functions φX1(t) and φX2(t), the characteristic function of their sum Y = X1 + X2 is given by:

    φY(t) = φX1(t) φX2(t)    (6.54)

In general, for n independent random variables Xi; i = 1, 2, ..., n, each with respective characteristic functions φXi(t), if

    Y = X1 + X2 + ... + Xn    (6.55)

then

    φY(t) = φX1(t) φX2(t) ... φXn(t)    (6.56)
The utility of this result lies in the fact that φY(t) is easily obtained from each contributing φXi(t); the desired fY(y) is then recovered from φY(t) either by inspection (when this is obvious), or else by the inversion formula presented in Chapter 4.
Let us illustrate this with the same example used above.
Example 6.5  THE SUM OF TWO EXPONENTIAL RANDOM VARIABLES REVISITED
Using characteristic functions, determine the pdf of the random variable Y = X1 + X2, where the pdfs of the two stochastically independent random variables X1 and X2 are as given in Example 6.4 above and their characteristic functions are given as:

    φX1(t) = φX2(t) = 1/(1 − jβt)    (6.57)

Solution:
From Eq (6.54), the required characteristic function for the sum is:

    φY(t) = 1/(1 − jβt)²    (6.58)

At this point, anyone familiar with specific random variable pdfs and their characteristic functions will recognize this particular form right away: it is the characteristic function of a gamma random variable, specifically γ(2, β), as Chapter 9 shows. However, since we have not yet introduced these important random variables, their pdfs and characteristic functions (see Chapter 9), we do not expect the reader to be able to deduce the pdf corresponding to φY(t) above by inspection. In this case we can invoke the inversion formula of Chapter 4 to obtain:

    fY(y) = (1/2π) ∫_{−∞}^{∞} e^(−jyt) φY(t) dt = (1/2π) ∫_{−∞}^{∞} e^(−jyt)/(1 − jβt)² dt    (6.59)

Upon carrying out the indicated integral, we obtain the final result:

    fY(y) = (1/β²) y e^(−y/β);  0 < y < ∞    (6.60)

In general, it is not necessary to carry out the inversion integration explicitly once one becomes familiar with characteristic functions of various pdfs. (To engineers familiar with the application of Laplace transforms to the solution of ordinary differential equations, this is identical to how tables of inverse transforms have eliminated the need for explicitly carrying out Laplace inversions.) This point is illustrated in the next anticipatory example (and in subsequent chapters).
Example 6.6  REPRODUCTIVE PROPERTY OF GAMMA RANDOM VARIABLE
A random variable, X, with the following pdf

    f(x) = (1/(β^α Γ(α))) e^(−x/β) x^(α−1);  0 < x < ∞    (6.61)

is identified in Chapter 9 as a gamma random variable with parameters α and β. Its characteristic function is:

    φX(t) = 1/(1 − jβt)^α    (6.62)

Find the pdf of the random variable Y defined as the sum of n independent such random variables, Xi, each with different parameters αi but with the same parameter β.
Solution:
The desired transformation is

    Y = Σ_{i=1}^{n} Xi    (6.63)

and from the given individual characteristic functions for each Xi, we obtain the required characteristic function for the sum Y as:

    φY(t) = Π_{i=1}^{n} φXi(t) = 1/(1 − jβt)^(α*)    (6.64)

where α* = Σ_{i=1}^{n} αi. Now, by comparing Eq (6.62) with Eq (6.64), we see immediately the important result that Y is also a gamma random variable, with parameters α* and β. Thus, this sum of gamma random variables begets another gamma random variable, a result generally
known as the reproductive property of the gamma random variable.
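The reproductive property is also easy to see numerically. The following Python sketch (not from the original text; NumPy/SciPy assumed, with illustrative shape and scale values) sums gamma variables that share the same scale but have different shapes and checks the sum against a single gamma distribution with the summed shape.

    import numpy as np
    from scipy.stats import gamma, kstest

    beta = 2.0
    alphas = [0.5, 1.2, 3.3]            # different shape parameters, common scale
    rng = np.random.default_rng(5)

    # Y = X1 + X2 + X3, with Xi ~ gamma(alpha_i, beta)
    y = sum(rng.gamma(shape=a, scale=beta, size=300_000) for a in alphas)

    # the reproductive property says Y ~ gamma(sum(alphas), beta)
    print(kstest(y, gamma(a=sum(alphas), scale=beta).cdf))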

6.3  Bivariate Transformations

Because of its many practical applications, it is instructive to consider first the bivariate case before taking on the full multivariate problem. In this case we are concerned with determining the joint pdf fY(y) for the 2-dimensional random variable Y = (Y1, Y2) obtained from the 2-dimensional random variable X = (X1, X2) with the known joint pdf fX(x), via the following bivariate transformation:

    Y1 = φ1(X1, X2)
    Y2 = φ2(X1, X2)    (6.65)

written more compactly as:

    Y = Φ(X)    (6.66)

As in the single variable case, we consider first the case where these functions are continuous and collectively define a one-to-one transformation that maps the two-dimensional space VX in the x1-x2 plane to the two-dimensional space VY in the y1-y2 plane. The inverse bivariate transformation is given by:

    X1 = ψ1(Y1, Y2)
    X2 = ψ2(Y1, Y2)    (6.67)

or, more compactly,

    X = Ψ(Y)    (6.68)

The 2 × 2 determinant given by:

    J = | ∂x1/∂y1   ∂x1/∂y2 |
        | ∂x2/∂y1   ∂x2/∂y2 |    (6.69)

is the Jacobian of this bivariate inverse transformation, and so long as J does not vanish identically in VY, it can be shown that the desired joint pdf for Y is given by:

    fY(y) = fX[Ψ(y)] |J|;  y ∈ VY    (6.70)
where the similarity with Eq (6.27) should not be lost on the reader. The
following is a classic example typically used to illustrate this result.
Example 6.7  RELATING GAMMA AND BETA RANDOM VARIABLES
Given two stochastically independent random variables X1 and X2 with pdfs:

    f1(x1) = (1/Γ(α)) x1^(α−1) e^(−x1);  0 < x1 < ∞    (6.71)
    f2(x2) = (1/Γ(β)) x2^(β−1) e^(−x2);  0 < x2 < ∞    (6.72)

determine both the joint and the marginal pdfs for the two random variables Y1 and Y2 obtained via the transformation:

    Y1 = X1 + X2
    Y2 = X1/(X1 + X2)    (6.73)

Solution:
First, by independence, the joint pdf for X1 and X2 is:

    fX(x1, x2) = (1/(Γ(α)Γ(β))) x1^(α−1) x2^(β−1) e^(−x1) e^(−x2);  0 < x1 < ∞; 0 < x2 < ∞    (6.74)

Next, observe that the transformation in Eq (6.73) is a one-to-one mapping of VX, the positive quadrant of the x1-x2 plane, onto VY = {(y1, y2); 0 < y1 < ∞, 0 < y2 < 1}; the inverse transformation is given by:

    x1 = y1 y2
    x2 = y1(1 − y2)    (6.75)

and the Jacobian is obtained as

    J = | y2        y1  |
        | 1 − y2   −y1  |  = −y1    (6.76)

It vanishes at the point y1 = 0, but this is a point of probability measure 0 that can be safely excluded from the space VY. Thus, from Eq (6.70), the joint pdf for Y1 and Y2 is:

    fY(y1, y2) = { (1/(Γ(α)Γ(β))) (y1 y2)^(α−1) [y1(1 − y2)]^(β−1) e^(−y1) y1;  0 < y1 < ∞; 0 < y2 < 1
                { 0;  otherwise    (6.77)

This may be rearranged to give:

    fY(y1, y2) = { (1/(Γ(α)Γ(β))) y2^(α−1) (1 − y2)^(β−1) e^(−y1) y1^(α+β−1);  0 < y1 < ∞; 0 < y2 < 1
                { 0;  otherwise    (6.78)

an equation which, apart from the constant, factors out into separate and distinct functions of y1 and y2, indicating that the random variables Y1 and Y2 are independent.
By definition, the marginal pdf for Y2 is obtained by integrating out y1 in Eq (6.78) to obtain

    f2(y2) = [y2^(α−1) (1 − y2)^(β−1)/(Γ(α)Γ(β))] ∫_{0}^{∞} e^(−y1) y1^(α+β−1) dy1    (6.79)

Recognizing the integral as the gamma function, i.e.,

    Γ(a) = ∫_{0}^{∞} e^(−y) y^(a−1) dy    (6.80)

we obtain:

    f2(y2) = (Γ(α+β)/(Γ(α)Γ(β))) y2^(α−1) (1 − y2)^(β−1);  0 < y2 < 1    (6.81)

Since, by independence,

    fY(y1, y2) = f1(y1) f2(y2)    (6.82)

it follows from Eqs (6.78), (6.81), and (6.82) that the marginal pdf for Y1 is given by:

    f1(y1) = (1/Γ(α+β)) e^(−y1) y1^(α+β−1);  0 < y1 < ∞    (6.83)

Again, we refer to these results later in Part III.
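Readers may find it instructive to confirm this example by simulation. The sketch below (not from the original text; NumPy/SciPy assumed, with illustrative α and β) checks that Y1 = X1 + X2 behaves as a gamma variable with shape α + β, that Y2 = X1/(X1 + X2) behaves as a Beta(α, β) variable, and that the two are essentially uncorrelated, consistent with their independence.

    import numpy as np
    from scipy.stats import gamma, beta as beta_dist, kstest

    a, b = 2.0, 3.5                     # illustrative alpha and beta
    rng = np.random.default_rng(6)

    x1 = rng.gamma(shape=a, scale=1.0, size=300_000)
    x2 = rng.gamma(shape=b, scale=1.0, size=300_000)

    y1 = x1 + x2                        # should be gamma(a + b, 1), Eq (6.83)
    y2 = x1 / (x1 + x2)                 # should be Beta(a, b), Eq (6.81)

    print(kstest(y1, gamma(a=a + b).cdf))
    print(kstest(y2, beta_dist(a, b).cdf))
    print(np.corrcoef(y1, y2)[0, 1])    # near 0, consistent with independence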

6.4  General Multivariate Transformations

As introduced briefly earlier, the general multivariate case is concerned with determining the joint pdf fY(y) for the m-dimensional random variable Y = (Y1, Y2, ..., Ym) arising from a transformation of the n-dimensional random variable X = (X1, X2, ..., Xn) according to:

    Y1 = φ1(X1, X2, ..., Xn)
    Y2 = φ2(X1, X2, ..., Xn)
    ...
    Ym = φm(X1, X2, ..., Xn)

given the joint pdf fX(x).

6.4.1  Square Transformations

When n = m and the transformation is one-to-one, the inverse transformation:

    x1 = ψ1(y1, y2, ..., yn)
    x2 = ψ2(y1, y2, ..., yn)
    ...
    xn = ψn(y1, y2, ..., yn)    (6.84)

or, more compactly,

    X = Ψ(Y)    (6.85)

yields the square n × n determinant:

    J = | ∂x1/∂y1   ∂x1/∂y2   ...   ∂x1/∂yn |
        | ∂x2/∂y1   ∂x2/∂y2   ...   ∂x2/∂yn |
        |    ...        ...    ...      ...   |
        | ∂xn/∂y1   ∂xn/∂y2   ...   ∂xn/∂yn |    (6.86)

And now, as in the bivariate case, it can be shown that for a J that does not vanish anywhere in VY, the desired joint pdf for Y is given by:

    fY(y) = fX[Ψ(y)] |J|;  y ∈ VY    (6.87)

an expression that is identical in every way to Eq (6.70) except for the dimensionality, and similar to the single variable result in Eq (6.23). Thus, for the square transformation in which n = m, the required result is a direct generalization of the bivariate result, identical in structure, differing only in dimensionality.

6.4.2  Non-Square Transformations

The case with n ≠ m presents two different problems:

1. n < m: the overdefined transformation, in which there are more new variables Y than the original X variables;
2. n > m: the underdefined transformation, in which there are fewer new variables Y than the original X variables.

In the overdefined problem, it should be easy to see that there can be no exact inverse transformation except under some special, very restrictive circumstances, in which the extra (m − n) Y variables are merely redundant and can be expressed as functions of the other n. This problem is therefore of no practical interest: the general case has no exact solution; the special case reverts to the already solved square n × n problem.
With the underdefined problem (the more common of the two), the strategy is to augment the m equations with an additional (n − m), usually simple, variable transformations chosen such that an inverse transformation exists. Having thus squared the problem, the result in Eq (6.87) may then be applied to obtain a joint pdf for the augmented Y variables. The final step involves integrating out the extraneous variables. This is best illustrated with some examples.
Example 6.8  SUM OF TWO STANDARD NORMAL RANDOM VARIABLES
Given two stochastically independent random variables X1 and X2 with pdfs:

    f1(x1) = (1/√(2π)) e^(−x1²/2);  −∞ < x1 < ∞    (6.88)
    f2(x2) = (1/√(2π)) e^(−x2²/2);  −∞ < x2 < ∞    (6.89)

determine the pdf of the random variable Y obtained from their sum,

    Y = X1 + X2    (6.90)

Solution:
First, observe that even though this is a sum, so that we could invoke earlier results to handle this problem, Eq (6.90) is also an underdetermined transformation from two dimensions in X1 and X2 to one in Y. To square the transformation, let the variable in Eq (6.90) now be Y1 and add another one, say Y2 = X1 − X2, to give:

    Y1 = X1 + X2
    Y2 = X1 − X2    (6.91)

which is now square, and one-to-one. The inverse transformation is:

    x1 = (y1 + y2)/2
    x2 = (y1 − y2)/2    (6.92)

with a Jacobian, J = −1/2.
By independence, the joint pdf for X1 and X2 is given by:

    fX(x1, x2) = (1/2π) e^(−(x1² + x2²)/2);  −∞ < x1 < ∞; −∞ < x2 < ∞    (6.93)

and from Eq (6.87), the joint pdf for Y1 and Y2 is obtained as:

    fY(y1, y2) = (1/2π)(1/2) e^(−[(y1+y2)² + (y1−y2)²]/8);  −∞ < y1 < ∞; −∞ < y2 < ∞    (6.94)

which rearranges easily to:

    fY(y1, y2) = (1/4π) e^(−y1²/4) e^(−y2²/4);  −∞ < y1 < ∞; −∞ < y2 < ∞    (6.95)

And now, either by inspection (this is a product of two clearly identifiable, separate and distinct functions of y1 and y2, indicating that the two variables are independent), or by integrating out y2 in Eq (6.95), one easily obtains the required marginal pdf for Y1 as:

    f1(y1) = (1/(2√π)) e^(−y1²/4);  −∞ < y1 < ∞    (6.96)
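Eq (6.96) is the pdf of a normal random variable with mean 0 and variance 2. A short Python sketch (not from the original text; NumPy/SciPy assumed) confirms this by comparing simulated sums of standard normals against that distribution.

    import numpy as np
    from scipy.stats import norm, kstest

    rng = np.random.default_rng(7)
    x1 = rng.standard_normal(400_000)
    x2 = rng.standard_normal(400_000)
    y = x1 + x2

    # Eq (6.96) is the N(0, 2) pdf (variance 2, standard deviation sqrt(2))
    print(kstest(y, norm(loc=0.0, scale=np.sqrt(2.0)).cdf))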

In the next example we derive one more important result and illustrate the
seriousness of the requirement that the Jacobian of the inverse transformation
not vanish anywhere in VY .

Example 6.9  RATIO OF TWO STANDARD NORMAL RANDOM VARIABLES
Given two stochastically independent random variables X1 and X2 with pdfs:

    f1(x1) = (1/√(2π)) e^(−x1²/2);  −∞ < x1 < ∞    (6.97)
    f2(x2) = (1/√(2π)) e^(−x2²/2);  −∞ < x2 < ∞    (6.98)

determine the pdf of the random variable Y obtained from their ratio,

    Y = X1/X2    (6.99)

Solution:
Again, because this is an underdetermined transformation, we must first augment it with another one, say Y2 = X2, to give:

    Y1 = X1/X2
    Y2 = X2    (6.100)

which is now square, one-to-one, and with the inverse transformation:

    x1 = y1 y2
    x2 = y2    (6.101)

The Jacobian,

    J = | y2   y1 |
        | 0    1  |  = y2    (6.102)

vanishes at the single point y2 = 0, however; and even though this is a point of probability measure zero, the observation is worth keeping in mind.
From Example 6.8 above, the joint pdf for X1 and X2 is given by:

    fX(x1, x2) = (1/2π) e^(−(x1² + x2²)/2);  −∞ < x1 < ∞; −∞ < x2 < ∞    (6.103)

from where we now obtain the joint pdf for Y1 and Y2 as:

    fY(y1, y2) = (1/2π) |y2| e^(−(y1²y2² + y2²)/2);  −∞ < y1 < ∞;  −∞ < y2 < 0, 0 < y2 < ∞    (6.104)

The careful reader will notice two things: (i) the expression for fY involves not just y2, but its absolute value |y2|; and (ii) that we have excluded the troublesome point y2 = 0 from the space VY. These two points are related: to the left of the point y2 = 0, |y2| = −y2; to the right, |y2| = y2, so that these two regions must be treated differently in evaluating the integral.
To obtain the marginal pdf for y1 we now integrate out y2 in Eq (6.104) over the appropriate region in VY as follows:

    f1(y1) = (1/2π) [ ∫_{−∞}^{0} (−y2) e^(−(y1²+1)y2²/2) dy2 + ∫_{0}^{∞} y2 e^(−(y1²+1)y2²/2) dy2 ]    (6.105)

which simplifies to:

    f1(y1) = 1/[π(1 + y1²)];  −∞ < y1 < ∞    (6.106)
as the required pdf. It is important to note that in carrying out the integration implied in (6.105), the nature of the absolute value function, |y2|, naturally forced us to exclude the point y2 = 0 because it made it impossible for us to carry out the integration from −∞ to ∞ under a single integral. (Had the integral involved not |y2| but y2, the reader should, as an instructive exercise, try to evaluate the resulting integral from −∞ to ∞. See Exercise 6.9.)
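Eq (6.106) is the standard Cauchy pdf. The following Python sketch (not from the original text; NumPy/SciPy assumed) forms the ratio of two independent standard normal samples and checks the result against the Cauchy distribution; note that the Cauchy distribution has no finite mean, so the sample median, not the mean, is the meaningful location summary.

    import numpy as np
    from scipy.stats import cauchy, kstest

    rng = np.random.default_rng(8)
    x1 = rng.standard_normal(400_000)
    x2 = rng.standard_normal(400_000)
    y = x1 / x2                        # ratio of two independent standard normals

    # Eq (6.106) is the standard Cauchy pdf, 1 / (pi (1 + y^2))
    print(kstest(y, cauchy().cdf))
    print(np.median(y))                # median near 0; the mean does not exist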

6.4.3  Non-Monotone Transformations

In general, when the multivariate transformation y = Φ(x) may be non-monotone but has a countable number of roots, k, written as the matrix version of Eq (6.33), i.e.,

    xi = Φi^(−1)(y) = Ψi(y);  i = 1, 2, 3, ..., k    (6.107)

if each inverse transformation Ψi is square, with a non-zero Jacobian Ji, then it can be shown that:

    fY(y) = Σ_{i=1}^{k} fX[Ψi(y)] |Ji|    (6.108)

which is a multivariate extension of Eq (6.35).

6.5  Summary and Conclusions

We have focussed attention in this chapter on the single problem of determining the pdf, fY(y), of a random variable Y that has been defined as a function of another random variable, X, whose pdf fX(x) is known. As is common with problems of such general construct, the approach used to determine the desired pdf depends on the nature of the random variable, as well as on the nature of the problem itself; in this particular case, the problem is generally more straightforward to solve for discrete random variables than for continuous ones. When the transformation involves random variable sums, it is much easier to employ the method of characteristic functions, regardless of whether the random variables involved are discrete or continuous. But beyond the special care that must be taken for continuous non-monotone transformations, the underlying principle is the same for all cases and is fairly straightforward.
The primary importance of this chapter lies in the fact that it provides one of the tools (and much of the foundational results) employed routinely in deriving probability models for some of the more complex random phenomena. We will therefore rely on much of this chapter's material in subsequent chapters, especially in Chapters 8 and 9 where we derive models for a wide variety of specific randomly varying phenomena. As such, the reader is encouraged to tackle a good number of the exercises and problems found at the end of this chapter; solving these problems will make the upcoming discussions much easier to grasp at a fundamental level.
Here are some of the main points of the chapter again.
• Given a random variable X with pdf fX(x), and the random variable transformation, Y = φ(X), the corresponding pdf fY(y) for the random variable Y is obtained directly from the inverse transformation, X = ψ(Y), for the discrete random variable; for continuous random variables, the Jacobian of the inverse transformation, J = d/dy[ψ(y)], is required in addition.
• When the transformation φ(X) involves sums, it is more convenient to employ the characteristic function of X to determine fY(y).
• When the transformation φ(X) is non-monotone, fY(y) will consist of a sum of k components, where k is the total number of roots of the inverse transformation.
• When multivariate transformations are represented in matrix form, the required results are matrix versions of the results obtained for single variable transformations.

REVIEW QUESTIONS
1. State, in mathematical terms, the problem of primary interest in this chapter.
2. What are the results of this chapter useful for?
3. In single variable transformations, where Y = φ(X) is given along with fX(x), and fY(y) is to be determined, what is the difference between the discrete case of this problem and the continuous counterpart?


4. What is the Jacobian of the single variable inverse transformation?


5. In determining fY(y) given fX(x) and the transformation Y = φ(X), how does one handle the case where the transformation is not strictly monotone?
6. Which two approaches were presented in this chapter for finding pdfs of random variable sums? Which of the two is more convenient?
7. What is meant by the convolution of two functions, f1 (x1 ) and f2 (x2 )?
8. Upon what property of characteristic functions is the characteristic function approach to the determination of the pdf of random variable sums based?
9. What is the Jacobian of a multivariate inverse transformation?
10. How are non-square transformations handled?

EXERCISES
6.1 The pdf of a random variable X is given as:

    f(x) = p(1 − p)^(x−1);  x = 1, 2, 3, ...    (6.109)

(i) Obtain the pdf for the random variable Y defined as

    Y = 1/X    (6.110)

(ii) Given that E(X) = 1/p, obtain E(Y) and compare it to E(X).
6.2 Given the pdf shown in Eq (6.18) for the transformed variable, Y, i.e.,

    fY(y) = e^(−(2^γ)) y^γ/(log2 y)!;  y = 1, 2, 4, 8, ...

show that E(Y) = e^λ and hence confirm Eq (6.21).


6.3 Consider the random variable, X, with the following pdf:

    fX(x) = { (1/β) e^(−x/β);  0 < x < ∞
            { 0;               elsewhere    (6.111)

Determine the pdf for the random variable Y obtained via the transformation

    Y = (1/β) e^(−X/β)    (6.112)

Compare this result to the one obtained in Example 6.2 in the text.
6.4 Given a random variable, X, with the following pdf:

    fX(x) = { (1/2)(x + 1);  −1 < x < 1
            { 0;             elsewhere    (6.113)

(i) Determine the pdf for the random variable Y obtained via the transformation

    Y = X²    (6.114)

(ii) Determine E(X) and E(Y).


6.5 Given the pdf for two stochastically independent random variables X1 and X2 as

    f(xi) = e^(−λi) λi^(xi)/xi!;  xi = 0, 1, 2, ...    (6.115)

for i = 1, 2, and given the corresponding characteristic function as:

    φXi(t) = e^[λi(e^(jt) − 1)]    (6.116)

(i) Obtain the pdf fY(y) of the random variable Y defined as the sum of these two random variables, i.e.,

    Y = X1 + X2

(ii) Extend the result to a sum of n such random variables, i.e.,

    Y = X1 + X2 + ... + Xn

with each distribution given in Eq (6.115). Hence, establish that the random variable X also possesses the reproductive property illustrated in Example 6.6 in the text.
(iii) Obtain the pdf fZ(z) of the random variable Z defined as the average of n such random variables, i.e.,

    Z = (1/n)(X1 + X2 + ... + Xn)

6.6 In Example 6.3 in the text, it was established that if the random variable X has the following pdf:

    fX(x) = (1/√(2π)) e^(−x²/2);  −∞ < x < ∞    (6.117)

then the pdf for the random variable Y = X² is:

    fY(y) = (1/√(2π)) e^(−y/2) y^(−1/2);  0 < y < ∞    (6.118)

Given that the characteristic function of this random variable Y is:

    φY(t) = 1/(1 − j2t)^(1/2)    (6.119)

by re-writing √2 as 2^(1/2), and √π as Γ(1/2) (or otherwise), obtain the pdf fZ(z) of the random variable defined as:

    Z = X1² + X2² + ... + Xr²    (6.120)

where the random variables, Xi, are all mutually stochastically independent, and each has the distribution shown in Eq (6.117).


6.7 Revisit Example 6.8 in the text, but this time, instead of Eq (6.91), use the
following alternative squaring transformation,
Y2 = X2

(6.121)

You should obtain the same result.


6.8 Revisit Example 6.9 in the text, but this time, instead of Eq (6.100), use the
following alternative squaring transformation,
Y2 = X1

(6.122)

Which augmenting squaring transformation leads to an easier problem: this one, or the one in Eq (6.100) used in Example 6.9?
6.9 Revisit Eq (6.104); this time, replace |y2| with y2, and integrate the resulting joint pdf fY(y1, y2) with respect to y2 over the entire range −∞ < y2 < ∞. Compare your result with Eq (6.106) and comment on the importance of making sure to use the absolute value of the Jacobian of the inverse transformation in deriving pdfs of continuous transformed variables.

APPLICATION PROBLEMS
6.10 In a commercial process for manufacturing the extruded polymer film Mylar, each roll of the product is characterized in terms of its gage, the film thickness, X. For a series of rolls that meet the desired mean thickness target of 350 μm, the thickness of a section of film sampled randomly from a particular roll has the pdf

    f(x) = (1/(σi √(2π))) exp[−(x − 350)²/(2σi²)]    (6.123)

where σi² is the variance associated with the average thickness for each roll, i. In reality, the product property that is of importance to the end-user is not so much the film thickness, or even the average film thickness, but a roll-to-roll consistency, quantified in terms of a relative thickness variability measure defined as

    Y = ((X − 350)/σi)²    (6.124)

Obtain the pdf fY(y) that is used to characterize the roll-to-roll variability observed in this product quality variable.
6.11 Consider an experimental, electronically controlled, mechanical tennis ball launcher designed to be used to train tennis players. One such machine is positioned at a fixed launch point, L, located a distance of 1 m from a wall as shown in Fig 6.2. The launch mechanism is programmed to launch the ball in an essentially straight line, at an angle θ that varies randomly according to the pdf:

    f(θ) = { c;  −π/2 < θ < π/2
           { 0;  elsewhere    (6.125)

where c is a constant. The point of impact on the wall, at a distance y from the center, will therefore be a random variable whose specific value depends on θ. First show that c = 1/π, and then obtain fY(y).

FIGURE 6.2: Schematic diagram of the tennis ball launcher of Problem 6.11
6.12 The distribution of residence times in a single continuous stirred tank reactor (CSTR), whose volume is V liters and through which reactants flow at rate F liters/hr, was established in Chapter 2 as the pdf:

    f(x) = (1/τ) e^(−x/τ);  0 < x < ∞    (6.126)

where τ = V/F.
(i) Find the pdf fY(y) of the residence time, Y, in a reactor that is 5 times as large, given that in this case,

    Y = 5X    (6.127)

(ii) Find the pdf fZ(z) of the residence time, Z, in an ensemble of 5 reactors in series, given that:

    Z = X1 + X2 + ... + X5    (6.128)

where each reactor's pdf is as given in Eq (6.126), with parameter τi; i = 1, 2, ..., 5. (Hint: Use results of Examples 6.5 and 6.6.)
(iii) Show that even if τ1 = τ2 = ... = τ5 = τ for the ensemble of 5 reactors in series, fZ(z) will still not be the same as fY(y).
6.13 The total number of flaws (dents, scratches, paint blisters, etc.) found on the various sets of doors installed on brand new minivans in an assembly plant is a random variable with the pdf:

    f(x) = e^(−λ) λ^x/x!;  x = 0, 1, 2, ...    (6.129)

The value of the pdf parameter, λ, depends on the door in question as follows: λ = 0.5 for the driver and front passenger doors; λ = 0.75 for the two bigger mid-section passenger doors; and λ = 1.0 for the fifth, rear trunk/tailgate door. If the total number of flaws per completely assembled minivan is Y, obtain the pdf fY(y) and from it, compute the probability of assembling a minivan with more than a total number of 2 flaws on all its doors.
6.14 Let the fluorescence signals obtained from a test spot and the reference spot on a microarray be represented as random variables X1 and X2, respectively. Within reason, these variables can be assumed to be independent, with the following pdfs:

    f1(x1) = (1/Γ(α)) x1^(α−1) e^(−x1);  0 < x1 < ∞    (6.130)
    f2(x2) = (1/Γ(β)) x2^(β−1) e^(−x2);  0 < x2 < ∞    (6.131)

It is customary to analyze such microarray data in terms of the fold change ratio,

    Y = X1/X2    (6.132)

indicative of the fold increase (or decrease) in the signal intensity between test and reference conditions. Show that the pdf of Y is given by:

    f(y) = (Γ(α+β)/(Γ(α)Γ(β))) y^(α−1)/(1 + y)^(α+β);  y > 0; α > 0; β > 0    (6.133)

6.15 The following expression is used to calibrate a thermocouple whose natural output is V volts; X is the corresponding temperature, in degrees Celsius,

    X = 0.4V + 100    (6.134)

in a range from 50 to 500 volts and 100 to 250 °C. If the voltage output is subject to random variability around the true value μV, such that

    f(v) = (1/(σV √(2π))) exp[−(v − μV)²/(2σV²)]    (6.135)

where the mean (i.e., expected) value for the voltage is E(V) = μV and the variance is Var(V) = σV²,
(i) Show that:

    E(X) = 0.4μV + 100    (6.136)
    Var(X) = 0.16σV²    (6.137)

(ii) In terms of E(X) = μX and Var(X) = σX², obtain an expression for the pdf fX(x) representing the variability propagated to the temperature values.

6.16 Propagation-of-errors studies are concerned with determining how the errors from one variable are transmitted to another when the two variables are related according to a known expression. When the relationships are linear, it is often possible to obtain complete probability distribution functions for the dependent variable given the pdf for the independent variable (see Problem 6.15). When the relationships are nonlinear, closed form expressions are not always possible; in terms of general results, the best one can hope for are approximate expressions for the expected value and variance of the dependent variable, typically in a local region, upon linearizing the nonlinear expression. The following is an application of these principles.
One of the best known laws of bioenergetics, Kleiber's law, states that the Resting Energy Expenditure of an animal, Q0 (essentially the animal's metabolic rate, in kcal/day), is proportional to M^(3/4), where M is the animal's mass (in kg). Specifically for mature homeotherms, the expression is:

    Q0 = 70 M^(3/4)    (6.138)

Consider a particular population of homeotherms for which the variability in mass is characterized by the random variable M with the distribution:

    f(m) = (1/(σM √(2π))) exp[−(m − μM)²/(2σM²)]    (6.139)

with a mean value, μM, and variance, σM². The pdf representing the corresponding variation in Q0 can be obtained using the usual transformation techniques, but the result does not have a convenient, recognizable, closed form. However, it is possible to obtain approximate values for E(Q0) and Var(Q0) in the neighborhood around the mean mass, μM, and the corresponding metabolic rate, Q̄0.
Given that a first-order (linear) Taylor series approximation of the expression in Eq (6.138) is defined as:

    Q0 ≈ Q̄0 + (∂Q0/∂M)|_{M=μM} (M − μM)    (6.140)

first obtain the approximate linearized expression for Q0 when μM = 75 kg, and then determine E(Q0) and Var(Q0) for a population with σM = 12.5 kg under these conditions.


Chapter 7
Application Case Studies I:
Probability

7.1   Introduction ................................................... 198
7.2   Mendel and Heredity ............................................ 199
      7.2.1  Background and Problem Definition ....................... 199
      7.2.2  Single Trait Experiments and Results .................... 200
      7.2.3  Single Trait Analysis ................................... 201
             The First Generation Traits ............................. 203
             Probability and The Second Generation Traits ............ 204
      7.2.4  Multiple Traits and Independence ........................ 205
             Pairwise Experiments .................................... 205
      7.2.5  Subsequent Experiments and Conclusions .................. 208
7.3   World War II Warship Tactical Response Under Attack ............ 209
      7.3.1  Background and Problem Definition ....................... 209
      7.3.2  Approach and Results .................................... 209
      7.3.3  Final Comments .......................................... 212
7.4   Summary and Conclusions ........................................ 212

But to us, probability is the very guide of life.


Bishop Butler (1692-1752)

To many scientists and engineers, a first encounter with the theory of probability in its modern axiomatic form often leaves the impression of a subject
matter so abstract and esoteric in nature as to be entirely suited to nothing
but the most contrived applications. Nothing could be further from the truth.
In reality, the application of probability theory features prominently in many
modern fields of study: from finance, economics, sociology and psychology to
various branches of physics, chemistry, biology and engineering, providing a
perfect illustration of the aphorism that there is nothing so practical as a
good theory.
This chapter showcases the applicability of probability theory through
two specific case studies involving real-world problems whose practical importance can hardly be overstated. The first, Mendel's deduction of the laws
of heredity (the basis for the modern science of genetics), shows how Mendel
employed probability (and the concept of stochastic independence) to establish the principles underlying a phenomenon which, until then, was considered
essentially unpredictable and hence not susceptible to systematic analysis.
The second is from a now-declassified US Navy study during World War
II and involves decision-making in the face of uncertainty, using past data. It

illustrates the application of frequency-of-occurrence information, viewed as


approximate total and conditional probabilities, to solve an important tactical
military problem.

7.1  Introduction

The elegant, well-established and fruitful tree we now see as modern probability theory has roots that reach back to 16th and 17th century gamblers
and the very real (and very practical) need for reliable solutions to numerous
gambling problems. Referring to these gambling problems by the somewhat
less morally questionable term "problems on games of chance," some of the
most famous and most gifted mathematicians of the day devoted considerable
energy first to solving specific problems (most notably the Italian mathematician, Cardano, in the 16th century), and later to developing the foundational
basis for systematic mathematical analysis (most notably the Dutch scientist,
Huygens, and the French mathematicians, Pascal and Fermat, in the 17th
century). However, despite subsequent major contributions in the 18th century from the likes of Jakob Bernoulli (1654-1705) and Abraham de Moivre
(1667-1754), it was not until the 19th ce