Probability and Statistics (4th Edition)

Probability and Statistics Fourth Edition This page intentionally left blank Probability and Statistics Fourth Editi

12,197 3,661 5MB

Pages 911 Page size 252 x 321.48 pts Year 2011

Report DMCA / Copyright

DOWNLOAD FILE

Recommend Papers

Probability and Statistics (2nd Edition)

670 275 8MB Read more

Probability and Statistics

972 150 7MB Read more

Introduction to probability and statistics

4,832 195 8MB Read more

Probability theory and mathematical statistics

2,868 1,882 5MB Read more

An Introduction to Probability and Statistics (Wiley Series in Probability and Statistics)

1,694 109 28MB Read more

Probability and Statistics for Engineers and Scientists

This is an electronic version of the print textbook. Due to electronic rights restrictions, some third party content may

3,036 237 35MB Read more

ectures on Probability Theory and Statistics

Lecture Notes in Mathematics Editors: J.--M. Morel, Cachan F. Takens, Groningen B. Teissier, Paris 1837 3 Berlin Heid

903 39 2MB Read more

Schaum's Outline of Business Statistics, 4th Edition

4,624 975 3MB Read more

Lectures in Probability Theory and Statistics 1995

Lecture Notes in Mathematics Editors: A. Dold, Heidelberg E Takens, Groningen B. Teissier, Paris 1690 Springer Berli

1,444 607 9MB Read more

Schaum's Outline of Probability and Statistics

A-PDF MERGER DEMO

6,958 812 23MB Read more

File loading please wait...

Citation preview

Probability and Statistics Fourth Edition

This page intentionally left blank

Probability and Statistics Fourth Edition

Morris H. DeGroot Carnegie Mellon University

Mark J. Schervish Carnegie Mellon University

Addison-Wesley Boston Columbus Indianapolis New York San Francisco Upper Saddle River ´ Amsterdam Cape Town Dubai London Madrid Milan Munich Paris Montreal Toronto ˜ Paulo Sydney Hong Kong Seoul Singapore Taipei Tokyo Delhi Mexico City Sao

Editor in Chief: Deirdre Lynch Acquisitions Editor: Christopher Cummings Associate Content Editors: Leah Goldberg, Dana Jones Bettez Associate Editor: Christina Lepre Senior Managing Editor: Karen Wernholm Production Project Manager: Patty Bergin Cover Designer: Heather Scott Design Manager: Andrea Nix Senior Marketing Manager: Alex Gay Marketing Assistant: Kathleen DeChavez Senior Author Support/Technology Specialist: Joe Vetere Rights and Permissions Advisor: Michael Joyce Manufacturing Manager: Carol Melville Project Management, Composition: Windfall Software, using ZzTEX Cover Photo: Shutterstock/© Marilyn Volan The programs and applications presented in this book have been included for their instructional value. They have been tested with care, but are not guaranteed for any particular purpose. The publisher does not offer any warranties or representations, nor does it accept any liabilities with respect to the programs or applications. Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and Pearson Education was aware of a trademark claim, the designations have been printed in initial caps or all caps. Library of Congress Cataloging-in-Publication Data DeGroot, Morris H., 1931–1989. Probability and statistics / Morris H. DeGroot, Mark J. Schervish.—4th ed. p. cm. ISBN 978-0-321-50046-5 1. Probabilities—Textbooks. 2. Mathematical statistics—Textbooks. I. Schervish, Mark J. II. Title. QA273.D35 2012 519.2—dc22 2010001486 Copyright © 2012, 2002 Pearson Education, Inc. All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the prior written permission of the publisher. Printed in the United States of America. For information on obtaining permission for use of material in this work, please submit a written request to Pearson Education, Inc., Rights and Contracts Department, 75 Arlington Street, Suite 300, Boston, MA 02116, fax your request to 617-848-7047, or e-mail at http://www.pearsoned.com/legal/permissions.htm. 1 2 3 4 5 6 7 8 9 10—EB—14 13 12 11 10

ISBN 10: 0-321-50046-6 www.pearsonhighered.com

ISBN 13: 978-0-321-50046-5

To the memory of Morrie DeGroot. MJS

This page intentionally left blank

Contents

Preface

1

xi

Introduction to Probability 1.1

The History of Probability

1.2

Interpretations of Probability

1.3

Experiments and Events

1.4

Set Theory

1.5

The Deﬁnition of Probability

1.6

Finite Sample Spaces

1.7

Counting Methods

1.8

Combinatorial Methods

1.9

Multinomial Coefﬁcients

1

1 2 5

6 16

22 25 32 42

1.10 The Probability of a Union of Events 1.11 Statistical Swindles

51

1.12 Supplementary Exercises

2

53

Conditional Probability

55

2.1

The Deﬁnition of Conditional Probability

2.2

Independent Events

2.3

Bayes’ Theorem

2.4 2.5

3

46

55

66 76

The Gambler’s Ruin Problem Supplementary Exercises

86 90

Random Variables and Distributions 3.1

Random Variables and Discrete Distributions

3.2

Continuous Distributions

3.3

The Cumulative Distribution Function

3.4

Bivariate Distributions

118

3.5

Marginal Distributions

130

3.6

Conditional Distributions

141

3.7

Multivariate Distributions

152

3.8

Functions of a Random Variable

3.9

Functions of Two or More Random Variables

3.10 Markov Chains

100

188

3.11 Supplementary Exercises vii

93

202

107

167 175

93

viii

Contents

4

Expectation 4.1

The Expectation of a Random Variable

4.2

Properties of Expectations

4.3

Variance

4.4

Moments

4.5

The Mean and the Median

241

4.6

Covariance and Correlation

248

4.7

Conditional Expectation

4.8 4.9

5

207

Utility

217

225 234

256

265

Supplementary Exercises

272

Special Distributions

275

5.1

Introduction

275

5.2

The Bernoulli and Binomial Distributions

5.3

The Hypergeometric Distributions

5.4

The Poisson Distributions

5.5

The Negative Binomial Distributions

5.6

The Normal Distributions

302

5.7

The Gamma Distributions

316

5.8

The Beta Distributions

5.9

The Multinomial Distributions

287 297

327

5.11 Supplementary Exercises

7

333 337

345

Large Random Samples

347

6.1

Introduction

6.2

The Law of Large Numbers

348

6.3

The Central Limit Theorem

360

6.4

The Correction for Continuity

6.5

Supplementary Exercises

Estimation

275

281

5.10 The Bivariate Normal Distributions

6

207

347

371 375

376

7.1

Statistical Inference

376

7.2

Prior and Posterior Distributions

7.3

Conjugate Prior Distributions

7.4

Bayes Estimators

408

385 394

Contents

7.5

Maximum Likelihood Estimators

7.6

Properties of Maximum Likelihood Estimators

7.7

Sufﬁcient Statistics

7.8

Jointly Sufﬁcient Statistics

7.9

Improving an Estimator

449 455 461

Sampling Distributions of Estimators The Sampling Distribution of a Statistic

8.2

The Chi-Square Distributions

8.3

Joint Distribution of the Sample Mean and Sample Variance

8.4

The t Distributions

8.5

Conﬁdence Intervals

8.7 8.8 8.9

464

469

485

Bayesian Analysis of Samples from a Normal Distribution Unbiased Estimators Fisher Information

473

480 495

506 514

Supplementary Exercises

528

Testing Hypotheses 9.1

530

Problems of Testing Hypotheses

9.2

Testing Simple Hypotheses

9.3

Uniformly Most Powerful Tests

9.4

Two-Sided Alternatives

530

550 559

567

9.5

The t Test

576

9.6

Comparing the Means of Two Normal Distributions

9.7

The F Distributions

9.8

Bayes Test Procedures

9.9

Foundational Issues

587

597 605 617

9.10 Supplementary Exercises

10

464

8.1

8.6

9

426

443

7.10 Supplementary Exercises

8

417

621

Categorical Data and Nonparametric Methods 10.1 Tests of Goodness-of-Fit

624

10.2 Goodness-of-Fit for Composite Hypotheses 10.3 Contingency Tables

641

10.4 Tests of Homogeneity 10.5 Simpson’s Paradox

647 653

10.6 Kolmogorov-Smirnov Tests

657

633

624

ix

x

Contents

10.7 Robust Estimation 10.8 Sign and Rank Tests

666 678

10.9 Supplementary Exercises

11

686

Linear Statistical Models 11.1 The Method of Least Squares 11.2 Regression

689

689

698

11.3 Statistical Inference in Simple Linear Regression 11.4 Bayesian Inference in Simple Linear Regression 11.5 The General Linear Model and Multiple Regression 11.6 Analysis of Variance 754 11.7 The Two-Way Layout 763 11.8 The Two-Way Layout with Replications 11.9 Supplementary Exercises

12

Simulation

783

787

12.1 What Is Simulation?

787

12.2 Why Is Simulation Useful?

791

12.3 Simulating Speciﬁc Distributions

804

12.4 Importance Sampling 816 12.5 Markov Chain Monte Carlo 823 12.6 The Bootstrap

839

12.7 Supplementary Exercises

Tables

850

853

Answers to Odd-Numbered Exercises References Index

879 885

865

772

707 729 736

Preface

Changes to the Fourth Edition .

.

.

.

.

.

.

.

I have reorganized many main results that were included in the body of the text by labeling them as theorems in order to facilitate students in ﬁnding and referencing these results. I have pulled the important deﬁntions and assumptions out of the body of the text and labeled them as such so that they stand out better. When a new topic is introduced, I introduce it with a motivating example before delving into the mathematical formalities. Then I return to the example to illustrate the newly introduced material. I moved the material on the law of large numbers and the central limit theorem to a new Chapter 6. It seemed more natural to deal with the main large-sample results together. I moved the section on Markov chains into Chapter 3. Every time I cover this material with my own students, I stumble over not being able to refer to random variables, distributions, and conditional distributions. I have actually postponed this material until after introducing distributions, and then gone back to cover Markov chains. I feel that the time has come to place it in a more natural location. I also added some material on stationary distributions of Markov chains. I have moved the lengthy proofs of several theorems to the ends of their respective sections in order to improve the ﬂow of the presentation of ideas. I rewrote Section 7.1 to make the introduction to inference clearer. I rewrote Section 9.1 as a more complete introduction to hypothesis testing, including likelihood ratio tests. For instructors not interested in the more mathematical theory of hypothesis testing, it should now be easier to skip from Section 9.1 directly to Section 9.5. Some other changes that readers will notice:

.

.

.

.

.

.

.

I have replaced the notation in which the intersection of two sets A and B had been represented AB with the more popular A ∩ B. The old notation, although mathematically sound, seemed a bit arcane for a text at this level. I added the statements of Stirling’s formula and Jensen’s inequality. I moved the law of total probability and the discussion of partitions of a sample space from Section 2.3 to Section 2.1. I deﬁne the cumulative distribution function (c.d.f.) as the prefered name of what used to be called only the distribution function (d.f.). I added some discussion of histograms in Chapters 3 and 6. I rearranged the topics in Sections 3.8 and 3.9 so that simple functions of random variables appear ﬁrst and the general formulations appear at the end to make it easier for instructors who want to avoid some of the more mathematically challenging parts. I emphasized the closeness of a hypergeometric distribution with a large number of available items to a binomial distribution. xi

xii

Preface .

.

.

.

.

.

.

I gave a brief introduction to Chernoff bounds. These are becoming increasingly important in computer science, and their derivation requires only material that is already in the text. I changed the deﬁnition of conﬁdence interval to refer to the random interval rather than the observed interval. This makes statements less cumbersome, and it corresponds to more modern usage. I added a brief discussion of the method of moments in Section 7.6. I added brief introductions to Newton’s method and the EM algorithm in Chapter 7. I introduced the concept of pivotal quantity to facilitate construction of conﬁdence intervals in general. I added the statement of the large-sample distribution of the likelihood ratio test statistic. I then used this as an alternative way to test the null hypothesis that two normal means are equal when it is not assumed that the variances are equal. I moved the Bonferroni inequality into the main text (Chapter 1) and later (Chapter 11) used it as a way to construct simultaneous tests and conﬁdence intervals.

How to Use This Book The text is somewhat long for complete coverage in a one-year course at the undergraduate level and is designed so that instructors can make choices about which topics are most important to cover and which can be left for more in-depth study. As an example, many instructors wish to deemphasize the classical counting arguments that are detailed in Sections 1.7–1.9. An instructor who only wants enough information to be able to cover the binomial and/or multinomial distributions can safely discuss only the deﬁnitions and theorems on permutations, combinations, and possibly multinomial coefﬁcients. Just make sure that the students realize what these values count, otherwise the associated distributions will make no sense. The various examples in these sections are helpful, but not necessary, for understanding the important distributions. Another example is Section 3.9 on functions of two or more random variables. The use of Jacobians for general multivariate transformations might be more mathematics than the instructors of some undergraduate courses are willing to cover. The entire section could be skipped without causing problems later in the course, but some of the more straightforward cases early in the section (such as convolution) might be worth introducing. The material in Sections 9.2–9.4 on optimal tests in one-parameter families is pretty mathematics, but it is of interest primarily to graduate students who require a very deep understanding of hypothesis testing theory. The rest of Chapter 9 covers everything that an undergraduate course really needs. In addition to the text, the publisher has an Instructor’s Solutions Manual, available for download from the Instructor Resource Center at www.pearsonhighered .com/irc, which includes some speciﬁc advice about many of the sections of the text. I have taught a year-long probability and statistics sequence from earlier editions of this text for a group of mathematically well-trained juniors and seniors. In the ﬁrst semester, I covered what was in the earlier edition but is now in the ﬁrst ﬁve chapters (including the material on Markov chains) and parts of Chapter 6. In the second semester, I covered the rest of the new Chapter 6, Chapters 7–9, Sections 11.1–11.5, and Chapter 12. I have also taught a one-semester probability and random processes

Preface

xiii

course for engineers and computer scientists. I covered what was in the old edition and is now in Chapters 1–6 and 12, including Markov chains, but not Jacobians. This latter course did not emphasize mathematical derivation to the same extent as the course for mathematics students. A number of sections are designated with an asterisk (*). This indicates that later sections do not rely materially on the material in that section. This designation is not intended to suggest that instructors skip these sections. Skipping one of these sections will not cause the students to miss deﬁnitions or results that they will need later. The sections are 2.4, 3.10, 4.8, 7.7, 7.8, 7.9, 8.6, 8.8, 9.2, 9.3, 9.4, 9.8, 9.9, 10.6, 10.7, 10.8, 11.4, 11.7, 11.8, and 12.5. Aside from cross-references between sections within this list, occasional material from elsewhere in the text does refer back to some of the sections in this list. Each of the dependencies is quite minor, however. Most of the dependencies involve references from Chapter 12 back to one of the optional sections. The reason for this is that the optional sections address some of the more difﬁcult material, and simulation is most useful for solving those difﬁcult problems that cannot be solved analytically. Except for passing references that help put material into context, the dependencies are as follows: .

.

.

The sample distribution function (Section 10.6) is reintroduced during the discussion of the bootstrap in Section 12.6. The sample distribution function is also a useful tool for displaying simulation results. It could be introduced as early as Example 12.3.7 simply by covering the ﬁrst subsection of Section 10.6. The material on robust estimation (Section 10.7) is revisited in some simulation exercises in Section 12.2 (Exercises 4, 5, 7, and 8). Example 12.3.4 makes reference to the material on two-way analysis of variance (Sections 11.7 and 11.8).

Supplements The text is accompanied by the following supplementary material: .

.

Instructor’s Solutions Manual contains fully worked solutions to all exercises in the text. Available for download from the Instructor Resource Center at www.pearsonhighered.com/irc. Student Solutions Manual contains fully worked solutions to all odd exercises in the text. Available for purchase from MyPearsonStore at www.mypearsonstore .com. (ISBN-13: 978-0-321-71598-2; ISBN-10: 0-321-71598-5)

Acknowledgments There are many people that I want to thank for their help and encouragement during this revision. First and foremost, I want to thank Marilyn DeGroot and Morrie’s children for giving me the chance to revise Morrie’s masterpiece. I am indebted to the many readers, reviewers, colleagues, staff, and people at Addison-Wesley whose help and comments have strengthened this edition. The reviewers were: Andre Adler, Illinois Institute of Technology; E. N. Barron, Loyola University; Brian Blank, Washington University in St. Louis; Indranil Chakraborty, University of Oklahoma; Daniel Chambers, Boston College; Rita Chattopadhyay, Eastern Michigan University; Stephen A. Chiappari, Santa Clara University; Sheng-Kai Chang, Wayne State University; Justin Corvino, Lafayette College; Michael Evans, University of

xiv

Preface

Toronto; Doug Frank, Indiana University of Pennsylvania; Anda Gadidov, Kennesaw State University; Lyn Geisler, Randolph–Macon College; Prem Goel, Ohio State University; Susan Herring, Sonoma State University; Pawel Hitczenko, Drexel University; Lifang Hsu, Le Moyne College; Wei-Min Huang, Lehigh University; Syed Kirmani, University of Northern Iowa; Michael Lavine, Duke University; Rich Levine, San Diego State University; John Liukkonen, Tulane University; Sergio Loch, Grand View College; Rosa Matzkin, Northwestern University; Terry McConnell, Syracuse University; Hans-Georg Mueller, University of California–Davis; Robert Myers, Bethel College; Mario Peruggia, The Ohio State University; Stefan Ralescu, Queens University; Krishnamurthi Ravishankar, SUNY New Paltz; Diane Saphire, Trinity University; Steven Sepanski, Saginaw Valley State University; HenSiong Tan, Pennsylvania University; Kanapathi Thiru, University of Alaska; Kenneth Troske, Johns Hopkins University; John Van Ness, University of Texas at Dallas; Yehuda Vardi, Rutgers University; Yelena Vaynberg, Wayne State University; Joseph Verducci, Ohio State University; Mahbobeh Vezveai, Kent State University; Brani Vidakovic, Duke University; Karin Vorwerk, Westﬁeld State College; Bette Warren, Eastern Michigan University; Calvin L. Williams, Clemson University; Lori Wolff, University of Mississippi. The person who checked the accuracy of the book was Anda Gadidov, Kennesaw State University. I would also like to thank my colleagues at Carnegie Mellon University, especially Anthony Brockwell, Joel Greenhouse, John Lehoczky, Heidi Sestrich, and Valerie Ventura. The people at Addison-Wesley and other organizations that helped produce the book were Paul Anagnostopoulos, Patty Bergin, Dana Jones Bettez, Chris Cummings, Kathleen DeChavez, Alex Gay, Leah Goldberg, Karen Hartpence, and Christina Lepre. If I left anyone out, it was unintentional, and I apologize. Errors inevitably arise in any project like this (meaning a project in which I am involved). For this reason, I shall post information about the book, including a list of corrections, on my Web page, http://www.stat.cmu.edu/~mark/, as soon as the book is published. Readers are encouraged to send me any errors that they discover. Mark J. Schervish October 2010

Chapter

Introduction to Probability 1.1 1.2 1.3 1.4 1.5 1.6

1

The History of Probability Interpretations of Probability Experiments and Events Set Theory The Deﬁnition of Probability Finite Sample Spaces

1.7 1.8 1.9 1.10 1.11 1.12

Counting Methods Combinatorial Methods Multinomial Coefﬁcients The Probability of a Union of Events Statistical Swindles Supplementary Exercises

1.1 The History of Probability The use of probability to measure uncertainty and variability dates back hundreds of years. Probability has found application in areas as diverse as medicine, gambling, weather forecasting, and the law. The concepts of chance and uncertainty are as old as civilization itself. People have always had to cope with uncertainty about the weather, their food supply, and other aspects of their environment, and have striven to reduce this uncertainty and its effects. Even the idea of gambling has a long history. By about the year 3500 b.c., games of chance played with bone objects that could be considered precursors of dice were apparently highly developed in Egypt and elsewhere. Cubical dice with markings virtually identical to those on modern dice have been found in Egyptian tombs dating from 2000 b.c. We know that gambling with dice has been popular ever since that time and played an important part in the early development of probability theory. It is generally believed that the mathematical theory of probability was started by the French mathematicians Blaise Pascal (1623–1662) and Pierre Fermat (1601–1665) when they succeeded in deriving exact probabilities for certain gambling problems involving dice. Some of the problems that they solved had been outstanding for about 300 years. However, numerical probabilities of various dice combinations had been calculated previously by Girolamo Cardano (1501–1576) and Galileo Galilei (1564– 1642). The theory of probability has been developed steadily since the seventeenth century and has been widely applied in diverse ﬁelds of study. Today, probability theory is an important tool in most areas of engineering, science, and management. Many research workers are actively engaged in the discovery and establishment of new applications of probability in ﬁelds such as medicine, meteorology, photography from satellites, marketing, earthquake prediction, human behavior, the design of computer systems, ﬁnance, genetics, and law. In many legal proceedings involving antitrust violations or employment discrimination, both sides will present probability and statistical calculations to help support their cases. 1

2

Chapter 1 Introduction to Probability

References The ancient history of gambling and the origins of the mathematical theory of probability are discussed by David (1988), Ore (1960), Stigler (1986), and Todhunter (1865). Some introductory books on probability theory, which discuss many of the same topics that will be studied in this book, are Feller (1968); Hoel, Port, and Stone (1971); Meyer (1970); and Olkin, Gleser, and Derman (1980). Other introductory books, which discuss both probability theory and statistics at about the same level as they will be discussed in this book, are Brunk (1975); Devore (1999); Fraser (1976); Hogg and Tanis (1997); Kempthorne and Folks (1971); Larsen and Marx (2001); Larson (1974); Lindgren (1976); Miller and Miller (1999); Mood, Graybill, and Boes (1974); Rice (1995); and Wackerly, Mendenhall, and Schaeffer (2008).

1.2 Interpretations of Probability This section describes three common operational interpretations of probability. Although the interpretations may seem incompatible, it is fortunate that the calculus of probability (the subject matter of the ﬁrst six chapters of this book) applies equally well no matter which interpretation one prefers. In addition to the many formal applications of probability theory, the concept of probability enters our everyday life and conversation. We often hear and use such expressions as “It probably will rain tomorrow afternoon,” “It is very likely that the plane will arrive late,” or “The chances are good that he will be able to join us for dinner this evening.” Each of these expressions is based on the concept of the probability, or the likelihood, that some speciﬁc event will occur. Despite the fact that the concept of probability is such a common and natural part of our experience, no single scientiﬁc interpretation of the term probability is accepted by all statisticians, philosophers, and other authorities. Through the years, each interpretation of probability that has been proposed by some authorities has been criticized by others. Indeed, the true meaning of probability is still a highly controversial subject and is involved in many current philosophical discussions pertaining to the foundations of statistics. Three different interpretations of probability will be described here. Each of these interpretations can be very useful in applying probability theory to practical problems.

The Frequency Interpretation of Probability In many problems, the probability that some speciﬁc outcome of a process will be obtained can be interpreted to mean the relative frequency with which that outcome would be obtained if the process were repeated a large number of times under similar conditions. For example, the probability of obtaining a head when a coin is tossed is considered to be 1/2 because the relative frequency of heads should be approximately 1/2 when the coin is tossed a large number of times under similar conditions. In other words, it is assumed that the proportion of tosses on which a head is obtained would be approximately 1/2. Of course, the conditions mentioned in this example are too vague to serve as the basis for a scientiﬁc deﬁnition of probability. First, a “large number” of tosses of the coin is speciﬁed, but there is no deﬁnite indication of an actual number that would

1.2 Interpretations of Probability

3

be considered large enough. Second, it is stated that the coin should be tossed each time “under similar conditions,” but these conditions are not described precisely. The conditions under which the coin is tossed must not be completely identical for each toss because the outcomes would then be the same, and there would be either all heads or all tails. In fact, a skilled person can toss a coin into the air repeatedly and catch it in such a way that a head is obtained on almost every toss. Hence, the tosses must not be completely controlled but must have some “random” features. Furthermore, it is stated that the relative frequency of heads should be “approximately 1/2,” but no limit is speciﬁed for the permissible variation from 1/2. If a coin were tossed 1,000,000 times, we would not expect to obtain exactly 500,000 heads. Indeed, we would be extremely surprised if we obtained exactly 500,000 heads. On the other hand, neither would we expect the number of heads to be very far from 500,000. It would be desirable to be able to make a precise statement of the likelihoods of the different possible numbers of heads, but these likelihoods would of necessity depend on the very concept of probability that we are trying to deﬁne. Another shortcoming of the frequency interpretation of probability is that it applies only to a problem in which there can be, at least in principle, a large number of similar repetitions of a certain process. Many important problems are not of this type. For example, the frequency interpretation of probability cannot be applied directly to the probability that a speciﬁc acquaintance will get married within the next two years or to the probability that a particular medical research project will lead to the development of a new treatment for a certain disease within a speciﬁed period of time.

The Classical Interpretation of Probability The classical interpretation of probability is based on the concept of equally likely outcomes. For example, when a coin is tossed, there are two possible outcomes: a head or a tail. If it may be assumed that these outcomes are equally likely to occur, then they must have the same probability. Since the sum of the probabilities must be 1, both the probability of a head and the probability of a tail must be 1/2. More generally, if the outcome of some process must be one of n different outcomes, and if these n outcomes are equally likely to occur, then the probability of each outcome is 1/n. Two basic difﬁculties arise when an attempt is made to develop a formal deﬁnition of probability from the classical interpretation. First, the concept of equally likely outcomes is essentially based on the concept of probability that we are trying to deﬁne. The statement that two possible outcomes are equally likely to occur is the same as the statement that two outcomes have the same probability. Second, no systematic method is given for assigning probabilities to outcomes that are not assumed to be equally likely. When a coin is tossed, or a well-balanced die is rolled, or a card is chosen from a well-shufﬂed deck of cards, the different possible outcomes can usually be regarded as equally likely because of the nature of the process. However, when the problem is to guess whether an acquaintance will get married or whether a research project will be successful, the possible outcomes would not typically be considered to be equally likely, and a different method is needed for assigning probabilities to these outcomes.

The Subjective Interpretation of Probability According to the subjective, or personal, interpretation of probability, the probability that a person assigns to a possible outcome of some process represents her own

4

Chapter 1 Introduction to Probability

judgment of the likelihood that the outcome will be obtained. This judgment will be based on each person’s beliefs and information about the process. Another person, who may have different beliefs or different information, may assign a different probability to the same outcome. For this reason, it is appropriate to speak of a certain person’s subjective probability of an outcome, rather than to speak of the true probability of that outcome. As an illustration of this interpretation, suppose that a coin is to be tossed once. A person with no special information about the coin or the way in which it is tossed might regard a head and a tail to be equally likely outcomes. That person would then assign a subjective probability of 1/2 to the possibility of obtaining a head. The person who is actually tossing the coin, however, might feel that a head is much more likely to be obtained than a tail. In order that people in general may be able to assign subjective probabilities to the outcomes, they must express the strength of their belief in numerical terms. Suppose, for example, that they regard the likelihood of obtaining a head to be the same as the likelihood of obtaining a red card when one card is chosen from a well-shufﬂed deck containing four red cards and one black card. Because those people would assign a probability of 4/5 to the possibility of obtaining a red card, they should also assign a probability of 4/5 to the possibility of obtaining a head when the coin is tossed. This subjective interpretation of probability can be formalized. In general, if people’s judgments of the relative likelihoods of various combinations of outcomes satisfy certain conditions of consistency, then it can be shown that their subjective probabilities of the different possible events can be uniquely determined. However, there are two difﬁculties with the subjective interpretation. First, the requirement that a person’s judgments of the relative likelihoods of an inﬁnite number of events be completely consistent and free from contradictions does not seem to be humanly attainable, unless a person is simply willing to adopt a collection of judgments known to be consistent. Second, the subjective interpretation provides no “objective” basis for two or more scientists working together to reach a common evaluation of the state of knowledge in some scientiﬁc area of common interest. On the other hand, recognition of the subjective interpretation of probability has the salutary effect of emphasizing some of the subjective aspects of science. A particular scientist’s evaluation of the probability of some uncertain outcome must ultimately be that person’s own evaluation based on all the evidence available. This evaluation may well be based in part on the frequency interpretation of probability, since the scientist may take into account the relative frequency of occurrence of this outcome or similar outcomes in the past. It may also be based in part on the classical interpretation of probability, since the scientist may take into account the total number of possible outcomes that are considered equally likely to occur. Nevertheless, the ﬁnal assignment of numerical probabilities is the responsibility of the scientist herself. The subjective nature of science is also revealed in the actual problem that a particular scientist chooses to study from the class of problems that might have been chosen, in the experiments that are selected in carrying out this study, and in the conclusions drawn from the experimental data. The mathematical theory of probability and statistics can play an important part in these choices, decisions, and conclusions.

Note: The Theory of Probability Does Not Depend on Interpretation. The mathematical theory of probability is developed and presented in Chapters 1–6 of this book without regard to the controversy surrounding the different interpretations of

1.3 Experiments and Events

5

the term probability. This theory is correct and can be usefully applied, regardless of which interpretation of probability is used in a particular problem. The theories and techniques that will be presented in this book have served as valuable guides and tools in almost all aspects of the design and analysis of effective experimentation.

1.3 Experiments and Events Probability will be the way that we quantify how likely something is to occur (in the sense of one of the interpretations in Sec. 1.2). In this section, we give examples of the types of situations in which probability will be used.

Types of Experiments The theory of probability pertains to the various possible outcomes that might be obtained and the possible events that might occur when an experiment is performed. Deﬁnition 1.3.1

Experiment and Event. An experiment is any process, real or hypothetical, in which the possible outcomes can be identiﬁed ahead of time. An event is a well-deﬁned set of possible outcomes of the experiment. The breadth of this deﬁnition allows us to call almost any imaginable process an experiment whether or not its outcome will ever be known. The probability of each event will be our way of saying how likely it is that the outcome of the experiment is in the event. Not every set of possible outcomes will be called an event. We shall be more speciﬁc about which subsets count as events in Sec. 1.4. Probability will be most useful when applied to a real experiment in which the outcome is not known in advance, but there are many hypothetical experiments that provide useful tools for modeling real experiments. A common type of hypothetical experiment is repeating a well-deﬁned task inﬁnitely often under similar conditions. Some examples of experiments and speciﬁc events are given next. In each example, the words following “the probability that” describe the event of interest. 1. In an experiment in which a coin is to be tossed 10 times, the experimenter might want to determine the probability that at least four heads will be obtained. 2. In an experiment in which a sample of 1000 transistors is to be selected from a large shipment of similar items and each selected item is to be inspected, a person might want to determine the probability that not more than one of the selected transistors will be defective. 3. In an experiment in which the air temperature at a certain location is to be observed every day at noon for 90 successive days, a person might want to determine the probability that the average temperature during this period will be less than some speciﬁed value. 4. From information relating to the life of Thomas Jefferson, a person might want to determine the probability that Jefferson was born in the year 1741. 5. In evaluating an industrial research and development project at a certain time, a person might want to determine the probability that the project will result in the successful development of a new product within a speciﬁed number of months.

6

Chapter 1 Introduction to Probability

The Mathematical Theory of Probability As was explained in Sec. 1.2, there is controversy in regard to the proper meaning and interpretation of some of the probabilities that are assigned to the outcomes of many experiments. However, once probabilities have been assigned to some simple outcomes in an experiment, there is complete agreement among all authorities that the mathematical theory of probability provides the appropriate methodology for the further study of these probabilities. Almost all work in the mathematical theory of probability, from the most elementary textbooks to the most advanced research, has been related to the following two problems: (i) methods for determining the probabilities of certain events from the speciﬁed probabilities of each possible outcome of an experiment and (ii) methods for revising the probabilities of events when additional relevant information is obtained. These methods are based on standard mathematical techniques. The purpose of the ﬁrst six chapters of this book is to present these techniques, which, together, form the mathematical theory of probability.

1.4 Set Theory This section develops the formal mathematical model for events, namely, the theory of sets. Several important concepts are introduced, namely, element, subset, empty set, intersection, union, complement, and disjoint sets.

The Sample Space Deﬁnition 1.4.1

Sample Space. The collection of all possible outcomes of an experiment is called the sample space of the experiment. The sample space of an experiment can be thought of as a set, or collection, of different possible outcomes; and each outcome can be thought of as a point, or an element, in the sample space. Similarly, events can be thought of as subsets of the sample space.

Example 1.4.1

Rolling a Die. When a six-sided die is rolled, the sample space can be regarded as containing the six numbers 1, 2, 3, 4, 5, 6, each representing a possible side of the die that shows after the roll. Symbolically, we write S = {1, 2, 3, 4, 5, 6}. One event A is that an even number is obtained, and it can be represented as the subset A = {2, 4, 6}. The event B that a number greater than 2 is obtained is deﬁned by the subset B = {3, 4, 5, 6}. Because we can interpret outcomes as elements of a set and events as subsets of a set, the language and concepts of set theory provide a natural context for the development of probability theory. The basic ideas and notation of set theory will now be reviewed.

1.4 Set Theory

7

Relations of Set Theory Let S denote the sample space of some experiment. Then each possible outcome s of the experiment is said to be a member of the space S, or to belong to the space S. The statement that s is a member of S is denoted symbolically by the relation s ∈ S. When an experiment has been performed and we say that some event E has occurred, we mean two equivalent things. One is that the outcome of the experiment satisﬁed the conditions that speciﬁed that event E. The other is that the outcome, considered as a point in the sample space, is an element of E. To be precise, we should say which sets of outcomes correspond to events as deﬁned above. In many applications, such as Example 1.4.1, it will be clear which sets of outcomes should correspond to events. In other applications (such as Example 1.4.5 coming up later), there are too many sets available to have them all be events. Ideally, we would like to have the largest possible collection of sets called events so that we have the broadest possible applicability of our probability calculations. However, when the sample space is too large (as in Example 1.4.5) the theory of probability simply will not extend to the collection of all subsets of the sample space. We would prefer not to dwell on this point for two reasons. First, a careful handling requires mathematical details that interfere with an initial understanding of the important concepts, and second, the practical implications for the results in this text are minimal. In order to be mathematically correct without imposing an undue burden on the reader, we note the following. In order to be able to do all of the probability calculations that we might ﬁnd interesting, there are three simple conditions that must be met by the collection of sets that we call events. In every problem that we see in this text, there exists a collection of sets that includes all the sets that we will need to discuss and that satisﬁes the three conditions, and the reader should assume that such a collection has been chosen as the events. For a sample space S with only ﬁnitely many outcomes, the collection of all subsets of S satisﬁes the conditions, as the reader can show in Exercise 12 in this section. The ﬁrst of the three conditions can be stated immediately. Condition 1

The sample space S must be an event. That is, we must include the sample space S in our collection of events. The other two conditions will appear later in this section because they require additional deﬁnitions. Condition 2 is on page 9, and Condition 3 is on page 10.

Deﬁnition 1.4.2

Containment. It is said that a set A is contained in another set B if every element of the set A also belongs to the set B. This relation between two events is expressed symbolically by the expression A ⊂ B, which is the set-theoretic expression for saying that A is a subset of B. Equivalently, if A ⊂ B, we may say that B contains A and may write B ⊃ A. For events, to say that A ⊂ B means that if A occurs then so does B. The proof of the following result is straightforward and is omitted.

Theorem 1.4.1

Let A, B, and C be events. Then A ⊂ S. If A ⊂ B and B ⊂ A, then A = B. If A ⊂ B and B ⊂ C, then A ⊂ C.

Example 1.4.2

Rolling a Die. In Example 1.4.1, suppose that A is the event that an even number is obtained and C is the event that a number greater than 1 is obtained. Since A = {2, 4, 6} and C = {2, 3, 4, 5, 6}, it follows that A ⊂ C.

8

Chapter 1 Introduction to Probability

The Empty Set Some events are impossible. For example, when a die is rolled, it is impossible to obtain a negative number. Hence, the event that a negative number will be obtained is deﬁned by the subset of S that contains no outcomes. Deﬁnition 1.4.3

Empty Set. The subset of S that contains no elements is called the empty set, or null set, and it is denoted by the symbol ∅. In terms of events, the empty set is any event that cannot occur.

Theorem 1.4.2

Let A be an event. Then ∅ ⊂ A. Proof Let A be an arbitrary event. Since the empty set ∅ contains no points, it is logically correct to say that every point belonging to ∅ also belongs to A, or ∅ ⊂ A.

Finite and Inﬁnite Sets Some sets contain only ﬁnitely many elements, while others have inﬁnitely many elements. There are two sizes of inﬁnite sets that we need to distinguish. Deﬁnition 1.4.4

Countable/Uncountable. An inﬁnite set A is countable if there is a one-to-one correspondence between the elements of A and the set of natural numbers {1, 2, 3, . . .}. A set is uncountable if it is neither ﬁnite nor countable. If we say that a set has at most countably many elements, we mean that the set is either ﬁnite or countable. Examples of countably inﬁnite sets include the integers, the even integers, the odd integers, the prime numbers, and any inﬁnite sequence. Each of these can be put in one-to-one correspondence with the natural numbers. For example, the following function f puts the integers in one-to-one correspondence with the natural numbers: n−1 if n is odd, 2 f (n) = − n2 if n is even. Every inﬁnite sequence of distinct items is a countable set, as its indexing puts it in one-to-one correspondence with the natural numbers. Examples of uncountable sets include the real numbers, the positive reals, the numbers in the interval [0, 1], and the set of all ordered pairs of real numbers. An argument to show that the real numbers are uncountable appears at the end of this section. Every subset of the integers has at most countably many elements.

Operations of Set Theory Deﬁnition 1.4.5

Complement. The complement of a set A is deﬁned to be the set that contains all elements of the sample space S that do not belong to A. The notation for the complement of A is Ac . In terms of events, the event Ac is the event that A does not occur.

Example 1.4.3

Rolling a Die. In Example 1.4.1, suppose again that A is the event that an even number is rolled; then Ac = {1, 3, 5} is the event that an odd number is rolled. We can now state the second condition that we require of the collection of events.

1.4 Set Theory

Figure 1.1 The event Ac .

9

S

A Ac

Figure 1.2 The set A ∪ B.

S A

B A

Condition 2

If A is an event, then Ac is also an event. That is, for each set A of outcomes that we call an event, we must also call its complement Ac an event. A generic version of the relationship between A and Ac is sketched in Fig. 1.1. A sketch of this type is called a Venn diagram. Some properties of the complement are stated without proof in the next result.

Theorem 1.4.3

Let A be an event. Then (Ac )c = A,

∅c = S,

S c = ∅.

The empty event ∅ is an event. Deﬁnition 1.4.6

Union of Two Sets. If A and B are any two sets, the union of A and B is deﬁned to be the set containing all outcomes that belong to A alone, to B alone, or to both A and B. The notation for the union of A and B is A ∪ B. The set A ∪ B is sketched in Fig. 1.2. In terms of events, A ∪ B is the event that either A or B or both occur. The union has the following properties whose proofs are left to the reader.

Theorem 1.4.4

For all sets A and B, A ∪ B = B ∪ A, A ∪ ∅ = A,

A ∪ Ac = S,

A ∪ A = A, A ∪ S = S.

Furthermore, if A ⊂ B, then A ∪ B = B. The concept of union extends to more than two sets. Deﬁnition 1.4.7

Union of Many Sets. The union of n sets A1, . . . , An is deﬁned to be the set that contains all outcomes that belong to at least one of these n sets. The notation for this union is either of the following: A1 ∪ A2 ∪ . . . ∪ An or

n i=1

Ai .

10

Chapter 1 Introduction to Probability

Similarly, the union of an inﬁnite sequence of sets A1, A2 , . . . is the set that contains all outcomes that belong to at least one of the events in the sequence. The inﬁnite union is denoted by ∞ i=1 Ai .

Condition 3

In terms of events, the union of a collection of events is the event that at least one of the events in the collection occurs. We can now state the ﬁnal condition that we require for the collection of sets that we call events. If A1, A2 , . . . is a countable collection of events, then ∞ i=1 Ai is also an event. In other words, if we choose to call each set of outcomes in some countable collection an event, we are required to call their union an event also. We do not require that the union of an arbitrary collection of events be an event. To be clear, let I be an arbitrary set that we use to index a general collection of events {Ai : i ∈ I }. The union of the events in this collection is the set of outcomes that are in at least one of the events in the collection. The notation for this union is i∈I Ai . We do not require that i∈I Ai be an event unless I is countable. Condition 3 refers to a countable collection of events. We can prove that the condition also applies to every ﬁnite collection of events.

Theorem 1.4.5

The union of a ﬁnite number of events A1, . . . , An is an event. Proof For each m = n + 1, n + 2, . . ., deﬁne Am = ∅. Because ∅ is an event, we now have a countable collection A1, A2 , . . . of events. ∞ It follows nfrom Condition 3 that ∞ A is an event. But it is easy to see that A = m=1 m m=1 m m=1 Am . The union of three events A, B, and C can be constructed either directly from the deﬁnition of A ∪ B ∪ C or by ﬁrst evaluating the union of any two of the events and then forming the union of this combination of events and the third event. In other words, the following result is true.

Theorem 1.4.6

Associative Property. For every three events A, B, and C, the following associative relations are satisﬁed: A ∪ B ∪ C = (A ∪ B) ∪ C = A ∪ (B ∪ C).

Deﬁnition 1.4.8

Intersection of Two Sets. If A and B are any two sets, the intersection of A and B is deﬁned to be the set that contains all outcomes that belong both to A and to B. The notation for the intersection of A and B is A ∩ B. The set A ∩ B is sketched in a Venn diagram in Fig. 1.3. In terms of events, A ∩ B is the event that both A and B occur. The proof of the ﬁrst part of the next result follows from Exercise 3 in this section. The rest of the proof is straightforward.

Figure 1.3 The set A ∩ B.

S A

B

1.4 Set Theory

Theorem 1.4.7

11

If A and B are events, then so is A ∩ B. For all events A and B, A ∩ B = B ∩ A, A ∩ ∅ = ∅,

A ∩ A = A, A ∩ S = A.

A ∩ Ac = ∅,

Furthermore, if A ⊂ B, then A ∩ B = A. The concept of intersection extends to more than two sets. Deﬁnition 1.4.9

Intersection of Many Sets. The intersection of n sets A1, . . . , An is deﬁned to be the set that contains the elements that are common to all these n sets. The notation for this intersection is A1 ∩ A2 ∩ . . . ∩ An or ni=1 Ai . Similar notations are used for the intersection of an inﬁnite sequence of sets or for the intersection of an arbitrary collection of sets. In terms of events, the intersection of a collection of events is the event that every event in the collection occurs. The following result concerning the intersection of three events is straightforward to prove.

Theorem 1.4.8

Associative Property. For every three events A, B, and C, the following associative relations are satisﬁed: A ∩ B ∩ C = (A ∩ B) ∩ C = A ∩ (B ∩ C).

Deﬁnition 1.4.10

Disjoint/Mutually Exclusive. It is said that two sets A and B are disjoint, or mutually exclusive, if A and B have no outcomes in common, that is, if A ∩ B = ∅. The sets A1, . . . , An or the sets A1, A2 , . . . are disjoint if for every i = j , we have that Ai and Aj are disjoint, that is, Ai ∩ Aj = ∅ for all i = j . The events in an arbitrary collection are disjoint if no two events in the collection have any outcomes in common. In terms of events, A and B are disjoint if they cannot both occur. As an illustration of these concepts, a Venn diagram for three events A1, A2 , and A3 is presented in Fig. 1.4. This diagram indicates that the various intersections of A1, A2 , and A3 and their complements will partition the sample space S into eight disjoint subsets.

Figure 1.4 Partition of S determined by three events A1, A2 , A3.

S A1

A2

A1傽A2c 傽A3c

A1傽A2傽A3c

A1c傽A2傽A3c

A1傽A2傽A3 A1傽A2c 傽A3

A3

A1c傽A2傽A3

A1c傽A2c 傽A3 A1c傽A2c 傽A3c

12

Chapter 1 Introduction to Probability

Example 1.4.4

Tossing a Coin. Suppose that a coin is tossed three times. Then the sample space S contains the following eight possible outcomes s1, . . . , s8: s1: s2 :

HHH, THH,

s3 : s4 :

HTH, HHT,

s5 :

HTT,

s6 :

THT,

s7 : s8 :

TTH, TTT.

In this notation, H indicates a head and T indicates a tail. The outcome s3, for example, is the outcome in which a head is obtained on the ﬁrst toss, a tail is obtained on the second toss, and a head is obtained on the third toss. To apply the concepts introduced in this section, we shall deﬁne four events as follows: Let A be the event that at least one head is obtained in the three tosses; let B be the event that a head is obtained on the second toss; let C be the event that a tail is obtained on the third toss; and let D be the event that no heads are obtained. Accordingly, A = {s1, s2 , s3, s4, s5, s6, s7}, B = {s1, s2 , s4, s6}, C = {s4, s5, s6, s8}, D = {s8}. Various relations among these events can be derived. Some of these relations are B ⊂ A, Ac = D, B ∩ D = ∅, A ∪ C = S, B ∩ C = {s4, s6}, (B ∪ C)c = {s3, s7}, and A ∩ (B ∪ C) = {s1, s2 , s4, s5, s6}. Example 1.4.5

Figure 1.5 Sample space for water and electric demand in Example 1.4.5

Demands for Utilities. A contractor is building an ofﬁce complex and needs to plan for water and electricity demand (sizes of pipes, conduit, and wires). After consulting with prospective tenants and examining historical data, the contractor decides that the demand for electricity will range somewhere between 1 million and 150 million kilowatt-hours per day and water demand will be between 4 and 200 (in thousands of gallons per day). All combinations of electrical and water demand are considered possible. The shaded region in Fig. 1.5 shows the sample space for the experiment, consisting of learning the actual water and electricity demands for the ofﬁce complex. We can express the sample space as the set of ordered pairs {(x, y) : 4 ≤ x ≤ 200, 1 ≤ y ≤ 150}, where x stands for water demand in thousands of gallons per day and y

Electric

150

1 0

Water 4

200

1.4 Set Theory

Figure 1.6 Partition of A ∪ B in Theorem 1.4.11.

13

S A B A傽Bc A傽B Ac傽B

stands for the electric demand in millions of kilowatt-hours per day. The types of sets that we want to call events include sets like {water demand is at least 100} = {(x, y) : x ≥ 100}, and {electric demand is no more than 35} = {(x, y) : y ≤ 35}, along with intersections, unions, and complements of such sets. This sample space has inﬁnitely many points. Indeed, the sample space is uncountable. There are many more sets that are difﬁcult to describe and which we will have no need to consider as events.

Additional Properties of Sets The proof of the following useful result is left to Exercise 3 in this section. Theorem 1.4.9

De Morgan’s Laws. For every two sets A and B, (A ∪ B)c = Ac ∩ B c

and

(A ∩ B)c = Ac ∪ B c .

The generalization of Theorem 1.4.9 is the subject of Exercise 5 in this section. The proofs of the following distributive properties are left to Exercise 2 in this section. These properties also extend in natural ways to larger collections of events. Theorem 1.4.10

Distributive Properties. For every three sets A, B, and C, A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C)

and

A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C).

The following result is useful for computing probabilities of events that can be partitioned into smaller pieces. Its proof is left to Exercise 4 in this section, and is illuminated by Fig. 1.6. Theorem 1.4.11

Partitioning a Set. For every two sets A and B, A ∩ B and A ∩ B c are disjoint and A = (A ∩ B) ∪ (A ∩ B c ). In addition, B and A ∩ B c are disjoint, and A ∪ B = B ∪ (A ∩ B c ).

Proof That the Real Numbers Are Uncountable We shall show that the real numbers in the interval [0, 1) are uncountable. Every larger set is a fortiori uncountable. For each number x ∈ [0, 1), deﬁne the sequence {an(x)}∞ n=1 as follows. First, a1(x) = 10x , where y stands for the greatest integer less than or equal to y (round nonintegers down to the closest integer below). Then

14

Chapter 1 Introduction to Probability

0 2 3 0 7 1 3 ... 1 9 9 2 1 0 0 ... 2 7 3 6 0 1 1 ... 8 0 2 1 2 7 9 ... 7 0 1 6 0 1 3 ... 1 5 1 5 1 5 1 ... 2 3 4 5 6 7 8 ... 0 .. .

1 .. .

7 .. .

3 .. .

2 .. .

9 .. .

8 ... .. . . . .

Figure 1.7 An array of a countable collection of sequences of digits with the diagonal underlined.

set b1(x) = 10x − a1(x), which will again be in [0, 1). For n > 1, an(x) = 10bn−1(x) and bn(x) = 10bn−1(x) − an(x). It is easy to see that the sequence {an(x)}∞ n=1 gives a decimal expansion for x in the form x=

∞

an(x)10−n.

(1.4.1)

n=1

By construction, each number of the form x = k/10m for some nonnegative the form k/10m integers k and m will have an(x) = 0 for n > m. The numbers of −n are the only ones that have an alternate decimal expansion x = ∞ n=1 cn (x)10 . When k is not a multiple of 10, this alternate expansion satisﬁes cn(x) = an(x) for n = 1, . . . , m − 1, cm(x) = am(x) − 1, and cn(x) = 9 for n > m. Let C = {0, 1, . . . , 9}∞ stand for the set of all inﬁnite sequences of digits. Let B denote the subset of C consisting of those sequences that don’t end in repeating 9’s. Then we have just constructed a function a from the interval [0, 1) onto B that is one-to-one and whose inverse is given in (1.4.1). We now show that the set B is uncountable, hence [0, 1) is uncountable. Take any countable subset of B and arrange the sequences into a rectangular array with the kth sequence running across the kth row of the array for k = 1, 2, . . . . Figure 1.7 gives an example of part of such an array. In Fig. 1.7, we have underlined the kth digit in the kth sequence for each k. This portion of the array is called the diagonal of the array. We now show that there must exist a sequence in B that is not part of this array. This will prove that the whole set B cannot be put into such an array, and hence cannot be countable. Construct the sequence {dn}∞ n=1 as follows. For each n, let dn = 2 if the nth digit in the nth sequence is 1, and dn = 1 otherwise. This sequence does not end in repeating 9’s; hence, it is in B. We conclude the proof by showing that {dn}∞ n=1 does not appear anywhere in the array. If the sequence did appear in the array, say, in the kth row, then its kth element would be the kth diagonal element of the array. But we constructed the sequence so that for every n (including n = k), its nth element never matched the nth diagonal element. Hence, the sequence can’t be in the kth row, no matter what k is. The argument given here is essentially that of the nineteenth-century German mathematician Georg Cantor.

1.4 Set Theory

15

Summary We will use set theory for the mathematical model of events. Outcomes of an experiment are elements of some sample space S, and each event is a subset of S. Two events both occur if the outcome is in the intersection of the two sets. At least one of a collection of events occurs if the outcome is in the union of the sets. Two events cannot both occur if the sets are disjoint. An event fails to occur if the outcome is in the complement of the set. The empty set stands for every event that cannot possibly occur. The collection of events is assumed to contain the sample space, the complement of each event, and the union of each countable collection of events.

Exercises 1. Suppose that A ⊂ B. Show that B c ⊂ Ac . 2. Prove the distributive properties in Theorem 1.4.10. 3. Prove De Morgan’s laws (Theorem 1.4.9). 4. Prove Theorem 1.4.11. 5. For every collection of events Ai (i ∈ I ), show that c c

c Ai = Ai and Ai = Aci . i∈I

i∈I

i∈I

i∈I

6. Suppose that one card is to be selected from a deck of 20 cards that contains 10 red cards numbered from 1 to 10 and 10 blue cards numbered from 1 to 10. Let A be the event that a card with an even number is selected, let B be the event that a blue card is selected, and let C be the event that a card with a number less than 5 is selected. Describe the sample space S and describe each of the following events both in words and as subsets of S: a. A ∩ B ∩ C d. A ∩ (B ∪ C)

c. A ∪ B ∪ C b. B ∩ C c c c e. A ∩ B ∩ C c .

7. Suppose that a number x is to be selected from the real line S, and let A, B, and C be the events represented by the following subsets of S, where the notation {x: - - -} denotes the set containing every point x for which the property presented following the colon is satisﬁed: A = {x: 1 ≤ x ≤ 5}, B = {x: 3 < x ≤ 7}, C = {x: x ≤ 0}. Describe each of the following events as a set of real numbers: a. Ac d. Ac ∩ B c ∩ C c

b. A ∪ B c. B ∩ C c e. (A ∪ B) ∩ C.

8. A simpliﬁed model of the human blood-type system has four blood types: A, B, AB, and O. There are two antigens, anti-A and anti-B, that react with a person’s

blood in different ways depending on the blood type. AntiA reacts with blood types A and AB, but not with B and O. Anti-B reacts with blood types B and AB, but not with A and O. Suppose that a person’s blood is sampled and tested with the two antigens. Let A be the event that the blood reacts with anti-A, and let B be the event that it reacts with anti-B. Classify the person’s blood type using the events A, B, and their complements. 9. Let S be a given sample space and let A1, A2 , . . . be an For n = 1, 2, . . . , let Bn = ∞inﬁnite sequence ofevents. ∞ A and let C = A . n i=n i i=n i a. Show that B1 ⊃ B2 ⊃ . . . and that C1 ⊂ C2 ⊂ . . .. b. Show ∞ that an outcome in S belongs to the event n=1 Bn if and only if it belongs to an inﬁnite number of the events A1, A2 , . . . . c. Show ∞ that an outcome in S belongs to the event n=1 Cn if and only if it belongs to all the events A1, A2 , . . . except possibly a ﬁnite number of those events. 10. Three six-sided dice are rolled. The six sides of each die are numbered 1–6. Let A be the event that the ﬁrst die shows an even number, let B be the event that the second die shows an even number, and let C be the event that the third die shows an even number. Also, for each i = 1, . . . , 6, let Ai be the event that the ﬁrst die shows the number i, let Bi be the event that the second die shows the number i, and let Ci be the event that the third die shows the number i. Express each of the following events in terms of the named events described above: a. b. c. d. e.

The event that all three dice show even numbers The event that no die shows an even number The event that at least one die shows an odd number The event that at most two dice show odd numbers The event that the sum of the three dices is no greater than 5

11. A power cell consists of two subcells, each of which can provide from 0 to 5 volts, regardless of what the other

16

Chapter 1 Introduction to Probability

subcell provides. The power cell is functional if and only if the sum of the two voltages of the subcells is at least 6 volts. An experiment consists of measuring and recording the voltages of the two subcells. Let A be the event that the power cell is functional, let B be the event that two subcells have the same voltage, let C be the event that the ﬁrst subcell has a strictly higher voltage than the second subcell, and let D be the event that the power cell is not functional but needs less than one additional volt to become functional. a. Deﬁne a sample space S for the experiment as a set of ordered pairs that makes it possible for you to express the four sets above as events. b. Express each of the events A, B, C, and D as sets of ordered pairs that are subsets of S. c. Express the following set in terms of A, B, C, and/or D: {(x, y) : x = y and x + y ≤ 5}.

d. Express the following event in terms of A, B, C, and/or D: the event that the power cell is not functional and the second subcell has a strictly higher voltage than the ﬁrst subcell. 12. Suppose that the sample space S of some experiment is ﬁnite. Show that the collection of all subsets of S satisﬁes the three conditions required to be called the collection of events. 13. Let S be the sample space for some experiment. Show that the collection of subsets consisting solely of S and ∅ satisﬁes the three conditions required in order to be called the collection of events. Explain why this collection would not be very interesting in most real problems. 14. Suppose that the sample space S of some experiment is countable. Suppose also that, for every outcome s ∈ S, the subset {s} is an event. Show that every subset of S must be an event. Hint: Recall the three conditions required of the collection of subsets of S that we call events.

1.5 The Deﬁnition of Probability We begin with the mathematical deﬁnition of probability and then present some useful results that follow easily from the deﬁnition.

Axioms and Basic Theorems In this section, we shall present the mathematical, or axiomatic, deﬁnition of probability. In a given experiment, it is necessary to assign to each event A in the sample space S a number Pr(A) that indicates the probability that A will occur. In order to satisfy the mathematical deﬁnition of probability, the number Pr(A) that is assigned must satisfy three speciﬁc axioms. These axioms ensure that the number Pr(A) will have certain properties that we intuitively expect a probability to have under each of the various interpretations described in Sec. 1.2. The ﬁrst axiom states that the probability of every event must be nonnegative. Axiom 1

For every event A, Pr(A) ≥ 0. The second axiom states that if an event is certain to occur, then the probability of that event is 1.

Axiom 2

Pr(S) = 1. Before stating Axiom 3, we shall discuss the probabilities of disjoint events. If two events are disjoint, it is natural to assume that the probability that one or the other will occur is the sum of their individual probabilities. In fact, it will be assumed that this additive property of probability is also true for every ﬁnite collection of disjoint events and even for every inﬁnite sequence of disjoint events. If we assume that this additive property is true only for a ﬁnite number of disjoint events, we cannot then be certain that the property will be true for an inﬁnite sequence of disjoint events as well. However, if we assume that the additive property is true for every inﬁnite sequence

1.5 The Deﬁnition of Probability

17

of disjoint events, then (as we shall prove) the property must also be true for every ﬁnite number of disjoint events. These considerations lead to the third axiom. Axiom 3

For every inﬁnite sequence of disjoint events A1, A2 , . . . , ∞ ∞ Pr Ai = Pr(Ai ). i=1

i=1

Example 1.5.1

Rolling a Die. In Example 1.4.1, for each subset A of S = {1, 2, 3, 4, 5, 6}, let Pr(A) be the number of elements of A divided by 6. It is trivial to see that this satisﬁes the ﬁrst two axioms. There are only ﬁnitely many distinct collections of nonempty disjoint events. It is not difﬁcult to see that Axiom 3 is also satisﬁed by this example.

Example 1.5.2

A Loaded Die. In Example 1.5.1, there are other choices for the probabilities of events. For example, if we believe that the die is loaded, we might believe that some sides have different probabilities of turning up. To be speciﬁc, suppose that we believe that 6 is twice as likely to come up as each of the other ﬁve sides. We could set pi = 1/7 for i = 1, 2, 3, 4, 5 and p6 = 2/7. Then, for each event A, deﬁne Pr(A) to be the sum of all pi such that i ∈ A. For example, if A = {1, 3, 5}, then Pr(A) = p1 + p3 + p5 = 3/7. It is not difﬁcult to check that this also satisﬁes all three axioms. We are now prepared to give the mathematical deﬁnition of probability.

Deﬁnition 1.5.1

Probability. A probability measure, or simply a probability, on a sample space S is a speciﬁcation of numbers Pr(A) for all events A that satisfy Axioms 1, 2, and 3. We shall now derive two important consequences of Axiom 3. First, we shall show that if an event is impossible, its probability must be 0.

Theorem 1.5.1

Pr(∅) = 0. Proof Consider the inﬁnite sequence of events A1, A2 , . . . such that Ai = ∅ for i = 1, 2, . . . . In other words, each of the events in the sequence is just the empty set ∅. Then this sequence is a sequence of disjoint events, since ∅ ∩ ∅ = ∅. Furthermore, ∞ i=1 Ai = ∅. Therefore, it follows from Axiom 3 that ∞ ∞ ∞ Pr(∅) = Pr Ai = Pr(Ai ) = Pr(∅). i=1

i=1

i=1

This equation states that when the number Pr(∅) is added repeatedly in an inﬁnite series, the sum of that series is simply the number Pr(∅). The only real number with this property is zero. We can now show that the additive property assumed in Axiom 3 for an inﬁnite sequence of disjoint events is also true for every ﬁnite number of disjoint events. Theorem 1.5.2

For every ﬁnite sequence of n disjoint events A1, . . . , An, n n Pr Ai = Pr(Ai ). i=1

i=1

Proof Consider the inﬁnite sequence of events A1, A2 , . . . , in which A1, . . . , An are the n given disjoint events and Ai = ∅ for i > n. Then the events in this inﬁnite

18

Chapter 1 Introduction to Probability

n sequence are disjoint and ∞ i=1 Ai = i=1 Ai . Therefore, by Axiom 3, ∞ n ∞ Ai = Pr Ai = Pr(Ai ) Pr i=1

i=1

=

n

Pr(Ai ) +

i=1

=

n

i=1 ∞

Pr(Ai )

i=n+1

Pr(Ai ) + 0

i=1

=

n

Pr(Ai ).

i=1

Further Properties of Probability From the axioms and theorems just given, we shall now derive four other general properties of probability measures. Because of the fundamental nature of these four properties, they will be presented in the form of four theorems, each one of which is easily proved. Theorem 1.5.3

For every event A, Pr(Ac ) = 1 − Pr(A). Proof Since A and Ac are disjoint events and A ∪ Ac = S, it follows from Theorem 1.5.2 that Pr(S) = Pr(A) + Pr(Ac ). Since Pr(S) = 1 by Axiom 2, then Pr(Ac ) = 1 − Pr(A).

Theorem 1.5.4

If A ⊂ B, then Pr(A) ≤ Pr(B). Proof As illustrated in Fig. 1.8, the event B may be treated as the union of the two disjoint events A and B ∩ Ac . Therefore, Pr(B) = Pr(A) + Pr(B ∩ Ac ). Since Pr(B ∩ Ac ) ≥ 0, then Pr(B) ≥ Pr(A).

Theorem 1.5.5

For every event A, 0 ≤ Pr(A) ≤ 1. Proof It is known from Axiom 1 that Pr(A) ≥ 0. Since A ⊂ S for every event A, Theorem 1.5.4 implies Pr(A) ≤ Pr(S) = 1, by Axiom 2.

Theorem 1.5.6

Figure 1.8 B = A ∪ (B ∩ Ac ) in the proof of Theorem 1.5.4.

For every two events A and B, Pr(A ∩ B c ) = Pr(A) − Pr(A ∩ B).

S B B傽Ac A

1.5 The Deﬁnition of Probability

19

Proof According to Theorem 1.4.11, the events A ∩ B c and A ∩ B are disjoint and A = (A ∩ B) ∪ (A ∩ B c ). It follows from Theorem 1.5.2 that Pr(A) = Pr(A ∩ B) + Pr(A ∩ B c ). Subtract Pr(A ∩ B) from both sides of this last equation to complete the proof. Theorem 1.5.7

For every two events A and B, Pr(A ∪ B) = Pr(A) + Pr(B) − Pr(A ∩ B).

(1.5.1)

Proof From Theorem 1.4.11, we have A ∪ B = B ∪ (A ∩ B c ), and the two events on the right side of this equation are disjoint. Hence, we have Pr(A ∪ B) = Pr(B) + Pr(A ∩ B c ) = Pr(B) + Pr(A) − Pr(A ∩ B), where the ﬁrst equation follows from Theorem 1.5.2, and the second follows from Theorem 1.5.6. Example 1.5.3

Diagnosing Diseases. A patient arrives at a doctor’s ofﬁce with a sore throat and lowgrade fever. After an exam, the doctor decides that the patient has either a bacterial infection or a viral infection or both. The doctor decides that there is a probability of 0.7 that the patient has a bacterial infection and a probability of 0.4 that the person has a viral infection. What is the probability that the patient has both infections? Let B be the event that the patient has a bacterial infection, and let V be the event that the patient has a viral infection. We are told Pr(B) = 0.7, that Pr(V ) = 0.4, and that S = B ∪ V . We are asked to ﬁnd Pr(B ∩ V ). We will use Theorem 1.5.7, which says that Pr(B ∪ V ) = Pr(B) + Pr(V ) − Pr(B ∩ V ).

(1.5.2)

Since S = B ∪ V , the left-hand side of (1.5.2) is 1, while the ﬁrst two terms on the right-hand side are 0.7 and 0.4. The result is 1 = 0.7 + 0.4 − Pr(B ∩ V ), which leads to Pr(B ∩ V ) = 0.1, the probability that the patient has both infections. Example 1.5.4

Demands for Utilities. Consider, once again, the contractor who needs to plan for water and electricity demands in Example 1.4.5. There are many possible choices for how to spread the probability around the sample space (pictured in Fig. 1.5 on page 12). One simple choice is to make the probability of an event E proportional to the area of E. The area of S (the sample space) is (150 − 1) × (200 − 4) = 29,204, so Pr(E) equals the area of E divided by 29,204. For example, suppose that the contractor is interested in high demand. Let A be the set where water demand is at least 100, and let B be the event that electric demand is at least 115, and suppose that these values are considered high demand. These events are shaded with different patterns in Fig. 1.9. The area of A is (150 − 1) × (200 − 100) = 14,900, and the area

20

Chapter 1 Introduction to Probability

Figure 1.9 The two events of interest in utility demand sample space for Example 1.5.4.

Electric

150

A傽B 115 B

A

1 0

Water 4

100

200

of B is (150 − 115) × (200 − 4) = 6,860. So, Pr(A) =

14,900 = 0.5102, 29,204

Pr(B) =

6,860 = 0.2349. 29,204

The two events intersect in the region denoted by A ∩ B. The area of this region is (150 − 115) × (200 − 100) = 3,500, so Pr(A ∩ B) = 3,500/29,204 = 0.1198. If the contractor wishes to compute the probability that at least one of the two demands will be high, that probability is Pr(A ∪ B) = Pr(A) + Pr(B) − Pr(A ∩ B) = 0.5102 + 0.2349 − 0.1198 = 0.6253,

according to Theorem 1.5.7. The proof of the following useful result is left to Exercise 13. Theorem 1.5.8

Bonferroni Inequality. For all events A1, . . . , An, n n n n

Pr Ai ≤ Pr(Ai ) and Pr Ai ≥ 1 − Pr(Aci ). i=1

i=1

i=1

i=1

(The second inequality above is known as the Bonferroni inequality.)

Note: Probability Zero Does Not Mean Impossible. When an event has probability 0, it does not mean that the event is impossible. In Example 1.5.4, there are many events with 0 probability, but they are not all impossible. For example, for every x, the event that water demand equals x corresponds to a line segment in Fig. 1.5. Since line segments have 0 area, the probability of every such line segment is 0, but the events are not all impossible. Indeed, if every event of the form {water demand equals x} were impossible, then water demand could not take any value at all. If > 0, the event {water demand is between x − and x + } will have positive probability, but that probability will go to 0 as goes to 0.

Summary We have presented the mathematical deﬁnition of probability through the three axioms. The axioms require that every event have nonnegative probability, that the whole sample space have probability 1, and that the union of an inﬁnite sequence of disjoint events have probability equal to the sum of their probabilities. Some important results to remember include the following:

1.5 The Deﬁnition of Probability .

21

If A1, . . . , Ak are disjoint, Pr ∪ki=1Ai = ki=1 Pr(Ai ).

.

Pr(Ac ) = 1 − Pr(A).

.

A ⊂ B implies that Pr(A) ≤ Pr(B).

.

Pr(A ∪ B) = Pr(A) + Pr(B) − Pr(A ∩ B).

It does not matter how the probabilities were determined. As long as they satisfy the three axioms, they must also satisfy the above relations as well as all of the results that we prove later in the text.

Exercises 1. One ball is to be selected from a box containing red, white, blue, yellow, and green balls. If the probability that the selected ball will be red is 1/5 and the probability that it will be white is 2/5, what is the probability that it will be blue, yellow, or green? 2. A student selected from a class will be either a boy or a girl. If the probability that a boy will be selected is 0.3, what is the probability that a girl will be selected? 3. Consider two events A and B such that Pr(A) = 1/3 and Pr(B) = 1/2. Determine the value of Pr(B ∩ Ac ) for each of the following conditions: (a) A and B are disjoint; (b) A ⊂ B; (c) Pr(A ∩ B) = 1/8. 4. If the probability that student A will fail a certain statistics examination is 0.5, the probability that student B will fail the examination is 0.2, and the probability that both student A and student B will fail the examination is 0.1, what is the probability that at least one of these two students will fail the examination? 5. For the conditions of Exercise 4, what is the probability that neither student A nor student B will fail the examination? 6. For the conditions of Exercise 4, what is the probability that exactly one of the two students will fail the examination? 7. Consider two events A and B with Pr(A) = 0.4 and Pr(B) = 0.7. Determine the maximum and minimum possible values of Pr(A ∩ B) and the conditions under which each of these values is attained. 8. If 50 percent of the families in a certain city subscribe to the morning newspaper, 65 percent of the families subscribe to the afternoon newspaper, and 85 percent of the families subscribe to at least one of the two newspapers, what percentage of the families subscribe to both newspapers? 9. Prove that for every two events A and B, the probability that exactly one of the two events will occur is given by the expression

Pr(A) + Pr(B) − 2 Pr(A ∩ B). 10. For two arbitrary events A and B, prove that Pr(A) = Pr(A ∩ B) + Pr(A ∩ B c ). 11. A point (x, y) is to be selected from the square S containing all points (x, y) such that 0 ≤ x ≤ 1 and 0 ≤ y ≤ 1. Suppose that the probability that the selected point will belong to each speciﬁed subset of S is equal to the area of that subset. Find the probability of each of the following subsets: (a) the subset of points such that (x − 21 )2 + (y − 1 2 1 1 3 2 ) ≥ 4 ; (b) the subset of points such that 2 < x + y < 2 ; (c) the subset of points such that y ≤ 1 − x 2 ; (d) the subset of points such that x = y. 12. Let A1, A2 , . . . be an arbitrary inﬁnite sequence of events, and let B1, B2 , . . . be another inﬁnite sequence of events deﬁned as follows: B1 = A1, B2 = Ac1 ∩ A2 , B3 = Ac1 ∩ Ac2 ∩ A3, B4 = Ac1 ∩ Ac2 ∩ Ac3 ∩ A4, . . . . Prove that n n Ai = Pr(Bi ) for n = 1, 2, . . . , Pr i=1

i=1

and that Pr

∞ i=1

Ai =

∞

Pr(Bi ).

i=1

13. Prove Theorem 1.5.8. Hint: Use Exercise 12. 14. Consider, once again, the four blood types A, B, AB, and O described in Exercise 8 in Sec. 1.4 together with the two antigens anti-A and anti-B. Suppose that, for a given person, the probability of type O blood is 0.5, the probability of type A blood is 0.34, and the probability of type B blood is 0.12. a. Find the probability that each of the antigens will react with this person’s blood. b. Find the probability that both antigens will react with this person’s blood.

22

Chapter 1 Introduction to Probability

1.6 Finite Sample Spaces The simplest experiments in which to determine and derive probabilities are those that involve only ﬁnitely many possible outcomes. This section gives several examples to illustrate the important concepts from Sec. 1.5 in ﬁnite sample spaces. Example 1.6.1

Current Population Survey. Every month, the Census Bureau conducts a survey of the United States population in order to learn about labor-force characteristics. Several pieces of information are collected on each of about 50,000 households. One piece of information is whether or not someone in the household is actively looking for employment but currently not employed. Suppose that our experiment consists of selecting three households at random from the 50,000 that were surveyed in a particular month and obtaining access to the information recorded during the survey. (Due to the conﬁdential nature of information obtained during the Current Population Survey, only researchers in the Census Bureau would be able to perform the experiment just described.) The outcomes that make up the sample space S for this experiment can be described as lists of three three distinct numbers from 1 to 50,000. For example (300, 1, 24602) is one such list where we have kept track of the order in which the three households were selected. Clearly, there are only ﬁnitely many such lists. We can assume that each list is equally likely to be chosen, but we need to be able to count how many such lists there are. We shall learn a method for counting the outcomes for this example in Sec. 1.7.

Requirements of Probabilities In this section, we shall consider experiments for which there are only a ﬁnite number of possible outcomes. In other words, we shall consider experiments for which the sample space S contains only a ﬁnite number of points s1, . . . , sn. In an experiment of this type, a probability measure on S is speciﬁed by assigning a probability pi to each point si ∈ S. The number pi is the probability that the outcome of the experiment will be si (i = 1, . . . , n). In order to satisfy the axioms of probability, the numbers p1, . . . , pn must satisfy the following two conditions: pi ≥ 0

for i = 1, . . . , n

and n

pi = 1.

i=1

The probability of each event A can then be found by adding the probabilities pi of all outcomes si that belong to A. This is the general version of Example 1.5.2. Example 1.6.2

Fiber Breaks. Consider an experiment in which ﬁve ﬁbers having different lengths are subjected to a testing process to learn which ﬁber will break ﬁrst. Suppose that the lengths of the ﬁve ﬁbers are 1, 2, 3, 4, and 5 inches, respectively. Suppose also that the probability that any given ﬁber will be the ﬁrst to break is proportional to the length of that ﬁber. We shall determine the probability that the length of the ﬁber that breaks ﬁrst is not more than 3 inches. In this example, we shall let si be the outcome in which the ﬁber whose length is i inches breaks ﬁrst (i = 1, . . . , 5). Then S = {s1, . . . , s5} and pi = αi for i = 1, . . . , 5, where α is a proportionality factor. It must be true that p1 + . . . + p5 = 1, and we know that p1 + . . . + p5 = 15α, so α = 1/15. If A is the event that the length of the

1.6 Finite Sample Spaces

23

ﬁber that breaks ﬁrst is not more than 3 inches, then A = {s1, s2 , s3}. Therefore, Pr(A) = p1 + p2 + p3 =

2 3 2 1 + + = . 15 15 15 5

Simple Sample Spaces A sample space S containing n outcomes s1, . . . , sn is called a simple sample space if the probability assigned to each of the outcomes s1, . . . , sn is 1/n. If an event A in this simple sample space contains exactly m outcomes, then m Pr(A) = . n Example 1.6.3

Tossing Coins. Suppose that three fair coins are tossed simultaneously. We shall determine the probability of obtaining exactly two heads. Regardless of whether or not the three coins can be distinguished from each other by the experimenter, it is convenient for the purpose of describing the sample space to assume that the coins can be distinguished. We can then speak of the result for the ﬁrst coin, the result for the second coin, and the result for the third coin; and the sample space will comprise the eight possible outcomes listed in Example 1.4.4 on page 12. Furthermore, because of the assumption that the coins are fair, it is reasonable to assume that this sample space is simple and that the probability assigned to each of the eight outcomes is 1/8. As can be seen from the listing in Example 1.4.4, exactly two heads will be obtained in three of these outcomes. Therefore, the probability of obtaining exactly two heads is 3/8. It should be noted that if we had considered the only possible outcomes to be no heads, one head, two heads, and three heads, it would have been reasonable to assume that the sample space contained just these four outcomes. This sample space would not be simple because the outcomes would not be equally probable.

Example 1.6.4

Genetics. Inherited traits in humans are determined by material in speciﬁc locations on chromosomes. Each normal human receives 23 chromosomes from each parent, and these chromosomes are naturally paired, with one chromosome in each pair coming from each parent. For the purposes of this text, it is safe to think of a gene as a portion of each chromosome in a pair. The genes, either one at a time or in combination, determine the inherited traits, such as blood type and hair color. The material in the two locations that make up a gene on the pair of chromosomes comes in forms called alleles. Each distinct combination of alleles (one on each chromosome) is called a genotype. Consider a gene with only two different alleles A and a. Suppose that both parents have genotype Aa, that is, each parent has allele A on one chromosome and allele a on the other. (We do not distinguish the same alleles in a different order as a different genotype. For example, aA would be the same genotype as Aa. But it can be convenient to distinguish the two chromosomes during intermediate steps in probability calculations, just as we distinguished the three coins in Example 1.6.3.) What are the possible genotypes of an offspring of these two parents? If all possible results of the parents contributing pairs of alleles are equally likely, what are the probabilities of the different genotypes? To begin, we shall distinguish which allele the offspring receives from each parent, since we are assuming that pairs of contributed alleles are equally likely.

24

Chapter 1 Introduction to Probability

Afterward, we shall combine those results that produce the same genotype. The possible contributions from the parents are:

Mother Father

A

a

A

AA

Aa

a

aA

aa

So, there are three possible genotypes AA, Aa, and aa for the offspring. Since we assumed that every combination was equally likely, the four cells in the table all have probability 1/4. Since two of the cells in the table combined into genotype Aa, that genotype has probability 1/2. The other two genotypes each have probability 1/4, since they each correspond to only one cell in the table. Example 1.6.5

Rolling Two Dice. We shall now consider an experiment in which two balanced dice are rolled, and we shall calculate the probability of each of the possible values of the sum of the two numbers that may appear. Although the experimenter need not be able to distinguish the two dice from one another in order to observe the value of their sum, the speciﬁcation of a simple sample space in this example will be facilitated if we assume that the two dice are distinguishable. If this assumption is made, each outcome in the sample space S can be represented as a pair of numbers (x, y), where x is the number that appears on the ﬁrst die and y is the number that appears on the second die. Therefore, S comprises the following 36 outcomes: (1, 1) (2, 1) (3, 1) (4, 1) (5, 1) (6, 1)

(1, 2) (2, 2) (3, 2) (4, 2) (5, 2) (6, 2)

(1, 3) (2, 3) (3, 3) (4, 3) (5, 3) (6, 3)

(1, 4) (2, 4) (3, 4) (4, 4) (5, 4) (6, 4)

(1, 5) (2, 5) (3, 5) (4, 5) (5, 5) (6, 5)

(1, 6) (2, 6) (3, 6) (4, 6) (5, 6) (6, 6)

It is natural to assume that S is a simple sample space and that the probability of each of these outcomes is 1/36. Let Pi denote the probability that the sum of the two numbers is i for i = 2, 3, . . . , 12. The only outcome in S for which the sum is 2 is the outcome (1, 1). Therefore, P2 = 1/36. The sum will be 3 for either of the two outcomes (1, 2) and (2, 1). Therefore, P3 = 2/36 = 1/18. By continuing in this manner, we obtain the following probability for each of the possible values of the sum: 1 , 36 2 P3 = P11 = , 36 3 P4 = P10 = , 36

P2 = P12 =

4 , 36 5 P6 = P8 = , 36 6 P7 = . 36 P5 = P9 =

1.7 Counting Methods

25

Summary A simple sample space is a ﬁnite sample space S such that every outcome in S has the same probability. If there are n outcomes in a simple sample space S, then each one must have probability 1/n. The probability of an event E in a simple sample space is the number of outcomes in E divided by n. In the next three sections, we will present some useful methods for counting numbers of outcomes in various events.

Exercises 1. If two balanced dice are rolled, what is the probability that the sum of the two numbers that appear will be odd?

6. If three fair coins are tossed, what is the probability that all three faces will be the same?

2. If two balanced dice are rolled, what is the probability that the sum of the two numbers that appear will be even?

7. Consider the setup of Example 1.6.4 on page 23. This time, assume that two parents have genotypes Aa and aa. Find the possible genotypes for an offspring and ﬁnd the probabilities for each genotype. Assume that all possible results of the parents contributing pairs of alleles are equally likely.

3. If two balanced dice are rolled, what is the probability that the difference between the two numbers that appear will be less than 3? 4. A school contains students in grades 1, 2, 3, 4, 5, and 6. Grades 2, 3, 4, 5, and 6 all contain the same number of students, but there are twice this number in grade 1. If a student is selected at random from a list of all the students in the school, what is the probability that she will be in grade 3? 5. For the conditions of Exercise 4, what is the probability that the selected student will be in an odd-numbered grade?

8. Consider an experiment in which a fair coin is tossed once and a balanced die is rolled once. a. Describe the sample space for this experiment. b. What is the probability that a head will be obtained on the coin and an odd number will be obtained on the die?

1.7 Counting Methods In simple sample spaces, one way to calculate the probability of an event involves counting the number of outcomes in the event and the number of outcomes in the sample space. This section presents some common methods for counting the number of outcomes in a set. These methods rely on special structure that exists in many common experiments, namely, that each outcome consists of several parts and that it is relatively easy to count how many possibilities there are for each of the parts. We have seen that in a simple sample space S, the probability of an event A is the ratio of the number of outcomes in A to the total number of outcomes in S. In many experiments, the number of outcomes in S is so large that a complete listing of these outcomes is too expensive, too slow, or too likely to be incorrect to be useful. In such an experiment, it is convenient to have a method of determining the total number of outcomes in the space S and in various events in S without compiling a list of all these outcomes. In this section, some of these methods will be presented.

26

Chapter 1 Introduction to Probability

Figure 1.10 Three cities with routes between them in Example 1.7.1.

4

1 B 2

A

5 6

C

7 3 8

Multiplication Rule Example 1.7.1

Routes between Cities. Suppose that there are three different routes from city A to city B and ﬁve different routes from city B to city C. The cities and routes are depicted in Fig. 1.10, with the routes numbered from 1 to 8. We wish to count the number of different routes from A to C that pass through B. For example, one such route from Fig. 1.10 is 1 followed by 4, which we can denote (1, 4). Similarly, there are the routes (1, 5), (1, 6), . . . , (3, 8). It is not difﬁcult to see that the number of different routes 3 × 5 = 15. Example 1.7.1 is a special case of a common form of experiment.

Example 1.7.2

Experiment in Two Parts. Consider an experiment that has the following two characteristics: i. The experiment is performed in two parts. ii. The ﬁrst part of the experiment has m possible outcomes x1, . . . , xm, and, regardless of which one of these outcomes xi occurs, the second part of the experiment has n possible outcomes y1, . . . , yn. Each outcome in the sample space S of such an experiment will therefore be a pair having the form (xi , yj ), and S will be composed of the following pairs: (x1, y1)(x1, y2 ) . . . (x1, yn) (x2 , y1)(x2 , y2 ) . . . (x2 , yn) .. .. .. . . . (xm, y1)(xm, y2 ) . . . (xm, yn).

Since each of the m rows in the array in Example 1.7.2 contains n pairs, the following result follows directly. Theorem 1.7.1

Multiplication Rule for Two-Part Experiments. In an experiment of the type described in Example 1.7.2, the sample space S contains exactly mn outcomes. Figure 1.11 illustrates the multiplication rule for the case of n = 3 and m = 2 with a tree diagram. Each end-node of the tree represents an outcome, which is the pair consisting of the two parts whose names appear along the branch leading to the endnode.

Example 1.7.3

Rolling Two Dice. Suppose that two dice are rolled. Since there are six possible outcomes for each die, the number of possible outcomes for the experiment is 6 × 6 = 36, as we saw in Example 1.6.5. The multiplication rule can be extended to experiments with more than two parts.

1.7 Counting Methods

Figure 1.11 Tree diagram in which end-nodes represent outcomes.

y1

(x1, y1) y2

x1

y3 y1

x2

y2 y3

27

(x1, y2) (x1, y3) (x 2, y1) (x 2, y 2 ) (x 2, y 3)

Theorem 1.7.2

Multiplication Rule. Suppose that an experiment has k parts (k ≥ 2), that the ith part of the experiment can have ni possible outcomes (i = 1, . . . , k), and that all of the outcomes in each part can occur regardless of which speciﬁc outcomes have occurred in the other parts. Then the sample space S of the experiment will contain all vectors of the form (u1, . . . , uk ), where ui is one of the ni possible outcomes of part i (i = 1, . . . , k). The total number of these vectors in S will be equal to the product n1n2 . . . nk .

Example 1.7.4

Tossing Several Coins. Suppose that we toss six coins. Each outcome in S will consist of a sequence of six heads and tails, such as HTTHHH. Since there are two possible outcomes for each of the six coins, the total number of outcomes in S will be 26 = 64. If head and tail are considered equally likely for each coin, then S will be a simple sample space. Since there is only one outcome in S with six heads and no tails, the probability of obtaining heads on all six coins is 1/64. Since there are six outcomes in S with one head and ﬁve tails, the probability of obtaining exactly one head is 6/64 = 3/32.

Example 1.7.5

Combination Lock. A standard combination lock has a dial with tick marks for 40 numbers from 0 to 39. The combination consists of a sequence of three numbers that must be dialed in the correct order to open the lock. Each of the 40 numbers may appear in each of the three positions of the combination regardless of what the other two positions contain. It follows that there are 403 = 64,000 possible combinations. This number is supposed to be large enough to discourage would-be thieves from trying every combination. Note: The Multiplication Rule Is Slightly More General. In the statements of Theorems 1.7.1 and 1.7.2, it is assumed that each possible outcome in each part of the experiment can occur regardless of what occurs in the other parts of the experiment. Technically, all that is necessary is that the number of possible outcomes for each part of the experiment not depend on what occurs on the other parts. The discussion of permutations below is an example of this situation.

Permutations Example 1.7.6

Sampling without Replacement. Consider an experiment in which a card is selected and removed from a deck of n different cards, a second card is then selected and removed from the remaining n − 1 cards, and ﬁnally a third card is selected from the remaining n − 2 cards. Each outcome consists of the three cards in the order selected. A process of this kind is called sampling without replacement, since a card that is drawn is not replaced in the deck before the next card is selected. In this experiment, any one of the n cards could be selected ﬁrst. Once this card has been removed, any one of the other n − 1 cards could be selected second. Therefore, there are n(n − 1)

28

Chapter 1 Introduction to Probability

possible outcomes for the ﬁrst two selections. Finally, for every given outcome of the ﬁrst two selections, there are n − 2 other cards that could possibly be selected third. Therefore, the total number of possible outcomes for all three selections is n(n − 1)(n − 2). The situation in Example 1.7.6 can be generalized to any number of selections without replacement. Deﬁnition 1.7.1

Permutations. Suppose that a set has n elements. Suppose that an experiment consists of selecting k of the elements one at a time without replacement. Let each outcome consist of the k elements in the order selected. Each such outcome is called a permutation of n elements taken k at a time. We denote the number of distinct such permutations by the symbol Pn, k . By arguing as in Example 1.7.6, we can ﬁgure out how many different permutations there are of n elements taken k at a time. The proof of the following theorem is simply to extend the reasoning in Example 1.7.6 to selecting k cards without replacement. The proof is left to the reader.

Theorem 1.7.3

Number of Permutations. The number of permutations of n elements taken k at a time is Pn, k = n(n − 1) . . . (n − k + 1).

Example 1.7.7

Current Population Survey. Theorem 1.7.3 allows us to count the number of points in the sample space of Example 1.6.1. Each outcome in S consists of a permutation of n = 50,000 elements taken k = 3 at a time. Hence, the sample space S in that example consisits of 50,000 × 49,999 × 49,998 = 1.25 × 1014

outcomes.

When k = n, the number of possible permutations will be the number Pn, n of different permutations of all n cards. It is seen from the equation just derived that Pn, n = n(n − 1) . . . 1 = n! The symbol n! is read n factorial. In general, the number of permutations of n different items is n!. The expression for Pn, k can be rewritten in the following alternate form for k = 1, . . . , n − 1: n! (n − k)(n − k − 1) . . . 1 = . Pn, k = n(n − 1) . . . (n − k + 1) (n − k)(n − k − 1) . . . 1 (n − k)! Here and elsewhere in the theory of probability, it is convenient to deﬁne 0! by the relation 0! = 1. With this deﬁnition, it follows that the relation Pn, k = n!/(n − k)! will be correct for the value k = n as well as for the values k = 1, . . . , n − 1. To summarize: Theorem 1.7.4

Permutations. The number of distinct orderings of k items selected without replacement from a collection of n different items (0 ≤ k ≤ n) is Pn,k =

n! . (n − k)!

1.7 Counting Methods

29

Example 1.7.8

Choosing Ofﬁcers. Suppose that a club consists of 25 members and that a president and a secretary are to be chosen from the membership. We shall determine the total possible number of ways in which these two positions can be ﬁlled. Since the positions can be ﬁlled by ﬁrst choosing one of the 25 members to be president and then choosing one of the remaining 24 members to be secretary, the possible number of choices is P25, 2 = (25)(24) = 600.

Example 1.7.9

Arranging Books. Suppose that six different books are to be arranged on a shelf. The number of possible permutations of the books is 6! = 720.

Example 1.7.10

Sampling with Replacement. Consider a box that contains n balls numbered 1, . . . , n. First, one ball is selected at random from the box and its number is noted. This ball is then put back in the box and another ball is selected (it is possible that the same ball will be selected again). As many balls as desired can be selected in this way. This process is called sampling with replacement. It is assumed that each of the n balls is equally likely to be selected at each stage and that all selections are made independently of each other. Suppose that a total of k selections are to be made, where k is a given positive integer. Then the sample space S of this experiment will contain all vectors of the form (x1, . . . , xk ), where xi is the outcome of the ith selection (i = 1, . . . , k). Since there are n possible outcomes for each of the k selections, the total number of vectors in S is nk . Furthermore, from our assumptions it follows that S is a simple sample space. Hence, the probability assigned to each vector in S is 1/nk .

Example 1.7.11

Obtaining Different Numbers. For the experiment in Example 1.7.10, we shall determine the probability of the event E that each of the k balls that are selected will have a different number. If k > n, it is impossible for all the selected balls to have different numbers because there are only n different numbers. Suppose, therefore, that k ≤ n. The number of outcomes in the event E is the number of vectors for which all k components are different. This equals Pn, k , since the ﬁrst component x1 of each vector can have n possible values, the second component x2 can then have any one of the other n − 1 values, and so on. Since S is a simple sample space containing nk vectors, the probability p that k different numbers will be selected is p=

Pn, k nk

=

n! . (n − k)!nk

Note: Using Two Different Methods in the Same Problem. Example 1.7.11 illustrates a combination of techniques that might seem confusing at ﬁrst. The method used to count the number of outcomes in the sample space was based on sampling with replacement, since the experiment allows repeat numbers in each outcome. The method used to count the number of outcomes in the event E was permutations (sampling without replacement) because E consists of those outcomes without repeats. It often happens that one needs to use different methods to count the numbers of outcomes in different subsets of the sample space. The birthday problem, which follows, is another example in which we need more than one counting method in the same problem.

30

Chapter 1 Introduction to Probability

The Birthday Problem In the following problem, which is often called the birthday problem, it is required to determine the probability p that at least two people in a group of k people will have the same birthday, that is, will have been born on the same day of the same month but not necessarily in the same year. For the solution presented here, we assume that the birthdays of the k people are unrelated (in particular, we assume that twins are not present) and that each of the 365 days of the year is equally likely to be the birthday of any person in the group. In particular, we ignore the fact that the birth rate actually varies during the year and we assume that anyone actually born on February 29 will consider his birthday to be another day, such as March 1. When these assumptions are made, this problem becomes similar to the one in Example 1.7.11. Since there are 365 possible birthdays for each of k people, the sample space S will contain 365k outcomes, all of which will be equally probable. If k > 365, there are not enough birthdays for every one to be different, and hence at least two people must have the same birthday. So, we assume that k ≤ 365. Counting the number of outcomes in which at least two birthdays are the same is tedious. However, the number of outcomes in S for which all k birthdays will be different is P365, k , since the ﬁrst person’s birthday could be any one of the 365 days, the second person’s birthday could then be any of the other 364 days, and so on. Hence, the probability that all k persons will have different birthdays is P365, k 365k

.

The probability p that at least two of the people will have the same birthday is therefore p =1−

P365, k 365k

=1−

(365)! . (365 − k)!365k

Numerical values of this probability p for various values of k are given in Table 1.1. These probabilities may seem surprisingly large to anyone who has not thought about them before. Many persons would guess that in order to obtain a value of p greater than 1/2, the number of people in the group would have to be about 100. However, according to Table 1.1, there would have to be only 23 people in the group. As a matter of fact, for k = 100 the value of p is 0.9999997.

Table 1.1 The probability p that at least two people in a group of k people will have the same birthday

k

p

k

p

5

0.027

25

0.569

10

0.117

30

0.706

15

0.253

40

0.891

20

0.411

50

0.970

22

0.476

60

0.994

23

0.507

1.7 Counting Methods

31

The calculation in this example illustrates a common technique for solving probability problems. If one wishes to compute the probability of some event A, it might be more straightforward to calculate Pr(Ac ) and then use the fact that Pr(A) = 1 − Pr(Ac ). This idea is particularly useful when the event A is of the form “at least n things happen” where n is small compared to how many things could happen.

Stirling’s Formula For large values of n, it is nearly impossible to compute n!. For n ≥ 70, n! > 10100 and cannot be represented on many scientiﬁc calculators. In most cases for which n! is needed with a large value of n, one only needs the ratio of n! to another large number an. A common example of this is Pn,k with large n and not so large k, which equals n!/(n − k)!. In such cases, we can notice that n! = elog(n!)−log(an). an Compared to computing n!, it takes a much larger n before log(n!) becomes difﬁcult to represent. Furthermore, if we had a simple approximation sn to log(n!) such that limn→∞ |sn − log(n!)| = 0, then the ratio of n!/an to sn/an would be close to 1 for large n. The following result, whose proof can be found in Feller (1968), provides such an approximation. Theorem 1.7.5

Stirling’s Formula. Let sn =

1 1 log(2π ) + n + log(n) − n. 2 2

Then limn→∞ |sn − log(n!)| = 0. Put another way, lim

n→∞

Example 1.7.12

(2π )1/2 nn+1/2 e−n = 1. n!

Approximating the Number of Permutations. Suppose that we want to compute P70,20 = 70!/50!. The approximation from Stirling’s formula is 70! (2π )1/2 7070.5e−70 ≈ = 3.940 × 1035. 50! (2π )1/2 5050.5e−50 The exact calculation yields 3.938 × 1035. The approximation and the exact calculation differ by less than 1/10 of 1 percent.

Summary Suppose that the following conditions are met: .

Each element of a set consists of k distinguishable parts x1, . . . , xk .

.

There are n1 possibilities for the ﬁrst part x1.

.

For each i = 2, . . . , k and each combination (x1, . . . , xi−1) of the ﬁrst i − 1 parts, there are ni possibilities for the ith part xi .

Under these conditions, there are n1 . . . nk elements of the set. The third condition requires only that the number of possibilities for xi be ni no matter what the earlier

32

Chapter 1 Introduction to Probability

parts are. For example, for i = 2, it does not require that the same n2 possibilities be available for x2 regardless of what x1 is. It only requires that the number of possibilities for x2 be n2 no matter what x1 is. In this way, the general rule includes the multiplication rule, the calculation of permutations, and sampling with replacement as special cases. For permutations of m items k at a time, we have ni = m − i + 1 for i = 1, . . . , k, and the ni possibilities for part i are just the ni items that have not yet appeared in the ﬁrst i − 1 parts. For sampling with replacement from m items, we have ni = m for all i, and the m possibilities are the same for every part. In the next section, we shall consider how to count elements of sets in which the parts of each element are not distinguishable.

Exercises 1. Each year starts on one of the seven days (Sunday through Saturday). Each year is either a leap year (i.e., it includes February 29) or not. How many different calendars are possible for a year? 2. Three different classes contain 20, 18, and 25 students, respectively, and no student is a member of more than one class. If a team is to be composed of one student from each of these three classes, in how many different ways can the members of the team be chosen? 3. In how many different ways can the ﬁve letters a, b, c, d, and e be arranged? 4. If a man has six different sportshirts and four different pairs of slacks, how many different combinations can he wear? 5. If four dice are rolled, what is the probability that each of the four numbers that appear will be different? 6. If six dice are rolled, what is the probability that each of the six different numbers will appear exactly once? 7. If 12 balls are thrown at random into 20 boxes, what is the probability that no box will receive more than one ball?

8. An elevator in a building starts with ﬁve passengers and stops at seven ﬂoors. If every passenger is equally likely to get off at each ﬂoor and all the passengers leave independently of each other, what is the probability that no two passengers will get off at the same ﬂoor? 9. Suppose that three runners from team A and three runners from team B participate in a race. If all six runners have equal ability and there are no ties, what is the probability that the three runners from team A will ﬁnish ﬁrst, second, and third, and the three runners from team B will ﬁnish fourth, ﬁfth, and sixth? 10. A box contains 100 balls, of which r are red. Suppose that the balls are drawn from the box one at a time, at random, without replacement. Determine (a) the probability that the ﬁrst ball drawn will be red; (b) the probability that the 50th ball drawn will be red; and (c) the probability that the last ball drawn will be red. 11. Let n and k be positive integers such that both n and n − k are large. Use Stirling’s formula to write as simple an approximation as you can for Pn,k .

1.8 Combinatorial Methods Many problems of counting the number of outcomes in an event amount to counting how many subsets of a certain size are contained in a ﬁxed set. This section gives examples of how to do such counting and where it can arise.

Combinations Example 1.8.1

Choosing Subsets. Consider the set {a, b, c, d} containing the four different letters. We want to count the number of distinct subsets of size two. In this case, we can list all of the subsets of size two: {a, b},

{a, c},

{a, d},

{b, c},

{b, d},

and

{c, d}.

1.8 Combinatorial Methods

33

We see that there are six distinct subsets of size two. This is different from counting permutaions because {a, b} and {b, a} are the same subset. For large sets, it would be tedious, if not impossible, to enumerate all of the subsets of a given size and count them as we did in Example 1.8.1. However, there is a connection between counting subsets and counting permutations that will allow us to derive the general formula for the number of subsets. Suppose that there is a set of n distinct elements from which it is desired to choose a subset containing k elements (1 ≤ k ≤ n). We shall determine the number of different subsets that can be chosen. In this problem, the arrangement of the elements in a subset is irrelevant and each subset is treated as a unit. Deﬁnition 1.8.1

Combinations. Consider a set with n elements. Each subset of size k chosen from this set is called a combination of n elements taken k at a time. We denote the number of distinct such combinations by the symbol Cn, k . No two combinations will consist of exactly the same elements because two subsets with the same elements are the same subset. At the end of Example 1.8.1, we noted that two different permutations (a, b) and (b, a) both correspond to the same combination or subset {a, b}. We can think of permutations as being constructed in two steps. First, a combination of k elements is chosen out of n, and second, those k elements are arranged in a speciﬁc order. There are Cn, k ways to choose the k elements out of n, and for each such choice there are k! ways to arrange those k elements in different orders. Using the multiplication rule from Sec. 1.7, we see that the number of permutations of n elements taken k at a time is Pn, k = Cn, k k!; hence, we have the following.

Theorem 1.8.1

Combinations. The number of distinct subsets of size k that can be chosen from a set of size n is Pn, k n! = . Cn, k = k! k!(n − k)! In Example 1.8.1, we see that C4,2 = 4!/[2!2!] = 6.

Example 1.8.2

Selecting a Committee. Suppose that a committee composed of eight people is to be selected from a group of 20 people. The number of different groups of people that might be on the committee is C20,8 =

Example 1.8.3

20! = 125,970. 8!12!

Choosing Jobs. Suppose that, in Example 1.8.2, the eight people in the committee each get a different job to perform on the committee. The number of ways to choose eight people out of 20 and assign them to the eight different jobs is the number of permutations of 20 elements taken eight at a time, or P20,8 = C20,8 × 8! = 125,970 × 8! = 5,078,110,400.

Examples 1.8.2 and 1.8.3 illustrate the difference and relationship between combinations and permutations. In Example 1.8.3, we count the same group of people in a different order as a different outcome, while in Example 1.8.2, we count the same group in different orders as the same outcome. The two numerical values differ by a factor of 8!, the number of ways to reorder each of the combinations in Example 1.8.2 to get a permutation in Example 1.8.3.

34

Chapter 1 Introduction to Probability

Binomial Coefﬁcients Deﬁnition 1.8.2

Binomial Coefﬁcients. The number Cn, k is also denoted by the symbol k = 0, 1, . . . , n,

n n! . = k!(n − k)! k

n k

. That is, for

(1.8.1)

When this notation is used, this number is called a binomial coefﬁcient. The name binomial coefﬁcient derives from the appearance of the symbol in the binomial theorem, whose proof is left as Exercise 20 in this section. Theorem 1.8.2

Binomial Theorem. For all numbers x and y and each positive integer n, n n k n−k (x + y)n = x y . k k=0 There are a couple of useful relations between binomial coefﬁcients.

Theorem 1.8.3

For all n,

For all n and all k = 0, 1, . . . , n,

n n = 1. = n 0

n n . = n−k k

Proof The ﬁrst equation follows from the fact that 0! = 1. The second equation follows from Eq. (1.8.1). The second equation can also be derived from the fact that selecting k elements to form a subset is equivalent to selecting the remaining n − k elements to form the complement of the subset. It is sometimes convenient to use the expression “n choose k” for the value of C n, k n . Thus, the same quantity is represented by the two different notations Cn, k and k , and we may refer to this quantity in three different ways: as the number of combinations of n elements taken k at a time, as the binomial coefﬁcient of n and k, or simply as “n choose k.” Example 1.8.4

Blood Types. In Example 1.6.4 on page 23, we deﬁned genes, alleles, and genotypes. The gene for human blood type consists of a pair of alleles chosen from the three alleles commonly called O, A, and B. For example, two possible combinations of alleles (called genotypes) to form a blood-type gene would be BB and AO. We will not distinguish the same two alleles in different orders, so OA represents the same genotype as AO. How many genotypes are there for blood type? The answer could easily be found by counting, but it is an example of a more general calculation. Suppose that a gene consists of a pair chosen from a set of n different alleles. Assuming that we cannot distinguish the same pair in different orders, there are n pairs where both alleles are the same, and there are n2 pairs where the two alleles are different. The total number of genotypes is

n n(n − 1) n(n + 1) n+1 n+ =n+ = = . 2 2 2 2

1.8 Combinatorial Methods

35

For the case of blood type, we have n = 3, so there are

4 4×3 = =6 2 2 genotypes, as could easily be veriﬁed by counting.

Note: Sampling with Replacement. The counting method described in Example 1.8.4 is a type of sampling with replacement that is different from the type described in Example 1.7.10. In Example 1.7.10, we sampled with replacement, but we distinguished between samples having the same balls in different orders. This could be called ordered sampling with replacement. In Example 1.8.4, samples containing the same genes in different orders were considered the same outcome. This could be called unordered sampling with replacement. The general formula for the , number of unordered samples of size k with replacement from n elements is n+k−1 k and can be derived in Exercise 19. It is possible to have k larger than n when sampling with replacement. Example 1.8.5

Selecting Baked Goods. You go to a bakery to select some baked goods for a dinner party. You need to choose a total of 12 items. The baker has seven different types of items from which to choose, with lots of each type available. How many different boxfuls of 12 items are possible for you to choose? Here we will not distinguish the same collection of 12 items arranged in different orders in the box. This is an example of unordered sampling with replacement because we can (indeed we must) choose the same type of item more than once, but we are not distinguishing the same items = 18,564 different boxfuls. in different orders. There are 7+12−1 12 Example 1.8.5 raises an issue that can cause confusion if one does not carefully determine the elements of the sample space and carefully specify which outcomes (if any) are equally likely. The next example illustrates the issue in the context of Example 1.8.5.

Example 1.8.6

Selecting Baked Goods. Imagine two different ways of choosing a boxful of 12 baked goods selected from the seven different types available. In the ﬁrst method, you choose one item at random from the seven available. Then, without regard to what item was chosen ﬁrst, you choose the second item at random from the seven available. Then you continue in this way choosing the next item at random from the seven available without regard to what has already been chosen until you have chosen 12. For this method of choosing, it is natural to let the outcomes be the possible sequences of the 12 types of items chosen. The sample space would contain 712 = 1.38 × 1010 different outcomes that would be equally likely. In the second method of choosing, the baker tells you that she has available 18,564 different boxfuls freshly packed. You then select one at random. In this case, the sample space would consist of 18,564 different equally likely outcomes. In spite of the different sample spaces that arise in the two methods of choosing, there are some verbal descriptions that identify an event in both sample spaces. For example, both sample spaces contain an event that could be described as {all 12 items are of the same type} even though the outcomes are different types of mathematical objects in the two sample spaces. The probability that all 12 items are of the same type will actually be different depending on which method you use to choose the boxful. In the ﬁrst method, seven of the 712 equally likely outcomes contain 12 of the same type of item. Hence, the probability that all 12 items are of the same type is

36

Chapter 1 Introduction to Probability

7/712 = 5.06 × 10−10. In the second method, there are seven equally liklely boxes that contain 12 of the same type of item. Hence, the probability that all 12 items are of the same type is 7/18,564 = 3.77 × 10−4. Before one can compute the probability for an event such as {all 12 items are of the same type}, one must be careful about deﬁning the experiment and its outcomes.

Arrangements of Elements of Two Distinct Types When a set contains only elements of two distinct types, a binomial coefﬁcient can be used to represent the number of different arrangements of all the elements in the set. Suppose, for example, that k similar red balls and n − k similar green balls are to be arranged in a row. Since the red balls will occupy k positions in the row, each different arrangement of the n balls corresponds to a different choice of the k positions occupied by the red balls. Hence, the number of different arrangements of the n balls will be equal to the number of different ways in which k positions can be selected for the red balls from the n available positions. Since this number of ways is speciﬁed by the binomial coefﬁcient nk , the number of different arrangements of the n balls is also nk . In other words, the number of different arrangements of n objects consisting of k similar objects of one type and n − k similar objects of a second type is nk . Example 1.8.7

Tossing a Coin. Suppose that a fair coin is to be tossed 10 times, and it is desired to determine (a) the probability p of obtaining exactly three heads and (b) the probability p of obtaining three or fewer heads. (a) The total possible number of different sequences of 10 heads and tails is 210, and it may be assumed that each of these sequences is equally probable. The number of these sequences that contain exactly three heads will be equal to the number of different arrangements that can be formed with three heads and seven tails. Here are some of those arrangements: HHHTTTTTTT, HHTHTTTTTT, HHTTHTTTTT, TTHTHTHTTT, etc. Each such arrangement is equivalent to a choice of where to put the 3 heads among the 10 tosses, so there are 10 3 such arrangements. The probability of obtaining exactly three heads is then 10 3 p = 10 = 0.1172. 2 (b) Using the same reasoning as in part (a), the number of sequences in the sample space that contain exactly k heads (k = 0, 1, 2, 3) is 10 k . Hence, the probability of obtaining three or fewer heads is 10 + 10 + 10 + 10 3 2 1 0 p = 210 1 + 10 + 45 + 120 176 = = 10 = 0.1719. 210 2

Note: Using Two Different Methods in the Same Problem. Part (a) of Example 1.8.7 is another example of using two different counting methods in the same problem. Part (b) illustrates another general technique. In this part, we broke the event of interest into several disjoint subsets and counted the numbers of outcomes separately for each subset and then added the counts together to get the total. In many problems, it can require several applications of the same or different counting

1.8 Combinatorial Methods

37

methods in order to count the number of outcomes in an event. The next example is one in which the elements of an event are formed in two parts (multiplication rule), but we need to perform separate combination calculations to determine the numbers of outcomes for each part. Example 1.8.8

Sampling without Replacement. Suppose that a class contains 15 boys and 30 girls, and that 10 students are to be selected at random for a special assignment. We shall determine the probability p that exactly three boys will be selected. The number of different combinations of the 45 students that might be obtained 45 , and the statement that the 10 students are selected in the sample of 10 students is 10 45 at random means that each of these 10 possible combinations is equally probable. Therefore, we must ﬁnd the number of these combinations that contain exactly three boys and seven girls. When a combination of three boys and seven girls is formed, the number of different combinations in which three boys can be selected from the 15 available boys combinations in which seven girls can be selected is 15 3 , and the number of different from the 30 available girls is 30 . Since each of these combinations of three boys 7 can be paired with each of the combinations of seven girls to form a distinct sample, 30 the number of combinations containing exactly three boys is 15 3 7 . Therefore, the desired probability is 15 30 7 3 = 0.2904. p= 45 10

Example 1.8.9

Playing Cards. Suppose that a deck of 52 cards containing four aces is shufﬂed thoroughly and the cards are then distributed among four players so that each player receives 13 cards. We shall determine the probability that each player will receive one ace. The number of possible different combinations of the four positions in the deck 52 occupied by the four aces is 52 4 , and it may be assumed that each of these 4 combinations is equally probable. If each player is to receive one ace, then there must be exactly one ace among the 13 cards that the ﬁrst player will receive and one ace among each of the remaining three groups of 13 cards that the other three players will receive. In other words, there are 13 possible positions for the ace that the ﬁrst player is to receive, 13 other possible positions for the ace that the second player is to receive, and so on. Therefore, among the 52 4 possible combinations of the positions for the four aces, exactly 134 of these combinations will lead to the desired result. Hence, the probability p that each player will receive one ace is 134 p = = 0.1055. 52 4

Ordered versus Unordered Samples Several of the examples in this section and the previous section involved counting the numbers of possible samples that could arise using various sampling schemes. Sometimes we treated the same collection of elements in different orders as different samples, and sometimes we treated the same elements in different orders as the same sample. In general, how can one tell which is the correct way to count in a given problem? Sometimes, the problem description will make it clear which is needed. For example, if we are asked to ﬁnd the probability

38

Chapter 1 Introduction to Probability

that the items in a sample arrive in a speciﬁed order, then we cannot even specify the event of interest unless we treat different arrangements of the same items as different outcomes. Examples 1.8.5 and 1.8.6 illustrate how different problem descriptions can lead to very different calculations. However, there are cases in which the problem description does not make it clear whether or not one must count the same elements in different orders as different outcomes. Indeed, there are some problems that can be solved correctly both ways. Example 1.8.9 is one such problem. In that problem, we needed to decide what we would call an outcome, and then we needed to count how many outcomes were in the whole sample space S and how many were in the event E of interest. In the solution presented in Example 1.8.9, we chose as our outcomes the positions in the 52-card deck that were occupied by the four aces. We did not count different arrangements of the four aces in those four positions as different outcomes when we counted the number of outcomes in S. Hence, when we calculated the number of outcomes in E, we also did not count the different arrangements of the four aces in the four possible positions as different outcomes. In general, this is the principle that should guide the choice of counting method. If we have the choice between whether or not to count the same elements in different orders as different outcomes, then we need to make our choice and be consistent throughout the problem. If we count the same elements in different orders as different outcomes when counting the outcomes in S, we must do the same when counting the elements of E. If we do not count them as different outcomes when counting S, we should not count them as different when counting E. Example 1.8.10

Playing Cards, Revisited. We shall solve the problem in Example 1.8.9 again, but this time, we shall distinguish outcomes with the same cards in different orders. To go to the extreme, let each outcome be a complete ordering of the 52 cards. So, there are 52! possible outcomes. How many of these have one ace in each of the four sets of 13 cards received by the four players? As before, there are 134 ways to choose the four positions for the four aces, one among each of the four sets of 13 cards. No matter which of these sets of positions we choose, there are 4! ways to arrange the four aces in these four positions. No matter how the aces are arranged, there are 48! ways to arrange the remaining 48 cards in the 48 remaining positions. So, there are 134 × 4! × 48! outcomes in the event of interest. We then calculate p=

134 × 4! × 48! = 0.1055. 52!

In the following example, whether one counts the same items in different orders as different outcomes is allowed to depend on which events one wishes to use. Example 1.8.11

Lottery Tickets. In a lottery game, six numbers from 1 to 30 are drawn at random from a bin without replacement, and each player buys a ticket with six different numbers from 1 to 30. If all six numbers drawn match those on the player’s ticket, the player wins. We assume that all possible draws are equally likely. One way to construct a sample space for the experiment of drawing the winning combination is to consider the possible sequences of draws. That is, each outcome consists of an ordered subset of six numbers chosen from the 30 available numbers. There are P30,6 = 30!/24! such outcomes. With this sample space S, we can calculate probabilities for events such as A = {the draw contains the numbers 1, 14, 15, 20, 23, and 27}, B = {one of the numbers drawn is 15}, and C = {the ﬁrst number drawn is less than 10}.

1.8 Combinatorial Methods

39

There is another natural sample space, which we shall denote S , for this experiment. It consists solely of the different combinations of six numbers drawn from the 30 available. There are 30 6 = 30!/(6!24!) such outcomes. It also seems natural to consider all of these outcomes equally likely. With this sample space, we can calculate the probabilities of the events A and B above, but C is not a subset of the sample space S , so we cannot calculate its probability using this smaller sample space. When the sample space for an experiment could naturally be constructed in more than one way, one needs to choose based on for which events one wants to compute probabilities. Example 1.8.11 raises the question of whether one will compute the same probabilities using two different sample spaces when the event, such as A or B, exists in both sample spaces. In the example, each outcome in the smaller sample space S corresponds to an event in the larger sample space S. Indeed, each outcome s in S corresponds to the event in S containing the 6! permutations of the single combination s . For example, the event A in the example has only one outcome s = (1, 14, 15, 20, 23, 27) in the sample space S , while the corresponding event in the sample space S has 6! permutations including (1, 14, 15, 20, 23, 27), (14, 20, 27, 15, 23, 1), (27, 23, 20, 15, 14, 1), etc. In the sample space S, the probability of the event A is Pr(A) =

1 6!24! 6! = 30 . = P30,6 30! 6

In the sample space S , the event A has this same probability because it has only one of the 30 6 equally likely outcomes. The same reasoning applies to every outcome in S . Hence, if the same event can be expressed in both sample spaces S and S , we will compute the same probability using either sample space. This is a special feature of examples like Example 1.8.11 in which each outcome in the smaller sample space corresponds to an event in the larger sample space with the same number of elements. There are examples in which this feature is not present, and one cannot treat both sample spaces as simple sample spaces. Example 1.8.12

Tossing Coins. An experiment consists of tossing a coin two times. If we want to distinguish H followed by T from T followed by H, we should use the sample space S = {H H, H T , T H, T T }, which might naturally be assumed a simple sample space. On the other hand, we might be interested solely in the number of H’s tossed. In this case, we might consider the smaller sample space S = {0, 1, 2} where each outcome merely counts the number of H’s. The outcomes 0 and 2 in S each correspond to a single outcome in S, but 1 ∈ S corresponds to the event {H T , T H } ⊂ S with two outcomes. If we think of S as a simple sample space, then S will not be a simple sample space, because the outcome 1 will have probability 1/2 while the other two outcomes each have probability 1/4. There are situations in which one would be justiﬁed in treating S as a simple sample space and assigning each of its outcomes probability 1/3. One might do this if one believed that the coin was not fair, but one had no idea how unfair it was or which side were more likely to land up. In this case, S would not be a simple sample space, because two of its outcomes would have probability 1/3 and the other two would have probabilities that add up to 1/3.

40

Chapter 1 Introduction to Probability

Example 1.8.6 is another case of two different sample spaces in which each outcome in one sample space corresponds to a different number of outcomes in the other space. See Exercise 12 in Sec. 1.9 for a more complete analysis of Example 1.8.6.

The Tennis Tournament We shall now present a difﬁcult problem that has a simple and elegant solution. Suppose that n tennis players are entered in a tournament. In the ﬁrst round, the players are paired one against another at random. The loser in each pair is eliminated from the tournament, and the winner in each pair continues into the second round. If the number of players n is odd, then one player is chosen at random before the pairings are made for the ﬁrst round, and that player automatically continues into the second round. All the players in the second round are then paired at random. Again, the loser in each pair is eliminated, and the winner in each pair continues into the third round. If the number of players in the second round is odd, then one of these players is chosen at random before the others are paired, and that player automatically continues into the third round. The tournament continues in this way until only two players remain in the ﬁnal round. They then play against each other, and the winner of this match is the winner of the tournament. We shall assume that all n players have equal ability, and we shall determine the probability p that two speciﬁc players A and B will ever play against each other during the tournament. We shall ﬁrst determine the total number of matches that will be played during the tournament. After each match has been played, one player—the loser of that match—is eliminated from the tournament. The tournament ends when everyone has been eliminated from the tournament except the winner of the ﬁnal match. Since exactly n − 1 players must be eliminated, it follows that exactly n − 1 matches must be played during the tournament. The number of possible pairs of players is n2 . Each of the two players in every match is equally likely to win that match, and all initial pairings are made in a random manner. Therefore, before the tournament begins, every possible pair of players is equally likely to appear in each particular one of the n − 1 matches to be played during the tournament. Accordingly, the probability that players A and B will meet in some particular match that is speciﬁed in advance is 1/ n2 . If A and B do meet in that particular match, one of them will lose and be eliminated. Therefore, these same two players cannot meet in more than one match. It follows from the preceding explanation that the probability p that players A and B will meet at some time during the tournament is equal to the product of the probability 1/ n2 that they will meet in any particular speciﬁed match and the total number n − 1 of different matches in which they might possibly meet. Hence, n−1 2 p= = . n n 2

Summary We showed that the number of size k subsets of a set of size n is nk = n!/[k!(n − k)!]. This turns out to be the number of possible samples of size k drawn without replacement from a population of size n as well as the number of arrangements of n items of two types with k of one type and n − k of the other type. We also saw several

1.8 Combinatorial Methods

41

examples in which more than one counting technique was required at different points in the same problem. Sometimes, more than one technique is required to count the elements of a single set.

Exercises 1. Two pollsters will canvas a neighborhood with 20 houses. Each pollster will visit 10 of the houses. How many different assignments of pollsters to houses are possible? 93 or 2. Which of the following two numbers is larger: 30 93 31 ? 93 or 3. Which of the following two numbers is larger: 30 93 ? 63 4. A box contains 24 light bulbs, of which four are defective. If a person selects four bulbs from the box at random, without replacement, what is the probability that all four bulbs will be defective? 5. Prove that the following number is an integer: 4155 × 4156 × . . . × 4250 × 4251 . 2 × 3 × . . . × 96 × 97 6. Suppose that n people are seated in a random manner in a row of n theater seats. What is the probability that two particular people A and B will be seated next to each other? 7. If k people are seated in a random manner in a row containing n seats (n > k), what is the probability that the people will occupy k adjacent seats in the row? 8. If k people are seated in a random manner in a circle containing n chairs (n > k), what is the probability that the people will occupy k adjacent chairs in the circle? 9. If n people are seated in a random manner in a row containing 2n seats, what is the probability that no two people will occupy adjacent seats? 10. A box contains 24 light bulbs, of which two are defective. If a person selects 10 bulbs at random, without replacement, what is the probability that both defective bulbs will be selected? 11. Suppose that a committee of 12 people is selected in a random manner from a group of 100 people. Determine the probability that two particular people A and B will both be selected. 12. Suppose that 35 people are divided in a random manner into two teams in such a way that one team contains 10 people and the other team contains 25 people. What is the probability that two particular people A and B will be on the same team?

13. A box contains 24 light bulbs of which four are defective. If one person selects 10 bulbs from the box in a random manner, and a second person then takes the remaining 14 bulbs, what is the probability that all four defective bulbs will be obtained by the same person? 14. Prove that, for all positive integers n and k (n ≥ k),

n n n+1 . + = k k k−1 15. a. Prove that

n n n n . . . + + + + = 2n . 0 1 2 n b. Prove that

n n n n n = 0. − + − + . . . + (−1)n n 0 1 2 3 Hint: Use the binomial theorem. 16. The United States Senate contains two senators from each of the 50 states. (a) If a committee of eight senators is selected at random, what is the probability that it will contain at least one of the two senators from a certain speciﬁed state? (b) What is the probability that a group of 50 senators selected at random will contain one senator from each state? 17. A deck of 52 cards contains four aces. If the cards are shufﬂed and distributed in a random manner to four players so that each player receives 13 cards, what is the probability that all four aces will be received by the same player? 18. Suppose that 100 mathematics students are divided into ﬁve classes, each containing 20 students, and that awards are to be given to 10 of these students. If each student is equally likely to receive an award, what is the probability that exactly two students in each class will receive awards? 19. A restaurant has n items on its menu. During a particular day, k customers will arrive and each one will choose one item. The manager wants to count how many different collections of customer choices are possible without regard to the order in which the choices are made. (For example, if k = 3 and a1, . . . , an are the menu items,

42

Chapter 1 Introduction to Probability

then a1a3a1 is not distinguished from a1a1a3.) Prove that the number of different collections of customer choices is n+k−1 . Hint: Assume that the menu items are a1, . . . , an. k Show that each collection of customer choices, arranged with the a1’s ﬁrst, the a2 ’s second, etc., can be identiﬁed with a sequence of k zeros and n − 1 ones, where each 0 stands for a customer choice and each 1 indicates a point in the sequence where the menu item number increases by 1. For example, if k = 3 and n = 5, then a1a1a3 becomes 0011011. 20. Prove the binomial theorem 1.8.2. Hint: You may use an induction argument. That is, ﬁrst prove that the result is true if n = 1. Then, under the assumption that there is

n0 such that the result is true for all n ≤ n0 , prove that it is also true for n = n0 + 1. 21. Return to the birthday problem on page 30. How many different sets of birthdays are available with k people and 365 days when we don’t distinguish the same birthdays in different orders? For example, if k = 3, we would count (Jan. 1, Mar. 3, Jan.1) the same as (Jan. 1, Jan. 1, Mar. 3). 22. Let n be a large even integer. Use Stirlings’ formula (Theorem 1.7.5) n to ﬁnd an approximation to the binomial . Compute the approximation with n = coefﬁcient n/2 500.

1.9 Multinomial Coefﬁcients We learn how to count the number of ways to partition a ﬁnite set into more than two disjoint subsets. This generalizes the binomial coefﬁcients from Sec. 1.8. The generalization is useful when outcomes consist of several parts selected from a ﬁxed number of distinct types. We begin with a fairly simple example that will illustrate the general ideas of this section. Example 1.9.1

Choosing Committees. Suppose that 20 members of an organization are to be divided into three committees A, B, and C in such a way that each of the committees A and B is to have eight members and committee C is to have four members. We shall determine the number of different ways in which members can be assigned to these committees. Notice that each of the 20 members gets assigned to one and only one committee. One way to think of the assignments is to form committee A ﬁrst by choosing its eight members and then split the remaining 12 members into committees B and C. Each of these operations is choosing a combination, and every choice of committee A can be paired with every one of the splits of the remaining 12 members into committees B and C. Hence, the number of assignments into three committees is the product of the numbers of combinations for the two parts of the assignment. Speciﬁcally, to form committee A, we must choose eight out of 20 members, and this can be done in 20 Then to split the remaining 12 members into committees B 8 ways. 12 and C there are are 8 ways to do it. Here, the answer is

20! 12! 20! 20 12 = = = 62,355,150. 8 8!12! 8!4! 8!8!4! 8 Notice how the 12! that appears in the denominator of 20 8 divides out with the 12! 12 that appears in the numerator of 8 . This fact is the key to the general formula that we shall derive next. In general, suppose that n distinct elements are to be divided into k different groups (k ≥ 2) in such a way that, for j = 1, . . . , k, the j th group contains exactly nj elements, where n1 + n2 + . . . + nk = n. It is desired to determine the number of different ways in which the n elements can be divided into the k groups. The

1.9 Multinomial Coefﬁcients

43

n1 elements in the ﬁrst group can be selected from the n available elements in nn 1 different ways. After the n1 elements in the ﬁrst group have been selected, the n2 elements in the second group can be selected from the remaining n − n1 elements 1 different ways. Hence, the total number of different ways of selecting the in n−n n2 1 elements for both the ﬁrst group and the second group is nn n−n n2 . After the n1 + n2 1 elements in the ﬁrst two groups have been selected, the number of different ways in which the n3 elements in the third group can be selected is n−nn1−n2 . Hence, the total 3 number of different ways of selecting the elements for the ﬁrst three groups is

n n − n1 n − n1 − n2 . n1 n2 n3 It follows from the preceding explanation that, for each j = 1, . . . , k − 2 after the ﬁrst j groups have been formed, the number of different ways in which the nj +1 elements in thenext group (j + 1) can be selected from the remaining n − n1 − . . . − ... nj elements is n−n1n− −nj . After the elements of group k − 1 have been selected, j +1 the remaining nk elements must then form the last group. Hence, the total number of different ways of dividing the n elements into the k groups is

n − n1 n − n1 − n2 . . . n − n1 − . . . − nk−2 n n! , = n1 n2 n3 nk−1 n1!n2 ! . . . nk ! where the last formula follows from writing the binomial coefﬁcients in terms of factorials. Deﬁnition 1.9.1

Multinomial Coefﬁcients. The number

n n! , , which we shall denote by n1, n2 , . . . , nk n1!n2 ! . . . nk ! is called a multinomial coefﬁcient. The name multinomial coefﬁcient derives from the appearance of the symbol in the multinomial theorem, whose proof is left as Exercise 11 in this section.

Theorem 1.9.1

Multinomial Theorem. For all numbers x1, . . . , xk and each positive integer n, n n n n (x1 + . . . + xk )n = x 1x 2 . . . xk k , n1, n2 , . . . , nk 1 2 where the summation extends over all possible combinations of nonnegative integers n1, . . . , nk such that n1 + n2 + . . . + nk = n. A multinomial coefﬁcient is a generalization of the binomial coefﬁcient discussed in Sec. 1.8. For k = 2, the multinomial theorem is the same as the binomial theorem, and the multinomial coefﬁcient becomes a binomial coefﬁcient. In particular,

n n . = k k, n − k

Example 1.9.2

Choosing Committees. In Example 1.9.1, we see that the solution obtained there is the same as the multinomial coefﬁcient for which n = 20, k = 3, n1 = n2 = 8, and n3 = 4, namely,

20 20! = 62,355,150. = (8!)2 4! 8, 8, 4

44

Chapter 1 Introduction to Probability

Arrangements of Elements of More Than Two Distinct Types Just as binomial coefﬁcients can be used to represent the number of different arrangements of the elements of a set containing elements of only two distinct types, multinomial coefﬁcients can be used to represent the number of different arrangements of the elements of a set containing elements of k different types (k ≥ 2). Suppose, for example, that n balls of k different colors are to be arranged in a row and that there are nj balls of color j (j = 1, . . . , k), where n1 + n2 + . . . + nk = n. Then each different arrangement of the n balls corresponds to a different way of dividing the n available positions in the row into a group of n1 positions to be occupied by the balls of color 1, a second group of n2 positions to be occupied by the balls of color 2, and so on. Hence, the total number of different possible arrangements of the n balls must be

n n! = . n1, n2 , . . . , nk n1!n2 ! . . . nk ! Example 1.9.3

Rolling Dice. Suppose that 12 dice are to be rolled. We shall determine the probability p that each of the six different numbers will appear twice. Each outcome in the sample space S can be regarded as an ordered sequence of 12 numbers, where the ith number in the sequence is the outcome of the ith roll. Hence, there will be 612 possible outcomes in S, and each of these outcomes can be regarded as equally probable. The number of these outcomes that would contain each of the six numbers 1, 2, . . . , 6 exactly twice will be equal to the number of different possible arrangements of these 12 elements. This number can be determined by evaluating the multinomial coefﬁcient for which n = 12, k = 6, and n1 = n2 = . . . = n6 = 2. Hence, the number of such outcomes is

12 12! = , 2, 2, 2, 2, 2, 2 (2!)6 and the required probability p is p=

Example 1.9.4

12! = 0.0034. 26612

Playing Cards. A deck of 52 cards contains 13 hearts. Suppose that the cards are shufﬂed and distributed among four players A, B, C, and D so that each player receives 13 cards. We shall determine the probability p that player A will receive six hearts, player B will receive four hearts, player C will receive two hearts, and player D will receive one heart. The total number N of different ways in which the 52 cards can be distributed among the four players so that each player receives 13 cards is

52 52! . N= = (13!)4 13, 13, 13, 13 It may be assumed that each of these ways is equally probable. We must now calculate the number M of ways of distributing the cards so that each player receives the required number of hearts. The number of different ways in which the hearts can be distributed to players A, B, C, and D so that the numbers of hearts they receive are 6, 4, 2, and 1, respectively, is

13 13! = . 6, 4, 2, 1 6!4!2!1!

1.9 Multinomial Coefﬁcients

45

Also, the number of different ways in which the other 39 cards can then be distributed to the four players so that each will have a total of 13 cards is

39 39! = . 7, 9, 11, 12 7!9!11!12! Therefore, M=

39! 13! . , 6!4!2!1! 7!9!11!12!

and the required probability p is 13!39!(13!)4 M = = 0.00196. N 6!4!2!1!7!9!11!12!52! There is another approach to this problem along the lines indicated in Example 1.8.9 on page 37. The number of possible different combinations of the 13 posi tions in the deck occupied by the hearts is 52 13 . If player A is to receive six hearts, 13 there are 6 possible combinations of the six positions these hearts occupy among the 13 cards that A will receive. Similarly, if player B is to receive four hearts, there are 13 of their positions among the 13 cards that B will re4 possible combinations 13 ceive. There are 2 possible combinations for player C, and there are 13 1 possible combinations for player D. Hence, 13 13 13 13 1 2 4 6 , p= 52 13 p=

which produces the same value as the one obtained by the ﬁrst method of solution.

Summary n Multinomial coefﬁcients generalize binomial coefﬁcients. The coefﬁcient n ,..., nk is 1 the number of ways to partition a set of n items into distinguishable subsets of sizes n1, . . . , nk where n1 + . . . + nk = n. It is also the number of arrangements of n items of k different types for which ni are of type i for i = 1, . . . , k. Example 1.9.4 illustrates another important point to remember about computing probabilities: There might be more than one correct method for computing the same probability.

Exercises 1. Three pollsters will canvas a neighborhood with 21 houses. Each pollster will visit seven of the houses. How many different assignments of pollsters to houses are possible? 2. Suppose that 18 red beads, 12 yellow beads, eight blue beads, and 12 black beads are to be strung in a row. How many different arrangements of the colors can be formed? 3. Suppose that two committees are to be formed in an organization that has 300 members. If one committee is

to have ﬁve members and the other committee is to have eight members, in how many different ways can these committees be selected? 4. If the letters s, s, s, t, t, t, i, i, a, c are arranged in a random order, what is the probability that they will spell the word “statistics”? 5. Suppose that n balanced dice are rolled. Determine the probability that the number j will appear exactly nj times (j = 1, . . . , 6), where n1 + n2 + . . . + n6 = n.

46

Chapter 1 Introduction to Probability

6. If seven balanced dice are rolled, what is the probability that each of the six different numbers will appear at least once? 7. Suppose that a deck of 25 cards contains 12 red cards. Suppose also that the 25 cards are distributed in a random manner to three players A, B, and C in such a way that player A receives 10 cards, player B receives eight cards, and player C receives seven cards. Determine the probability that player A will receive six red cards, player B will receive two red cards, and player C will receive four red cards. 8. A deck of 52 cards contains 12 picture cards. If the 52 cards are distributed in a random manner among four players in such a way that each player receives 13 cards, what is the probability that each player will receive three picture cards? 9. Suppose that a deck of 52 cards contains 13 red cards, 13 yellow cards, 13 blue cards, and 13 green cards. If the 52 cards are distributed in a random manner among four players in such a way that each player receives 13 cards, what is the probability that each player will receive 13 cards of the same color?

10. Suppose that two boys named Davis, three boys named Jones, and four boys named Smith are seated at random in a row containing nine seats. What is the probability that the Davis boys will occupy the ﬁrst two seats in the row, the Jones boys will occupy the next three seats, and the Smith boys will occupy the last four seats? 11. Prove the multinomial theorem 1.9.1. (You may wish to use the same hint as in Exercise 20 in Sec. 1.8.) 12. Return to Example 1.8.6. Let S be the larger sample space (ﬁrst method of choosing) and let S be the smaller sample space (second method). For each element s of S , let N(s ) stand for the number of elements of S that lead to the same boxful s when the order of choosing is ignored. a. For each s ∈ S , ﬁnd a formula for N(s ). Hint: Let ni stand for the number of items of type i in s for i = 1, . . . , 7. b. Verify that s ∈S N(s ) equals the number of outcomes in S.

1.10 The Probability of a Union of Events The axioms of probability tell us directly how to ﬁnd the probability of the union of disjoint events. Theorem 1.5.7 showed how to ﬁnd the probability for the union of two arbitrary events. This theorem is generalized to the union of an arbitrary ﬁnite collection of events. We shall now consider again an arbitrary sample space S that may contain either a ﬁnite number of outcomes or an inﬁnite number, and we shall develop some further general properties of the various probabilities that might be speciﬁed for theevents in S. In this section, we shall study in particular the probability of the union ni=1 Ai of n events A1, . . . , An. If the events A1, . . . , An are disjoint, we know that n n Ai = Pr(Ai ). Pr i=1

i=1

Furthermore, for every two events A1 and A2 , regardless of whether or not they are disjoint, we know from Theorem 1.5.7 of Sec. 1.5 that Pr(A1 ∪ A2 ) = Pr(A1) + Pr(A2 ) − Pr(A1 ∩ A2 ). In this section, we shall extend this result, ﬁrst to three events and then to an arbitrary ﬁnite number of events.

The Union of Three Events Theorem 1.10.1

For every three events A1, A2 , and A3,

1.10 The Probability of a Union of Events

47

Pr(A1 ∪ A2 ∪ A3) = Pr(A1) + Pr(A2 ) + Pr(A3) − [Pr(A1 ∩ A2 ) + Pr(A2 ∩ A3) + Pr(A1 ∩ A3)] + Pr(A1 ∩ A2 ∩ A3).

(1.10.1)

Proof By the associative property of unions (Theorem 1.4.6), we can write A1 ∪ A2 ∪ A3 = (A1 ∪ A2 ) ∪ A3. Apply Theorem 1.5.7 to the two events A = A1 ∪ A2 and B = A3 to obtain Pr(A1 ∪ A2 ∪ A3) = Pr(A ∪ B) = Pr(A) + Pr(B) − Pr(A ∩ B).

(1.10.2)

We next compute the three probabilities on the far right side of (1.10.2) and combine them to get (1.10.1). First, apply Theorem 1.5.7 to the two events A1 and A2 to obtain Pr(A) = Pr(A1) + Pr(A2 ) − Pr(A1 ∩ A2 ).

(1.10.3)

Next, use the ﬁrst distributive property in Theorem 1.4.10 to write A ∩ B = (A1 ∪ A2 ) ∩ A3 = (A1 ∩ A3) ∪ (A2 ∩ A3).

(1.10.4)

Apply Theorem 1.5.7 to the events on the far right side of (1.10.4) to obtain Pr(A ∩ B) = Pr(A1 ∩ A3) + Pr(A2 ∩ A3) − Pr(A1 ∩ A2 ∩ A3).

(1.10.5)

Substitute (1.10.3), Pr(B) = Pr(A3), and (1.10.5) into (1.10.2) to complete the proof.

Example 1.10.1

Student Enrollment. Among a group of 200 students, 137 students are enrolled in a mathematics class, 50 students are enrolled in a history class, and 124 students are enrolled in a music class. Furthermore, the number of students enrolled in both the mathematics and history classes is 33, the number enrolled in both the history and music classes is 29, and the number enrolled in both the mathematics and music classes is 92. Finally, the number of students enrolled in all three classes is 18. We shall determine the probability that a student selected at random from the group of 200 students will be enrolled in at least one of the three classes. Let A1 denote the event that the selected student is enrolled in the mathematics class, let A2 denote the event that he is enrolled in the history class, and let A3 denote the event that he is enrolled in the music class. To solve the problem, we must determine the value of Pr(A1 ∪ A2 ∪ A3). From the given numbers, Pr(A1) =

137 , 200

Pr(A1 ∩ A2 ) =

Pr(A2 ) = 33 , 200

Pr(A1 ∩ A2 ∩ A3) =

50 , 200

Pr(A3) =

Pr(A2 ∩ A3) =

29 , 200

124 , 200 Pr(A1 ∩ A3) =

92 , 200

18 . 200

It follows from Eq. (1.10.1) that Pr(A1 ∪ A2 ∪ A3) = 175/200 = 7/8.

The Union of a Finite Number of Events A result similar to Theorem 1.10.1 holds for any arbitrary ﬁnite number of events, as shown by the following theorem.

48

Chapter 1 Introduction to Probability

Theorem 1.10.2

For every n events A1, . . . , An, n n Pr Ai = Pr(Ai ) − Pr(Ai ∩ Aj ) + Pr(Ai ∩ Aj ∩ Ak ) i=1

i=1

−

i 0, what is the value of Pr(A|B)? 2. If A and B are disjoint events and Pr(B) > 0, what is the value of Pr(A|B)? 3. If S is the sample space of an experiment and A is any event in that space, what is the value of Pr(A|S)? 4. Each time a shopper purchases a tube of toothpaste, he chooses either brand A or brand B. Suppose that for each purchase after the ﬁrst, the probability is 1/3 that he will choose the same brand that he chose on his preceding purchase and the probability is 2/3 that he will switch brands. If he is equally likely to choose either brand A or brand B on his ﬁrst purchase, what is the probability that both his ﬁrst and second purchases will be brand A and both his third and fourth purchases will be brand B? 5. A box contains r red balls and b blue balls. One ball is selected at random and its color is observed. The ball is then returned to the box and k additional balls of the same color are also put into the box. A second ball is then selected at random, its color is observed, and it is returned to the box together with k additional balls of the same color. Each time another ball is selected, the process is repeated. If four balls are selected, what is the probability that the ﬁrst three balls will be red and the fourth ball will be blue? 6. A box contains three cards. One card is red on both sides, one card is green on both sides, and one card is red on one side and green on the other. One card is selected from the box at random, and the color on one side is observed. If this side is green, what is the probability that the other side of the card is also green? 7. Consider again the conditions of Exercise 2 of Sec. 1.10. If a family selected at random from the city subscribes to newspaper A, what is the probability that the family also subscribes to newspaper B? 8. Consider again the conditions of Exercise 2 of Sec. 1.10. If a family selected at random from the city subscribes to at least one of the three newspapers A, B, and C, what is the probability that the family subscribes to newspaper A? 9. Suppose that a box contains one blue card and four red cards, which are labeled A, B, C, and D. Suppose also that

two of these ﬁve cards are selected at random, without replacement. a. If it is known that card A has been selected, what is the probability that both cards are red? b. If it is known that at least one red card has been selected, what is the probability that both cards are red? 10. Consider the following version of the game of craps: The player rolls two dice. If the sum on the ﬁrst roll is 7 or 11, the player wins the game immediately. If the sum on the ﬁrst roll is 2, 3, or 12, the player loses the game immediately. However, if the sum on the ﬁrst roll is 4, 5, 6, 8, 9, or 10, then the two dice are rolled again and again until the sum is either 7 or 11 or the original value. If the original value is obtained a second time before either 7 or 11 is obtained, then the player wins. If either 7 or 11 is obtained before the original value is obtained a second time, then the player loses. Determine the probability that the player will win this game. 11. For any two events A and B with Pr(B) > 0, prove that Pr(Ac |B) = 1 − Pr(A|B). 12. For any three events A, B, and D, such that Pr(D) > 0, prove that Pr(A ∪ B|D) = Pr(A|D) + Pr(B|D) − Pr(A ∩ B|D). 13. A box contains three coins with a head on each side, four coins with a tail on each side, and two fair coins. If one of these nine coins is selected at random and tossed once, what is the probability that a head will be obtained? 14. A machine produces defective parts with three different probabilities depending on its state of repair. If the machine is in good working order, it produces defective parts with probability 0.02. If it is wearing down, it produces defective parts with probability 0.1. If it needs maintenance, it produces defective parts with probability 0.3. The probability that the machine is in good working order is 0.8, the probability that it is wearing down is 0.1, and the probability that it needs maintenance is 0.1. Compute the probability that a randomly selected part will be defective.

66

Chapter 2 Conditional Probability

15. The percentages of voters classed as Liberals in three different election districts are divided as follows: in the ﬁrst district, 21 percent; in the second district, 45 percent; and in the third district, 75 percent. If a district is selected at random and a voter is selected at random from that district, what is the probability that she will be a Liberal?

same brand of toothpaste that he chose on his preceding purchase is 1/3, and the probability that he will switch brands is 2/3. Suppose that on his ﬁrst purchase the probability that he will choose brand A is 1/4 and the probability that he will choose brand B is 3/4. What is the probability that his second purchase will be brand B?

16. Consider again the shopper described in Exercise 4. On each purchase, the probability that he will choose the

17. Prove the conditional version of the law of total probability (2.1.5).

2.2 Independent Events If learning that B has occurred does not change the probability of A, then we say that A and B are independent. There are many cases in which events A and B are not independent, but they would be independent if we learned that some other event C had occurred. In this case, A and B are conditionally independent given C. Example 2.2.1

Tossing Coins. Suppose that a fair coin is tossed twice. The experiment has four outcomes, HH, HT, TH, and TT, that tell us how the coin landed on each of the two tosses. We can assume that this sample space is simple so that each outcome has probability 1/4. Suppose that we are interested in the second toss. In particular, we want to calculate the probability of the event A = {H on second toss}. We see that A = {HH,TH}, so that Pr(A) = 2/4 = 1/2. If we learn that the ﬁrst coin landed T, we might wish to compute the conditional probability Pr(A|B) where B = {T on ﬁrst toss}. Using the deﬁnition of conditional probability, we easily compute Pr(A|B) =

Pr(A ∩ B) 1/4 1 = = , Pr(B) 1/2 2

because A ∩ B = {T H } has probability 1/4. We see that Pr(A|B) = Pr(A); hence, we don’t change the probability of A even after we learn that B has occurred.

Deﬁnition of Independence The conditional probability of the event A given that the event B has occurred is the revised probability of A after we learn that B has occurred. It might be the case, however, that no revision is necessary to the probability of A even after we learn that B occurs. This is precisely what happened in Example 2.2.1. In this case, we say that A and B are independent events. As another example, if we toss a coin and then roll a die, we could let A be the event that the die shows 3 and let B be the event that the coin lands with heads up. If the tossing of the coin is done in isolation of the rolling of the die, we might be quite comfortable assigning Pr(A|B) = Pr(A) = 1/6. In this case, we say that A and B are independent events. In general, if Pr(B) > 0, the equation Pr(A|B) = Pr(A) can be rewritten as Pr(A ∩ B)/ Pr(B) = Pr(A). If we multiply both sides of this last equation by Pr(B), we obtain the equation Pr(A ∩ B) = Pr(A) Pr(B). In order to avoid the condition Pr(B) > 0, the mathematical deﬁnition of the independence of two events is stated as follows: Deﬁnition 2.2.1

Independent Events. Two events A and B are independent if Pr(A ∩ B) = Pr(A) Pr(B).

2.2 Independent Events

67

Suppose that Pr(A) > 0 and Pr(B) > 0. Then it follows easily from the deﬁnitions of independence and conditional probability that A and B are independent if and only if Pr(A|B) = Pr(A) and Pr(B|A) = Pr(B).

Independence of Two Events If two events A and B are considered to be independent because the events are physically unrelated, and if the probabilities Pr(A) and Pr(B) are known, then the deﬁnition can be used to assign a value to Pr(A ∩ B). Example 2.2.2

Machine Operation. Suppose that two machines 1 and 2 in a factory are operated independently of each other. Let A be the event that machine 1 will become inoperative during a given 8-hour period, let B be the event that machine 2 will become inoperative during the same period, and suppose that Pr(A) = 1/3 and Pr(B) = 1/4. We shall determine the probability that at least one of the machines will become inoperative during the given period. The probability Pr(A ∩ B) that both machines will become inoperative during the period is

1 1 1 = . Pr(A ∩ B) = Pr(A) Pr(B) = 3 4 12 Therefore, the probability Pr(A ∪ B) that at least one of the machines will become inoperative during the period is Pr(A ∪ B) = Pr(A) + Pr(B) − Pr(A ∩ B) 1 1 1 1 = . = + − 3 4 12 2

The next example shows that two events A and B, which are physically related, can, nevertheless, satisfy the deﬁnition of independence. Example 2.2.3

Rolling a Die. Suppose that a balanced die is rolled. Let A be the event that an even number is obtained, and let B be the event that one of the numbers 1, 2, 3, or 4 is obtained. We shall show that the events A and B are independent. In this example, Pr(A) = 1/2 and Pr(B) = 2/3. Furthermore, since A ∩ B is the event that either the number 2 or the number 4 is obtained, Pr(A ∩ B) = 1/3. Hence, Pr(A ∩ B) = Pr(A) Pr(B). It follows that the events A and B are independent events, even though the occurrence of each event depends on the same roll of a die. The independence of the events A and B in Example 2.2.3 can also be interpreted as follows: Suppose that a person must bet on whether the number obtained on the die will be even or odd, that is, on whether or not the event A will occur. Since three of the possible outcomes of the roll are even and the other three are odd, the person will typically have no preference between betting on an even number and betting on an odd number. Suppose also that after the die has been rolled, but before the person has learned the outcome and before she has decided whether to bet on an even outcome or on an odd outcome, she is informed that the actual outcome was one of the numbers 1, 2, 3, or 4, i.e., that the event B has occurred. The person now knows that the outcome was 1, 2, 3, or 4. However, since two of these numbers are even and two are odd, the person will typically still have no preference between betting on an even number and betting on an odd number. In other words, the information that the event B has

68

Chapter 2 Conditional Probability

occurred is of no help to the person who is trying to decide whether or not the event A has occurred.

Independence of Complements

In the foregoing discussion of independent events, we stated that if A and B are independent, then the occurrence or nonoccurrence of A should not be related to the occurrence or nonoccurrence of B. Hence, if A and B satisfy the mathematical deﬁnition of independent events, then it should also be true that A and B c are independent events, that Ac and B are independent events, and that Ac and B c are independent events. One of these results is established in the next theorem. Theorem 2.2.1

If two events A and B are independent, then the events A and B c are also independent. Proof Theorem 1.5.6 says that Pr(A ∩ B c ) = Pr(A) − Pr(A ∩ B). Furthermore, since A and B are independent events, Pr(A ∩ B) = Pr(A) Pr(B). It now follows that Pr(A ∩ B c ) = Pr(A) − Pr(A) Pr(B) = Pr(A)[1 − Pr(B)] = Pr(A) Pr(B c ). Therefore, the events A and B c are independent. The proof of the analogous result for the events Ac and B is similar, and the proof for the events Ac and B c is required in Exercise 2 at the end of this section.

Independence of Several Events The deﬁnition of independent events can be extended to any number of events, A1, . . . , Ak . Intuitively, if learning that some of these events do or do not occur does not change our probabilities for any events that depend only on the remaining events, we would say that all k events are independent. The mathematical deﬁnition is the following analog to Deﬁnition 2.2.1. Deﬁnition 2.2.2

(Mutually) Independent Events. The k events A1, . . . , Ak are independent (or mutually independent) if, for every subset Ai1, . . . , Aij of j of these events (j = 2, 3, . . . , k), Pr(Ai1 ∩ . . . ∩ Aij ) = Pr(Ai1) . . . Pr(Aij ). As an example, in order for three events A, B, and C to be independent, the following four relations must be satisﬁed: Pr(A ∩ B) = Pr(A) Pr(B), (2.2.1) Pr(A ∩ C) = Pr(A) Pr(C), Pr(B ∩ C) = Pr(B) Pr(C), and Pr(A ∩ B ∩ C) = Pr(A) Pr(B) Pr(C).

(2.2.2)

It is possible that Eq. (2.2.2) will be satisﬁed, but one or more of the three relations (2.2.1) will not be satisﬁed. On the other hand, as is shown in the next example,

2.2 Independent Events

69

it is also possible that each of the three relations (2.2.1) will be satisﬁed but Eq. (2.2.2) will not be satisﬁed. Example 2.2.4

Pairwise Independence. Suppose that a fair coin is tossed twice so that the sample space S = {HH, HT, TH, TT} is simple. Deﬁne the following three events: A = {H on ﬁrst toss} = {HH, HT}, B = {H on second toss} = {HH, TH}, and C = {Both tosses the same} = {HH, TT}. Then A ∩ B = A ∩ C = B ∩ C = A ∩ B ∩ C = {H H }. Hence, Pr(A) = Pr(B) = Pr(C) = 1/2 and Pr(A ∩ B) = Pr(A ∩ C) = Pr(B ∩ C) = Pr(A ∩ B ∩ C) = 1/4. It follows that each of the three relations of Eq. (2.2.1) is satisﬁed but Eq. (2.2.2) is not satisﬁed. These results can be summarized by saying that the events A, B, and C are pairwise independent, but all three events are not independent. We shall now present some examples that will illustrate the power and scope of the concept of independence in the solution of probability problems.

Example 2.2.5

Inspecting Items. Suppose that a machine produces a defective item with probability p (0 < p < 1) and produces a nondefective item with probability 1 − p. Suppose further that six items produced by the machine are selected at random and inspected, and that the results (defective or nondefective) for these six items are independent. We shall determine the probability that exactly two of the six items are defective. It can be assumed that the sample space S contains all possible arrangements of six items, each one of which might be either defective or nondefective. For j = 1, . . . , 6, we shall let Dj denote the event that the j th item in the sample is defective so that Djc is the event that this item is nondefective. Since the outcomes for the six different items are independent, the probability of obtaining any particular sequence of defective and nondefective items will simply be the product of the individual probabilities for the items. For example, Pr(D1c ∩ D2 ∩ D3c ∩ D4c ∩ D5 ∩ D6c ) = Pr(D1c ) Pr(D2 ) Pr(D3c ) Pr(D4c ) Pr(D5) Pr(D6c ) = (1 − p)p(1 − p)(1 − p)p(1 − p) = p 2 (1 − p)4. It can be seen that the probability of any other particular sequence in S containing two defective items and four nondefective items will also be p 2 (1 − p)4. Hence, the probability that there will be exactly two defectives in the sample of six items can be found by multiplying the probability p 2 (1 − p)4 of any particular sequence containing two defectives by the possible number of such sequences. Since there are 26 distinct arrangements of two defective items and four nondefective items, the probability of obtaining exactly two defectives is 26 p 2 (1 − p)4.

Example 2.2.6

Obtaining a Defective Item. For the conditions of Example 2.2.5, we shall now determine the probability that at least one of the six items in the sample will be defective. Since the outcomes for the different items are independent, the probability that all six items will be nondefective is (1 − p)6. Therefore, the probability that at least one item will be defective is 1 − (1 − p)6.

70

Chapter 2 Conditional Probability

Example 2.2.7

Tossing a Coin Until a Head Appears. Suppose that a fair coin is tossed until a head appears for the ﬁrst time, and assume that the outcomes of the tosses are independent. We shall determine the probability pn that exactly n tosses will be required. The desired probability is equal to the probability of obtaining n − 1 tails in succession and then obtaining a head on the next toss. Since the outcomes of the tosses are independent, the probability of this particular sequence of n outcomes is pn = (1/2)n. The probability that a head will be obtained sooner or later (or, equivalently, that tails will not be obtained forever) is ∞ n=1

pn =

1 1 1 ... + + + = 1. 2 4 8

Since the sum of the probabilities pn is 1, it follows that the probability of obtaining an inﬁnite sequence of tails without ever obtaining a head must be 0. Example 2.2.8

Inspecting Items One at a Time. Consider again a machine that produces a defective item with probability p and produces a nondefective item with probability 1 − p. Suppose that items produced by the machine are selected at random and inspected one at a time until exactly ﬁve defective items have been obtained. We shall determine the probability pn that exactly n items (n ≥ 5) must be selected to obtain the ﬁve defectives. The ﬁfth defective item will be the nth item that is inspected if and only if there are exactly four defectives among the ﬁrst n − 1 items and then the nth item is defective. By reasoning similar to that given in Example 2.2.5, it can be shown that the probability of obtaining exactly four defectives and n − 5 nondefectives among 4 n−5. The probability that the nth item will be the ﬁrst n − 1 items is n−1 4 p (1 − p) defective is p. Since the ﬁrst event refers to outcomes for only the ﬁrst n − 1 items and the second event refers to the outcome for only the nth item, these two events are independent. Therefore, the probability that both events will occur is equal to the product of their probabilities. It follows that

n−1 5 p (1 − p)n−5. pn = 4

Example 2.2.9

People v. Collins. Finkelstein and Levin (1990) describe a criminal case whose verdict was overturned by the Supreme Court of California in part due to a probability calculation involving both conditional probability and independence. The case, People v. Collins, 68 Cal. 2d 319, 438 P.2d 33 (1968), involved a purse snatching in which witnesses claimed to see a young woman with blond hair in a ponytail ﬂeeing from the scene in a yellow car driven by a black man with a beard. A couple meeting the description was arrested a few days after the crime, but no physical evidence was found. A mathematician calculated the probability that a randomly selected couple would possess the described characteristics as about 8.3 × 10−8, or 1 in 12 million. Faced with such overwhelming odds and no physical evidence, the jury decided that the defendants must have been the only such couple and convicted them. The Supreme Court thought that a more useful probability should have been calculated. Based on the testimony of the witnesses, there was a couple that met the above description. Given that there was already one couple who met the description, what is the conditional probability that there was also a second couple such as the defendants? Let p be the probability that a randomly selected couple from a population of n couples has certain characteristics. Let A be the event that at least one couple in the population has the characteristics, and let B be the event that at least two couples

2.2 Independent Events

71

have the characteristics. What we seek is Pr(B|A). Since B ⊂ A, it follows that Pr(B|A) =

Pr(B ∩ A) Pr(B) = . Pr(A) Pr(A)

We shall calculate Pr(B) and Pr(A) by breaking each event into more manageable pieces. Suppose that we number the n couples in the population from 1 to n. Let Ai be the event that couple number i has the characteristics in question for i = 1, . . . , n, and let C be the event that exactly one couple has the characteristics. Then A = (Ac ∩ Ac . . . ∩ Ac )c , 1

2

n

C = (A1 ∩ Ac2 . . . ∩ Acn) ∪ (Ac1 ∩ A2 ∩ Ac3 . . . ∩ Acn) ∪ . . . ∪ (Ac1 ∩ . . . ∩ Acn−1 ∩ An), B = A ∩ C c. Assuming that the n couples are mutually independent, Pr(Ac ) = (1 − p)n, and Pr(A) = 1 − (1 − p)n. The n events whose union is C are disjoint and each one has probability p(1 − p)n−1, so Pr(C) = np(1 − p)n−1. Since A = B ∪ C with B and C disjoint, we have Pr(B) = Pr(A) − Pr(C) = 1 − (1 − p)n − np(1 − p)n−1. So, Pr(B|A) =

1 − (1 − p)n − np(1 − p)n−1 . 1 − (1 − p)n

(2.2.3)

The Supreme Court of California reasoned that, since the crime occurred in a heavily populated area, n would be in the millions. For example, with p = 8.3 × 10−8 and n = 8,000,000, the value of (2.2.3) is 0.2966. Such a probability suggests that there is a reasonable chance that there was another couple meeting the same description as the witnesses provided. Of course, the court did not know how large n was, but the fact that (2.2.3) could easily be so large was grounds enough to rule that reasonable doubt remained as to the guilt of the defendants.

Independence and Conditional Probability Two events A and B with positive probability are independent if and only if Pr(A|B) = Pr(A). Similar results hold for larger collections of independent events. The following theorem, for example, is straightforward to prove based on the deﬁnition of independence. Theorem 2.2.2

Let A1, . . . , Ak be events such that Pr(A1 ∩ . . . ∩ Ak ) > 0. Then A1, . . . , Ak are independent if and only if, for every two disjoint subsets {i1, . . . , im} and {j1, . . . , j} of {1, . . . , k}, we have Pr(Ai1 ∩ . . . ∩ Aim |Aj1 ∩ . . . ∩ Aj ) = Pr(Ai1 ∩ . . . ∩ Aim ). Theorem 2.2.2 says that k events are independent if and only if learning that some of the events occur does not change the probability that any combination of the other events occurs.

The Meaning of Independence

We have given a mathematical deﬁnition of independent events in Deﬁnition 2.2.1. We have also given some interpretations for what it means for events to be independent. The most instructive interpretation is the one based on conditional probability. If learning that B occurs does not change the probability of A, then A and B are independent. In simple examples such as tossing what we believe to be a fair coin, we would generally not expect to change our minds

72

Chapter 2 Conditional Probability

about what is likely to happen on later ﬂips after we observe earlier ﬂips; hence, we declare the events that concern different ﬂips to be independent. However, consider a situation similar to Example 2.2.5 in which items produced by a machine are inspected to see whether or not they are defective. In Example 2.2.5, we declared that the different items were independent and that each item had probability p of being defective. This might make sense if we were conﬁdent that we knew how well the machine was performing. But if we were unsure of how the machine were performing, we could easily imagine changing our mind about the probability that the 10th item is defective depending on how many of the ﬁrst nine items are defective. To be speciﬁc, suppose that we begin by thinking that the probability is 0.08 that an item will be defective. If we observe one or zero defective items in the ﬁrst nine, we might not make much revision to the probability that the 10th item is defective. On the other hand, if we observe eight or nine defectives in the ﬁrst nine items, we might be uncomfortable keeping the probability at 0.08 that the 10th item will be defective. In summary, when deciding whether to model events as independent, try to answer the following question: “If I were to learn that some of these events occurred, would I change the probabilities of any of the others?” If we feel that we already know everything that we could learn from these events about how likely the others should be, we can safely model them as independent. If, on the other hand, we feel that learning some of these events could change our minds about how likely some of the others are, then we should be more careful about determining the conditional probabilities and not model the events as independent.

Mutually Exclusive Events and Mutually Independent Events Two similar-sounding deﬁnitions have appeared earlier in this text. Deﬁnition 1.4.10 deﬁnes mutually exclusive events, and Deﬁnition 2.2.2 deﬁnes mutually independent events. It is almost never the case that the same set of events satisﬁes both deﬁnitions. The reason is that if events are disjoint (mutually exclusive), then learning that one occurs means that the others deﬁnitely did not occur. Hence, learning that one occurs would change the probabilities for all the others to 0, unless the others already had probability 0. Indeed, this suggests the only condition in which the two deﬁnitions would both apply to the same collection of events. The proof of the following result is left to Exercise 24 in this section. Theorem 2.2.3

Let n > 1 and let A1, . . . , An be events that are mutually exclusive. The events are also mutually independent if and only if all the events except possibly one of them has probability 0.

Conditionally Independent Events Conditional probability and independence combine into one of the most versatile models of data collection. The idea is that, in many circumstances, we are unwilling to say that certain events are independent because we believe that learning some of them will provide information about how likely the others are to occur. But if we knew the frequency with which such events would occur, we might then be willing to assume that they are independent. This model can be illustrated using one of the examples from earlier in this section. Example 2.2.10

Inspecting Items. Consider again the situation in Example 2.2.5. This time, however, suppose that we believe that we would change our minds about the probabilities of later items being defective were we to learn that certain numbers of early items

2.2 Independent Events

73

were defective. Suppose that we think of the number p from Example 2.2.5 as the proportion of defective items that we would expect to see if we were to inspect a very large sample of items. If we knew this proportion p, and if we were to sample only a few, say, six or 10 items now, we might feel conﬁdent maintaining that the probability of a later item being defective remains p even after we inspect some of the earlier items. On the other hand, if we are not sure what would be the proportion of defective items in a large sample, we might not feel conﬁdent keeping the probability the same as we continue to inspect. To be precise, suppose that we treat the proportion p of defective items as unknown and that we are dealing with an augmented experiment as described in Deﬁnition 2.1.3. For simplicity, suppose that p can take one of two values, either 0.01 or 0.4, the ﬁrst corresponding to normal operation and the second corresponding to a need for maintenance. Let B1 be the event that p = 0.01, and let B2 be the event that p = 0.4. If we knew that B1 had occurred, then we would proceed under the assumption that the events D1, D2 , . . . were independent with Pr(Di |B1) = 0.01 for all i. For example, we could do the same calculations as in Examples 2.2.5 and 2.2.8 with p = 0.01. Let A be the event that we observe exactly two defectives in a random sample of six items. Then Pr(A|B1) = 26 0.012 0.994 = 1.44 × 10−3. Similarly, if we knew that B2 had occurred, then we would assume that D1, D2 , . . . were independent with Pr(Di |B2 ) = 0.4. In this case, Pr(A|B2 ) = 26 0.42 0.64 = 0.311. In Example 2.2.10, there is no reason that p must be required to assume at most two different values. We could easily allow p to take a third value or a fourth value, etc. Indeed, in Chapter 3 we shall learn how to handle the case in which every number between 0 and 1 is a possible value of p. The point of the simple example is to illustrate the concept of assuming that events are independent conditional on another event, such as B1 or B2 in the example. The formal concept illustrated in Example 2.2.10 is the following: Deﬁnition 2.2.3

Conditional Independence. We say that events A1, . . . , Ak are conditionally independent given B if, for every subcollection Ai1, . . . , Aij of j of these events (j = 2, 3, . . . , k), Pr Ai1 ∩ . . . ∩ Aij B = Pr(Ai1|B) . . . Pr(Aij |B). Deﬁnition 2.2.3 is identical to Deﬁnition 2.2.2 for independent events with the modiﬁcation that all probabilities in the deﬁnition are now conditional on B. As a note, even if we assume that events A1, . . . , Ak are conditionally independent given B, it is not necessary that they be conditionally independent given B c . In Example 2.2.10, the events D1, D2 , . . . were conditionally independent given both B1 and B2 = B1c , which is the typical situation. Exercise 16 in Sec. 2.3 is an example in which events are conditionally independent given one event B but are not conditionally independent given the complement B c . Recall that two events A1 and A2 (with Pr(A1) > 0) are independent if and only if Pr(A2 |A1) = Pr(A2 ). A similar result holds for conditionally independent events.

Theorem 2.2.4

Suppose that A1, A2 , and B are events such that Pr(A1 ∩ B) > 0. Then A1 and A2 are conditionally independent given B if and only if Pr(A2 |A1 ∩ B) = Pr(A2 |B). This is another example of the claim we made earlier that every result we can prove has an analog conditional on an event B. The reader can prove this theorem in Exercise 22.

74

Chapter 2 Conditional Probability

The Collector’s Problem Suppose that n balls are thrown in a random manner into r boxes (r ≤ n). We shall assume that the n throws are independent and that each of the r boxes is equally likely to receive any given ball. The problem is to determine the probability p that every box will receive at least one ball. This problem can be reformulated in terms of a collector’s problem as follows: Suppose that each package of bubble gum contains the picture of a baseball player, that the pictures of r different players are used, that the picture of each player is equally likely to be placed in any given package of gum, and that pictures are placed in different packages independently of each other. The problem now is to determine the probability p that a person who buys n packages of gum (n ≥ r) will obtain a complete set of r different pictures. For i = 1, . . . , r, let A i denote the event that the picture of player i is missing from all n packages. Then ri=1 Ai is the event that the picture of at least one player is missing. We shall ﬁnd Pr( ri=1 Ai ) by applying Eq. (1.10.6). Since the picture of each of the r players is equally likely to be placed in any particular package, the probability that the picture of player i will not be obtained in any particular package is (r − 1)/r. Since the packages are ﬁlled independently, the probability that the picture of player i will not be obtained in any of the n packages is [(r − 1)/r]n. Hence, n

r −1 for i = 1, . . . , r. Pr(Ai ) = r Now consider any two players i and j . The probability that neither the picture of player i nor the picture of player j will be obtained in any particular package is (r − 2)/r. Therefore, the probability that neither picture will be obtained in any of the n packages is [(r − 2)/r]n. Thus, n

r −2 . Pr(Ai ∩ Aj ) = r If we next consider any three players i, j , and k, we ﬁnd that n

r −3 . Pr(Ai ∩ Aj ∩ Ak ) = r By continuing in this way, we ﬁnally arrive at the probability Pr(A1 ∩ A2 ∩ . . . ∩ Ar ) that the pictures of all r players are missing from the n packages. Of course, this probability is 0. Therefore, by Eq. (1.10.6) of Sec. 1.10, r n n n

r −2 1 r r −1 r r . . . Pr Ai = r − + + (−1) r −1 r r r 2 i=1

r−1 n r j 1− = (−1)j +1 . j r j =1 Since the probability p of obtaining a complete set of r different pictures is equal to 1 − Pr( ri=1 Ai ), it follows from the foregoing derivation that p can be written in the form

n r−1 j j r p= 1− (−1) . j r j =0

2.2 Independent Events

75

Summary A collection of events is independent if and only if learning that some of them occur does not change the probabilities that any combination of the rest of them occurs. Equivalently, a collection of events is independent if and only if the probability of the intersection of every subcollection is the product of the individual probabilities. The concept of independence has a version conditional on another event. A collection of events is independent conditional on B if and only if the conditional probability of the intersection of every subcollection given B is the product of the individual conditional probabilities given B. Equivalently, a collection of events is conditionally independent given B if and only if learning that some of them (and B) occur does not change the conditional probabilities given B that any combination of the rest of them occur. The full power of conditional independence will become more apparent after we introduce Bayes’ theorem in the next section.

Exercises 1. If A and B are independent events and Pr(B) < 1, what is the value of Pr(Ac |B c )? 2. Assuming that A and B are independent events, prove that the events Ac and B c are also independent. 3. Suppose that A is an event such that Pr(A) = 0 and that B is any other event. Prove that A and B are independent events. 4. Suppose that a person rolls two balanced dice three times in succession. Determine the probability that on each of the three rolls, the sum of the two numbers that appear will be 7. 5. Suppose that the probability that the control system used in a spaceship will malfunction on a given ﬂight is 0.001. Suppose further that a duplicate, but completely independent, control system is also installed in the spaceship to take control in case the ﬁrst system malfunctions. Determine the probability that the spaceship will be under the control of either the original system or the duplicate system on a given ﬂight. 6. Suppose that 10,000 tickets are sold in one lottery and 5000 tickets are sold in another lottery. If a person owns 100 tickets in each lottery, what is the probability that she will win at least one ﬁrst prize? 7. Two students A and B are both registered for a certain course. Assume that student A attends class 80 percent of the time, student B attends class 60 percent of the time, and the absences of the two students are independent. a. What is the probability that at least one of the two students will be in class on a given day? b. If at least one of the two students is in class on a given day, what is the probability that A is in class that day?

8. If three balanced dice are rolled, what is the probability that all three numbers will be the same? 9. Consider an experiment in which a fair coin is tossed until a head is obtained for the ﬁrst time. If this experiment is performed three times, what is the probability that exactly the same number of tosses will be required for each of the three performances? 10. The probability that any child in a certain family will have blue eyes is 1/4, and this feature is inherited independently by different children in the family. If there are ﬁve children in the family and it is known that at least one of these children has blue eyes, what is the probability that at least three of the children have blue eyes? 11. Consider the family with ﬁve children described in Exercise 10. a. If it is known that the youngest child in the family has blue eyes, what is the probability that at least three of the children have blue eyes? b. Explain why the answer in part (a) is different from the answer in Exercise 10. 12. Suppose that A, B, and C are three independent events such that Pr(A) = 1/4, Pr(B) = 1/3, and Pr(C) = 1/2. (a) Determine the probability that none of these three events will occur. (b) Determine the probability that exactly one of these three events will occur. 13. Suppose that the probability that any particle emitted by a radioactive material will penetrate a certain shield is 0.01. If 10 particles are emitted, what is the probability that exactly one of the particles will penetrate the shield?

76

Chapter 2 Conditional Probability

14. Consider again the conditions of Exercise 13. If 10 particles are emitted, what is the probability that at least one of the particles will penetrate the shield? 15. Consider again the conditions of Exercise 13. How many particles must be emitted in order for the probability to be at least 0.8 that at least one particle will penetrate the shield? 16. In the World Series of baseball, two teams A and B play a sequence of games against each other, and the ﬁrst team that wins a total of four games becomes the winner of the World Series. If the probability that team A will win any particular game against team B is 1/3, what is the probability that team A will win the World Series? 17. Two boys A and B throw a ball at a target. Suppose that the probability that boy A will hit the target on any throw is 1/3 and the probability that boy B will hit the target on any throw is 1/4. Suppose also that boy A throws ﬁrst and the two boys take turns throwing. Determine the probability that the target will be hit for the ﬁrst time on the third throw of boy A. 18. For the conditions of Exercise 17, determine the probability that boy A will hit the target before boy B does. 19. A box contains 20 red balls, 30 white balls, and 50 blue balls. Suppose that 10 balls are selected at random one at a time, with replacement; that is, each selected ball is replaced in the box before the next selection is made. Determine the probability that at least one color will be missing from the 10 selected balls.

20. Suppose that A1, . . . , Ak form a sequence of k independent events. Let B1, . . . , Bk be another sequence of k events such that for each value of j (j = 1, . . . , k), either Bj = Aj or Bj = Acj . Prove that B1, . . . , Bk are also independent events. Hint: Use an induction argument based on the number of events Bj for which Bj = Acj . 21. Prove Theorem 2.2.2 on page 71. Hint: The “only if ” direction is direct from the deﬁnition of independence on page 68. For the “if ” direction, use induction on the value of j in the deﬁnition of independence. Let m = j − 1 and let = 1 with j1 = ij . 22. Prove Theorem 2.2.4 on page 73. 23. A programmer is about to attempt to compile a series of 11 similar programs. Let Ai be the event that the ith program compiles successfully for i = 1, . . . , 11. When the programming task is easy, the programmer expects that 80 percent of programs should compile. When the programming task is difﬁcult, she expects that only 40 percent of the programs will compile. Let B be the event that the programming task was easy. The programmer believes that the events A1, . . . , A11 are conditionally independent given B and given B c . a. Compute the probability that exactly 8 out of 11 programs will compile given B. b. Compute the probability that exactly 8 out of 11 programs will compile given B c . 24. Prove Theorem 2.2.3 on page 72.

2.3 Bayes’ Theorem Suppose that we are interested in which of several disjoint events B1, . . . , Bk will occur and that we will get to observe some other event A. If Pr(A|Bi ) is available for each i, then Bayes’ theorem is a useful formula for computing the conditional probabilities of the Bi events given A. We begin with a typical example. Example 2.3.1

Test for a Disease. Suppose that you are walking down the street and notice that the Department of Public Health is giving a free medical test for a certain disease. The test is 90 percent reliable in the following sense: If a person has the disease, there is a probability of 0.9 that the test will give a positive response; whereas, if a person does not have the disease, there is a probability of only 0.1 that the test will give a positive response. Data indicate that your chances of having the disease are only 1 in 10,000. However, since the test costs you nothing, and is fast and harmless, you decide to stop and take the test. A few days later you learn that you had a positive response to the test. Now, what is the probability that you have the disease?

2.3 Bayes’ Theorem

77

The last question in Example 2.3.1 is a prototype of the question for which Bayes’ theorem was designed. We have at least two disjoint events (“you have the disease” and “you do not have the disease”) about which we are uncertain, and we learn a piece of information (the result of the test) that tells us something about the uncertain events. Then we need to know how to revise the probabilities of the events in the light of the information we learned. We now present the general structure in which Bayes’ theorem operates before returning to the example.

Statement, Proof, and Examples of Bayes’ Theorem Example 2.3.2

Selecting Bolts. Consider again the situation in Example 2.1.8, in which a bolt is selected at random from one of two boxes. Suppose that we cannot tell without making a further effort from which of the two boxes the one bolt is being selected. For example, the boxes may be identical in appearance or somebody else may actually select the box, but we only get to see the bolt. Prior to selecting the bolt, it was equally likely that each of the two boxes would be selected. However, if we learn that event A has occurred, that is, a long bolt was selected, we can compute the conditional probabilities of the two boxes given A. To remind the reader, B1 is the event that the box is selected containing 60 long bolts and 40 short bolts, while B2 is the event that the box is selected containing 10 long bolts and 20 short bolts. In Example 2.1.9, we computed Pr(A) = 7/15, Pr(A|B1) = 3/5, Pr(A|B2 ) = 1/3, and Pr(B1) = Pr(B2 ) = 1/2. So, for example, Pr(B1|A) =

Pr(A ∩ B1) Pr(B1) Pr(A|B1) = = Pr(A) Pr(A)

1 2

× 7 15

3 5

=

9 . 14

Since the ﬁrst box has a higher proportion of long bolts than the second box, it seems reasonable that the probability of B1 should rise after we learn that a long bolt was selected. It must be that Pr(B2 |A) = 5/14 since one or the other box had to be selected. In Example 2.3.2, we started with uncertainty about which of two boxes would be chosen and then we observed a long bolt drawn from the chosen box. Because the two boxes have different chances of having a long bolt drawn, the observation of a long bolt changed the probabilities of each of the two boxes having been chosen. The precise calculation of how the probabilities change is the purpose of Bayes’ theorem. Theorem 2.3.1

Bayes’ theorem. Let the events B1, . . . , Bk form a partition of the space S such that Pr(Bj ) > 0 for j = 1, . . . , k, and let A be an event such that Pr(A) > 0. Then, for i = 1, . . . , k, Pr(Bi ) Pr(A|Bi ) Pr(Bi |A) = k . (2.3.1) j =1 Pr(Bj ) Pr(A|Bj ) Proof By the deﬁnition of conditional probability, Pr(Bi |A) =

Pr(Bi ∩ A) . Pr(A)

The numerator on the right side of Eq. (2.3.1) is equal to Pr(Bi ∩ A) by Theorem 2.1.1. The denominator is equal to Pr(A) according to Theorem 2.1.4.

78

Chapter 2 Conditional Probability

Example 2.3.3

Test for a Disease. Let us return to the example with which we began this section. We have just received word that we have tested positive for a disease. The test was 90 percent reliable in the sense that we described in Example 2.3.1. We want to know the probability that we have the disease after we learn that the result of the test is positive. Some readers may feel that this probability should be about 0.9. However, this feeling completely ignores the small probability of 0.0001 that you had the disease before taking the test. We shall let B1 denote the event that you have the disease, and let B2 denote the event that you do not have the disease. The events B1 and B2 form a partition. Also, let A denote the event that the response to the test is positive. The event A is information we will learn that tells us something about the partition elements. Then, by Bayes’ theorem, Pr(B1|A) = =

Pr(A|B1) Pr(B1) Pr(A|B1) Pr(B1) + Pr(A|B2 ) Pr(B2 ) (0.9)(0.0001) = 0.00090. (0.9)(0.0001) + (0.1)(0.9999)

Thus, the conditional probability that you have the disease given the test result is approximately only 1 in 1000. Of course, this conditional probability is approximately 9 times as great as the probability was before you were tested, but even the conditional probability is quite small. Another way to explain this result is as follows: Only one person in every 10,000 actually has the disease, but the test gives a positive response for approximately one person in every 10. Hence, the number of positive responses is approximately 1000 times the number of persons who actually have the disease. In other words, out of every 1000 persons for whom the test gives a positive response, only one person actually has the disease. This example illustrates not only the use of Bayes’ theorem but also the importance of taking into account all of the information available in a problem. Example 2.3.4

Identifying the Source of a Defective Item. Three different machines M1, M2, and M3 were used for producing a large batch of similar manufactured items. Suppose that 20 percent of the items were produced by machine M1, 30 percent by machine M2 , and 50 percent by machine M3. Suppose further that 1 percent of the items produced by machine M1 are defective, that 2 percent of the items produced by machine M2 are defective, and that 3 percent of the items produced by machine M3 are defective. Finally, suppose that one item is selected at random from the entire batch and it is found to be defective. We shall determine the probability that this item was produced by machine M2 . Let Bi be the event that the selected item was produced by machine Mi (i = 1, 2, 3), and let A be the event that the selected item is defective. We must evaluate the conditional probability Pr(B2 |A). The probability Pr(Bi ) that an item selected at random from the entire batch was produced by machine Mi is as follows, for i = 1, 2, 3: Pr(B1) = 0.2,

Pr(B2 ) = 0.3,

Pr(B3) = 0.5.

Furthermore, the probability Pr(A|Bi ) that an item produced by machine Mi will be defective is Pr(A|B1) = 0.01,

Pr(A|B2 ) = 0.02,

It now follows from Bayes’ theorem that

Pr(A|B3) = 0.03.

2.3 Bayes’ Theorem

79

Pr(B2 ) Pr(A|B2 ) Pr(B2 |A) = 3 j =1 Pr(Bj ) Pr(A|Bj ) = Example 2.3.5

(0.3)(0.02) = 0.26. (0.2)(0.01) + (0.3)(0.02) + (0.5)(0.03)

Identifying Genotypes. Consider a gene that has two alleles (see Example 1.6.4 on page 23) A and a. Suppose that the gene exhibits itself through a trait (such as hair color or blood type) with two versions. We call A dominant and a recessive if individuals with genotypes AA and Aa have the same version of the trait and the individuals with genotype aa have the other version. The two versions of the trait are called phenotypes. We shall call the phenotype exhibited by individuals with genotypes AA and Aa the dominant trait, and the other trait will be called the recessive trait. In population genetics studies, it is common to have information on the phenotypes of individuals, but it is rather difﬁcult to determine genotypes. However, some information about genotypes can be obtained by observing phenotypes of parents and children. Assume that the allele A is dominant, that individuals mate independently of genotype, and that the genotypes AA, Aa, and aa occur in the population with probabilities 1/4, 1/2, and 1/4, respectively. We are going to observe an individual whose parents are not available, and we shall observe the phenotype of this individual. Let E be the event that the observed individual has the dominant trait. We would like to revise our opinion of the possible genotypes of the parents. There are six possible genotype combinations, B1, . . . , B6, for the parents prior to making any observations, and these are listed in Table 2.2. The probabilities of the Bi were computed using the assumption that the parents mated independently of genotype. For example, B3 occurs if the father is AA and the mother is aa (probability 1/16) or if the father is aa and the mother is AA (probability 1/16). The values of Pr(E|Bi ) were computed assuming that the two available alleles are passed from parents to children with probability 1/2 each and independently for the two parents. For example, given B4, the event E occurs if and only if the child does not get two a’s. The probability of getting a from both parents given B4 is 1/4, so Pr(E|B4) = 3/4. Now we shall compute Pr(B1|E) and Pr(B5|E). We leave the other calculations to the reader. The denominator of Bayes’ theorem is the same for both calculations, namely, Pr(E) =

5

Pr(Bi ) Pr(E|Bi )

i=1

=

1 1 1 3 1 1 1 3 1 ×1+ ×1+ ×1+ × + × + ×0= . 16 4 8 4 4 4 2 16 4

Table 2.2 Parental genotypes for Example 2.3.5 (AA, AA) (AA, Aa) (AA, aa) (Aa, Aa) (Aa, aa) (aa, aa) Name of event Probability of Bi Pr(E|Bi )

B1

B2

B3

B4

B5

B6

1/16

1/4

1/8

1/4

1/4

1/16

1

1

1

3/4

1/2

0

80

Chapter 2 Conditional Probability

Applying Bayes’ theorem, we get Pr(B1|E) =

1 16

×1 3 4

=

1 , 12

Pr(B5|E) =

1 4

× 3 4

1 2

1 = . 6

Note: Conditional Version of Bayes’ Theorem. There is also a version of Bayes’ theorem conditional on an event C: Pr(Bi |C) Pr(A|Bi ∩ C) Pr(Bi |A ∩ C) = k . (2.3.2) Pr(B |C) Pr(A|B ∩ C) j j j =1

Prior and Posterior Probabilities In Example 2.3.4, a probability like Pr(B2 ) is often called the prior probability that the selected item will have been produced by machine M2 , because Pr(B2 ) is the probability of this event before the item is selected and before it is known whether the selected item is defective or nondefective. A probability like Pr(B2 |A) is then called the posterior probability that the selected item was produced by machine M2 , because it is the probability of this event after it is known that the selected item is defective. Thus, in Example 2.3.4, the prior probability that the selected item will have been produced by machine M2 is 0.3. After an item has been selected and has been found to be defective, the posterior probability that the item was produced by machine M2 is 0.26. Since this posterior probability is smaller than the prior probability that the item was produced by machine M2 , the posterior probability that the item was produced by one of the other machines must be larger than the prior probability that it was produced by one of those machines (see Exercises 1 and 2 at the end of this section).

Computation of Posterior Probabilities in More Than One Stage Suppose that a box contains one fair coin and one coin with a head on each side. Suppose also that one coin is selected at random and that when it is tossed, a head is obtained. We shall determine the probability that the coin is the fair coin. Let B1 be the event that the coin is fair, let B2 be the event that the coin has two heads, and let H1 be the event that a head is obtained when the coin is tossed. Then, by Bayes’ theorem, Pr(B1|H1) = =

Pr(B1) Pr(H1|B1) Pr(B1) Pr(H1|B1) + Pr(B2 ) Pr(H1|B2 ) 1 (1/2)(1/2) = . (1/2)(1/2) + (1/2)(1) 3

(2.3.3)

Thus, after the ﬁrst toss, the posterior probability that the coin is fair is 1/3. Now suppose that the same coin is tossed again and we assume that the two tosses are conditionally independent given both B1 and B2 . Suppose that another head is obtained. There are two ways of determining the new value of the posterior probability that the coin is fair. The ﬁrst way is to return to the beginning of the experiment and assume again that the prior probabilities are Pr(B1) = Pr(B2 ) = 1/2. We shall let H1 ∩ H2 denote the event in which heads are obtained on two tosses of the coin, and we shall calculate the posterior probability Pr(B1|H1 ∩ H2 ) that the coin is fair after we have observed the

2.3 Bayes’ Theorem

81

event H1 ∩ H2 . The assumption that the tosses are conditionally independent given B1 means that Pr(H1 ∩ H2 |B1) = 1/2 × 1/2 = 1/4. By Bayes’ theorem, Pr(B1|H1 ∩ H2 ) = =

Pr(B1) Pr(H1 ∩ H2 |B1) Pr(B1) Pr(H1 ∩ H2 |B1) + Pr(B2 ) Pr(H1 ∩ H2 |B2 ) 1 (1/2)(1/4) = . (1/2)(1/4) + (1/2)(1) 5

(2.3.4)

The second way of determining this same posterior probability is to use the conditional version of Bayes’ theorem (2.3.2) given the event H1. Given H1, the conditional probability of B1 is 1/3, and the conditional probability of B2 is therefore 2/3. These conditional probabilities can now serve as the prior probabilities for the next stage of the experiment, in which the coin is tossed a second time. Thus, we can apply (2.3.2) with C = H1, Pr(B1|H1) = 1/3, and Pr(B2 |H1) = 2/3. We can then compute the posterior probability Pr(B1|H1 ∩ H2 ) that the coin is fair after we have observed a head on the second toss and a head on the ﬁrst toss. We shall need Pr(H2 |B1 ∩ H1), which equals Pr(H2 |B1) = 1/2 by Theorem 2.2.4 since H1 and H2 are conditionally independent given B1. Since the coin is two-headed when B2 occurs, Pr(H2 |B2 ∩ H1) = 1. So we obtain Pr(B1|H1 ∩ H2 ) = =

Pr(B1|H1) Pr(H2 |B1 ∩ H1) Pr(B1|H1) Pr(H2 |B1 ∩ H1) + Pr(B2 |H1) Pr(H2 |B2 ∩ H1) 1 (1/3)(1/2) = . (1/3)(1/2) + (2/3)(1) 5

(2.3.5)

The posterior probability of the event B1 obtained in the second way is the same as that obtained in the ﬁrst way. We can make the following general statement: If an experiment is carried out in more than one stage, then the posterior probability of every event can also be calculated in more than one stage. After each stage has been carried out, the posterior probability calculated for the event after that stage serves as the prior probability for the next stage. The reader should look back at (2.3.2) to see that this interpretation is precisely what the conditional version of Bayes’ theorem says. The example we have been doing with coin tossing is typical of many applications of Bayes’ theorem and its conditional version because we are assuming that the observable events are conditionally independent given each element of the partition B1, . . . , Bk (in this case, k = 2). The conditional independence makes the probability of Hi (head on ith toss) given B1 (or given B2 ) the same whether or not we also condition on earlier tosses (see Theorem 2.2.4).

Conditionally Independent Events The calculations that led to (2.3.3) and (2.3.5) together with Example 2.2.10 illustrate simple cases of a very powerful statistical model for observable events. It is very common to encounter a sequence of events that we believe are similar in that they all have the same probability of occurring. It is also common that the order in which the events are labeled does not affect the probabilities that we assign. However, we often believe that these events are not independent, because, if we were to observe some of them, we would change our minds about the probability of the ones we had not observed depending on how many of the observed events occur. For example, in the coin-tossing calculation leading up to Eq. (2.3.3), before any tosses occur, the probability of H2 is the same as the probability of H1, namely, the

82

Chapter 2 Conditional Probability

denominator of (2.3.3), 3/4, as Theorem 2.1.4 says. However, after observing that the event H1 occurs, the probability of H2 is Pr(H2 |H1), which is the denominator of (2.3.5), 5/6, as computed by the conditional version of the law of total probability (2.1.5). Even though we might treat the coin tosses as independent conditional on the coin being fair, and we might treat them as independent conditional on the coin being two-headed (in which case we know what will happen every time anyway), we cannot treat them as independent without the conditioning information. The conditioning information removes an important source of uncertainty from the problem, so we partition the sample space accordingly. Now we can use the conditional independence of the tosses to calculate joint probabilities of various combinations of events conditionally on the partition events. Finally, we can combine these probabilities using Theorem 2.1.4 and (2.1.5). Two more examples will help to illustrate these ideas. Example 2.3.6

Learning about a Proportion. In Example 2.2.10 on page 72, a machine produced defective parts in one of two proportions, p = 0.01 or p = 0.4. Suppose that the prior probability that p = 0.01 is 0.9. After sampling six parts at random, suppose that we observe two defectives. What is the posterior probability that p = 0.01? Let B1 = {p = 0.01} and B2 = {p = 0.4} as in Example 2.2.10. Let A be the event that two defectives occur in a random sample of size six. The prior probability of B1 is 0.9, and the prior probability of B2 is 0.1. We already computed Pr(A|B1) = 1.44 × 10−3 and Pr(A|B2 ) = 0.311 in Example 2.2.10. Bayes’ theorem tells us that Pr(B1|A) =

0.9 × 1.44 × 10−3 = 0.04. 0.9 × 1.44 × 10−3 + 0.1 × 0.311

Even though we thought originally that B1 had probability as high as 0.9, after we learned that there were two defective items in a sample as small as six, we changed our minds dramatically and now we believe that B1 has probability as small as 0.04. The reason for this major change is that the event A that occurred has much higher probability if B2 is true than if B1 is true. Example 2.3.7

A Clinical Trial. Consider the same clinical trial described in Examples 2.1.12 and 2.1.13. Let Ei be the event that the ith patient has success as her outcome. Recall that Bj is the event that p = (j − 1)/10 for j = 1, . . . , 11, where p is the proportion of successes among all possible patients. If we knew which Bj occurred, we would say that E1, E2 , . . . were independent. That is, we are willing to model the patients as conditionally independent given each event Bj , and we set Pr(Ei |Bj ) = (j − 1)/10 for all i, j . We shall still assume that Pr(Bj ) = 1/11 for all j prior to the start of the trial. We are now in position to express what we learn about p by computing posterior probabilities for the Bj events after each patient ﬁnishes the trial. For example, consider the ﬁrst patient. We calculated Pr(E1) = 1/2 in (2.1.6). If E1 occurs, we apply Bayes’ theorem to get Pr(Bj |E1) =

Pr(E1|Bj ) Pr(Bj ) 1/2

=

2(j − 1) j − 1 = . 10 × 11 55

(2.3.6)

After observing one success, the posterior probabilities of large values of p are higher than their prior probabilities and the posterior probabilities of low values of p are lower than their prior probabilities as we would expect. For example, Pr(B1|E1) = 0, because p = 0 is ruled out after one success. Also, Pr(B2 |E1) = 0.0182, which is much smaller than its prior value 0.0909, and Pr(B11|E1) = 0.1818, which is larger than its prior value 0.0909.

2.3 Bayes’ Theorem

83

0.5 0.4 0.3 0.2 0.1 0

B1

B2

B3

B4

B5

B6

B7

B8

B9 B10 B11

Figure 2.3 The posterior probabilities of partition elements after 40 patients in Example 2.3.7.

We could check how the posterior probabilities behave after each patient is observed. However, we shall skip ahead to the point at which all 40 patients in the imipramine column of Table 2.1 have been observed. Let A stand for the observed event that 22 of them are successes and 18 are failures. We can use the same reasoning 40 possible sequences of 40 as in Example 2.2.5 to compute Pr(A|Bj ). There are 22 patients with 22 successes, and, conditional on Bj , the probability of each sequence is ([j − 1]/10)22 (1 − [j − 1]/10)18. So,

40 ([j − 1]/10)22 (1 − [j − 1]/10)18, Pr(A|Bj ) = (2.3.7) 22 for each j . Then Bayes’ theorem tells us that 1 40 22 18 11 22 ([j − 1]/10) (1 − [j − 1]/10) . Pr(Bj |A) = 11 1 40 22 18 i=1 11 22 ([i − 1]/10) (1 − [i − 1]/10) Figure 2.3 shows the posterior probabilities of the 11 partition elements after observing A. Notice that the probabilities of B6 and B7 are the highest, 0.42. This corresponds to the fact that the proportion of successes in the observed sample is 22/40 = 0.55, halfway between (6 − 1)/10 and (7 − 1)/10. We can also compute the probability that the next patient will be a success both before the trial and after the 40 patients. Before the trial, Pr(E41) = Pr(E1), which equals 1/2, as computed in (2.1.6). After observing the 40 patients, we can compute Pr(E41|A) using the conditional version of the law of total probability, (2.1.5): Pr(E41|A) =

11

Pr(E41|Bj ∩ A) Pr(Bj |A).

(2.3.8)

j =1

Using the values of Pr(Bj |A) in Fig. 2.3 and the fact that Pr(E41|Bj ∩ A) = Pr(E41|Bj ) = (j − 1)/10 (conditional independence of the Ei given the Bj ), we compute (2.3.8) to be 0.5476. This is also very close to the observed frequency of success. The calculation at the end of Example 2.3.7 is typical of what happens after observing many conditionally independent events with the same conditional probability of occurrence. The conditional probability of the next event given those that were observed tends to be close to the observed frequency of occurrence among the observed events. Indeed, when there is substantial data, the choice of prior probabilities becomes far less important.

84

Chapter 2 Conditional Probability

0.5 X

0.4

X

0.3 0.2 0.1

X X B1

0

X B2

X B3

X B4

B5

X B6

B7

B8

X X X B9 B10 B11

Figure 2.4 The posterior probabilities of partition elements after 40 patients in Example 2.3.8. The X characters mark the values of the posterior probabilities calculated in Example 2.3.7.

Example 2.3.8

The Effect of Prior Probabilities. Consider the same clinical trial as in Example 2.3.7. This time, suppose that a different researcher has a different prior opinion about the value of p, the probability of success. This researcher believes the following prior probabilities: Event

B1

B2

B3

B4

B5

B6

B7

B8

B9

B10

B11

p

0.0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1.0

Prior prob.

0.00

0.19

0.19

0.17

0.14

0.11

0.09

0.06

0.04

0.01

0.00

We can recalculate the posterior probabilities using Bayes’ theorem, and we get the values pictured in Fig. 2.4. To aid comparison, the posterior probabilities from Example 2.3.7 are also plotted in Fig. 2.4 using the symbol X. One can see how close the two sets of posterior probabilities are despite the large differences between the prior probabilities. If there had been fewer patients observed, there would have been larger differences between the two sets of posterior probabilites because the observed events would have provided less information. (See Exercise 12 in this section.)

Summary Bayes’ theorem tells us how to compute the conditional probability of each event in a partition given an observed event A. A major use of partitions is to divide the sample space into small enough pieces so that a collection of events of interest become conditionally independent given each event in the partition.

Exercises 1. Suppose that k events B1, . . . , Bk form a partition of the sample space S. For i = 1, . . . , k, let Pr(Bi ) denote the prior probability of Bi . Also, for each event A such that Pr(A) > 0, let Pr(Bi |A) denote the posterior probability

of Bi given that the event A has occurred. Prove that if Pr(B1|A) < Pr(B1), then Pr(Bi |A) > Pr(Bi ) for at least one value of i (i = 2, . . . , k).

2.3 Bayes’ Theorem

2. Consider again the conditions of Example 2.3.4 in this section, in which an item was selected at random from a batch of manufactured items and was found to be defective. For which values of i (i = 1, 2, 3) is the posterior probability that the item was produced by machine Mi larger than the prior probability that the item was produced by machine Mi ? 3. Suppose that in Example 2.3.4 in this section, the item selected at random from the entire lot is found to be nondefective. Determine the posterior probability that it was produced by machine M2 . 4. A new test has been devised for detecting a particular type of cancer. If the test is applied to a person who has this type of cancer, the probability that the person will have a positive reaction is 0.95 and the probability that the person will have a negative reaction is 0.05. If the test is applied to a person who does not have this type of cancer, the probability that the person will have a positive reaction is 0.05 and the probability that the person will have a negative reaction is 0.95. Suppose that in the general population, one person out of every 100,000 people has this type of cancer. If a person selected at random has a positive reaction to the test, what is the probability that he has this type of cancer? 5. In a certain city, 30 percent of the people are Conservatives, 50 percent are Liberals, and 20 percent are Independents. Records show that in a particular election, 65 percent of the Conservatives voted, 82 percent of the Liberals voted, and 50 percent of the Independents voted. If a person in the city is selected at random and it is learned that she did not vote in the last election, what is the probability that she is a Liberal? 6. Suppose that when a machine is adjusted properly, 50 percent of the items produced by it are of high quality and the other 50 percent are of medium quality. Suppose, however, that the machine is improperly adjusted during 10 percent of the time and that, under these conditions, 25 percent of the items produced by it are of high quality and 75 percent are of medium quality. a. Suppose that ﬁve items produced by the machine at a certain time are selected at random and inspected. If four of these items are of high quality and one item is of medium quality, what is the probability that the machine was adjusted properly at that time? b. Suppose that one additional item, which was produced by the machine at the same time as the other ﬁve items, is selected and found to be of medium quality. What is the new posterior probability that the machine was adjusted properly? 7. Suppose that a box contains ﬁve coins and that for each coin there is a different probability that a head will be obtained when the coin is tossed. Let pi denote the probability of a head when the ith coin is tossed (i =

85

1, . . . , 5), and suppose that p1 = 0, p2 = 1/4, p3 = 1/2, p4 = 3/4, and p5 = 1. a. Suppose that one coin is selected at random from the box and when it is tossed once, a head is obtained. What is the posterior probability that the ith coin was selected (i = 1, . . . , 5)? b. If the same coin were tossed again, what would be the probability of obtaining another head? c. If a tail had been obtained on the ﬁrst toss of the selected coin and the same coin were tossed again, what would be the probability of obtaining a head on the second toss? 8. Consider again the box containing the ﬁve different coins described in Exercise 7. Suppose that one coin is selected at random from the box and is tossed repeatedly until a head is obtained. a. If the ﬁrst head is obtained on the fourth toss, what is the posterior probability that the ith coin was selected (i = 1, . . . , 5)? b. If we continue to toss the same coin until another head is obtained, what is the probability that exactly three additional tosses will be required? 9. Consider again the conditions of Exercise 14 in Sec. 2.1. Suppose that several parts will be observed and that the different parts are conditionally independent given each of the three states of repair of the machine. If seven parts are observed and exactly one is defective, compute the posterior probabilities of the three states of repair. 10. Consider again the conditions of Example 2.3.5, in which the phenotype of an individual was observed and found to be the dominant trait. For which values of i (i = 1, . . . , 6) is the posterior probability that the parents have the genotypes of event Bi smaller than the prior probability that the parents have the genotyes of event Bi ? 11. Suppose that in Example 2.3.5 the observed individual has the recessive trait. Determine the posterior probability that the parents have the genotypes of event B4. 12. In the clinical trial in Examples 2.3.7 and 2.3.8, suppose that we have only observed the ﬁrst ﬁve patients and three of the ﬁve had been successes. Use the two different sets of prior probabilities from Examples 2.3.7 and 2.3.8 to calculate two sets of posterior probabilities. Are these two sets of posterior probabilities as close to each other as were the two in Examples 2.3.7 and 2.3.8? Why or why not? 13. Suppose that a box contains one fair coin and one coin with a head on each side. Suppose that a coin is drawn at random from this box and that we begin to ﬂip the coin. In Eqs. (2.3.4) and (2.3.5), we computed the conditional

86

Chapter 2 Conditional Probability

probability that the coin was fair given that the ﬁrst two ﬂips both produce heads. a. Suppose that the coin is ﬂipped a third time and another head is obtained. Compute the probability that the coin is fair given that all three ﬂips produced heads. b. Suppose that the coin is ﬂipped a fourth time and the result is tails. Compute the posterior probability that the coin is fair. 14. Consider again the conditions of Exercise 23 in Sec. 2.2. Assume that Pr(B) = 0.4. Let A be the event that exactly 8 out of 11 programs compiled. Compute the conditional probability of B given A.

independent with probability 0.01 of being defective. However, it is possible for the machine to develop a “memory” in the following sense: After each defective item, and independent of anything that happened earlier, the probability that the next item is defective is 2/5. After each nondefective item, and independent of anything that happened earlier, the probability that the next item is defective is 1/165. Assume that the machine is either operating normally for the whole time we observe or has a memory for the whole time that we observe. Let B be the event that the machine is operating normally, and assume that Pr(B) = 2/3. Let Di be the event that the ith item inspected is defective. Assume that D1 is independent of B.

15. Use the prior probabilities in Example 2.3.8 for the events B1, . . . , B11. Let E1 be the event that the ﬁrst patient is a success. Compute the probability of E1 and explain why it is so much less than the value computed in Example 2.3.7. 16. Consider a machine that produces items in sequence. Under normal operating conditions, the items are

a. Prove that Pr(Di ) = 0.01 for all i. Hint: Use induction. b. Assume that we observe the ﬁrst six items and the event that occurs is E = D1c ∩ D2c ∩ D3 ∩ D4 ∩ D5c ∩ D6c . That is, the third and fourth items are defective, but the other four are not. Compute Pr(B|D).

2.4 The Gambler’s Ruin Problem Consider two gamblers with ﬁnite resources who repeatedly play the same game against each other. Using the tools of conditional probability, we can calculate the probability that each of the gamblers will eventually lose all of his money to the opponent.

Statement of the Problem Suppose that two gamblers A and B are playing a game against each other. Let p be a given number (0 < p < 1), and suppose that on each play of the game, the probability that gambler A will win one dollar from gambler B is p and the probability that gambler B will win one dollar from gambler A is 1 − p. Suppose also that the initial fortune of gambler A is i dollars and the initial fortune of gambler B is k − i dollars, where i and k − i are given positive integers. Thus, the total fortune of the two gamblers is k dollars. Finally, suppose that the gamblers play the game repeatedly and independently until the fortune of one of them has been reduced to 0 dollars. Another way to think about this problem is that B is a casino and A is a gambler who is determined to quit as soon he wins k − i dollars from the casino or when he goes broke, whichever comes ﬁrst. We shall now consider this game from the point of view of gambler A. His initial fortune is i dollars and on each play of the game his fortune will either increase by one dollar with a probability of p or decrease by one dollar with a probability of 1 − p. If p > 1/2, the game is favorable to him; if p < 1/2, the game is unfavorable to him; and if p = 1/2, the game is equally favorable to both gamblers. The game ends either when the fortune of gambler A reaches k dollars, in which case gambler B will have no money left, or when the fortune of gambler A reaches 0 dollars. The problem is to

2.4 The Gambler’s Ruin Problem

87

determine the probability that the fortune of gambler A will reach k dollars before it reaches 0 dollars. Because one of the gamblers will have no money left at the end of the game, this problem is called the Gambler’s Ruin problem.

Solution of the Problem We shall continue to assume that the total fortune of the gamblers A and B is k dollars, and we shall let ai denote the probability that the fortune of gambler A will reach k dollars before it reaches 0 dollars, given that his initial fortune is i dollars. We assume that the game is the same each time it is played and the plays are independent of each other. It follows that, after each play, the Gambler’s Ruin problem essentially starts over with the only change being that the initial fortunes of the two gamblers have changed. In particular, for each j = 0, . . . , k, each time that we observe a sequence of plays that lead to gambler A’s fortune being j dollars, the conditional probability, given such a sequence, that gambler A wins is aj . If gambler A’s fortune ever reaches 0, then gambler A is ruined, hence a0 = 0. Similarly, if his fortune ever reaches k, then gambler A has won, hence ak = 1. We shall now determine the value of ai for i = 1, . . . , k − 1. Let A1 denote the event that gambler A wins one dollar on the ﬁrst play of the game, let B1 denote the event that gambler A loses one dollar on the ﬁrst play of the game, and let W denote the event that the fortune of gambler A ultimately reaches k dollars before it reaches 0 dollars. Then Pr(W ) = Pr(A1) Pr(W |A1) + Pr(B1) Pr(W |B1) = pPr(W |A1) + (1 − p)Pr(W |B1).

(2.4.1)

Since the initial fortune of gambler A is i dollars (i = 1, . . . , k − 1), then Pr(W ) = ai . Furthermore, if gambler A wins one dollar on the ﬁrst play of the game, then his fortune becomes i + 1 dollars and the conditional probability Pr(W |A1) that his fortune will ultimately reach k dollars is therefore ai+1. If A loses one dollar on the ﬁrst play of the game, then his fortune becomes i − 1 dollars and the conditional probability Pr(W |B1) that his fortune will ultimately reach k dollars is therefore ai−1. Hence, by Eq. (2.4.1), ai = pai+1 + (1 − p)ai−1.

(2.4.2)

We shall let i = 1, . . . , k − 1 in Eq. (2.4.2). Then, since a0 = 0 and ak = 1, we obtain the following k − 1 equations: a1 =pa2 , a2 =pa3 + (1 − p)a1, a3 =pa4 + (1 − p)a2 , .. . ak−2 =pak−1 + (1 − p)ak−3,

(2.4.3)

ak−1 =p + (1 − p)ak−2 . If the value of ai on the left side of the ith equation is rewritten in the form pai + (1 − p)ai and some elementary algebra is performed, then these k − 1 equations can

88

Chapter 2 Conditional Probability

be rewritten as follows: a2 − a 1 =

1−p a1, p

1−p a3 − a 2 = (a2 − a1) p 1−p a4 − a3 = (a3 − a2 ) p .. .

=

=

1−p (ak−2 − ak−3) = ak−1 − ak−2 = p 1−p 1 − ak−1 = (ak−1 − ak−2 ) = p

1−p p 1−p p 1−p p 1−p p

2 a1, 3 a1,

(2.4.4)

k−2 a1, k−1 a1.

By equating the sum of the left sides of these k − 1 equations with the sum of the right sides, we obtain the relation i k−1 1−p . (2.4.5) 1 − a1 = a1 p i=1

Solution for a Fair Game Suppose ﬁrst that p = 1/2. Then (1 − p)/p = 1, and it

follows from Eq. (2.4.5) that 1 − a1 = (k − 1)a1, from which a1 = 1/k. In turn, it follows from the ﬁrst equation in (2.4.4) that a2 = 2/k, it follows from the second equation in (2.4.4) that a3 = 3/k, and so on. In this way, we obtain the following complete solution when p = 1/2: ai =

Example 2.4.1

i k

for i = 1, . . . , k − 1.

(2.4.6)

The Probability of Winning in a Fair Game. Suppose that p = 1/2, in which case the game is equally favorable to both gamblers; and suppose that the initial fortune of gambler A is 98 dollars and the initial fortune of gambler B is just two dollars. In this example, i = 98 and k = 100. Therefore, it follows from Eq. (2.4.6) that there is a probability of 0.98 that gambler A will win two dollars from gambler B before gambler B wins 98 dollars from gambler A.

Solution for an Unfair Game Suppose now that p = 1/2. Then Eq. (2.4.5) can be rewritten in the form

1 − a 1 = a1

Hence,

k 1−p 1−p − p p

. 1−p − 1 p

1−p −1 p . a1 = k 1−p −1 p

(2.4.7)

(2.4.8)

2.4 The Gambler’s Ruin Problem

89

Each of the other values of ai for i = 2, . . . , k − 1 can now be determined in turn from the equations in (2.4.4). In this way, we obtain the following complete solution: i

1−p −1 p ai = for i = 1, . . . , k − 1. (2.4.9) k 1−p −1 p Example 2.4.2

The Probability of Winning in an Unfavorable Game. Suppose that p = 0.4, in which case the probability that gambler A will win one dollar on any given play is smaller than the probability that he will lose one dollar. Suppose also that the initial fortune of gambler A is 99 dollars and the initial fortune of gambler B is just one dollar. We shall determine the probability that gambler A will win one dollar from gambler B before gambler B wins 99 dollars from gambler A. In this example, the required probability ai is given by Eq. (2.4.9), in which (1 − p)/p = 3/2, i = 99, and k = 100. Therefore, 99 3 −1 2 1 2 = . ≈ ai = 100 3/2 3 3 −1 2 Hence, although the probability that gambler A will win one dollar on any given play is only 0.4, the probability that he will win one dollar before he loses 99 dollars is approximately 2/3.

Summary We considered a gambler and an opponent who each start with ﬁnite amounts of money. The two then play a sequence of games against each other until one of them runs out of money. We were able to calculate the probability that each of them would be the ﬁrst to run out as a function of the probability of winning the game and of how much money each has at the start.

Exercises 1. Consider the unfavorable game in Example 2.4.2. This time, suppose that the initial fortune of gambler A is i dollars with i ≤ 98. Suppose that the initial fortune of gambler B is 100 − i dollars. Show that the probability is greater than 1/2 that gambler A losses i dollars before winning 100 − i dollars. 2. Consider the following three different possible conditions in the gambler’s ruin problem: a. The initial fortune of gambler A is two dollars, and the initial fortune of gambler B is one dollar. b. The initial fortune of gambler A is 20 dollars, and the initial fortune of gambler B is 10 dollars. c. The initial fortune of gambler A is 200 dollars, and the initial fortune of gambler B is 100 dollars.

Suppose that p = 1/2. For which of these three conditions is there the greatest probability that gambler A will win the initial fortune of gambler B before he loses his own initial fortune? 3. Consider again the three different conditions (a), (b), and (c) given in Exercise 2, but suppose now that p < 1/2. For which of these three conditions is there the greatest probability that gambler A will win the initial fortune of gambler B before he loses his own initial fortune? 4. Consider again the three different conditions (a), (b), and (c) given in Exercise 2, but suppose now that p > 1/2. For which of these three conditions is there the greatest probability that gambler A will win the initial fortune of gambler B before he loses his own initial fortune?

90

Chapter 2 Conditional Probability

5. Suppose that on each play of a certain game, a person is equally likely to win one dollar or lose one dollar. Suppose also that the person’s goal is to win two dollars by playing this game. How large an initial fortune must the person have in order for the probability to be at least 0.99 that she will achieve her goal before she loses her initial fortune? 6. Suppose that on each play of a certain game, a person will either win one dollar with probability 2/3 or lose one dollar with probability 1/3. Suppose also that the person’s goal is to win two dollars by playing this game. How large an initial fortune must the person have in order for the probability to be at least 0.99 that he will achieve his goal before he loses his initial fortune? 7. Suppose that on each play of a certain game, a person will either win one dollar with probability 1/3 or lose one dollar with probability 2/3. Suppose also that the person’s goal is to win two dollars by playing this game. Show that no matter how large the person’s initial fortune might be,

the probability that she will achieve her goal before she loses her initial fortune is less than 1/4. 8. Suppose that the probability of a head on any toss of a certain coin is p (0 < p < 1), and suppose that the coin is tossed repeatedly. Let Xn denote the total number of heads that have been obtained on the ﬁrst n tosses, and let Yn = n − Xn denote the total number of tails on the ﬁrst n tosses. Suppose that the tosses are stopped as soon as a number n is reached such that either Xn = Yn + 3 or Yn = Xn + 3. Determine the probability that Xn = Yn + 3 when the tosses are stopped. 9. Suppose that a certain box A contains ﬁve balls and another box B contains 10 balls. One of these two boxes is selected at random, and one ball from the selected box is transferred to the other box. If this process of selecting a box at random and transferring one ball from that box to the other box is repeated indeﬁnitely, what is the probability that box A will become empty before box B becomes empty?

2.5 Supplementary Exercises 1. Suppose that A, B, and D are any three events such that Pr(A|D) ≥ Pr(B|D) and Pr(A|D c ) ≥ Pr(B|D c ). Prove that Pr(A) ≥ Pr(B).

C are independent. Suppose also that 4Pr(A) = 2Pr(B) = Pr(C) > 0 and Pr(A ∪ B ∪ C) = 5Pr(A). Determine the value of Pr(A).

2. Suppose that a fair coin is tossed repeatedly and independently until both a head and a tail have appeared at least once. (a) Describe the sample space of this experiment. (b) What is the probability that exactly three tosses will be required?

10. Suppose that each of two dice is loaded so that when either die is rolled, the probability that the number k will appear is 0.1 for k = 1, 2, 5, or 6 and is 0.3 for k = 3 or 4. If the two loaded dice are rolled independently, what is the probability that the sum of the two numbers that appear will be 7?

3. Suppose that A and B are events such that Pr(A) = 1/3, Pr(B) = 1/5, and Pr(A|B) + Pr(B|A) = 2/3. Evaluate Pr(Ac ∪ B c ). 4. Suppose that A and B are independent events such that Pr(A) = 1/3 and Pr(B) > 0. What is the value of Pr(A ∪ B c |B)? 5. Suppose that in 10 rolls of a balanced die, the number 6 appeared exactly three times. What is the probability that the ﬁrst three rolls each yielded the number 6? 6. Suppose that A, B, and D are events such that A and B are independent, Pr(A ∩ B ∩ D) = 0.04, Pr(D|A ∩ B) = 0.25, and Pr(B) = 4 Pr(A). Evaluate Pr(A ∪ B). 7. Suppose that the events A, B, and C are mutually independent. Under what conditions are Ac , B c , and C c mutually independent? 8. Suppose that the events A and B are disjoint and that each has positive probability. Are A and B independent? 9. Suppose that A, B, and C are three events such that A and B are disjoint, A and C are independent, and B and

11. Suppose that there is a probability of 1/50 that you will win a certain game. If you play the game 50 times, independently, what is the probability that you will win at least once? 12. Suppose that a balanced die is rolled three times, and let Xi denote the number that appears on the ith roll (i = 1, 2, 3). Evaluate Pr(X1 > X2 > X3). 13. Three students A, B, and C are enrolled in the same class. Suppose that A attends class 30 percent of the time, B attends class 50 percent of the time, and C attends class 80 percent of the time. If these students attend class independently of each other, what is (a) the probability that at least one of them will be in class on a particular day and (b) the probability that exactly one of them will be in class on a particular day? 14. Consider the World Series of baseball, as described in Exercise 16 of Sec. 2.2. If there is probability p that team A will win any particular game, what is the probability

2.5 Supplementary Exercises

that it will be necessary to play seven games in order to determine the winner of the Series? 15. Suppose that three red balls and three white balls are thrown at random into three boxes and and that all throws are independent. What is the probability that each box contains one red ball and one white ball? 16. If ﬁve balls are thrown at random into n boxes, and all throws are independent, what is the probability that no box contains more than two balls? 17. Bus tickets in a certain city contain four numbers, U , V , W , and X. Each of these numbers is equally likely to be any of the 10 digits 0, 1, . . . , 9, and the four numbers are chosen independently. A bus rider is said to be lucky if U + V = W + X. What proportion of the riders are lucky? 18. A certain group has eight members. In January, three members are selected at random to serve on a committee. In February, four members are selected at random and independently of the ﬁrst selection to serve on another committee. In March, ﬁve members are selected at random and independently of the previous two selections to serve on a third committee. Determine the probability that each of the eight members serves on at least one of the three committees. 19. For the conditions of Exercise 18, determine the probability that two particular members A and B will serve together on at least one of the three committees. 20. Suppose that two players A and B take turns rolling a pair of balanced dice and that the winner is the ﬁrst player who obtains the sum of 7 on a given roll of the two dice. If A rolls ﬁrst, what is the probability that B will win? 21. Three players A, B, and C take turns tossing a fair coin. Suppose that A tosses the coin ﬁrst, B tosses second, and C tosses third; and suppose that this cycle is repeated indeﬁnitely until someone wins by being the ﬁrst player to obtain a head. Determine the probability that each of three players will win. 22. Suppose that a balanced die is rolled repeatedly until the same number appears on two successive rolls, and let X denote the number of rolls that are required. Determine the value of Pr(X = x), for x = 2, 3, . . . . 23. Suppose that 80 percent of all statisticians are shy, whereas only 15 percent of all economists are shy. Suppose also that 90 percent of the people at a large gathering are economists and the other 10 percent are statisticians. If you meet a shy person at random at the gathering, what is the probability that the person is a statistician? 24. Dreamboat cars are produced at three different factories A, B, and C. Factory A produces 20 percent of the total output of Dreamboats, B produces 50 percent, and C produces 30 percent. However, 5 percent of the cars produced at A are lemons, 2 percent of those produced

91

at B are lemons, and 10 percent of those produced at C are lemons. If you buy a Dreamboat and it turns out to be a lemon, what is the probability that it was produced at factory A? 25. Suppose that 30 percent of the bottles produced in a certain plant are defective. If a bottle is defective, the probability is 0.9 that an inspector will notice it and remove it from the ﬁlling line. If a bottle is not defective, the probability is 0.2 that the inspector will think that it is defective and remove it from the ﬁlling line. a. If a bottle is removed from the ﬁlling line, what is the probability that it is defective? b. If a customer buys a bottle that has not been removed from the ﬁlling line, what is the probability that it is defective? 26. Suppose that a fair coin is tossed until a head is obtained and that this entire experiment is then performed independently a second time. What is the probability that the second experiment requires more tosses than the ﬁrst experiment? 27. Suppose that a family has exactly n children (n ≥ 2). Assume that the probability that any child will be a girl is 1/2 and that all births are independent. Given that the family has at least one girl, determine the probability that the family has at least one boy. 28. Suppose that a fair coin is tossed independently n times. Determine the probability of obtaining exactly n − 1 heads, given (a) that at least n − 2 heads are obtained and (b) that heads are obtained on the ﬁrst n − 2 tosses. 29. Suppose that 13 cards are selected at random from a regular deck of 52 playing cards. a. If it is known that at least one ace has been selected, what is the probability that at least two aces have been selected? b. If it is known that the ace of hearts has been selected, what is the probability that at least two aces have been selected? 30. Suppose that n letters are placed at random in n envelopes, as in the matching problem of Sec. 1.10, and let qn denote the probability that no letter is placed in the correct envelope. Show that the probability that exactly one letter is placed in the correct envelope is qn−1. 31. Consider again the conditions of Exercise 30. Show that the probability that exactly two letters are placed in the correct envelopes is (1/2)qn−2 . 32. Consider again the conditions of Exercise 7 of Sec. 2.2. If exactly one of the two students A and B is in class on a given day, what is the probability that it is A? 33. Consider again the conditions of Exercise 2 of Sec. 1.10. If a family selected at random from the city

92

Chapter 2 Conditional Probability

subscribes to exactly one of the three newspapers A, B, and C, what is the probability that it is A? 34. Three prisoners A, B, and C on death row know that exactly two of them are going to be executed, but they do not know which two. Prisoner A knows that the jailer will not tell him whether or not he is going to be executed. He therefore asks the jailer to tell him the name of one prisoner other than A himself who will be executed. The jailer responds that B will be executed. Upon receiving this response, Prisoner A reasons as follows: Before he spoke to the jailer, the probability was 2/3 that he would be one of the two prisoners executed. After speaking to the jailer, he knows that either he or prisoner C will be the other one to be executed. Hence, the probability that he will be executed is now only 1/2. Thus, merely by asking the jailer his question, the prisoner reduced the probability that he would be executed from 2/3 to 1/2, because he could go through exactly this same reasoning regardless of which answer the jailer gave. Discuss what is wrong with prisoner A’s reasoning. 35. Suppose that each of two gamblers A and B has an initial fortune of 50 dollars, and that there is probability p that gambler A will win on any single play of a game against gambler B. Also, suppose either that one gambler can win one dollar from the other on each play of the game or that they can double the stakes and one can win two dollars from the other on each play of the game. Under which of these two conditions does A have the greater probability of winning the initial fortune of B before losing her own for each of the following conditions: (a) p < 1/2; (b) p > 1/2; (c) p = 1/2? 36. A sequence of n job candidates is prepared to interview for a job. We would like to hire the best candidate, but we have no information to distinguish the candidates

before we interview them. We assume that the best candidate is equally likely to be each of the n candidates in the sequence before the interviews start. After the interviews start, we are able to rank those candidates we have seen, but we have no information about where the remaining candidates rank relative to those we have seen. After each interview, it is required that either we hire the current candidate immediately and stop the interviews, or we must let the current candidate go and we never can call them back. We choose to interview as follows: We select a number 0 ≤ r < n and we interview the ﬁrst r candidates without any intention of hiring them. Starting with the next candidate r + 1, we continue interviewing until the current candidate is the best we have seen so far. We then stop and hire the current candidate. If none of the candidates from r + 1 to n is the best, we just hire candidate n. We would like to compute the probability that we hire the best candidate and we would like to choose r to make this probability as large as possible. Let A be the event that we hire the best candidate, and let Bi be the event that the best candidate is in position i in the sequence of interviews. a. Let i > r. Find the probability that the candidate who is relatively the best among the ﬁrst i interviewed appears in the ﬁrst r interviews. b. Prove that Pr(A|Bi ) = 0 for i ≤ r and Pr(A|Bi ) = r/(i − 1) for i > r. c. For ﬁxed r, let pr be the probability of A using that value of r. Prove that pr = (r/n) ni=r+1(i − 1)−1. d. Let qr = pr − pr−1 for r = 1, . . . , n − 1, and prove that qr is a strictly decreasing function of r. e. Show that a value of r that maximizes pr is the last r such that qr > 0. (Hint: Write pr = p0 + q1 + . . . + qr for r > 0.) f. For n = 10, ﬁnd the value of r that maximizes pr , and ﬁnd the corresponding pr value.

Chapter

Random Variables and Distributions 3.1 3.2 3.3 3.4 3.5 3.6

3

Random Variables and Discrete Distributions Continuous Distributions The Cumulative Distribution Function Bivariate Distributions Marginal Distributions Conditional Distributions

3.7 3.8 3.9 3.10 3.11

Multivariate Distributions Functions of a Random Variable Functions of Two or More Random Variables Markov Chains Supplementary Exercises

3.1 Random Variables and Discrete Distributions A random variable is a real-valued function deﬁned on a sample space. Random variables are the main tools used for modeling unknown quantities in statistical analyses. For each random variable X and each set C of real numbers, we could calculate the probability that X takes its value in C. The collection of all of these probabilities is the distribution of X. There are two major classes of distributions and random variables: discrete (this section) and continuous (Sec. 3.2). Discrete distributions are those that assign positive probability to at most countably many different values. A discrete distribution can be characterized by its probability function (p.f.), which speciﬁes the probability that the random variable takes each of the different possible values. A random variable with a discrete distribution will be called a discrete random variable.

Deﬁnition of a Random Variable Example 3.1.1

Tossing a Coin. Consider an experiment in which a fair coin is tossed 10 times. In this experiment, the sample space S can be regarded as the set of outcomes consisting of the 210 different sequences of 10 heads and/or tails that are possible. We might