Basic Principles of Probability
and Statistics
Lecture notes for PET 472
Spring 2012
Prepared by: Thomas W. Engler, Ph.D., P.E
Definitions
Risk Analysis
Assessing probabilities of occurrence for each possible
outcome
Risk Analysis
Probabilities and prob. distributions
Representing judgments about chance
events
Modeling
Geologic, reservoir, drilling
Operations, Economics
Decision criteria
EV, profit, IRR
Present to management for decision
Definitions
Sample Space
Complete set of outcomes
(52 cards)
Outcome
Subset of the sample space
(drawing a 5 of any suit)
Probability
Likelihood of drawing a 5
P(A) = 4/52
Definitions
Equally likely outcomes
Have same probability to occur
Mutually exclusive outcomes
The occurrence of any given outcome excludes the
occurrence of other outcomes
Independent events
The occurrence of one outcome does not influence the
occurrence of another
Conditional probability
The probability of an outcome is dependent upon one or
more events that have previously occurred.
Rules of Operation
Symbol    Definition                                     Expression
P(A)      Probability of outcome A occurring
P(A+B)    Probability of outcome A and/or B occurring    P(A+B) = P(A) + P(B) - P(AB)
P(AB)     Probability of A and B occurring               P(AB) = P(A) P(B|A)
P(A|B)    Probability of A given B has occurred
Rules of Operation Addition Theorem
P(A+B)=P(A)+P(B)-P(AB)
Example
outcome A: drawing a 4, 5, or 6 of any suit    P(A) = 12/52
outcome B: drawing a J or Q of any suit        P(B) = 8/52
P(AB) = 0 (A and B are mutually exclusive events)
P(A+B) = 12/52 + 8/52 - 0 = 20/52
[Venn diagram: A and B do not overlap]
Rules of Operation Addition Theorem
P(A+B)=P(A)+P(B)-P(AB)
Example
outcome A: drawing a 4, 5, or 6 of any suit    P(A) = 12/52
outcome B: drawing a diamond                   P(B) = 13/52
P(AB) = 3/52 (the 4, 5, and 6 of diamonds)
P(A+B) = 12/52 + 13/52 - 3/52 = 22/52
[Venn diagram: A and B overlap]
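The two addition-theorem examples can be checked by counting cards directly. The sketch below is only an illustration: the deck representation and the helper names (`prob`, `p_a_or_b`, and the event functions) are assumptions introduced here, not part of the notes.

```python
from fractions import Fraction
from itertools import product

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["clubs", "diamonds", "hearts", "spades"]
DECK = list(product(RANKS, SUITS))  # sample space: 52 (rank, suit) pairs


def prob(event):
    """Probability of an event as favorable cards / cards in the deck."""
    return Fraction(sum(1 for card in DECK if event(card)), len(DECK))


def is_4_5_6(card):
    return card[0] in {"4", "5", "6"}


def is_j_or_q(card):
    return card[0] in {"J", "Q"}


def is_diamond(card):
    return card[1] == "diamonds"


def p_a_or_b(a, b):
    """Addition theorem: P(A+B) = P(A) + P(B) - P(AB)."""
    p_ab = prob(lambda card: a(card) and b(card))
    return prob(a) + prob(b) - p_ab


p_exclusive = p_a_or_b(is_4_5_6, is_j_or_q)   # equals 20/52
p_overlap = p_a_or_b(is_4_5_6, is_diamond)    # equals 22/52
print(p_exclusive, p_overlap)
```

Fractions are printed in lowest terms, so 20/52 and 22/52 appear reduced.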
Rules of Operation Multiplication Theorem
P(AB) = P(A) P(B|A)
Example
outcome A: drawing any jack                                   P(A) = 4/52
outcome B: drawing the four of hearts on the second draw      P(B|A) = 1/51 (conditional)
P(AB) = (4/52)(1/51) = 1/663
Sampling without replacement
- observed outcome is not returned
- series of dependent events
Rules of Operation Multiplication Theorem
P(AB)=P(A)P(B)
Example
outcome A: drawing any jack, then returning it to the deck    P(A) = 4/52
outcome B: drawing the four of hearts on the second draw      P(B) = 1/52
P(AB) = (4/52)(1/52) = 1/676
Sampling with replacement
- observed outcome is returned to sample space
- series of independent events
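As a quick arithmetic check of the two multiplication-theorem examples, the sketch below evaluates both products with exact fractions. The variable names are assumptions chosen for this illustration.

```python
from fractions import Fraction

DECK_SIZE = 52
JACKS = 4

p_a = Fraction(JACKS, DECK_SIZE)  # P(A): any jack on the first draw

# Without replacement: the jack is not returned, so 51 cards remain
# and the second draw depends on the first.
p_b_given_a = Fraction(1, DECK_SIZE - 1)
p_ab_without = p_a * p_b_given_a          # P(AB) = P(A) P(B|A)

# With replacement: the jack is returned, so the draws are independent.
p_b = Fraction(1, DECK_SIZE)
p_ab_with = p_a * p_b                     # P(AB) = P(A) P(B)

print(p_ab_without, p_ab_with)  # 1/663 1/676
```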
Example
Example: Exploration example involving conditional probabilities
Decision: drill prospect or farmout and retain an override
Tabulated gross per-well reserves for existing wells
EUR, Bcf   Number of wells   Percent of wells having these reserves, P(B|A)
2          7                 35%
3          7                 35%
4          4                 20%
5          2                 10%
Total      20
These are conditional probabilities. That is, given a well is productive, there is a 35% chance of producing 2 Bcf.
NPV for alternatives
EUR, Bcf   Drill option            Farmout option
           NPV, $     EMV, $       NPV, $     EMV, $
2          40000      14000        9000       3150
3          90000      31500        12500      4375
4          130000     26000        15000      3000
5          200000     20000        18000      1800
Total                 91500                   12325
Example
Dry hole cost = 70000
Probability of finding gas, P(A) = 0.25
Apply multiplication theorem
Probability of finding gas and that reserves are 2 Bcf? P(AB) = P(A) P(B|A)
Possible outcome   P(A)    P(B|A)   P(AB)
dry hole           0.25             0.7500   (1 - P(A))
2 Bcf              0.25    0.35     0.0875
3 Bcf              0.25    0.35     0.0875
4 Bcf              0.25    0.20     0.0500
5 Bcf              0.25    0.10     0.0250
Total                               1.0000
EMV calculations
Possible               Drill well             Farmout
outcome     P(AB)      NPV, $     EMV, $      NPV, $    EMV, $
dry hole    0.7500     -70000     -52500      0         0
2 Bcf       0.0875     40000      3500        9000      788
3 Bcf       0.0875     90000      7875        12500     1094
4 Bcf       0.0500     130000     6500        15000     750
5 Bcf       0.0250     200000     5000        18000     450
Total       1.0000                -29625                3081
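The EMV totals can be reproduced by weighting each NPV by its joint probability P(AB) = P(A) P(B|A). This is a minimal sketch using the values tabulated in the example; the dictionary layout and the function name `emv` are assumptions made for illustration.

```python
P_GAS = 0.25            # P(A), probability of finding gas
DRY_HOLE_COST = 70_000  # $

# P(B|A): reserve size (Bcf) given a productive well
P_RESERVES = {2: 0.35, 3: 0.35, 4: 0.20, 5: 0.10}

NPV_DRILL = {2: 40_000, 3: 90_000, 4: 130_000, 5: 200_000}
NPV_FARMOUT = {2: 9_000, 3: 12_500, 4: 15_000, 5: 18_000}


def emv(p_success, npv_by_eur, dry_hole_npv):
    """Sum of P(AB) * NPV over the dry hole and every reserve outcome."""
    total = (1.0 - p_success) * dry_hole_npv
    for eur, p_b_given_a in P_RESERVES.items():
        total += p_success * p_b_given_a * npv_by_eur[eur]
    return total


emv_drill = emv(P_GAS, NPV_DRILL, -DRY_HOLE_COST)
emv_farmout = emv(P_GAS, NPV_FARMOUT, 0.0)
print(round(emv_drill), round(emv_farmout, 2))
```

At a 25% chance of success the farmout has the higher EMV, which matches the table.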
Example
Find minimum probability required to justify drilling, (ps)min
ps      EMV drill, $   EMV farmout, $
0       -70000         0
0.25    -29625         3081
(ps)min = 0.47
[Figure: EMV, $ (-90000 to 50000) versus ps (0 to 0.6) for the drill and farmout options]
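One way to arrive at (ps)min is to treat both EMVs as straight lines in ps and solve for the point where drilling starts to beat the farmout. The sketch below assumes that interpretation; it uses the success-case EMVs (91500 and 12325) and the dry hole cost from the example, and the function names are hypothetical.

```python
DRY_HOLE_COST = 70_000          # $
EMV_DRILL_GIVEN_GAS = 91_500    # $, drill option given a productive well
EMV_FARMOUT_GIVEN_GAS = 12_325  # $, farmout option given a productive well


def emv_drill(ps):
    """EMV of drilling as a function of the probability of success."""
    return ps * EMV_DRILL_GIVEN_GAS - (1.0 - ps) * DRY_HOLE_COST


def emv_farmout(ps):
    """EMV of the farmout; a dry hole costs nothing under this option."""
    return ps * EMV_FARMOUT_GIVEN_GAS


# Setting emv_drill(ps) == emv_farmout(ps) and solving for ps:
ps_min = DRY_HOLE_COST / (
    EMV_DRILL_GIVEN_GAS + DRY_HOLE_COST - EMV_FARMOUT_GIVEN_GAS
)
print(round(ps_min, 2))  # 0.47
```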
Probability Distributions
A graphical representation of the range and likelihoods of
possible values of a random variable
Random variable
a variable that can have more than one possible value, also known as a stochastic variable (in contrast to a deterministic one)
[Figure: probability density function; f(x), frequency, versus x, random variable]
Useful method to describe a range of possible values. Basis
for Monte Carlo Simulation.
Probability Distributions Frequency distributions
Data
Well No   Net pay, ft
1         111
2         81
3         142
4         59
5         109
6         96
7         124
8         139
9         89
10        129
11        104
12        186
13        65
14        95
15        54
16        72
17        167
18        135
19        84
20        154

Divide into intervals, or bins
Range, ft    Frequency   Percent
50 - 80      4           20%
81 - 110     7           35%
111 - 140    5           25%
141 - 170    3           15%
171 - 200    1           5%
Total        20          100%

Histogram representation of statistical data
[Figure: histogram of frequency (0 to 8) and percent (0% to 40%) versus net pay, feet, for the five ranges]
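The binning step can be sketched in a few lines. The net pay values and bin limits are taken from the slide; the variable names are assumptions for illustration.

```python
# Net pay, ft, for wells 1 through 20
net_pay = [111, 81, 142, 59, 109, 96, 124, 139, 89, 129,
           104, 186, 65, 95, 54, 72, 167, 135, 84, 154]

# Inclusive (low, high) limits of each bin, ft
bins = [(50, 80), (81, 110), (111, 140), (141, 170), (171, 200)]

# Count the wells whose net pay falls inside each bin
frequency = [sum(1 for h in net_pay if low <= h <= high) for low, high in bins]
percent = [100.0 * n / len(net_pay) for n in frequency]

for (low, high), n, pct in zip(bins, frequency, percent):
    print(f"{low:3d} - {high:3d}  {n:2d}  {pct:5.1f}%")
```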
Probability Distributions Cumulative frequency distributions
Range, ft    Frequency   Percent
50 - 80      4           20%
81 - 110     7           35%
111 - 140    5           25%
141 - 170    3           15%
171 - 200    1           5%
Total        20          100%

Net pay, ft      Cumulative percent
50 (minimum)     0%
80               20%
110              55%
140              80%
170              95%
200 (maximum)    100%

Benefits
1. Can easily read probabilities
2. Necessary for Monte Carlo Simulation
[Figure: cumulative percent (0% to 100%) versus net pay, feet (0 to 200)]
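A cumulative frequency curve is a running sum of the bin percentages, and probabilities can then be read off it. The sketch below builds the curve from the bin frequencies on the slide and reads values by straight-line interpolation between the plotted points; the interpolation and the name `percent_at_or_below` are assumptions, not something the notes prescribe.

```python
from itertools import accumulate

bin_edges = [50, 80, 110, 140, 170, 200]  # minimum, then upper limit of each bin, ft
frequencies = [4, 7, 5, 3, 1]
total = sum(frequencies)

# 0% at the minimum, then the running total at each upper limit
cumulative_percent = [0.0] + [100.0 * c / total for c in accumulate(frequencies)]


def percent_at_or_below(net_pay_ft):
    """Read the cumulative curve by linear interpolation between points."""
    if net_pay_ft <= bin_edges[0]:
        return 0.0
    if net_pay_ft >= bin_edges[-1]:
        return 100.0
    points = list(zip(bin_edges, cumulative_percent))
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= net_pay_ft <= x1:
            return y0 + (y1 - y0) * (net_pay_ft - x0) / (x1 - x0)


print(cumulative_percent)
print(round(percent_at_or_below(100), 1))  # chance that net pay is 100 ft or less
```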
Parameters of distributions
A parameter that describes central tendency or average of the distribution
Mean, m weighted average value of the random variable
Median value of the random variable with equal likelihood above or below
Mode value most likely to occur
A parameter that describes the variability of the distribution
Variance, s^2 mean of the squared deviations about the mean
Standard deviation, s square root of variance; degree of dispersion of distribution about the mean
[Figure: distributions A and B with equal means, ma = mb, and sa < sb]
Parameters of distributions Computing mean and standard deviation
1. Arithmetic average of discrete sample data set

$$ m = \frac{\sum_{i=1}^{N} x_i}{N} $$

N = number of equally-probable values

$$ s = \sqrt{\frac{\sum_{i=1}^{N} (x_i - m)^2}{N}} $$

Core porosity and permeability
Depth     k, md   Porosity, %
4807.5    2.5     17.0
4808.5    59      20.7
4809.5    221     19.1
4810.5    211     20.4
4811.5    275     23.3
4812.5    384     24.0
4813.5    108     23.3
4814.5    147     16.1
4815.5    290     17.2
4816.5    170     15.3
4817.5    278     15.9
4818.5    238     18.6
4819.5    167     16.2
4820.5    304     20.0
4821.5    98      16.9
4822.5    191     18.1
4823.5    266     20.3
4824.5    40      15.3
4825.5    260     15.1
4826.5    179     14.0
4827.5    312     15.6
4828.5    272     15.5
4829.5    395     19.4
4830.5    405     17.5
4831.5    275     16.4
4832.5    852     17.2
4833.5    610     15.5
4834.5    406     20.2
4835.5    535     18.3
4836.5    663     19.6
4837.5    597     17.7
4838.5    434     20.0
4839.5    339     16.8
4840.5    216     13.3
4841.5    332     18.0
4842.5    295     16.1
4843.5    882     15.1
4844.5    600     18.0
4845.5    407     15.7
4847.5    479     17.8
4847.5    139     20.5
4847.5    135     8.4

Porosity: m = 17.6, s = 2.87
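The porosity statistics can be recomputed with Python's standard `statistics` module. This is a sketch for checking the slide, with the 42 porosity values transcribed from the core table. Note, as an observation from running the numbers rather than a statement in the notes, that the quoted s = 2.87 matches the N - 1 (sample) form of the standard deviation, while dividing by N gives about 2.83.

```python
import statistics

# Core porosity, %, in depth order (42 samples)
porosity = [
    17.0, 20.7, 19.1, 20.4, 23.3, 24.0, 23.3, 16.1, 17.2, 15.3,
    15.9, 18.6, 16.2, 20.0, 16.9, 18.1, 20.3, 15.3, 15.1, 14.0,
    15.6, 15.5, 19.4, 17.5, 16.4, 17.2, 15.5, 20.2, 18.3, 19.6,
    17.7, 20.0, 16.8, 13.3, 18.0, 16.1, 15.1, 18.0, 15.7, 17.8,
    20.5, 8.4,
]

m = statistics.mean(porosity)          # sum of x_i divided by N
s_population = statistics.pstdev(porosity)  # divides by N
s_sample = statistics.stdev(porosity)       # divides by N - 1

print(round(m, 1), round(s_population, 2), round(s_sample, 2))
```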
Parameters of distributions Computing mean and standard deviation
2. Values listed as frequencies in groups

$$ m = \frac{\sum_i n_i x_i}{\sum_i n_i} $$

i = index to denote number of intervals
n_i = frequency of data points in each interval
x_i = midpoint value of each interval

$$ s = \sqrt{\frac{\sum_i n_i (x_i - m)^2}{\sum_i n_i}} $$

i   Porosity interval   n_i, frequency   p_i, prob.   x_i, midpoint   mean, p_i x_i   deviation, (x_i - m)^2   variance, p_i (x_i - m)^2
1   7 ≤ x < 10          1                0.024        8.5             0.202           85.342                   2.032
2   10 ≤ x < 12         0                0.000        11.0            0.000           45.402                   0.000
3   12 ≤ x < 14         1                0.024        13.0            0.310           22.450                   0.535
4   14 ≤ x < 16         10               0.238        15.0            3.571           7.497                    1.785
5   16 ≤ x < 18         12               0.286        17.0            4.857           0.545                    0.156
6   18 ≤ x < 20         8                0.190        19.0            3.619           1.592                    0.303
7   20 ≤ x < 22         7                0.167        21.0            3.500           10.640                   1.773
8   22 ≤ x < 25         3                0.071        23.5            1.679           33.200                   2.371
Total                   42               1.00                         m = 17.74                                s^2 = 8.96
                                                                                                               s = 2.993
Applicable for large data sets
Results are approximate
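The grouped calculation replaces each data point by the midpoint of its interval, which is why the result is approximate. The sketch below applies that idea to the porosity intervals from the table; the variable names are assumptions for illustration.

```python
import math

# (interval midpoint x_i, frequency n_i) for the eight porosity intervals
groups = [
    (8.5, 1), (11.0, 0), (13.0, 1), (15.0, 10),
    (17.0, 12), (19.0, 8), (21.0, 7), (23.5, 3),
]

n_total = sum(n for _, n in groups)

# Frequency-weighted mean of the midpoints
m = sum(n * x for x, n in groups) / n_total

# Frequency-weighted mean of the squared deviations about m
variance = sum(n * (x - m) ** 2 for x, n in groups) / n_total
s = math.sqrt(variance)

print(n_total, round(m, 2), round(variance, 2), round(s, 3))
```

Compared with the ungrouped porosity data (m = 17.6), the grouped mean of about 17.74 shows the size of the approximation.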
Parameters of distributions Computing mean and standard deviation
3. Discrete probability distributions
$$ m = \sum_i p_i x_i $$

$$ s = \sqrt{\sum_i p_i (x_i - m)^2} $$

x, drilling costs   probability   midpoint   EV     xi*pi   (xi-m)^2   p(xi)(xi-m)^2
$M                  of range      $M         $M     $M      ($M)^2     ($M)^2
100.0               0
105.2               0.007         102.6      0.7    0.7     1641.3     10.7
111.5               0.040         108.4      4.3    4.5     1208.5     48.3
130.6               0.229         121.1      27.7   29.9    486.8      111.5
136.3               0.093         133.5      12.4   12.7    93.4       8.7
148.2               0.225         142.3      32.0   33.3    0.7        0.2
165.2               0.278         156.7      43.6   45.9    184.6      51.3
168.7               0.035         167.0      5.8    5.9     568.2      19.9
178.5               0.066         173.6      11.5   11.8    929.5      61.3
183.7               0.021         181.1      3.8    3.9     1443.0     30.3
190.0               0.007         186.9      1.3    1.3     1912.9     13.4
m                                            143.1  149.9              355.6
s                                            15.8                      s 18.9

p_i is the probability of occurrence of the i-th value, x_i, of the random variable
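For a discrete distribution each value is weighted by its probability instead of its frequency. The sketch below uses the range midpoints and probabilities from the drilling cost table; small differences from the slide totals are expected because the tabulated probabilities are rounded (they sum to 1.001). Variable names are assumptions.

```python
import math

# (midpoint of cost range in $M, probability of that range)
cost_ranges = [
    (102.6, 0.007), (108.4, 0.040), (121.1, 0.229), (133.5, 0.093),
    (142.3, 0.225), (156.7, 0.278), (167.0, 0.035), (173.6, 0.066),
    (181.1, 0.021), (186.9, 0.007),
]

# Mean: probability-weighted sum of the values
m = sum(p * x for x, p in cost_ranges)

# Standard deviation: root of the probability-weighted squared deviations
s = math.sqrt(sum(p * (x - m) ** 2 for x, p in cost_ranges))

print(round(m, 1), round(s, 1))
```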
Parameters of distributions Computing mean and standard deviation
4. Cumulative frequency distribution
x, drilling costs   probability   midpoint   EV     xi*pi   (xi-m)^2   p(xi)(xi-m)^2
$M                  of range      $M         $M     $M      ($M)^2     ($M)^2
100.0               0
105.2               0.007         102.6      0.7    0.7     1641.3     10.7
111.5               0.040         108.4      4.3    4.5     1208.5     48.3
130.6               0.229         121.1      27.7   29.9    486.8      111.5
136.3               0.093         133.5      12.4   12.7    93.4       8.7
148.2               0.225         142.3      32.0   33.3    0.7        0.2
165.2               0.278         156.7      43.6   45.9    184.6      51.3
168.7               0.035         167.0      5.8    5.9     568.2      19.9
178.5               0.066         173.6      11.5   11.8    929.5      61.3
183.7               0.021         181.1      3.8    3.9     1443.0     30.3
190.0               0.007         186.9      1.3    1.3     1912.9     13.4
m                                            143.1  149.9              355.6
s                                            15.8                      s 18.9
[Figure: cumulative probability (0.0 to 1.0) versus drilling costs, $M (100.0 to 200.0)]
Types of distributions
Normal
Lognormal
Uniform
Triangle
Binomial
Multinomial
Hypergeometric
Types of distributions Normal
Characteristics
Define by m and s
Mode = mean = median
Curve is symmetric
Cumulative frequency graph is S-shaped
Can normalize and obtain area (probability) under the curve:

$$ t = \frac{x - m}{s} $$

[Figure: f(x) versus x, symmetric about m with s marked on either side; cumulative frequency versus x]
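The normalization t = (x - m)/s turns any normal variable into the standard normal, whose area under the curve can be evaluated with the error function. The sketch below is one way to do that in plain Python; the function name `normal_cdf` is an assumption, and the m and s values are simply the porosity statistics quoted earlier in these notes, used here as sample inputs.

```python
import math


def normal_cdf(x, m, s):
    """Cumulative probability P(X <= x) for a normal variable.

    Normalizes with t = (x - m) / s, then evaluates the standard
    normal area using the error function.
    """
    t = (x - m) / s
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


m, s = 17.6, 2.87  # sample inputs: core porosity mean and standard deviation

p_below_mean = normal_cdf(m, m, s)
p_within_one_s = normal_cdf(m + s, m, s) - normal_cdf(m - s, m, s)
print(round(p_below_mean, 3), round(p_within_one_s, 3))
```

Half the area lies below the mean, and roughly 68% lies within one standard deviation of it, whatever m and s are.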
Types of distributions Normal
Given a set of data how do you know whether it
is normally distributed?
Shape of curves
median = mean
Examples: porosity, fractional flow
[Figure: f(x) versus x, symmetric about m with s marked on either side; cumulative frequency versus x]
Types of distributions Lognormal
Characteristics
Define by m and s
Mode ≠ mean ≠ median
Curve is asymmetric
Cumulative frequency graph exhibits rapid rise
Can transform to normal variable by y = ln(x)
[Figure: f(x) versus x, skewed, with mode, median, and m marked; cumulative frequency versus x]
Types of distributions Lognormal
Examples:
permeability
thickness
oil recovery (bbls/acre-foot)
field sizes in a play
[Figure: f(x) versus x, skewed, with mode, median, and m marked]
Types of distributions Uniform
Characteristics:
all values are equi-probable
specify min and max
allows for uncertainty
used in Monte Carlo simulation
[Figure: f(x) versus x, constant between min and max; cumulative frequency versus x, reaching 100% at max]
Types of distributions Triangle
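Characteristics:
values near M, the most likely value, are the most probable
specify L (low), M (most likely), and H (high)
allows for uncertainty
used in Monte Carlo simulation
[Figure: f(x) versus x, a triangle from L, low, through a peak at M, most likely, to H, high; cumulative frequency versus x, reaching 100% at H]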
Types of distributions Triangle
Convert to cumulative frequency plot:
normalize to a 0 to 1 scale:

$$ x' = \frac{x - L}{H - L} $$

Define m as:

$$ m = \frac{M - L}{H - L} $$

For x' ≤ m, cumulative probability is given by:

$$ P(x') = \frac{(x')^2}{m} $$

For x' > m,

$$ P(x') = 1 - \frac{(1 - x')^2}{1 - m} $$

[Figure: f(x) versus x with L, low; M, most likely; H, high]
Types of distributions Triangle
Example
Estimated costs to drill a well vary from a minimum of $100,000 to a maximum of $200,000, with the most probable value at $130,000.
Convert the probability distribution to a cumulative frequency distribution
[Figure: triangular f(x) with L = 100, M = 130, H = 200]
x, random variable      x', normalized   cumulative
(drilling costs, $M)    x                probability
100                     0.0              0.000
110                     0.1              0.033
120                     0.2              0.133
130                     0.3              0.300
140                     0.4              0.486
150                     0.5              0.643
160                     0.6              0.771
170                     0.7              0.871
180                     0.8              0.943
190                     0.9              0.986
200                     1.0              1.000
[Figure: cumulative probability (0.0 to 1.0) versus drilling costs, $M (100 to 200)]
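The drilling cost table follows from the two-branch cumulative formula for a triangle distribution: x'^2/m up to the most likely value and 1 - (1 - x')^2/(1 - m) beyond it. The sketch below codes that formula; the function name `triangle_cdf` and the clamping outside the L to H range are assumptions added for illustration.

```python
def triangle_cdf(x, low, most_likely, high):
    """Cumulative probability of a triangle distribution at x."""
    x_norm = (x - low) / (high - low)        # x' on a 0 to 1 scale
    m = (most_likely - low) / (high - low)   # normalized most likely value
    if x_norm <= 0.0:
        return 0.0
    if x_norm >= 1.0:
        return 1.0
    if x_norm <= m:
        return x_norm ** 2 / m
    return 1.0 - (1.0 - x_norm) ** 2 / (1.0 - m)


# Drilling costs in $M: L = 100, M = 130, H = 200
table = [(cost, triangle_cdf(cost, 100, 130, 200)) for cost in range(100, 201, 10)]
for cost, p in table:
    print(f"{cost:4d}  {p:.3f}")
```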
Types of distributions Binomial
Describes a stochastic process characterized by:
1. Only two outcomes can occur
2. Each trial is an independent event
3. The probability of each outcome remains constant over repeated trials
4. Binomial probability equation is given by:

$$ P(x) = C_x^n \, p^x (1 - p)^{n - x} $$

where
x = number of successes (0 ≤ x ≤ n)
n = total number of trials
p = probability of success on any given trial
and the combination of n things taken x at a time

$$ C_x^n = \frac{n!}{x!\,(n - x)!} $$
Types of distributions Binomial
Example
Your company proposes to drill 5 wells in a new basin where the chance of
success is 0.15 per well
What is the probability of only one discovery in the five wells drilled?
What is the probability of at least one discovery in the 5-well drilling
program?
Number of discoveries   P(x)     Cumulative P(x)
0                       0.4437   0.4437
1                       0.3915   0.8352
2                       0.1382   0.9734
3                       0.0244   0.9978
4                       0.0022   0.9999
5                       0.0001   1.0000
[Figure: P(x) and cumulative P(x) (0.0 to 1.0) versus number of discoveries (0 to 5)]
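Both questions in the 5-well example can be answered from the binomial equation. The sketch below uses `math.comb` for the combinations term; the function name `binomial_pmf` is an assumption.

```python
from math import comb


def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials."""
    return comb(n, x) * p ** x * (1.0 - p) ** (n - x)


N_WELLS = 5
P_SUCCESS = 0.15

pmf = [binomial_pmf(x, N_WELLS, P_SUCCESS) for x in range(N_WELLS + 1)]

p_exactly_one = pmf[1]         # only one discovery
p_at_least_one = 1.0 - pmf[0]  # complement of five dry holes

print(round(p_exactly_one, 4), round(p_at_least_one, 4))
```

The chance of at least one discovery is the complement of drilling five dry holes, so it is read from the first row of the table as 1 - 0.4437.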
Types of distributions Multinomial
Describes a stochastic process characterized by:
1. Any number of discrete outcomes
2. Each trial is an independent event
3. The probability of each outcome remains constant over repeated trials
4. Multinomial probability equation is given by:

$$ P(x_1, x_2, \ldots, x_r) = \frac{n!}{x_1!\, x_2! \cdots x_r!} \; p_1^{x_1} p_2^{x_2} \cdots p_r^{x_r} $$
where
r = number of possible outcomes
x1 = number of times outcome 1 occurs in n trials
x2 = number of times outcome 2 occurs in n trials
xr = number of times outcome r occurs in n trials
n = total number of trials
pr = probability of outcome r on any given trial
Types of distributions Multinomial
Example
Your company proposes to drill 10 wells in a new basin where the chance
of success is 15% per well
What is the probability of obtaining 7 dry holes, 2 fields in the 1-2 mmbbl
range and 1 field in the 8-12 mmbbl range?
Outcome, mmbbl     Probability of outcome
1-2                0.08
2-4                0.04
4-8                0.02
8-12               0.01
Total (success)    0.150
Dry hole           0.850

number of trials (wells) in program   n = 10
number of dry holes                   x1 = 7
number of 1-2 mmbbl fields            x2 = 2
number of 2-4 mmbbl fields            x3 = 0
number of 4-8 mmbbl fields            x4 = 0
number of 8-12 mmbbl fields           x5 = 1
Probability of this result            0.7%
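The 0.7% result can be reproduced from the multinomial equation with the outcome probabilities and counts listed in the example. This is a minimal sketch; the function name `multinomial_pmf` is an assumption.

```python
from math import factorial, prod


def multinomial_pmf(counts, probs):
    """Probability of observing the given count of each outcome."""
    n = sum(counts)
    coefficient = factorial(n)
    for x in counts:
        coefficient //= factorial(x)  # n! / (x1! x2! ... xr!), always an integer
    return coefficient * prod(p ** x for p, x in zip(probs, counts))


# Outcomes: dry hole, 1-2, 2-4, 4-8, and 8-12 mmbbl
probs = [0.85, 0.08, 0.04, 0.02, 0.01]
counts = [7, 2, 0, 0, 1]  # 10 wells in total

p_result = multinomial_pmf(counts, probs)
print(f"{100 * p_result:.1f}%")  # 0.7%
```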
Types of distributions Hypergeometric
Describes a stochastic process characterized by:
1. Any number of discrete outcomes
2. Each trial is dependent on the previous event (sampling without
replacement)
3. The probability of each outcome changes from trial to trial, because observed outcomes are not returned to the sample space
4. Hypergeometric probability equation for two possible outcomes:

$$ P(x) = \frac{C_x^{d_1} \, C_{n - x}^{N - d_1}}{C_n^N} $$

where
n = number of trials
d1 = number of successes in the sample space before the n trials
x = number of successes in n trials
N = total number of elements in the sample space before the n trials
C_b^a = the number of combinations of a things taken b at a time.
Types of distributions Hypergeometric
Example
Our company has identified ten seismic anomalies of about equal size in a
new offshore area. In an adjacent area, 30% of the drilled structures were
oil productive.
If we drill 5 wells (test 5 anomalies) what is the probability of two
discoveries?
number_sample    n = 5
number_pop       N = 10
population_s     d1 = 3
sample_s         x1 = 2
P(x1 = 2) = 42%
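The 42% figure follows from the hypergeometric equation with N = 10 anomalies, d1 = 3 expected successes (30% of 10), n = 5 wells, and x = 2 discoveries. The sketch below evaluates it with `math.comb`; the function name `hypergeometric_pmf` is an assumption.

```python
from math import comb


def hypergeometric_pmf(x, n, d1, N):
    """Probability of x successes in n draws without replacement.

    d1 is the number of successes among the N elements of the sample space.
    """
    return comb(d1, x) * comb(N - d1, n - x) / comb(N, n)


p_two_discoveries = hypergeometric_pmf(x=2, n=5, d1=3, N=10)
print(f"{100 * p_two_discoveries:.0f}%")  # 42%
```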