Delivery Paper
Abstract
⋆ Updated manuscript with additional Appendices from citation (Gunning and Glinsky, 2004). Please cite as both (Gunning and Glinsky, 2004) and the website.
∗ Corresponding author.
Email address: [email protected] (James Gunning).
1 Introduction
The basic purpose of the seismic experiment in the oil exploration business has
always been the extraction of information regarding the location, size and na-
ture of hydrocarbon reserves in the subsurface. To say this is to also grant that
the analysis of seismic data is necessarily and always an inversion problem:
we do not measure reservoir locations and sizes; we measure reflected wave-
forms at the surface, from which information about potential reservoir zones
is extracted by a sequence of processing steps (stacking, migration etc) that
are fundamentally inversion calculations designed to put reflection responses
at their correct positions in time and space.
Such a step is in fact no longer contentious: it has long been agreed in the geo-
physical community that seismic inversion tools ought to use whatever rock–
physics knowledge is regionally available in order to constrain the considerable
non–uniqueness encountered in traditional inversion methodologies. Further,
in an exploration or early–appraisal context, some limited log or core data are
usually available, from which additional constraints on likely fluid types, time
and depth horizons etc can be constructed.
methods like sparse–spike inversion, even though these methods have a sound geophysical basis. Conversely, detailed multi–property models of the reservoir –
such as geostatistical earth models – are often weak in their connection to
geophysical principles: the relationship to seismic data is often embodied in
arbitrary regression or neural–network mappings that implicitly hope that the
correct rock physics has been divined in the calibration process.
prior model of the rock physics and geological surfaces of interest will always
be non–trivial, though repeated experience of such an exercise will reduce the
required effort with time. This effort is justified by the fact that a Bayesian
inversion constructed around an appropriate prior will always produce more
reliable predictions than an inversion technique which does not integrate the
regional rock physics or geological knowledge. This fact will be obvious if we
do the thought experiment of asking what happens when the seismic data
has poor signal to noise ratio. We assert that the use of Bayesian model–
based inversion techniques should become far more widespread once the first
mentioned obstacle above is overcome.
The main content of the paper is laid out as follows: in section 2 the overall framework and design of the inversion problem is outlined. Section 2.1 describes the basic model and suitable notation, section 2.2 outlines the construction of the prior model, and section 3 describes the forward model and associated likelihood. Section 4 covers the problems of mode mapping and sampling from the posterior. An outline of the software is provided in section 5: it is released under a generic open–source licence, rather like the popular GNU and open–BSD style licences. A discussion of a suite of example/test cases is given in section 6, and conclusions are offered in section 7.
1998) is piped to the routine in conjunction with a set of parameters describing the local prior model, also in SU format. Stochastic realisations are then drawn from the posterior and written out, also in SU format. The details of these formats are discussed in section 5. Inverting on a trace–by–trace basis amounts to an assumption that the traces are independent, and this can be guaranteed by decimating the seismic data to a transverse sampling scale equal to the longest transverse correlation length appropriate for the local geology. Spacings of a few hundred metres may be appropriate. Working with independent traces has the great advantage that the inversion calculation is massively parallelizable, and the computation may be farmed out by scatter–gather operations on cluster systems. Finer scale models may then be reconstructed by interpolation if desired.
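Working with independent traces makes the scatter–gather pattern almost trivial. A minimal sketch of the idea (the function names are ours, and the per-trace routine is a stand-in for the real inversion, which would read an SU trace plus its local prior and return posterior realisations):

```python
from multiprocessing import Pool

def invert_trace(args):
    """Stand-in for the per-trace Bayesian inversion (hypothetical:
    the real routine returns posterior samples for one trace)."""
    seismic, prior_scale = args
    return prior_scale * sum(seismic)

def invert_survey(traces, n_workers=None):
    """Scatter independent traces to worker processes and gather the
    results; a serial map is used when n_workers is None."""
    if n_workers is None:
        return [invert_trace(t) for t in traces]
    with Pool(n_workers) as pool:
        return pool.map(invert_trace, traces)
```

Because the traces are statistically independent, the gather step involves no communication between workers beyond collecting the per-trace outputs.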
From another point of view, the sheer difficulty of rigorously sampling from inter–trace correlated inversion problems is the price of modelling at a scale finer than the transverse correlation length of the sediments (and/or surfaces) of interest, which is commonly several hundred metres or more. We know from the Nyquist theorem that any random signal can be largely reconstructed by sampling at the Nyquist rate corresponding to this correlation length, and intermediate values can be recovered by smooth interpolation. This is a strong argument for performing inversion studies at a coarser scale than the fine scale (say 10–30m) associated with the acquisition geometry.
probably only possible in systems with fully multi–Gaussian distributions of properties, but such a restriction is too great when we wish to study systems with strong nonlinear effects like fluid substitution, and discrete components like layer pinchouts or uncertain fluids. The latter, more interesting problems only become possible if we reduce the transverse sampling rate.
The model used in this inversion is somewhat coarser than that commonly
used in geocellular models. At each trace location, the time–region of interest
is regarded as a stack of layers, typically a metre to several tens of metres
in size. Each layer is generically a mixture of two “end-member” rock types: a permeable reservoir member, and an impermeable non–reservoir member. The balance of these two is determined by a layer net–to–gross (NG), and the internal structure of mixed layers (0 < NG < 1) is assumed to be finely horizontally laminated, so acoustic properties can be computed using an effective medium theory appropriate for this assumption. Seismic energy reflects at the
boundaries between layers, producing a surface signal that may be syntheti-
cally computed, given an incident wavelet and all the requisite rock properties.
At the current trace location, the set of rock layers in the inversion region is
fundamentally modelled in time rather than depth. The depth d enters as a
relatively weakly controlling parameter of the rock properties, but time is a
fundamental variable in terms of computing seismic responses, so is the better
choice of variable for the basic parameterisation.
Models in depth are more fundamentally useful for reservoir decision making,
but we adopt a flow wherein depth models can be created from any time
model by a simple post–processing step. Depth constraints can take the form
of thickness constraints or absolute depth constraints, and both of these can be
applied either through the prior parameters or the isopach criterion we discuss
later. Either way, the generation of depth models will require the specification
of a reference layer, from which all layer depths will be hung or supported
as required. This scheme ensures that depth and time models are mutually
consistent.
The model consists of Nl layers, with ti the top of layer i. Layer i is bounded by the times (ti, ti+1), i = 1...Nl. An additional parameter, tbase, is required to specify the bottom of the model. Figure 1 shows a cartoon of the model.
Fig. 1. Schematic of the local layer–based model and its notation. For description
of the model parameters see the main text. The reflection coefficient sequence and
synthetic seismic are part of the forward model of section 3.
The properties of permeable rocks that we explicitly model are the p–wave
velocity vp,s , the shear velocity vs,s , and porosity φs , but for impermeable
members we use vp,m , vs,m , and density ρm . Facies are assumed isotropic. These
rock properties are in general controlled by a loading curve which depends
primarily on depth but also possibly a low–frequency interval velocity (LFIV)
(derived perhaps from the migration), as explained in section 2.2.3.
Permeable members that are susceptible to fluid substitution will also require knowledge of the dry matrix grain properties, and the saturations, densities and p–wave velocities of the fluids undergoing substitution. For a particular rock type, the grain properties are taken as known, but the fluid saturations, densities and velocities can form part of the stochastic model.
The set of parameters describing the full acoustic properties for layer i, bounded
by times ti−1 , ti , with hydrocarbon h present, is then
at each trace. An i subscript is implicit for all quantities. If the layer is a pure impermeable rock (NG = 0), this simplifies to
The union of these parameters with the layer times ti (and the final bottom-
layer base time tbase ) then forms the full model vector of interest. Since the
times are the dominant variables, it is convenient to arrange them to occur
first in the full model vector, so this is assembled as
The extra stochastic factors A, B affect the synthetic seismic and are explained
in section 3.1.4. Hereon we use dm for the dimensionality of the model m.
Before inversion, the times ti are not known precisely, but are modelled as
stochastic variables with prior distributions N (t¯i , σt2i ). The mean and standard
deviation are estimated from horizon picking and approximate uncertainty es-
timates (e.g. a half–loop or so may be chosen for the σti ). The prior distribution
on layer times is supplemented by an ordering criterion which says that layers
cannot swap:
ti ≥ ti−1 , (4)
so the prior for the layer times is actually a product of truncated Gaussians
with delta functions at the endpoints absorbing the mis–ordered configurations.
A typical scenario is that the top and bottom of a large packet of layers have
been picked reasonably accurately, but the time–location of the intermediate
surfaces is not well known.
The truncation rules allow the useful possibility of pinchouts. For example,
layer 2 may disappear if we have t3 = t2 , in which case the interface at this
time comprises a contrast between layer 1 properties and layer 3 properties,
providing t3 > t1 . Modelling pinchouts in this way then requires a specific
algorithm for determining where the pinchouts occur from a given set of (un-
truncated) layer times ti . We have used this scheme:
(1) Form a sort order based on the layer prior time distribution uncertainties σti (sorting in increasing order), and from the top layer down as a secondary sort order if some σti's are identical.
(2) From the set of candidate time samples {ti } (which will not in general
satisfy (4)), proceed to fix the times in the sort order above, truncating
values to preserve the ordering criterion (4) as we proceed. For example,
if the sort order is {1,4,2,3}, and we have a set t1 < t4 < t3 < t2 , this will
end up truncated at t1 , (t2 = t4 ), (t3 = t4 ), t4 .
This recipe is designed to allow the “better picked” horizons higher priority in
setting the layer boundary sequence.
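A minimal sketch of this truncation recipe (the function name and data layout are ours):

```python
def truncate_times(times, sigmas):
    """Enforce the ordering criterion t_i >= t_{i-1} (equation (4)).

    Layers are fixed in order of increasing prior uncertainty sigma
    (ties broken from the top layer down); each candidate time is then
    clamped between the already-fixed times of layers above and below."""
    n = len(times)
    order = sorted(range(n), key=lambda i: (sigmas[i], i))
    fixed = set()
    out = list(times)
    for i in order:
        lo = max((out[j] for j in fixed if j < i), default=float("-inf"))
        hi = min((out[j] for j in fixed if j > i), default=float("inf"))
        out[i] = min(max(times[i], lo), hi)
        fixed.add(i)
    return out

# Example from the text: sort order {1,4,2,3} with t1 < t4 < t3 < t2
# ends up truncated at t1, (t2 = t4), (t3 = t4), t4.
```

This reproduces the worked example in the text: layers with tighter prior picks are fixed first, and less certain layers are squeezed between them, producing pinchouts where candidates would violate the ordering.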
The modeller will have formed beliefs about the probability of certain kinds
of hydrocarbons in each layer, informed by non-seismic sources such as pres-
sure data or resistivity logs. A relative prior probability is assigned to each
hydrocarbon type in each layer on this basis. Specifically, each layer i may
bear fluids; oil (o), gas (g), brine (b), or low–saturation gas (l). The modeller
must specify the prior probabilities of each of these phases, on a layer basis,
as Fio , Fig , Fib , Fil respectively, with Fio + Fig + Fib + Fil = 1.
The default density ordering is {b, l, o, g}, with low–saturation gas placed
between brine and oil because it usually arises as a residual saturation trail
behind migrating hydrocarbons.
The set of possible fluids in each layer (as implied by the prior probabilities) are combined with a rule for density ordering to then enumerate a discrete set of possible fluid combinations k = 1 . . . NF. 3 Suppose the fluid–combination
3 E.g. a two layer system under ordering rule (3), where it is known that gas (and
k corresponds to the set of fluid labels fik ∈ {b, l, o, g}, i = 1 . . . Nl. Then the prior probability of this fluid–combination is taken to be

P_k = \frac{\prod_i F_{i,f_{ik}}}{\sum_{k'=1}^{N_F} \prod_i F_{i,f_{ik'}}}.   (5)
Note that in multi–layer systems, this makes the marginal prior probability of obtaining a certain fluid in a given layer quite different from the prior probability specified on a layer basis, simply because of the ordering criterion 4. In subsequent discussions, the prior probability of each distinct fluid combination k enumerated in this way is denoted Pk. Sometimes, it is desirable to set fluid probabilities based on data observed after the density–ordering mechanisms are applied, or, equivalently, to use “post–ordering joint prior probabilities”. Alternative mechanisms to achieve this end are described in Appendix 15.
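The enumeration and normalisation of equation (5), with the density–ordering rule applied, can be sketched as follows (the RANK table encodes the default ordering {b, l, o, g}; the function name is ours):

```python
from itertools import product

# Lightness rank implied by the default density ordering {b, l, o, g}
RANK = {"b": 0, "l": 1, "o": 2, "g": 3}

def fluid_combinations(layer_priors):
    """Enumerate density-ordered fluid combinations and return their
    normalised prior probabilities P_k (equation (5)).

    layer_priors: one {fluid: prior probability} dict per layer,
    ordered from the top layer down."""
    weights = {}
    for combo in product(*(list(p) for p in layer_priors)):
        # lighter fluids must sit above denser ones
        if all(RANK[combo[i]] >= RANK[combo[i + 1]]
               for i in range(len(combo) - 1)):
            w = 1.0
            for layer, f in zip(layer_priors, combo):
                w *= layer[f]
            weights[combo] = w
    total = sum(weights.values())
    return {c: w / total for c, w in weights.items()}

# Two-layer example from footnote 4: P(oil) = 0.6 in each layer,
# gas and low-saturation gas excluded.
pk = fluid_combinations([{"b": 0.4, "o": 0.6}, {"b": 0.4, "o": 0.6}])
# pk[("o", "o")] = 0.36/0.76, about 0.474
```

Only the three density-ordered combinations (brine, brine), (oil, brine), (oil, oil) survive, and the normalised weights reproduce the post-ordering marginals quoted in footnote 4.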
Net–to–gross

The net–to–gross prior distribution is taken as N(N̄_G, σ²_NG), truncated within [0, 1]. The mean and standard deviation of this distribution can be determined by consultation with the geologist. A broad prior may be used to reflect uncertain knowledge.
low sat. gas) cannot occur (Fig = Fil = 0), may have the allowable set {(brine,
brine) : (oil, brine) : (oil, oil)}, so NF = 3.
4 For example, in the two–layer problem of the preceding footnote, if the prior probability of oil in each layer had been specified as, say, 0.6, the 3 combinations would have probabilities proportional to 0.4², 0.6 × 0.4, 0.6² respectively, or 0.21, 0.316, 0.474 after normalisation, so the post–ordering prior probability of oil in layer 2 is 0.474, and in layer 1 is 0.316 + 0.474 = 0.79. For an alternative style of fluid probability specification, see Appendix 15.
5 The algorithms in this code treat truncated Gaussian distributions as a mixture of a true truncated Gaussian plus delta functions at the truncation points, of weight equal to the truncated probability. This has the advantage of giving non–vanishing probability to certain physically reasonable scenarios, like NG = 0, or a layer pinchout.
Trend curves
The trend curves are a set of rock-matrix velocity (p–wave, s–wave) vpf, vsf, and density ρf (or porosity φf) regressions for all end-member rocks. For reservoir–members, the porosity is used in preference to the density, and the set of regressions is
For impermeable rocks, the porosity is of no direct interest, and the density
is regressed directly on vpf using a Gardner–Gardner–Gregory (GGG) type
relationship:
log ρ_f = (log A_ρ + B_ρ log v_pf) ± σ_ρf,  or:  ρ_f = A_ρ v_pf^{B_ρ} ± σ_ρf,
v_sf = (A_vs + B_vs v_pf) ± σ_sf,   (8)
v_pf = (A_vp + B_vp × depth + C_vp × LFIV) ± σ_pf.
The regression errors (σpf, σsf, . . . ) used in these relationships are the prediction errors formed from linear regression studies, which yield t–distributions for the predictive distribution. We approximate this result by taking the prior to be of Normal form, with variance set to the regression variance. For example, the prior for v_pf is N(A_vp + B_vp d + C_vp LFIV, σ²_pf). This approximation is exact in the limit of large data.
The Bayesian paradigm requires a likelihood function which specifies how prob-
able the data are, given a particular model. This requires calculation of a syn-
thetic seismic trace from the suite of layers and their properties, and forming
the likelihood by comparing the seismic data and the synthetic seismic. The
forward seismic model is a simple convolutional model, which treats layers as
isotropic homogeneous entities with effective properties computed from the
successive application of Gassman fluid substitution in the permeable rocks
and Backus averaging with the impermeable rocks.
3.1 Computing the synthetic seismic
When two different isotropic rocks are mixed at a fine (sub–seismic resolu-
tion) scale with strong horizontal layering, it is well known that the effective
acoustic properties of the mixture can be computed using the Backus average
(Mavko et al. (1998), p. 92). This assumes that the wavelengths are long com-
pared to the fine–scale laminations represented by the net–to–gross measure.
The standard formulae are

1/M_eff = NG/M_permeable + (1 − NG)/M_impermeable,   (9)

where M can stand for either the p-wave (M) or shear (µ) modulus (µ_eff = ρ_eff v²_s,eff, M_eff = K_eff + (4/3)µ_eff = ρ_eff v²_p,eff), and the effective density is

ρ_eff = NG ρ_permeable + (1 − NG) ρ_impermeable.   (10)

The effective modulus of a pore–fluid mixture of brine and hydrocarbon x, with hydrocarbon saturation S_ix in layer i, is given by the Reuss average

K_fluid⁻¹ = S_ix/K_ix + (1 − S_ix)/K_ib.   (11)
When the effective fluid replaces brine in the pore space, the saturated rock has
effective elastic parameters that are computed from the usual low-frequency
Gassman rules (Mavko et al., 1998): viz
The Mpermeable in the Backus relation (9) will be the p–wave modulus for permeable rock after fluid substitution via Gassman (only the permeable end–member will undergo fluid substitution). Here Kdry is the dry rock bulk modulus, Kg is the bulk modulus of rock mineral grains, Kfluid the bulk modulus of the substituted fluid (gas/oil/brine/low-sat. gas), and Ksat the effective bulk modulus of the saturated rock. The shear modulus µ is unchanged by fluid substitution.
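A sketch of these effective-medium steps (the function names are ours; moduli are assumed to be in consistent units, e.g. GPa):

```python
def backus_modulus(ng, m_perm, m_imperm):
    """Backus average (equation (9)) for the p-wave or shear modulus
    of a finely laminated mixture with net-to-gross ng."""
    return 1.0 / (ng / m_perm + (1.0 - ng) / m_imperm)

def reuss_fluid_modulus(s_h, k_h, k_brine):
    """Effective modulus of a brine/hydrocarbon pore-fluid mixture
    with hydrocarbon saturation s_h (equation (11))."""
    return 1.0 / (s_h / k_h + (1.0 - s_h) / k_brine)

def gassman_k_sat(k_dry, k_grain, k_fluid, phi):
    """Low-frequency Gassman saturated bulk modulus from the dry-rock,
    grain and fluid moduli and the porosity phi."""
    b = k_dry / (k_grain - k_dry) + k_fluid / (phi * (k_grain - k_fluid))
    return k_grain * b / (1.0 + b)
```

A useful sanity check is that a zero fluid modulus recovers the dry-rock modulus, and any stiffer fluid raises the saturated modulus above it.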
brine and hydrocarbon h), the Gassman law can be written
A typical computation sequence for computing the set of effective properties for
a laminated, fluid–substituted rock layer would run as described in Appendix 1.
Prior to computing the synthetic seismic, the effective properties of all the
layers must be computed following this recipe.
The synthetic seismic is then

Ssyn ≡ w ∗ R,   (14)
where we use an FFT for the convolution, and w and R will be discretised at
the same sampling rate ∆t as the seismic data set S for the trace. The set of
delta functions in R are projected onto the discretised time–grid using a 4–
point Lagrange interpolation scheme (Abramowitz and Stegun, 1965) based on
the nearest 4 samples to the time of a spike. This ensures the synthetic seismic
has smooth derivatives with respect to the layer times, a crucial property in
the minimisation routines that are described in section 4.1.
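The spike projection and convolution can be sketched as follows (NumPy; the function names are ours, using the standard 4-point Lagrange interpolation weights):

```python
import numpy as np

def project_spikes(times, amps, n, dt):
    """Project delta-function reflection coefficients onto a regular
    time grid with 4-point Lagrange weights, so that the discretised
    trace varies smoothly with the spike times."""
    r = np.zeros(n)
    for t, a in zip(times, amps):
        x = t / dt
        j = int(np.floor(x))
        f = x - j                      # fractional position in [0, 1)
        w = (-f * (f - 1) * (f - 2) / 6.0,
             (f + 1) * (f - 1) * (f - 2) / 2.0,
             -(f + 1) * f * (f - 2) / 2.0,
             (f + 1) * f * (f - 1) / 6.0)
        for k, wk in zip((j - 1, j, j + 1, j + 2), w):
            if 0 <= k < n:
                r[k] += a * wk
    return r

def synthetic_seismic(times, amps, wavelet, n, dt):
    """S_syn = w * R (equation (14)) by direct convolution."""
    r = project_spikes(times, amps, n, dt)
    return np.convolve(r, wavelet, mode="full")[:n]
```

The Lagrange weights sum to one for any fractional position, so a spike's total amplitude is preserved wherever it falls between grid samples, and the synthetic varies smoothly as the layer times move.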
Available data traces S may be near or far-offset (or both), and an appropriate wavelet w will be provided for each. The P–P reflection coefficient for small layer contrasts and incident angles θ is
R_pp = (1/2) (∆ρ/ρ + ∆v_p/v_p) + θ² [ ∆v_p/(2v_p) − (2v_s/v_p)² (∆ρ/ρ + 2∆v_s/v_s)/2 ],   (15)
with
ρ = (ρ1 + ρ2 )/2 (16)
vp = (vp,1 + vp,2 )/2 (17)
vs = (vs,1 + vs,2 )/2 (18)
∆ρ = ρ2 − ρ1 (19)
∆vp = vp,2 − vp,1 (20)
∆vs = vs,2 − vs,1 (21)
and layer 2 below layer 1. All properties in these formulae are effective prop-
erties obtained as per Appendix 1. The coefficient of θ2 is usually called the
AVO gradient.
R_pp(A, B) = (A/2) (∆ρ/ρ + ∆v_p/v_p) + Bθ² [ ∆v_p/(2v_p) − (2v_s/v_p)² (∆ρ/ρ + 2∆v_s/v_s)/2 ],   (22)
where the factors A and B may be stochastic, typically Gaussian with means
close to unity and small variances: A ∼ N (Ā, σA ), B ∼ N (B̄, σB ) 9 .
Explicitly, θ² for a given stack is obtained from the Dix equation. The stack has average stack velocity Vst, event–time Tst and mean–square stack offset

⟨X²_st⟩ = (X³_st,max − X³_st,min) / (3(X_st,max − X_st,min)),   (23)

θ² = v²_p,1 / (V⁴_st T²_st / ⟨X²_st⟩).   (24)
9 In some cases, there are uncertainties with the overall wavelet scale. The facility
to lock A and B also exists, so B = A ∼ N (Ā, σA ) is then used in the prior.
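Equations (23)–(24) translate directly into code (the function name is ours; offsets in metres, times in seconds):

```python
def theta_squared(vp1, v_stack, t_stack, x_min, x_max):
    """Mean-square incidence angle theta^2 for a stack from the
    Dix-type relations (23)-(24)."""
    x2_mean = (x_max**3 - x_min**3) / (3.0 * (x_max - x_min))  # eq. (23)
    return vp1**2 / (v_stack**4 * t_stack**2 / x2_mean)        # eq. (24)
```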
R_ps = −(θ/2) [ ∆ρ/ρ + 2β (∆ρ/ρ + 2∆v_s/v_s) ]
     + (θ³/12) [ (1 + 8β + 9β²) ∆ρ/ρ + β(16 + 24β) ∆v_s/v_s ].   (25)
The likelihood function associated with the synthetic seismic mismatch is constructed as

L_seis = exp( −f_error Σ_{error–sampling points} (S_syn − S)² / (2σ_s²) ),   (26)
where the range of the error–sampling points is computed from the prior mean–layer time range and the wavelet characteristics as per figure 2. Within this range, the points used in the error sum are spaced at the maximum multiple of the seismic sampling time p∆t which is smaller than the optimal error sampling time ∆Ts derived in Appendix 2 (i.e. p∆t ≤ ∆Ts). The correction factor f_error ≡ p∆t/∆Ts is designed to recover the error that would obtain if the error trace were discretised at exactly ∆Ts. The signal–to–noise ratio SN is implied by the noise level σs supplied by the user, with typically SN ≈ 3.
The stack information required by the likelihood model thus comprises the pa-
rameters Vst (stack velocity to event), Xst,max (far offset), Xst,min (near offset),
and Tst (event time), a wavelet appropriate for the stack, and the noise level
σs .
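A sketch of the negative log of this likelihood (the names are ours; s_syn and s_obs are assumed already restricted to the error window of figure 2):

```python
import numpy as np

def seismic_neg_log_likelihood(s_syn, s_obs, sigma_s, dt, dt_opt):
    """Negative log of the seismic likelihood (equation (26)): residuals
    are subsampled every p samples with p*dt <= dt_opt, and the sum is
    corrected by f_error = p*dt/dt_opt."""
    p = max(1, int(dt_opt // dt))      # largest multiple of dt below dt_opt
    f_error = p * dt / dt_opt
    resid = (s_syn - s_obs)[::p]       # error-sampling points
    return f_error * np.sum(resid**2) / (2.0 * sigma_s**2)
```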
For certain layers there may be a constraint on the thickness which is obtained from well data, or kriged maps of layer thicknesses constrained to well data. Let j = 1, . . . Nt label the layers on which such a constraint is imposed. The thickness of the layer must match a known thickness ∆Zj, within a specified error σ_∆Zj. This can be represented in a likelihood function
L_iso = exp( − Σ_j (v_j,p,eff (t_j − t_{j−1}) − ∆Z_j)² / (2σ²_∆Zj) ),   (27)
Fig. 2. Diagram indicating the range over which the error samples are computed. In general Tabove,below are free parameters, but we take Tabove = wavelet–coda and Tbelow = wavelet–precursor as a sensible choice, since this allows the full reflection energy from the top and bottom reflections to enter the error count for models not too different from the mean prior model. The error is usually acquired by a sum over a subsampled range (circled samples), with the sub–sampling rate computed as per the main text.
omitted if the layer contains brine. Any fixed or irrelevant parameters are discarded. Different possible fluid combinations will then yield model vectors mk of different dimension.
Model–selection problems can proceed in two ways. The first is to seek the marginal probability of the kth model P(k|d) by integrating out all the continuous variables mk in the problem and then drawing samples via the decomposition P(k, mk|d) = P(mk|k, d)P(k|d). The problem is to find reliable estimates of P(k|d) when the necessary integrals cannot be done analytically. Methods to form such estimates using MCMC methods are described in Raftery (1996), and tend to focus on harmonic means of the likelihood computed with MCMC samples drawn from the conditional P(mk|k, d). Preliminary experiments have shown some difficulty in stabilising the various estimators described by Raftery. The best estimator we have found is the full Laplace estimator, based on the posterior covariance matrix obtained by Newton updates at the mode. This forms the basis of an “independence sampler”, whose implementation details are described in Appendix 4.
The second method samples from the full posterior P(k, mk|d) by constructing a hybrid MCMC scheme that makes jumps between the various models as well as jumps within the continuous parameter space of each model. Such methods are variously described as model jumping methods, e.g. Andrieu et al. (2001), Phillips and Smith (1996); we implement the methods described by Andrieu.
For a fixed fluid combination k (of prior probability Pk ), the posterior dis-
tribution of the model parameters mk should clearly be proportional to the
product of the prior distributions with all the applicable likelihoods:
Efficient sampling from the posterior will require location of all the feasible
modes for a particular fluid combination, plus an estimate of the dispersion at
each mode. The latter is usually taken as the local covariance associated with
a Gaussian approximation to the local density.
Location of the modes is usually performed by searching for minima of the negative log–posterior starting from strategic points. For each fluid combination, the full model vector m can easily have dimension d = O(50) or more, so this minimisation is quite demanding. The log–posterior surface is smooth for all parameters except where a layer pinches out, since the disappearing layer causes an abrupt discontinuity in the reflection coefficient. Such problems can be tackled with variable metric methods like BFGS minimizers specially adapted to detect solutions that lie along the lines of discontinuity (typically at some ti+1 = ti). Still, even best-case d–dimensional minimisation usually requires O(d²) function evaluations, with the coefficient of order 10–100. Observe also that the function evaluation is a synthetic seismic (over Nt samples), plus prior, plus isopach, so the work count for this is O(d² + Nt log(Nt)), which is typically O(10³) flops.
This accounting shows that the overall minimisation, performed over all d parameters, will be quite slow. In practice, some of the parameters affect the posterior much more strongly than others. Layer times, p–velocities and densities have strong effects, whereas, e.g., hydrocarbon saturations have rather weak ones. This observation leads to a scheme wherein the minimisation is carried out over a subset of important parameters (say two or three per layer), and the remainder are estimated after the minimisation terminates using a sequence of Newton iterates as follows.
X = ∇f(m_0)   (31)
C̃_m = (X^T C_D⁻¹ X + C_m⁻¹)⁻¹   (32)
m̃ = C̃_m (X^T C_D⁻¹ (y + X m_0 − f(m_0)) + C_m⁻¹ m̄)   (33)

where C_D = diag(σ_i²). Here the f(m) will be read from the seismic (26) or isopach likelihoods (27).
The updates encountered in the code are all cases where the error covariance is diagonal, so the formulae are more simply stated in terms of the scaled residuals e:

e(m_0) = C_D^(−1/2) (f(m_0) − y)   (34)
X̃ ≡ ∇e(m_0)   (35)
d = X̃ m_0 − e(m_0)   (36)
C̃_m = (X̃^T X̃ + C_m⁻¹)⁻¹   (37)
m̃ = C̃_m (X̃^T d + C_m⁻¹ m̄)   (38)
The gradient X̃ is evaluated by finite differences, and the scheme can be it-
erated by replacing m0 ← m̃ at the end. It converges quadratically to the
true mode if the path crosses no layer-pinchouts. Typically, after minimisation
using, say, BFGS methods over the dominant parameters in m, only a few New-
ton iterates are required to get decent estimates of the remaining parameters,
and the posterior covariance matrix C̃m comes along for free.
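The scheme of equations (34)–(38), with a finite-difference gradient, is in outline (the function name is ours):

```python
import numpy as np

def newton_update(m0, m_prior, c_prior_inv, resid_fn, eps=1e-6):
    """One Newton update in the scaled-residual form (34)-(38):
    returns the updated mean and posterior covariance."""
    e0 = resid_fn(m0)
    X = np.empty((e0.size, m0.size))
    for j in range(m0.size):           # finite-difference gradient (35)
        dm = np.zeros(m0.size)
        dm[j] = eps
        X[:, j] = (resid_fn(m0 + dm) - e0) / eps
    d = X @ m0 - e0                    # (36)
    c_post = np.linalg.inv(X.T @ X + c_prior_inv)        # (37)
    m_new = c_post @ (X.T @ d + c_prior_inv @ m_prior)   # (38)
    return m_new, c_post
```

For a linear residual and a weak prior the update lands on the least-squares solution in one step, which is a convenient correctness check.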
For this reason, we always perform a small set of Newton updates (say 5) to
the prior based on the isopach likelihood before commencing a minimisation
step for the entire posterior. The “partial–posterior” formed from the product
of the prior and isopach constraint will be approximately multi–Gaussian, and
the mode and covariance of this distribution are then used to choose starting
points for a set of subsequent minimisation calls. The subset of parameters used in the modified BFGS minimisation routines typically comprises the layer times ti, the impermeable layer p–velocities vp and the net–to–gross NG. Other combinations are conceivable and perhaps better. The initial starting values of the non–time parameters are taken from the “partial–posterior” mean, and a set
of starting times are assembled as per the description in Appendix 3. A run
of subspace minimisations looping over this set of starting configurations will
yield a set of points hopefully in the vicinity of global modes. Duplicates are
eliminated. Figure 3 indicates how the sequence of dispersed starting points,
subspace minimisation, and global Newton methods converge to the desired
separate modes.
Fig. 3. (Schematic: dispersed starting points lead via subspace minimisation and Newton steps to the separate posterior modes.)
The remaining candidate points then undergo a set of Newton updates (about 10) for the full model. Subsequently, any modes that look wildly implausible are eliminated 11. If a mode is acceptable, the Newton-updated mean and covariance are stored, along with a measure of the mode weight such as the likelihood at the peak of the mode, and the Laplace–estimator weight, equation (60), described in Appendix 4. Diagnostic routines capable of drawing graphs of the log posterior centered at the mode are useful in checking the character of the mode and the adequacy of the Gaussian approximation.
In cases where the seismic likelihood value obtained after these local optimi-
sations looks poor (or for other reasons), a global optimisation step may be
initiated operating on the subvector of layer times, with non-time properties
fixed at their prior+isopach means. The initiation conditions for this step are
11 The remaining freedom in the frozen parameters is unlikely to redeem a bad loop–skip.
designed to detect trapping of the BFGS/Newton optimisers in poor local so-
lutions. The globally optimal t values are then used as starting points for the
Newton optimiser. The global optimiser uses the Differential Evolution genetic
algorithm of Storn and Price (1997).
4.3 Sampling
The overall Markov Chain for the sampling is generated by a kernel that flips rapidly between two kinds of proposals: (a) jumps to different models, and (b) jumps within the current model. Currently the scheme chooses one of these moves with 50% probability at each step, and performs a within–model jump or model–jump according to the schemes described below. This is the mixture hybrid–kernel of section 2.3 in Brooks (1998). Special “moves” for unusual situations can also be devised, and are generally safe to use provided they are drawn independently of the current state of the chain.
Hence, for random walks within a fixed model, we use the RWM sampler, where the covariance used is that produced by the sequence of Newton updates at
the mode. The initial state is taken at the mode peak. Jumps that produce an
unphysical state are treated as having zero likelihood. The acceptance proba-
bility for a jump mold → mnew is the usual rule
α = min(1, Π(m_new|S)/Π(m_old|S)).   (40)
Note that the actual proposal probability q does not need to be evaluated, but
the appropriate scaling for it is crucial.
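A random-walk Metropolis kernel of this kind is, in sketch (the names are ours; the proposal covariance would come from the Newton updates at the mode):

```python
import numpy as np

def rwm_chain(log_post, m0, c_prop, n_steps, seed=0):
    """Random-walk Metropolis within a fixed model: Gaussian proposals
    built from a mode covariance, accepted with rule (40); unphysical
    states should return -inf from log_post (zero likelihood)."""
    rng = np.random.default_rng(seed)
    d = m0.size
    scale = 2.38 / np.sqrt(d)          # a standard RWM proposal scaling
    L = np.linalg.cholesky(c_prop)
    m, lp = m0.copy(), log_post(m0)
    chain = []
    for _ in range(n_steps):
        m_new = m + scale * (L @ rng.standard_normal(d))
        lp_new = log_post(m_new)
        if np.log(rng.uniform()) < lp_new - lp:   # rule (40) in log form
            m, lp = m_new, lp_new
        chain.append(m.copy())
    return np.asarray(chain)
```

Because the acceptance ratio involves only the posterior density at the two states, the symmetric proposal density cancels and never needs to be evaluated, exactly as the text notes; only its scaling matters for good mixing.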
where m′21 is drawn from a suitable proposal distribution q12 (m′21 ), which we
usually take to be the prior for those components, since the fluid parameters
are uncorrelated with the rest of the model in the prior:
q_12(m′_21) = ∏_{new fluid component parameters j in model} p_j(m′_21,j),   (42)
where π(·) is the full posterior density of equation (28), including the dimension–
varying terms in the prior.
In the thought–experiment limit that the likelihood functions are trivially unity
(we “switch” them off) this jumping scheme will result in certain acceptance,
as the qij (·) densities will cancel exactly with corresponding terms in the prior.
The likelihoods are expected to be moderately weak functions of the fluid prop-
erties, so this seems a reasonable choice. Further, in many practical scenarios,
the fluid constants may be sufficiently well known that they are fixed for the
inversion, in which case the models share all the parameters, and the q terms
disappear from equation (43).
Mixing has generally been found to be very good when jumps are performed
between various fluid states using this scheme. When multimodality in the
posterior due to loop–skipping occurs, mixing can be more problematic, and
the posterior samples should be checked carefully to ensure mixing is adequate.
Various options for detecting and auto–correcting for this problem, using dec-
imation estimates computed from the time–series in the Markov chain (Ch. 7,
Gelman et al. (1995)), have been added to the code.
This brief description has not elaborated on the machinery to deal with multiple local modes occurring for specific fluid states, and how to do the sampling in this context. This detail can be found in Appendix 6.
5 The software
The code is written in Java, which we have found to be as fast as C on the platforms we use.
In practice, the inversion code will likely be run in an operating system that
allows piping of input and output data in the form of SU streams, and the
attractive possibility of clustering the calculation means that it will likely be
run on some flavour of unix or Linux.
Some explanation of the files required for input/output and usage is given in
Appendix 8. The inversion is chiefly driven by an XML file that specifies the
necessary rock–physics, layer descriptions, and information about the seismic.
A quality schema–driven GUI editor expressly developed for the construction
of this XML file is also available at the website: further details are in the
Appendix.
The code is available for download (Gunning, 2003), under a generic open–
source agreement. Improvements to the code are welcome to be submitted to
the author. A set of simple examples is available in the distribution.
6 Examples
A simple but useful test problem is one where a wedge of sand is pinched out by
surrounding shale layers. This is modelled as a three layer problem, where we
expect the inversion to force pinching–out in the region where no sand exists,
if the signal–to–noise ratio is favourable and the sand–shale acoustic contrast
is adequate.
In this example, the prior time parameters are independent of the trace, so
there is no information in the prior to help detect the wedge shape. The inter-
nal layer times have considerable prior uncertainty (20ms, 50ms respectively).
Figure 5 illustrates pre and post inversion stochastic samples of the layers,
displayed with many realisations per trace. Here the noise strength is about
1/4 of the peak reflection response when the wedge is thick (i.e. away from
tuning effects).
This test problem shows how the inverter can readily unscramble tuning effects
Fig. 5. (a) Truth case wedge model, (b) truth case seismic, (d) pre–inversion stochastic samples (i.e. drawn from prior distribution only), (e) post–inversion samples (100 per trace), (c) p10, p50, p90 quantiles for the layer times of the wedge, and (f) p10, p50, p90 quantiles of the wedge thickness. Note how the wedge layer time uncertainty increases to the prior when the wedge pinches out, as there is no longer a significant reflection event to facilitate the detection of this horizon: the inverter does not care where the horizon is, as long as it pinches out. The wedge thickness (zero) is still well constrained though.
Another useful test problem is where a slab of shaly sand is gradually made
cleaner from left to right across a seismic section, and embedded in the mixing
shale. As the net–to–gross increases, the reflection strength improves, and the
posterior distribution of the net–to–gross and layer thickness is of interest.
In this example, the only parameter that varies areally is the mean net–to–gross: this is fixed to be the same as the “truth–case” model, but has a (very
broad) prior standard deviation of σNG = 0.3. Figure 6 illustrates pre and post
inversion estimates of the net–to–gross distribution parameters, superposed on
the truth case. This example is also used to demonstrate the convergence of the
MCMC sampler, for which we show some characteristic plots of the random
walk for the inversion on trace 8. Here the prior was loosened considerably to
NG ∼ N(0.45, 0.7²). The chains are always started at the maximum likelihood point, so ‘burn–in’ times are generally very short.
This test problem shows that the inverter produces a relatively unbiased inver-
sion for the net–to–gross, but subject to increasing uncertainty as the reflecting
layer dims out.
Fig. 6. (a) Truth case seismic for net–to–gross (NG) wedge model, (b) pre–inversion (prior) spread of NG as shown by p10, p50, p90 quantiles; truth–case shown dashed, (c) as per (b), but post–inversion. Inset (d) shows the near–stack reflection coefficient variations, and (e) the associated samples of NG. The signal–to–noise ratio here is very favourable, with noise 1/10th of the peak seismic response. The residual uncertainty in the NG is due to the fact that there is also uncertainty in the layer velocity variations. The seismic constrains primarily the reflection coefficient, but the system indeterminacy will still allow appreciable variation in the NG. Convergence & mixing of the MCMC scheme is illustrated for a highly non–informative prior NG ∼ N(0.45, 0.7²) on trace 8: (f) trace plot of log–likelihood, with early iterations magnified, (g) trajectory of Markov chain walk for the NG of layer 2 vs. its reflection coefficient at the top.
6.3 Simple single–trace fluid detection problem
In this simple test a slab of soft sand surrounded by shale layers has oil present,
and the prior model is set to be uninformative with respect to fluid type (so
the reservoir has 50% prior chance of brine, 50% oil). The rest of the story
is told in the caption of figure 7. In summary, the inverter will detect fluid
contrasts reliably if the signal quality is sufficient, the rocks sufficiently “soft”
and porous, and the wavelet well calibrated.
Fig. 7. Three “spaghetti” graphs showing synthetic seismic traces drawn from (a) the prior models with brine in the reservoir, (b) prior models with oil, and (c) the posterior distribution, which puts the oil probability at ≈ 97%. The noise level is 0.01. The “truth case” trace is shown in gray.
We expect in the near future to publish several extended papers focussed on the
inversion of some real field data using Delivery and the workflow of developing
the prior. For the moment, we will confine our discussion to a simple test for an
actual field, shown in figure 8. This is a ’test’ inversion at the well location. An
8–layer model for a set of stacked turbidite sands has been built with proven
hydrocarbons in the second–bottom layer. The sands are quite clean and have
high porosities (≈ 30%), so the effects of Gassman substitution are very strong
in the reservoir layers. The layers are constructed from log analysis, but their
boundaries are set to have a broad prior uncertainty around the 10-15ms range.
Low net–to–gross layers (NG < 0.1) are often set as pure shales (NG = 0) to keep the model dimensionality down. The prior for fluid type (brine:oil) is set as 50:50, as in the previous example.
The inversion at the well location not only confirms the presence of oil (>
80% probability) but also demonstrates that the posterior distribution of layer
Fig. 8. Synthetic seismic traces overlaid by the observed seismic trace at a particular
location in the survey. (a) Sample models drawn from the prior with brine filling
the reservoir layer. (b) The same, but with oil in the reservoir layer. (c) Traces
computed from models drawn from the posterior distribution, conditional on oil
present in the reservoir layer. (d) Histogram of reservoir thickness drawn from prior
model and after seismic inversion (posterior). Pinchouts and very thin reserves are
obviously precluded. The posterior distribution of thickness is also consistent with
the observed thickness.
7 Conclusions
The inverter is driven from a combination of XML files and SU/BHP SU data,
and outputs are in SU/BHP SU form. The SU backbone means the inverter
interfaces nicely with other free seismic software, such as the BHP viewer
and BHP’s SU extensions. We believe the “small–is–beautiful” philosophy
associated with backbone designs improves the flexibility and maintainability
of the software.
The authors hope that this tool will prove useful to reservoir modellers working
with the problem of seismic data integration, and encourage users to help
improve the software or submit suggestions for improvements. We hope that
the newer ideas on probabilistic model–comparison and sampling (vis–à–vis
the petroleum community) prove useful and applicable to related problems in
uncertainty and risk management.
Acknowledgement
This work has been supported by the BHP Billiton Technology Program.
References
Abrahamsen, P., et al., 1997. Seismic impedance and porosity: support effects.
In: Geostatistics Wollongong ’96. Kluwer Academic, pp. 489–500.
Abramowitz, M., Stegun, I. A., 1965. Handbook of Mathematical Functions.
Dover.
Andrieu, C. A., Djurić, P. M., Doucet, A., 2001. Model selection by MCMC
computation. Signal Processing 81, 19–37.
Brooks, S. P., 1998. Markov chain Monte Carlo and its application. The Statis-
tician 47, 69–100.
Buland, A., Kolbjornsen, A., Omre, H., 2003. Rapid spatially coupled AVO inversion in the Fourier domain. Geophysics 68 (3), 824–836.
Buland, A., Omre, H., 2000. Bayesian AVO inversion. Abstracts, 62nd EAGE Conference and Technical Exhibition. See extended paper at http://www.math.ntnu.no/~omre/BuOmre00Bpaper.ps.
Castagna, J. P., Backus, M. M. (Eds.), 1993. Offset-dependent reflectivity –
Theory and practice of AVO Analysis. Society of Exploration Geophysicists.
Chu, L., Reynolds, A. C., Oliver, D. S., Nov. 1995. Reservoir Description from
Static and Well–Test Data using Efficient Gradient Methods. In: Proceed-
ings of the international meeting on Petroleum Engineering, Beijing, China.
Society of Petroleum Engineers, Richardson, Texas, paper SPE 29999.
Cohen, J. K., Stockwell, Jr., J., 1998. CWP/SU: Seismic Unix Release 35: a free package for seismic research and processing. Center for Wave Phenomena, Colorado School of Mines, http://timna.mines.edu/cwpcodes.
Denison, D. G. T., et al., 2002. Bayesian Methods for Nonlinear Classification
and Regression. Wiley.
Deutsch, C. V., 2002. Geostatistical Reservoir Modelling. Oxford University
Press.
Eide, A. L., 1997. Stochastic reservoir characterization conditioned on seismic
data. In: Geostatistics Wollongong ’96. Kluwer Academic.
Eide, A. L., Omre, H., Ursin, B., 2002. Prediction of reservoir variables based
on seismic data and well observations. Journal of the American Statistical
Association 97 (457), 18–28.
Eidsvik, J., et al., 2002. Seismic reservoir prediction using Bayesian integra-
tion of rock physics and Markov random fields: A North Sea example. The
Leading Edge 21 (3), 290–294.
Gelman, A., Carlin, J. B., Stern, H. S., Rubin, D. B., 1995. Bayesian Data
Analysis. Chapman and Hall.
Gilks, W., Richardson, S., Spiegelhalter, D., 1996. Markov Chain Monte Carlo
in Practice. Chapman and Hall.
Gunning, J., 2000. Constraining random field models to seismic data: getting
the scale and the physics right. In: ECMOR VII: Proceedings, 7th European
conference on the mathematics of oil recovery, Baveno, Italy.
Gunning, J., 2003. Delivery website: http://www.csiro.au/products/Delivery.html,
accessed 2 Jan 2010.
Gunning, J., Glinsky, M., 2004. Delivery: an open-source model-based Bayesian
seismic inversion program. Computers and Geosciences 30 (6), 619–636.
Gunning, J., Glinsky, M., 2006. WaveletExtractor: A Bayesian well–tie and
wavelet extraction program. Computers and Geosciences 32, 681–695.
Huage, R., Skorstad, A., Holden, L., 1998. Conditioning a fluvial reservoir on
stacked seismic amplitudes. In: Proc. 4th annual conference of the IAMG.
http://www.nr.no/research/sand/articles.html.
Kelley, C. T., 1987. Iterative Methods for Optimization. SIAM.
Koontz, R. S. J., Weiss, B., 1982. A modular system of algorithms for uncon-
strained minimization. Tech. rep., Comp. Sci. Dept., University of Colorado
at Boulder.
Leguijt, J., 2001. A promising approach to subsurface information integration.
In: Extended abstracts, 63rd EAGE Conference and Technical Exhibition.
MacKay, D. J. C., June 2002. Information Theory, Inference & Learning Algorithms, 1st Edition. Cambridge University Press.
Malinverno, A., 2002. Parsimonious Bayesian Markov chain Monte Carlo inversion in a nonlinear geophysical problem. Geophysical Journal International 151, 675–688.
Mavko, G., Mukerji, T., Dvorkin, J., 1998. The Rock Physics Handbook. Cam-
bridge University Press.
Oliver, D. S., 1996. On conditional simulation to inaccurate data. Mathemat-
ical Geology 28, 811–817.
Oliver, D. S., 1997. Markov Chain Monte Carlo Methods for Conditioning a
Permeability Field to Pressure Data. Mathematical Geology 29, 61–91.
Omre, H., Tjelmeland, H., 1997. Petroleum Geostatistics. In: Geostatistics
Wollongong ’96. Kluwer Academic.
Phillips, D. B., Smith, A. F. M., 1996. Bayesian model comparison via jump
diffusions. In: Markov Chain Monte Carlo in Practice. Chapman and Hall.
Raftery, A. E., 1996. Hypothesis testing and model selection. In: Markov Chain
Monte Carlo in Practice. Chapman and Hall.
Storn, R., Price, K., 1997. Differential evolution – a simple and efficient heuris-
tic for global optimisation over continuous spaces. Journal of Global Opti-
mization 11, 341–359.
Tarantola, A., 1987. Inverse Problem Theory, Methods for Data Fitting and
Model Parameter Estimation. Elsevier Science Publishers, Amsterdam.
Appendices
Here we illustrate how the effective rock properties for a single layer might be
computed from a full suite of fundamental model parameters (equation (1))
for the layer.
Kfl = ( Sh/Kh + (1 − Sh)/Kb )^(−1)
• Using literature values of Kg, compute Ksat,fl using equation (13). This comes out to be

Ksat,fl = Kg / [ 1 + ( (1/φs)( 1/(Kg/Kfl − 1) − 1/(Kg/Kb − 1) ) + 1/(Kg/Ksat,b − 1) )^(−1) ]   (49)
This equation has regions of parameter space with no positive solution (of-
ten associated with, say, small φ). The forward model must flag any set of
parameters that yield a negative Ksat,fl and furnish a means of coping sanely
with such exceptions. The same remark applies to equations (47) and (52)
below.
• Compute the new effective density
• Compute the impermeable–rock moduli

μm = ρm vs,m²   (50)
Mm = ρm vp,m²   (51)
Km = Mm − (4/3) μm   (52)
• Mix the fl–substituted permeable rock with the shale properties using the
rock–mixing Backus averaging formula (9). Compute also the mixed density
from (10).
• From the mixed rock, back out the velocities vp,eff = (Meff/ρeff)^(1/2) and vs,eff = (µeff/ρeff)^(1/2).
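The fluid–substitution and mixing steps above can be sketched as follows. Equation (49) is coded directly, with the unphysical (non–positive) solutions flagged as the text requires; the mixing step assumes a harmonic (Backus) average for the moduli and an arithmetic average for density, which is our reading of equations (9)–(10) (not reproduced here), so treat the whole block as illustrative rather than as Delivery's internals.

```python
def gassmann_ksat_fl(Kg, Kfl, Kb, Ksat_b, phi_s):
    """Fluid-substituted saturated modulus via the rearranged Gassmann
    relation of eqn (49). Returns None where the parameter set admits no
    positive solution, which the forward model must flag."""
    try:
        x = ((1.0 / phi_s) * (1.0 / (Kg / Kfl - 1.0) - 1.0 / (Kg / Kb - 1.0))
             + 1.0 / (Kg / Ksat_b - 1.0))
        K = Kg / (1.0 + 1.0 / x)
    except ZeroDivisionError:
        return None
    return K if 0.0 < K < Kg else None

def effective_layer_properties(NG, M_s, mu_s, rho_s, M_m, mu_m, rho_m):
    """Mix the fluid-substituted permeable rock (subscript s) with the
    impermeable rock (subscript m) at net-to-gross NG: harmonic (Backus)
    average for the moduli, arithmetic for density, then back out
    vp,eff and vs,eff. Assumed mixing rule; symbol names are ours."""
    M_eff = 1.0 / (NG / M_s + (1.0 - NG) / M_m)     # P-wave modulus
    mu_eff = 1.0 / (NG / mu_s + (1.0 - NG) / mu_m)  # shear modulus
    rho_eff = NG * rho_s + (1.0 - NG) * rho_m
    return (M_eff / rho_eff) ** 0.5, (mu_eff / rho_eff) ** 0.5, rho_eff
```

A useful sanity check of the Gassmann step is that substituting the in–situ fluid back into itself must return the saturated modulus unchanged.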
The sampling rate in equation (53) is about 2 times the Nyquist rate associated
with the peak frequency. If the error is acquired with a faster sampling rate
than this, adjacent error samples will be strongly correlated and the error
term will become large. Since the error covariance is unknown, and bearing
in mind that we seek an error measure corresponding to a maximum number
of “independent” data points, we use the rate given by equation (53) and
model the error points as i.i.d. values. The peak frequency fpeak is estimated
at initialisation from the FFT spectrum of the wavelet.
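A sketch of this error–sampling choice follows, under the reading that equation (53) (not reproduced here) sets the error sampling rate to twice the Nyquist rate 2·fpeak, i.e. an interval of 1/(4·fpeak). The plain–Python DFT is for illustration only; the code uses an FFT.

```python
import cmath
import math

def peak_frequency(wavelet, dt):
    """Peak frequency (Hz) from the DFT magnitude spectrum of a sampled
    wavelet with sample interval dt seconds. O(n^2) DFT, for illustration."""
    n = len(wavelet)
    best_k, best_amp = 1, 0.0
    for k in range(1, n // 2):              # skip DC, stop at Nyquist
        coeff = sum(w * cmath.exp(-2j * cmath.pi * k * i / n)
                    for i, w in enumerate(wavelet))
        if abs(coeff) > best_amp:
            best_k, best_amp = k, abs(coeff)
    return best_k / (n * dt)

def error_sampling_interval(wavelet, dt, factor=2.0):
    """Interval at which seismic-mismatch samples are treated as i.i.d.:
    `factor` times the Nyquist rate 2*f_peak, i.e. an interval of
    1/(factor * 2 * f_peak). Our reading of eqn (53)."""
    return 1.0 / (factor * 2.0 * peak_frequency(wavelet, dt))
```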
The fundamental variables associated with multi–modality are the layer times,
in that the posterior likelihood surface is frequently oscillatory in the layer
times. Convergence to a secondary or minor solution is usually called loop–
skipping, and may or may not be desirable. In situations where loop skipping
may be permissible or even desirable, the prior uncertainty on the layer times
is usually set to be quite broad, as these secondary solutions may be feasible.
Convergence to these solutions can then be prompted by setting the initial layer
times fed to the minimisation routine at values likely to be within the basin
of attraction of the desired solution. The heuristic method used for defining these starting points is as follows:
(3) Now, if σ∆ti /∆ti > 0.5, this amounts to a ≈ 5% chance of the layer
pinching out (∆ti < 0) before taking into account the seismic, so this is
likely enough to warrant investigation. The starting points
are then used as suitable starting times for the times ti , ti+1 . The remain-
ing times tj , j 6= i, i + 1 are set at the posterior means tj = m̃j . If more
than one layer is potentially pinched out, we form a set of starting points
by looping over the set of potentially–pinched–out layers, and setting only
one of these layers at a time to the pinched–out starting configuration just
described, with remaining times at the values in m̃.
Appendix 4: An Independence Sampler
or leave the state of the chain unchanged. This method has had somewhat mixed success – it is perfect when drawing from the prior (since the prior is multi–Gaussian), but is susceptible to being trapped in particularly favourable states for long periods when the posterior departs sharply from Gaussianity. Further, any thick tails in the posterior density are not likely to be explored by the proposal density, since it has compact Gaussian tails. The proposal overdispersion recommended by Gelman (Gelman et al., 1995, Ch. 11) does not work well, since the dimensionality is so high that even mildly overdispersed t–distributions dramatically reduce the acceptance rates, even when sampling from the prior.
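The independence sampler itself is standard; a one–dimensional sketch with a fixed Gaussian proposal (standing in for the multi–Gaussian Laplace approximation used by the code) is:

```python
import math
import random

def independence_sampler(log_post, mu, sigma, n_steps, seed=0):
    """Independence sampler with a fixed Gaussian proposal centred on the
    posterior mode. 1-D sketch of the Appendix-4 scheme; names are ours."""
    rng = random.Random(seed)
    log_q = lambda x: -0.5 * ((x - mu) / sigma) ** 2
    x = mu
    chain = []
    for _ in range(n_steps):
        x_new = rng.gauss(mu, sigma)
        # independence-sampler acceptance: posterior ratio times q(x)/q(x_new)
        log_alpha = (log_post(x_new) - log_post(x)
                     + log_q(x) - log_q(x_new))
        if rng.random() < math.exp(min(0.0, log_alpha)):
            x = x_new               # accept; otherwise the chain is stuck
        chain.append(x)
    return chain
```

The trapping behaviour described above appears as long runs of repeated values in the chain whenever `log_post` departs sharply from the Gaussian `log_q`.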
Appendix 5: Modified SU trace formats for properties
The local prior information for the inversion at each trace is communicated
to the inversion code by a SU data set with a set of traces on a pattern and
ordering that matches the SU seismic data set exactly. The XML file Modelparameters.xml which sets up the inversion will specify which prior quantities vary on a trace–by–trace basis, and the block indices k at which they are located.
Suppose there are Np properties which vary, and there are Nl layers. The value for property k in layer l is then the (l + (k − 1)Nl)-th float value in the current trace of prior data. There are some special rules as well:
• The layer times must always be present in the prior trace, and they must always be the last block (k = Nk). Further, one additional float value is appended for the base time of the last layer (tbase).
• All values are expected to be positive, and are converted to positive numbers if not. The value −999.0 signifies a “bad value”, and will be ignored (the default prior will then apply).
• All times are in milliseconds.
• Layers can be effectively suppressed from the model by providing the isopach constraint ∆Z = N(0, 0) (see eqn (27)), either from the XML or the model traces file for some subset of traces. If this is supplied, the layer is deemed inactive. This is advantageous when a layer is known to permanently pinch out in some region. Any model reduction helps in speeding up the inversion.
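The flattened trace layout described above is easy to index; a minimal sketch (the function name is ours, and the trace is represented as a plain float list):

```python
def property_value(floats, n_layers, prop_k, layer_l):
    """Pick property k (1-based block index) in layer l (1-based layer
    index) from a flattened prior-model trace: the value sits at position
    l + (k-1)*Nl, counting from 1. Returns None for the -999.0 'bad
    value' sentinel, for which the default prior applies."""
    idx = (layer_l + (prop_k - 1) * n_layers) - 1   # convert to 0-based
    v = floats[idx]
    return None if v == -999.0 else v
```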
Delivery will write stochastic samples from the posterior to the requested
output file in a very similar format to the model–prior traces. A sequence of
blocks is written, each of size Nl , and the last block of Nl + 1 floats is either
time or depth, as specified by the obvious entry in the ModelDescription.xml
file. The names of the properties can be written to a single–line ascii file by
supplying -BHPcommand filename to the delivery command line.
Similarly, the deliveryAnalyser program can collate the large statistical ensem-
ble produced by the delivery output into a compact set of traces representing
either means, standard deviations or quantiles of all the salient quantities of
interest. The set of property names of the blocks in these summary traces is
obtained by adding -BHPcommand filename to the deliveryAnalyser com-
mand line.
Appendix 6: Handling multiple modes and multiple fluid states (mod-
els) in the MCMC sampling schemes
wj ∼ Π(j|S)p̂j (62)
Thus, for example, a typical mode–jumping scheme which operates when there are multiple modes (Nm > 1), is
(1) From the current mode j, propose a new mode j ′ with probability Q(j ′ |j) ∼
wj ′ (1 − δ(j, j ′ )) (this forces a jump out of the existing mode).
(2) Perform the mappings and draw any new (non–common) model parame-
ters as per the algorithms in section 4.3.2.
(3) Augmenting equation (43), and with the same model–vector partitioning
definitions surrounding that equation, the acceptance probability is then
The full posterior π(·) is evaluated as per equation (29), i.e. using the
fluid–state probabilities.
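Step (1), the forced jump out of the current mode with probability proportional to the weights wj of equation (62), can be sketched as:

```python
import random

def propose_mode(weights, j_cur, rng):
    """Propose a new mode j' with probability Q(j'|j) proportional to
    w_{j'} * (1 - delta(j, j')): the current mode gets zero weight, which
    forces a jump. `weights` need not be normalised. Sketch only."""
    probs = [w if j != j_cur else 0.0 for j, w in enumerate(weights)]
    total = sum(probs)
    u = rng.random() * total
    acc = 0.0
    for j, p in enumerate(probs):
        acc += p
        if u < acc:
            return j
    return len(probs) - 1   # guard against floating-point shortfall
```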
Wavelets are expected to be single trace SU files, with the fields ns, dt and
f1 set to reflect the number of samples, sampling interval, and starting time
of the first sample (beginning of precursor). The ns and f1 fields are set so
as to make the wavelet as compact as possible with respect to the tapering,
so excessive samples are discouraged, but the tapering must also be smooth
to the first and last sample. Wavelets generated by the WaveletExtractor
code (Gunning and Glinsky, 2006) distributed with Delivery are automatically
properly initialized.
Appendix 8: Usage of the code
8.1 Inputs
In distributed systems, the latter two items may refer to named pipes on a
unix/Linux system.
The inversion is chiefly driven by an XML file that specifies the necessary rock–
physics, layer descriptions, and information about the seismic. A well formed
XML file is crucial to this end, so the file can be created using a schema–
driven GUI editor expressly built for the purpose, which is also available at the
website. “Help” documentation for the inverter is available through standard
menus. The editor is also capable of creating, running and controlling complex
scripts on distributed systems, using a schema for common unix, SU, BHP SU
and delivery commands.
The structure of the XML document is controlled by the schema, and the
meaning of the various entries is readily inferred by reference to the examples
in the distribution.
Some particular entries in the XML file are worth further explanation:
• The attribute “varies areally” may be set to true or false for a certain property. If true, the quantity in question will vary on a trace–by–trace basis, and
is expected to be read in from the SU model file. The “<property indices>”
block then will contain an entry “property link” which defines the block of
the SU model file containing the property (see Appendix 5).
• Various properties may contain the attribute “same as layer above”. This
is set to true when the property in question should be a replica of the
same property in the layer above. For example, the low–frequency–interval–
velocity (LFIV) is usually a coarse scale measure representing an average
across many layers, so it is best fixed at the same value (or mapped to the
same stochastic variable) for all of the relevant layers. Likewise, the “depth”
input is an approximation used to sample from the trend curves, and may
safely be set to the same value for many adjacent thin layers. Prudent use
of this attribute helps to reduce the dimensionality of the final model.
• Each “<layer>” entry may have the attribute “synchronise rock matrix” set
to true. This is a flag to indicate that any stochastic rock properties of a given
rock type in this layer must be matched to those of the matching rock type
in the layer above (for example, the vp , vs and φs of the permeable member
in the two layers is mapped to the same underlying stochastic variable By
this means, two adjacent layers with matching end–member rocks are made
acoustically indistinguishable but for fluid property differences. Any internal
reflections are then due to fluid contrasts, and this feature can then be used
to make the layer boundaries track a fluid interface.
• Linear trend curves are of form vs = slope × vp + intercept ± sigma. The Gardner–Gardner–Gregory regression curves for impermeable rocks are of form ρ = factor × vp^exponent ± sigma.
• The “<output>” block has the following useful flags.
· “<density ordering>” is set to one of full/partial/none, with the meanings
as described in section 2.2.2.
· “<realisation number header field>” denotes the SU header word used to store the realisation number.
· “<master depth layer number>” denotes the layer number from which
depths of the other layers will be computed using summations of layer
thicknesses (vp,eff × one–way time thickness). The actual depth of the mas-
ter layer is the depth variable d in the model vector m.
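The depth–summation rule in the last bullet can be sketched as follows, assuming the layer–top times are two–way times in seconds (so a one–way time thickness is ∆t/2, and layer thickness is vp,eff × ∆t/2). The names and the two–way convention are ours, not the code's.

```python
def layer_depths(d_master, master, vp_eff, t_top):
    """Depths of all layer tops from the master-layer depth d_master by
    summing layer thicknesses vp_eff[i] * one-way time thickness.

    vp_eff: effective velocity per layer; t_top: two-way times of the
    layer tops plus the final base time (len == len(vp_eff) + 1)."""
    n = len(vp_eff)
    # thickness of layer i: vp_eff[i] * (two-way interval)/2
    thick = [vp_eff[i] * (t_top[i + 1] - t_top[i]) / 2.0 for i in range(n)]
    depths = [0.0] * n
    depths[master] = d_master
    for i in range(master + 1, n):          # accumulate downwards
        depths[i] = depths[i - 1] + thick[i - 1]
    for i in range(master - 1, -1, -1):     # accumulate upwards
        depths[i] = depths[i + 1] - thick[i]
    return depths
```

Choosing a different master layer merely re–anchors the same thickness sums, so any one well–constrained depth pins the whole column.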
8.2 Outputs
46
salient quantities of interest include

χ²seismic,normalised ≡ χ²seismic / Nerror–sampling–points   (67)
Like delivery and the SU programs, the program has self documentation, but some typical analysis of a suite of realisations realisation.su might be
Appendix 9: Incorporation of Floating Grain models into Delivery
The effects of floating grains in the rock matrix can be taken into account
using regressions curves of the form
φ = −φflt/(1 − fc) + aφ + bφ Zeff + ǫφ   (68)
vp = ap + bp (φflt + (1 + g)φ) + ǫp   (69)
where fc is the capture fraction (volume of second material taken into supporting matrix divided by total secondary material). The effective stress dependence appears via some well–chosen function Zeff, chosen in the light of the available data, which may be, e.g., of form Zeff = (1 − exp(−P/P0)). The ǫ terms are noise terms.
vp(φ, φflt) = { (Kg / (ρg + (ρb − ρg)φ)) ( β(φ, φflt)·3(1 − νm)/(1 + νm) + (1 − β(φ, φflt))² / ((φ + φflt)((Kg/Kf(φ, φflt)) − 1) + 1 − β(φ, φflt)) ) }^(1/2)   (70)
about a mean porosity φ̄ and φflt = 0, which arises from the following strong
assumptions. The grains are assumed to have the same properties as the matrix
(same material – subscript g), and variations in the Poisson’s ratio νm with
φ and φflt are neglected. We model the rock beta behaviour as a function of
structural porosity φs = φ + φflt as
and the ’effective fluid’ modulus by the Reuss average of floating grains and brine moduli

Kf(φ, φflt) = ( (φ/Kb + φflt/Kg) / (φ + φflt) )^(−1).   (72)
The parameters λ, φ̂0 can be obtained by a nonlinear regression of the clean-
rock {vp , φ} data using φflt = 0 in the above formulation (equation 70), using
known values for Kg , ρg , ρb etc. The critical porosity φ̂0 is very likely to be in
the range 0.4 − 0.45. If a value for Poisson’s ratio representative of the average
porosity is chosen (e.g. νm = 0.15), the dominant unknown in equation (70)
is λ, which can be simply regressed for. Note that using an assumed value for νm (e.g. νm = 0.15) appropriate for the average data porosity will not yield the correct pure quartz velocity vp(0, 0), since a pure quartz Poisson’s ratio is closer to 0.07.
The slope and intercept terms in eqn (69) emerge naturally from the lineari-
sation of the nonlinear fit to (68) of clean rock data.
We define (1 + g) by

(1 + g) ≡ ( ∂vp(φ, φflt)/∂φ ) / ( ∂vp(φ, φflt)/∂φflt ), evaluated at φ = φ̄, φflt = 0,   (73)
and sufficiently accurate numerical values for the derivatives are most simply
obtained by finite differences.
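A minimal sketch of this finite–difference evaluation of (1 + g), with a hypothetical vp(φ, φflt) callable standing in for the nonlinear model of equation (70):

```python
def one_plus_g(vp, phi_bar, h=1e-5):
    """(1+g) from eqn (73) by central finite differences of vp(phi, phi_flt)
    at phi = phi_bar, phi_flt = 0. `vp` is any callable vp(phi, phi_flt)."""
    dv_dphi = (vp(phi_bar + h, 0.0) - vp(phi_bar - h, 0.0)) / (2.0 * h)
    dv_dflt = (vp(phi_bar, h) - vp(phi_bar, -h)) / (2.0 * h)
    return dv_dphi / dv_dflt
```

On the linearised form (69) itself, where ∂vp/∂φ = bp(1 + g) and ∂vp/∂φflt = bp, this ratio recovers (1 + g) exactly, which is a handy consistency check.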
• Assume Kg , Kb , ρg , ρb , νm
• Compute the mean porosity φ̄
• Fit the vp data for the non-linear parameters λ, φ̂0
• Compute the bp , bp (1 + g) coefficients of eqn (69) from the derivative expres-
sions in the Appendix at the mean porosity. Then ap = vp (φ̄, 0) − bp (1 + g)φ̄.
• For this clean rock data, there are no φflt terms in (68) and (69), so fit the loading–behaviour data to get the coefficients aφ, bφ and residual RMS σǫφ = √⟨ǫφ²⟩.
• Fit the linearised velocity trend data to get the coefficients ap, bp, and residual RMS σǫp = √⟨ǫp²⟩.
Now, for each group of float–polluted well data 16 (i.e. which can be modelled with a common effective φflt):
16 Regressions for the data with floating grains can be performed using EM (Expectation–Maximisation) techniques, a possible future development.
φ = a′φ + bφ Zeff + ǫφ (74)
vp = a′p + bp (1 + g)φ + ǫp (75)
A simple check of the theory is to confirm the new regressions have the same
slope as the clean rock data.
will be used, using the existing notation in Delivery documents. The shear
relation is unchanged. The pressure dependence will be effected by taking the
loading depth d → Zeff (e.g. (1 − exp(−P/P0 ))) computed from e.g. basin
modelling (use the depth rock curves entry in the XML, to distinguish from
model depth). The LFIV will be dropped by setting Cp = 0. Approximate
conversion to this Delivery–style form of the coupled regression can be obtained
with
Ap = ap + bp(1 + g)aφ   (80)
Bp = bp bφ (1 + g)   (81)
Dp = bp (1 − (1 + g)/(1 − fc))   (82)
Aφ = −ap / (bp(1 + g))   (83)
Bφ = 1 / (bp(1 + g))   (84)
Cφ = −1 / (1 + g)   (85)
σ′ǫp = ( σǫp² + (bp(1 + g))² σǫφ² )^(1/2)   (86)
σ′ǫφ = σǫp / (bp(1 + g))   (87)
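Equations (80)–(87) are mechanical to apply once the regression coefficients are in hand; a sketch (the function name and argument names are ours, not Delivery's):

```python
import math

def delivery_coefficients(a_p, b_p, a_phi, b_phi, g, f_c,
                          sigma_p, sigma_phi):
    """Convert the floating-grain regression coefficients of eqns (68)-(69)
    into the Delivery-style coupled-trend coefficients of eqns (80)-(87).
    sigma_p, sigma_phi are the residual RMS values of the two fits."""
    bp1g = b_p * (1.0 + g)
    return {
        "Ap": a_p + bp1g * a_phi,                        # (80)
        "Bp": b_p * b_phi * (1.0 + g),                   # (81)
        "Dp": b_p * (1.0 - (1.0 + g) / (1.0 - f_c)),     # (82)
        "Aphi": -a_p / bp1g,                             # (83)
        "Bphi": 1.0 / bp1g,                              # (84)
        "Cphi": -1.0 / (1.0 + g),                        # (85)
        "sigma_p_prime":                                 # (86)
            math.sqrt(sigma_p ** 2 + bp1g ** 2 * sigma_phi ** 2),
        "sigma_phi_prime": sigma_p / bp1g,               # (87)
    }
```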
Numerical Example
We use the loading depth term Zeff = (1 − exp(−P/800PSI)) for the loading
regressions, which is input to Delivery as the parameter depth rock curves.
At a particular location of interest, Zeff = 0.976 at P = 3000 PSI, roughly
around the depth 11000ft.
9.4.1 Shales
The loading behaviour for non–reservoir rocks (shales) has also to be converted
to use the Zeff variable. For simplicity, we took typical velocities from the
existing depth–based trends, then set Bp = 0 to yield
9.4.2 Sands
From the estimated noise RMS values σǫp = 750 and σǫφ = 0.0024 we compute σ′ǫp ≈ 700 ft/s and σ′ǫφ ≈ 0.02 as working guesses.
9.5 Model system
Fig. 9. Two–layer model system with truth case (thick red) seismic traces and syn-
thetics from the posterior (black) for the inversion case iii) described in the text.
The absolute noise level is set at 0.0025 for both stacks.
For inversion, the prior on floating fraction is taken as N(0, 0.05²), and we attempt to compute the posterior floating fraction for 3 cases: i) NG fixed at 1, near–stack only, ii) NG ∼ N(1, 0.2²), near–stack only, and iii) NG ∼ N(1, 0.2²), near and far stacks.
For case i), figs. 10–12 show 3 possible forms of the prior – varied for illustrative purposes – on which we superpose the original log data, which illustrates how the floating grain effect smears out the regional prior. The actual prior used in the inversion cases coincides with the inset (c).
Fig. 10. This and figs. 11 and 12: three ’pedagogic’ priors for velocity versus porosity
in pure sand layer. Clean well data points (+) and float-polluted well data points (*)
plotted on all three graphs: dots (.) are draws from model prior. (a) Prior constructed
with artificially narrow σφ = 0.002, showing how parameters arise from a clear 50:50
mixture of ’clean–rock regression’ points and an elliptical smear from the effects of
the floating grains. The clean rock trend is obviously far too narrow to embrace the
clean well (+) measurements, but the ’clean trend’ is clearly visible and centred.
Fig. 11. Illustrative prior (b): prior drawn from clean rocks only (float–fraction
∼ N (0, 0)), with realistic σφ = 0.02. The tails of the distribution do not contain
the floating grain data comfortably. Clean well data points (+) and float-polluted
well data points (*), dots (.) are draws from model prior.
Fig. 12. Illustrative prior (c): actual prior with float–fraction distributed as
∼ N (0, 0.052 ). Only a few measurements appear to lie at the periphery of the distri-
bution, but the mixture character is not as clear to the eye as in fig. 10. Clean well
data points (+), float-polluted well data points (*), dots (.) are draws from model
prior.
Figure 13 shows salient scatterplots of properties of the sand layer before and
after inversion. The peak–signal to noise ratio is set at about 6:1.
Figure 14 shows salient scatterplots of properties of the sand layer after inver-
sion, where the model has additional net–to–gross freedom in the prior:
NG ∼ N(1, 0.2²).
Inversion using shear data may help narrow down floating grain porosities
further, as the shear carries additional information. The far stack for this test
case is set at about 30 degrees (c.f. a few degrees for the near) and the reflected
amplitude is much weaker (AVO effects). The noise level was set at the same
value as for the near stack.
Fig. 13. Scatterplots from prior distribution (top row) and posterior (bottom row)
for 2–layer model with sand below shale. (A,D) float fraction vs sand porosity. A
perceptible narrowing around the true answer of float fraction=5% is visible. Fewer
clean sands are produced in the posterior. (B,E) The velocity (vp ) density scat-
terplot narrows more obviously. (C,F) The layer–time (t) versus effective reflection
coefficient Reff is clearly pinned down sharply. As usual, these parameters are most
heavily constrained.
Recall that Delivery works with two versions of the model vector m. The
vector m has a fully Gaussian prior, with no truncations or restrictions on
values. The physical model vector m′ , which is used in the forward model
and its associated likelihoods (seismic, isopachs) is obtained by applying time
orderings and truncations (e.g. of NG or saturations) to m, i.e. m′ = f (m), for
some suitable f . The truncation effectively induces a prior which, for simple
properties like NG, is a mixture of a truncated Gaussian distribution and delta
functions at endpoints.
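A minimal sketch of this truncation-induced mixture, using only the NG ∼ N(1, 0.2²) example from this section (all numbers illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Unrestricted Gaussian prior on the NG component of m (this section's
# example NG ~ N(1, 0.2^2)); f() truncates to the physical range [0, 1].
m_ng = rng.normal(loc=1.0, scale=0.2, size=100_000)
m_prime_ng = np.clip(m_ng, 0.0, 1.0)          # m' = f(m): truncation

# The induced prior on m' is a truncated Gaussian plus a delta function
# at NG = 1 carrying the clipped probability mass (~50% here, since the
# prior mean sits exactly on the boundary).
mass_at_one = np.mean(m_prime_ng == 1.0)
print(f"probability mass at NG = 1: {mass_at_one:.3f}")
```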
With the augmented models defined by equations (78) and (79), the linearity
means the prior is still Gaussian, but the truncation of φflt in m′ must be
handled with care. The extra coefficients Dp , Cφ have the effect of placing the
prior on inclined ellipsoids in e.g. the {vp , φflt } plane, so pure truncation on φflt
has the effect of smearing the tail of the distribution onto the plane φflt = 0
in a direction off the principal axes. This is clearly a poor way to handle the
prior. Fig 16 shows a scatter plot of points produced from a prior constructed
Fig. 14. Posteriors from model with loosened NG distribution in prior. (A) float–fraction
vs sand porosity. (B) The velocity (vp) vs density. (C) Histogram of float–fraction
from the posteriors of the models with fixed NG = 1 and free NG ∼ N(1, 0.2²).
No obvious effect on the univariate float–fraction distribution is caused by allowing
the NG to become free: the peaks of the two delta functions are shown with arrows.
(D) Scatterplot of NG vs float–fraction. Again, no obvious strong correlation
appears here, with the density strongest near the truth case values (0.05, 1.0).
in this naive way, with the obvious artifacts. A more reasonable way to handle
the truncation is with the mappings of equations (94)–(97) (applied only for
φflt < 0), which force the remapping to occur along directions parallel to the
principal axes.
This mapping minimises the difference (m′ − m)ᵀ C_P⁻¹ (m′ − m), subject to the
positivity constraint, which seems a reasonable formulation.
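Since equations (94)–(97) are not reproduced in this excerpt, the following is only a generic sketch of a minimiser of (m′ − m)ᵀ C_P⁻¹ (m′ − m) under the constraint φflt = 0, with toy numbers; the actual Delivery mapping is defined by those equations:

```python
import numpy as np

def remap_negative_float_fraction(m, C_P, i_flt):
    """Map a raw draw m to a physical m' with phi_flt >= 0 by minimising
    (m' - m)^T C_P^{-1} (m' - m) subject to phi_flt = 0 whenever the draw
    is negative. Generic sketch only: the paper's equations (94)-(97)
    define the actual mapping."""
    if m[i_flt] >= 0.0:
        return m.copy()
    # Lagrange-multiplier solution: shift along the i_flt column of C_P,
    # i.e. along the prior's correlation directions rather than straight
    # truncation off the principal axes.
    return m - (m[i_flt] / C_P[i_flt, i_flt]) * C_P[:, i_flt]

# Toy 2-vector {v_p, phi_flt} with an inclined-ellipsoid correlation
C_P = np.array([[250.0**2, -0.8 * 250.0 * 0.05],
                [-0.8 * 250.0 * 0.05, 0.05**2]])
m = np.array([11_000.0, -0.03])              # draw with negative phi_flt
m_prime = remap_negative_float_fraction(m, C_P, i_flt=1)
print(m_prime)                               # phi_flt mapped onto 0, v_p shifted
```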
Note that the actual Gassmann fluid substitution calculation that occurs later
Fig. 15. Scatterplots as per figure 14. The posterior is more concentrated on the
solution NG = 1 (D), and similarly has less weight on the clean–rock solution
float–fraction = 0 (C and D) (i.e. note the reduced relative height of the spike
in (C)).
Fig. 16. Left: pure truncation of φflt resulting in smearing of prior along φflt = 0
plane. The right figure illustrates the remappings of equations (94)–(97) which seem
more reasonable.
in the forward model uses only the pure fluids (oil, gas etc), as the Gassmann–
like effect of the floating grain presence is implicitly accounted for by the
floating–grain terms in the modified regressions.
9.8 Inputs/Outputs
default. Outputs will be the modified porosity (1−SL )φs (actual fluid porosity)
and the additional field fL . The SU header word duse will encode output file
format variations for compatibility with older models (it is rarely used, and
defaults to zero).
An additional feature added to Delivery since the published paper is the ca-
pacity to vertically stack models, which is a clandestine way to study models
where transverse correlations between parameters at different spatial locations
can be important, or at least more tightly constrained than the allowable
(regression-error) variations in the regional rock curves.
The idea here is that the coupled inversion at two or more separate spatial loca-
tions – where there exists significant transverse correlation of layer properties
– can be effected by stacking several models on top of one another at a single
spatial location, and including coupling terms for the likelihood which connect
properties in arbitrarily different layers. The idea is illustrated in figure 17.
The seismic data from the several locations is “stacked” in a similar way. Con-
sider a “two–stack” case for example: a three-layer model with time–window
of interest t ∈ [1, 1.5]s, can be stacked into the total interval t ∈ [1, 2.5] with
location 1 in [1, 1.5]s and location 2 in [2, 2.5]s. A suitable buffering time in-
terval (in this case [1.5, 2]s) is used between the two packages. Dummy shale
packages can also be inserted if desired.
The user must specify a set of one or more layer mappings, each of which
specifies a base layer and one or more target layers via an xml entry like
<mapping>1:5,6</mapping> which has base layer=1 and target layers = 5,6.
These then enable the coupling of properties between the base layer and the
target layers. This is a quite general form – in simple two–location models
there is only one target layer for each base layer.
Fig. 17. Illustration of construction of a stacked 8–layer model from two 3–layer
models with buffering shale layers. Mappings between comparable layers are shown
on the left. To exclude seismic–mismatch coupling effects from the region where
the two packages join, a seismic–likelihood exclusion–window is constructed, where
the XML specifies the parameters (start layer=5, end layer=4). The constructed
exclusion zone starts from a “safe” distance above the top of the start layer (i.e.
including prior time uncertainties and the effect of the wavelet coda) down to a
“safe” time below the bottom of the named end layer (using the wavelet precursor
and 3 standard deviations of the prior time pick).
(c.f. equation (29), with the fluid–choice subscript k suppressed for simplicity)
corresponding to this construction is of the form

−log(P(m)) = (1/2) Σ_{properties p} Σ_{applicable base layers i_p} Σ_{target layers j_{i_p}} (m_{I(p,i_p)} − m_{I(p,j_{i_p})})² / σ_p²    (98)

where I(p, j) denotes the index of property p, layer j, in m. The RHS in (98)
is a quadratic form which we may write as mᵀ C_VS⁻¹ m/2, where C_VS⁻¹ is a
matrix assembled from a sum of 2×2 block entries of the general type

((1, −1), (−1, 1)) / σ_p²,

inserted in the appropriate rows and columns.
The net effect of this term is that the existing prior specified by the rock–
physics curves and time–picks can be simply updated using the usual Bayesian
formulas to the new Gaussian prior with parameters

C′_m = (C_VS⁻¹ + C_m⁻¹)⁻¹
m′ = (C_VS⁻¹ + C_m⁻¹)⁻¹ C_m⁻¹ m    (99)
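The update (99) can be sketched for a toy two–layer, single–property case (all covariance numbers illustrative, not from Delivery):

```python
import numpy as np

def update_prior(m, C_m, C_VS_inv):
    """Gaussian prior update of eq. (99): fold the vertical-stacking
    coupling precision C_VS^{-1} into the rock-physics prior N(m, C_m)."""
    C_m_inv = np.linalg.inv(C_m)
    C_new = np.linalg.inv(C_VS_inv + C_m_inv)
    m_new = C_new @ (C_m_inv @ m)
    return m_new, C_new

# Two stacked copies of one property, prior N(0.20, 0.05^2) and
# N(0.30, 0.05^2), coupled by a difference term with sigma_p = 0.02.
sigma_p = 0.02
C_VS_inv = np.array([[1.0, -1.0], [-1.0, 1.0]]) / sigma_p**2
m0 = np.array([0.20, 0.30])
C0 = np.diag([0.05**2, 0.05**2])
m_new, C_new = update_prior(m0, C0, C_VS_inv)
print(m_new)   # the two means are pulled towards each other by the coupling
```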
Thus, the prior is the same multi–Gaussian expression as equation (29), but
with the modified mean and covariance as per (99). In particular, specification
of “vertical stacking” terms for p ∈ {vp , vs , φs , ρm , NG}, or their modification,
does not alter the overall prior probability of the model.
Note that the thickness constraint does not contribute a pure quadratic term
that can be absorbed rigorously into the Gaussian prior. Though we would
like to treat all vertical–stacking requirements as part of the prior (and this
is also often true for isopachs), this requires that the normalisation constants
are explicitly available, and these are not analytically available for terms that
do not appear directly as Gaussian. Hence the vertical–stacking constraint
for thickness is imposed via a likelihood term, very similarly to the isopach
constraint. This means that changes in the vertical–stacking thickness term will
alter the underlying probability of a model, just as the isopach and seismic data
do. For completeness, we write the likelihood from the thickness difference
as

L_VSth(m) = (2π)^{−d_th/2} |C_VSth|^{−1/2} exp( −(1/2) Σ_{applicable base layers i_p} Σ_{target layers j_i} (ΔZ_i(m) − ΔZ_{j_i}(m))² / σ²_{Z_i} ),    (100)

where C_VSth = diag{σ²_{Z_i}} contains d_th entries, one for each term of the double
sum in the exponent.
A minor complication with this trick of vertical stacking is that the reflected
seismic amplitudes at the interfaces between each “package” of layers will
couple the properties between the top and bottom of adjoining packages, which
is usually undesirable. In this context, the amplitudes at these times are of no
interest, and should be excluded from the inversion. The user has the option
of excluding these using one or more xml entries of typical form
<seismic_likelihood_exclusion_range>
<start_layer>5</start_layer>
<end_layer>4</end_layer>
</seismic_likelihood_exclusion_range>
Each such entry defines a time–window where seismic–mismatch terms are not
accumulated in the likelihood equation (26). The window is defined by the
interval

t > t̄_{start layer} − 3σ_{t,start layer} − |wavelet coda|
t < t̄_{end layer+1} + 3σ_{t,end layer} + |wavelet precursor|    (101)

where the means and standard deviations t̄_i and σ_{t,i} are those of the prior
picks of layer times, c.f. section 2.2.1. This definition enables the exclusion–
window(s) to track the model over space. Figure 17 illustrates how this works,
for an 8–layer model.
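A sketch of how a window of the (101) type could be computed from the prior picks; the layer indices, pick values and dict layout here are illustrative assumptions, not Delivery's internal API:

```python
def exclusion_window(t_mean, t_sigma, start_layer, end_layer,
                     coda_len, precursor_len):
    """Seismic-likelihood exclusion window in the style of eq. (101).
    t_mean/t_sigma: dicts of prior time-pick means/std devs per layer
    index; coda_len/precursor_len: wavelet coda and precursor durations.
    All times in seconds. Illustrative sketch only."""
    t_lo = t_mean[start_layer] - 3.0 * t_sigma[start_layer] - coda_len
    t_hi = t_mean[end_layer + 1] + 3.0 * t_sigma[end_layer] + precursor_len
    return t_lo, t_hi

# Two-package join around t = 1.75 s (made-up picks)
t_mean = {4: 1.55, 5: 1.75}
t_sigma = {4: 0.02, 5: 0.02}
lo, hi = exclusion_window(t_mean, t_sigma, start_layer=5, end_layer=4,
                          coda_len=0.08, precursor_len=0.05)
print(lo, hi)   # window straddling the package join
```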
A typical XML chunk to perform two–location stacking for the 8–layer model
shown in this figure would be
<stacked_models_info>
<layer_mappings>
<mapping>1:5</mapping>
<mapping>2:6</mapping>
<mapping>3:7</mapping>
</layer_mappings>
<property_difference_std_deviations>
<vp_m applicable_layers="1,3">350</vp_m>
<rho_m applicable_layers="1,3">0.02</rho_m>
<vp_s applicable_layers="2">250</vp_s>
<phi_s applicable_layers="2">0.1</phi_s>
<NG applicable_layers="2">0.1</NG>
<thickness applicable_layers="2">15</thickness>
</property_difference_std_deviations>
<seismic_likelihood_exclusion_range>
<start_layer>5</start_layer>
<end_layer>4</end_layer>
</seismic_likelihood_exclusion_range>
</stacked_models_info>
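The mapping syntax (including the general form <mapping>1:5,6</mapping>) and the assembly of the (98)-type coupling blocks can be sketched as follows (a standalone illustration, not Delivery's code):

```python
import numpy as np
import xml.etree.ElementTree as ET

# Layer mappings in the document's own XML form (base:target1,target2,...)
xml = """<layer_mappings>
  <mapping>1:5</mapping>
  <mapping>2:6</mapping>
  <mapping>3:7</mapping>
</layer_mappings>"""

def parse_mappings(xml_text):
    """Return (base, target) layer pairs from <mapping> entries."""
    pairs = []
    for node in ET.fromstring(xml_text).iter("mapping"):
        base, targets = node.text.split(":")
        pairs += [(int(base), int(t)) for t in targets.split(",")]
    return pairs

def coupling_precision(pairs, n_layers, sigma_p):
    """Assemble C_VS^{-1} for one property over an n_layers-long model
    vector (1-based layer indices): each base/target pair contributes a
    ((1,-1),(-1,1))/sigma_p^2 block, as in equation (98)."""
    w = 1.0 / sigma_p**2
    P = np.zeros((n_layers, n_layers))
    for i, j in pairs:
        P[i - 1, i - 1] += w
        P[j - 1, j - 1] += w
        P[i - 1, j - 1] -= w
        P[j - 1, i - 1] -= w
    return P

P = coupling_precision(parse_mappings(xml), n_layers=8, sigma_p=0.1)
```

Any model vector with equal values in each coupled pair makes the quadratic form vanish, so the coupling penalises only differences, consistent with (98).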
The runtime flag -IVS switches off all the inter–layer coupling
induced by the terms (98) and (100), which is convenient for quick comparisons.
The default operating mode of the inversion uses the supplied fluid probabilities
to construct an enumerated list of fluid combinations for the entire model,
where each combination assumes that layers are completely occupied by the
nominated fluid type. In situations where a particular fluid contact intersects
many different layers across the inversion domain, this style of modelling can
be messy to construct, especially if layers are introduced as a clandestine way
of capturing the fluid contacts.
[Figure: schematic of fluid units and contacts. Fluid unit 1 holds gas over oil
over brine (a ‘go’ and an ‘ob’ contact); fluid unit 2 holds oil over brine (an
‘ob’ contact) across sublayers 1 and 2. The units are separated by shale.]
• A fluid unit is a set of contiguous layers, and within a fluid unit there may
be 1, 2 or 3 of the following contacts {go, gl, gb, ol, ob, lb} (oil (o), gas (g),
brine (b), low–saturation gas (l)), under usual full–density ordering.
• The set of fluid units must account for all the layers in the model.
• Storage efficiencies require that the total number of contacts from all fluid units
not exceed the total number of layers in the model (in practice a very unlikely
restriction).
• Two adjacent fluid units must have an impermeable layer at the bottom of
the upper fluid unit or top of the lower fluid unit. This allows unusual fluid
sequences between fluid units, analogous to the ‘partial–density–ordering’
which models layer sequences not in hydraulic communication.
• Fluid units are itemised in the XML in descending depth order
• Contacts in each fluid unit specify the upper and lower fluid type, and are
modelled by Gaussian distributions Z ∼ N(Z_c, σ²_{Z_c}) whose mean and stan-
dard deviation have default values in the XML, but which may be spatially
variable via use of the model–prior traces SU file. Contacts contribute di-
agonal elements σ²_{Z_c} to C_k in equation (29) with associated mean Z_c inserted
in the mean vector m_k.
• Any layer which can in–principle host a fluid phase inferred from the fluid
units and contacts must specify that fluid's name in each
<reservoir_endmember> subsection. E.g., if fluid unit 1 identifies a ‘bo’ con-
tact, all permeable layers spanned by fluid unit 1 must specify the oil name
and saturation (the probability field is ignored). The oil name is expected
to match throughout the fluid unit.
• The fluid unit XML attribute fluid_properties_synchronised can force the
stochastic properties of all hydrocarbon properties within a fluid unit to be
mapped from a common underlying variable if set to true. This may be use-
ful if, e.g., similar saturations are expected for, say, gas, in all reservoir lay-
ers in the fluid unit. If synchronisation is activated, the XML+model traces
specification of the uppermost active reservoir layer 17 controls the prior dis-
tribution of the fluid properties, even if this layer is above a contact excluding
the fluid.
It is important to note that the supplied fluid probabilities are ignored when
fluid occupations are determined using contacts in this mode.
gives the true layer thickness, i.e. equal to the sum of the thicknesses of the
sublayers delimited by contact depths and the layer top/bottom depth.
• density rho_eff and saturation S_h are thickness weighted averages (the
latter lumping all non–brine fluids), which should give reasonable results
for forward calculation of rock mass and hydrocarbon in–place. For mixed
hydrocarbon phases, the fields S_oil, S_lsg, and S_gas are probably more
useful. These last three are phase–specific thickness weighted averages.
• Reflection coefficients R_near_fc, R_far_fc: near and far stack
reflection coefficients arising only from fluid contact reflections occurring
within layers.
• Times t_fc of these reflection coefficients, in the same order. These times
cannot be associated with a particular layer via the layer–slot in the SU file
from which they are retrieved; the layer “owning” the reflections
must be determined by comparison with the usual “t” entries. Times
are computed and written even for zero-reflections occurring within purely
impermeable layers.
• Depths d_fc of these reflection coefficients, in the same order.
• fluid_type is interpreted as a 2–byte integer (cast from the float value in
the SU trace), with legacy values 0=brine, 1=oil, 2=lsg, 3=gas. These values
require only 2 bits of information. For layers split by contact(s), the spare
bits of the integer are encoded to communicate the sub-layer fluid sequence
thus:

  bits:    16…13   12,11   10,9   8,7   6,5   4,3   2,1
  field:   unused    A       B     C     D     E     F

This encoding reduces to the legacy 0,1,2,3 values for layers not split by
contacts.
• S_oil, S_lsg, and S_gas. These “effective” saturation components conserve
correct hydrocarbon–phase volume if multiplied by the layer thickness, in
cases where internal contacts exist. For example, if an oil–brine contact bisects
a layer (in depth), S_oil is discounted by 1/2. For contact–free layers, their
values are sensible and equal the legacy S_h field, for the appropriate phase.
• The Analyser program can now generate net-oil, net-lsg and net-gas,
which are correct special cases of net-hydrocarbon, using the S_oil, S_lsg,
and S_gas fields described above.
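The bit packing of the fluid_type field can be sketched as below. Only the 2-bit code values (0=brine, 1=oil, 2=lsg, 3=gas) and the slot layout come from the table above; the depth ordering of the slots A–F within a split layer is an assumption here:

```python
def decode_fluid_sequence(fluid_type):
    """Unpack the 2-bit fluid codes packed into the fluid_type integer,
    slots F..A taken from the least-significant bit pair upward (so a
    contact-free layer, which uses only the legacy value 0-3, decodes
    from slot F). Sketch only: slot semantics are defined in the manual."""
    names = {0: "brine", 1: "oil", 2: "lsg", 3: "gas"}
    codes = [(fluid_type >> shift) & 0b11 for shift in range(0, 12, 2)]
    return [names[c] for c in codes]

# A contact-free gas layer reduces to the legacy value 3:
print(decode_fluid_sequence(3)[0])        # 'gas'
```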
Fig. 19. Simple 3 layer, single fluid unit, 2 contact problem: gas over oil over brine
<fluid_units_and_contacts>
<fluid_unit fluid_properties_synchronised="false">
<top_layer>1</top_layer>
<bottom_layer>3</bottom_layer>
<contact>
<type>GO</type>
<depth varies_areally="false">2050</depth>
<sigma_depth varies_areally="false">0</sigma_depth>
</contact>
<contact>
<type>OB</type>
<depth varies_areally="false">2120</depth>
<sigma_depth varies_areally="false">0</sigma_depth>
</contact>
</fluid_unit>
</fluid_units_and_contacts>
Fig. 20 shows synthetics for this model, where the Gassmann fluid effect is
very strong: indeed, fluid–contrasts are stronger than shale–sand contrasts.
Fig. 20. Synthetic seismic data for the model of fig. 19 at 200Hz (unrealistic) and
80Hz (realistic) bandwidth.
Fig. 21 shows typical realisations from the inversion of this model (which in-
cludes an isopach constraint of 60 ± 10m on the reservoir). The brighter reflec-
tions, including fluid contrasts, yield sharper layer and internal contact times.
In the brine leg, the dimmer amplitudes produce greater uncertainties. The
peak–signal to RMS noise ratio is about 14 in the gas, diminishing to about 2–
3 in the brine leg. A notable aspect of the use of contacts is that the depth
uncertainty of layers is markedly reduced when a visible contact occurs in the
trace, as per the lower graph in fig. 21. Here, the bright peaks sharply define the
reflectivities (thus velocities) and reflection times, so the layer thickness is
better constrained, and all that is needed is a known event in depth upon which
to hang the model. Notice the top depth of layer 1 is appreciably better pinned
by this mechanism, even though the reflection occurs in the layer below.
The simple fluid contact model described here uses the usual optimisation
machinery which enumerates multiple modes by accruing local–optimisation
Fig. 21. Typical inversion realisations showing internal contacts well resolved in gen-
eral. The lower graph is depth uncertainty, which reduces markedly when a contact
is visible.
solutions started from an assortment of dispersed starting points. Multimodality
associated with loop skipping will still occur, and can be expected at traces
where internal fluid contacts are close to layer boundaries. Tunnelling-like
behaviour of the MCMC sampler is not unlikely if the modes are very different
in curvature and the mode–jumping strategies of section 4.3.2 are not well–
adapted. Some of this character is already evident in the brine leg of the
example (fig. 21). Careful examination of sample outputs for good mixing is
encouraged.
Another not–yet thoroughly tested issue is that the posterior surface will
have jump discontinuities associated with contacts pinching out, analogous
to the discontinuities arising from layer pinchouts. The latter are dealt with
via quadratic-programming (QP) techniques tacked on to the Gauss-Newton
schemes, but this strategy is much harder for internal fluid–contact discontinuities,
as the fluid contact is not specified in time. Possible linearised QP
strategies are under consideration.
using the notation of equation (28), and interpreting S liberally to mean all
the “data”: seismic, isopachs etc. The priors P (mk ) and likelihoods used to
construct the posterior Π(mk |S) must be normalised correctly.
Between any two models k and j, the marginal likelihoods can be used to
construct the Bayes factor

B_kj = Π(k|S) / Π(j|S)

which is usefully interpreted as the posterior odds ratio of model k to model
j. More generally, among k = 1 . . . N models with prior probability P_k, the
posterior probability of model k is

P(k|S) = Π(k|S) P_k / Σ_{j=1}^{N} Π(j|S) P_j .
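A small sketch of this normalisation, computed stably in log space (model count and log-likelihood values illustrative):

```python
import numpy as np

def model_probabilities(log_mml, prior_probs):
    """Posterior model probabilities P(k|S) = Pi(k|S) P_k / sum_j Pi(j|S) P_j,
    computed from log marginal likelihoods with the usual max-subtraction
    trick to avoid exp() overflow/underflow."""
    log_post = np.asarray(log_mml, dtype=float) + np.log(prior_probs)
    log_post -= log_post.max()
    w = np.exp(log_post)
    return w / w.sum()

# Two fluid models with equal priors and a log-Bayes-factor of 1.3
p = model_probabilities(log_mml=[0.0, -1.3], prior_probs=[0.5, 0.5])
print(p)     # odds ratio p[0]/p[1] = e^1.3 ~ 3.7 in favour of model 1
```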
The chief challenge is performing the integral in (103). A first useful approx-
imation is the Laplace approximation (Raftery, 1996), based on a quadratic
expansion about the MAP point m̃ obtained at the termination of the Newton
optimisation scheme (omitting the label k where context permits):

Π(m_k|S) P_k ≈ e^{−χ²(m̃)} exp(−(m − m̃)ᵀ C̃_m⁻¹ (m − m̃)/2).    (104)
Here χ2 (m̃) is the minimum of the objective function used in the optimiser,
evaluated at the termination point m̃. For completeness, we define this as
χ²(m) = χ²_full-seismic(m) + χ²_iso(m) + χ²_VSth(m) + χ²_prior(m) + χ²_fluid–prior    (105)

where

χ²_full-seismic(m) = (1/2) Σ_{stacks i, error–sampling points j} { f_error (S_syn,j(m) − S_ij)² / σ²_s,i + [log 2π + log(σ²_s,i / f_error)] }

χ²_iso(m) = (1/2) Σ_{isopachs j} { (v_{j,p,eff} (t_j − t_{j−1}) − ΔZ_j)² / σ²_{ΔZ_j} + [log 2π + log(σ²_{ΔZ_j})] }

χ²_VSth(m) = (1/2) Σ_{applicable base layers i_p, target layers j_i} { (ΔZ_i(m) − ΔZ_{j_i}(m))² / σ²_{Z_i} + [log 2π + log(σ²_{Z_i})] }

χ²_prior(m) = (1/2) (m − m′)ᵀ C′_m⁻¹ (m − m′) + [(d_m/2) log 2π + (1/2) log |C′_m|]

χ²_fluid–prior = − log(P_k)    (106)
The likelihood and priors are exactly as per the original equations (26), (27), (29)
(with (99), and (100) if vertical stacking is used), but with the normalisation
constants (terms in [] brackets) explicitly declared. The log 2π terms matter
since models of different dimensionality may be compared. All non–thickness
terms from vertical stacking, and/or fluid–contact terms (if contacts are part
of the model), are explicitly absorbed in the multi–Gaussian prior. If there is
no vertical stacking, we drop the primes on C′_m, m′, and omit χ²_VSth.
The integration for the marginal model likelihood in equation (103), using this
approximation, is then straightforward, and yields

Π(k|S) P_k = (2π)^{d_m/2} |C̃_m|^{1/2} e^{−χ²(m̃)}    (107)
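In log form, (107) can be evaluated stably with a log–determinant (an illustrative sketch; the covariance and χ² values are made up):

```python
import numpy as np

def log_marginal_likelihood(chi2_min, C_tilde):
    """Laplace-approximation marginal model likelihood of eq. (107) in
    log form: log[(2 pi)^{d_m/2} |C~_m|^{1/2} e^{-chi^2(m~)}]. slogdet
    avoids under/overflow of the determinant for high-dimensional models."""
    d = C_tilde.shape[0]
    sign, logdet = np.linalg.slogdet(C_tilde)
    assert sign > 0, "MAP-point covariance must be positive definite"
    return 0.5 * d * np.log(2.0 * np.pi) + 0.5 * logdet - chi2_min

C = np.diag([0.1, 0.2, 0.3])          # toy MAP-point covariance
print(log_marginal_likelihood(4.2, C))
```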
be meaningfully compared between different runs of the code, with different
modelling assumptions. Ratios of exponentials of -log(MML) can then be in-
terpreted directly as Bayes factors.
Simple comparisons, like the substitution of rock types, or variations in the cho-
sen net–to–gross models, should in general be meaningful. In situations where
severe pinchouts occur, the accuracy of the Laplace approximation may be
problematic, so careful investigations are recommended. For example, pseudo
parabolic cross sections through the mode optima can be dumped to ascii files
PosteriorXsection.* by invoking --Xsections 2, which is useful as a quick
check of the adequacy of the quadratic approximation.
Appendix 13: MAP point estimates and fluid model probabilities in
noisy problems
This “Occam factor” boosts the model probability for models that have diffuse
posterior distributions, relative to models that may have better maximum–
likelihood points, but narrow posteriors. Thus it is not necessarily true that
the model with the smallest χ2 fit is the most probable model, which may be
confusing to some users.
timing–offset}. Naively, we expect the best parameters to be b = 1, τ = 0, but
some interesting behaviour arises when the noise gets large.
We will assume Gaussian noise, even though the signal in this case is noise-
free. We will also use flat Bayesian priors on the parameters b, τ , but we need
a finite range on τ , so −1 < τ < 1 seems reasonable.
[Figure: box–car signal B(t) and the scaled, shifted synthetic b B(t − τ).]
[Fig. 23: joint density Π(b, τ), with b on the horizontal axis.]
Since we have in mind a problem where b might take on a few discrete values,
we might like to find the relative probability of those values by integrating
out all the other parameters in the problem. Thus, an interesting quantity to
compute is the marginal distribution of b,
Π(b) = ∫_{−1}^{1} Π(b, τ) dτ ∼ e^{−N(1−b)²/2σ²} (1 − e^{−N b/σ²}) / (N b).
For N = 1.1, curves of Π(b) at various noise levels are shown in fig. 24. One
can see from this figure that for low noise levels, the “most–likely” value from
the marginal for b is close to 1. However, for high noise levels, it shrinks away
from 1 towards smaller values (though the uncertainty is then rather large).
This comes from the extra probability mass at larger τ values, which are
reasonable if the event amplitude is smaller, as per the asymmetry of the
joint density in fig. 23. Thus we can see that if the support of b were confined
to a few discrete values, the “most–likely” choices of b would fall on weaker
signals if the noise level is high.
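This mode shift can be reproduced directly from the closed form for Π(b) above (illustrative script; grid and σ values arbitrary):

```python
import numpy as np

def marginal_b(b, N=1.1, sigma=0.2):
    """Marginal Pi(b) ~ exp(-N(1-b)^2 / 2 sigma^2) (1 - exp(-N b / sigma^2)) / (N b)."""
    return (np.exp(-N * (1 - b)**2 / (2 * sigma**2))
            * (1 - np.exp(-N * b / sigma**2)) / (N * b))

b = np.linspace(0.01, 2.0, 2000)
for sigma in (0.1, 0.6):
    b_mode = b[np.argmax(marginal_b(b, sigma=sigma))]
    print(f"sigma={sigma}: mode of Pi(b) at b ~ {b_mode:.2f}")
```

At the low noise level the mode sits essentially at b = 1; at the high noise level it shrinks to appreciably smaller b, as in fig. 24.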
[Fig. 24: marginal distributions Π(b) for noise levels σ = 0.1, 0.2, 0.4, 0.6.]
Fig. 25. Conditional distributions Π(τ |b) for b = 1, b = 0.25 (slices through Fig 23).
Both cases have comparable overall probability, even though the maximum likelihood
point for the case b = 1 seems dominant.
The phenomenon is generic. For example, one can do the calculation for the
“stretched box–car” model with a box–car synthetic plateau between τ1 <
t < τ2 , so free parameters are {b, τ1 , τ2 }. With everything as before, but just
including an extra marginalisation over τ2 as well as τ1 , the marginal for b is
much the same as before: see fig. 26.
Fig. 26. Marginal distributions of b for various noise levels, for the stretched
box–car problem (τ1, τ2).
A few points are helpful to clarify the counter-intuitive nature of this be-
haviour. 1) The MML machinery is the most logical approach, granted the
assumptions of the noise model. The box–car functions used above are clearly
“error–free”, so our instinctive reaction as to what the right criterion should
be is formed by a model with very small noise levels – which we automatically
intuit when inspecting the data. Had the signal and synthetic been drawn with
noise deviates added which are consistent with the noise level in the likelihood
function, the MML procedure would seem much less counter-intuitive. Fig. 27
shows a cartoon of the sorts of signals we might be trying to model with, for
example, a noise level of σ = 0.5. 2) The behaviour is a fairly sensitive function
of the peak/noise–RMS ratio, and for most signals where the ratio is 4:1 or
better, the effect is weak. This is the usual regime we like to work in.
Fig. 27. Example random telegraph signal realisations with boxcar truth case masked
by noise of roughly σ = 0.5
code 18 . The demo problem is a two–layer model of a flank of a hydrocarbon
deposit, with shale cap–rock, and a sand halfspace underneath. The “true”
fluid changes from brine to oil to gas as we go updip. For reference, the peak
seismic amplitude (gas) is about 0.15, and the oil and brine peaks are about
0.05 and 0.03 respectively. With small noise (σ = 0.015), the MAP model and
MAP fluid classification is “correct”, and shown in fig. 28. With the noise at
Fig. 28. Gas–oil flank problem. Left: true(red) and synthetic(black) MAP traces for
15 cdp locations, going updip. Right: fluid classification in lower reservoir under
shale (brown), for small noise level (peak/noise–rms≈10). Fluids codes: blue=brine,
red=oil, orange=gas.
σ = 0.08, the MAP classification misses oil, and puts oil where the gas is: see
fig. 29. The maximum–likelihood (or minimum χ²) solution, found
using --ML instead of --MAP, makes the visual coherence of the fits closer, as
one would expect: see fig. 30. The misclassification of the MAP method is of
no particular significance in view of the uncertainty of the fluid type at these
noise levels: compare figs 31 and 32.
By way of comparison to the preceding box–car problem, for trace 15, which
has gas, at a noise level σ = 0.059, the ∆χ2 between the gas and oil cases is
again 1.3 (in favour of gas), but the posterior Bayes factor in favour of gas is
only 1.45, less than half of the likelihood ratio e1.3 ≈ 3.7.
13.3 Recommendations
There are clearly cases where, if the noise level is high enough, a model–
selection based on marginalising out free parameters will favour weaker signals
18 The file location is relative to the root of the source tree
Fig. 29. Gas–oil flank problem. MAP solution and fluid classification, for large noise
level σ = 0.08 (peak/noise–rms≈1.9). True seismic in red, MAP synthetic in black.
Fluids codes: blue=brine, red=oil, orange=gas.
seismic amplitudes sample
Fig. 30. Gas–oil flank problem. Maximum likelihood solution and fluid classification,
for large noise level σ = 0.08.
Fig. 31. Gas–oil flank problem. Posterior samples for small noise σ = 0.015. True seis-
mic in red, MAP synthetic in black. Fluids codes: blue=brine, red=oil, orange=gas.
Fig. 32. Gas–oil flank problem. Posterior samples for large noise σ = 0.08. True seis-
mic in red, MAP synthetic in black. Fluids codes: blue=brine, red=oil, orange=gas.
of relying on point estimates: in this regime, the relative model probabilities
are so comparable that any forecast of interest must average over all of them
in some useful way.
Some users like to see maximum–likelihood (minimum χ2 ) fits, and are not
comfortable with the use of the “Occam” factor. A practical solution is to
provide each of these outputs:
• the model and parameters with the maximum likelihood point, omitting
marginalisation, obtained with the flag --ML.
• the MAP point of the most–likely model, as judged from the marginalised
evidence. Use --MAP.
• full samples from the joint posterior over all models and parameters, so any
forecast can take into account the uncertainty in both models and param-
eters. Typically this is some combination like -RWS -N 1000 with perhaps
--Raftery-auto-decimation.
This way users have access to whichever quantities are of interest to them, and
are free to interpret as they please.
Appendix 14: Use of additional impermeable end–members in the
rock–physics
There are situations where the impermeable material in a reservoir layer may
be a composite of two or more impermeable facies, typically, for example, a
mixture of shale and a cemented facies. In such cases, it is useful to introduce
a third volumetric mixing fraction, and treat the impermeable rock composite
via effective medium theory in the same way we model the mix of reservoir and
non–reservoir material. To this end, we have defined an additional parameter
NGhard as the volume–fraction of impermeable material occupied by the second
impermeable (“hard”) rock type, as per the depiction:
|<------ permeable (NG) ------>|<--------------- impermeable (1 − NG) --------------->|
|     reservoir–endmember      |  nonreservoir–endmember  | nonreservoir–hard–endmember |
|                              |      (1 − NGhard)        |          (NGhard)           |
The effective moduli of the impermeable composite follow the same mixing law
used for the reservoir/non–reservoir blend,

    1/M_imp = (1 − NGhard)/M_nonres + NGhard/M_hard,

where M is either the p–wave (M) or shear (µ) modulus, and the effective
impermeable density is

    ρ_imp = (1 − NGhard) ρ_nonres + NGhard ρ_hard.
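As a concrete illustration, this composite rule can be sketched in a few lines. The harmonic modulus average and arithmetic density average below are assumptions matching the reservoir/non–reservoir mixing described earlier; the function name and the numerical values are illustrative only, not taken from the Delivery code.

```python
def mix_impermeable(M_nonres, M_hard, rho_nonres, rho_hard, NG_hard):
    """Effective modulus and density of a nonreservoir/hard composite.

    M_* may be either the p-wave modulus M or the shear modulus mu.
    NG_hard is the volume fraction of the 'hard' end-member within
    the impermeable material (illustrative sketch, not the Delivery code).
    """
    # Harmonic average of the moduli
    M_eff = 1.0 / ((1.0 - NG_hard) / M_nonres + NG_hard / M_hard)
    # Arithmetic average of the densities
    rho_eff = (1.0 - NG_hard) * rho_nonres + NG_hard * rho_hard
    return M_eff, rho_eff

# Illustrative values (GPa, g/cc): a soft shale mixed with a cemented facies
M_eff, rho_eff = mix_impermeable(M_nonres=15.0, M_hard=40.0,
                                 rho_nonres=2.3, rho_hard=2.6, NG_hard=0.5)
```

Note that at NG_hard = 0 or 1 the rule correctly degenerates to the pure nonreservoir or pure hard end-member.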
There are two modelling situations of interest: either (i) the hard rock is
known beforehand, and its properties and distributions are to be held fixed
for the reservoir layer across all fluid models, or (ii) the hard rock present
may vary with the fluid type present. The latter might reflect some
geochemistry, or perhaps be a sleight–of–hand trick to make the hard rocks occur
in vertical sequences with particular orderings, just as fluids do under density–
ordering. Thus, the hard rock is specified by (optional) XML fragments under
/inversion/model_description/*_layer/nonreservoir_hard_endmember, and comes
in two flavours:
<sigma_net_to_gross_hard>0.1</sigma_net_to_gross_hard>
</nonreservoir_hard_endmember>
The prior model and distributions for the rock type are then referenced as
per a standard impermeable rock (see e.g. equation (8)).
• Fluid–dependent rock–type. In this case the XML structure is, for example

<nonreservoir_hard_endmember>
  <fluid_dependent_nonreservoir_hard_endmembers>
    <brine_case>rock_A</brine_case>
    <gas_case>rock_B</gas_case>
    <oil_case>rock_C</oil_case>
  </fluid_dependent_nonreservoir_hard_endmembers>
  <net_to_gross_hard>0.5</net_to_gross_hard>
  <sigma_net_to_gross_hard>0.05</sigma_net_to_gross_hard>
</nonreservoir_hard_endmember>
Under this specification, the hard rock used in the stochastic model for
this particular layer then varies according to the fluid state or configura-
tion. XML entries under /inversion/rock_fluid_properties/rock_properties/-
nonreservoir_endmember are then expected for the rocks named under
brine_case, oil_case etc. This can be a useful tool for modelling discrete
facies–uncertainty when the rock–type uncertainty can be emulated by a fluid–
ordering model family.
In section 2.2.2 we have described the default mode of assembling discrete fluid
states to be examined in the inversion problem, and how the prior probabil-
ity of these states is constructed from the layer–based probabilities of fluids
and the density–ordering rules via equation (5). This algorithm guarantees
that the prior probability of the various fluid states is consistent with the
density–ordering effect. Thus, it invariably results in increasing probabilities
of lighter fluids (e.g. gas) when ascending a stack of communicating layers, and
conversely increasing heavy–fluid (e.g. brine) probabilities when descending.
an obvious shorthand notation as

    [ o  g  g ]
    [ o  o  g ],

then the observed abundancy of gas
in the top layer ought to be 2/3, and 1/3 in the lower layer. In default mode,
this has to be achieved by setting the (pre–ordering) prior probabilities to 1/2
for each of oil and gas in each layer. For general observed abundances, reverse–
engineering the needed pre–ordering probabilities to match these post–ordering
target–abundances is a rather tedious fitting problem.
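The two–layer example above can be reproduced by a small enumeration. The sketch below assumes just two fluids and the rule that gas may never sit below oil in communicating layers; it illustrates the mechanism described in the text, not the Delivery implementation.

```python
from itertools import product
from math import prod

FLUIDS = ("oil", "gas")                 # ordered heavy -> light
LIGHTNESS = {f: i for i, f in enumerate(FLUIDS)}

def legal(state):
    """state[0] is the top layer; lightness must not increase downwards."""
    return all(LIGHTNESS[up] >= LIGHTNESS[down]
               for up, down in zip(state, state[1:]))

def gas_abundance(n_layers, p_gas=0.5):
    """Post-ordering marginal probability of gas in each layer, given
    independent pre-ordering priors p_gas in every layer."""
    states = [s for s in product(FLUIDS, repeat=n_layers) if legal(s)]
    # prior weight of each legal state, renormalised over legal states
    w = [prod(p_gas if f == "gas" else 1.0 - p_gas for f in s) for s in states]
    Z = sum(w)
    return [sum(wi for wi, s in zip(w, states) if s[k] == "gas") / Z
            for k in range(n_layers)]

# Two-layer example from the text: pre-ordering priors of 1/2 give
# post-ordering gas abundances of 2/3 (top) and 1/3 (bottom).
top, bottom = gas_abundance(2)
```

With p_gas = 1/2 the three legal states (oil, oil), (gas, oil) and (gas, gas) are equally weighted, which reproduces the 2/3 and 1/3 abundances quoted above.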
The code has an optional mode of operation controlled by the
output/take_fluid_probabilities_as_post_ordering tag in the XML file, where
these fluid–state
probabilities are recomputed after the usual recursive density–ordering has
enumerated the set of legal configurations, so as to generate the desired abun-
dances given by the layer–based probabilities. In other words, the prior fluid
probabilities are interpreted as post–ordering abundancy targets.
Some care is required in using this mode. Since the density–ordering induces
inequalities on these probabilities, it is possible for a user to enter a
sequence of fluid–probabilities that is inconsistent: in the simple two–layer
case above, for example, specifying a lower probability of gas in the top layer
than in the bottom layer does so. Inconsistency will not occur if the abundances
are estimated by counting samples and the ordering model is correct, but in
general it can be expected. Thus, the current implementation seeks a
bound–constrained least–squares fit of the target probabilities. For the
interested reader, the technical details of how this fit is obtained constitute
the remainder of this appendix.
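In the simple two–layer case the induced inequality is just that the gas probability cannot decrease when ascending the stack. A minimal sketch of such a consistency check (illustrative only; the general multi–fluid, multi–layer constraints are richer than this pairwise test):

```python
def ordering_consistent(p_gas_by_layer):
    """True if gas probabilities are non-increasing downwards, i.e.
    non-decreasing when ascending (first list entry = top layer).
    A sketch of the two-layer consistency condition from the text."""
    return all(p_top >= p_bot
               for p_top, p_bot in zip(p_gas_by_layer, p_gas_by_layer[1:]))

ok = ordering_consistent([2 / 3, 1 / 3])   # the example above: consistent
bad = ordering_consistent([0.2, 0.6])      # more gas below than above: inconsistent
```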
so the new problem in terms of P_R is

    χ²_FP = ½ ‖X^(F) U P_R − (P − X^(F) b)‖² ≡ ½ ‖X_R^(F) P_R − P̃_R‖²        (113)

with X_R^(F) = X^(F) U and P̃_R = P − X^(F) b. This problem leaves only the
inequality constraints

    P_{R,i} ≥ 0,  i = 1, …, N_F − 1,        Σ_{i=1}^{N_F−1} P_{R,i} ≤ 1,        (114)
using the methods outlined in section 5.4 of Kelley (1987). For infeasible
points, the projection onto the polytope defined by (114) is somewhat
non–trivial, so some care is required. Clearly a general SQP method would
be better for this problem, but (a) the problem is usually well–scaled, so
steepest descent is not too bad; (b) the dimensionality is often very low,
so the inefficiencies do not matter; (c) a well–formulated problem should not
enter this fallback method anyway; and (d) the fluid–state probability
solutions are cached, and often do not have to be recomputed at every
trace.
(3) Cache the problem solution, so if the problem is identical at the next
trace, the solution can be retrieved from cache very efficiently (a very
common case).
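The bound–constrained least–squares step can be sketched as a gradient–projection iteration. The simplex–style projection and the fixed–step steepest descent below are illustrative stand–ins for the Kelley–style method the text refers to; the function names and the polytope {p ≥ 0, Σp ≤ 1} are assumptions consistent with (114), not the Delivery code.

```python
import numpy as np

def project(p):
    """Euclidean projection onto the polytope {p >= 0, sum(p) <= 1}."""
    p = np.maximum(p, 0.0)
    if p.sum() <= 1.0:
        return p
    # otherwise project onto the simplex sum(p) = 1 (sort-based algorithm)
    u = np.sort(p)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u + (1.0 - css) / np.arange(1, len(p) + 1) > 0)[0][-1]
    theta = (1.0 - css[rho]) / (rho + 1)
    return np.maximum(p + theta, 0.0)

def fit_probabilities(X, t, iters=2000):
    """Projected steepest descent for 0.5*||X p - t||^2 on the polytope."""
    p = np.full(X.shape[1], 1.0 / X.shape[1])   # feasible uniform start
    step = 1.0 / np.linalg.norm(X, 2) ** 2      # safe fixed step (1/L)
    for _ in range(iters):
        p = project(p - step * X.T @ (X @ p - t))
    return p
```

For instance, fitting an infeasible target such as t = (0.9, 0.8) with X the identity returns its projection onto the polytope, (0.55, 0.45). Caching the solution, as in step (3), means this iteration is rarely rerun.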