0% found this document useful (0 votes)
89 views83 pages

Delivery Paper

This document introduces Delivery, an open-source toolkit for Bayesian seismic inversion. Delivery uses a trace-local layer stack model with rock physics properties from logs and initial layer times from picks. It allows for uncertainty in fluid type and saturation in reservoir layers. Delivery implicitly performs full AVO inversion using approximate Zoeppritz equations by supporting multiple stacks. It captures uncertainties by generating stochastic models from the Bayesian posterior using MCMC. Delivery is written in Java and suited for Unix/Linux environments and cluster systems. It facilitates answering commercially useful questions about reservoirs from post-inversion analysis of stochastic models.

Uploaded by

L00king 4MAU
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
89 views83 pages

Delivery Paper

This document introduces Delivery, an open-source toolkit for Bayesian seismic inversion. Delivery uses a trace-local layer stack model with rock physics properties from logs and initial layer times from picks. It allows for uncertainty in fluid type and saturation in reservoir layers. Delivery implicitly performs full AVO inversion using approximate Zoeppritz equations by supporting multiple stacks. It captures uncertainties by generating stochastic models from the Bayesian posterior using MCMC. Delivery is written in Java and suited for Unix/Linux environments and cluster systems. It facilitates answering commercially useful questions about reservoirs from post-inversion analysis of stochastic models.

Uploaded by

L00king 4MAU
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd

Delivery: an open–source model–based

Bayesian seismic inversion program ⋆

James Gunning a,∗ , Michael E. Glinsky b


a CSIRO Division of Petroleum Resources, Ian Wark Laboratory, Bayview Ave.,
Clayton, Victoria, 3168, Australia, ph. +61 3 9545 8396, fax +61 3 9545 8380
b BHP Billiton Petroleum, 1360 Post Oak Boulevard Suite 150, Houston, Texas
77056, USA

Abstract

We introduce a new open–source toolkit for model–based Bayesian seismic inver-


sion called Delivery. The prior model in Delivery is a trace–local layer stack,
with rock physics information taken from log analysis and layer times initialised
from picks. We allow for uncertainty in both the fluid type and saturation in reser-
voir layers: variation in seismic responses due to fluid effects are taken into account
via Gassman’s equation. Multiple stacks are supported, so the software implicitly
performs a full AVO inversion using approximate Zoeppritz equations. The likeli-
hood function is formed from a convolutional model with specified wavelet(s) and
noise level(s). Uncertainties and irresolvabilities in the inverted models are captured
by the generation of multiple stochastic models from the Bayesian posterior (using
Markov Chain Monte Carlo methods), all of which acceptably match the seismic
data, log data, and rough initial picks of the horizons. Post-inversion analysis of
the inverted stochastic models then facilitates the answering of commercially use-
ful questions, e.g. the probability of hydrocarbons, the expected reservoir volume
and its uncertainty, and the distribution of net sand. Delivery is written in java,
and thus platform independent, but the SU data backbone makes the inversion
particularly suited to Unix/Linux environments and cluster systems.

Key words: seismic, inversion, AVO, Bayesian, stochastic, geostatistics, Markov


Chain Monte Carlo, open–source,

⋆ Updated manuscript with additional Appendices from citation (Gunning and Glin-
sky, 2004). Please cite as both (Gunning and Glinsky, 2004) and the website.
∗ Corresponding author.
Email address: [email protected] (James Gunning).

Article originally published in Computers & Geosciences 30 (2004)


Contents

1 Introduction 5

2 Outline of the model 7

2.1 Description and notation of the local layer model 9

2.2 Construction of the prior 11

3 The forward model 15

3.1 Computing the synthetic seismic 16

3.2 Isopach constraints 19

4 Sampling from the posterior 20

4.1 Mode location and local covariance 22

4.2 Mode enumeration 23

4.3 Sampling 25

5 The software 28

6 Examples 29

6.1 Sand Wedge 29

6.2 Net–to–gross wedge 30

6.3 Simple single–trace fluid detection problem 33

6.4 Field Example 33

7 Conclusions 34

Appendix 1: Typical computation of effective rock properties 38

Appendix 2: Error sampling rates 39

Appendix 3: Mode–location starting points 39

Appendix 4: An Independence Sampler 41

Appendix 5: Modified SU trace formats for properties 42

5.1 Model–prior trace formats 42

2
5.2 Stochastic trace outputs 42

Appendix 6: Multiple modes and fluid states in the MCMC algorithms 43

Appendix 7: Wavelet format 44

Appendix 8: Usage of the code 45

8.1 Inputs 45

8.2 Outputs 46

Appendix 9: Incorporation of Floating Grain models into Delivery 48

9.1 Floating grain effects 48

9.2 Recommended methodology 49

9.3 Conversion to Delivery formats 50

9.4 Rock trends 51

9.5 Model system 52

9.6 Inversion Analysis of Posteriors 54

9.7 Delivery Implementation details 55

9.8 Inputs/Outputs 57

Appendix 10: Vertical model–stacking functionality 58

Appendix 11: Internal fluid–contact modelling 61

11.1 Upscaling rules 63

11.2 New outputs in the realisations SU file 64

11.3 Example Inversion 65

11.4 Likely Issues 66

Appendix 12: Model selection facilities 68

Appendix 13: MAP point estimates and fluid model probabilities 71

13.1 An illustrative toy problem 71

13.2 Demonstration on the Gas–Oil–Flank toy Delivery problem 75

3
13.3 Recommendations 76

Appendix 14: Use of additional impermeable end–members 80

Appendix 15: Alternative settings for fluid and fluid–state probabilities 81

4
1 Introduction

The basic purpose of the seismic experiment in the oil exploration business has
always been the extraction of information regarding the location, size and na-
ture of hydrocarbon reserves in the subsurface. To say this is to also grant that
the analysis of seismic data is necessarily and always an inversion problem:
we do not measure reservoir locations and sizes; we measure reflected wave-
forms at the surface, from which information about potential reservoir zones
is extracted by a sequence of processing steps (stacking, migration etc) that
are fundamentally inversion calculations designed to put reflection responses
at their correct positions in time and space.

Such inversion calculations invariably depend on assumptions about the char-


acter of the geological heterogeneity that are usually described informally
(“gentle dips”, “weak impedance contrasts” etc), but could equally well be
couched in formal probabilistic language. Further, such assumptions often hold
well across a range of geological environments, and are of a nature that leads
to generic processing formulae (e.g. Kirchoff migration) that may be applied
by the practitioner with merely formal assent to the assumptions underlying
their validity. From this point of view, we can say that traditional inversion
is fundamentally based on a model of the subsurface, but no particular details
of that model appear explicitly in the resulting formulae. These inversions can
also be claimed to have integrated certain portions of our geological knowledge,
in that the empowering assumptions have been made plausible by observation
and experience of typical geological structures.

By this line of argument, traditional inversion can be seen as an attempt to


explain the observed surface data by appeal to a broad (but definite) model of
the subsurface, which yields congenial formulae when we incorporate certain
geological or rock-physics facts (“weak reflections” etc). It is then no great step
of faith to assert that inversion techniques ought to incorporate more definite
forms of knowledge about the subsurface structure, such as explicit surfaces or
bodies comprised of certain known rock types whose loading behaviour may
be well characterised.

Such a step is in fact no longer contentious: it has long been agreed in the geo-
physical community that seismic inversion tools ought to use whatever rock–
physics knowledge is regionally available in order to constrain the considerable
non–uniqueness encountered in traditional inversion methodologies. Further,
in an exploration or early–appraisal context, some limited log or core data are
usually available, from which additional constraints on likely fluid types, time
and depth horizons etc can be constructed.

In practice, such knowledge is difficult to incorporate into “model–free” meth-

5
ods like sparse–spike inversion, even though these methods have a sound geo-
physical basis. Conversely, detailed multi–property models of the reservoir –
such as geostatistical earth models – are often weak in their connection to
geophysical principles: the relationship to seismic data is often embodied in
arbitrary regression or neural–network mappings that implicitly hope that the
correct rock physics has been divined in the calibration process.

Another long–settled consensus is that the inversion process is incurably non–


unique: the question of interest is no longer “what is such and such a quantity”,
but “what is the multi–component distribution of the quantities of interest”
(fluids, pay zone rock volume etc) and the implications of this distribution
for development decisions. The development of this distribution by the inte-
gration of information from disparate sources and disciplines with the seismic
data is now the central problem of seismic inversion. An interesting aspect of
this consensus is that most workers in seismic inversion (and closely related
problems like reservoir history matching) now concede that the most satis-
factory way to approach the non–uniqueness and data–integration problems
is via a Bayesian formalism 1 . These approaches use explicit models for the
quantities of interest: typically a suite of layers or facies, whose location and
internal properties are the properties we wish to invert for. Examples of such
work are Omre and Tjelmeland (1997), Eide et al. (2002), Buland and Omre
(2000), Buland et al. (2003), Eidsvik et al. (2002), Leguijt (2001) and Gunning
(2000) in the context of seismic problems, and the many papers of Oliver and
his school in the context of reservoir dynamics problems, e.g. Chu et al. (1995),
Oliver (1997), Oliver (1996). Newer kinds of Bayesian inversion are also ap-
pearing, where the model uncertainty itself (e.g. the number of layers) is taken
into account, e.g. Malinverno (2002). It is recognised also that the posterior
distribution of interest is usually quite complicated and impossible to extract
analytically; its impact on decisions will have to be made from Monte Carlo
type studies based on samples drawn from the posterior.

Nevertheless, the use of model–based Bayesian techniques in seismic inversion


at this point in time is still novel or unusual, for a mixture of reasons. The
main reason is undoubtedly the lack of accessible software for performing such
inversions, and the associated lack of fine control of such systems even when
they are available as commercial products. Bayesian inversion methodologies
are unlikely to ever become “black–box” routines which practitioners can apply
blindly, and successful inversions will usually be the result of some iterative
process involving some adjustment of the model–prior parameters and choice
of algorithms. Such flexibility is hard to achieve in “black–box” algorithms.
A second reason is that the amount of effort required to construct a suitable

1 other approaches to stabilising the inverse problem via regularisation or parameter


elimination are perhaps best seen as varieties of Bayesian priors, perhaps augmented
with parsimony–inducing devices like the Bayes information criterion.

6
prior model of the rock physics and geological surfaces of interest will always
be non–trivial, though repeated experience of such an exercise will reduce the
required effort with time. This effort is justified by the fact that a Bayesian
inversion constructed around an appropriate prior will always produce more
reliable predictions than an inversion technique which does not integrate the
regional rock physics or geological knowledge. This fact will be obvious if we
do the thought experiment of asking what happens when the seismic data
has poor signal to noise ratio. We assert that the use of Bayesian model–
based inversion techniques should become far more widespread once the first
mentioned obstacle above is overcome.

We do not presume in this paper to offer even a partial critique of non–model


based inversion techniques from a Bayesian point of view: the reader will be
able to do this for themselves after consideration of the methods and principles
outlined below. The aim of this paper is rather to introduce a new open–source
software tool Delivery for Bayesian seismic inversion, and demonstrate how
this code implements a model–based Bayesian approach to the inversion prob-
lem. The software described in this paper is a trace–based inversion routine,
and is designed to draw stochastic samples from the posterior distribution of
a set of reservoir parameters that are salient for both reservoir volumetrics
and geophysical modelling. The exposition will cover the choice of the model
parameters, construction of the prior model, the development of the forward
seismic model and the associated likelihood functions, discuss the mapping
of the posterior density, and sketch the methods used for sampling stochastic
realisations from the posterior density. The latter involves use of sophisticated
Markov Chain Monte Carlo (MCMC) techniques for multiple models that are
relatively new in the petroleum industry.

The main content of the paper is laid out as follows; in section 2 the overall
framework and design of the inversion problem is outlined. Section 2.1 describes
the basic model and suitable notation, section 2.2 outlines the construction of
the prior model, and section 3 describes the forward model and associated
likelihood. Section 4 covers the problems of mode mapping and sampling from
the posterior. An outline of the software is provided in section 5: it is released
under a generic open–source licence rather like the popular GNU and open–
BSD style licenses. A discussion of a suite of example/test cases is given in
section 6, and conclusions are offered in section 7.

2 Outline of the model

The inversion routine described in this paper is a trace–based algorithm, de-


signed to operate in a computing environment where a local seismic trace (usu-
ally post-stack, post-migration) in Seismic Unix (SU) format (Cohen and Stockwell,

7
1998) is piped to the routine in conjunction with a set of parameters describing
the local prior model, also in SU format. Stochastic realisations are then drawn
from the posterior and written out, also in SU format. The details of these for-
mats are discussed in section 5. Inverting on a trace-by–trace basis amounts
to an assumption that the traces are independent, and this can be guaranteed
by decimating the seismic data to a transverse sampling scale equal to the
longest transverse correlation appropriate for the local geology. Spacings of a
few hundred metres may be appropriate. Working with independent traces has
the great advantage that the inversion calculation is massively parallelizable,
and the computation may be farmed out by scatter–gather operations on clus-
ter systems. Finer scale models may then be reconstructed by interpolation if
desired.

Inversion calculations on systems with inter–trace correlations are very difficult


to sample from rigorously. Sketches of a suitable theory are contained in Eide
(1997), Eide et al. (2002), Abrahamsen et al. (1997), Huage et al. (1998) and Gunning
(2000), from which we offer the following potted summary. If the variables to be
inverted are jointly multivariate Gaussian, some analytical work can be done
which yields a sequential trace–based algorithm, but the matrix sizes required
to account for correlations from adjacent traces are very large. Models which
use non–Gaussian models for distributing facies (e.g. the indicator model used
in Gunning (2000)) require methods that involve multiple inversions over the
whole field in order to develop certain necessary marginal distributions. These
calculations are very demanding, even for purely multi–Gaussian models.

From another point of view, the sheer difficulty of rigorously sampling from
inter–trace correlated inversion problems is the price of modelling at a scale
finer than the transverse correlation length of the sediments (and/or surfaces)
of interest, which is commonly several hundred metres or more. We know from
the Nyquist theorem that any random signal can be largely reconstructed by
sampling at the Nyquist rate corresponding to this correlation length, and
intermediate values can be recovered by smooth interpolation. This is a strong
argument for performing inversion studies at a coarser scale than the fine scale
(say 10-30m) associated with the acquisition geometry 2

By assuming the inter–trace correlation is negligible, the inversion can proceed


on an independent trace basis, and the incorporation of non–linear effects like
fluid substitution, and discrete components of the parameter space (what type
of fluid, presence or absence of a layer etc) become computationally feasible.
In short, the correct sampling from systems with inter–trace correlations is
2 The choise of the transverse sampling rate depends also on the form of the trans-
verse correlation dependence, since, e.g. smooth surfaces are better reconstructed
by interpolation than noisy ones. This requires some judgement. Possible correla-
tion lengths appropriate to various geological environments are discussed in Deutsch
(2002).

8
probably only possible in systems with fully multi–Gaussian distributions of
properties, but such a restriction is too great when we wish to study systems
with strong nonlinear effect like fluid substitution and discrete components like
layer pinchouts or uncertain fluids. The latter, more interesting problems only
become possible if we reduce the transverse sampling rate.

The model used in this inversion is somewhat coarser than that commonly
used in geocellular models. At each trace location, the time–region of interest
is regarded as a stack of layers, typically a metre to several tens of metres
in size. Each layer is generically a mixture of two “end-member” rock types:
a permeable reservoir member, and an impermeable non–reservoir member.
the balance of these two is determined by a layer net–to–gross (NG ), and the
internal structure of mixed layers (0 < NG < 1) is assumed to be finely hori-
zontally laminated, so acoustic properties can be computed using an effective
medium theory appropriate for this assumption. Seismic energy reflects at the
boundaries between layers, producing a surface signal that may be syntheti-
cally computed, given an incident wavelet and all the requisite rock properties.

2.1 Description and notation of the local layer model

At the current trace location, the set of rock layers in the inversion region is
fundamentally modelled in time rather than depth. The depth d enters as a
relatively weakly controlling parameter of the rock properties, but time is a
fundamental variable in terms of computing seismic responses, so is the better
choice of variable for the basic parameterisation.

Models in depth are more fundamentally useful for reservoir decision making,
but we adopt a flow wherein depth models can be created from any time
model by a simple post–processing step. Depth constraints can take the form
of thickness constraints or absolute depth constraints, and both of these can be
applied either through the prior parameters or the isopach criterion we discuss
later. Either way, the generation of depth models will require the specification
of a reference layer, from which all layer depths will be hung or supported
as required. This scheme ensures that depth and time models are mutually
consistent.

The model consists of Nl layers, with ti the top of layer i. Layer i is bounded
by the times (ti , ti+1 ), i = 1...Nl . An additional parameter, tbase , is required to
specified the bottom of the model. Figure 1 shows a cartoon of the model.

Each layer is modelled as a mixture of two finely–laminated end–member rock


types; a permeable member like sand or carbonate, and an impermeable mem-
ber, such as shale or mudstone. The subscript f is used generically to denote
these facies, but also s for the permeable member (think “sand”) and m for the

9
Reflection coefficients Synthetic seismic

Model parameters per la


yer:
wavelet
m = {ttop, NG,f R, vp,R, vs,R, r NR, vp,NR, vs,NR, r b,vp,b,r h,vp,h,Sh} shale
R= reservoir rocks
NR = impermeable rocks oil-filled laminated sand
b= brine
brine-filled laminated sand
h= hydrocarbon
shale
Full suite of model par
ameters: brine-filled laminated sand
M = {m1,m2,m3,...}
shale

t t

Fig. 1. Schematic of the local layer–based model and its notation. For description
of the model parameters see the main text. The reflection coefficient sequence and
synthetic seismic are part of the forward model of section 3.

impermeable (“mud”). The net-to-gross NG specifies the ratio of permeable to


impermeable rock by volume. Pure shales or other impermeable rocks can be
modelled by NG = 0. Hydrocarbons may be present only in the permeable
members of the laminated mixture. A more elaborate model that also permits
a second impermeable rock type is described in Appendix 14.

The properties of permeable rocks that we explicitly model are the p–wave
velocity vp,s , the shear velocity vs,s , and porosity φs , but for impermeable
members we use vp,m , vs,m , and density ρm . Facies are assumed isotropic. These
rock properties are in general controlled by a loading curve which depends
primarily on depth but also possibly a low–frequency interval velocity (LFIV)
(derived perhaps from the migration), as explained in section 2.2.3.

Permeable members that are susceptible of fluid substitution will also require
knowledge of the dry matrix grain properties, and saturations, densities and p–
wave velocities of the fluids undergoing substitution. For a particular rock type,
the grain properties are taken as known, but the fluid saturations, densities
and velocities can form part of the stochastic model.

The set of parameters describing the full acoustic properties for layer i, bounded
by times ti−1 , ti , with hydrocarbon h present, is then

m = {d, LFIV, NG , φs , vp,s , vs,s , ρm , vp,m , vs,m , ρb , vp,b , ρh , vp,h , Sh }, f = s,m,


(1)

at each trace. A i subscript is implicit for all quantities. If the layer is a pure
impermeable rock (NG = 0), this simplifies to

m = {d, LFIV, ρm , vp,m , vs,m }. (2)

The union of these parameters with the layer times ti (and the final bottom-
layer base time tbase ) then forms the full model vector of interest. Since the
times are the dominant variables, it is convenient to arrange them to occur

10
first in the full model vector, so this is assembled as

m = {t1 , t2 , . . . , tNl , tNl ,base , mlayer–1 , mlayer–2 , . . . mlayer–Nl , A, B}. (3)

The extra stochastic factors A, B affect the synthetic seismic and are explained
in section 3.1.4. Hereon we use dm for the dimensionality of the model m.

An additional feature is that some parameters may be common to several


layers (e.g. in many cases the LFIV is an average across many layers), so the
underlying model vector will map all duplicate instances to the one variable
(so, e.g. there may be only one LFIV in the overall model vector).

Also, it is sometimes useful to “synchronise” the rock–matrix properties be-


tween two layers, so that if they have the same end–members, acoustic con-
trasts at the interface must be entirely due to fluid effects. In this case the
underlying parametrisation will map all rock-matrix affecting parameters in
the lower layer to those of the layer above, and remove all redundancies in the
model vector.

2.2 Construction of the prior

2.2.1 Layer times

Before inversion, the times ti are not known precisely, but are modelled as
stochastic variables with prior distributions N (t¯i , σt2i ). The mean and standard
deviation are estimated from horizon picking and approximate uncertainty es-
timates (e.g. a half–loop or so may be chosen for the σti ). The prior distribution
on layer times is supplemented by an ordering criterion which says that layers
cannot swap:
ti ≥ ti−1 , (4)
so the prior for the layer times is actually a product of truncated Gaussians
with delta functions at the endpoints absorbing the mis–ordered configurations.
A typical scenario is that the top and bottom of a large packet of layers have
been picked reasonably accurately, but the time–location of the intermediate
surfaces is not well known.

The truncation rules allow the useful possibility of pinchouts. For example,
layer 2 may disappear if we have t3 = t2 , in which case the interface at this
time comprises a contrast between layer 1 properties and layer 3 properties,
providing t3 > t1 . Modelling pinchouts in this way then requires a specific
algorithm for determining where the pinchouts occur from a given set of (un-
truncated) layer times ti . We have used this scheme:

(1) Form a sort order based on the layer prior time distributions uncertain-

11
ties σti (sorting in increasing order), and from the top layer down as a
secondary sort order if some σti ’s are identical.
(2) From the set of candidate time samples {ti } (which will not in general
satisfy (4)), proceed to fix the times in the sort order above, truncating
values to preserve the ordering criterion (4) as we proceed. For example,
if the sort order is {1,4,2,3}, and we have a set t1 < t4 < t3 < t2 , this will
end up truncated at t1 , (t2 = t4 ), (t3 = t4 ), t4 .

This recipe is designed to allow the “better picked” horizons higher priority in
setting the layer boundary sequence.

Sometimes it is desirable to force a layer to permanently pinchout in some


region of space, and thus effectively drop out of the model: see Appendix 5.1
for a discussion.

2.2.2 Prior beliefs about hydrocarbons

The modeller will have formed beliefs about the probability of certain kinds
of hydrocarbons in each layer, informed by non-seismic sources such as pres-
sure data or resistivity logs. A relative prior probability is assigned to each
hydrocarbon type in each layer on this basis. Specifically, each layer i may
bear fluids; oil (o), gas (g), brine (b), or low–saturation gas (l). The modeller
must specify the prior probabilities of each of these phases, on a layer basis,
as Fio , Fig , Fib , Fil respectively, with Fio + Fig + Fib + Fil = 1.

Depending on the likely hydraulic communication between layers, the hydro-


carbons allowed to be present in the permeable layers may be subjected to a
density–ordering criterion, e.g. oil is not permitted above gas in two adjacent
permeable layers. At least three types of density ordering rule can be envisaged:

(1) None. Any fluids are allowed in any permeable layer.


(2) Partial. Fluids are density ordered for all adjacent permeable layers not
separated by an impermeable layer.
(3) Full. Fluids are density ordered across the entire reservoir model, regard-
less of whether there are impermeable layers separating permeable ones.

The default density ordering is {b, l, o, g}, with low–saturation gas placed
between brine and oil because it usually arises as a residual saturation trail
behind migrating hydrocarbons.

The set of possible fluids in each layer (as implied by the prior probabilities)
are combined with a rule for density ordering to then enumerate a discrete set
of possible fluid combinations k = 1 . . . NF . 3 Suppose the fluid–combination

3 E.g. a two layer system under ordering rule (3), where it is known that gas (and

12
k corresponds to the set of fluid labels fik ∈ {b, l, o, g}, i = 1 . . . Nl . Then the
prior probability of this fluid–combination is taken to be
Q
Fi,f
Pk = PNF i Q ik . (5)
k′ =1 i Fi,fik′

Note that in multi–layer systems, this makes the marginal prior probability
of obtaining a certain fluid in a given layer quite different to the prior prob-
ability specified on a layer basis, simply because of the ordering criterion 4 .
In subsequent discussions, the prior probability of each distinct fluid combi-
nation k enumerated in this way is denoted Pk . Sometimes, it is desirable to
set fluid probabilities based on data observed after the density–ordering mech-
anisms are applied, or, equivalently, “post–ordering joint prior probabilities”.
Alternative mechanisms to achieve this end are described in Appendix 15.

Supplementing this fluid categorisation, it is desirable also to model the fluid


saturations Sif , f = o, g, b, l for each fluid type. The two phases present are
taken as brine (b) & hydrocarbon (o,g,l). The petrophysicist can assign Gaus-
sian stochastic priors Sif ∼ N (Sif , σSif ), truncated 5 outside [0, 1] to these
saturation parameters, based on regional knowledge.

2.2.3 Prior information about rock properties

Net–to–gross

2
The net–to–gross prior distribution is taken as N (NG , σN G
), truncated within
[0, 1]. The mean and standard deviation of this distribution can be determined
by consultation with the geologist. A broad prior may be used to reflect un-
certain knowledge.

low sat. gas) cannot occur (Fig = Fil = 0), may have the allowable set {(brine,
brine) : (oil, brine) : (oil, oil)}, so NF = 3.
4 For example, in the two–layer problem of the preceeding footnote, if the prior

probability of oil in each layer had been specified as, say, 0.6, the 3 combina-
tions would have probabilities proportional to 0.42 , 0.6 × 0.4, 0.62 respectively, or
0.21, 0.316, 0.474 after normalisation, so the post–ordering prior probability of oil in
layer 2 is 0.474, and in layer 1 is 0.316 + 0.474 = 0.79. For an alternative style of
fluid probability specification, see Appendix 15
5 The algorithms in this code treat truncated Gaussian distributions as a mixture

of a true truncated Gaussian plus delta functions at the trunction points, of weight
equal to the truncated probability. This has the advantage of giving non–vanishing
probability to certain physically reasonable scenarios, like NG = 0, or a layer pin-
chout.

13
Trend curves

From logging information in wells drilled through intervals deemed to be repre-


sentative of the rock behaviour in the region to be inverted, a set of regression
relationships for useful acoustic properties in all the facies is developed. The
points used in these local “trend curves” are computed using a known reference
fluid (typically, a brine) in place of the in–situ fluid, and the basic acoustic
properties (ρ, vp , vs ) of these reference fluids have to be established (the refer-
ence fluid may vary with facies for various reasons). These “reference–fluid”
properties may also be slightly uncertain, with Normal distributions modelling
their variation about a mean.

The trend curves are a set of rock-matrix velocities (p–wave, s–wave) vpf ,
vsf , and density ρf (or porosity φf ) regressions for all end-member rocks. For
reservoir–members, the porosity is used in preference to the density 6 , and the
set of regressions is

φf = (Aφ + Bφ vpf ) ± σφf


vsf = (Avs + Bvs vpf ) ± σsf (7)
vpf = (Avp + Bvp d + Cvp LFIV) ± σpf ,

where d=depth. The last equation models vp as linear in d = depth (com-


paction etc) and linear in the low–frequency interval velocity (LFIV); a lo-
cal mean vertical p-wave velocity obtained from pure seismic data like VSP,
moveout or stacking considerations. Such a regression can capture most strati-
graphic, compaction and overpressure effects.

For impermeable rocks, the porosity is of no direct interest, and the density
is regressed directly on vpf using a Gardner–Gardner–Gregory (GGG) type
relationship:

B
log ρf = (log Aρ + Bρ log vpf ) ± σρf or: ρf = Aρ vpfρ ± σρf
vsf = (Avs + Bvs vpf ) ± σsf (8)
vpf = (Avp + Bvp × depth + Cvp × LFIV) ± σpf .

Typically, Bρ ≈ 0.25. Since the range of densities and velocities in a single


rock type is not large, this GGG type regression can also be cast as a linear
regression over a suitable range.
6 Density is computed from

ρsat = (1 − φ)ρg + φρfluid (6)

for permeable members.

14
The regression errors (σpf ,σsf , . . . ) used in these relationships are the predic-
tion errors formed from linear regression studies, which yield t–distributions
for the predictive distribution. We approximate this result by taking the prior
to be of Normal form, with variance set to the regression variance. For exam-
2
ple, the prior for vp,f is N (Avp + Bvp d + Cvp LFIV, σpf ). This approximation is
exact in the limit of large data.

In most cases, the possibility of a rock affecting the elastic properties of a


layer introduces 3 new parameters to the inverse model. Sometimes we use
simple point–distributions focussed at a particular depth range, where we set
Bvp = Cvp = Bvs = Bρ,φ = 0, and choose the intercept (A∗ ) and variance (σ∗ )
terms to match observed means and variances from available core or log data
of vp , vs , ρ. Such a case has an uncorrelated prior, but still 3 free parameters.
Occasionally, one wants to introduce a rock with completely fixed parameters.
This can be achieved by supplementing these uncorrelated priors with the
zero–dispersion setting σp = σs = 0, σρ,φ = 0.

2.2.4 Fluid properties

We have also, from measurements or prior information, Gaussian prior distri-


butions for the fluid p–wave velocities N (vp , σvp ) and densities N (ρ, σρ ) (from
which the bulk moduli distribution can be computed) for the reference brines
and any possible hydrocarbons.

3 The forward model

The Bayesian paradigm requires a likelihood function which specifies how prob-
able the data are, given a particular model. This requires calculation of a syn-
thetic seismic trace from the suite of layers and their properties, and forming
the likelihood by comparing the seismic data and the synthetic seismic. The
forward seismic model is a simple convolutional model, which treats layers as
isotropic homogeneous entities with effective properties computed from the
successive application of Gassman fluid substitution in the permeable rocks
and Backus averaging with the impermeable rocks.

There may be additional constraints in the form of isopach specifications, where


a particular layer–thickness is known to within some error by direct observa-
tion, or other source of knowledge. The isopach constraint, being independent
from the seismic, forms its own likelihood, and the product of the synthetic
seismic likelihood and the isopach likelihood form the overall likelihood for the
problem.

15
3.1 Computing the synthetic seismic

3.1.1 Rock mixtures

When two different isotropic rocks are mixed at a fine (sub–seismic resolu-
tion) scale with strong horizontal layering, it is well known that the effective
acoustic properties of the mixture can be computed using the Backus average
(Mavko et al. (1998), p. 92). This assumes that the wavelengths are long com-
pared to the fine–scale laminations represented by the net–to–gross measure.
The standard formulae are
1 NG 1 − NG
= + , (9)
Meff Mpermeable Mimpermeable

where M can stand for either the p-wave (M ) or shear (µ) modulus (µeff =
2
ρeff vs,eff , Meff = Keff + 43 µeff = ρeff vp,eff
2
), and the effective density is

ρeff = NG ρpermeable + (1 − NG )ρimpermeable . (10)

3.1.2 Fluid substitution in permeable rocks

When the saturation of a particular hydrocarbon is not trivially 0 or 1, we


take the hydrocarbon and brine phases to be well mixed on the finest scale, so
the fluid can be treated as an effective fluid (see p. 205, Mavko et al. (1998))
whose properties are computed using the Reuss average

−1 Six 1 − Six
Kfluid = + . (11)
Kix Kib

When the effective fluid replaces brine in the pore space, the saturated rock has
effective elastic parameters that are computed from the usual low-frequency
Gassman rules (Mavko et al., 1998): viz

Ksat Kdry Kfluid


= + . (12)
Kg − Ksat Kg − Kdry φs (Kg − Kfluid )

The Mpermeable in the Backus relation (9) will be the p–wave modulus for perme-
able rock after fluid substution via Gassman (only the permeable end–member
will undergo fluid substitution). Here Kdry is the dry rock bulk modulus, Kg is
the bulk modulus of rock mineral grains, Kfluid the bulk modulus of the substi-
tuted fluid (gas/oil/brine/low-sat. gas), and Ksat the effective bulk modulus of
the saturated rock. The shear modulus µ is unchanged by fluid substitution.

Under replacement of the reference fluid b by a fluid fl (a two–phase mix of

16
7
brine and hydrocarbon h), the Gassman law can be written

Ksat,fl Ksat,b Kfl Kb


− = − . (13)
Kg − Ksat,fl Kg − Ksat,b φs (Kg − Kfl ) φs (Kg − Kb )

3.1.3 Typical computation sequence

A typical computation sequence for computing the set of effective properties for
a laminated, fluid–substituted rock layer would run as described in Appendix 1.
Prior to computing the synthetic seismic, the effective properties of all the
layers must be computed following this recipe.

3.1.4 The synthetic seismic and seismic–likelihood

Given a wavelet w, an observed seismic S, and an estimate of the seismic noise


power σs2 , we can use the reflectivities R associated with effective–property
contrasts between layers to construct the synthetic seismic appropriate to any
particular stack. The synthetic is taken to be

Ssyn ≡ w ∗ R, (14)

where we use an FFT for the convolution, and w and R will be discretised at
the same sampling rate ∆t as the seismic data set S for the trace. The set of
delta functions in R are projected onto the discretised time–grid using a 4–
point Lagrange interpolation scheme (Abramowitz and Stegun, 1965) based on
the nearest 4 samples to the time of a spike. This ensures the synthetic seismic
has smooth derivatives with respect to the layer times, a crucial property in
the minimisation routines that are described in section 4.1.

Available data traces S may be near or far-offset (or both), and an appropriate
wavelet w will be provided for each. The P–P reflection coefficient for small
layer constrasts and incident angles θ is 8
 ³ ´
à ! ∆ρ 2 ∆v s
1 ∆ρ ∆v p ∆v p 2 vs 2 ρ
+ vs
Rpp = + + θ2  − , (15)
2 ρ vp 2 vp vp 2

with

7 Assuming no pressure effects on the dry modulus Kdry .


8 From the small–contrast Zoeppritz equation for Rpp (θ), expanded to O(θ2 ) (top
of p.63, Mavko (1998))

17
ρ = (ρ1 + ρ2 )/2 (16)
vp = (vp,1 + vp,2 )/2 (17)
vs = (vs,1 + vs,2 )/2 (18)
∆ρ = ρ2 − ρ1 (19)
∆vp = vp,2 − vp,1 (20)
∆vs = vs,2 − vs,1 (21)

and layer 2 below layer 1. All properties in these formulae are effective prop-
erties obtained as per Appendix 1. The coefficient of θ2 is usually called the
AVO gradient.

Due to anisotropy and other effects related to background AVO rota-


tion (Castagna and Backus, 1993), some corrections to this expression may
be required, so the reflection coefficient will take the form

 ³ ´
à ! 2 ∆ρ 2 ∆v s
A ∆ρ ∆v p ∆v p 2 vs ρ
+ vs
Rpp (A, B) = + + Bθ2  − , (22)
2 ρ vp 2 vp vp 2

where the factors A and B may be stochastic, typically Gaussian with means
close to unity and small variances: A ∼ N (Ā, σA ), B ∼ N (B̄, σB ) 9 .

Explicitly, θ2 for a given stack is obtained from the Dix equation. The stack
has average stack velocity Vst , event–time Tst and mean–square stack offset

3 3
(Xst,max − Xst,min )
hXst2 i = , (23)
3(Xst,max − Xst,min )

from which the θ2 at a given interface is computed as

2
vp,1
θ2 = . (24)
Vst4 Tst2 /hXst2 i

Comparable forms of this reflectivity for PS converted–wave data are also


available(Mavko et al., 1998). Here we make the approximation that the PS
data – imaged in PP time – is also modelled by a convolution with a suitable
wavelet. With β = vs /vp , the linearisation in small contrasts and out to O(θ3 )
is

9 In some cases, there are uncertainties with the overall wavelet scale. The facility
to lock A and B also exists, so B = A ∼ N (Ā, σA ) is then used in the prior.

18
à !
θ ∆ρ ∆ρ ∆vs
Rps = − + 2β( +2 )
2 ρ ρ vs
" #
θ3 2 ∆ρ ∆vs
+ (1 + 8β + 9β ) + β(16 + 24β) . (25)
12 ρ vs

The likelihood function associated with the synthetic seismic mismatch is con-
structed as
X
Lseis = exp(−ferror (Ssyn − S)2 /2σs2 )), (26)
error–sampling points

where the range of the error–sampling points is computed from the prior mean–
layer time range and the wavelet characteristics as per figure 2. Within this
range, the points used in the error sum are spaced at the maximum multiple
of the seismic sampling time p∆t which is smaller than the optimal error
sampling time ∆Ts derived in Appendix 2 (i.e. p∆t ≤ ∆Ts ). The correction
factor ferror ≡ p∆t/∆Ts , is designed to recovery the error that would obtain if
the error trace were discretised at exactly ∆Ts . The signal–to–noise ratio SN
is implied by the noise level σs supplied by the user, with typically SN ≈ 3.

If both normal–incidence (small θ) and far–offset seismic data are available


(“large” θ), we model the overall likelihood as a product of likelihoods for
normal–incidence and far–offset angles.

The stack information required by the likelihood model thus comprises the pa-
rameters Vst (stack velocity to event), Xst,max (far offset), Xst,min (near offset),
and Tst (event time), a wavelet appropriate for the stack, and the noise level
σs .

3.2 Isopach constraints

For certain layers there may be a constraint on the thickness which is obtained
from well data, or kriged maps of layer thicknesses constrained to well data.
Let j = 1, . . . Nt label the layers on which such a constraint is imposed. The
thickness of the layer must match a known thickness ∆Zj , within an error σ∆Zj
specified. This can be represented in a likelihood function
X (vj,p,eff (tj − tj−1 ) − ∆Zj )2
Liso = exp(− 2
) (27)
j 2σ∆Z j

Similarly, constraints on the net–to–gross may be imposed by kriging net–


to–gross estimates from well data. These will then be used to populate the
prior–model traces with a spatially–variable net–to–gross.

19
Region above model in which
Error trace
assumption of no reflectors is implied

wavelet coda

t=0
Tabove

<t2>=top reflector (prior)


model interval (<t1> to <tbase> from prior)

Range of error - sampling points

<tN >=bottom reflector (prior)


l

Tbelow
t=0

wavelet precursor

Region below model in which


assumption of no reflectors is implied

Fig. 2. Diagram indicating range over which the error samples are computed. In
general Tabove,below are free parameters, but we take Tabove = wavelet–coda and
Tbelow = wavelet–precursor as a sensible choice, since this allows the full reflection
energy from the top and bottom reflections to enter the error count for models not
too different to the mean prior model. The error is usually acquired by a sum over
a subsampled range (circled samples), the sub–sampling rate computed as per the
main text.

4 Sampling from the posterior

The posterior distribution for the inversion problem is a simultaneous model


selection and model sampling problem. The selection problem is primarily
that of choosing a model from the suite of models associated with the fluid
combinations described in section 2.2.2. For a particular combination k of
fluids in the permeable layers, the model vector mk can be constructed from
the union of all relevent parameters in equation (3). Hydrocarbon terms are

20
omitted if the layer contains brine. Any fixed or irrelevant parameters are
discarded. Different possible fluid combination will then yield model vectors
mk of different dimension.

A further model–selection problem can also emerge from strong multimodality


that may appear in the posterior for any particular fluid combination. If, for a
fixed fluid combination, the posterior contains strongly isolated modes, sam-
pling from this density is best approached by regarding the modes as separate
models and incorporating them into the model–selection problem.

Model–selection problems can proceed in two ways. The first is to seek the
marginal probability of the kth model P (k|d) by integrating out all the conti-
nous variables mk in the problem and then drawing samples via the decomposi-
tion P (k, mk |d) = P (mk |k, d)P (k|d). The problem is to find reliable estimates
of P (k|d) when the necessary integrals cannot be done analytically. Methods
to form such estimates using MCMC methods are described in Raftery (1996),
and tend to focus on harmonic means of the likelihood computed with MCMC
samples drawn from the conditional P (mk |k, d). Preliminary experiments have
shown some difficulty in stabilising the various estimators described by Raftery.
The best estimator we have found is the full Laplace estimator, based on the
posterior covariance matrix obtained by Newton updates at the mode. This
forms the basis of an “independence sampler”, whose implementation details
are described in Appendix 4.

The second method samples for the full posterior P (k, mk |d) by constructing
a hybrid MCMC scheme that make jumps between the various models as well
as jumps within the continuous parameter space of each model. Such methods
are variously described as model jumping methods, e.g. Andrieu et al. (2001),
Phillips and Smith (1996); we implement the methods described by Adrieu.

For a fixed fluid combination k (of prior probability Pk ), the posterior dis-
tribution of the model parameters mk should clearly be proportional to the
product of the prior distributions with all the applicable likelihoods:

Π(mk |S) ∼ Lseis (m′k |S)Liso (m′k )P (mk ), (28)

where the full prior is


Pk 1
P (mk ) = dmk /2
exp(− (mk − mk )T Ck−1 (mk − mk )) (29)
(2π) |Ck |1/2 2
Note that the likelihood functions are evaluated for the model m′k obtained
from mk after applying time ordering (section 2.2.1) and truncation of any
values to their appropriate physical range of validity (e.g. 0 ≤ NG ≤ 1). In a
great majority of cases in realistic models, m′k = mk . The prior mean vector
mk and inverse covariance Ck−1 can be built directly from the description of
the prior in section 2.2.

21
Efficient sampling from the posterior will require location of all the feasible
modes for a particular fluid combination, plus an estimate of the dispersion at
each mode. The latter is usually taken as the local covariance associated with
a Gaussian approximation to the local density.

4.1 Mode location and local covariance

Location of the modes is usually performed by searching for minima of the neg-
ative log–posterior starting from strategic points. For each fluid combination,
the full model vector m can easily have dimension of d = O(50) or more, so
this minimisation is quite demanding. The log–posterior surface is smooth for
all parameters except where a layer pinches out, since the disappearing layer
causes an abrupt discontuity in the reflection coefficient. Such problems can
be tackled with variable metric methods like BFGS minimizers 10 specially
adapted to detect solutions that lie along the lines of dicontinuity (typically at
some ti+1 = ti ). Still, very best-case d–dimension minimisation usually requires
O(d2 ) function evaluations, with the coefficient being in the order of 10-100.
Observe also that the function evaluation is a synthetic seismic (over Nt sam-
ples), plus prior, plus isopach, so the work count for this is O(d2 + Nt log(Nt )),
which is typically O(103 ) flops.

This accounting shows that the overall minimisation, performed over all d pa-
rameters will be quite slow. In practice, some of the parameters affect the
posterior much more strongly than others. Layer times, p–velocities and den-
sities have strong effects, whereas, e.g. hydrocarbon saturations have rather
weak ones. This observation leads to a scheme wherein the minimisation is
carried out over a subset of important parameters (say 2 or three per layer),
and the remainder are estimated after the minimisation terminates using a
sequence of Newton iterates as follows.

4.1.1 Newton updates

If the prior distribution for mk is multi–Gaussian, with negative log–likelihood


−1
− log(P (mk )) ∼ (mk − m̄)Cm (mk − m̄), and the likelihood function is of form
1
log(L(m|D)) ∼ − (f (mk ) − y)T .diag(σi−2 ).(f (mk ) − y), (30)
2
then the posterior mode m̃ can be estimated by linearising f (m) about an
initial guess m0 using the standard inverse theory result (Tarantola, 1987)
10 The code actually uses an adaptation of Verrill’s java UNCMIN rou-
tines (Koontz and Weiss, 1982), which implement both BFGS and Hessian–based
Newton methods.

22
X = ∇f (m0 ) (31)
C̃m = (X T CD
−1
X + Cm −1 −1
) (32)
m̃ = C̃m (X T CD
−1 −1
(y + Xm0 − f (m0 )) + Cm m̄) (33)

where CD = diag(σi2 ). Here the f (m) will be read from the seismic (26) or
isopach likelihoods (27).

The updates encountered in the code are all cases where the error covariance
is diagonal, so the formulae are more simply stated in terms of the scaled
residuals e:

−1/2
e(m0 ) = CD (f (m0 ) − y) (34)
X̃ ≡ ∇e(m0 ) (35)
d = X̃m0 − e(m0 ) (36)
C̃m = (X̃ T X̃ + Cm
−1 −1
) (37)
T −1
m̃ = C̃m (X̃ d + Cm m̄) (38)

The gradient X̃ is evaluated by finite differences, and the scheme can be it-
erated by replacing m0 ← m̃ at the end. It converges quadratically to the
true mode if the path crosses no layer-pinchouts. Typically, after minimisation
using, say, BFGS methods over the dominant parameters in m, only a few New-
ton iterates are required to get decent estimates of the remaining parameters,
and the posterior covariance matrix C̃m comes along for free.

4.2 Mode enumeration

Incorporation of the isopach likelihood alone will likely yield an approximately


quadratic log–posterior function, which has a single mode. This mode may
also be quite tight, especially at the wells. Conversely, the synthetic seismic
likelihood is likely to be strongly multimodal, especially in the time parameters.

For this reason, we always perform a small set of Newton updates (say 5) to
the prior based on the isopach likelihood before commencing a minimisation
step for the entire posterior. The “partial–posterior” formed from the product
of the prior and isopach constraint will be approximately multi–Gaussian, and
the mode and covariance of this distribution are then used to choose starting
points for a set of subsequent minimisation calls. The subset of parameters
used in the modified BFGS minimisation routines are typically the layer times
ti , the impermeable layer p–velocities vp and the net–to–gross NG . Other com-
binations are conceivable and perhaps better. The initial starting values of the
non–time parameters are taken from the “partial–posterior” mean, and a set

23
of starting times are assembled as per the description in Appendix 3. A run
of subspace minimisations looping over this set of starting configurations will
yield a set of points hopefully in the vicinity of global modes. Duplicates are
eliminated. Figure 3 indicates how the sequence of dispersed starting points,
subspace minimisation, and global Newton methods converge to the desired
separate modes.

full-space local minima


t

Newton steps

start

subspace minima

start

Newton steps

vs

Fig. 3. Minimisation scheme consisting of subspace minimisations (the subspace


represented by t) starting from dispersed points, followed by global Newton steps,
yielding a set of local minima in the full parameter space.

The remaining candidate points then undergo a set of Newton updates (about
10) for the full model. Subsequently, any modes that look wildly implausible
are eliminated 11 . If a mode is acceptable, the Newton-updated mean and
covariance are stored, along with a measure of the mode weight such as the
likelihood at the peak of the mode, and the Laplace–estimator weight, equa-
tion (60), described in Appendix 4. Diagnostic routines capable of drawing
graphs of the log posterior centered at the mode are useful in checking the
character of the mode and the adequacy of the Gaussian approximation.

In cases where the seismic likelihood value obtained after these local optimi-
sations looks poor (or for other reasons), a global optimisation step may be
initiated operating on the subvector of layer times, with non-time properties
fixed at their prior+isopach means. The initiation conditions for this step are
11the remaining freedom in the frozen parameters is unlikely to redeem a bad loop–
skip

24
designed to detect trapping of the BFGS/Newton optimisers in poor local so-
lutions. The globally optimal t values are then used as starting points for the
Newton optimiser. The global optimiser uses the Differential Evolution genetic
algorithm of Storn and Price (1997).

4.3 Sampling

The overall Markov Chain for the sampling is generated by a kernel that flips
rapidly between two kinds of proposals: (a) jumps to different models, and (b)
jumps within the current model. Currently the scheme chooses one of these
moves with a 50% probability at each step, and performs a within–model jump
or model–jump according to the schemes described below. This is the mixture
hybrid–kernel of section 2.3 in Brooks (1998). Special “moves” for unusual
situations can also be devised, and are generally safe to use providing they are
drawn independently of the current state of the chain.

4.3.1 Within–model jumps

For single–model d–dimensional problems that have smooth densities, a Metro-


polis–Hastings sampler that is both efficient and robust is the scaled reversible–
jump random–walk sampler (RWM) described in Ch. 11 of Gelman et al.
(1995). In this sampler, the jumps are drawn from the distribution q(mnew |mold ) =
N (mold , s2 C), where C√is the estimated covariance at the mode, and s is a scale
factor set to s = 2.4/ d. For multi–Gaussian distributions, this scale factor
leads to optimal sampling efficiency for this class of samplers (acceptance rates
near 0.23), and the sampling efficiency is then about 0.3/d. The proposals mnew
drawn from q(mnew |mold ) are then time–ordered and truncated if necessary (as
per section 4) to produce m′new and then used to compute the full posterior
density
Π(mnew |S) = Lseis (m′new |S)Liso (m′new )P (mnew ). (39)

If the posterior were perfectly multi–Gaussian, samples could be drawn from


an independence sampler (Brooks, 1998) using a Gaussian proposal density,
which would be 100% efficient since successive samples are independent, but
the assumption of a compact Gaussian distribution for the proposal density
in a Metropolis technique will force undersampling of regions of the posterior
that may have thick tails. Such regions appear reasonably likely when cross
sections of the log–posterior are plotted as diagnostic output: significant non–
parabolicity is usually evident in the time–parameters (see figure 4).

Hence, for random walks within a fixed model, we use the RWM sampler where
the covariance used is that produced by the sequence on Newton updates at

25
t NG
9 9

8 8

7 7

6 6

5 5

4 4
4.30 4.35 4.40 4.45 4.50 4.55 4.60 4.65 0.2 0.4 0.6 0.8 1.0 1.2

vp_m vs_m rho_m


9 9 9

8 8 8

7 7 7

6 6 6

5 5 5

4 4 4
8600 8800 9000 9200 9400 9600 9800 3200 3400 3600 3800 4000 4200 2.30 2.32 2.34 2.36 2.38 2.40 2.42

vp_s vs_s phi_s


9 9 9

8 8 8

7 7 7

6 6 6

5 5 5

4 4 4
8400 8600 8800 9000 9200 9400 3200 3400 3600 3800 4000 4200 0.26 0.28 0.30 0.32 0.34 0.36

Fig. 4. Series of log–posterior cross–sections taken through the maximum–likelihood


point found from the optimisation step and the sequence of Newton updates. Cross—
sections are shown for the parameters shown, in all relevant layers, over a range of 2
standard deviations as computed from the local covariance matrix. Near–Gaussian
parameters will show symmetric parabolas with a rise of +2 at the endpoints, but
non–Gaussianity is evidenced by skewed or a-symmetric profiles, most obvious in
the case of the time parameters in this instance.

the mode. The initial state is taken at the mode peak. Jumps that produce an
unphysical state are treated as having zero likelihood. The acceptance proba-
bility for a jump mold → mnew is the usual rule

Π(mnew |S)
α = min(1, . (40)
Π(mold |S)

26
Note that the actual proposal probability q does not need to be evaluated, but
the appropriate scaling for it is crucial.

4.3.2 Model jumping

Jumps between models have to be made with care so as to preserve reversibility.


A notable feature of the set of models to be sampled from is that a great many
parameters are common to all models (times, rock parameters etc), and only
the fluid characteristics distinguish the models corresponding to different fluid
states. For example, if a jump involves introducing gas into a layer, the model
vector is augmented by the gas velocity, saturation, and density (if they are not
fixed). The models are thus good examples of the nested structures discussed
in Andrieu et al. (2001), and the algorithms below are simple instances of the
methods discussed in section 3 of that paper.

Suppose the models m1 and m2 are partitioned into components as m1 =


{m∗12 , m′12 }, m2 = {m∗21 , m′21 }, the asterisk denoting common elements and
the prime non–common parameters (so m∗12 = m∗21 is the same set). We have
estimates of the maximum likelihood model values m1 = {m∗12 , m′12 } and
m2 = {m∗21 , m′21 } from the sequence of minimisation steps for each mode,
and it is invariably the case that the shared parameters will have different
values at each of these maximum–likelihood points. A jump from model 1 to
2 is then performed by the mapping

m2,new = {m∗12,old + (m∗21 − m∗12 ), m′21 } (41)

where m′21 is drawn from a suitable proposal distribution q12 (m′21 ), which we
usually take to be the prior for those components, since the fluid parameters
are uncorrelated with the rest of the model in the prior:
Y
q12 (m′21 ) = pj (m′21,j ) (42)
new fluid components parameters j in model

A reverse jump, from 2 to 1, would have invoked the proposal probability


q21 (m′12 ) for the components of model 1 not present in model 2. The jump
from model 1 to 2 is then accepted with probability

π(m21 )q21 (m′12 )


α = min(1, ). (43)
π(m12 )q12 (m′21 )

where π(·) is the full posterior density of equation (28), including the dimension–
varying terms in the prior.

In the thought–experiment limit that the likelihood functions are trivially unity
(we “switch” them off) this jumping scheme will result in certain acceptance,

27
as the qij (·) densities will cancel exactly with corresponding terms in the prior.
The likelihoods are expected to be moderately weak functions of the fluid prop-
erties, so this seems a reasonable choice. Further, in many practical scenarios,
the fluid constants may be sufficiently well known that they are fixed for the
inversion, in which case the models share all the parameters, and the q terms
disappear from equation (43).

The linear translation embodied in equation (41) is based on an plausible as-


sumption that the ’shape’ of the posterior density for the common parameters
remains roughly the same across all the models, but the position may vary.
Corrections to this assumption can be attempted by rescaling the mapping
using scale–factors estimated from covariance estimates – which introduces a
constant Jacobian term into equation (43) – but numerical experiments have
not shown that this embellishment significantly improves the acceptance rates.

Mixing has generally been found to be very good when jumps are performed
between various fluid states using this scheme. When multimodality in the
posterior due to loop–skipping occurs, mixing can be more problematic, and
the posterior samples should be checked carefully to ensure mixing is adequate.
Various options for detecting and auto–correcting for this problem, using dec-
imation estimates computed from the time–series in the Markov chain (Ch. 7,
Gelman et al. (1995)), have been added to the code.

This brief description has not elaborated on the machinery to deal with mul-
tiple local modes occuring for specific fluid states, and how to do the sampling
in this context. This detail can be found in Appendix 6.

5 The software

The Delivery inversion software is written in java, a design decision rather


unusual in the context of numerically intensive scientific software. It should
run on java implementations from 1.2 on. The advent of hotspot and just–
in–time (JIT) compiler technology has made java a very much more efficient
language than in its early days as a purely interpreted language (see the Java
Numerics website as an entry point 12 . The core cpu demands in the code are
(a) linear algebra calls, for which we use the efficient CERN colt library 13
and (b) FFTs, for which we provide a library operating on primitive double[]
or colt DenseDoubleMatrix1D arrays, and the former have been demonstrated

12 Boisvert, R., Pozo, R., 2003. The Java Numerics website,


http://math.nist.gov/javanumerics/.
13 Hoschek, W., 2003. The CERN colt java library,
http://dsd.lbl.gov/~hoschek/colt.

28
to be as fast as C on the platforms we use.

On platforms with modern hotspot or JIT compilers, the code is estimated


to run somewhere within a factor of 2-5 of the speed of a C++ implementa-
tion, and has the advantage of being platform independent, free of memory–
management bugs, and has been simpler to develop using the large suite of
libraries now standard in java 1.2 and its successors.

In practice, the inversion code will likely be run in an operating system that
allows piping of input and output data in the form of SU streams, and the
attractive possibility of clustering the calculation means that it will likely be
run on some flavour of unix or Linux.

Some explanation of the files required for input/output and usage is given in
Appendix 8. The inversion is chiefly driven by an XML file that specifies the
necessary rock–physics, layer descriptions, and information about the seismic.
A quality schema–driven GUI editor expressly developed for the construction
of this XML file is also available at the website: further details are in the
Appendix.

The code is available for download (Gunning, 2003), under a generic open–
source agreement. Improvements to the code are welcome to be submitted to
the author. A set of simple examples is available in the distribution.

6 Examples

6.1 Sand Wedge

A simple but useful test problem is one where a wedge of sand is pinched out by
surrounding shale layers. This is modelled as a three layer problem, where we
expect the inversion to force pinching–out in the region where no sand exists,
if the signal–to–noise ratio is favourable and the sand–shale acoustic contrast
is adequate.

In this example, the prior time parameters are independent of the trace, so
there is no information in the prior to help detect the wedge shape. The inter-
nal layer times have considerable prior uncertainty (20ms, 50ms respectively).
Figure 5 illustrates pre and post inversion stochastic samples of the layers,
displayed with many realisations per trace. Here the noise strength is about
1/4 of the peak reflection response when the wedge is thick (i.e. away from
tuning effects).

This test problem shows how the inverter can readily unscramble tuning effects

29
1.7 1.7

1.88
1.8 1.8
1.90

1.9 1.9
1.92

layer-times
1.94
2.0 2.0
1.96

2.1 2.1
1.98

2.00

(a) (b) (c)


1.80 1.80

100
1.85 1.85

Thickness (m)
80
1.90 1.90

60
1.95 1.95

40
2.00 2.00

20
2.05 2.05

0
2.10

(d) (e) (f)

Fig. 5. (a) Truth case wedge model, (b) truth case seismic, (d) pre-inversion stochas-
tic samples (i.e. drawn from prior distribution only, (e) post inversion samples
(100 per trace), (c) p10 , p50 , p90 quantiles for the layer times of the wedge, and (f)
p10 , p50 , p90 quantiles of the wedge thickness. Note how the wedge layer time un-
certainty increases to the prior when the wedge pinches out, as there is no longer
a significant reflection event to facilitate the detection of this horizon: the inverter
does not care where the horizon is, as long as it pinches out. The wedge thickness
(zero) is still well constrained though.

from rock–physics to produce an unbiased inversion of a layer pinchout.

6.2 Net–to–gross wedge

Another useful test problem is where a slab of shaly sand is gradually made
cleaner from left to right across a seismic section, and embedded in the mixing
shale. As the net–to–gross increases, the reflection strength improves, and the
posterior distribution of the net–to–gross and layer thickness is of interest.

In this example, the only parameters that varies areally is the mean net–to–

30
gross: this is fixed to be the same as the “truth–case” model, but has a (very
broad) prior standard deviation of σNG = 0.3. Figure 6 illustrates pre and post
inversion estimates of the net–to–gross distribution parameters, superposed on
the truth case. This example is also used to demonstrate the convergence of the
MCMC sampler, for which we show some characteristic plots of the random
walk for the inversion on trace 8. Here the prior was loosened considerably to
NG ∼ N (0.45, 0.72 ). The chains are always started at the maximum likelihood
point, so ’burn–in’ times are generally very short.

This test problem shows that the inverter produces a relatively unbiased inver-
sion for the net–to–gross, but subject to increasing uncertainty as the reflecting
layer dims out.

31
1.80

1.85

time (secs)
1.90

1.95

2.00

(a) trace number


1.0 1.0

PRIOR POSTERIOR
0.8 0.8
Net-to-gross

Net-to-gross
0.6 0.6
p90

0.4 0.4
p90
p10
p10
0.2 p50 0.2 p50

0.0 0.0
(b) trace number (c) trace number
0.6 1.0

0.5 POSTERIOR POSTERIOR


Near reflection coefficient

0.8
0.4
Net-to-gross

0.6
0.3

0.2 0.4

0.1
0.2
0.0

-0.1 0.0
(d) trace number (e) trace number
30 0.20
Near reflection coefficient

25 POSTERIOR 0.18 POSTERIOR


Log(likelihood)

20 0.16

15 0.14

10 0.12

5 0.10

0 4 0.08
0 100 2000 4000 6000 8000 10 0.0 0.2 0.4 0.6 0.8
(f) Iteration (g) Net-to-gross

Fig. 6. (a) Truth case seismic for net–to–gross (NG ) wedge model, (b) pre–inversion
(prior) spread of NG as show by p10, p50, p90 quantiles; truth–case shown dashed, (c)
as per (b), but post–inversion. Inset (d) shows the near–stack reflection coefficient
variations, and (e) the associated samples of NG. The signal–to–noise ratio here
is very favourable, with noise 1/10th of the peak seismic response. The residual
uncertainty in the NG is due to the fact that there is also uncertainty in the layer
velocity variations. The seismic constrains primarily the reflection coefficient, but the
system indeterminacy will still allow appreciable variation in the NG . Convergence
& mixing of the MCMC scheme is illustrated for a highly non–informative prior
NG ∼ N (0.45, 0.72 ) on trace 8: (f) trace plot of log-likelihood, with early iterations
magnified, (g) trajectory of Markov chain walk for the NG of layer 2 vs. its reflection
coefficient at the top.
32
6.3 Simple single–trace fluid detection problem

In this simple test a slab of soft sand surrounded by shale layers has oil present,
and the prior model is set to be uninformative with respect to fluid type (so
the reservoir has 50% prior chance of brine, 50% oil). The rest of the story
is told in the caption of figure 7. In summary, the inverter will detect fluid
contrasts reliably if the signal quality is sufficient, the rocks sufficiently “soft”
and porous, and the wavelet well calibrated.
-0.10 -0.05 0 0.05 0.10 -0.10 -0.05 0 0.05 0.10 -0.05 0 0.05 0.10

1.86 1.86 1.86

1.88 1.88 1.88

1.90 1.90 1.90

1.92 1.92 1.92

1.94 1.94 1.94

1.96 1.96 1.96

1.98 1.98 1.98

2.00 2.00 2.00

(a) brine prior samples (b) oil prior samples (c) synthetics of posterior
samples

Fig. 7. Three “spaghetti” graphs showing synthetic seismic traces drawn from (a)
the prior models with brine in the reservoir (b) prior models with oil, and (c) the
posterior distribution, which puts the oil probability at ≈ 97%. the noise level is
0.01. The “truth case” trace is shown in gray.

6.4 Field Example

We expect in the near future to publish several extended papers focussed on the
inversion of some real field data using Delivery and the workflow of developing
the prior. For the moment, we will confine our discussion to a simple test for an
actual field, shown in figure 8. This is a ’test’ inversion at the well location. An
8–layer model for a set of stacked turbidite sands has been built with proven
hydrocarbons in the second–bottom layer. The sands are quite clean and have
high porosities (≈ 30%), so the effects of Gassman substitution are very strong
in the reservoir layers. The layers are constructed from log analysis, but their
boundaries are set to have a broad prior uncertainty around the 10-15ms range.
Low net–to–gross layers (NG < 0.1) are often set as pure shales (NG = 0) to
gkeep the model dimensionality down. The prior for fluid type (brine:oil) is set
as 50:50, as in the previous example.

The inversion at the well location not only confirms the presence of oil (>
80% probability) but also demonstrates that the posterior distribution of layer

33
Well Result

Fig. 8. Synthetic seismic traces overlaid by the observed seismic trace at a particular
location in the survey. (a) Sample models drawn from the prior with brine filling
the reservoir layer. (b) The same, but with oil in the reservoir layer. (c) Traces
computed from models drawn from the posterior distribution, conditional on oil
present in the reservoir layer. (d) Histogram of reservoir thickness drawn from prior
model and after seismic inversion (posterior). Pinchouts and very thin reserves are
obviously precluded. The posterior distribution of thickness is also consistent with
the observed thickness.

thickness for the reservoir is consistent with the well observation.

7 Conclusions

We have introduced a new open–source tool for model–based Bayesian seis-


mic inversion called Delivery. The inverter combines prior knowledge of the
reservoir model with information from the seismic data to yield a posterior
uncertainty distribution for the reservoir model, which is sampled to produce
a suite of stochastic inversions. The Monte Carlo samples produced by the
inverter summarise the full state of knowledge about the reservoir model after
incorporating the seismic data, and can be used to generate predictive dis-
tributions of most commercial or engineering quantities of interest (net–pay,
column–thickness etc).

The inverter is driven from a combination of XML files and SU/BHP SU data,
and outputs are in SU/BHP SU form. The SU backbone means the inverter
interfaces nicely with other free seismic software, such as the BHP viewer 14
and BHP’s SU extensions 15 . We believe the “small–is–beautiful” philosophy
associated with backbone designs improves the flexibility and maintainability
of the software.

14see link at http://timna.mines.edu/cwpcodes


15BHP (Miller and Glinsky) extensions to Seismic Unix,
http://timna.mines.edu/cwpcodes

34
The authors hope that this tool will prove useful to reservoir modellers working
with the problem of seismic data integration, and encourage users to help
improve the software or submit suggestions for improvements. We hope that
the newer ideas on probabilistic model–comparison and sampling (vis–à–vis
the petroleum community) prove useful and applicable to related problems in
uncertainty and risk management.

Acknowledgement

This work has been supported by the BHP Billiton Technology Program.

References

Abrahamsen, P., et al., 1997. Seismic impedance and porosity: support effects.
In: Geostatistics Wollongong ’96. Kluwer Academic, pp. 489–500.
Abramowitz, M., Stegun, I. A., 1965. Handbook of Mathematical Functions.
Dover.
Andrieu, C. A., Djurić, P. M., Doucet, A., 2001. Model selection by MCMC
computation. Signal Processing 81, 19–37.
Brooks, S. P., 1998. Markov chain Monte Carlo and its application. The Statis-
tician 47, 69–100.
Buland, A., Kolbjornsen, A., Omre, H., 2003. Rapid spatially coupled AVO
inversion in the Fourier domain. Geophysics 68 (3), 824–83.
Buland, A., Omre, H., 2000. Bayesian AVO inversion. Abstracts, 62nd
EAGE Conference and Technical Exhibition, 18–28See extended paper at
http://www.math.ntnu.no/~omre/BuOmre00Bpaper.ps.
Castagna, J. P., Backus, M. M. (Eds.), 1993. Offset-dependent reflectivity –
Theory and practice of AVO Analysis. Society of Exploration Geophysicists.
Chu, L., Reynolds, A. C., Oliver, D. S., Nov. 1995. Reservoir Description from
Static and Well–Test Data using Efficient Gradient Methods. In: Proceed-
ings of the international meeting on Petroleum Engineering, Beijing, China.
Society of Petroleum Engineers, Richardson, Texas, paper SPE 29999.
Cohen, J. K., Stockwell, Jr., J., 1998. CWP/SU: Seismic Unix Release 35: a free
package for seismic research and processing,. Center for Wave Phenomena,
Colorado School of Mines, http://timna.mines.edu/cwpcodes.
Denison, D. G. T., et al., 2002. Bayesian Methods for Nonlinear Classification
and Regression. Wiley.
Deutsch, C. V., 2002. Geostatistical Reservoir Modelling. Oxford University
Press.
Eide, A. L., 1997. Stochastic reservoir characterization conditioned on seismic
data. In: Geostatistics Wollongong ’96. Kluwer Academic.

35
Eide, A. L., Omre, H., Ursin, B., 2002. Prediction of reservoir variables based
on seismic data and well observations. Journal of the American Statistical
Association 97 (457), 18–28.
Eidsvik, J., et al., 2002. Seismic reservoir prediction using Bayesian integra-
tion of rock physics and Markov random fields: A North Sea example. The
Leading Edge 21 (3), 290–294.
Gelman, A., Carlin, J. B., Stern, H. S., Rubin, D. B., 1995. Bayesian Data
Analysis. Chapman and Hall.
Gilks, W., Richardson, S., Spiegelhalter, D., 1996. Markov Chain Monte Carlo
in Practice. Chapman and Hall.
Gunning, J., 2000. Constraining random field models to seismic data: getting
the scale and the physics right. In: ECMOR VII: Proceedings, 7th European
conference on the mathematics of oil recovery, Baveno, Italy.
Gunning, J., 2003. Delivery website: http://www.csiro.au/products/Delivery.html,
accessed 2 Jan 2010.
Gunning, J., Glinsky, M., 2004. Delivery: an open-source model-based Bayesian
seismic inversion program. Computers and Geosciences 30 (6), 619–636.
Gunning, J., Glinsky, M., 2006. WaveletExtractor: A Bayesian well–tie and
wavelet extraction program. Computers and Geosciences 32, 681–695.
Huage, R., Skorstad, A., Holden, L., 1998. Conditioning a fluvial reservoir on
stacked seismic amplitudes. In: Proc. 4th annual conference of the IAMG.
http://www.nr.no/research/sand/articles.html.
Kelley, C. T., 1987. Iterative Methods for Optimization. SIAM.
Koontz, R. S. J., Weiss, B., 1982. A modular system of algorithms for uncon-
strained minimization. Tech. rep., Comp. Sci. Dept., University of Colorado
at Boulder.
Leguijt, J., 2001. A promising approach to subsurface information integration.
In: Extended abstracts, 63rd EAGE Conference and Technical Exhibition.
Mackay, D. J. C., June 2002. Information Theory, Inference & Learning Algo-
rithms, 1st Edition. Cambridge University Press.
Malinverno, A., 2002. Parsimonious bayesian Markov chain Monte Carlo inver-
sion in a nonlinear geophysical problem. Geophysical Journal International
151, 675–688.
Mavko, G., Mukerji, T., Dvorkin, J., 1998. The Rock Physics Handbook. Cam-
bridge University Press.
Oliver, D. S., 1996. On conditional simulation to inaccurate data. Mathemat-
ical Geology 28, 811–817.
Oliver, D. S., 1997. Markov Chain Monte Carlo Methods for Conditioning a
Permeability Field to Pressure Data. Mathematical Geology 29, 61–91.
Omre, H., Tjelmeland, H., 1997. Petroleum Geostatistics. In: Geostatistics
Wollongong ’96. Kluwer Academic.
Phillips, D. B., Smith, A. F. M., 1996. Bayesian model comparison via jump
diffusions. In: Markov Chain Monte Carlo in Practice. Chapman and Hall.
Raftery, A. E., 1996. Hypothesis testing and model selection. In: Markov Chain
Monte Carlo in Practice. Chapman and Hall.

36
Storn, R., Price, K., 1997. Differential evolution – a simple and efficient heuris-
tic for global optimisation over continuous spaces. Journal of Global Opti-
mization 11, 341–359.
Tarantola, A., 1987. Inverse Problem Theory, Methods for Data Fitting and
Model Parameter Estimation. Elsevier Science Publishers, Amsterdam.

37
Appendices

Appendix 1: Typical computation of effective rock properties

Here we illustrate how the effective rock properties for a single layer might be
computed from a full suite of fundamental model parameters (equation (1))
for the layer.

• Compute the brine+matrix density

ρsat,b = φs ρb + (1 − φs )ρg . (44)

• Compute the associated moduli


2
µsat,b = ρsat,b vs,s (45)
2
Msat,b = ρsat,b vp,s (46)
4
Ksat,b = Msat,b − µsat,b (47)
3
µsat,fl = µsat,b (48)
2 2
• Use the fluid properties to compute Kb = ρb vp,b and Kh = ρh vp,h , and then

Sh 1 − Sh −1
Kfl = ( + )
Kh Kb
• Using literature values of Kg , compute Ksat,fl using equation (13). This comes
out to be
 " Ã ! #−1 
 1 1 1 1 
Ksat,fl = Kg / 1 + − +
 φs Kg /Kfl − 1 Kg /Kb − 1 Kg /Ksat,b − 1 

(49)

This equation has regions of parameter space with no positive solution (of-
ten associated with, say, small φ). The forward model must flag any set of
parameters that yield a negative Ksat,fl and furnish a means of coping sanely
with such exceptions. The same remark applies to equations (47) and (52)
below.
• Compute the new effective density

ρsat,fl = (1 − φs )ρg + φs (Sh ρh + (1 − Sh )ρb ),

• Compute also the new effective p–wave modulus from


4
Msat,fl = Ksat,fl + µsat,fl
3

38
• Compute the impermeable–rock moduli
2
µm = ρm vs,m (50)
2
Mm = ρm vp,m (51)
4
K m = Mm − µ m (52)
3
• Mix the fl–substituted permeable rock with the shale properties using the
rock–mixing Backus averaging formula (9). Compute also the mixed density
from (10).
• From the mixed rock, back out the velocities vp,eff = (Meff /ρeff )1/2 , and
vs,eff = (µeff /ρeff )1/2 .

Appendix 2: Error sampling rates

When comparing synthetic seismic traces to co–located actual seismic, it is a


question of some nicety as to what rate of sampling of the difference trace must
be applied to quantify the error between the two. It is trivial to show that if a
random process has, eg. a Ricker-2 power spectrum (i.e. ∼ f 2 exp(−(f /fpeak )2 )),
then 95% of the spectral energy in the process can be captured by sampling
at times
∆Ts = 0.253/fpeak , (53)
where fpeak is the peak energy in the spectrum (and often about half of the
bandwidth). Most practical seismic spectra will yield similar results.

The sampling rate in equation (53) is about 2 times the Nyquist rate associated
with the peak frequency. If the error is acquired with a faster sampling rate
than this, adjacent error samples will be strongly correlated and the error
term will become large. Since the error covariance is unknown, and bearing
in mind that we seek an error measure corresponding to a maximum number
of “independent” data points, we use the rate given by equation (53) and
model the error points as i.i.d. values. The peak frequency fpeak is estimated
at initialisation from the FFT spectrum of the wavelet.

Appendix 3: Mode–location starting points

The location, characterisation, and enumeration of local modes in the posterior


distribution is performed by a loop over a set of strategically chosen starting
points in the model parameter space. Since one of the major functions of the
prior distribution is to exclude improbable posterior modes, it makes sense to
base the choice of starting points on the prior distribution.

39
The fundamental variables associated with multi–modality are the layer times,
in that the posterior likelihood surface is frequently oscillatory in the layer
times. Convergence to a secondary or minor solution is usually called loop–
skipping, and may or may not be desirable. In situations where loop skipping
may be permissible or even desirable, the prior uncertainty on the layer times
is usually set to be quite broad, as these secondary solutions may be feasible.
Convergence to these solutions can then be prompted by setting the initial layer
times fed to the minimisation routine at values likely to be within the basin
of attraction of the desired solution. The hueristic method used for defining
these starting points is as follows:

Form a set of starting configurations, for each fluid state, by:

(1) Forming an approximate multi–Gaussian posterior distribution for the


properties by updating the prior to account for any isopach constraints
and the depth consistency requirement. This posterior is characterised by
the mean m̃ and covariance C̃. Recall that the layer times are the first
i = 1 . . . (n + 1) elements in the vector m̃. Fix all non-time parameters
i = n + 2 . . . at the values in m̃.
(2) From this posterior, compute the expected layer time thickness ∆ti , layer
time uncertainties σti and layer time thickness uncertainty σ∆ti , from

∆ti = m̃i+1 − m̃i (54)


σt2i = C̃i,i (55)
σt2i+1 = C̃i+1,i+1 (56)
2
σ∆t i
= C̃i,i + C̃i+1,i+1 − 2C̃i,i+1 . (57)

(3) Now, if σ∆ti /∆ti > 0.5, this amounts to a ≈ 5% chance of the layer
pinching out (∆ti < 0) before taking into account the seismic, so this is
likely enough to warrant investigation. The starting points

σti+1 m̃i + σti m̃i+1


t′i = (58)
σti+1 + σti
′ ′
ti+1 = ti − 2ǫF D (59)

are then used as suitable starting times for the times ti , ti+1 . The remain-
ing times tj , j 6= i, i + 1 are set at the posterior means tj = m̃j . If more
than one layer is potentially pinched out, we form a set of starting points
by looping over the set of potentially–pinched–out layers, and setting only
one of these layers at a time to the pinched–out starting configuration just
described, with remaining times at the values in m̃.

40
Appendix 4: An Independence Sampler

For each fixed fluid combination k, a successful location of a mode in the


full parameter space will produce a “maximum likelihood” vector m̃kj and
local covariance C̃kj , by successive Newton iterates (equations (34)–(38)). Here
the index j runs over the set of distinct modes we have found for each fluid
combination by starting from separated starting points, as per Appendix 3 (in
many cases there will be only one). To save notational baggage, we will use k
henceforth to mean kj - the context makes the meaning clear.

The Laplace approximation to the mode posterior marginal likelihood is formed


by assuming the posterior surface is fully Gaussian, centred at the maximum–
likelihood point, and characterised by the covariance C̃k . Integrating over all
the model parameters then gives the marginal probability to be proportional
to
qk ∼ (2π)dmk |C̃k |1/2 Π(m̃k |S), (60)
where the posterior Π is evaluated as per (28,29), and the determinant and
dimension–varying terms in (29) are carefully retained.

A full independence sampler can then be implemented using a Metropolis Hast-


ings method. With the chain in mode k, with model vector mk , we propose a
new mode k ′ with probability ∼ Pk′ , whose new model vector m′k is drawn from
q(m′k′ ) = N (m′k′ |m̃′k′ , C̃k′ ). We accept this candidate with the usual probability

Π(m′k′ |S)Pk q(mk )


α = min(1, , (61)
Π(mk |S)pk′ q(m′k′ )

or leave the state of the chain unchanged. This method has had somewhat
mixed success - it is perfect when drawing from the prior (since the prior is
multi–Gaussian), but is susceptible of being trapped in particularly favourable
states for long periods when the posterior departs sharply from Gaussianity.
Further, any thick tails in the posterior density are not likely to be explored
by the proposal density, since it has compact Gaussian tails. The proposal
overdispersion recommended by Gelman (Gelman et al., 1995, Ch. 11) does
not work well, since the dimensionality is so high that even mildly overdispered
t–distributions dramatically reduce the acceptance rates, even when sampling
from the prior.

It has proven useful to implement an approximate sampler where the modes


are proposed from the posterior marginal (∼ qk′ ) with normal approximations
at each mode (q(m′k′ ) = N (m′k′ |m̃′k′ , C̃k′ ), and mandatory acceptance (α = 1).
Such candidates appear to very rarely produce unlikely χ2 values for the seismic
and isopach likelihoods.

41
Appendix 5: Modified SU trace formats for properties

5.1 Model–prior trace formats

The local prior information for the inversion at each trace is communicated
to the inversion code by a SU data set with a set of traces on a pattern and
ordering that matches the SU seismic data set exactly. The xml Modelparame-
ters.xml file which sets up the inversion will specify which prior quantities vary
on a trace-by–trace basis, and the block indices k at which they are located.
Suppose there are Np properties which vary, and there are Nl layers. The value
for property k in layer l is then the (l + (k − 1)Nl )th float value in the current
trace of prior data. There are some special rules as well

• The layer times must always be present in the prior trace, and they must
always be the last block (k = Nk ). Further, one additional float value is
appended for the base time of the last layer tbase ).
• All values are expected to be positive, and are converted to positive numbers
if not. The value -999.0 signifies a “bad value”, and will be ignored (the
default prior will then apply)
• All times are in milliseconds.
• Layers can be effectively suppressed from the model by providing the isopach
constraint ∆Z = N (0, 0) (see eqn (27)), either from the XML or the model
traces file for some subset of traces. If this is supplied, the layer is deemed in-
active. This is advantageous when a layer is known to permanently pinchout
in some region. Any model reduction helps in speeding up the inversion.

5.2 Stochastic trace outputs

Delivery will write stochastic samples from the posterior to the requested
output file in a very similar format to the model–prior traces. A sequence of
blocks is written, each of size Nl , and the last block of Nl + 1 floats is either
time or depth, as specified by the obvious entry in the ModelDescription.xml
file. The names of the properties can be written to a single–line ascii file by
supplying -BHPcommand filename to the delivery command line.

Similarly, the deliveryAnalyser program can collate the large statistical ensem-
ble produced by the delivery output into a compact set of traces representing
either means, standard deviations or quantiles of all the salient quantities of
interest. The set of property names of the blocks in these summary traces is
obtained by adding -BHPcommand filename to the deliveryAnalyser com-
mand line.

42
Appendix 6: Handling multiple modes and multiple fluid states (mod-
els) in the MCMC sampling schemes

At the end of the optimisation/enumeration sweep in the code, the book-


keeping will have examined k = 1 . . . NF “fluid states” (combinations), as per
section 2.2.2, and collected along the way a set of j = 1 . . . Nm local modes,
where often Nm = NF (one mode per fluid–state), but also cases where there
may be differing numbers of local modes per fluid state, due to pinchouts,
loop–skipping etc. The description of section 4.3 discusses model (fluid–state)–
jumping, but this is actually implemented in the code via mode–jumping, so it
is helpful to clarify how this is done, when a mode–jump may not necessarily
correspond to a fluid–state (model) jump.

Recall the prior probability of a fluid–state is defined as Pk , assembled from ei-


ther equation (5), or the solutions for Pk in the alternative style of Appendix 15.
In the optimisation, each local mode j acquires an approximate marginal model
likelihood estimate corresponding to the Π(j|S) of equation (107), which omits
any factor associated with its prior fluid–state probability. Since the marginal
model likelihood is an integral over all parameters, one might expect a good
estimate of the posterior probability of a mode to be (up to a constant)
wj = Π(j|S)Pkj , where kj is the fluid state label associated with the mode
j. In the majority of our experience, though, multiple modes occur with pin-
chouts, loop skips etc, such that the integral is likely to be an overestimate,
since various flanks of the pdf will be truncated. A working approximation
then is to distribute the probability Pkj over all the modes sharing the same
fluid state, so the estimated mode probability is

wj ∼ Π(j|S)p̂j (62)

with the constraint X


p̂j ′ = Pkj (63)

{j |kj ′ = kj }
There is clearly freedom in how to choose the partitioning of the fluid–state
probability as long as this constraint is met. The usual rule in MCMC, for
good performance, is to set such probabilities in proportion to the expected
posterior probabilities. Hence we define
Π(j|S)
p̂j = Pkj P ′
(64)
{j ′ |kj ′ = kj } Π(j |S)
which sets the mode prior–probabilities in proportion to the expected posterior,
but also preserving the overall prior fluid–state probability. A final normalisa-
P m
tion step fixes the constant in equation (62) such that N j=1 wj = 1.

Thus, for example, a typical mode–jumping scheme which operates when there

43
are multiple modes (Nm > 1), is

(1) From the current mode j, propose a new mode j ′ with probability Q(j ′ |j) ∼
wj ′ (1 − δ(j, j ′ )) (this forces a jump out of the existing mode).
(2) Perform the mappings and draw any new (non–common) model parame-
ters as per the algorithms in section 4.3.2.
(3) Augmenting equation (43), and with the same model–vector partitioning
definitions surrounding that equation, the acceptance probability is then

π(mj ′ )qj ′ ,j (m′j,j ′ )Q(j|j ′ )


α = min(1, )
π(mj )qj,j ′ (m′j ′ ,j )Q(j ′ |j)
π(mj ′ )qj ′ ,j (m′j,j ′ )wj /(1 − wj ′ )
= min(1, ) (65)
π(mj )qj,j ′ (m′j ′ ,j )wj ′ /(1 − wj )

The full posterior π(·) is evaluated as per equation (29), i.e. using the
fluid–state probabilities.

One of the advantages of this defensive algorithm for fluid–state probability


partitioning among modes is that it induces robustness in estimated posterior
fluid probabilities formed by partial sums over wj . Since the optimiser can have
fluctuating accounts of the number of distinct modes associated with a fluid
state, due to stochastic components in the genetic algorithms, delicate termi-
nation issues, or variability in the exhaustiveness of the global enumeration
strategy used etc, such robustness is very desirable.

Appendix 7: Wavelet format

Wavelets are expected to be single trace SU files, with the fields ns, dt and
f1 set to reflect the number of samples, sampling interval, and starting time
of the first sample (beginning of precursor). The ns and f1 fields are set so
as to make the wavelet as compact as possible with respect to the tapering,
so excessive samples are discouraged, but the tapering must also be smooth
to the first and last sample. Wavelets generated by the WaveletExtractor
code (Gunning and Glinsky, 2006) distributed with Delivery are automatically
properly initialized.

44
Appendix 8: Usage of the code

8.1 Inputs

Typical usage of the delivery inversion code demands 3 input files:

• An XML file ModelDescription.xml specifying the disposition of the lay-


ers, the rock and fluid physics, and so meta–information about the su traces
containing the variable prior information.
• One or two SU files for the seismic data. These will represent near (and
possibly far) offset data.
• An SU file containing the variable prior information.

In distributed systems, the latter two items may refer to named pipes on a
unix/Linux system.

A typical inversion would be run with the command


% delivery ModelDescription.xml -m prior traces.su -v 3 -PD -o re-
alisations.su -N 1000
which will generate (-N) 1000 realisations per trace and write them (-o) to
the file realisations.su, using the pinchout detection (-PD) BFGS meth-
ods described in section 4.1. The prior model (-m) is read from the SU file
prior traces.su. Many other commandline options are available, and can be
perused in the code’s SU–style self documentaion.

8.1.1 The ModelDescription.xml file

The inversion is chiefly driven by an XML file that specifies the necessary rock–
physics, layer descriptions, and information about the seismic. A well formed
XML file is crucial to this end, so the file can be created using a schema–
driven GUI editor expressly built for the purpose, which is also available at the
website. “Help” documentation for the inverter is available through standard
menus. The editor is also capable of creating, running and controlling complex
scripts on distributed systems, using a schema for common unix, SU, BHP SU
and delivery commands.

The structure of the XML document is controlled by the schema, and the
meaning of the various entries is readily inferred by reference to the examples
in the distribution.

Some particular entries in the XML file are worth further explanation

• The attribute “varies areally” may be set to true or false for a certain prop-

45
erty. If true, the quantity in question will vary on a trace–by–trace basis, and
is expected to be read in from the SU model file. The “<property indices>”
block then will contain an entry “property link” which defines the block of
the SU model file containing the property (see Appendix 5).
• Various properties may contain the attribute “same as layer above”. This
is set to true when the property in question should be a replica of the
same property in the layer above. For example, the low–frequency–interval–
velocity (LFIV) is usually a coarse scale measure representing an average
across many layers, so it is best fixed at the same value (or mapped to the
same stochastic variable) for all of the relevant layers. Likewise, the “depth”
input is an approximation used to sample from the trend curves, and may
safely be set to the same value for many adjacent thin layers. Prudent use
of this attribute helps to reduce the dimensionality of the final model.
• Each “<layer>” entry may have the attribute “synchronise rock matrix” set
to true. This is a flag to indicate that any stochastic rock properties of a given
rock type in this layer must be matched to those of the matching rock type
in the layer above (for example, the vp , vs and φs of the permeable member
in the two layers is mapped to the same underlying stochastic variable By
this means, two adjacent layers with matching end–member rocks are made
acoustically indistinguishable but for fluid property differences. Any internal
reflections are then due to fluid contrasts, and this feature can then be used
to make the layer boundaries track a fluid interface.
• Linear trend curves are of form vs = slope × vp + intercept ± sigma. The
Gardner-Gardner-Gregory regression curves for impermeable rocks are of
exponent
form ρ = factor × vP ± sigma.
• The “<output>” block has the following useful flags.
· “<density ordering>” is set to one of full/partial/none, with the meanings
as described in section 2.2.2.
· “<realisation number header field>” denoted the SU header word used to
store the realisation number.
· “<master depth layer number>” denotes the layer number from which
depths of the other layers will be computed using summations of layer
thicknesses (vp,eff × one–way time thickness). The actual depth of the mas-
ter layer is the depth variable d in the model vector m.

8.2 Outputs

The inversion code dumps stochastic realisation of the posterior model to a


series of SU traces, so the fastest scan direction in the resulting (large) SU file
is that of the realisation number, which defaults to the mark header word.
Posterior analysis of the simulations is performed with a second program de-
liveryAnalyser, which can form various statistical quantities of interest, syn-
thetic seismic traces, and blocked layer maps of the simulations. Some of the

46
salient quantities of interest include

• All the basic model variables in m of equation (3).


• Fluid type.
• Effective layer properties such as vp,eff , ρeff , Zeff (impedance).
• Effective reflection coefficients for each stack at the layer top.
• Both a normalised and full χ2 variable indicating the degree of match with
the seismic (summing over all samples and stacks), from equation (26)).
X
χ2seismic ≡ ferror (Ssyn − S)2 /σs2 (66)
error–sampling points

χ2seismic
χ2seismic,normalised ≡ (67)
Nerror-sampling-points

• Thickness (one–way time–thickness × vp,eff .


• Net sand (net sand = thickness ×NG ).
• Net hydrocarbon (Net sand × porosity × saturation).

Like delivery and SU programs, the program has self documentation, but
some typical analysis of a suite of realisations realisation.su might be

(1) deliveryAnalyser -i realisations.su -p-ascii t 6 | ascii–graphing–


program
Print the time of layer 6 from succesive realisations out as an ascii stream
and graph it. Optionally, the flag -G can invoke inbuilt graphics as a
replacement for the ascii–graphing–program for many functions, e.g.
deliveryAnalyser -i realisations.su -p-ascii t 6 -G
(2) cat realisations.su |deliveryAnalyser --mean NG 6
Print a series of realisation averages of NG in layer 6, one for each trace
location.
(3) cat realisations.su |deliveryAnalyser --filter fluid type = 1 4 --
trace--filter ’fldr>1234’ --mean NG 6
Print a series of realisation averages of NG in layer 6, one for each trace
location, but only for realisations with oil (fluid type=1) in layer 4, and
for the traces with fldr>1234 (note the shell protection).
(4) deliveryAnalyser -i realisations.su --histogram net-sand 6 --trace-
-filter ’fldr=34,tracf=12’ | ascii–graphing–program
Print a histogram of net-sand (NG ×thickness) in layer 6 at the specified
location.
(5) deliveryAnalyser -i realisations.su -s 4 5 R near wavelet.su |
suswapbytes | suximage
Produce a synthetic near–stack seismic from the model realisations over
the time interval 4-5 seconds from the wavelet wavelet.su and display
this on a little endian machine (all the java codes wrote big–endian SU
files).

47
Appendix 9: Incorporation of Floating Grain models into Delivery

A lithological phenomenon that affects some kinds of reservoir rocks is the


presence of granular material in the pore space that does not support the
rock matrix, though it does reduce the porosity and permeability. We assume
adequate rock–phyics regressions for the relevant facies can be built from log
data samples unaffected by such material. Delivery has been extended to model
these effects along the following lines.

9.1 Floating grain effects

The effects of floating grains in the rock matrix can be taken into account
using regressions curves of the form

−φflt
φ= + aφ + bφ Zeff + ǫφ (68)
1 − fc
vp = ap + bp (φflt + (1 + g)φ) + ǫp (69)

where fc is the capture fraction (volume of second material taken into sup-
porting matrix divided by total secondary material). The effective stress de-
pendence appears is some well chosen function Zeff , with available data, which
may be, e.g. of form Zeff = (1 − exp(−P/P0 )). The ǫ terms are noise terms.

These forms are chosen as linearisations of the nonlinear Gassman model

( Ã
Kg 3(1 − νm )
vp (φ, φflt ) = β(φ, φflt )
ρg + (ρb − ρg )φ 1 + νm
!)1/2
(1 − β(φ, φflt ))2
+ (70)
(φ + φflt )((Kg /Kf (φ, φflt )) − 1) + 1 − β(φ, φflt )

about a mean porosity φ̄ and φflt = 0, which arises from the following strong
assumptions. The grains are assumed to have the same properties as the matrix
(same material – subscript g), and variations in the Poisson’s ratio νm with
φ and φflt are neglected. We model the rock beta behaviour as a function of
structural porosity φs = φ + φflt as

β(φ, φflt ) = (1 − (φ + φflt )/φ̂0 )λ , (71)

and the ’effective fluid’ modulus by the Reuss average of floating grains and
brine moduli à !−1
φ/Kb + φflt /Kg
Kf (φ, φflt ) = . (72)
φ + φflt

48
The parameters λ, φ̂0 can be obtained by a nonlinear regression of the clean-
rock {vp , φ} data using φflt = 0 in the above formulation (equation 70), using
known values for Kg , ρg , ρb etc. The critical porosity φ̂0 is very likely to be in
the range 0.4 − 0.45. If a value for Poisson’s ratio representative of the average
porosity is chosen (e.g. νm = 0.15), the dominant unknown in equation (70)
is λ, which can be simply regressed for. Note that using an assumed value for
νm (e.g. νm = 0.15), appropriate for the average of data porosity will not yield
the correct pure quartz velocity vp (0, 0) since a pure quartz Poisson’s ratio is
closer to 0.07.

The slope and intercept terms in eqn (69) emerge naturally from the lineari-
sation of the nonlinear fit to (68) of clean rock data.

We define (1 + g) by
¯ ¯
∂vp (φ, φflt ) ¯¯ ∂vp (φ, φflt ) ¯¯
(1 + g) ≡ ¯ / ¯ (73)
∂φ ¯
φ=φ̄,φ =0
∂φflt ¯φ=φ̄,φ
flt flt =0

and sufficiently accurate numerical values for the derivatives are most simply
obtained by finite differences.

9.2 Recommended methodology

A possible regression methodology is as follows. For clean rock data,

• Assume Kg , Kb , ρg , ρb , νm
• Compute the mean porosity φ̄
• Fit the vp data for the non-linear parameters λ, φ̂0
• Compute the bp , bp (1 + g) coefficients of eqn (69) from the derivative expres-
sions in the Appendix at the mean porosity. Then ap = vp (φ̄, 0) − bp (1 + g)φ̄.
• For this clean rock data, here are no φflt terms in (68) and (69), so fit
the loading–behaviour
q
data to get the coefficients aφ , bφ and residuals RMS
2
σǫφ = hǫφ i
• Fit the linearisedqvelocity trend data to get the coefficients ap , bp , and resid-
uals RMS σǫp = hǫ2p i

Now, for each group of float-polluted well data 16 (i.e. which can be modelled
with an common effective φflt )

• Refit the two linearised trends, using different intercepts:

16Regressions for the data with floating grains can be performed using EM
(Expectation-Maximisation) techniques, a possible future development

49
φ = a′φ + bφ Zeff + ǫφ (74)
vp = a′p + bp (1 + g)φ + ǫp (75)

We then compute the common float fraction and capture ratio:

φflt = (a′p − ap )/bp (76)


fc = 1 − φflt /(aφ − a′φ ) (77)

A simple check of the theory is to confirm the new regressions have the same
slope as the clean rock data.

9.3 Conversion to Delivery formats

For use in Delivery, augmented regression relationships of the form

vp = Ap + Bp d + Cp LFIV + Dp φflt + ǫ′p (78)


φ = Aφ + Bφ vp + Cφ φflt + ǫ′φ (79)

will be used, using the existing notation in Delivery documents. The shear
relation is unchanged. The pressure dependence will be effected by taking the
loading depth d → Zeff (e.g. (1 − exp(−P/P0 ))) computed from e.g. basin
modelling (use the depth rock curves entry in the XML, to distinguish from
model depth). The LFIV will be dropped by setting Cp = 0. Approximate
conversion to this Delivery–style form of the coupled regression can be obtained
with

Ap = ap + bp (1 + g)aφ (80)
Bp = bp bφ (1 + g) (81)
1+g
Dp = bp (1 − ) (82)
1 − fc
ap
Aφ = − (83)
bp (1 + g)
1
Bφ = (84)
bp (1 + g)
1
Cφ = − (85)
1+g
q
σǫ′ p = σǫ2p + (bp (1 + g))2 σǫ2φ (86)
σ ǫp
σǫ′ φ = (87)
bp (1 + g)

50
Numerical Example

9.4 Rock trends

We use the loading depth term Zeff = (1 − exp(−P/800PSI)) for the loading
regressions, which is input to Delivery as the parameter depth rock curves.
At a particular location of interest, Zeff = 0.976 at P = 3000 PSI, roughly
around the depth 11000ft.

Modelling loading behaviour in a certain petroleum province yields the follow-


ing estimates for shales and sands.

9.4.1 Shales

The loading behaviour for non–reservoir rocks (shales) has also to be converted
to use the Zeff variable. For simplicity, we took typical velocities from the
existing depth–based trends, then set Bp = 0 to yield

vp = 10500 ± 470ft/s (88)


ρm = 0.278vp0.234 ± 0.027gm/cc (89)
vs = −2333 + 0.678vp ± 280ft/s (90)

9.4.2 Sands

The fits here are:

φ = 1.10 − 0.88(1 − exp(−P/800PSI)) − 1.54φflt ± 0.0024 (91)


vp = 18717 − 32100φ − 31060φflt ± 750ft/s (92)
vs = = −2774 + 0.77vp ± 240f t/s (93)

The coefficients we require are read off as ap = 18717, bp = −31060, aφ = 1.1,


bφ = −0.88, fC = 0.33 and (1 + g) = 1.033. Coefficients for use in Delivery are
computed from equations (80)– (87) as Ap = −16576, Bp = 28235, Dp = 16830,
Aφ = 0.583, Bφ = −3.11 × 10−5 , Cφ = −0.968. The shear coefficients are
As = −2774, Bs = 0.77.

From the estimated noise RMS values σǫφ = 750 and σǫp = 0.0024 we compute
σǫ′ p ≈ 700ft/s and σǫ′ φ ≈ 0.02 as working guesses.

51
9.5 Model system

Consider a 2–layer shale–seal:sand system, where the sand layer is initially


modelled as clean (N G ∼ N (1, 0)). The well logs used to construct the prior
have some clean rocks (no floating grains) and rocks with floating content
of around 5%. To model the inferability of the float fraction, we construct
synthetic seismic truth–case stacks for near and far stack angles of a few degrees
and about 30◦ , using a truth case model with 5% float, and all other parameters
at the most likely values from the trends. Fig 9 illustrates the system, with
truth-case plus posterior near and far synthetics from the posterior of case iii)
we describe shortly.

-0.02 -0.01 0 0.01 0.02 -0.02 -0.01 0 0.01 0.02


1.8 1.8

1.9 1.9

SHALE
2.0 2.0

SAND
2.1 2.1

Near Far
Fig. 9. Two–layer model system with truth case (thick red) seismic traces and syn-
thetics from the posterior (black) for the inversion case iii) described in the text.
The absolute noise level is set at 0.0025 for both stacks.

For inversion, the prior on floating fraction is taken as N (0, 0.052 ), and we
attempt to compute the posterior floating fraction fraction from 3 cases: i)
N G fixed at 1, near–stack only, ii) (N G ∼ N (1, 0.22 ), near–stack only, and iii)
(N G ∼ N (1, 0.22 ), near and far stacks.

For case i), figs. 10– 12 shows 3 possible forms of the prior – varied for illustra-
tive purposes – on which we superpose the original log data, which illustrates
how the floating grain effect smears out the regional prior. The actual prior
used int he inversion cases coincides with the inset (c).

52
1.4 ×104

1.3 ×104

1.2 ×104

1.1 ×104

1.0 ×104

9 ×103

8 ×103
0.05 0.10 0.15 0.20 0.25 0.30 0.35
0.050.100.150.200.250.300.358×1039×1031.0×1041.1×1041.2×1041.3×1041.4×104
Fig. 10. This and figs. 11 and 12: three ’pedagogic’ priors for velocity versus porosity
in pure sand layer. Clean well data points (+) and float-polluted well data points (*)
plotted on all three graphs: dots (.) are draws from model prior. (a) Prior constructed
with artificially narrow σφ = 0.002, showing how parameters arise from a clear 50:50
mixture of ’clean–rock regression’ points and an elliptical smear from the effects of
the floating grains. The clean rock trend is obviously far too narrow to embrace the
clean well (+) measurements, but the ’clean trend’ is clearly visible and centred.

1.4 ×104

1.3 ×104

1.2 ×104

1.1 ×104

1.0 ×104

9 ×103
0.05 0.10 0.15 0.20 0.25 0.30 0.35

Fig. 11. Illustrative prior (b): prior drawn from clean rocks only (float–fraction
∼ N (0, 0)), with realistic σφ = 0.02. The tails of the distribution do not contain
the floating grain data comfortably. Clean well data points (+) and float-polluted
well data points (*), dots (.) are draws from model prior.

53
1.4 ×104

1.3 ×104

1.2 ×104

1.1 ×104

1.0 ×104

9 ×103
0.05 0.10 0.15 0.20 0.25 0.30 0.35

Fig. 12. Illustrative prior (c): actual prior with float–fraction distributed as
∼ N (0, 0.052 ). Only a few measurements appear to lie at the periphery of the distri-
bution, but the mixture character is not as clear to the eye as in fig. 10. Clean well
data points (+), float-polluted well data points (*), dots (.) are draws from model
prior.

9.6 Inversion Analysis of Posteriors

9.6.1 Fixed NG, single stack

Figure 13 shows salient scatterplots of properties of the sand layer before and
after inversion. The peak–signal to noise ratio is set at about 6:1.

9.6.2 Free NG, single stack

Figure 14 shows salient scatterplots of properties of the sand layer after inver-
sion, where the model has additional net–to–gross freedom free in the prior
NG ∼ N (1, 0.22 ).

9.6.3 Fixed NG, near+far stack

Inversion using shear data may help narrow down floating grain porosities
better, as the shear carries additional information. The far stack for this test
case is set at about 30 degrees (c.f. a few degrees for the near) and the reflected
amplitude is much weaker (AVO effects). The noise level was set at the same
value as for the far stack.

Figure 15 shows comparable scatterplots to 14.

54
0.14 2.6 0.2

0.12
A 2.5
B C

rho_s (gm/cc)
0.1
0.10

R_near
float_fraction

PRIOR
2.4
0.08
0.0
0.06
2.3

0.04
-0.1
2.2
0.02

0.00 2.1 -0.2


0.00 0.05 0.10 0.15 0.20 0.25 0.30 0.35 8000 10000 12000 14000 1970 1980 1990 2000 2010 2020 2030
phi (sand porosity) vp(sand) (ft/s) t(ms)
0.14 2.6 0.2

0.12
D 2.5 E 0.1
F

POSTERIOR
0.10

R_near
float_fraction

rho_s (gm/cc)

2.4
0.08
0.0
0.06
2.3

0.04
-0.1
2.2
0.02

0.00 2.1 -0.2


0.00 0.05 0.10 0.15 0.20 0.25 0.30 0.35 8000 10000 12000 14000 1970 1980 1990 2000 2010 2020 2030

phi (sand porosity) vp(sand) (ft/s) t(ms)

Fig. 13. Scatterplots from prior distribution (top row) and posterior (bottom row)
for 2–layer model with sand below shale. (A,D) float fraction vs sand porosity. A
perceptible narrowing around the true answer of float fraction=5% is visible. Fewer
clean sands are produced in the posterior. (B,E) The velocity (vp ) density scat-
terplot narrows more obviously. (C,F) The layer–time (t) versus effective reflection
coefficient Reff is clearly pinned down sharply. As usual, these parameters are most
heavily constrained.

9.7 Delivery Implementation details

Recall that Delivery works with two versions of the model vector m. The
vector m has a fully Gaussian prior, with no truncations or restrictions on
values. The physical model vector m′ , which is used in the forward model
and its associated likelihoods (seismic, isopachs) is obtained by applying time
orderings and truncations (e.g. of NG or saturations) to m, i.e. m′ = f (m), for
some suitable f . The truncation effectively induces a prior which, for simple
properties like NG, is a mixture of a truncated Gaussian distribution and delta
functions at endpoints.

With the augmented models defined by equations (78) and (79), the linearity
means the prior is still Gaussian, but the truncation of φflt in m′ must be
handled with care. The extra coefficients Dp , Cφ have the effect of placing the
prior on inclined ellipsoids in e.g. the {vp , φflt } plane, so pure truncation on φflt
has the effect of smearing the tail of the distribution onto the plane φflt = 0
in a direction off the principal axes. This is clearly a poor way to handle the
prior. Fig 16 shows a scatter plot of points produced from a prior constructed

55
0.14 2.6

0.12
2.5

0.10 A B

float_fraction
2.4
0.08

rho_eff
0.06
2.3

0.04

2.2
0.02

0.00 2.1
0.00 0.05 0.10 0.15 0.20 0.25 0.30 0.35 8 ×103 9 ×103 1.0 ×104 1.1 ×104 1.2 ×104 1.3 ×104 1.4 ×104
phi vp_s

1000 1.0

0.9
800

C 0.8

600
0.7
frequency

NG
0.6
400

0.5

200
0.4
D
0 0.3
−0.05 0.00 0.05 0.10 0.15 0.00 0.02 0.04 0.06 0.08 0.10 0.12 0.14

float_fraction float_fraction

Fig. 14. Posteriors from model with loosened NG distribution in prior. (A) float–frac-
tion vs sand porosity. (B) The velocity (vp ) vs density. (C) Histogram of float–frac-
tion from the posteriors of the models with fixed NG = 1 and free NG ∼ N (1, 0.2).
No obvious effect on the univariate float–fraction distribution is caused by allowing
the NG to become free: the peaks of the two delta functions are shown with ar-
rows. (D) Scatterplot of NG vs float–fraction. Again, no obvious strong correlation
appears here, with the density strongest near the truth case values (0.05, 1.0).

in this naive way, with the obvious artifacts. A more reasonable way to handle
the truncation is with the mappings (only for φflt < 0):

vp′ = vp − Dp φflt (94)


φ′ = φ + Bφ (vp′ − vp ) − Cφ φflt (95)
vs′ = vs + Bs (vp′ − vp ) (96)
φ′flt = 0, (97)

which forces the remapping to occur along directions parallel to the principal
axes.

This mapping minimises the difference (m′ − m)T CP−1 (m′ − m), subject to the
positivity constraint, which seems a reasonable formulation.

Note that the actual Gassman fluid substitution calculation that occurs later

56
0.14 2.6

0.12

0.10
A 2.5
B
2.4
0.08

float_fraction

rho_eff
0.06
2.3

0.04

2.2
0.02

0.00 2.1
0.00 0.05 0.10 0.15 0.20 0.25 0.30 0.35 8 ×103 9 ×103 1.0 ×104 1.1 ×104 1.2 ×104 1.3 ×104 1.4 ×104
phi_s vp_s
1000 1.0

0.9
800

C 0.8

600
0.7
frequency

NG
0.6
400

0.5

200
0.4
D
0 0.3
−0.05 0.00 0.05 0.10 0.15 0.00 0.02 0.04 0.06 0.08 0.10 0.12 0.14
float_fraction float_fraction

Fig. 15. Scatterplots as per figure 14. The Posterior is more concentrated on the
solution NG = 1 (D), and similarly has less weight on the clean–rock solution
float–fraction = 0 (C and D) (i.e. note the reduced relative height of the spike
in (C)).
1.4 ×104 1.4 ×104

1.3 ×104 1.3 ×104


vp_s
vp_s

1.2 ×104 1.2 ×104

1.1 ×104 1.1 ×104

1.0 ×104 1.0 ×104

9 ×103 9 ×103
0.00 0.02 0.04 0.06 0.08 0.10 0.12 0.14 0.00 0.02 0.04 0.06 0.08 0.10 0.12 0.14

phi_flt phi_flt

Fig. 16. Left: pure truncation of φflt resulting in smearing of prior along φflt = 0
plane. The right figure illustrates the remappings of equations (94)–(97) which seem
more reasonable.
in the forward model uses only the pure fluids (oil, gas etc), as the Gassman–
like effect of the floating grain presence is implicitly accounted for by the
floating–grain terms in the modified regressions.

9.8 Inputs/Outputs

The prior distribution of the floating grain volume fraction is specified on a


layer–basis for the reservoir rock, like, say, the saturation of a hydrocarbon
type. It’s default is N (0, 0), so the model reduces to the standard model by

57
default. Outputs will be the modified porosity (1−SL )φs (actual fluid porosity)
and the additional field fL . The SU header word duse will encode output file
format variations for compatibility with older models (it is rarely used, and
defaults to zero).

Appendix 10: Vertical model–stacking functionality

An additional feature added to Delivery since the published paper is the ca-
pacity to vertically stack models, which is a clandestine way to study models
where transverse correlations between parameters at different spatial locations
can be important, or at least more tightly constrained than the allowable
(regression-error) variations in the regional rock curves.

The idea here is that the coupled inversion at two or more separate spatial loca-
tions – where there exists significant transverse correlation of layer properties
– can be effected by stacking several models on top of one another at a single
spatial location, and including coupling terms for the likelihood which connect
properties in arbitrarily different layers. The idea is illustrated in figure 17.

The seismic data from the several locations is “stacked” in a similar way. Con-
sider a “two–stack” case for example: a three-layer model with time–window
of interest t ∈ [1, 1.5]s, can be stacked into the total interval t ∈ [1, 2.5] with
location 1 in [1, 1.5]s and location 2 in [2, 2.5]s. A suitable buffering time in-
terval (in this case [1.5, 2]s) is used between the two packages. Dummy shale
packages can also be inserted if desired.

The user must specify a set of one or more layer mappings, each of which
specifies a base layer and one or more target layers via an xml entry like
<mapping>1:5,6</mapping> which has base layer=1 and target layers = 5,6.
These then enable the coupling of properties between the base layer and the
target layers. This is a quite general form – in simple two–location models
there is only one target layer for each base layer.

Functionality to couple the properties p ∈ {vp , vs , φs , ρm , NG, thickness} exists.


For each of these properties, the user provides a between–layers “difference”
variance σp (in the same units as the trend curves), and can (optionally) sub-
select the set of applicable base layers over which it is desirable to couple the
property. If the variance σ is small or comparable to the typical errors in the
regional curves, the coupled inversion can be expected to give quite different
answers to inversions carried out independently at each spatial location.

For the properties p ∈ {vp , vs , φs , ρm , NG}, the difference constraint can be


absorbed rigorously in the prior. The extra term in the Gaussian prior exponent

58
1

base_layers
2

4 coda
layer mappings

3σt wavelet
t exclusion
3σt window

5 precursor

6
target_layers

8 time t

Fig. 17. Illustration of construction of a stacked 8–layer model from two 3–layer
models with buffering shale layers. Mappings between comparable layers are shown
on the left. To exclude seismic–mismatch coupling effects from the region where
the two packages join, a seismic–likelihood exclusion–window is constructed, where
the XML specifies the parameters (start layer=5, end layer=4). The constructed
exclusion zone starts from a “safe” distance above the top of the start layer (i.e.
including prior time uncertainties and the effect of the wavelet coda) down to a
“safe” time below the bottom of the named end layer (using the wavelet precursor
and 3 standard deviations of the prior time pick).
(c.f. equation (29), with the fluid–choice subscript k suppressed for simplicity)
corresponding to this construction is of form

1 X X X (mI(p,ip ) − mI(p,jip ) )2
− log(P (m)) = .
2 properties p applicable target layers jip σp2
base layers ip

(98)

where I(p, j) denotes the index of property p, layer j, in m. The RHS in (98)
is a quadratic form which we may write as mT CV−1S m/2, where CV S is a ma-
trix assembled from a sum of block matrix type entries of the general type
( −1 1 )/σp2 , inserted in the appropriate rows and columns.
1 −1

The net effect of this term is that the existing prior specified by the rock–
physics curves and time–picks can be simply updated using the usual Bayesian
formulas to the new Gaussian prior with parameters

59

Cm = (CV−1S + Cm
−1 −1
)
′ −1 −1 −1 −1
m = (CV S + Cm ) Cm m (99)

Thus, the prior is the same multi–Gaussian expression as equation (29), but
with the modified mean and covariance as per (99). In particular, specification
of “vertical stacking” terms for p ∈ {vp , vs , φs , ρm , NG}, or their modification,
does not alter the overall prior probability of the model.

Note that the thickness constraint does not contribute a pure quadratic term
that can be absorbed rigorously into the Gaussian prior. Though we would
like to treat all vertical–stacking requirements as part of the prior (and this
is also often true for isopachs), this requires that the normalisation constants
are explicitly available, and these are not analytically available for terms that
do not appear directly as Gaussian. Hence the vertical–stacking constraint
for thickness is imposed via a likelihood term, very similarly to the isopach
constraint. This means that changes in the vertical–stacking thickness term will
alter the underlying probability of a model, just as the isopach and seismic data
does. For completeness, we write the likelihood from the thickness difference
as

1
LVSth (m) =
(2π)dth /2 |CVSth |1/2
1 X X (∆Zi (m) − ∆Zji (m))2
exp(− ).(100)
2 applicable target layers ji σZ2 i
base layers ip

where CVSth = diag{σZ2 i } contains dth entries, one for each term of the double
sum in the exponent.

A minor complication with this trick of vertical stacking is that the reflected
seismic amplitudes at the interfaces between each “package” of layers will
couple the properties between the top and bottom of adjoining packages, which
is usually undesirable. In this context, the amplitudes at these times are of no
interest, and should be excluded from the inversion. The user has the option
of excluding these using one or more xml entries of typical form

<seismic_likelihood_exclusion_range>
<start_layer>5</start_layer>
<end_layer>4</end_layer>
</seismic_likelihood_exclusion_range>

Each such entry defines a time–window where seismic–mismatch terms are not
accumulated in the likelihood equation (26). The window is defined by the
interval

60
t > t̄start layer − 3σtstart layer − |wavelet–coda|
t < t̄end layer+1 + 3σtend layer − |wavelet–precursor| (101)

where the mean and standard deviations t̄i and σti are those of the prior
picks of layer times, c.f. section 2.2.1. This definition enables the exclusion–
window(s) to track the model over space. Figure 17 illustrates how this works,
for a 8–layer model.

A typical XML chunk to perform two–location stacking for the 8–layer model
shown in this figure would be

<stacked_models_info>
<layer_mappings>
<mapping>1:5</mapping>
<mapping>2:6</mapping>
<mapping>3:7</mapping>
</layer_mappings>
<property_difference_std_deviations>
<vp_m applicable_layers="1,3">350</vp_m>
<rho_m applicable_layers="1,3">0.02</rho_m>
<vp_s applicable_layers="2">250</vp_s>
<phi_s applicable_layers="2">0.1</phi_s>
<NG applicable_layers="2">0.1</NG>
<thickness applicable_layers="2">15</thickness>
</property_difference_std_deviations>
<seismic_likelihood_exclusion_range>
<start_layer>5</start_layer>
<end_layer>4</end_layer>
</seismic_likelihood_exclusion_range>
</stacked_models_info>

For convenience, the runtime flag -IVS switches off all the inter–layer coupling
induced by the terms (98),(100), which is convenient for quick comparisons.

Appendix 11: Internal fluid–contact modelling

A common scenario in inverting is one in which the vertical sequence of possible


fluids is known from particular fluid contacts observed in wells or inferred from
pressure data. Often the relative uncertainty in the contact depth(s) is not large
relative to the layer–depth uncertainties in the inversion, and reflections from
the fluid contact(s) can be very strong, so using this additional contact data
is very useful.

61
The default operating mode of the inversion uses the supplied fluid probabilities
to construct an enumerated list of fluid combinations for the entire model,
where each combination assumes that layers are completely occupied by the
nominated fluid type. In situations where a particular fluid contact intersects
many different layers across the inversion domain, this style of modelling can
be messy to construct, especially if layers are introduced as a clandestine way
of capturing the fluid contacts.

As an alternative, a simple fluid–contacts inversion mode is supported, where


the fluid configuration of any layer is determined by specified fluid contacts in
depth, and the computed layer–depth using the model velocities, times, and
specified reference depth layer. In this mode, reflections arising from within-
layer fluid contrasts are incorporated into the forward model.

In this “contact–mode” inversion, fluid combinations are not enumerated as


before, rather, the fluid sequence is deterministically computed from a set of
1 or more “fluid units”, as depicted in figure 18.

Seismic trace and reflections

gas
‘go’ contact

oil
‘ob’ contact
oil brine
‘ob’ contact brine sub SHALE
lay
sub er 1
lay
er 2
Fluid unit 1

SHALE

Fluid unit 2
SHALE

Fig. 18. Illustration of concept of fluid units and depth–specified contacts.

Fluid units satisfy the following rules.

• A fluid unit is a set of contiguous layers, and within a fluid unit there may
be 1, 2 or 3 of the following contacts {go, gl, gb, ol, ob, lb} (oil (o), gas (g),
brine (b), low–saturation gas (l)), under usual full–density ordering.
• The set of fluid units must account for all the layers in the model
• Storage efficiencies require the total number of contacts from all fluid units

62
not exceed the total number of layers in the model (in practice a very unlikely
restriction).
• Two adjacent fluid units must have an impermeable layer at the bottom of
the upper fluid unit or top of the lower fluid unit. This allows unusual fluid
sequences between fluid units, analogous to the ’partial–density–ordering’
which models layer sequences not in hydraulic communication
• Fluid units are itemised in the XML in descending depth order
• Contacts in each fluid unit specify the upper and lower fluid type, and are
modelled by Gaussian distributions Z ∼ N (Zc , σZ2 c ) whose mean and stan-
dard deviation have default values in the XML, but which may be spatially
variable via use of the model–prior traces SU file. Contacts contribute a di-
agonal elements σZ2 c to Ck in equation (29) with associated mean Zc inserted
in the mean vector mk .
• Any layer which can in–principle host a fluid phase inferred from the fluid
units and contacts must specify the fluid in the fluid nameable in each
<reservoir endmember> subsection. E.g., if fluid unit 1 identifies a ’bo’ con-
tact, all permeable layers spanned by fluid unit 1 must specify the oil name
and saturation (the probability field is ignored). The oil name is expected
to match throughout the fluid unit.
• The fluid unit XML attribute fluid properties synchronised can force the
stochastic properties of all hydrocarbon properties within a fluid unit to be
mapped from a common underlying variable if set to true. This may be use-
ful, if, e.g. similar saturations are expected for, say, gas, in all reservoir lay-
ers in the fluid unit. If synchronisation is activated, the XML+model traces
specification of the uppermost active reservoir layer 17 controls the prior dis-
tribution of the fluid properties, even if this layer is above a contact excluding
the fluid.

It is important to note that the supplied fluid probabilities are ignored when
fluid occupations are determined using contacts in this mode.

11.1 Upscaling rules

The code populates a variety of layer–effective fields in the output realisations-


style SU files, so upscaling rules must be used when there are internal contacts.
We use the following:

• velocities vp eff and vs eff are time–thickness weighted averages of sublayer


velocities, so subsequent multiplication by the overall layer time–thickness
17A reservoir layer is defined by a layer with σNG > 0 or N G > 0. Active layers
are defined in Appendix 5.1. Note this can vary spatially, so the safest (and least
ambiguously documented) option is setting the priors identically in all reservoir
layers.

63
gives the true layer thickness, i.e. equal to the sum of the thicknesses of the
sublayers delimited by contact depths and the layer top/bottom depth.
• density rho eff and saturation S h are thickness weighted averages (the
latter lumping all non–brine fluids), which should give reasonable results
for forward calculation of rock mass and hydrocarbon in–place. For mixed
hydrocarbon phases, the fields S oil, S lsg, and S gas are probably more
useful. These last three are phase–specific thickness weighted averages.

11.2 New outputs in the realisations SU file

• Reflection coefficients R near fc, R far fc, which are near and far stack
reflection coefficients arising only from fluid contact reflections occurring
within layers.
• Times t fc of these reflection coefficients, in the same order. No associa-
tion of these times with a particular layer, via the layer–slot in the SU file
from which they are retrieved, can be made. The layer “owning” the reflec-
tions must be determined by comparison with the usual “t” entries. Times
are computed and written even for zero-reflections occuring within purely
impermeable layers.
• Depths d fc of these reflection coefficients, in the same order.
• fluid type is interpreted as an 2 byte integer (cast from the float value in
the SU trace), with legacy values 0=brine, 1=oil, 2=lsg, 3=gas. These values
require only 2 bits of information. For layers split by contact(s), the spare
bits of the integer are encoded to communicate the sub-layer fluid sequence
thus:
Bits 16. . . 13 12,11 10,9 8,7 6,5 4,3 2,1
| {z }| {z } | {z } | {z } | {z } | {z } | {z }
unused A B C D E F

A = number of internal reflections (= number of sublayers-1)


B = fluid type, sublayer 4, if A>2
C = fluid type, sublayer 3, if A>1
D = fluid type, sublayer 2, if A>0
E = fluid type, sublayer 1, if A>0
F = fluid type of thickest sublayer (”effective” fluid type) (102)

This encoding reduces to the legacy 0,1,2,3 values for layers not split by
contacts.
• S oil, S lsg, and S gas. These “effective” saturation components conserve
correct hydrocarbon–phase volume if multiplied by the layer thickness, in
cases where internal contacts exist. For example, if a oil–brine contact bisects
a layer (in depth), S oil is discounted by 1/2. For contact–free layers, their
values are sensible and equal the legacy S h field, for the appropriate phase.
• The Analyser program can now generate net-oil, net-lsg and net-gas,

64
which are correct special cases of net-hydrocarbon, using the S oil, S lsg,
and S gas fields described above

Reconstruction of some fraction of the internal detail of the layers is then


possible using these new or modified fields.

11.3 Example Inversion

Consider the simple 3 layer test problem of figure 19 (see the


Tests/InverterTests/Simple3LayerContactsTestProblem directory in the distri-
bution). The reservoir conditions here are typical of north west shelf Australia,
with 30% porosity sands at about 2km TVD. Agnostically, the reference depth
layer is at the top of the model, with a depth standard deviation of 20m.

Seismic traces

~60m

gas

‘go’ contact
SHALE
oil

‘ob’ contact

brine

Fluid unit 1

SHALE

Fig. 19. Simple 3 layer, single fluid unit, 2 contact problem: gas over oil over brine

The 2 contacts are set up with an XML section thus

<fluid_units_and_contacts>
<fluid_unit fluid_properties_synchronised="false">
<top_layer>1</top_layer>
<bottom_layer>3</bottom_layer>
<contact>
<type>GO</type>
<depth varies_areally="false">2050</depth>
<sigma_depth varies_areally="false">0</sigma_depth>
</contact>
<contact>

65
<type>OB</type>
<depth varies_areally="false">2120</depth>
<sigma_depth varies_areally="false">0</sigma_depth>
</contact>
</fluid_unit>
</fluid_units_and_contacts>

Fig. 20 shows synthetics for this model, where the Gassmann fluid effect is
very strong: indeed, fluid–contrasts are stronger than shale–sand contrasts.
2.00 2.00

2.05 2.05

2.10 2.10

2.15 2.15

2.20 2.20

2.25 2.25

2.30 2.30

Fig. 20. Synthetic seismic data for the model of fig. 19 at 200Hz (unrealistic) and
80Hz (realistic) bandwidth.

Fig. 21 shows typical realisations from the inversion of this model (which in-
cludes an isopach constraint of 60 ± 10m) on the reservoir. The brighter reflec-
tions, including fluid contrasts, yield sharper layer and internal contact times.
In the brine leg, the dimmer amplitudes produce greater uncertainties. The
peak–signal to RMS noise ratio is about 14 in the gas, diminishing to about 2–
3 in the brine leg. A fascinating aspect of the use of contacts is that the depth
uncertainty of layers is seriously reduced when a visible contact occurs in the
trace, as per the lower graph in fig.21. Here, the bright peaks sharply define the
reflectivities (thus velocities) and reflection times, thus the layer thickness can
be better nailed, and all that is needed is a known event in depth upon which
to hang the model. Notice the top depth of layer 1 is appreciably better pinned
by this mechanism, even though the reflection occurs in the layer below.

11.4 Likely Issues

The simple fluid contact model described here uses the usual optimisation
machinery which enumerates multiple modes by accruing local–optimisation

66
2.00

layer 1

2.05

2.10 layer 2

2.15
time (s)

layer 3

2.20

2.25

2.30
25
depth std. deviation

20

15

10 top, layer 1

5
top, layer 2
0

Fig. 21. Typical inversion realisations showing internal contacts well resolved in gen-
eral. The lower graph is depth uncertainty, which reduces markedly when a contact
is visible.
solutions started from an assortment of dispersed starting points. Multimodal-
ity associated with loop skipping will still occur, and can be expected at traces
where internal fluid contacts are close to layer boundaries. Tunnelling-like be-
haviour of the MCMC sampler is not unlikely if the modes are very different
in curvature and the mode–jumping strategies of section 4.3.2 are not well–
adapted. Some of this character is already evident in the brine leg of the
example (fig. 21). Careful examination of sample outputs for good mixing is
encouraged.

Another not–yet thoroughly tested issue is that the posterior surface will
have jump discontinuities associated with contacts pinching out, analogous
to the discontinuities arising from layer pinchouts. The latter are dealt with
via quadratic-programming (QP) techniques tacked on to the Gauss-Newton
schemes, but this strategy is much harder for internal fluid–contact discon-

67
tinuities, as the fluid contact is not specified in time. Possible linearised QP
strategies are under consideration.

Appendix 12: Model selection facilities

It is often desirable to compare the relative statistical significance of several


candidate models k = 1 . . . N , where higher level basic assumptions in the
model may vary. The existing apparatus implicitly performs model comparison
in a single run, where the changes involve different fluid combinations k, but
higher level model changes will need to be assessed by comparing outputs from
separate runs of the code. Henceforth, we use k to label a general model choice.

The fundamental entity in Bayesian model selection strategies (Gilks et al.,


1996),(Denison et al., 2002) is the marginal model likelihood, obtained by the
marginalisation integral
Z
Π(k|S) = Π(mk |S)dmk , (103)

using the notation of equation (28), and interpreting S liberally to mean all
the “data”: seismic, isopachs etc. The priors P (mk ) and likelihoods used to
construct the posterior Π(mk |S) must be normalised correctly.

Between any two models k and j, the marginal likelihoods can be used to
construct the Bayes Factor
Π(k|S)
Bkj =
Π(j|S)
which is usefully interpreted as the posterior odds ratio of model k to model
j. More generally, among k = 1 . . . N models with prior probability Pk , the
posterior probability of model k is

Π(k|S)Pk
P (k|S) = PN .
j=1 Π(j|S)pj

The chief challenge is performing the integral in (103). A first useful approx-
imation is the Laplace approximation (Raftery, 1996), based on a quadratic
expansion about the MAP point m̃ obtained at the termination of the Newton
optimisation scheme ( omitting the label k where context permits):

2 (m̃)
Π(mk |S)Pk = e−χ exp(−(m − m̃)T C̃m
−1
(m − m̃)/2). (104)

Here χ2 (m̃) is the minimum of the objective function used in the optimiser,
evaluated at the termination point m̃. For completeness, we define this as

68
χ2 (m) = χ2full-seismic (m) + χ2iso (m) + χ2VSth (m) + χ2prior (m) + χ2fluid–prior
(105)

where

1 X
χ2full-seismic (m) =
2 stacks i
error–sampling points j
( )
ferror (Ssyn,j (m) − Sij )2 2
2
+ [log 2π + log(σs,i /ferror )]
σs,i
1 X (vj,p,eff (tj − tj−1 ) − ∆Zj )2
χ2iso (m) = 2
2
+ [log 2π + log(σ∆Z )]
2 isopachs j σ∆Z j
j

1 X
χ2VSth (m) =
2 applicable
base layers ip ,
target layers ji
( )
(∆Zi (m) − ∆Zji (m))2
+ [log 2π + log(σZ2 i )]
σZ2 i
1 −1 dm 1
χ2prior (m) = (m − m′ )T C ′ m (m − m′ ) + [ log 2π + log(|Cm′
|)]
2 2 2
χ2fluid–prior = − log(Pk ) (106)

The likelihood and priors are exactly as per the original equations (26),(27),(29)
(with (99), and (100) if vertical stacking is used), but with the normalisation
constants (terms in [] brackets) explicitly delared. The log 2π terms matter
since models of different dimensionality may be compared. All non–thickness
terms from vertical stacking, and/or fluid–contact terms (if contacts are part
of the model), are explicitly absorbed in the multi–Gaussian prior. If there is

no vertical stacking, we drop the primes on Cm , m′ , and omit χ2VSth .

The integration for the marginal model likelihood in equation (103), using this
approximation, is then straightforward, and yields
2 (m̃)
Π(k|S)Pk = (2π)dm /2 |C̃m |1/2 e−χ (107)

The quantity we believe is of most practical use to users is

-log(MML) ≡ − log Π(k|S)


≈ χ2full-seismic (m̃) + χ2iso (m̃) + χ2VSth (m̃) + χ2prior (m̃)
dm 1
− log 2π − log(|C̃m |) (108)
2 2
since this is independent of the prior model probability Pk , and scores model
significance on a useful logarithmic scale. This quantity should be able to

69
be meaningfully compared between different runs of the code, with different
modelling assumptions. Ratios of exponentials of -log(MML) can then be in-
terpreted directly as Bayes factors.

As a simple example, suppose we run model 1, with oil–rich sandstone, us-


ing the options --MAP -v 3. This will produce a table of Normalised local
modes, containing an entry for the -log(MML) term. The smallest -log(MML)
entry represents the most likely mode (assuming multiple modes from e.g. fluid
combinations or loop–skips), and will coincide usually with the largest mode
weight. Then we run with an alternative model, say model 2, with soft shale
instead of “oil–rich sandstone”. Again, we read off the smallest -log(MML)
entry. The posterior odds ratio of the soft–shale model is then

odds ratio = exp(log(MML)soft–shale − log(MML)oil–rich sandstone ).

In principle, the marginal model likelihood is robust to many changes in the


prior. A quick check of its sensibility can be obtained by “switching off” the
data terms in a model comparison experiment (seismic and isopachs via -IS
-II , and the thickness constraints in vertical stacking): no posterior odds
ratio preferences ought to occur in such experiments. Comparisons of models
with different data windows (e.g. varying time gates) is in principle legal,
but is usually fraught with danger and is not recommended. One may expect
very complex models (relative to the Nyquist rate which yields the seismic
data count) to be increasingle penalised by this measure, in typical BIC–like
behaviour.

Simple comparisons, like the substitution of rock types, or variations in the cho-
sen net–to–gross models, should in general be meaningful. In situations where
severe pinchouts occur, the accuracy of the Laplace approximation may be
problematic, so careful investigations are recommended. For example, pseudo
parabolic cross sections through the mode optima can be dumped to ascii files
PosteriorXsection.* by invoking --Xsections 2, which is useful as a quick
check of the adequacy of the quadratic approximation.

70
Appendix 13: MAP point estimates and fluid model probabilities in
noisy problems

Bayesian inversion codes are often required to supply “point estimates”, or


“most–likely” type values for the parameters they try to infer. When the prob-
lem has a model–selection aspect, it is usually tempting to try and provide the
most–likely parameters in the model that appears to have the largest posterior
probability. This can be quite different to the parameters for the model that
best fits the data at a particular point.

Model probabilities are usually found by trying to marginalise/integrate over


all the continuous parameters in the problem, yielding a model “evidence”, or
marginal model likelihood (MML) (see Appendix 12). This is the underlying
machinery in trying to infer fluid types in Delivery. The Bayes factor in com-
paring two models is then usually written as a likelihood–ratio term, with an
extra compensation factor reflecting the probability contribution that comes
from the width of the posterior, usually called the “Occam factor” (see e.g.
Ch 28 of Mackay (2002), and Denison et al. (2002)). In Laplace–like approx-
imations to the MML this Occam factor is identified with the determinant
of the mode Hessian (e.g. eqn (107)), so, for example, a model–comparison
Bayes–factor for the posterior probabilities of models 1 and 2 might look like
2
(2π)dm,1 /2 |C̃m,1 |1/2
e−χ1 (m̃1 )
B1,2 = ·
−χ22 (m̃2 )
. (109)
(2π)dm,2 /2 |C̃m,2 |1/2
e
| {z }
| {z }
Occam factor Likelihood ratio

This “Occam factor” boosts the model probability for models that have diffuse
posterior distributions, relative to models that may have better maximum–
likelihood points, but narrow posteriors. Thus it is not necessarily true that
the model with the smallest χ2 fit is the most probable model, which may be
confusing to some users.

This behaviour is logically consistent, but slightly counter-intuitive, and high-


lights the problems that can be caused by placing undue significance on par-
ticular point estimates. The following examples illustrate what is going on.

13.1 An illustrative toy problem

Consider the problem of Bayesian estimation for fitting a box-car signal to


“truth-case” data given by the box-car function of fig.22. The true signal is
B(t), which we fit to a two–parameter model bB(t−τ ), so {b, τ } = {amplitude,

71
timing–offset}. Naively, we expect the best parameters to be b = 1, τ = 0, but
some interesting behaviour arises when the noise gets large.

We will assume Gaussian noise, even though the signal in this case is noise-
free. We will also use flat Bayesian priors on the parameters b, τ , but we need
a finite range on τ , so −1 < τ < 1 seems reasonable.

1.0
B(t)
0.8

0.6

0.4
b B(t-τ)

0.2

t
-2 -1 0 τ 1 2 3 4

Fig. 22. Boxcar function B(t) and synthetic model bB(t − τ ).

If the noise level is σ, we can define the posterior by


N Z∞
Π(b, τ ) ∼ exp(− (B(t) − bB(t − τ ))2 dt) (110)
2σ 2 −∞
corresponding to Gaussian noise and an equivalent χ2 misfit error term
N Z∞
χ2 = (B(t) − bB(t − τ ))2 dt.
2σ 2 −∞
Here, N is something like the “number of equivalent data per unit time”, and,
in a practical problem, the integral would be approximated by a sum.

This problem is interesting, because the parameter τ is related to layer–time


detection, and b to fluid/saturation detection in Delivery. For a situation with
several candidate fluid types, b would have a few discrete values, and we might
be trying to determine the most likely value of b and τ .

The boxcar problem is trivially simple. We get



N  2b|τ | + (1 − b)2 |τ | < 1

2
χ = 2
2σ  2
1+b otherwise

So the joint maximum–likelihood point (minimum of χ2 ) is {b = 1, τ = 0},


rather obviously. Using this solution for χ2 , the joint–posterior probability
surface of eqn (110) looks like fig. 23.

72
1.0

0.5

0.0

0.5

1.0
0.0 0.5 1.0 1.5 2.0
b

Fig. 23. Joint posterior of boxcar detection problem.

Since we have in mind a problem where b might take on a few discrete values,
we might like to find the relative probability of those values by integrating
out all the other parameters in the problem. Thus, an interesting quantity to
compute is the marginal distribution of b,

2 /2σ 2 2
Z 1 e−N (1−b) (1 − e−N b/σ )
Π(b) = Π(b, τ )dτ ∼ .
−1 Nb

For N = 1.1, curves of Π(b) at various noise levels are shown in fig. 24. One
can see from this figure that for low noise levels, the “most–likely” value from
the marginal for b is close to 1. However, for high noise levels, it shrinks away
from 1 towards smaller values (but remembering that the uncertainty is rather
high). This comes from the extra probability mass from larger τ values that
are reasonable if the event amplitude is smaller, as per the asymmetry of the
joint density in fig. 23. Thus we can see that if the support of b was confined
to a few discrete values, the “most–likely” choices of b would fall on weaker
signals if the noise level is high.

As a concrete example, we might consider two distinct cases b = 1 and b = 0.25,


where, for noise σ = 0.5 and N = 1.15, the difference in χ2 at the joint MAP

73
Π(b) σ=0.6

σ=0.4

σ=0.2

σ=0.1

b
1.0 1.5 2.0

Fig. 24. Marginal distributions of b for various noise levels.


N (1−b)2
point χ2 = 2σ 2
is

N [(1 − 1)2 − (1 − .25)2


∆χ2 = ≈ 1.3
2σ 2
Based on this alone one might imagine a ≈ 3 : 1 model–selection preference
for the case b = 1. However, the marginal distribution for b puts the odds ratio
close to 1.3. Two slices through the joint posterior for b = 1 and b = 0.25 are
shown in fig. 25, showing how the extra probability for the “poorer fit” model
comes from the wide posterior.

1.0
Π(τ|b)

0.8

0.6 b=1

0.4

b=0.25
0.2

τ
-1.0 -0.5 0.0 0.5 1.0

Fig. 25. Conditional distributions Π(τ |b) for b = 1, b = 0.25 (slices through Fig 23).
Both cases have comparable overall probability, even though the maximum likelihood
point for the case b = 1 seems dominant.

The phenomenon is generic. For example, one can do the calculation for the
“stretched box–car” model with a box–car synthetic plateau between τ1 <
t < τ2 , so free parameters are {b, τ1 , τ2 }. With everything as before, but just
including an extra marginalisation over τ2 as well as τ1 , the marginal for b is
much the same as before: see fig. 26.

74
Π(b)
12

10

6
σ=0.6
4

2 σ=0.4
σ=0.2
σ=0.1 b
1.0 1.5 2.0

Fig. 26. Marginal distributions of b for various noise levels, for stretched box–car
problem (τ1 , τ2 )
A few points are helpful to clarify the counter-intuitive nature of this be-
haviour. 1) The MML machinery is the most logical approach, granted the
assumptions of the noise model. The box–car functions used above are clearly
“error–free”, so our instinctive reaction as to what the right criterion should
be is formed by a model with very small noise levels – which we automatically
intuit when inspecting the data. Had the signal and synthetic been drawn with
noise deviates added which are consistent with the noise level in the likelihood
function, the MML procedure would seem much less counter-intuitive. Fig. 27
shows a cartoon of the sorts of signals we might be trying to model with, for
example, a noise level of σ = 0.5. 2) The behaviour is a fairly sensitive function
of the peak/noise–RMS ratio, and for most signals where the ratio is 4:1 or
better, the effect is weak. This is the usual regime we like to work in.
data
1.5

1.0

0.5

t
4 -2 2 4 6

- 0.5

- 1.0

Fig. 27. Example random telegraph signal realisations with boxcar truth case masked
by noise of roughly σ = 0.5

13.2 Demonstration on the Gas–Oil–Flank toy Delivery problem

A practical “walk–through” of the issue on the context of a simple Delivery ap-


plication is outlined in the file au/csiro/JamesGunning/Tests/InverterTests
/GasOilFlank/README MAP CLASSIFICATION, supplied with the Delivery source

75
code 18 . The demo problem is a two–layer model of a flank of a hydrocarbon
deposit, with shale cap–rock, and a sand halfspace underneath. The “true”
fluid changes from brine to oil to gas as we go updip. For reference, the peak
seismic amplitude (gas) is about 0.15, and the oil and brine peaks are about
0.05 and 0.03 respectively. With small noise (σ = 0.015), the MAP model and
MAP fluid classification is “correct”, and shown in fig. 28. With the noise at
1.65E0 1.65E0

1.7E0 1.7E0

1.75E0 1.75E0

1.8E0 1.8E0

1.85E0 1.85E0

1.9E0 1.9E0
t

t
1.95E0 1.95E0

2E0 2E0

2.05E0 2.05E0

2.1E0 2.1E0

2.15E0 2.15E0

2.2E0 2.2E0
0E0 2E0 4E0 6E0 8E0 1E1 1.2E1 1.4E1 1.6E1
seismic amplitudes sample

Fig. 28. Gas–oil flank problem. Left: true(red) and synthetic(black) MAP traces for
15 cdp locations, going updip. Right: fluid classification in lower reservoir under
shale (brown), for small noise level (peak/noise–rms≈10). Fluids codes: blue=brine,
red=oil, orange=gas.
σ = 0.08, the MAP classification misses oil, and puts oil where the gas is: see
fig. 29. The maximum likelihood solution, (or minimum χ2 ) solution, found
using --ML instead of --MAP makes the visual coherence of the fits closer, as
one would expect: see fig. 30. The misclassification of the MAP method is of
no particular significance in view of the uncertainty of the fluid type at these
noise levels: compare figs 31 and 32.

By way of comparison to the preceding box–car problem, for trace 15, which
has gas, at a noise level σ = 0.059, the ∆χ2 between the gas and oil cases is
again 1.3 (in favour of gas), but the posterior Bayes factor in favour of gas is
only 1.45, less than half of the likelihood ratio e1.3 ≈ 3.7.

13.3 Recommendations

There are clearly cases where, if the noise level is high enough, a model–
selection based on marginalising out free parameters will favour weaker signals
18 The file location is relative to the root of the source tree

76
1.65E0 1.65E0

1.7E0 1.7E0

1.75E0 1.75E0

1.8E0 1.8E0

1.85E0 1.85E0

1.9E0 1.9E0
t

t
1.95E0 1.95E0

2E0 2E0

2.05E0 2.05E0

2.1E0 2.1E0

2.15E0 2.15E0

2.2E0 2.2E0
0E0 2E0 4E0 6E0 8E0 1E1 1.2E1 1.4E1 1.6E1
seismic amplitudes sample

Fig. 29. Gas–oil flank problem. MAP solution and fluid classification, for large noise
level σ = 0.08 (peak/noise–rms≈1.9). True seismic in red, MAP synthetic in black.
Fluids codes: blue=brine, red=oil, orange=gas.

1.65E0 1.65E0

1.7E0 1.7E0

1.75E0 1.75E0

1.8E0 1.8E0

1.85E0 1.85E0

1.9E0 1.9E0
t

1.95E0 1.95E0

2E0 2E0

2.05E0 2.05E0

2.1E0 2.1E0

2.15E0 2.15E0

2.2E0 2.2E0
0E0 2E0 4E0 6E0 8E0 1E1 1.2E1 1.4E1 1.6E1
seismic amplitudes sample

Fig. 30. Gas–oil flank problem. Maximum likelihood solution and fluid classification,
for large noise level σ = 0.08.

in spite of the χ2 likelihood preference for models with stronger amplitudes,


even when comparing models with matching dimension and flat or comparable
priors. In this regime, whether one uses the MAP point or the maximum–
likelihood point as the estimate of b is rather moot, as it is likely the model
probabilities are broadly comparable. Further, the probability estimates pro-
vided by either the Laplace approximation or MCMC sampling will likely have
errors of the same order as the difference in model probabilities, especially in
high–dimensional problems. More philosophically, this highlights the weakness

77
1.65E0 1.65E0

1.7E0 1.7E0

1.75E0 1.75E0

1.8E0 1.8E0

1.85E0 1.85E0

1.9E0 1.9E0
t

t
1.95E0 1.95E0

2E0 2E0

2.05E0 2.05E0

2.1E0 2.1E0

2.15E0 2.15E0

2.2E0 2.2E0
0E0 1E2 2E2 3E2 4E2 5E2 6E2 7E2 8E2
seismic amplitudes sample

Fig. 31. Gas–oil flank problem. Posterior samples for small noise σ = 0.015. True seis-
mic in red, MAP synthetic in black. Fluids codes: blue=brine, red=oil, orange=gas.
1.65E0 1.65E0

1.7E0 1.7E0

1.75E0 1.75E0

1.8E0 1.8E0

1.85E0 1.85E0

1.9E0 1.9E0
t

1.95E0 1.95E0

2E0 2E0

2.05E0 2.05E0

2.1E0 2.1E0

2.15E0 2.15E0

2.2E0 2.2E0
0E0 1E2 2E2 3E2 4E2 5E2 6E2 7E2 8E2
seismic amplitudes sample

Fig. 32. Gas–oil flank problem. Posterior samples for large noise σ = 0.08. True seis-
mic in red, MAP synthetic in black. Fluids codes: blue=brine, red=oil, orange=gas.
of relying on point estimates: in this regime, the relative model probabilities
are so comparable that any forecast of interest must average over all of them
in some useful way.

Some users like to see maximum–likelihood (minimum χ2 ) fits, and are not
comfortable with the use of the “Occam” factor. A practical solution is to
provide each of these outputs:

• the model and parameters with the maximum likelihood point, omitting
marginalisation, obtained with the flag --ML.

78
• the MAP point of the most–likely model, as judged from the marginalised
evidence. Use --MAP.
• full samples from the joint posterior over all models and parameters, so any
forecast can take into account the uncertainty in both models and param-
eters. Typically this is some combination like -RWS -N 1000 with perhaps
--Raftery-auto-decimation

This way users have access to whichever quantities are of interest to them, and
are free to interpret as they please.

The --no-descent-test flag also forces more exhaustive exploration in the


optimisation phase, so this can be helpful. The transition into this conservative
“dimming” behaviour in the inference is also a rather sensitive function of the
noise level, so it may be worth revisiting the estimation procedures for that
parameter.

79
Appendix 14: Use of additional impermeable end–members in the
rock–physics

There are situations where the impermeable material in a reservoir layer may
be a composite of two or more impermeable facies, typically, for example, a
mixture of shale and a cemented facies. In such cases, it is useful to introduce
a third volumetric mixing fraction, and treat the impermeable rock composite
via effective medium theory in the same way we model the mix of reservoir and
non–reservoir material. To this end, we have defined an additional parameter
NGhard as the volume–fraction of impermeable material occupied by the second
impermeable (“hard”) rock type, as per the depiction:

NG
|{z} 1-NG
reservoir–endmember
✛ permeable ✲✛ impermeable ✲
1 − NG
| {z hard}
NG
| {zhard}
nonreservoir–endmember nonreservoir–hard–endmember

The effective properties of the composite impermeable material are modelled


by Backus averaging of the standard non–reservoir endmember and the second
“hard” rock type. Thus, as per equation (9), these are
1 NGhard 1 − NGhard
= + , (111)
Meff,impermeable Mhard Mnonreservoir–endmember

where M is either the p-wave (M ) or shear (µ) modulus, and the effective
impermeable density is

ρeff,impermeable = NGhard ρhard + (1 − NGhard )ρnonreservoir–endmember . (112)

There are two modelling situations that are of interest. Either i) the hard rock is
known beforehand, and its properties and distributions are desired to be fixed
for the reservoir layer for all fluid models, or ii) the hard rock present may
vary in accordance with the fluid type present. This might be to reflect some
geochemistry, or perhaps a sleight–of–hand trick to get the hard rocks to occur
in vertical sequences with particular orderings, just as fluids do under density–
ordering. Thus, the hard rock is specified by (optional) XML fragments under
/inversion/model- description/* layer/nonreservoir hard end-member, and comes
in two flavours:

• Fixed rock–type, with xml entries like


<nonreservoir_hard_endmember>
<name>rock_A</name>
<net_to_gross_hard>0.5</net_to_gross_hard>

80
<sigma_net_to_gross_hard>0.1</sigma_net_to_gross_hard>
</nonreservoir_hard_endmember>
The prior model and distributions for the rock type are then referenced as
per a standard impermeable rock (see e.g. equation (8)).
• Fluid–dependent rock–type. In this case the XML structure is, for example
<nonreservoir_hard_endmember>
<fluid_dependent_nonreservoir_hard_endmembers>
<brine_case>rock_A</brine_case>
<gas_case>rock_B</gas_case>
<oil_case>rock_C</oil_case>
</fluid_dependent_nonreservoir_hard_endmembers>
<net_to_gross_hard>0.5</net_to_gross_hard>
<sigma_net_to_gross_hard>0.05</sigma_net_to_gross_hard>
</nonreservoir_hard_endmember>
Under this specification, the hard–rock used in the stochastic model for
this particular layer then varies according to the fluid state or configura-
tion. XML entries under /inversion/rock fluid properties/rock- properties/-
-nonreservoir- endmember are then expected for the rocks named under
brine case, oil case etc. This can be a useful tool for modelling discrete
facies–uncertainty if the rock–type uncertainty can be emulated by a fluid–
ordering model family.

Just to be clear, the “nonreservoir–endmember” rock corresponds to the quan-


tities vp,m , vs,m , ρm in equation (1). The standard non–reservoir endmember
properties are available as vp m, vs m, rho m, and the second non–reservoir
rock as vp hard, vs hard, rho hard, with mixing NG hard.

Appendix 15: Alternative settings for fluid and fluid–state probabil-


ities

In section 2.2.2 we have described the default mode of assembling discrete fluid
states to be examined in the inversion problem, and how the prior probabil-
ity of these states is constructed from the layer–based probabilities of fluids
and the density–ordering rules via equation (5). This algorithm guarantees
that the prior probability of the various fluid states is consistent with the
density–ordering effect. Thus, it invariably results in increasing probabilities
of lighter fluids (e.g. gas) when ascending a stack of communicating layers, and
conversely increasing heavy–fluid (e.g. brine) probabilities when descending.

It is sometimes desirable to set the probabilities of fluid states from layer–


based fluid probabilities observed after the density–ordering process, e.g. from
observed abundances in representative drillhole data. As a simple example, if
a two–layer system had three equiprobable allowable fluid states, depicted in

81
· ¸
o g g
an obvious shorthand notation as o o g
, then the observed abundancy of gas
in the top layer ought to be 2/3, and 1/3 in the lower layer. In default mode,
this has to be achieved by setting the (pre–ordering) prior probabilities to 1/2
for each of oil and gas in each layer. For general observed abundances, reverse–
engineering the needed pre–ordering probabilities to match these post–ordering
target–abundances is a rather tedious fitting problem.

The code has an optional mode of operation controlled by the output/take fluid-
- probabilities as post ordering tag in the XML file, where these fluid–state
probabilities are recomputed after the usual recursive density–ordering has
enumerated the set of legal configurations, so as to generate the desired abun-
dances given by the layer–based probabilities. In other words, the prior fluid
probabilities are interpreted as post–ordering abundancy targets.

Some care is required in using this mode. Since the density–ordering induces
inequalities on these probabilities, it is possible for a user to enter a sequence
of fluid–probabilities that are inconsistent. For example, in the simple two–
layer case above, specifying a lower probability of gas in the top layer than
the bottom layer does this. Inconsistency will not occur if the abundances
are estimated by counting of samples and the ordering–model is correct, but
in general it might be expected to occur. Thus, the current implementation
seeks a bound–constrained least–squares fit of the target probabilities. For the
interested, the technical details of how this is obtained constitute the remainder
of this appendix.

Given a set of k = 1 . . . NF fluid states, fluids j ∈ {b, l, o, g}, layers i = 1 . . . Nl ,


assemble the matrix

 1, fluid j present in layer i, fluid state k

(F )
X{ji},k = 
 0, otherwise

where {ji} indicates an unrolled index over j, i. If P = {P1 , . . . PNF } is the


unknown vector of fluid–state probabilities, and P = {P{ji} } is the (unrolled)
vector of supplied fluid target probabilities/abundances, then a consistent tar-
get would satisfy X (F ) .P = P. To cater for possibly inconsistent targets we
seek the bound–constrained least–squares minimum of

χ2FP = 12 ||X (F ) .P − P||2

where the constraints are 0 ≤ Pk ≤ 1 and N


P
k=1 Pk = 1. This is a standard,
F

linear equality and inequality constrained quadratic–programming problem.


The equality constraint is treated by elimination, writing PR = {P1 , . . . PNF −1 }
and à ! à !
I 0
P= PR + ≡ U.PR + b
−1T 1

82
so the new problem in terms of PR is
(F )
χ2FP = 12 ||X (F ) U PR − (P − X (F ) b)||2 ≡ 12 ||XR .PR − PR ||2 (113)
(F )
with XR = X (F ) U and PR = P − X (F ) b. This problem leaves only the
inequality constraints
NX
F −1

0 ≤ PR,k ≤ 1 PR,k ≤ 1. (114)


k=1

We solve this using the following simple algorithm


(F )T (F ) (F )T
(1) Solve directly the OLS problem (XR XR + ǫI)PR = XR P, where
ǫ is a small positive constant used to ensure invertibility, via Cholesky
decomposition. If the solution vector satisfies the constraints, terminate.
This is the usual case if the suite of targets are consistent. Cache the
solution as per step (3) below.
(2) If the solution above does not meet the inequality constraints, we use
the following fallback method. Solve using the projected steepest descent
method with gradient
(F )T (F )
∇χ2FP = XR (XR PR − P) + ǫPR

using the methods outlined in section 5.4 of Kelley (1987). For infeasible
points, the projection onto the polytope defined by (114) is a bit non-
trivial, so some care is required. Clearly a general SQP method would
be better for this problem, but (a) the problem is usually well–scaled, so
steepest descent is not too bad. (b) the dimensionality is often very low,
so the inefficiencies don’t matter (c) a well–formulated problem shouldn’t
enter this fallback method anyway, and (d) the fluid–state probability so-
lutions are cached, and often these do not have to be recomputed every
trace.
(3) Cache the problem solution, so if the problem is identical at the next
trace, the solution can be retrieved from cache very efficiently (a very
common case).

83

You might also like