0% found this document useful (0 votes)
86 views19 pages

Ali Geo

Uploaded by

Salzar Bilal
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
86 views19 pages

Ali Geo

Uploaded by

Salzar Bilal
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd

Computers & Geosciences 176 (2023) 105339

Contents lists available at ScienceDirect

Computers and Geosciences


journal homepage: [Link]/locate/cageo

Research paper

formikoj: A flexible library for data management and processing in


geophysics—Application for seismic refraction data
Matthias Steiner ∗, Adriá n Flores Orozco
Research Unit of Geophysics, Department of Geodesy and Geoinformation, TU Wien, Austria

ARTICLE INFO
ABSTRACT
Keywords:
Geophysical data processing We introduce the open-source library formikoj, which provides a flexible framework for managing and
Seismic refraction processing geophysical data collected in environmental and engineering investigations. To account for the
First break picking substantial changes regarding the market shares of operating systems within the last two decades, the library
Seismic waveform modeling is specifically implemented and tested for cross-platform usage. We illustrate the applicability of the formikoj
Cross-platform application library for the forward modeling of seismic refraction waveform data with the SeismicWaveformModeler
Geophysical python library
based on a custom subsurface model and survey geometry. We use these synthetic seismic data set to demon-
Flexible open-source library
strate the fundamental seismic refraction processing capabilities of the SeismicRefractionManager;
Wave based methods
thus, illustrating the ability to combine modeling and processing tasks in a single workflow. Based on real
3D field measurements we present the available range of possibilities provided by the formikoj library for
the processing of seismic refraction survey data. In particular, we explore different visualization techniques
of the seismic traveltime readings to enhance their consistency prior to the inversion with the third-party
library pyGIMLi. The low-level access provided by the formikoj library aims at enabling users to implement
novel modeling, visualization and processing tools specifically designed for their objectives as well as other
geophysical methods.

1. Introduction
et al., 2010) and Pyrocko (Heimann et al., 2017) for seismological
The ability to collect spatially quasi-continuous data in a non- data. Other packages provide frameworks for the inversion and
invasive manner makes geophysical methods suitable for subsurface permit the inclusion of forward models for different geophysical
engineering and environmental investigations (e.g., Parsekian et al., methods, e.g., SimPEG (Cockett et al., 2015), Fatiando a Terra (Uieda
2015; Nguyen et al., 2018; Romero-Ruiz et al., 2018). However, the et al., 2013) or pyGIMLi (Rü cker et al., 2017).
processing of geophysical data often relies on commercial software The seismic refraction tomography (SRT) is a standard technique in
solutions and the associated licensing costs might render their use environmental and engineering investigations. Often applied together
prohibitively expensive, e.g., for academic projects or institutions as with other geophysical methods, the SRT is routinely used, e.g., in
well as for teaching in developing countries. A common limitation of permafrost studies (e.g., Draebing, 2016; Steiner et al., 2021), for the
existing software solutions refers to their specific platform requirements investigation of landfills (e.g., Nguyen et al., 2018; Steiner et al., 2022),
mainly related to the type and version of the operating system; more- or for hydrogeological characterizations (e.g., Bü cker et al., 2021).
over, the possibility to adapt the code are limited if possible at all. The market for seismic processing software has long been dominated
Considering the substantial changes regarding the market shares of op- by software packages designed for the processing of large datasets,
erating systems within the last two decades, platform-specific software e.g., associated to oil or gas exploration. Accordingly, these seismic
packages are becoming particularly obstructive for academic research processing solutions may not be suited for small-scale projects in envi-
and teaching. The increasing popularity of the Python programming ronmental and engineering studies, or for teaching activities. ReflexW
language led to the development of various cross-platform open-source overcomes such limitations by providing processing tools specifically
software packages for processing, modeling and inverting geophysical designed for near-surface investigations at substantially lower costs.
data. Available packages can focus on specific geophysical methods, for In terms of licensing costs, Stockwell (1999) went a step further by
instance, ResIPy (Blanchy et al., 2020) for electrical data, GPRPy (Plat-
making the Seismic Unix framework available entirely free of charge;
tner, 2020) for ground-penetrating radar data, or ObsPy (Beyreuther
whereas Guedes et al. (2022) recently presented RefraPy, a python

∗ Corresponding author.
E-mail address: [Link]@[Link] (M. Steiner).

[Link]
Received 23 July 2022; Received in revised form 11 March 2023; Accepted 28 March 2023
Available online 31 March 2023
0098-3004/© 2023 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY license ([Link]
M. Steiner and A. Flores Computers and Geosciences 176 (2023)

Fig. 1. General architecture of the formikoj library comprising a utility, modeling and processing module. The base classes DataModeler and MethodManager can be used to
build tools for specific geophysical methods, e.g., seismic refraction.

processing tool for seismic refraction data. Implemented in python, Re-


fraPy is potentially suitable for cross-platform usage, yet Guedes et al. infrastructures. Accordingly, the formikoj library aims at providing a
(2022) developed and tested RefraPy solely for Windows operating transparent and customizable framework for the collaborative design
systems, and does not offer the possibility to generate synthetic seismic of reproducible workflows.
waveform data. Forward modeling is relevant for survey design, as well The comprehensive usage of clear error and log messages aims at
as for teaching and interpretation purposes, testing of research hypothe- supporting the user throughout the modeling and processing work-
ses, evaluating the accuracy of different measurement configurations or flows. A meticulous exception handling ensures that the data stored in
inversion strategies. the SQLite project database is not corrupted in case of erroneous input.
The formikoj library presented here is an open-source framework Moreover, documenting the user input and the respective responses
of the formikoj library with the python logging module provides a
for creating synthetic datasets, as well as for managing and processing
timestamped command history that further enhances the transparency
numerical and field data independently from the operating system
and repeatability of the conducted workflows.
and without licensing costs; thus, overcoming limitations in existing
We present the application of the formikoj library for the modeling
solutions. The design of the library follows the multi-method concept
and processing of seismic refraction data based on the fundamental
of pyGIMLi and SimPEG, which allows for the implementation of
use cases presented in Figs. 2 and 3, respectively. These flow charts
custom designed tools for different geophysical methods. The usage of
illustrate the corresponding workflows as well as the required interac-
transparent file formats, e.g., the unified data format (udf1 ), and data
tions between the user, the formikoj library and third-party packages.
management concepts (SQLite database) within the formikoj frame-
As can be seen from Figs. 2 and 3, the formikoj library acts as an
work facilitates a simple data exchange, for instance, between partners
interface between the user and more complex functionalities of third-
in research projects and academia, which is required to guarantee the
party libraries, such as pyGIMLi for modeling and inversion of seismic
repeatability of results and good research practices. Considering the di- refraction data. The SeismicWaveformModeler and
verse applications of the SRT method we demonstrate the applicability SeismicRe- fractionManager classes implement these
of the proposed library based on tools implemented within the formikoj fundamental use cases,
framework for the modeling and processing seismic waveform data. In yet their actual capabilities are continuously expanded, e.g., to address
particular, we present a carefully designed synthetic study highlighting specific modeling or processing requirements as well as to enhance
the capabilities of the formikoj library for the forward modeling and the user experience. To avoid redundancies in the implementation we
processing of 2D seismic refraction datasets. Using data from a real built these classes upon the functionalities of existing packages such as
3D field survey we illustrate that the formikoj library also allows for ObsPy for the processing of seismological data (Beyreuther et al., 2010)
the processing of complex survey geometries, where the obtained first and pyGIMLi for the modeling and inversion of different geophysical
break traveltimes can be inverted with third party open-source libraries data (Rü cker et al., 2017). Other important third party dependencies
to solve for 3D subsurface models expressed in terms of the seismic P- refer to NumPy (Harris et al., 2020) and Pandas (McKinney, 2010)
wave velocity. for general data handling, as well as matplotlib (Hunter, 2007) and
PyVista (Sullivan and Kaszynski, 2019) for data visualization.
2. Design and structure of the formikoj library The formikoj library is primarily developed on machines running
Linux Mint 20 or Kubuntu 22.4, respectively. Testing of the library
As illustrated in Fig. 1, the formikoj library comprises a modeling refers to carrying out the fundamental use cases for modeling and
and a processing module, which both rely on a common utilities processing of seismic refraction data presented in Figs. 2 and 3. To
module. The DataModeler and the MethodManager class provide ensure the cross-platform applicability of the formikoj library we test
the basis for adding modeling or processing functionalities for different these fundamental use cases also on machines running on Windows 10,
geophysical methods. The formikoj library is built upon a well Windows 11 as well as macOS versions 12 and 13.
thought out data management concept based on the SQLite database
engine that does not require a separate server process2 . In particular, [Link] of seismic waveform data for synthetic subsurface models
the information for each project, such as survey geometry and first
break traveltimes, are stored in a SQLite database file. Using such an The seismic refraction (SR) method exploits the ground motion
appli- cation file format facilitates the cross-platform design of the recorded by sensors installed in the surface (e.g., geophones) to char-
formikoj library, and provides fast I/O operations through concise acterize the propagation of seismic waves generated at well known
SQL queries. The portability of the application file allows for an easy locations (i.e., shot stations). The visualization of the ground motion
exchange of projects between partners across institutions relying on as function of time yields a so-called seismogram for each geophone
different IT position. The SeismicWaveformModeler class provides a flexible
way to generate synthetic seismic waveform data for P-wave
refraction modeling either in a python script, interactively in a jupyter
1
[Link] last accessed on March 11, notebook or an ipython shell. To create an instance of the class the
2023. user provides the absolute or relative path to the working directory
2
[Link] last accessed on March 11, 2023. as parameter to the constructor:

2
M. Steiner and A. Flores Computers and Geosciences 176 (2023)

Fig. 2. Fundamental use case illustrating the modeling of seismic P-wave waveform data within the formikoj ecosystem.

Fig. 3. Typical formikoj workflow for the picking of first break traveltimes from seismic waveform data.

1 """ Table 1
2 Import the SeismicWaveformModeler from the Description of the information to be provided in the columns of a measurement scheme

3
formikoj library in csv format.
4 """ Column Content Data type Description
5 from formikoj import SeismicWaveformModeler 1 x coordinate float Station x coordinate, e.g., given in (m)
6 2 y coordinate float Station y coordinate, e.g., given in (m)
7 """ 3 z coordinate float Station z coordinate, e.g., given in (m)
8
Create an instance of the 4 Geophone bool 1 in case of a receiver station, 0 otherwise

9
SeismicWaveformModeler """ 5 Shot bool 1 in case of a shot station, 0 otherwise
10 swm = SeismicWaveformModeler(’.’)
11
wavelet:
length: 1.024
frequency: 100
sampling_rate: 2000
INFO : Created instance of SeismicWaveformModeler pretrigger: 0.02

model:
In the working directory, the required input files are provided via the velmap: [[1, 750], [2, 2500], [3, 4000]]
subdirectory in, whereas the modeling output will be stored in the layers: [[1, 3], [2, 5], [3, 15]]
quality: 32
automatically created subdirectory out. area: 10
The key input file is the measurement scheme as it contains informa- smooth: [1, 10]
sec_nodes: 3
tion regarding the distribution of the shot and geophone stations. If pro-
vided in the unified data format, the measurement scheme is imported dataset:
directly into a pyGIMLi DataContainer. In case the measurement number: 1
names: [syn_data]
scheme is provided as a csv file, the SeismicWaveformModeler noise: 1
reads the information and stores it in a pyGIMLi DataContainer noise_level: 1e-4
missing_shots: 1
object. In the csv format, the measurement scheme contains a single broken_geophones: 1
line for each station in the survey layout, where a station either hosts wrong_polarity: 1
a geophone or a shot, or both (see Table 1). The values provided in traveltimes:
each line need to be separated by a unique delimiter, and the file must noise_relative: 0.
not contain a header. noise_absolute: 0.
For the modeling of the seismic waveform data, the parameters
characterizing the base wavelet, the synthetic subsurface model and the In this exemplary configuration, the first block ( wavelet) contains the
resulting waveform datasets are provided (see Table 2) in a configura- parameterization of the base wavelet, which controls the modeling of
tion file following the yaml format: the seismic waveforms (see Table 2).

3
M. Steiner and A. Flores Computers and Geosciences 176 (2023)

The second block (model) contains information regarding the syn-


thetic subsurface model. Here, the user can define simple models with The forward modeling process generating the seismic waveform
all layers considered to be parallel to the surface topography inferred data is then invoked through the create method:
from the station geometry provided in the measurement scheme. The
seismic velocity values for the different model regions (velmap) and 10 """
Start the modeling of the seismic waveform
the corresponding thickness of the different geological units ( layers) 11
12 data
have to be explicitly defined in the configuration file. The remaining """
13
parameters (quality, area, smooth and sec_nodes) refer to the 14 swm. create( type=’ waveforms’)
properties of the mesh that is eventually used for the forward mod-
eling of the seismic waveform data and the corresponding first break
traveltimes (we refer to the respective pyGIMLi resources 3 for further INFO : Measurement scheme
information). loaded INFO : Velocity model
In the third block of the exemplary configuration file (dataset), created INFO : Wavelet created
the user can set specific names for the datasets to be created (the [++++++++++++++ 100% ++++++++++++++] 2048 of 2048 complete
number of datasets is automatically determined), or set the number of ...
datasets to be created and the dataset names are automatically gener- [++++++++++++++ 100% ++++++++++++++] 2048 of 2048 complete
ated with the prefix dataset_. The remaining parameters provided in the INFO : Dataset ’syn_data’ created
dataset block control the random error ( noise and noise_level)
as well as systematic errors (missing_shots, broken_geophones As can be seen from the log messages, the SeismicWaveform-
and wrong_polarity) in the modeled seismic waveform data (see Modeler first loads the measurement scheme into the workflow. In the
Table 2 for a detailed description). The number and position of the shot second step, the provided mesh and the seismic velocity values
and geophone stations affected by the systematic errors are randomly defined in the configuration file are combined to create the seismic
chosen with a maximum of 5 % of the total number of stations in order velocity model considered for the waveform modeling in this study (see
to avoid a high number of invalid trace data. The final block of the Fig. 4a). The model consists of a top and a bottom layer with varying
configuration file (traveltimes) contains data-error parameters for thickness characterized by seismic velocity values of 750 m s −1 and
both the absolute error (𝑒𝑎𝑏𝑠) and the relative error (𝑒𝑟𝑒𝑙) defined by the user. 4000 m s−1, respectively. In the center, the model contains an
The data error model is then estimated as irregularly shaped anomaly associated with a seismic velocity of 2500
m s−1, i.e., the model features vertical and lateral variations in the
𝒆 = 𝑒𝑎𝑏𝑠 + 𝒅 𝑒𝑟𝑒𝑙 , (1) seismic velocity distribution. The third step refers to the generation of
where 𝒅 denotes the forward modeled data not affected by noise, a Ricker wavelet through the pyGIMLi function ricker as
i.e., here the first break traveltimes obtained through the Dijkstra algo- ( 2
𝑢 = )1 − 2𝜋 2 𝑓 2 𝑡2 𝑒−𝜋2 𝑓 2 (𝑡−𝑡0 ) (3)
rithm (Dijkstra, 1959). These computed error values are subsequently
added to the data to obtain the noisy data as based on the wavelet properties provided in the configuration file.
In Eq. (3), 𝑓 is the frequency of the wavelet given in Hz, 𝑡 refers to
𝒅𝑛𝑜𝑖𝑠𝑒 = 𝒅 (1 + 𝒆) . (2)
the time base definition, i.e., the length and resolution of the wavelet
The configuration file needs to be provided via the input directory in the time domain, and 𝑡0 is the time offset of the wavelet.
from where it can be imported through the load method of the Based on the mesh, the velocity model and the Ricker wavelet we
SeismicWaveformModeler: use pyGIMLi to solve the pressure wave equation for each shot station
defined in the measurement scheme. The resultant seismograms are
6 # Load and parse the configuration file extracted at the corresponding geophone stations, gathered for each
7 swm. load( type=’ config’) shot position in an ObsPy Stream object and saved in the output
directory (out ) as shown here:

INFO : Configuration loaded working_directory


in
To demonstrate the ability to model seismic waveform data for out
arbitrary subsurface conditions we do not define a simple subsurface syn_data
model in the configuration file but provide a more complex model data
in the binary mesh format (e.g., a bms file). We prepare the model [Link]
and the corresponding forward modeling mesh based on the mesh station_coords.csv
tools provided by pyGIMLi and save the mesh in the bms format (a Shot_1001.syn
commented version of the corresponding python script can be found in ...
Shot_10nn.syn
the Appendix). Similar to the configuration file, a bms file stored in the syn_data_tt.pck
input directory can be imported into the workflow through the load
method as follows: [Link]

8 # Load the mesh into the workflow In particular, the subdirectory data contains a separate file for each
9 swm. load( type=’ mesh’) shot position in the miniseed format (Ahern et al., 2012; Ringler and
Evans, 2015), where the file extension syn indicates that the shot files
contain forward modeled seismic waveform data. The measurement
INFO : Mesh loaded protocol ([Link]) and the station coordinates provided as a csv file
(station_coords.csv) are also stored in this directory. The header of the
3
measurement protocol contains the survey parameters, e.g., sampling
[Link]
rate, recording length, number of geophones and geophone spacing.
html#[Link], last accessed March 11, 2023.
Moreover, the protocol associates each shot file of the dataset with a

4
M. Steiner and A. Flores Computers and Geosciences 176 (2023)

Table 2
Description of the parameters, which can be defined in a configuration file used for the modeling of the synthetic seismic data.

Parameter Unit/Data type Description


wavelet
length s Length of
length of the
the synthetic
base wavelet, which
seismic also defines
waveform data the

frequency Hz Frequency of the base wavelet


sampling_rate Hz Defines temporal resolution of the seismic waveforms
pretrigger s Add buffer to the seismic waveforms before the onset
of the actual data

dataset
number
names int (string)
list Numberofofthe
Names datasets to be created
datasets
noise bool 1 in case noise should be added to the synthetic
waveforms, 0 otherwise
noise_level – Level of the seismic background noise
missing_shots bool 1 in case the datasets should be affected by missing
shot files, 0 otherwise
broken_geophones bool 1 in case the datasets should comprise broken
geophones (i.e., no data in the corresponding
seismograms), 0 otherwise
wrong_polarity bool 1 in case the datasets should contain traces with
inverse polarity, 0 otherwise
model
velmap list For each layer the first value defines the marker and
(float/int) the second one the seismic velocity within the layer
layers list For each layer the first value defines the marker and
(float/int) the second one the thickness

traveltimes
noise_relative %/100 Relative noise to be added to the forward modeled
seismic traveltimes
noise_absolute s Absolute noise to be added to the forward modeled
seismic traveltimes

specific location in the survey geometry with respect to the geophone


positions, e.g.: along the entire profile, yet receivers 3 and 45 were modeled as
broken geophones, i.e., the corresponding seismograms do not contain
############################# a signal. Crosses overlaid on the valid seismograms at the respective
Line: SYN_syn_data first onset indicate the first break traveltimes 𝑡𝑡 between the shot and the
Sampling rate: 2000 Hz
Recording length: 0.512 receiver. In addition to the missing first break traveltimes for the broken
s Number of geophones: receivers, we manually added a systematic error, i.e., we set an erroneous
48 Geophone spacing: 2 m traveltime for receiver 36, to demonstrate how such outliers can be
#############################
identified in a so-called pseudosection.
File number |
Station 1001 Pseudosections are a useful tool for the analysis of the data quality
| G001 by presenting the data based on the positions of shots and receivers.
: | :
Such a visualization allows for the detection of outliers and their spatial
1048 | G048
distribution as necessary for the understanding of possible sources of
error. In case of the SRT, a pseudosection as presented in Fig. 4c,
The auxiliary file [Link] exported to the dataset directory summa- illustrates apparent velocity (𝑣𝑎𝑝𝑝) values computed as
rizes the parameters from the configuration file as well as informa-
tion regarding the simulated systematic errors in the synthetic seismic 𝑎𝑜𝑓 𝑓 𝑠𝑒𝑡
𝑣𝑎𝑝𝑝 = , (4)
waveform data, e.g.: 𝑡𝑡
where 𝑎𝑜𝑓 𝑓 𝑠𝑒𝑡 refers to the absolute offset between shot and receiver. The
Number of geophones: 48 𝑣𝑎𝑝𝑝 values are plotted at pseudolocations, where the location along
Number of shots: 48 profile direction is defined by the midpoint of the corresponding shot-
Recording length (s):
0.512 receiver pair and the pseudodepth is computed as 1∕3 of 𝑎𝑜𝑓 𝑓 𝑠𝑒𝑡.
Sampling frequency (Hz):
2000 Wavelet type: Ricker As demonstrated in Fig. 4c, a pseudosection allows for the identifi-
Frequency of the wavelet (Hz): 100 cation of missing data (e.g., receiver 3 and 45) as well as systematic
errors or outliers, i.e., velocities erroneously influenced by a single
Missing shot(s): 11
Broken geophone(s): 13 shot or receiver (see receiver 36). The main assumption here is that
Wrong polarity geophone(s): 26 the pseudosection should reveal smooth transitions between lateral
and vertical neighbors, considering that the data were collected with
Fig. 4b presents the seismic waveform data forward modeled for a gradual changes in the position of the source (i.e., hammer blow)
shot point located in the center of the profile. The seismograms are and the receiver (i.e., geophone). The pseudosection will reveal large
shown as curves, whereas the strength of the amplitudes is added as variations in case of abrupt changes in the topography or the
color-coded information. In the seismograms we see clear first onsets geometry of the array, yet this can be taken into account by the user
during the identification of outliers and possible systematic errors.

5
M. Steiner and A. Flores Computers and Geosciences 176 (2023)

Fig. 4. Synthetic seismic data generated with the SeismicWaveformModeler:


(a) Two-layered subsurface model without topography characterized by a distinct anomaly in the center. Triangles along the surface indicate the positions of geophones.
(b) Synthetic seismic waveform data computed from the subsurface model for a shot point (star-shaped symbol) located in the center of the profile.
(c) Pseudosection illustrating the apparent velocity computed from the seismic traveltimes based on the survey geometry.
(d) Subsurface model of the seismic P-wave velocity distribution obtained through the inversion of the synthetic P-wave traveltimes. Solid lines indicate the geometry of the
synthetic subsurface units.

Based on pseudosections erroneous traveltimes can be identified


and subsequently removed or corrected, which is a critical processing given as:
( )
step prior to the inversion of the data. The inversion of first break ∥ 𝑾 𝑑 ( (𝒎) − 𝒅) ∥22 +𝜆 ∥ 𝑾 𝑚 𝒎 − 𝒎0 ∥22→ min (5)
traveltimes aims at resolving a model of the P-wave velocity of the
The first term on the right-hand side of Eq. (5) refers to the data misfit.
subsurface materials, where systematic errors in the data would lead
𝑾 𝑑 is the data weighting matrix holding the reciprocals of the data
to the creation of artifacts in the imaging results or hinder the conver-
errors, 𝒅 denotes the data vector holding the input data (first break
gence of the inversion. Considering the multitude of available inversion
traveltimes), 𝒎 is the model vector representing the target parameter
frameworks we do not include a new inversion strategy in the formikoj
library. Instead the processed data can be exported in any format (seismic P-wave velocity values), and  (𝒎) is the model response. The
second term denotes represents the regularization, where 𝑾 𝑚 and
required for the use of commercial inversion algorithms, or can be
𝒎0 are the model constraint matrix and the reference model, respectively.
linked directly to other open-source libraries. In our case, we refer
The regularization parameter 𝜆 balances the influence of the data misfit
to the modeling and inversion capabilities of pyGIMLi (Rü cker et al.,
and the regularization term on the inversion process.
2017). pyGIMLi uses a generalized Gauss–Newton method to solve the
The inversion of the synthetic first break traveltimes resolves the
inversion problem through the minimization of an objective function
subsurface model presented in Fig. 4d, which is given in terms of
the spatial variations of the seismic P-wave velocity, i.e., reflecting
lateral and vertical variations. Depending on the survey geometry the

6
M. Steiner and A. Flores Computers and Geosciences 176 (2023)

Table 3
Description of the information to be provided in the columns of the geometry file.
Col Content Data type Description
1 x coordinate float Station x coordinate, e.g., given in (m)
2 y coordinate float Station y coordinate, e.g., given in (m)
3 z coordinate float Station z coordinate, e.g., given in (m)
4 Geophone bool 1 if a geophone was deployed at station, 0 otherwise
5 Shot int The numerical part of the file name, e.g., 1001, if a shot was conducted
at this station, −1 otherwise
6 First geophone int First active geophone (−1 if no shot was conducted at the station)
7 Number of geophones int Number of active geophones (−1 if no shot was conducted at the station)

Table 4 as the TravelTimeManager provides an inversion framework for


Excerpt from the geometry file describing the survey geometry of the synthetic dataset
generated in this study. first break traveltimes, yet not a framework for the processing of seis-
x (m) y (m) z (m) Geo Shot 1st Geo # Geo
mic waveform data. For the demonstration of the fundamental Seis-
micRefractionManager capabilities we consider the synthetic seis-
0.0 0.0 0.0 1 1001 1 48
2.0 0.0 0.0 1 1002 1 48
mic data created with the SeismicWaveformModeler above, which
⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮
illustrates that both classes can be combined in subsequent workflows.
20.0 0.0 0.0 1 −1 1 48
⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮
2.2.1. Compiling the survey information and creating a project
24.0 0.0 0.0 0 1013 1 48
⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ To create a new or load an existing SeismicRefractionMan-
92.0 0.0 0.0 1 1047 1 48 ager project, the working directory needs to contain specific subdirec-
94.0 0.0 0.0 1 1048 1 48 tories:

working_directory
01_data
inversion can resolve subsurface models in both 2D or 3D. To aid in raw
the evaluation of this inversion result we superimposed the known 02_geom
inter- faces between the different subsurface units of the synthetic 03_proc
model. As can be seen from this plot, the imaging result resolves the
fundamental structural features and reflects the P-wave velocity In this directory structure, the seismic shot files are stored in 01_data/raw
distribution of the synthetic subsurface model. Deviations from the and the geometry file ([Link]) is provided in
true velocity model are due to the smoothness-constraint inversion 02_geom.
scheme applied by pyGIMLi. To obtain sharper contrasts between the The geometry file is a csv file that provides an abstract repre-
different subsurface units in the imaging result structural constraints sentation of the survey layout and must not contain a header. The
could be incorporated in the inversion, e.g., as demonstrated by fundamental element for the description of the survey layout is the
Steiner et al. (2021); yet, the exploration of different inversion station, which refers either to a geophone position, a shot position
strategies is beyond the scope of this study. We consider Fig. 4 to or a position with co-located shot and geophone. For each station the
demonstrate the applicability of the SeismicWaveformModeler geometric and semantic information described in Table 3 are
class for generating synthetic seismic waveform data to be used in provided column-wise, i.e., each line in the geometry file corresponds
numerical P-wave refraction seismic investigations. to a single station with a unique position within the survey layout.
For the synthetic dataset considered in this study, the 2D station
2.2. Processing of seismic refraction datasets coordinates are provided in the geometry file, whereas the z coordinate
is assumed to be 0 m along the entire profile, as illustrated in Table 4;
The SR method measures the traveltimes of seismic waves based thus, reflecting the lack of surface topography in the synthetic model
on the first onset of the seismic energy recorded by geophones. In the (see Fig. 4a). In the column Geo the geometry file contains the value
SRT, the inversion of traveltimes gathered from tens to thousands of 1 (True) for each station except the station located at 24 m referring to
seismograms permits the computation of variations in the subsurface the broken geophone modeled by the SeismicWaveformModeler.
seismic velocities in an imaging framework. Measuring the traveltimes When compiling the geometry file we also need to take into account the
in seismograms (i.e., first break picking) is commonly done manually missing shot at station 11, i.e., the station located at 20 m along profile
(or semi-automatically) in an iterative process. The signal-to-noise ratio direction, and set the corresponding value to −1. The column 1st Geo
in the recorded seismograms substantially influences the quality of the indicates the first active geophone along the profile for each shot file.
traveltimes picked in the seismograms, and thus the seismic velocity For the synthetic dataset considered here this refers to the geophone
model obtained through the inversion. Accordingly, a proper enhance- deployed at the first station of the profile. In particular, the column
ment of the perceptibility of the first onsets is crucial for the first break 1st Geo allows the reproduction of roll-along survey geometries, where
picking. in the first segment the first active geophone is always the geophone at
The SeismicRefractionManager class provides functionali- station 1. Considering 48 geophones deployed in each segment, and an
ties that permit the processing of seismic waveform data for a refraction overlap of 50 %, the first geophone for the consecutive segments would
seismic analysis. These functionalities involve the reading of seismic be 25, 49, 73, 97, and so on.
waveform data, combining the data with the survey geometry informa- An instance of the SeismicRefractionManager can be cre-
tion, processing of the waveforms as well as the picking of first break ated by providing the path to the working directory as parameter to
traveltimes. Since the first break picking is a highly interactive process, the constructor. Depending on the content of the working directory,
the SeismicRefractionManager is designed primarily for usage the SeismicRefractionManager automatically decides whether
from within an ipython shell. (i) to start in the data preview mode (first break picking not possi-
The first break traveltimes determined with the ble), (ii) create a new project, or (iii) load an existing project from
SeismicRefrac- tionManager represent the input data, e.g., for disk. In case the shot files as well as the geometry file are provided
the TravelTimeM- anager of pyGIMLi (Rü cker et al., 2017). This
is particularly relevant

7
M. Steiner and A. Flores Computers and Geosciences 176 (2023)

Fig. 5. Entity-relationship diagram illustrating the SQLite database used in a SeismicRefractionManager project to store an manage processing related information.

and a basic sanity check of the geometry file was successful, the
SeismicRefractionManager creates a new project: 6 # Select traces with common absolute offset
7 srm. select(by=’ aoffset’, num=6)
1 """
2 Import the SeismicRefractionManager from the INFO : 88 traces selected
3 formikoj library
4 """ 8 # Select traces with receiver
5 from formikoj import SeismicRefractionManager 9 srm. select(by=’rin’, num=10)
6
7 """
8
Create an instance of the INFO : 48 traces selected
9
SeismicRefractionManager
""" 10 # Select traces with receiver
10
srm = SeismicRefractionManager(’.’)
11 11 srm. select(by=’sin’, num=23)

INFO : 48 traces selected


INFO : Read geometry information from
file INFO : Extracted shot geometry Fig. 7 shows the frequency spectrum of the currently selected traces,
INFO : Extracted receiver which can be computed and visualized through the plot method:
geometry INFO : Applied geometry
INFO : Standard pickset ’picks’ created
INFO : Pickset ’picks’ loaded 12 # Plot the frequency spectrum
INFO : ’picks’ set as active pickset 13 srm. plot( type=’ spectrum’)
Progress <==============================> 100.0% completed
INFO : Read 48 files
The SeismicRefractionManager resolves the frequency spec-
In a first step, the SeismicRefractionManager creates an SQLite trum by computing the stacking the power spectral density (𝑝𝑠𝑑) of the
seismograms 𝑠 (𝑡) as
database [Link] in the working directory based on the entity-relationship ( )
diagram shown in Fig. 5. The geometry information is then read 𝑝𝑠𝑑 = 10 log10 |fft (𝑠 (𝑡))|2
� , (6)
from the geometry file and stored in the database table geometry with �
where fft refers to the fast fourier transformation (FFT) and 𝑁 is
consecutively numbered stations (see Fig. 6). To allow for an effi-
the number of the considered seismograms. The frequency spectrum
cient data selection for the user the SeismicRefractionManager
provides information regarding the amplitudes associated to different
creates the database tables shots and receivers, which link the station
frequencies in the seismograms, which can be used for the identifica-
numbers to shot index numbers (SIN) and receiver index numbers
tion of the dominating frequencies as required for the selection and
(RIN), respectively. For each shot-receiver pair the corresponding SIN
application of adequate filters such as lowpass, highpass, bandpass and
and RIN are stored in the table applied_geometry together with the
bandstop filters.
absolute offset and midpoint between these stations, i.e., after this step
To enhance the signal-to-noise ratio of the seismograms the user can
is completed the geometry is applied. In the last step, the database table
apply the respective filters through the filter method, which
fbpicks is created, which stores the first break traveltimes for each SIN- utilizes
RIN pair together with the name of the corresponding pickset, i.e., a the frequency filters implemented in the ObsPy package (Beyreuther
common label for an entire set of first break traveltimes. By default, et al., 2010), e.g.:
each project contains the default pickset ‘picks’, which is loaded and
activated on startup. Once the database is initialized, the waveform 14 # Apply bandpass filter
data are read from disk and the project is ready for processing. 15 srm. filter( type=’ bandpass’, freqmin=10 ,
freqmax=120)
2.2.2. Selecting and visualizing seismic waveform data
Once the geometry is applied, the select method of the INFO : Applied bandpass filter (10.0 to 120.0 Hz)
SeismicRefractionManager allows to gather the seismic wave-
form data based on a common absolute offset ( aoffset), a common In this example, the filter is solely applied to the currently selected
RIN (rin), or a common SIN (sin): traces, yet setting the parameter onhold as True automatically filters
all subsequently selected traces with the same filter settings, e.g.:

8
M. Steiner and A. Flores Computers and Geosciences 176 (2023)

Fig. 6. The SeismicRefractionManager addresses the stations through consecutive station numbers based on the sort order in the geometry file. The shot index numbers
(SIN) and receiver index numbers (RIN) are assigned to the shot and receiver stations separately to allow for an intuitive data handling for the user.

Fig. 7. The frequency spectrum illustrates the frequency content of the currently selected traces, which allows for the identification of frequency ranges associated to noise allowing
for the definition of corresponding frequency filter settings.

16 # Apply lowpass filter visualization of the seismograms, e.g., associated to broken geophones
17 srm. filter( type=’ lowpass’, freq=200 , or seismograms with wrong polarity. The ‘m’ key activates the trace
18 onhold=True) mute mode (‘Trc mute’), which allows to set the amplitude of a trace
to zero by clicking with the left mouse button; clicking again on the
same trace restores the amplitude information. The trace reverse mode
INFO : Applied 200.0 Hz lowpass filter
INFO : Set filter hold on (‘Trc rev’) is activated by pressing the ‘r’ key and enables the user to
toggle the polarity of a trace by clicking on it with the left mouse
button.
The currently selected traces can be visualized by calling the plot
The default data scaling mode is ‘Zoom’, which allows the scaling
method without passing any parameter:
of the 𝑦-axis by turning the mouse wheel. By pressing the ‘a’ key the
amplitude scaling mode (‘Amp scal’) is activated. Turning the mouse
18 # Plot selected traces
wheel increases or decreases the amplitudes of the traces currently
19 srm. plot()
shown in the seismogram plot, and thus might help to enhance the
perceptibility of the first onsets. By pressing the key of the currently
By default, the seismogram plot visualizes the seismic waveforms in a active mode again, the SeismicRefractionManager returns to the
combination of wiggle trace and variable area mode, i.e., the trace data default mode; yet, the different modes can be activated in any arbitrary
are shown as curves. The area of the curves is colored red for negative order (as illustrated in Fig. 9).
and blue for positive amplitudes, respectively (not shown for brevity).
Pressing the up or down arrow key on the keyboard toggles between 2.2.3. Analysis of the seismic waveforms and first break traveltime picking
variable area and variable density mode, with the latter reflecting the In the seismogram plot, the waveforms can be analyzed and pro-
strength of the amplitudes by the color saturation, i.e., high amplitudes cessed to obtain information about the subsurface conditions. Activat-
refer to a stronger shade than low amplitudes (see Fig. 8). ing the velocity estimation mode (‘Vel est’) by pressing the ‘v’ key on
The currently activated processing mode and data scaling mode are the keyboard, enables the user to estimate velocities for different wave
reported in the status bar of the seismogram plot window together with
phases (e.g., originating from a refractor) by pressing the left mouse
the traveltime at the current cursor position (see Fig. 9). The initial
button and moving the cursor. Once the left mouse button is released,
processing mode is ‘Fb pick’, i.e., first break picking is possible. Addi-
a line between the start and end point is drawn and labeled with the
tional modes, accessible through the keyboard, allow for an enhanced
corresponding velocity (estimates can be deleted by clicking with the
right mouse button).

9
M. Steiner and A. Flores Computers and Geosciences 176 (2023)

Fig. 8. The seismogram plot presents the currently selected traces along the 𝑥 axis, where the sort order is determined through the geometry. The corresponding trace data are
illustrated as a function of time along the 𝑦 axis (solid curves), with positive and negative amplitudes depicted in blue and red, respectively. The selection criterion and the applied
filter are shown in the upper left corner of the plot. Green crosses refer to the picked traveltimes stored in the currently active pickset.

measurements, which are commonly associated to traveltimes with


substantial deviations from those observed at adjacent stations, e.g.,
the first break pick for the SIN-RIN pair (23, 11). Outliers can be
removed by clicking on the respective data point in the traveltime
diagram, which is instantly synchronized with the project database. If
the seismogram plot and the traveltime diagram are used side-by-side,
changes made to the first break picks in one window will interactively
trigger an update of the other one and vice versa. Once the user is
satisfied with the quality of the first break traveltimes, the data points
of the pickset can be exported in the unified data format, which is
compatible with the pyGIMLi framework. In particular, the udf file is
stored in the subdirectory 03_proc/picks with the current timestamp as
file name suffix:

22 # Export first break traveltimes


srm. picksets(do=’ export’, name=’ pick’)
Fig. 9. The status bar in the interactive seismogram plot displays the currently active 23
processing and data scaling modes as well as the time (in seconds) at the current cursor
position. By pressing the keys ‘a’, ‘m’, ‘r’, ‘v’ on the keyboard the different modes can
be activated. INFO : Pickset ’pick’ saved to pick_20230303T140447.pck’

2.3. Expanding the capabilities of the formikoj library

If the processing mode ‘Fb pick’ is activated, picking of first break Making the formikoj library publicly available under an open-source
traveltimes is done individually by clicking with the left mouse button license allows the addition of supplementary functionalities tailored for
on the respective trace. Clicking again on the same trace will set the the specific requirements of the users. The concept of formikoj suggest
first break pick to the new location as there can only be one traveltime
that such custom extensions should be implemented either as internal
for each SIN-RIN pair; whereas clicking with the right mouse button
methods or as functions in the utilities module, which are then executed
deletes the pick. Alternatively, first break picks can be set for multiple
through the compute method.
traces by pressing the left mouse button and moving the cursor. Once
We illustrate this possibility for customization by implementing
the left mouse button is released, first break picks are defined at
a simplified version of an automatic first break picking algorithm
the intersections between the line and the seismograms. In the same
based on the short- and long-time window average ratio (STA/LTA)
way, multiple picks can be deleted if a line is drawn with the right
method (Allen, 1978), which determines the traveltimes following the
mouse button pressed. The first break traveltimes determined in the
energy ratio approach (e.g., Earle and Shearer, 1994). In particular,
seismogram plot are automatically written to the project database when
our implementation computes the envelope 𝐸 (𝑡) of the seismogram 𝑠 (𝑡)
the window is closed or another set of traces is loaded by pressing the
as (e.g., Duan and Zhang, 2020)
‘left’ or ‘right’ arrow keys on the keyboard. (
The traveltime diagram for the currently active pickset can be 𝐸 (𝑡) = )𝑠 (𝑡)2 + 𝑠̃ (𝑡) 1∕2 , (7)
created through the plot method:
where 𝑠̃ (𝑡) is the Hilbert transform of 𝑠 (𝑡). The energy ratio 𝐸 𝑅 is then
computed as 𝐸 𝑅 = 𝑆 𝑇 𝐴∕𝐿𝑇 𝐴 with
1 ∑𝑖
20
# Plot traveltime diagram 𝑆 𝑇 𝐴 (𝑖) = 𝐸 (𝑗) , (8)
21 srm. plot( type=’ traveltimes’) 𝑛𝑆𝑇 𝐴 𝑗=𝑖−
𝑆𝑇 𝐴

Fig. 10 presents an exemplary traveltime diagram, which is a common and


𝑖
way to examine the quality of the first break picking. Such an illustra- ∑
1 𝐸 (𝑗) , (9)
𝐿𝑇 𝐴 (𝑖) = 𝑛𝐿𝑇 𝐴
tion of the traveltimes can be used to identify outliers or erroneous 𝑗 =𝑖−𝑛𝑆 𝑇

1
M. Steiner and A. Flores Computers and Geosciences 176 (2023)

Fig. 10. The traveltime diagram shows the first break traveltimes stored in the currently active pickset along the 𝑦 axis (x symbols), where solid lines connect traveltimes assigned
to a common shot. The sort order of the stations along the 𝑥 axis reflects the geometry. Filled circles indicate stations with co-located shot and geophone (receiver), triangles refer
to receiver stations (no shot) and stars refer to shot stations (no geophone).

where 𝑛𝑆 𝑇 𝐴 and 𝑛𝐿𝑇 𝐴 refer to the number of data points in 𝐸


(𝑡) considered for the short- and long-time windows, respectively. For this Table 5
exemplary implementation we determine the first break traveltimes in Excerpt from the 3D survey geometry file.
the seismograms as the position of the maximum in the 𝐸𝑅 function, x (m) y (m) z (m) Geo Shot 1st Geo # Geo
which is automatically saved in the project database. 40 336.056 292 470.692 120.0 1 1001 1 96
The autopicking algorithm is added to the 40 334.751 292 469.177 120.0 1 1002 1 96
SeismicRefraction- Manager in form of two internal methods ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮
_manage_autopicking and _compute_autopicks, respectively. 40 284.472 292 455.431 120.0 1 1058 1 96
To invoke the autopicking process we modified the compute method, 40 285.896 292 454.027 120.0 1 1059 1 96
which now accepts the custom-defined keyword autopicking. ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮
Additionally, the autopicking requires values to be passed for the 40 339.421 292 402.681 120.0 1 1096 1 96
parameters pick and pickset. The former accepts the values all 40 305.228 292 445.050 120.0 0 1097 1 96
⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮
or cur to determine first break
40 265.415 292 431.957 120.0 0 1115 1 96
traveltimes for all traces in the dataset or solely the currently selected 40 255.453 292 431.395 120.0 0 1116 1 96
traces, respectively. The latter defines the name of the pickset in
which the automatically determined traveltimes should be saved to. A
typical use case involves conducting the autopicking and exporting the
determined traveltimes: suggesting that the implemented autopicking algorithm performs well
for the synthetic seismic waveform data. However, the observed
24 # Automatically pick first break traveltimes devia- tion from the perfect correlation (i.e., the 1:1 line in Fig. 11)
25 srm. compute(do=’ autopicking’, pick=’all’,
pickset=’ autopicks’) indicates that autopicking and forward modeling algorithm might be
26 srm. picksets(do=’ export’, name=’ autopicks’) sensitive to different seismic wave phases. Further improvements in
terms of the autopicking process could incorporate, for example,
machine learning approaches as the method proposed by Duan and
INFO : Created new pickset Zhang (2020), which can be easily implemented as an additional
’autopicks’ INFO : Pickset functionality in the Seis- micRefractionManager. We point out
’autopicks’ loaded here, that following the same procedure, users can implement further
INFO : ’autopicks’ set at active pickset autopicking algorithms as well as other data processing strategies and
Progress <==============================> 100.0% completed have a simple framework for the evaluation of the performance
INFO : Pickset ’autopicks’ saved to through the analysis of synthetic data (e.g., clean and contaminated
autopicks_20230303T140834.pck
with Gaussian noise).

In Fig. 11, we compare the automatically determined first break 3. Application to field data: Processing a 3D seismic refraction
picks with the forward modeled traveltimes computed above to allow dataset
for a basic evaluation of autopicking performance. The inset plot in
Fig. 11 presents the histogram of the autopicked traveltimes, which To demonstrate the applicability of the SeismicRefraction-
shows that three traveltimes should be considered outliers, and thus Manager for the processing of real field data, we present the analysis
are removed from the dataset. After the outlier removal the correlation and inversion results for a seismic refraction survey conducted in a
coefficient between forward modeled and autopicked traveltimes is 0.99 soda lake located in the national park Neusiedler See–Seewinkel close

1
M. Steiner and A. Flores Computers and Geosciences 176 (2023)
to Vienna. The seismic survey aims at solving for the geometry of
the

1
M. Steiner and A. Flores Computers and Geosciences 176 (2023)

Fig. 11. Comparison of forward modeled synthetic and automatically picked first break traveltimes for the synthetic seismic waveform data created in this study.

confined aquifer below this soda lake with the required information
referring to the thickness of the aquifer and the depth of the groundwa- Due to the 3D survey geometry, a 2D pseudosection is not suitable
ter table. As shown in Fig. 12, the seismic data were collected with 48 for assessing the quality of the first break traveltimes. Since the
geophones deployed along the North-East to South-West oriented line, Seis- micRefractionManager project is configured for 3D
and 48 geophones along the North-West to South-East oriented line, processing the apparent velocity values are illustrated in an interactive
with a spacing of 2 m between the geophones. Shots were generated 3D pseudosec- tion. This plot can be rotated and the image section
with an 8 kg sledgehammer at the geophone positions as well as at can be zoomed and panned allowing the user to easily investigate the
positions along the diagonals to obtain a sufficient coverage. data quality for 3D geometries. Fig. 14 shows a screenshot of the
As in case of the synthetic dataset presented above, the geometry 3D pseudosection for the salt lake dataset; yet, such a screenshot
file contains the coordinates of the survey stations, yet for the soda cannot reveal the full capabilities implemented in the
lake dataset 3D coordinates we provided as illustrated in Table 5. Since SeismicRefractionManger for the interactive analysis and
geophones were not deployed at each survey station the column Geo visualization of 3D pseudosections.
contains the value 1 (True) for the first 96 stations and 0 (False) for To review the data quality for the entire dataset it is possible
the remaining 20 stations. Based on the shot files and the geometry to visualize the picking percentage, i.e., the ratio of actually picked
file stored in the required directory structure the SeismicRefrac- traveltimes and total number of SIN-RIN pairs:
tionManager creates the project database, automatically infers the 10 # Plot the picking percentage
3D survey layout from the 3D survey geometry, and thus configures 11 srm. plot( type=’ pickperc’)
the
project for 3D processing. Then we can select seismograms the same Fig. 15 presents the picking percentage visualized separately for each
way as for the synthetic data presented above. For this example we SIN. Accordingly, a low picking percentage indicates shots affected
select seismic waveform data recorded for SIN 1, i.e., the shot position co- by a low signal-to-noise ratio, whereas clear first onsets yield a cor-
located with the first geophone (Station 01 in Fig. 12): respondingly high picking percentage. For the soda lake dataset we
observe a consistently high picking percentage for all shot positions;
10 # Select traces with receiver thus, indicating a good data quality. Moreover, the picking percentage
11 srm. select(by=’sin’, num=1) plot can be used to track the picking progress, for instance, in case the
traveltimes cannot be determined in frame of one session or to identify
single shots that might have been forgotten during the first break
INFO : 96 traces selected picking. Accordingly, it is advisable to check the picking percentage
prior to exporting the traveltimes for the inversion.
In Fig. 13, we can see that the data for RIN 1 to 48 appear familiar
Once the first break picking is finished, the corresponding pickset
as the corresponding SIN-RIN layout refers to the one of conventional
can be exported for the inversion. We inverted the first break trav-
2D profiles; whereas the seismic waveforms for RIN 49 to 96 show eltimes with pyGIMLi and present the resolved 3D subsurface model
an entirely different pattern. To understand this visualization, we need in Fig. 16a. From this representation we can see, that the inversion
to take into account that RIN 49 to 96 are deployed perpendicular to
solves for low seismic velocities (600 to 800 m s−1) in the near-surface
the direction of propagation of the wavefront originating from SIN 1.
in the center of the investigated area, which corresponds to the still
Accordingly, the observed curvature in the first onsets is due to the
intact part of the soda lake, i.e., the part that is covered by water
varying offset of the different SIN-RIN pairs.
on a seasonal basis. Seismic velocity values above 800 m s−1 are

1
M. Steiner and A. Flores Computers and Geosciences 176 (2023)
resolved at depth as well as outside of the soda lake. To facilitate
the

1
M. Steiner and A. Flores Computers and Geosciences 176 (2023)

Fig. 12. The soda lakes field dataset was collected in a 3D survey layout with stations (co-located geophones and shots indicated by filled dots) deployed in form of a cross.
Additional shots (yellow stars) were conducted in from of a cross rotated by 45◦ to increase the coverage of the dataset.

Fig. 13. Examplary seismic waveforms from the soda lakes dataset shown for shot index number (SIN) 1 with a 100 Hz lowpass filter applied to suppress high frequency noise. The
recorded seismic waveforms clearly reflect the geometry of the geophones with RIN 1 to 48 deployed along the direction of wave propagation, while RIN 49 to 96 are deployed
perpendicular to the propagating wavefront.

1
M. Steiner and A. Flores Computers and Geosciences 176 (2023)

Fig. 14. 3D pseudosection showing the apparent seismic velocities determined from the first break traveltimes obtained from the soda lakes dataset and the corresponding absolute
offset between the shot and receiver stations. The apparent velocity for each shot-receiver pair is illustrated at the corresponding 2D midpoint and pseudodepth (1/3 of the absolute
offset), thus yielding a 3D representation.

Fig. 15. Picking percentage for each shot in the soda lakes dataset indicating a high signal-to-noise ratio allowing for the picking of first break traveltimes for nearly all shot-receiver
pairs.

interpretation of the resolved seismic velocity distribution in terms of


the aquifer geometry we show the two vertical slices in Fig. 16b and c, We demonstrated the capabilities of the formikoj framework re-
respectively. In particular, we relate seismic velocity values >800 m s−1 garding the development of versatile and easily scalable classes for
to the transition from the unsaturated to the saturated zone due to the the modeling and processing of waveforms in seismic refraction sur-
veys. Standardizing the data input in combination with a thorough
higher seismic velocity of water (≈1500 m s−1) compared to air (≈340
sanity check aims at reducing the risk of corrupting the information
m s−1) filling the pore space. Accordingly, our 3D subsurface model
stored in the project database. Moreover, crucial processing steps are
indicates the bottom of the aquifer at a depth of approximately 8 m. A
automatized within the SeismicWaveformModeler and the Seis-
more detailed interpretation is beyond the scope of this manuscript, yet
micRefractionManager classes facilitated by efficient data input
the presented figures reveal the capabilities provided by the proposed
strategies, for instance the preparation and import of the geometry
framework for the visualization and processing of data collected in 3D
file or the keyboard-based interaction related to the first break
survey geometries.
picking. In this regard, the user controls the formikoj library by
providing text- based commands preferably through an ipython shell
4. Conclusions and outlook to exploit the full interactive potential modeling and processing tools.
However, applica- tions of the formikoj library can also be
We have presented formikoj, a flexible open-source library enabling automatized by implementing workflows in python scripts or jupyter
the development of modeling and processing tools for geophysical data. notebooks.
Implemented in python and tested on all major operating systems
By making the source code of the formikoj library available under
(Linux/Unix, MacOS, Windows), formikoj is suitable for multi- and cross-
the MIT license we intend to spark the development of further modeling
platform applications; thus, allowing collaboration between users
and processing tools for various geophysical models based on this
indenpendent from licensing costs and platform requirements.
framework. Our further efforts will focus on implementing tools for

1
M. Steiner and A. Flores Computers and Geosciences 176 (2023)
other wave-based geophysical methods used in frame of our
research

1
M. Steiner and A. Flores Computers and Geosciences 176 (2023)

Fig. 16. Subsurface model of the soda lake in terms of the seismic P-wave velocity. (a) 3D representation of orthogonal slices through the resolved subsurface model. (b) 2D
representation of the NE–SW slice. (c) 2D representation of the NW–SE slice.

activities, such as multi-channel analysis of (seismic) surface waves or


magnetotelluric surveys. The authors are grateful to Nathalie Roser and Lukas Aigner for us-
ing and testing the formikoj library in frame of their research activities,
CRediT authorship contribution statement and for providing valuable suggestions for improvement. Furthermore,
we would like to thank Clemens Moser, Martin Mayr, Vinzenz Schichl
Matthias Steiner: Conceptualization and implementation of the and Harald Pammer for their constructive comments during first tests
library, Creating the figures, Preparation of the manuscript. Adrián of the formikoj framework as well as for their help during the seismic
Flores Orozco: Conceptualization of the library, Preparation of the surveys.
manuscript.
Appendix. Source code for generating the subsurface model con-
Declaration of competing interest sidered in this study

The authors declare that they have no known competing finan-


cial interests or personal relationships that could have appeared to
influence the work reported in this paper. 1 # Import required packages import numpy as np
2 import pygimli as pg
Data availability 3 import pygimli. meshtools as mt
4
5
Data are available in the git repository # Create the model geometry # - top layer
6
top = mt. createPolygon( [[0 , 0], [94 , 0],
7
Acknowledgments 8
[94 , -3.5], [72 , -3.5],
9
[20 , -2], [0, -2]],
We sincerely appreciate the constructive review of Editor-in-Chief 10
isClosed=True , marker=1, area=0.1)
Dario Grana and two anonymous reviewers, which helped to consider- 11
# - bottom layer
ably enhance the quality of the manuscript. 12
bottom = mt. createPolygon(
The authors acknowledge TU Wien Bibliothek for financial support 13

through its Open Access Funding Programme. 14


15

1
M. Steiner and A. Flores Computers and Geosciences 176 (2023)

16 [[0 , -2], [20 , -2], Dijkstra, E.W., 1959. A note on two problems in connexion with graphs. Numer. Math.
17 [22 , -6], [70 , -6], 269–271. [Link]
18 [72 , -3.5], [94 , -3.5], Draebing, D., 2016. Application of refraction seismics in alpine permafrost studies: A
19 [94 , -10], [0, -10]], review. Earth-Sci. Rev. 155, 136–152. [Link]
20 isClosed=True , marker=3, area=0.1) 02.006.
21 Duan, X., Zhang, J., 2020. Multitrace first-break picking using an integrated seismic
22 # - anomaly/ infill and machine learning methodpicking based on machine learning. Geophysics 85
23 infill = mt. (4), WA269–WA277. [Link]

24 createPolygon( [[20 , Earle, P.S., Shearer, P.M., 1994. Characterization of global seismograms using an

25 -2], [72 , -3.5], automatic-picking algorithm. Bull. Seismol. Soc. Am. 84 (2), 366–376.

26
[70 , -6], [22 , -6]], Guedes, V.J.C.B., Maciel, S.T.R., Rocha, M.P., 2022. Refrapy: A python program

27
isClosed=True , marker=2, area=0.1) for seismic refraction data analysis. Comput. Geosci. 159, 105020. [Link]
org/10.1016/[Link].2021.105020, URL: [Link]
28
geom = top + infill + bottom article/pii/S0098300421003022.
29
Harris, C.R., Millman, K.J., van der Walt, S.J., Gommers, R., Virtanen, P., Cour-
30
napeau, D., Wieser, E., Taylor, J., Berg, S., Smith, N.J., Kern, R., Picus, M.,
31 """
32
Define shot and receiver stations and Hoyer, S., van Kerkwijk, M.H., Brett, M., Haldane, A., del Río, J.F., Wiebe, M.,
create corresponding nodes Peterson, P., Gé rard-Marchant, P., Sheppard, K., Reddy, T., Weckesser, W., Ab-
33
""" basi, H., Gohlke, C., Oliphant, T.E., 2020. Array programming with NumPy. Nature
34
nstats = 48 585 (7825), 357–362. [Link]
35 Heimann, S., Kriegerowski, M., Isken, M., Cesca, S., Daout, S., Grigoli, F., Juretzek, C.,
spc = 2
36 Megies, T., Nooshiri, N., Steinberg, A., Sudhaus, H., Vasyura-Bathke, H., Willey, T.,
37 stations = np. Dahm, T., 2017. Pyrocko - An Open-Source Seismology Toolbox and Library. GFZ
38 vstack(( np. Data Services, [Link]
39 linspace(0, Hunter, J.D., 2007. Matplotlib: A 2D graphics environment. Comput. Sci. Eng. 9 (3), 90–
40 ( nstats -1)*spc, nstats), 95. [Link]
41 np. zeros( nstats))).T McKinney, W., 2010. Data structures for statistical computing in Python. In: Stéfan
42 van der Walt, Jarrod Millman (Eds.), Proceedings of the 9th Python in Science
for p in stations: Conference. pp. 56–61. [Link]
43
geom. Nguyen, F., Ghose, R., Isunza Manrique, I., Robert, T., Dumont, G., 2018. Managing past
44
createNode(p) landfills for future site development: A review of the contribution of geophysical
45
methods. In: Proceedings of the 4th International Symposium on Enhanced Landfill
46 # Create mesh for the finite element modeling
47 mesh = mt. createMesh(geom , quality=34) Mining. pp. 27–36.

48
Parsekian, A., Singha, K., Minsley, B.J., Holbrook, W.S., Slater, L., 2015. Multiscale
""" geophysical imaging of the critical zone. Rev. Geophys. 53 (1), 1–26. [Link]
49
50 Save the mesh in the binary mesh format for [Link]/10.1002/2014RG000465.

51 later use with the SeismicWaveformModeler Plattner, A.M., 2020. GPRPy: Open-source ground-penetrating radar processing and

52
""" visualization software. Lead. Edge 39 (5), 332–337. [Link]
mesh. save(’ mesh. bms’) tle39050332.1.
Ringler, A.T., Evans, J.R., 2015. A quick SEED tutorial. Seismol. Res. Lett. 86 (6), 1717–
1725. [Link]
Romero-Ruiz, A., Linde, N., Keller, T., Or, D., 2018. A review of geophysical methods
References for soil structure characterization. Rev. Geophys. 56 (4), 672–697. [Link]
org/10.1029/2018RG000611.
Ahern, T., Casey, R., Barnes, D., Benson, R., Knight, T., Trabant, C., 2012. SEED
Rü cker, C., Gü nther, T., Wagner, F.M., 2017. pyGIMLi: An open-source library for
reference manual, version 2.4. URL: [Link]
modelling and inversion in geophysics. Comput. Geosci. 109, 106–123. [Link]
pdf.
[Link]/10.1016/[Link].2017.07.011.
Allen, R.V., 1978. Automatic earthquake recognition and timing from single traces. Bull.
Steiner, M., Katona, T., Fellner, J., Flores Orozco, A., 2022. Quantitative water content
Seismol. Soc. Am. 68 (5), 1521–1532. [Link]
Beyreuther, M., Barsch, R., Krischer, L., Megies, T., Behr, Y., Wassermann, J., 2010. estimation in landfills through joint inversion of seismic refraction and electrical
ObsPy: A python toolbox for seismology. Seismol. Res. Lett. 81 (3), 530–533. resistivity data considering surface conduction. Waste Manage. 149, 21–32. http://
[Link] [Link]/10.1016/[Link].2022.05.020, URL: [Link]
Blanchy, G., Saneiyan, S., Boyd, J., McLachlan, P., Binley, A., 2020. ResIPy, an intu- science/article/pii/S0956053X22002793.
itive open source software for complex geoelectrical inversion/modeling. Comput. Steiner, M., Wagner, F.M., Maierhofer, T., Schö ner, W., Flores Orozco, A., 2021.
Geosci. 137, 104423. [Link] URL: https: Improved estimation of ice and water contents in alpine permafrost through con-
//[Link]/science/article/pii/S0098300419308192. strained petrophysical joint inversion: The Hoher Sonnblick case study. Geophysics
Bü cker, M., Flores Orozco, A., Gallistl, J., Steiner, M., Aigner, L., Hoppenbrock, J., 86 (5), 1–84. [Link]
Glebe, R., Morales Barrera, W., Pita de la Paz, C., García García, C.E., et al., 2021. Stockwell, J.W., 1999. The CWP/SU: seismic unix package. Comput. Geosci. 25 (4), 415–
Integrated land and water-borne geophysical surveys shed light on the sudden
419. [Link] URL: [Link]
drying of large karst lakes in southern Mexico. Solid Earth 12 (2), 439–461.
[Link]/science/article/pii/S0098300498001459.
[Link]
Sullivan, C.B., Kaszynski, A., 2019. PyVista: 3D plotting and mesh analysis through a
Cockett, R., Kang, S., Heagy, L.J., Pidlisecky, A., Oldenburg, D.W., 2015. SimPEG: An
open source framework for simulation and gradient based parameter estimation streamlined interface for the visualization toolkit (VTK). J. Open Source Softw. 4
in geophysical applications. Comput. Geosci. 85, 142–154. [Link] (37), 1450. [Link]
1016/[Link].2015.09.015, URL: [Link] Uieda, L., Oliveira, Jr., V.C., Barbosa, V.C., 2013. Modeling the earth with fatiando a
pii/S009830041530056X. terra. In: Proceedings of the 12th Python in Science Conference. pp. 96–103.

You might also like