Journal of Petroleum Science and Engineering 178 (2019) 506–516

Contents lists available at ScienceDirect

Journal of Petroleum Science and Engineering


journal homepage: [Link]/locate/petrol

Data-driven model for the identification of the rock type at a drilling bit
Nikita Klyuchnikov (a,∗,1,2,5,4), Alexey Zaytsev (a,1,2,5), Arseniy Gruzdev (b,3,2,5),
Georgiy Ovchinnikov (a,1,2,5), Ksenia Antipova (a,1,4), Leyla Ismailova (a,6,5), Ekaterina Muravleva (a,6,5),
Evgeny Burnaev (a,1), Artyom Semenikhin (b,1), Alexey Cherepanov (c,3,7), Vitaliy Koryabkin (c,1,7),
Igor Simon (c,1,3,7), Alexey Tsurgan (c,1,7), Fedor Krasnov (c,3), Dmitry Koroteev (a,1,5,4)

a Skolkovo Institute of Science and Technology, Skolkovo Innovation Center, Building 3, Moscow, 143026, Russia
b IBM East Europe/Asia, 10, Presnenskaya Emb, Moscow, 123112, Russia
c Gazprom Neft Science & Technology Centre, 75-79 Liter D Moika River Emb., St. Petersburg, 19000, Russia

ARTICLE INFO

Keywords:
Directional drilling
Machine learning
Rock type
Classification
MWD
LWD

ABSTRACT

Directional oil well drilling requires high precision of wellbore positioning inside the productive area. However, the specifics of the engineering design place the sensors that explicitly determine the type of the drilled rock more than 15 m from the drilling bit. As a result, departures from the target area can be detected only after this distance. This in turn leads to a loss in well productivity and the risk of an expensive re-boring operation.

We present a novel approach for identifying the rock type at the drilling bit based on machine learning classification methods and data mining on sensor readings. We compare various machine learning algorithms and examine extra features coming from mathematical modeling of drilling mechanics. We show that the real-time rock type classification error can be reduced from 13.5% to 9%. The approach is applicable for precise directional drilling in relatively thin target intervals of complex shapes. It also generalizes appropriately to new wells that differ from the ones used for training the machine learning model.

∗ Corresponding author.
E-mail address: [Link]@[Link] (N. Klyuchnikov).
1 Conception and design of study.
2 Implementation of methods.
3 Acquisition of data.
4 Analysis and interpretation of data.
5 Drafting the manuscript.
6 Literature review.
7 Revising the manuscript.

[Link]
Received 31 May 2018; Received in revised form 13 March 2019; Accepted 14 March 2019
Available online 22 March 2019
0920-4105/ © 2019 Elsevier B.V. All rights reserved.

1. Introduction

Oil and gas reserves are becoming increasingly complex to explore efficiently with significant financial margins. In a number of cases, oil companies have to approach thin oil/gas bearing layers of complex topology. These layers, or target intervals, can be as thin as a couple of meters. One of the common ways of exploring such intervals is directional drilling.

Directional drilling aims to place a wellbore so that it has maximal contact with the thin target layer. This requires the wellbore trajectory to follow all the folds of the layer as accurately as possible. To follow the folds, drilling engineers use Logging While Drilling (LWD) data. These data are recorded by physical sensors placed on a borehole assembly 15 m–40 m behind the drilling bit. The sensor data show whether the sensors are within the oil/gas bearing formation or not. Based on this information, engineers correct the drilling trajectory.

The gap between the bit and the sensors is a significant issue that prevents timely correction of the drilling trajectory. It can result in non-optimal placement of the well or in multiple cost-intensive re-drilling exercises. Fig. 1 illustrates the problem schematically.

Fig. 1. Schematic illustration of the drilling string (on the left) and the effect of a timely applied trajectory correction (on the right). The black curve shows the trajectory in case rock types are available only at a distance of 15 m from the drilling bit. The blue dashed curve corresponds to the trajectory when rock types are available at the drilling bit.

This paper demonstrates the feasibility of a technology aimed at optimizing the trajectories of directional wells to ensure the best possible contact between the wellbore and the target layer of the reservoir. The technology tackles the challenge of a delayed reaction on trajectory

correction during drilling of directional wells. Machine learning allows eliminating the 15 m–40 m gap between the drilling bit and the LWD sensors, which correspondingly speeds up decision making for trajectory correction. Along with machine learning approaches, we examine how mathematical modeling can advance machine-learning based approaches.

Basically, a trained data-driven algorithm allows a computer to identify when the bit touches a shale-rich part of the formation. It does so by continuously screening the real-time Measurements While Drilling (MWD) data. In machine learning, this problem is referred to as the two-class classification problem. We need to create a predictive classification model (a classifier) that can identify whether the bit at the current moment is in the shale-rich part of the formation (the first class) or not (the second class). In addition to labeling objects, the classifier can output the probability that an object belongs to a specific class, which provides a confidence level for its predictions.

From the machine learning perspective, the problem has two main peculiarities: missing values in measurements and a relatively high class imbalance. Only 13.5% of the available data are shales and hard rocks, and 86.5% are sands. Here "hard" refers to a measure of the resistance to localized plastic deformation induced by either mechanical indentation or abrasion. Therefore, we tested different machine learning algorithms under these peculiarities and developed appropriate methods for evaluating their performance.

The main contribution of this work is a novel data-driven approach for identifying the lithotype at the drilling bit. We prove the feasibility of this approach by studying mathematical and physical modeling. We also apply three essential machine learning baselines (Logistic Regression, Neural Networks and Gradient Boosting on Decision Trees) to the problem of lithotype classification. The classification is based on MWD data from 27 wells of the Novoportovskoe oil and gas condensate field in Western Siberia.

1.1. Machine learning in drilling application

Previous studies have applied machine learning to detecting the material type at the drilling bit. Zhou et al. (2010) analyze the applicability of two families of methods for on-bit rock typing with MWD data: regression and classification based on Gaussian Processes, and unsupervised clustering. In the report, the authors use three key parameters for building the data-driven forecasting model: the rate of penetration (ROP), pulldown pressure (hereafter referred to as weight on bit, WOB), and top drive torque (TRQ). One of their conclusions is that a value called the adjusted penetration rate (APR) (Eq. (1)) best reflects the specific features of the rock that are unknown a priori. The authors conclude that the optimal way to predict a rock type at the drilling bit is a hybrid model that combines the advantages of supervised classification and unsupervised clustering.

$$\mathrm{APR} = \frac{\mathrm{ROP}}{\mathrm{WOB} \times \mathrm{TRQ}} \tag{1}$$

This study tests APR, as well as another characteristic utilized by many authors (Zhou et al., 2010, 2011), the Specific Energy of Drilling (SED):

$$\mathrm{SED} = \frac{\mathrm{WOB}}{A} + \frac{2\pi \times \mathrm{RPM} \times \mathrm{TRQ}}{A \times \mathrm{ROP}}, \tag{2}$$

where A is the cross-sectional area of the wellbore.

Zhou et al. (2010) illustrate that unsupervised learning together with the minimization of SED is a promising approach for optimizing the penetration rate. Hegde and Gray (2017) present another effort on penetration rate optimization. They use the Random Forest algorithm to build a model linking the penetration rate with weight on bit, rotation speed, drilling mud flow rate, and unconfined rock strength. The model allowed the penetration rate to be optimized by up to 12% for wells close to the ones in the training set.

LaBelle et al. (2000) and LaBelle (2001) describe an application of Artificial Neural Networks to material typing and rock typing during drilling. With MWD-like measurements and trained Neural Networks, the relative classification error can be as small as 4.5% when the complete set of mechanical measurements (features) is available.

From the standpoint of machine learning fundamentals, Gaussian Processes and Neural Networks are not the best fit for rock type classification with MWD data. They cannot automatically handle the missing values that typically occur in MWD data. Thus, both methods require training data without missing values, which implies the development of accurate imputation procedures. The two methods differ in the preferred data size and dimensionality. Gaussian Processes are based on the Bayesian approach, so they can work well when the training sample is small. However, their area of application is limited to low input dimensions and small sample sizes (up to 10,000 elements). Neural Networks are based on frequentist inference, so they require large training samples, but they can work well in high dimensions. If the input features must reflect the temporal behavior of MWD, the dimension becomes high; moreover, real-life MWD sample sizes are large. Therefore, Neural Networks would be preferable to Gaussian Processes if there were no missing values in training and real-life data.

Decision trees and methods based on them (Hastie et al., 2009), such as Random Forest and Gradient Boosting, can automatically handle missing values and cope well with large sample sizes. However, tree-based methods are weak at data interpolation. They therefore generalize well only when the density and the diversity of points in the training sample are high. Gradient Boosting can also handle class imbalance by automatically weighting the importance of data entries while maximizing the quality of a classifier.

1.2. Modeling of drilling mechanics

Physical models are based on the physical equations (typically mass and energy balances) governing the system under analysis. Downton (2012) examines the modeling of different aspects of drilling. That work focuses on bringing these models together into a single approach and creating unified control systems to automate the entire process. Sugiura et al. (2015) give the most accurate description of the state of the art in the modeling of drilling systems for automation and control, adaptive modeling for downhole drilling systems and actual


tasks of the industry. Cayeux et al. (2014) provide a detailed analysis of sensor equipment on the drilling rig and the issues of its layout. Their aim is to obtain the best-quality boundary and initial conditions for physical models of the drilling process. Most papers on drilling mechanics are devoted to the vibrational analysis of the drill string (Shor et al., 2014).

As a first step, one can use analytical formulas derived from a simplified view of the drilling process (Detournay and Defourny, 1992). The input data (WOB and RPM) allow the output (TRQ and ROP) to be predicted. The main difficulty is calibrating the model, which requires finding the model coefficients from experimental data. The general scheme is as follows. For a known sequence of lithotypes along the wellbore and unknown model parameters, a numerical solution is found. The computed values of ROP are then compared with the experimental data. Thus, given a sufficient amount of experimental data, it is possible to find the model coefficients for each lithotype and bit type. One may then simulate the drilling process for an arbitrary sequence of lithotypes along the wellbore. This substantially expands the training set for the predictive model.

Spanos et al. (2002) considered non-linear models of drill string vibrations, in which the nonlinearity comes from the interaction of the bit and rock. That work examined only lateral vibrations. The state of the system is described by the transverse displacement u and the angle of rotation θ of each of N segments. The resulting system of equations is

$$M\ddot{\mathbf{u}} + C\dot{\mathbf{u}} + K\mathbf{u} + F(\mathbf{u}) = g(t), \tag{3}$$

where $\mathbf{u} = [u_1, \dots, u_N, \theta_1, \dots, \theta_N]$; M, C, K are the system mass, damping and stiffness matrices, respectively; g(t) denotes the excitation applied to the system; and $\mathbf{u}$, $\dot{\mathbf{u}}$, $\ddot{\mathbf{u}}$ are the displacement, velocity and acceleration vectors. The nonlinear part F(u) plays an important role; it arises from the contact interaction of the drill string with the borehole wall. The matrices M, C, K depend on the properties of the drill string, while the friction term F depends on the rock type. Solving the inverse problem for F yields parameters characteristic of the rock type, for example, the constants in the Hertzian contact law. To increase the quality of the model, the right-hand side of Eq. (3) can be considered as a random (Wiener) process. Unfortunately, this type of model is hardly applicable here because the input data are incomplete. To obtain the matrices M, C, K, we would need to know the exact geometric and material properties of the drill string.

2. Materials and methods

This section first specifies the origin of the data used in our work and its pre-processing procedures. Next, it describes the machine learning methods we studied for rock type classification at a drilling bit. It then defines the quality metrics used for evaluating classifiers. Finally, it describes approaches for improving classification quality by choosing input features.

2.1. Data description and pre-processing

This subsection specifies the geological formation in which the data were collected. It then outlines the data components essential for this work and describes how they were obtained from the raw exported files.

2.1.1. Geological formation of interest

The Novoportovskoye oil and gas condensate field is located on the Yamal Peninsula, 30 km from the Gulf of Ob. It is the largest field under development in the northwest of Siberia, Russia. The formation includes several strata. The most productive are the Lower Cretaceous NP-2-3 to NP-8 (formation depth of 1800 m) and the sand layers of the Tyumen suite J-2-6 (formation depth of 2000 m). The reservoir rocks are fine- to medium-grained sandstones and siltstones with thin layers of shales and limestone. The average rock permeability is 0.01–0.03 μm² and the porosity is about 18%.

2.1.2. Initial data

The initial data included mud logging (rig-site monitoring and assessment of information measured at the surface while drilling) and MWD and LWD data from downhole sensors. The main purpose of MWD systems is to determine inclinometry data (zenith angle and magnetic azimuth) and transmit them to the surface in real time while drilling. These data are necessary to determine the well trajectory. Sometimes the inclinometry data are supplemented with information about the drilling process and with logging data (LWD). Logging allows measuring the properties of the rock and dividing the geological section into different lithotypes.

The data include the following parameters: WOB, TRQ, ROP, APR, SED, as well as rotary speed (RPM), input flow rate (Q_in), output flow rate (Q_out), standpipe pressure (SPP), and hook load (HL).

The initial information about the drilled lithotypes was a Lithology Map produced by petrophysical interpretation of LWD measurements. These measurements were natural gamma radiation, apparent resistivity, polarization resistance, electromagnetic well log, induced gamma-ray log, neutron log, and acoustic log. The LWD petrophysical interpretation was also used to compare the actual lithotype values with the predictions obtained.

2.1.3. Pre-processing

For solutions based on machine learning, it is crucial to preprocess raw data into a format suitable for data-driven algorithms; this is also known as constructing a data pipeline. In real-world cases, preprocessing is usually complicated. The size of the raw data, the variety of formats and the number of sources can be too large for straightforward methods (García et al., 2016; Taleb et al., 2015). Some formats are common in the oil-and-gas industry, such [Link] files, but others, e.g. drilling reports, vary from company to company. To process source files effectively, the pipeline has to handle common types of errors in them. Some formats can also have different options; for example, [Link] files can have different column separators. Moreover, the number of wells to preprocess can reach hundreds or even thousands. The framework should therefore work quickly and accurately in an automatic regime.

In this study, we used a task-based approach built on the Luigi framework (see footnote 8) for the Python programming language. This framework builds data pipelines in which each preprocessing step is implemented as a separate task, such as processing source files or merging chunks of data. Each task can depend on other tasks. Thus, the whole pipeline is robust to errors in raw data and handles dependencies between tasks.

[Link]. Pipeline description. Fig. 2 shows the complete scheme of the pipeline used in our study. The pipeline for preprocessing the data consisted of four main steps:
1. extraction of the required columns from the raw data files;
2. selection of the relevant horizontal parts of the wells;
3. merging data from different sources;
4. unifying depth steps for the constructed dataframes.

All data sources were in different directories. The first step of processing each source file was the extraction of the required columns or cells of data. In the schema, this step is represented by the outgoing arrows from each file (.xls,.las [Link]). All information from drilling reports was aggregated into the file aggregate_table.pickle. We stored the results of each intermediate step in pickle files, which were serialized tables of data. The pickle format is storage-consuming but fast for input/output operations.

8 [Link]


Fig. 2. Raw data preprocessing pipeline.
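A minimal sketch can make two operations implied above concrete. The first derives the APR (Eq. (1)) and SED (Eq. (2)) columns from raw MWD channels. The second is the final "unifying depth steps" stage: as described next, each merged series is averaged over 0.1 m depth intervals, and empty intervals are forward-filled. The sketch is written in plain Python rather than as Luigi tasks and is not the authors' code. The function names, the `(depth, value)` input layout, treating `None` as a missing reading and anchoring intervals at multiples of 0.1 m are all assumptions made for illustration.

```python
import math
from statistics import mean


def apr(rop, wob, trq):
    """Adjusted penetration rate, Eq. (1): APR = ROP / (WOB * TRQ).

    Assumes nonzero WOB and TRQ; callers must filter such samples first.
    """
    return rop / (wob * trq)


def sed(wob, rpm, trq, rop, area):
    """Specific energy of drilling, Eq. (2): WOB/A + 2*pi*RPM*TRQ / (A*ROP).

    Units must be mutually consistent (e.g. RPM in rev/min with ROP in m/min).
    """
    return wob / area + 2.0 * math.pi * rpm * trq / (area * rop)


def aggregate_by_depth(samples, step=0.1):
    """Aggregate (depth, value) samples into depth intervals of length `step`.

    Intervals that contain readings get the mean of those readings; empty
    intervals are forward-filled with the latest preceding interval value.
    Readings equal to None are treated as missing and skipped.
    """
    bins = {}
    for depth, value in samples:
        if value is None:
            continue
        # round() guards against float error, e.g. 1000.3 / 0.1 -> 10002.999...
        k = math.floor(round(depth / step, 6))
        bins.setdefault(k, []).append(value)
    if not bins:
        return []

    result, last = [], None
    for k in range(min(bins), max(bins) + 1):
        if k in bins:
            last = mean(bins[k])  # average of readings inside this interval
        # otherwise `last` still holds the preceding value (forward fill)
        result.append((round(k * step, 6), last))
    return result
```

For instance, `aggregate_by_depth([(1000.02, 4.0), (1000.07, 6.0), (1000.31, 9.0)])` yields intervals starting at 1000.0, 1000.1, 1000.2 and 1000.3 m with values 5.0, 5.0, 5.0 and 9.0. The two empty middle intervals inherit 5.0 from the first one. A dataframe library would do the same job with a group-by on the interval index followed by a forward fill.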

In the block "Discretization", the mud logging data files were discretized with a sampling frequency equal to that of the other data sources. Next, in the block "Get horizontal part", we extracted the data corresponding to the horizontal part from each mud logging table. We used the interpretation data to obtain the boundaries of the horizontal parts.

Some wells had several laterals, which were called holes in the preprocessing pipeline. For this reason, part of the data was associated with laterals (e.g. mud logging data), and other data were associated with wells (e.g. drilling reports). The final step of the preprocessing is merging the data for each hole by depth (see block "Merge" in the schema). To merge all chunks of data, we used a table with the correspondence between laterals and wells from "[Link]". As a result, we obtained a set of merged, depth-associated time series for each lateral (see block "Final datamarts" in the schema).

After preprocessing the raw data, we reduced the granularity of the time series by aggregating them over depth intervals of 0.1 m. For intervals containing any data, we averaged the values. For intervals without data, we used forward fill, i.e. we took the latest preceding value.

2.2. Machine learning models

We treated rock type identification as a standard binary classification problem in machine learning. To attack this problem, we used three machine learning methods: Logistic Regression, Gradient Boosting on decision trees and Artificial Neural Networks. This section describes these methods.

2.2.1. Logistic regression

Logistic regression is a generalization of linear regression to classification problems (Hastie et al., 2009).

In logistic regression, we suppose that the probability of an object belonging to a certain class is a sigmoid transformation $\sigma(z) = 1/(1 + \exp(-z))$ of a linear function of the input features, $z = \sum_{i=1}^{d} w_i x_i$. Here $x_i$ is the value of an input feature, e.g. WOB, and $w_i$ is the weight for this feature. During the learning phase, we estimate the weights $w_i$ by maximizing the likelihood, i.e. the quality of fit of the model to the data.

In this article, logistic regression serves as a baseline. It lets us measure the improvement gained from the more complex, significantly nonlinear Gradient Boosting and Artificial Neural Network approaches.

2.2.2. Decision trees and Gradient Boosting

The most widely used approach to solving classification problems is based on ensembles of decision trees. Fig. 3 presents an example of a decision tree. A tree classifies an object as follows:

• the classifier proceeds through the decision tree according to the values of the input variables for this object until it reaches a leaf of the tree;
• at the leaf, it returns either the majority class of this leaf or the class probabilities given by the distribution of the training objects that fall into this leaf.

The advantages of this approach include superior performance with default settings (Fernández-Delgado et al., 2014) and fast model construction. It also shows almost no over-fitting and handles various problems in data, including missing values and outliers.

Among the various approaches to constructing ensembles of decision trees, the most popular nowadays is Gradient Boosting (Chen and Guestrin, 2016; Kozlovskaia and Zaytsev, 2017). It essentially follows the functional gradient in the space of decision tree classifiers to construct the ensemble. At each step, it increases the weights of objects that are poorly classified by the current ensemble, thus increasing their


contribution to the total model quality measure. The algorithm has the following main parameters:

• learning rate: how fast the ensemble is learned. If the learning rate is too small, the ensemble needs a larger number of trees, which costs more computational power; this cost grows linearly with the number of trees. In the opposite case, the ensemble adapts to the training data too fast, and we can get overfitting. In the experimental section, we demonstrate this effect in Fig. 5;
• maximal depth: the maximal depth of each tree in the ensemble;
• random subspace share: the share of features used by each decision tree;
• subsample rate: the share of objects from the training sample used for training each decision tree.

Fig. 3. An example of a real decision tree for the lithotype classification. Internal nodes contain decision rules and the split of the training objects that fall into the node between the two classes (Sand: left number; Shale & Rock: right number). The color of a node corresponds to this distribution. Leaf nodes have no decision rules but provide suggested classes.

Fig. 5. Quality vs Gradient Boosting parameters. Curves of different colors correspond to different learning rates.

2.2.3. Artificial Neural Networks

Artificial Neural Networks are an alternative modern data-driven approach to classification problems. They are more demanding in terms of the quality and size of input data and require more subtle tuning of hyperparameters. On the other hand, this type of machine learning algorithm can be more powerful for some types of problems and for some specific structures of input data (Hung et al., 2017; Ahmad et al., 2017).

The main idea behind Neural Networks is a deep composition of sequentially applied linear and nonlinear functions with multiple inputs and outputs. The composition is parameterized by the weights of the linear functions. Each composition of a linearity and a nonlinearity is called a layer. The gradient of the classification error is easy to propagate through this composition. We can therefore apply gradient methods to optimize a quality metric with respect to these parameters and obtain a fairly accurate model in the end.

There are many ways to define this deep composition. The most relevant to our problem are the Feed Forward fully connected architecture (Hornik et al., 1989) and the Long Short-Term Memory (LSTM) architecture (Hochreiter and Schmidhuber, 1997). In the fully connected architecture, the linear function at each layer connects each input with each output. In LSTM, some internal variables from the previous moment of time are used as additional input. This keeps some information from the distant past and creates a long-term memory effect in the network. Fig. 4 shows this scheme.

In the experiments, we applied both classes of Neural Networks, Feed Forward and LSTM, in different configurations.

Fig. 4. An illustration of information flows in LSTM. X_t are input values at the moment t, and Y_t is the corresponding output of the network. Arrows between LSTM cells represent additional input of internal variables from the previous moment.

Table 1
Feature selection results. The greedily selected set of features combined with the Extra set provides the best quality.

Feature set          ROC AUC   PR AUC   Accuracy_L
–                    0.494     0.181    0.866
B                    0.794     0.492    0.865
B, F                 0.803     0.484    0.867
B, F, D, L           0.829     0.504    0.870
G                    0.850     0.559    0.888
E                    0.653     0.359    0.879
B, E                 0.848     0.581    0.900
B, F, D, L, E        0.870     0.600    0.902
G, E                 0.878     0.614    0.905
G, E (fine-tuned)    0.880     0.625    0.910

2.3. Quality metrics

There are many quality metrics for comparing classifiers. In this article, we used three metrics. The first is a specific industry-driven metric, Accuracy_L. The other two are common machine learning metrics: the area under the Receiver Operating Characteristic (ROC) curve (ROC AUC) and the area under the Precision-Recall (PR) curve (PR AUC).

We used additional quality metrics because accuracy alone is not very representative under significant class imbalance. A constant "always-sand" predictor gives relatively high accuracy (Accuracy_L = 0.866 in Table 3) yet brings no practical benefit.

We did not consider specific metrics for time series or ordered data, as no universally acknowledged metric for them is easy to interpret (Burnaev et al., 2016; Artemov and Burnaev, 2015).

Let us consider a test sample $D = \{(x_i, y_i)\}_{i=1}^{n}$. Here $x_i$ is an input vector for an interval, and $y_i$ is its true class, either 0 (Sand) or 1 (Shale and Rock). A classifier gives a prediction $\hat{y}_i \in \{0, 1\}$ for each interval. The length of interval i is $l_i$, $i = 1, \dots, n$.

Accuracy_L is the sum of the lengths of intervals with correct lithotype predictions divided by the total depth of the considered wells:

$$\mathrm{Accuracy}_L = \frac{\sum_{i=1}^{n} l_i\,[y_i = \hat{y}_i]}{\sum_{i=1}^{n} l_i}. \tag{4}$$

Here and below, for any arguments a and b, the expression [a = b] denotes the indicator function: it equals 1 if a is equal to b, and 0 otherwise.

To define the ROC AUC and PR AUC metrics, we need some additional notation. A trained classifier outputs the probability that an object belongs to a class. To obtain the final classification with labels, we apply a threshold to the probabilities. Objects whose predicted probability of class 1 (Shale and Rock) is above the threshold are assigned to the first class. The remaining objects are assigned to the second class (Sand).

For a particular classification, four numbers represent its quality:
• True Positive (TP): correctly classified objects of the first class;
• False Negative (FN): objects of the first class attributed by the classifier to the second class;
• False Positive (FP): objects of the second class attributed by the classifier to the first class;
• True Negative (TN): correctly classified objects of the second class.

They are defined as

$$\mathrm{TP} = \frac{1}{n}\sum_{i=1}^{n} [y_i = 1][\hat{y}_i = 1], \qquad \mathrm{TN} = \frac{1}{n}\sum_{i=1}^{n} [y_i = 0][\hat{y}_i = 0], \tag{5}$$

$$\mathrm{FP} = \frac{1}{n}\sum_{i=1}^{n} [y_i = 0][\hat{y}_i = 1], \qquad \mathrm{FN} = \frac{1}{n}\sum_{i=1}^{n} [y_i = 1][\hat{y}_i = 0]. \tag{6}$$

Dividing TP by the total share of positive objects (TP + FN) gives the True Positive Rate (TPR). Dividing FP by the total share of negative objects (FP + TN) gives the False Positive Rate (FPR):

$$\mathrm{TPR} = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FN}}, \qquad \mathrm{FPR} = \frac{\mathrm{FP}}{\mathrm{FP} + \mathrm{TN}}. \tag{7}$$

Varying the threshold traces a trajectory in the (FPR, TPR) space. It starts at the point (0, 0), where all objects are classified as negative, and ends at (1, 1), where all objects are classified as positive. This trajectory is the ROC curve. Similarly, we define precision as TP/(TP + FP) and recall as TP/(TP + FN). The trajectory in the space of precision and recall is the PR curve.

The areas under the ROC and PR curves give ROC AUC and PR AUC, respectively; both are widely used to measure classifier quality. Higher values of ROC AUC and PR AUC suggest a better classifier. For a random classifier, ROC AUC is 0.5 and PR AUC equals the share of the positive class. For a perfect classifier, both equal 1. PR AUC is more suitable for imbalanced classification problems. For a detailed discussion of metrics for imbalanced classification, see Burnaev et al. (2015) and references therein.
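The following plain-Python sketch makes Eqs. (4)–(7) concrete and computes ROC AUC and PR AUC without plotting the curves. It is an illustration only, not the evaluation code used in the study. ROC AUC is computed through its equivalent rank form: the probability that a randomly chosen positive interval receives a higher score than a randomly chosen negative one, with ties counted as 1/2. PR AUC is approximated by average precision, which is one common estimator; trapezoidal integration of the PR curve gives slightly different values. All function names are hypothetical.

```python
from itertools import product


def accuracy_l(y_true, y_pred, lengths):
    """Length-weighted accuracy, Eq. (4): share of total depth predicted correctly."""
    correct = sum(length for y, p, length in zip(y_true, y_pred, lengths) if y == p)
    return correct / sum(lengths)


def confusion(y_true, y_pred):
    """Shares TP, TN, FP, FN of Eqs. (5)-(6); class 1 (Shale and Rock) is positive."""
    pairs = list(zip(y_true, y_pred))
    n = len(pairs)
    tp = sum(1 for y, p in pairs if y == 1 and p == 1) / n
    tn = sum(1 for y, p in pairs if y == 0 and p == 0) / n
    fp = sum(1 for y, p in pairs if y == 0 and p == 1) / n
    fn = sum(1 for y, p in pairs if y == 1 and p == 0) / n
    return tp, tn, fp, fn


def rates(tp, tn, fp, fn):
    """True and false positive rates, Eq. (7)."""
    return tp / (tp + fn), fp / (fp + tn)


def roc_auc(y_true, scores):
    """ROC AUC in rank form: P(score of random positive > score of random negative),
    with ties counted as 1/2. Quadratic in sample size; fine for illustration."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p, q in product(pos, neg))
    return wins / (len(pos) * len(neg))


def pr_auc(y_true, scores):
    """PR AUC estimated as average precision: the mean of precision values taken at
    the rank of each positive object (ties keep input order, since sorting is stable)."""
    ranked = sorted(zip(scores, y_true), key=lambda t: t[0], reverse=True)
    n_pos = sum(1 for y in y_true if y == 1)
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank  # precision at this cut-off
    return total / n_pos
```

As a toy check, take true classes [0, 0, 1, 1, 0], scores [0.1, 0.4, 0.35, 0.8, 0.2], interval lengths [0.1, 0.1, 0.2, 0.1, 0.1] m and a 0.5 threshold. Only the 0.2 m interval scored 0.35 is misclassified, so Accuracy_L is 0.4/0.6 ≈ 0.667, TPR = 0.5 and FPR = 0. Both `roc_auc` and `pr_auc` return 5/6 for this input.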

Fig. 6. Importance of features for the Gradient Boosting classifier predictions. Two sets of features are included: Greedy and Extra. The bottom-up order of Greedy
features corresponds to their selection order during the selection procedure.


Table 2
Performance of the approaches depends significantly on the set of features used. However, using mathematically modeled features does not improve quality.

Feature set        ROC AUC    PR AUC    Accuracy L
–                  0.499      0.198     0.858
B                  0.837      0.552     0.890
M                  0.524      0.208     0.829
M, FM              0.566      0.264     0.855
G                  0.874      0.609     0.906
G, M               0.875      0.597     0.904
G, M, FM           0.870      0.590     0.900

Table 3
Performance of the machine learning approaches: Logistic regression, Gradient Boosting, and Feedforward NN. Higher values are better for all performance measures.

Algorithm                         ROC AUC    PR AUC    Accuracy L
Always predict the major class    0.494      0.181     0.866
Logistic regression               0.860      0.585     0.908
Gradient Boosting                 0.880      0.625     0.910
Feedforward NN                    0.875      0.625     0.911

2.4. Feature engineering and selection

In this section we describe several methods of extracting the information about rock types that is stored in MWD and LWD data, so that classifiers can take advantage of it.

2.4.1. Time-series features

At each moment of time, the type of rocks is characterized not only by the current MWD and LWD values: previous values and their relationships with each other bring additional information. Therefore, in this section we consider a few ways to incorporate such information into the input features.

The Basic (B) set of features used in a predictive model includes the original mechanical features, SED, and APR. We also derived new features from the basic ones:

• Derivatives (D): rolling mean and standard deviation with a window size of 1 m, and the difference between the values at the rolling window's borders;
• Lagged (L): lagged basic features, i.e. their values 0.1, 0.5, 1 and 10 m ago;
• Fluctuations (F): standard deviation of the original time series inside the aggregated (see Sec. 2.1.3) intervals of 0.1 m;
• Extra (E) features: true class values 20 and 50 m ago, since they can be obtained from LWD measurements with such spatial lags.

2.4.2. Mathematical modeling of drilling mechanics

Rock destruction under load has been studied in great detail by Mishnaevsky (1993) and Mishnaevsky Jr (1995), but only a few works have studied the dynamic properties of the process.

We started with the assumption that the drill-bit/rock interaction can be described as several processes: rock crushing, rock cutting, and rotary friction on the drill bit. We further assumed that the rate of penetration depends linearly on the weight on bit (rock crushing part) and on the angular velocity ω (cutting and friction part):

$$\mathrm{ROP} = a_1 + a_2\,\mathrm{WOB} + a_3\,\omega. \tag{8}$$

On the other hand, following Detournay and Defourny (1992) and assuming that the torque on bit is mainly related to the rock cutting process, we have the following relation:

$$\mathrm{TOB} = a_4\,\frac{\mathrm{ROP}}{\omega} + a_5. \tag{9}$$

To get a smaller set of parameters, we substituted (8) into (9):

$$\mathrm{TOB} = \frac{b_1 + b_2\,\mathrm{WOB}}{\omega} + b_3, \tag{10}$$

where b1 = a1a4, b2 = a2a4 and b3 = a3a4 + a5. For a fixed bit, the parameters b1, b2, b3 depend on rock properties and can therefore characterize them, so they can be used as Mathematical (M) features for rock type identification. These parameters were obtained for short intervals of size m of the MWD time series by solving the optimization problem (11), which minimizes the local model discrepancy.
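Because model (10) is linear in b1, b2 and b3, the windowed problem (11) is an ordinary least-squares fit on the regressors 1/ω, WOB/ω and 1. The pure-Python sketch below solves the 3×3 normal equations for every window of the last m samples. It is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, `omega` stands for the angular velocity ω, `tob` for torque on bit, and returning `None` for rank-deficient windows (for example a window with constant ω, where 1/ω is collinear with the intercept) is our own choice that the text does not specify.

```python
from typing import List, Optional, Sequence, Tuple

Params = Tuple[float, float, float]


def solve3(a: List[List[float]], b: List[float]) -> Optional[List[float]]:
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix [A | b]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            return None  # (near-)singular system: parameters not identifiable
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, 3):
            f = aug[r][col] / aug[col][col]
            for c in range(col, 4):
                aug[r][c] -= f * aug[col][c]
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):  # back substitution
        x[r] = (aug[r][3] - sum(aug[r][c] * x[c] for c in range(r + 1, 3))) / aug[r][r]
    return x


def fit_window(tob: Sequence[float], wob: Sequence[float],
               omega: Sequence[float]) -> Optional[Params]:
    """Least-squares fit of TOB = (b1 + b2 * WOB) / omega + b3 on one window."""
    rows = [(1.0 / w, q / w, 1.0) for q, w in zip(wob, omega)]
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    aty = [sum(r[i] * t for r, t in zip(rows, tob)) for i in range(3)]
    sol = solve3(ata, aty)
    return None if sol is None else (sol[0], sol[1], sol[2])


def mechanical_features(tob: Sequence[float], wob: Sequence[float],
                        omega: Sequence[float], m: int = 5) -> List[Optional[Params]]:
    """(b1(k), b2(k), b3(k)) for each window [k - m + 1, k]; None where undefined."""
    out: List[Optional[Params]] = [None] * (m - 1)  # no complete window yet
    for k in range(m - 1, len(tob)):
        lo = k - m + 1
        out.append(fit_window(tob[lo:k + 1], wob[lo:k + 1], omega[lo:k + 1]))
    return out
```

With m = 5, the value used in the experiments, the first m − 1 positions have no complete window and stay `None`. Solving the normal equations is adequate for three parameters and short windows; a QR-based solver would be the more robust choice for badly scaled inputs.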

Fig. 7. Performance curves for three different machine learning approaches: Logistic Regression, Gradient Boosting, and Feedforward NN; compared with the input-
agnostic method that always predicts the major class. As the curves for Gradient Boosting and Feedforward NN lie higher than the curves for Logistic regression, we
conclude that the corresponding models are better.


For the window of the last m samples ending at moment k, problem (11) reads:

$$\bigl(b_1(k),\, b_2(k),\, b_3(k)\bigr) = \operatorname*{argmin}_{b_1,\, b_2,\, b_3} \sum_{i=k-m+1}^{k} \left( \mathrm{TOB}_i - \frac{b_1 + b_2\,\mathrm{WOB}_i}{\omega_i} - b_3 \right)^2. \tag{11}$$

Because of locality, the window size m should not be large.

2.4.3. Feature selection

Generating too many interrelated features results in redundancy, longer model training times, and a risk of overfitting. Thus, after feature engineering, we ran a feature selection procedure aimed at selecting the subset of features that maximizes classification quality.

We used a "greedy" approach to feature selection: the procedure started from the empty set and expanded it step by step, each time adding the most impactful feature from the pool of remaining ones according to a selected quality metric.

3. Results

In this section we:

• report on how different sets of features affect the quality of rock type classification, and which features are more informative;
• examine the selection of hyperparameters for different machine learning methods;
• compare the performance of different machine learning methods and show how classification quality depends on the balance of classes.

3.1. Feature selection results

For feature selection we used the ROC AUC quality metric obtained via leave-one-well-out cross-validation (LOWO-CV). Since sensor readings are autocorrelated, it is crucial to split the dataset by wells, not into random subsets, during cross-validation. Otherwise, data leakage will take place and result in overestimated quality: during cross-validation, the models would have more information about the test set than they will have in the field test on new wells.

The classifier was a Gradient Boosting ensemble of 50 decision trees, each of maximal depth 6. The best selected set, Greedy (G), consists of ROP, HL, rolling differences of WOB, 1 m rolling standard deviations of ROP and TRQ, 1 m moving average of ROP, 0.5 m lagged TRQ, and 10 m lagged Q out, Q in, HL and TRQ.

We also fine-tuned the Gradient Boosting hyperparameters by increasing the number of decision trees to 100 and conducting a grid search with LOWO-CV over the maximal depth of trees, the random subspace share and the sub-sampling rate. Table 1 summarizes the results of the feature selection process. We obtained the best results for all quality metrics using the selected set of features G together with the extra set E; in particular, Accuracy L exceeds 0.9.

Fig. 5 shows the dependence of the quality metrics on the learning rate and on the number of trees in the ensemble obtained by Gradient Boosting. Low learning rates (blue curves) result in underfitting, whereas high learning rates (red curves) result in overfitting of the model. Orange and green curves correspond to a good trade-off.

Fig. 6 shows feature importances for the fine-tuned classifier trained on the whole dataset. Importance scores indicate how many times a particular feature played the key role in the classifier's decision.

Fig. 8. Gradient Boosting performance on different wells with respect to the well-specific percentage of shale and rock. The vertical axis represents the improvement of Accuracy L from using Gradient Boosting over the major-class predictions.

3.2. Examination of mathematical modeling features

Only 13 out of 27 wells had no missing values of the features required for mathematical modeling. For these wells, we studied the effect of Mathematical features (M) and their fluctuations (FM) on the quality metrics. We used window size m = 5. The results are presented in Table 2. Mathematical modeling features turned out to have weak predictive power: no significant gain on top of the Greedy features was obtained.

3.3. Algorithms performance

We compared three classes of machine learning methods in detail: Logistic regression, Gradient Boosting, and Neural Networks. Results in this section correspond to the performance of the best-found configuration of each method under LOWO-CV. All methods used both the Greedy and Extra sets of features.

For logistic regression, we observed the best quality when no regularization was applied. The best-found configuration of Gradient Boosting with 100 trees had the following hyperparameters: learning rate 0.05, maximal depth 3, random subspace share 0.8, and sub-sampling rate 0.55. For Feedforward Neural Networks (NN) we tested different architectures with 2-, 3- and 4-layer networks. The best-found configuration had two hidden layers of 100 and 500 neurons with ReLU activation between layers.

Table 3 summarizes the best performance of the different classification methods. Gradient Boosting uniformly dominates logistic regression; in turn, the qualities of Feedforward NN and Gradient Boosting are comparable due to the preprocessing pipeline we developed, which filled the missing data sections with reasonably adequate values. LSTM training time was impractically long, whereas its best-found performance was similar to that of Feedforward NN. Fig. 7 presents a visual comparison of the performance of the different classification methods.

Fig. 8 shows the performance of Gradient Boosting with respect to the lithotype class balance. The lithotype predictions of the trained classifier are better than the major-class predictions for 24 out of 27 wells. The improvement of Accuracy L increases when the classes are more balanced, that is, when the shares of shales and rocks (first class) and of sands (second class) are closer to equal. However, the improvement varies significantly from well to well. Fig. 9 shows examples of lithotype classification on three wells with different achieved quality.
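The greedy procedure of Sec. 2.4.3, scored with leave-one-well-out folds as in Sec. 3.1, can be sketched without any ML library as below. This is a hypothetical illustration rather than the authors' code: `score` stands for whatever quality estimate is plugged in (in the paper, ROC AUC of a Gradient Boosting classifier under LOWO-CV), and the stopping rule, which stops when no remaining feature improves the score, is our assumption since the text does not state one.

```python
from typing import Callable, Dict, Iterator, List, Sequence, Tuple


def lowo_splits(wells: Sequence[str]) -> Iterator[Tuple[List[int], List[int]]]:
    """Leave-one-well-out folds: (train indices, test indices), one fold per well."""
    for well in sorted(set(wells)):
        test = [i for i, w in enumerate(wells) if w == well]
        train = [i for i, w in enumerate(wells) if w != well]
        yield train, test


def greedy_forward_selection(
    candidates: Sequence[str],
    score: Callable[[List[str]], float],
) -> Tuple[List[str], List[float]]:
    """Repeatedly add the remaining feature that gives the highest score.

    Stops when the best addition does not improve the score (assumed rule).
    Returns the features in selection order and the score after each step.
    """
    selected: List[str] = []
    history: List[float] = []
    best_so_far = float("-inf")
    remaining = list(candidates)
    while remaining:
        gains: Dict[str, float] = {f: score(selected + [f]) for f in remaining}
        best = max(remaining, key=gains.__getitem__)
        if gains[best] <= best_so_far:
            break  # no candidate improves the current subset
        selected.append(best)
        remaining.remove(best)
        best_so_far = gains[best]
        history.append(best_so_far)
    return selected, history
```

In practice, `score(features)` would iterate over `lowo_splits(wells)`, train the classifier on the training indices restricted to `features`, and average ROC AUC over the held-out wells; scikit-learn's `LeaveOneGroupOut` produces the same per-well splits. Grouping folds by well is what prevents the autocorrelation leakage described in Sec. 3.1. The selection order returned here is the order shown bottom-up for the Greedy features in Fig. 6.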


Fig. 9. Examples of lithotype classification for three wells with different achieved quality: from one of the best on the left through average in the middle to one of the
worst on the right. In each subfigure the left column shows the true lithotype values: yellow color represents sand, grey color represents shales and hard-rock; the
right column shows the respective probability of lithotypes given by the classifier.

4. Discussion

In this section, we discuss possible ways to improve the classification accuracy of the data-driven models. For this purpose, we study peculiarities of the initial data by embedding the multi-dimensional MWD features into a 2D space that is convenient for analysis. In Fig. 10 we represent the data by applying t-distributed Stochastic Neighbor Embedding (van der Maaten and Hinton, 2008) to vectors of Basic and Lagged features, including APR, SED, and their lagged values.

Fig. 10. 2-dimensional embedding of the MWD data. Colors of scattered points indicate rock types in the corresponding drilling states. Contours indicate different PADs. In this 2-dimensional embedding it is easy to distinguish different PADs, while it is hard to distinguish the two LOBs.

The 2D representation shows the non-homogeneous nature of real-world MWD measurements. In machine learning terms, this means that the algorithms are trained on a few localized areas of data points, which dilutes the information among them. That is, the algorithms effectively make use of multiple small datasets instead of a single uniform large one. For example, we did not use features that explicitly specify pads, yet the 2D representation of the data has separated pads. Such a distribution of data can negatively affect the generalization ability of classifiers, especially those based on threshold rules. Moreover, the mixture of different rock types and the indistinct margins between classes illustrate that some part of the data is fundamentally indiscriminable with the considered features.

One way to improve generalization ability is to use more discriminative features from additional sensors. Another way is to apply the domain adaptation approach (Ganin and Lempitsky, 2015) to transform input features for non-neural-network algorithms such as Gradient Boosting. However, the performance of Neural Networks is unlikely to improve much with this approach, since they are capable of learning universal representations (Bilen and Vedaldi, 2017).

Other ways of improving classification quality belong to three major areas.

The first area is the use of different types of input data, such as LWD data, information about a well or a bit as a whole, or drill cuttings. The main problem here is how to integrate data sources of variable fidelity and spatial resolution (Zaytsev and Burnaev, 2017), as the current approaches are often problem-specific, especially when dealing with more than two levels of data fidelity (Zaytsev, 2016).

The second area is related to correcting sample labels. One may want to use raw LWD data for training and prediction, because LWD data would allow one to replace the subjective lithotype interpretation made by experts with automatic labeling based on images in the training set markup, and would likely open new horizons for better resolution of the predictive model.

The third area is multi-class classification, which is likely to

allow distinguishing between several rock types rather than only a target interval and a boundary shale-reach zone. This will enrich the application of such data-driven predictions and move them beyond operative trajectory correction towards optimal control of the penetration rate, targeting maximal drilling efficiency at minimal tolerance to potential failures related to the geomechanical specifics of the rocks.

5. Conclusion

This study illustrates the capabilities of machine learning to handle real technological issues of directional drilling. The accuracy of predicting the rock types relevant to directional drilling management reaches 91%, that is, the classification error drops from 13.5% (major-class prediction) down to 9% (the best performance achieved by the examined algorithms). The involved algorithms allow real-time implementation, which makes them useful for drilling support IT infrastructure. Further development of the predictive algorithms is covered in the discussion.

Appendix A. Supplementary data

Supplementary data to this article can be found online at https://[Link]/10.1016/[Link].2019.03.041.

References

Ahmad, M.W., Mourshed, M., Rezgui, Y., 2017. Trees vs neurons: comparison between random forest and ANN for high-resolution prediction of building energy consumption. Energy Build. 147, 77–89. ISSN 0378-7788.
Artemov, A., Burnaev, E., 2015. Ensembles of detectors for online detection of transient changes. In: Eighth International Conference on Machine Vision (ICMV 2015), vol. 9875. International Society for Optics and Photonics, pp. 98751Z.
Bilen, H., Vedaldi, A., 2017. Universal Representations: the Missing Link between Faces, Text, Planktons, and Cat Breeds.
Burnaev, E., Erofeev, P., Papanov, A., 2015. Influence of resampling on accuracy of imbalanced classification. In: Eighth International Conference on Machine Vision (ICMV 2015), vol. 9875. International Society for Optics and Photonics, pp. 987521.
Burnaev, E., Koptelov, I., Novikov, G., Khanipov, T., 2017. Automatic construction of a recurrent neural network based classifier for vehicle passage detection. In: Ninth International Conference on Machine Vision (ICMV 2016), vol. 10341. International Society for Optics and Photonics, pp. 1034103.
Cayeux, E., Daireaux, B., Dvergsnes, E.W., Florence, F., 2014. Toward Drilling Automation: on the Necessity of Using Sensors that Relate to Physical Models. Society of Petroleum Engineers (SPE-163440-PA).
Chen, T., Guestrin, C., 2016. XGBoost: a scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, pp. 785–794.
Detournay, E., Defourny, P., 1992. A phenomenological model for the drilling action of drag bits. In: International Journal of Rock Mechanics and Mining Sciences & Geomechanics Abstracts, vol. 29. Elsevier, pp. 13–23.
Downton, G.C., 2012. Challenges of modeling drilling systems for the purposes of automation and control. IFAC Proc. Vol. 45 (8), 201–210.
Fernández-Delgado, M., Cernadas, E., Barro, S., Amorim, D., 2014. Do we need hundreds of classifiers to solve real world classification problems? J. Mach. Learn. Res. 15 (1), 3133–3181.
Ganin, Y., Lempitsky, V., 2015. Unsupervised domain adaptation by backpropagation. In: International Conference on Machine Learning, pp. 1180–1189.
García, S., Ramírez-Gallego, S., Luengo, J., Benítez, J.M., Herrera, F., 2016. Big data preprocessing: methods and prospects. Big Data Anal. 1 (1), 9. ISSN 2058-6345.
Hastie, T., Tibshirani, R., Friedman, J., 2009. The Elements of Statistical Learning: Data Mining, Inference and Prediction, second ed. Springer.
Hegde, C., Gray, K.E., 2017. Use of machine learning and data analytics to increase drilling efficiency for nearby wells. J. Nat. Gas Sci. Eng. 40, 327–335.
Hochreiter, S., Schmidhuber, J., 1997. Long short-term memory. Neural Comput. 9 (8), 1735–1780. ISSN 0899-7667.
Hornik, K., Stinchcombe, M., White, H., 1989. Multilayer feedforward networks are universal approximators. Neural Network. 2 (5), 359–366. ISSN 0893-6080.
Hung, C.Y., Chen, W.C., Lai, P.T., Lin, C.H., Lee, C.C., 2017. Comparing deep neural network and other machine learning algorithms for stroke prediction in a large-scale population-based electronic medical claims database. In: 2017 39th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), pp. 3110–3113.
Kozlovskaia, N., Zaytsev, A., 2017. Deep ensembles for imbalanced classification. In: Machine Learning and Applications (ICMLA), 2017 16th IEEE International Conference on, pp. 908–913.
LaBelle, D., 2001. Lithological Classification by Drilling. Carnegie Mellon University, The Robotics Institute.
LaBelle, D., Bares, J., Nourbakhsh, I., 2000. Material classification by drilling. In: Proceedings of the International Symposium on Robotics and Automation in Construction, Taipei, Taiwan.


Mishnaevsky, L.L., 1993. A brief review of Soviet theoretical approaches to dynamic rock failure. In: International Journal of Rock Mechanics and Mining Sciences & Geomechanics Abstracts, vol. 30. Elsevier, pp. 663–668.
Mishnaevsky Jr., L.L., 1995. Physical mechanisms of hard rock fragmentation under mechanical loading: a review. In: International Journal of Rock Mechanics and Mining Sciences & Geomechanics Abstracts, vol. 32. Elsevier, pp. 763–766.
Shor, R.J., Pryor, M., van Oort, E., 2014. Drillstring vibration observation, modeling and prevention in the oil and gas industry. In: Dynamic Systems and Control Conference. ASME DSCC2014-6147, page V003T37A004.
Spanos, P.D., Chevallier, A.M., Politis, N.P., 2002. Nonlinear stochastic drillstring vibrations. ASME J. Vib. Acoust. 124 (4), 512–518.
Sugiura, J., Samuel, R., Oppelt, J., Ostermeyer, G.-P., Hedengren, J., Pastusek, P., 2015. Drilling Modeling and Simulation: Current State and Future Goals. Society of Petroleum Engineers (SPE-173045-MS).
Taleb, I., Dssouli, R., Serhani, M.A., 2015. Big data pre-processing: a quality framework. In: 2015 IEEE International Congress on Big Data, pp. 191–198. doi:10.1109/BigDataCongress.2015.35.
van der Maaten, L., Hinton, G., 2008. Visualizing data using t-SNE. J. Mach. Learn. Res. 9 (Nov), 2579–2605.
Zaytsev, A., 2016. Reliable surrogate modeling of engineering data with more than two levels of fidelity. In: Mechanical and Aerospace Engineering (ICMAE), 2016 7th International Conference on. IEEE, pp. 341–345.
Zaytsev, A., Burnaev, E., 2017. Large scale variable fidelity surrogate modeling. Ann. Math. Artif. Intell. 81 (1–2), 167–186.
Zhou, H., Hatherly, P., Monteiro, S., Ramos, F., Oppolzer, F., Nettleton, E., 2010. A Hybrid GP Regression and Clustering Approach for Characterizing Rock Properties from Drilling Data. Technical Report ACFR-TR-2011-001.
Zhou, H., Hatherly, P., Ramos, F., Nettleton, E., 2011. An adaptive data driven model for characterizing rock properties from drilling data. In: Robotics and Automation (ICRA), 2011 IEEE International Conference on. IEEE, pp. 1909–1915.
