
electronics

Article

Acoustic Anomaly Detection of Mechanical Failures in Noisy Real-Life Factory Environments

Yuki Tagawa 1, Rytis Maskeliūnas 1,* and Robertas Damaševičius 2

1 Department of Multimedia Engineering, Kaunas University of Technology, 51423 Kaunas, Lithuania; [email protected]
2 Department of Software Engineering, Kaunas University of Technology, 51423 Kaunas, Lithuania; [email protected]
* Correspondence: [email protected]

Abstract: Anomaly detection without employing dedicated sensors for each industrial machine is recognized as one of the essential techniques for preventive maintenance and is especially important for factories with low automatization levels, the number of which remains much larger than that of autonomous manufacturing lines. We have based our research on the hypothesis that real-life sound data from working industrial machines can be used for machine diagnostics. However, the sound data can be contaminated and drowned out by typical factory environmental sound, making the application of sound-data-based anomaly detection an overly complicated process; this is the main problem we solve with our approach. In this paper, we present a noise-tolerant deep learning-based methodology for anomaly detection within real-world industrial machinery sound data. The main element of the proposed methodology is a generative adversarial network (GAN) used for the reconstruction of the sound signal and the detection of anomalies. The experimental results obtained on the Malfunctioning Industrial Machine Investigation and Inspection (MIMII) dataset show the superiority of the proposed methodology over baseline approaches based on the One-Class Support Vector Machine (OC-SVM) and the autoencoder–decoder neural network. The proposed schematics using the Unscented Kalman Filter (UKF) and the mean square error (MSE) loss function with the L2 regularization term showed an improvement of the Area Under Curve (AUC) for the noisy pump data.

Keywords: failure detection; condition monitoring; sound-based anomaly detection; predictive maintenance; industrial machinery; signal reconstruction; noise analysis; generative adversarial network

Citation: Tagawa, Y.; Maskeliūnas, R.; Damaševičius, R. Acoustic Anomaly Detection of Mechanical Failures in Noisy Real-Life Factory Environments. Electronics 2021, 10, 2329. https://doi.org/10.3390/electronics10192329

Academic Editors: Jingdong Chen and Chiman Kwan

Received: 26 August 2021; Accepted: 18 September 2021; Published: 23 September 2021

1. Introduction
Anomaly detection, or novelty detection, is a well-studied topic in data science [1] with various applications. The technique has recently received further attention due to the development of the Internet of Things (IoT), the subsequent explosive growth of big data, and the rapid improvement of machine learning techniques, especially deep learning, in the last decade. Anomaly detection is recognized as one of the essential techniques in applications for preventive maintenance of industrial machines [2], as well as for the prediction of remaining useful life (or time to failure) [3] and quality control [4]. Anomaly detection of industrial machinery relies on various diagnostic data from equipped sensors, such as temperature, pressure, electric current, vibration, and sound, to name a few. Among these data, sound data are easy to collect in the factory due to the relatively low installation cost of adding microphones to existing facilities, and various approaches have been studied [5–8].

Failure sounds can be associated with a distinguishable fault sound signature, varying in a dedicated frequency range and harmonics. For example, the low-frequency range is often a factor in defining shifts in rotational speed down to lower harmonics, containing information about unbalance, misalignment, failing bearings, and general mechanical construction shifts. The medium frequency range can be used to define failure in multipart mechanisms, such as gearboxes, indicating wear or an upcoming failure by a shift in their harmonics. High-frequency ranges, for example, can indicate steam flow or similar failures. Often the noises vary so much in their characteristics (e.g., railway sounds) that they become "unconventional noises" and are usually neglected in noise modelling [9]. The key problem, and the main subject of this study, is the real-life case of noisy environments drowning out these failure sounds; against such a background, the failure characteristics become hard to detect. This is known to exacerbate diagnosis, and the fact that the sound data can be readily contaminated by environmental sound makes the application of sound-data-based anomaly detection complicated. Therefore, the development of a noise-tolerant machine learning methodology is crucial for the application of sound-data-based anomaly detection in a real factory. We believe that a side effect of such a "feature hunt" in extremely noise-contaminated signals can also benefit human well-being studies, as it was found that the noise of contaminated environments induces annoyance [10], affects work performance [11,12] and learning [13], and leads to cognitive performance decline [14] and even increased blood pressure [15], hypertension [16], or myocardial infarction [17].
The main objective of the study is to improve the accuracy in classifying normal and
anomalous conditions of industrial machines based on noisy sound data by proposing a
novel model and algorithm for anomaly detection from industrial noisy data.
The paper is structured as follows. In Section 2, we provide an overview of previous works related to anomaly detection using machine learning and deep learning. In Section 3,
we describe the dataset and our methodology. In Section 4, we provide an outline of the
experiments conducted and the results achieved. In Section 5, we present a comparison
with other work and discussion. In Section 6, we conclude by pointing out the directions
for further work.

2. Related Works

In this section, we describe anomaly detection techniques using machine learning. Anomaly detection addresses the problem of discovering patterns in data that do not replicate the expected behavior [18]. These non-standard patterns are called anomalies, outliers, or exceptions. Whatever the name, the common principle is to measure numerically the extent of the difference between normal and anomalous data.

2.1. Analysis of Industrial Machinery Data for Predictive Maintenance


The majority of existing production lines’ equipment can provide valuable data, which
may then be examined and the resulting knowledge applied more efficiently. The standard
preventive maintenance becomes predictive because of this knowledge. This strategy,
known as Maintenance 4.0, may therefore better address issues that develop, including
those that are not known ahead of time. Predictive maintenance (PdM) [19] is one of the key
components of Maintenance 4.0, while one of the crucial parts of PdM is anomaly detection,
which can be applied, for example, on the temperature characteristic of the technological
process measured in real-time and analyzed using a neural network [20], or by monitoring
the sounds produced by the milling process using spectral analysis and K-means clustering
algorithms [21]. When applied in an unsupervised way, the approach can be used for
predicting the remaining useful life in the absence of available run-to-failure data, as was
done in [22] using an autoencoder-based methodology to analyze the vibrations of a robotic
arm. Skoczylas et al. [23] used a diagnostic feature extracted from the spectral coefficients
of the acoustic signal to identify the faulty operation of the rotating elements of the belt
conveyor using the autocorrelation characteristics. Ho et al. [24] suggested using Blind
Source Separation as a signal decomposition approach to analyze vibration data of rotating
bearings for the detection of fault patterns and signatures. Mey et al. [25] adopted a step-
by-step integration of classifications obtained from vibration and acoustic emission sensors
to incorporate information from low and high frequency signals collected from a system of
a motor train and bearings with some artificial damages. The results show that utilizing the suggested approach of integrating classifiers for vibrations and acoustic emissions, damage
classification may be improved. Serradilla et al. [26] employed the feature vector of the
autoencoder’s latent space to cluster data collected from a press machine of a stamping
production line. Explainable artificial intelligence techniques were used to track the autoencoder's loss on input data to detect anomalous work conditions. More works on the analysis of vibration and acoustic data for early fault diagnostics of industrial machinery are discussed in the review papers [27,28].
From a technology perspective, the problem of failure analysis is also related to robust speaker identification methods, which focus on the segregation of sounds from different acoustic mixtures, especially in low-quality signals [29]. Williamson et al. tackled this problem by estimating the real and imaginary components of the complex ideal ratio mask, with good performance versus more traditional methods [30]. This problem is particularly pronounced in very noisy environments similar to those in our study. Ayhan et al. showed that combining mask estimation and gammatone features with bounded marginalization to handle unreliable features in a classic Gaussian mixture model may lead to an improvement in distinguishing the lead signal [31].
Several techniques and models have been proposed which should be selected consid-
ering the characteristics of the data, the behavior of anomalous data, and the purpose of
the application. We categorize anomaly detection techniques into signal processing-based
methods, machine learning methods, and deep learning methods.

2.2. Signal Processing Based Methods


Getting meaningful information from noisy data is a classical subject in fields such as geoscience and the medical sciences, where the experimental data usually have a low Signal-to-Noise Ratio (SNR) due to inevitable environmental noise. A prevalent noise reduction method is the application of a filter to the sample. Some types of filters, such as high-pass, band-pass, low-pass, and median filters [32], are utilized to select the designated frequency or amplitude range. This technique is easy to build in and widely used in applications, but there is a risk of unintentionally eliminating necessary signals if the sound data have a low SNR or the sound content is unknown. One of the most typical methods for statistical anomaly detection is based on the control chart, with applications to industrial machine and bearing monitoring [33].
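To make the filtering step concrete, the following is a minimal sketch (our illustration, not the paper's code) of a Butterworth band-pass filter applied to a waveform with SciPy; the 50 Hz–1 kHz cutoffs are placeholder values.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def bandpass(x, fs, lo_hz=50.0, hi_hz=1000.0, order=4):
    """Keep only the band between lo_hz and hi_hz (placeholder cutoffs)."""
    sos = butter(order, [lo_hz, hi_hz], btype="bandpass", fs=fs, output="sos")
    return sosfiltfilt(sos, x)  # zero-phase filtering

fs = 16000                    # MIMII recordings are sampled at 16 kHz
x = np.random.randn(10 * fs)  # stand-in for a 10 s machine-sound segment
y = bandpass(x, fs)
```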
Another approach to noise reduction is based on multivariate analysis. Independent component analysis (ICA) [34] is a powerful idea for multivariate data that has already been utilized in the biomedical signal and image domain, such as electroencephalography and magnetic resonance imaging [35], and in the geosciences for train noise separation [36]. ICA relies on the underlying assumption that a received signal is a combination of mutually independent signals. The independence among the source signals is evaluated with the Kullback–Leibler cost function. ICA is formulated for noiseless cases; therefore, techniques for real-world data, such as adding noise terms with mutually independent components and using semiparametric approaches, were proposed [35]. Empirical mode decomposition (EMD) is another method for the analysis of multi-component signals that has been used to de-noise jitter in telecommunication signals [37]. Spectral analysis was applied in [38] to perform a vibration analysis of a fan motor. Random matrix theory was applied to sensor array imaging perturbed by measurement noise [39]. The theory assumes that the eigenvalue distribution of the product of a random matrix with itself converges to the Marcenko–Pastur distribution at large scale, and this can provide a threshold for separating signals from noise.
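As a hedged illustration of the ICA idea above (not the cited implementations), scikit-learn's FastICA can separate synthetic mixed sources:

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
t = np.linspace(0, 1, 4000)
s1 = np.sin(2 * np.pi * 50 * t)          # synthetic "machine" tone
s2 = np.sign(np.sin(2 * np.pi * 7 * t))  # synthetic interfering source
S = np.c_[s1, s2] + 0.05 * rng.standard_normal((4000, 2))
A = np.array([[1.0, 0.5], [0.4, 1.0]])   # unknown mixing matrix
X = S @ A.T                              # observed mixtures

ica = FastICA(n_components=2, random_state=0)
S_est = ica.fit_transform(X)             # estimated independent sources
```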

2.3. Machine Learning-Based Methods


Classification-based methods are generally supervised anomaly detection methods. In this approach, a model or classifier is trained from a set of labeled data instances, and the learned model is used to classify test instances. Both multiclass and one-class anomaly detection techniques are available. Multiclass anomaly detection assumes that the training data contain labeled instances belonging to multiple normal classes. The model has to learn a classifier to distinguish each normal class from all other classes. If test data are not classified as normal by any of the classifiers learned by the model, they are considered an outlier. This technique attaches a confidence score to its predictions. Therefore, it applies to data whose normal classes are known.
The distribution-based method is used to model the distribution of normal data. The probabilistic model is used to identify data with a different distribution of its features. As the data space has high dimensionality, the distance cannot be measured in the Euclidean way, and therefore various measurement methodologies were proposed, such as the Local Outlier Factor (LOF) as a density-based method [40] and the Nearest-Neighborhood as a distance-based method [41]. Hsu et al. [42] employed the density-based spatial clustering of applications with noise algorithm to identify abnormal states in wind turbine data. Then, random forest and decision tree algorithms were used to predict wind turbine anomalies. Toma et al. [43] suggested a hybrid technique that uses statistical features, genetic algorithms (GA), and machine learning models (KNN, random forest, and decision tree) to diagnose motor current faults.
These classical approaches are already recognized as proven techniques. If the input data are simple, these techniques are still the first choice for an application. However, complicated data, such as those in image recognition and audio processing, may exceed the modeling assumptions of these machine learning techniques.

2.4. Deep Learning-Based Methods


The advent of deep learning techniques for anomaly detection has improved the results
of traditional methods. Deep learning is based on an artificial neural network model. Deep
learning promises to train hierarchical models that represent probability distributions over
input data. The recent development in both hardware and neural models, especially in the
last decade, has overcome the challenges, making artificial intelligence a thriving field with
many practical applications and active research topics.
One of the successful methods using deep learning is the reconstruction-based method [44]. The fundamental idea behind these methods is that the normal condition can be reconstructed accurately from a reduced latent space within the neural network architecture, whereas anomalous conditions cannot be reconstructed and incur larger reconstruction losses. This approach is suitable for anomaly detection, where the volume of anomalous condition data is generally much smaller than that of normal condition data, because a model for detection can be trained using only the normal condition data. Deep one-class (DOC) is an approach inspired by kernel-based one-class classification and minimum volume estimation; it trains a neural network while minimizing the volume of a hypersphere that encloses the network representations of the data [45]. Minimizing the volume of the hypersphere forces the network to extract the common factors of variation, and anomalies can be detected if the test instance is plotted outside the boundary of the hypersphere. Luwei et al. [46] used a two-stage ANN model for the classification of rotating machine faults based on real-life vibration data. Zhao et al. [47] used a deep autoencoder (DAE) network model. The parameters of the model, acquired by learning normal operational supervisory control and data acquisition (SCADA) data from wind turbines, were used for fault detection of turbine components. Dongo et al. [48] suggested regression-based abnormality decisions using manifold learning with an autoencoder. The approach has been validated on the sound data of an operating machine. Cheng et al. [49] extracted characteristics of the time, frequency, and time-frequency domains. Feature selection was performed using the Euclidean distance. Next, adaptive kernel spectral clustering (AKSC) was used to find anomalous machine behaviors, and deep long short-term memory recurrent neural networks (LSTM-RNN) were used to predict the failure time of the machine. Li et al. [50] proposed a Deep Small-World Neural Network (DSWNN) to detect early failures of wind turbines based on anomalies in turbine sensor data.
In summary of the related work, a general observation is that deep learning is expected
to outperform traditional machine learning for anomaly detection in big data [51].

2.5. Generative Adversarial Network-Based Methods

The central idea of generative adversarial networks (GAN) is that a generator trained with normal data poses a high reconstruction loss when trying to generate an anomalous image. Discriminative models map a high-dimensional input to a class label for pattern recognition [52]. Anomaly detection using GANs emerged recently but has already shown promising performance, especially for big and complicated data. In the reconstruction context, GAN is also applied for anomaly detection (AnoGAN) [53,54]. For example, Wu et al. [55] suggested a probabilistic adversarial generative auto-encoder for machine fault classification. We think these approaches are applicable for anomaly detection with audio data, as our concern is to measure the difference between normal and anomalous. Zhang et al. [56] proposed a multi-index generative adversarial network (MI-GAN) to detect tool wear from imbalanced sensor signal data.
3. Materials and Methods

3.1. Methodology

In this research, the purpose is to improve the robustness of anomaly detection in the domain of stationary valves and slide rails (Figure 1). Figure 2 illustrates the schematics of the network applied for anomaly detection in acoustic data. In this research, our experiments are carried out on the MIMII data set [57], as explained in further sections.

Figure 1. The workflow of the experiment (dataset → UKF-filtered and raw data → STFT and log-Mel input features → autoencoder–decoder training and testing → AUC-based comparison against the baseline AUC provided in the literature by the dataset provider; MATLAB and Python processing steps).

Figure 2. Illustration of a neural network applied for anomaly detection in acoustic data.

3.2. Datasets

In 2019, researchers at the Japanese manufacturing company Hitachi Co., Ltd. introduced a new dataset, the Malfunctioning Industrial Machine Investigation and Inspection (MIMII) dataset [58]. The data set consists of four distinct types of machinery: valves, pumps, fans, and slide rails. The data set is provided in the waveform audio file (.wav) format. The audio data consist of machine sound and noise. The noise is real factory environment sound, and it is artificially mixed with the pure machine sound at several levels of signal-to-noise ratio (SNR): 6 dB, 0 dB, and −6 dB. The machine sound is recorded for both normal and abnormal conditions. There is no label on the abnormal condition sound data except the explanation that the abnormality indicates various troubles. As a result, the characteristics of the data set can be described by the type of machinery and the SNR. The machine sound is recorded in 16 bit at a sampling rate of 16,000 Hz, and a .wav file is a segment of 10 s; accordingly, the file of one segment consists of 160,000 samples of time frames. The list of pump sound files is reported in Table 1. The pump sound data set consists of four different pumps, labeled Model ID 00, 02, 04, and 06. The number of segments for the normal condition of each machine is seven to ten times larger than that of the anomalous conditions.

Table 1. Contents of MIMII dataset.

Model ID    Segments for Normal Condition    Segments for Anomalous Condition
ID00        1006                             143
ID02        1005                             111
ID04        702                              100
ID06        1036                             102
3.3. Feature Engineering

The feature engineering in the experiment follows the recommendations of the data set provider; that is, each segment of waveform sound data is processed with the Fast Fourier Transform (FFT), and then the log-Mel spectrogram is applied. This process is shown illustratively in Figure 3.

Figure 3. Schematics of the audio data process from time-frequency to log-Mel spectrogram.

The data set provider developed the input feature by combining five frames, producing a 320-dimensional feature vector for the autoencoder. On the other hand, we have developed an input feature in a format suitable for the models we are going to study.
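As a minimal sketch of this feature pipeline, assuming the librosa library (the FFT/hop sizes are our guesses, chosen so that a 10 s, 16 kHz segment yields roughly the 64 × 313 log-Mel map and 320-dimensional stacked vectors referred to later):

```python
import numpy as np
import librosa

def logmel_features(path, n_mels=64, n_fft=1024, hop=512, frames=5):
    """Load a 10 s MIMII segment and build stacked log-Mel feature vectors."""
    y, sr = librosa.load(path, sr=16000)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=n_fft,
                                         hop_length=hop, n_mels=n_mels)
    logmel = librosa.power_to_db(mel)    # shape (64, ~313) for 10 s of audio
    # Stack `frames` consecutive columns -> 5 * 64 = 320-dimensional vectors.
    vecs = np.stack([logmel[:, i:i + frames].ravel()
                     for i in range(logmel.shape[1] - frames + 1)])
    return vecs                          # shape (n_windows, 320)
```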
3.4. Problem Formulation and Signal Processing

Let $X$ and $G_\theta$ be the STFT of the signal (a spectrogram in time-frequency space) and a filter with parameter $\theta$, respectively:

$$G = G_\theta(X) \quad (1)$$

Based on previous works, it was found that the Kalman Filter and a penalized loss function produced a better AUC. Considering that the noise is recorded in a real factory, it is natural to assume that the noise is non-Gaussian. Non-linear filtering, such as the Unscented Kalman Filter (UKF), would be more suitable than the KF (a linear system). In non-linear filtering, it is essential to consider an approximation filter. The posterior Cramér–Rao inequality is:

$$E\left\{[\breve{x}_{t/t-1} - \breve{x}_t][\breve{x}_{t/t-1} - \breve{x}_t]^T\right\} \geq J_{t/t-1}^{-1}(x_t), \quad (2)$$

$$E\left\{[\breve{x}_{t/t} - \breve{x}_t][\breve{x}_{t/t} - \breve{x}_t]^T\right\} \geq J_{t/t}^{-1}(x_t), \quad (3)$$

The Tikhonov regularization, or diagonal loading, is:

$$\hat{x} = \underset{x}{\operatorname{argmin}}\, F = \underset{x}{\operatorname{argmin}} \left[ \|y - Hx\|^2 + \xi \varphi(x) \right] \quad (4)$$

In this study, we applied $\varphi(x) = \|x\|^2$ as the penalty term. The underlying concept of applying the norm is that the minimum energy term should be selected in case several roots exist. The root of the equation is as follows:

$$\hat{x} = \left(H^T H + \xi I\right)^{-1} H^T y \quad (5)$$

which can be represented in singular vectors as:

$$\hat{x} = \sum_{j=1}^{N} \frac{1}{\gamma_j^2 + \xi} v_j v_j^T \sum_{k=1}^{N} \gamma_k v_k u_k^T (Hx + \varepsilon) \quad (6)$$

$$= \sum_{j=1}^{N} \left[\frac{\gamma_j^2}{\gamma_j^2 + \xi}\right] v_j v_j^T x + \sum_{j=1}^{N} \frac{\gamma_j}{\gamma_j^2 + \xi} \left(u_j^T \varepsilon\right) v_j, \quad (7)$$

The first term of the equation is the signal, and the second term is noise. The amplification of noise is suppressed by $\gamma_j$.

$$E(\hat{x}) = \sum_{j=1}^{N} \left[\frac{\gamma_j^2}{\gamma_j^2 + \xi}\right] v_j v_j^T x \quad (8)$$

This is not an unbiased estimator, but taking into account that $\sum_{j=1}^{N} v_j v_j^T = I$, the sum is approximated as:

$$\sum_{j=1}^{N} \frac{\gamma_j^2}{\gamma_j^2 + \xi} v_j v_j^T \approx I \quad (9)$$

and $E(\hat{x})$ is approximated to $x$.
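Numerically, Equation (5) is a standard ridge (diagonally loaded) system; a minimal NumPy sketch (our illustration) is:

```python
import numpy as np

def tikhonov_solve(H, y, xi=1e-2):
    """Solve min_x ||y - Hx||^2 + xi * ||x||^2 via the normal equations (5)."""
    n = H.shape[1]
    return np.linalg.solve(H.T @ H + xi * np.eye(n), H.T @ y)

rng = np.random.default_rng(0)
H = rng.standard_normal((100, 20))
x_true = rng.standard_normal(20)
y = H @ x_true + 0.1 * rng.standard_normal(100)  # noisy observation
x_hat = tikhonov_solve(H, y)                     # damped, noise-suppressed estimate
```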

3.5. Signal Processing

The signal can be described as a nonlinear discrete system:

$$x_{t+1} = f_t(x_t) + w_t, \quad (10)$$

$$y_t = h_t(x_t) + v_t, \quad (11)$$

where $x_t \in \mathbb{R}^n$ is a state vector.

The state estimation problem is defined as finding the optimized estimator $\hat{x}_{t+m/t}$ which minimizes the Bayes risk:

$$J = E\left\{\|x_{t+m} - \hat{x}_{t+m/t}\|^2\right\}, \quad m = 0, 1, \quad (12)$$

$$\hat{x}_{t+m/t} = E\left\{x_{t+m} \mid Y^t\right\}, \quad (13)$$

The observation step is:

$$p(x_t \mid Y^t) = \frac{p(y_t \mid x_t)\, p(x_t \mid Y^{t-1})}{p(y_t \mid Y^{t-1})}, \quad (14)$$

and the time updating step is:

$$p(x_{t+1} \mid Y^t) = \int_{\mathbb{R}^n} p(x_{t+1} \mid x_t)\, p(x_t \mid Y^t)\, dx_t, \quad (15)$$

The Unscented Kalman Filter (UKF) performs an approximation of the posterior probability density function (PDF) with a normal distribution, where the PDF is defined by:

$$p(x) = \frac{1}{\sqrt{(2\pi)^n |P_x|}} \exp\left(-\frac{1}{2}(x - \bar{x})^T P_x^{-1} (x - \bar{x})\right) \quad (16)$$

To approximate a posterior PDF, UKF uses the unscented transformation (UT). We describe UT here in preparation for the UKF. We consider a non-linear mapping function $f: \mathbb{R}^n \to \mathbb{R}^n$ which transforms an $n$-dimensional random variable $x$ to an $n$-dimensional random variable $y$:

$$y = f(x) \quad (17)$$

Let $\bar{x}$ be the mean of $x$, and $P_x$ be the covariance matrix of $x$. The problem can be defined as computing the first- and second-order moments of $y$. The sigma points are:

$$\mathcal{X}_0 = \bar{x}, \quad (18)$$

$$\mathcal{X}_i = \bar{x} + \left(\sqrt{(n+\kappa) P_x}\right)_i, \quad (19)$$

$$\mathcal{X}_{n+i} = \bar{x} - \left(\sqrt{(n+\kappa) P_x}\right)_i, \quad (20)$$

where $\kappa$ is a scaling parameter and $(\sqrt{P_x})_i$ is the $i$-th column of the square root of the matrix $P_x$. $P_x$ is positive definite. The matrix square root is computed by Cholesky factorization or singular value decomposition. Then, the weights on each sigma point are given as:

$$w_0 = \frac{\kappa}{n+\kappa}, \quad (21)$$

$$w_i = \frac{1}{2(n+\kappa)}, \quad i = 1, 2, \ldots, 2n, \quad (22)$$

where the weights are normalized to satisfy $\sum_{i=0}^{2n} w_i = 1$. Each sigma point is propagated through the mapping:

$$\mathcal{Y}_i = f(\mathcal{X}_i), \quad i = 0, 1, \ldots, 2n \quad (23)$$

By using $\mathcal{Y}_i$, the first- and second-order moments of the transformed $y$, the mean $\bar{y}$ and covariance matrix $P_y$, respectively, can be computed as:

$$\bar{y} = \sum_{i=0}^{2n} w_i \mathcal{Y}_i, \quad (24)$$

$$P_y = \sum_{i=0}^{2n} w_i (\mathcal{Y}_i - \bar{y})(\mathcal{Y}_i - \bar{y})^T \quad (25)$$
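A compact NumPy sketch of the unscented transformation in Equations (18)–(25) (our illustration; κ is a free scaling parameter):

```python
import numpy as np

def unscented_transform(f, x_mean, P, kappa=0.0):
    """Propagate mean/covariance through a nonlinearity f via sigma points."""
    n = x_mean.size
    L = np.linalg.cholesky((n + kappa) * P)   # matrix square root, Eqs. (19)-(20)
    sigma = np.vstack([x_mean,
                       x_mean + L.T,          # rows: x_mean + i-th column of sqrt
                       x_mean - L.T])         # 2n + 1 sigma points in total
    w = np.full(2 * n + 1, 1.0 / (2 * (n + kappa)))
    w[0] = kappa / (n + kappa)                # weights, Eqs. (21)-(22)
    Y = np.array([f(s) for s in sigma])       # propagated points, Eq. (23)
    y_mean = w @ Y                            # transformed mean, Eq. (24)
    d = Y - y_mean
    P_y = (w[:, None] * d).T @ d              # transformed covariance, Eq. (25)
    return y_mean, P_y

y_mean, P_y = unscented_transform(np.sin, np.array([0.5, 1.0]), np.eye(2) * 0.1)
```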

3.6. Dimension Reduction with PCA and t-SNE

Principal component analysis (PCA) is a commonly used and proven technology in various image processing tasks such as compression, denoising, and quality assessment. It uses singular value decomposition (SVD) of the data to map it to a lower-dimensional space. In our data analysis, we used it to reduce the high-dimensional log-Mel spectrogram features to a two-dimensional space for visualization. We also use the t-Distributed Stochastic Neighbor Embedding (t-SNE) technique [59], a dimensionality reduction technique that is highly suited to the visualization of high-dimensional datasets.
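A minimal sketch of this visualization step with scikit-learn (the feature matrix X here is a random stand-in for the log-Mel features):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

X = np.random.rand(200, 320)   # stand-in for flattened log-Mel feature vectors

X_pca = PCA(n_components=2).fit_transform(X)
X_tsne = TSNE(n_components=2, random_state=0).fit_transform(X)
# X_pca and X_tsne can then be scatter-plotted, colored by normal/anomalous label.
```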

3.7. One-Class Support Vector Machine (OC-SVM)

The one-class support vector machine (OC-SVM) is a widely used classification-based methodology to discover novelties in an unsupervised way [60]. OC-SVM is a special case of SVM: it learns a hyperplane that separates all the data points from the origin in a feature space corresponding to the kernel and maximizes the margin from this hyperplane to the origin, which makes OC-SVM fit for outlier detection. The model is first trained using normal condition data and learns to keep these training data away from the origin in the coordinate space. Thus, a hyperplane is established to separate the area of the normal condition. With the trained model, test data of anomalous conditions are expected to be plotted near the origin. If the plotted data fall on the origin side of the hyperplane, the data are detected as an anomaly.
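A minimal sketch of this baseline with scikit-learn's OneClassSVM (the kernel and nu values are placeholders, not the paper's settings):

```python
import numpy as np
from sklearn.svm import OneClassSVM

X_train = np.random.rand(500, 320)  # normal-condition feature vectors only
X_test = np.random.rand(100, 320)   # mixed normal/anomalous test vectors

ocsvm = OneClassSVM(kernel="rbf", nu=0.1, gamma="scale").fit(X_train)
pred = ocsvm.predict(X_test)        # +1 = judged normal, -1 = judged anomaly
```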

3.8. Autoencoder-Decoder Neural Network

The output of the neural network is given by:

$$\hat{x}(x) = \tilde{f}\left(\tilde{W} f(Wx + b) + \tilde{b}\right), \quad (26)$$

Here, $x$ is the input of the neural network, $f$ and $\tilde{f}$ are the activation functions, and $W$, $b$ and $\tilde{W}$, $\tilde{b}$ are the weights and biases of the encoder and decoder, respectively. In case the size of the latent layer is smaller than that of the input layer, the $W$ and $\tilde{W}$ which minimize the loss function are substantially identical to the parameters that can be obtained by principal component analysis. The autoencoder works deterministically, except for the random sampling process in SGD.

Figure 4 illustrates the schematics of the autoencoder–decoder network. The encoder network $E(\cdot)$ has three fully connected layers with the ReLU activation function. The decoder network $D(\cdot)$ incorporates three fully connected layers with the ReLU activation function, where $FC(a, b, f)$ means a fully connected layer with $a$ input neurons, $b$ output neurons, and activation function $f$. To train the network, the Adam optimization technique is used to minimize the least-squares loss function:

$$L_{AE}(\theta_e, \theta_d) = \|x - D(E(x \mid \theta_e) \mid \theta_d)\|_2^2 \quad (27)$$

where $\theta_e$, $\theta_d$ are the parameters of the encoder and decoder networks, respectively.

Figure 4. Architecture of the autoencoder–decoder neural network (encoder: FC(320×64)–ReLU, FC(64×64)–ReLU, FC(64×8)–ReLU; decoder: FC(8×64)–ReLU, FC(64×64)–ReLU, FC(64×320); trained with the MSE loss function).

3.9. Neural Network Auto-Encoder-Decoder with LSTM

We implemented the autoencoder–decoder neural network with long short-term memory (LSTM). The input features are the same as the baseline. The architecture has the LSTM layer and five more hidden layers (see Figure 5). The output of the LSTM layer is transferred to the autoencoder–decoder architecture, which is similar to the baseline architecture (Figure 6). The reconstruction loss function is MSE. Training was carried out for 50 epochs.

Figure 5. Front-end architecture of the convolutional autoencoder–decoder neural network with LSTM.

Figure 6. Back-end architecture of the convolutional autoencoder–decoder neural network.
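A minimal PyTorch sketch matching the layer sizes we read from Figure 4 (training details such as the learning rate are placeholders):

```python
import torch
import torch.nn as nn

class AE(nn.Module):
    """Fully connected autoencoder: 320 -> 64 -> 64 -> 8 -> 64 -> 64 -> 320."""
    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(320, 64), nn.ReLU(),
                                 nn.Linear(64, 64), nn.ReLU(),
                                 nn.Linear(64, 8), nn.ReLU())
        self.dec = nn.Sequential(nn.Linear(8, 64), nn.ReLU(),
                                 nn.Linear(64, 64), nn.ReLU(),
                                 nn.Linear(64, 320))
    def forward(self, x):
        return self.dec(self.enc(x))

model = AE()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()                 # reconstruction loss of Eq. (27)

x = torch.rand(32, 320)                # batch of normal-condition features
loss = loss_fn(model(x), x)
opt.zero_grad(); loss.backward(); opt.step()
```

For the LSTM variant of Section 3.9, an nn.LSTM front-end would precede the encoder; we omit it here for brevity.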

3.10. Generative Adversarial Network

Another milestone of deep neural network architecture progress is the generative adversarial network (GAN) [52]. GAN is categorized as a generative model and is a framework for the estimation of generative models via an adversarial process in which two models, a discriminator and a generator, are trained simultaneously. The generator generates counterfeit images based on input noise, and the discriminator judges an input image as an original or a counterfeit one. The learning process in the original GAN framework is recognized as a Min–Max game in which a generator and a discriminator are optimized with a value function $V(D, G)$ formulated as:

$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim P_{data}(x)}[\log D(x)] + \mathbb{E}_{z \sim P_z(z)}[\log(1 - D(G(z)))], \quad (28)$$

where the input noise variables are $P_z(z)$ and a mapping to the data space is represented as $G(z; \theta_g)$. $D(x)$ represents the probability that $x$ came from the data rather than from the generator.

Here, we use a deep convolutional generative adversarial network for anomaly detection (AnoGAN). The architectural diagram of the network is presented in Figure 7.

Figure 7. The architectural diagram of the Deep Convolutional Generative Adversarial Network for Anomaly Detection (AnoGAN).
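A minimal PyTorch sketch of one optimization step of the Min–Max objective in Equation (28) (a generic GAN step on feature vectors, not the exact convolutional AnoGAN of Figure 7):

```python
import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 320))  # generator
D = nn.Sequential(nn.Linear(320, 64), nn.ReLU(), nn.Linear(64, 1))   # discriminator
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

x_real = torch.rand(32, 320)   # normal-condition features
z = torch.randn(32, 16)        # input noise z ~ Pz(z)

# Discriminator step: maximize log D(x) + log(1 - D(G(z)))
d_loss = (bce(D(x_real), torch.ones(32, 1))
          + bce(D(G(z).detach()), torch.zeros(32, 1)))
opt_d.zero_grad(); d_loss.backward(); opt_d.step()

# Generator step: fool the discriminator (non-saturating form)
g_loss = bce(D(G(z)), torch.ones(32, 1))
opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```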


3.11. Optimization

The Dice score coefficient (DSC) is a measure of overlap that is used to assess segmentation performance when a ground truth is available. We use the 2-class variant of DSC, which expresses the overlap between two classes $A$ and $B$ as:

$$DSC(A, B) = \frac{2|A \cap B|}{|A| + |B|} = \frac{2\sum_i^N p_i g_i}{\sum_i^N p_i^2 + \sum_i^N g_i^2}. \quad (29)$$
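A direct NumPy rendering of Equation (29) for two binary masks (our illustration):

```python
import numpy as np

def dice(p, g, eps=1e-8):
    """2-class Dice score of Eq. (29): p = prediction, g = ground truth."""
    p = p.astype(float).ravel()
    g = g.astype(float).ravel()
    return 2.0 * (p * g).sum() / ((p ** 2).sum() + (g ** 2).sum() + eps)

p = np.array([1, 1, 0, 0])
g = np.array([1, 0, 0, 0])
print(dice(p, g))   # 2*1 / (2 + 1) = 0.667
```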
3.12. Evaluation

The performance of anomaly detection is measured by the AUC index, a proven technique for evaluating binary classifier output quality, used in communication engineering. In the evaluation process, the receiver operating characteristic (ROC) curve is plotted based on the false positive rate and the true positive rate. The AUC is defined by the area under this curve and has a range of 0 to 1. A higher AUC means a higher performance of binary classification, while 0.5 means that the discriminator judges the result randomly.

3.13. Development Environment

The machine specification was the following: 8-core Intel Core i9 CPU, processor clock 2.4 GHz, number of processors 1, and 32 GB RAM.

4. Experimental Analysis

4.1. Data Analysis

We conducted an initial data analysis on the dataset. Figure 8 shows the frequency in the time domain and the log-Mel spectrogram for one of the .wav files of 6 dB SNR in the data set. A pump in normal condition operation contains high-intensity components in the frequency band of 50 Hz to 1 kHz. In the high-frequency band, randomly scattered components are observed, which are supposed to be environmental noise. In contrast, a pump in anomalous condition showed a sudden change of sound, which implies pump trouble.
Figure 8. Example of the normalized amplitude of the pump ID: 06 at SNR 6 dB: frequency in the time domain (top row) and corresponding power spectrogram (bottom row) in normal condition (left column) and anomalous condition (right column).

Likewise, Figure 9 shows the frequency and power spectrogram in the time domain for one of the .wav files of −6 dB SNR in the data set. In normal conditions, the frequency band in the range of 50 Hz to 1 kHz is corrupted, and its boundaries become unclear compared to those of the sound data with 6 dB SNR. The component in the broad domain of high frequency is highlighted because of its low SNR. The anomalous condition data, in this case, shows hunching every 2 s. The anomalous condition visualized in the time-frequency figure is ambiguous due to the low (−6 dB) SNR, but the log-Mel spectrogram seems to have successfully highlighted the transition of sound components, which differs from the corresponding normal condition. Note that in the dataset the data are labeled only as normal and anomaly. No further description of this anomalous condition is given. Therefore, the anomalous condition needs to be detected as outlier data from the normal condition.

Figure 9. Example of normalized amplitude of the pump ID: 06 at −6 dB SNR: frequency in the time domain (top row) and corresponding power spectrogram (bottom row) in normal condition (left column) and anomalous condition (right column).

4.2. Results of Dimensionality Reduction

The PCA of the signals was performed using the Python library scikit-learn, version 0.22.1. Figure 10 shows graphs of the normal condition and anomalous condition data in a two-dimensional space reduced from the 64 × 313 features obtained by the log-Mel spectrogram using PCA. Pumps under normal conditions and anomalous conditions at 6 dB SNR are projected to different clusters in the two-dimensional space. In contrast, at −6 dB SNR, both normal condition and anomalous condition sound data are distributed onto similar regions, despite there seeming to be some clustering. The result implies that anomaly detection on data with high SNR can be conducted by conventional clustering methods, such as k-means clustering, but low SNR data need to be scrutinized by other methods which can embrace nonlinearity and reflect high-dimensional information for detection.

Figure 10. A pump ID: 06 operation sound data of 6 dB SNR (left) and −6 dB SNR (right). Projections of the 64 × 313 log-Mel spectrogram feature onto the 2D space by PCA. The blue and red symbols represent the normal condition and the anomalous condition, respectively.

We also applied the stochastic neighborhood embedding method based on t-distribution (t-SNE) to reduce the dimension. Figure 11 shows plots of the normal condition and anomalous condition data in a two-dimensional space reduced from the 64 × 313 features obtained by the log-Mel spectrogram using t-SNE. t-SNE was done using the library scikit-learn, version 0.22.1. The data at 6 dB SNR are more clearly clustered than in the plot obtained by PCA dimension reduction. The data at −6 dB SNR showed a cluster of anomaly condition data, but most of the data were projected with a less clear boundary between normal condition and anomalous condition. t-SNE shows good anomaly detection performance for data with high SNR, but noisy data require other methods, such as PCA.

Figure 11. A pump ID06 operation sound data of 6 dB SNR (left) and −6 dB SNR (right). Projections of the 64 × 313-dimensional log-Mel spectrogram features onto the estimated 2D space by t-SNE. The blue and red dots represent the normal condition and the anomalous condition, respectively.

The above results possess all the information of 10 s in one segment. Following the reproduction work, we also applied PCA and t-SNE dimensional reduction to the 320-dimensional log-Mel spectrogram features. Figures 12 and 13 show the data plots embedded in a 2D space by using PCA and t-SNE, respectively.

Figure 12. A pump ID: 06 operation sound data of 6 dB SNR (left) and −6 dB SNR (right). Projections of the 320-dimensional log-Mel spectrogram feature onto the 2D space by PCA. The blue and red symbols represent the normal condition and the anomalous condition, respectively.

Figure 13. A pump ID06 operation sound data of 6 dB SNR (left) and −6 dB SNR (right). Projections of the 320-dimensional log-Mel spectrogram features onto the estimated 2D space by t-SNE. The blue and red symbols represent the normal condition and the anomalous condition, respectively.

The 320-dimensional features represent a short period of 50/313 s out of the 10 s, as we discussed in Section 3.3. The plot embedded in a two-dimensional space using PCA showed a similar result as that of the 313 × 64-dimensional features. On the contrary, the plot embedded into the 2D space using t-SNE showed a broader cluster compared to that of the 313 × 64-dimensional features for the data at 6 dB SNR, but the cluster is still clearly separated between normal data and anomalous data. For the data at −6 dB SNR, the clustering of each condition seems effective in comparison to that of the 313 × 64-dimensional features. It is implied that the impact of noise can be alleviated by focusing on a short period of time.
4.3. Results of the Autoencoder as the Baseline Model

As a baseline model, we used an autoencoder. The dataset provider presented the benchmark results with a model developed by using the Keras library, and we instead used PyTorch to double-check the feature engineering process and deep neural network models from the different approaches. The anomaly detection was performed for each segment by thresholding the reconstruction error averaged over 10 s. The network was trained using the Adam optimization technique for 50 epochs to minimize the loss function.

The results are given in Table 2 and Figure 14. Our result supported the benchmark result and the trend that noisy data exacerbate the failure detection performance. Moreover, in the majority of cases, we have managed to improve the AUC value over the benchmark values, especially when the benchmark value was low.

Table 2. Comparison of the AUC evaluated in the research with the AUC presented by the data set provider.

            Benchmark                  Reproduced Result in Initial Data Analysis
            Input SNR                  Input SNR
Model ID    6 dB    0 dB    −6 dB      6 dB    0 dB    −6 dB
ID00        0.84    0.65    0.58       0.84    0.67    0.70
ID02        0.45    0.46    0.52       0.64    0.43    0.50
ID04        0.99    0.95    0.93       0.99    0.99    0.91
ID06        0.94    0.76    0.61       0.91    0.86    0.59


Figure 14. The benchmark AUC values and the reproduced AUC results.

4.4. Results of the One-Class Support Vector Machine as a Baseline Model
We applied an unsupervised outlier detection method, the One-Class Support Vector Machine (OC-SVM). The OC-SVM was tested on machine ID: 06 using the scikit-learn library, version 0.22.1.
The model was trained with normal condition data, excluding the data reserved for testing. The model was tested using the same sets of normal and abnormal condition data. The detection success rate is evaluated against the boundary determined by the trained model. Training and testing were conducted for both the 64 × 313 and 320-dimensional features.
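A minimal sketch of this setup is given below; the kernel and hyperparameters are assumptions, as the text specifies only the library version, and random arrays stand in for the actual feature vectors:

```python
import numpy as np
from sklearn.svm import OneClassSVM

# Illustrative OC-SVM setup (assumed hyperparameters). Random arrays stand
# in for the 320-dimensional feature vectors of pump ID: 06.
rng = np.random.default_rng(0)
X_train_normal = rng.normal(size=(1000, 320))    # normal-condition features
X_test = rng.normal(size=(200, 320))             # mixed normal/anomalous set

oc_svm = OneClassSVM(kernel="rbf", gamma="scale", nu=0.1)
oc_svm.fit(X_train_normal)                       # learn the normal boundary
pred = oc_svm.predict(X_test)                    # +1 = normal, -1 = anomalous
```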
The results are presented in Table 3. We observed that the OC-SVM determines the boundary of the normal condition conservatively for both feature dimensions, which makes it difficult to screen anomalous conditions.
Table 3. Rate of successfully classified normal and anomalous condition data of pump ID: 06 using OC-SVM.

                                                                Input SNR
Feature Dimensions    Performance Measure                       6 dB    0 dB    −6 dB
64 × 313              Accuracy of Normal Condition Data         0.91    0.93    0.96
64 × 313              Accuracy of Anomalous Condition Data      0.78    0.40    0.58
320                   Accuracy of Normal Condition Data         0.96    0.88    0.93
320                   Accuracy of Anomalous Condition Data      0.68    0.42    0.56

4.5. Results of the Autoencoder with LSTM

We evaluated the autoencoder with LSTM architecture on the dataset. Training was carried out for 50 epochs. The reconstruction loss function used was MSE. Table 4 displays the AUC results. This architecture enhanced the AUC for clean sound data (6 dB), while it exacerbated the AUC for noisy sound data (−6 dB). The result implies that if the SNR of the sound is high enough, then the LSTM, which incorporates time-directional information, works well. On the other hand, if the SNR of the sound is low, the LSTM cannot extract meaningful information from the noisy data.
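A minimal sketch of such an architecture is shown below; the layer sizes are assumptions, as the paper does not list them:

```python
import torch
import torch.nn as nn

# Illustrative LSTM autoencoder (assumed layer sizes): the encoder compresses
# each log-Mel frame sequence, and the decoder reconstructs it step by step.
class LSTMAutoencoder(nn.Module):
    def __init__(self, n_mels: int = 64, hidden: int = 32):
        super().__init__()
        self.encoder = nn.LSTM(n_mels, hidden, batch_first=True)
        self.decoder = nn.LSTM(hidden, n_mels, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, n_mels) sequence of log-Mel frames
        z, _ = self.encoder(x)        # latent sequence
        x_hat, _ = self.decoder(z)    # reconstructed frame sequence
        return x_hat

model = LSTMAutoencoder()
x = torch.randn(8, 313, 64)           # 8 segments of 313 frames each
loss = nn.functional.mse_loss(model(x), x)
```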

Table 4. AUC evaluated by the autoencoder with LSTM on the pump ID: 06 at each input SNR.

            Input SNR
Model ID    6 dB      0 dB    −6 dB
ID06        0.9537    TBA     0.5941

4.6. Results of the Generative Adversarial Network for Anomaly Detection (AnoGAN)
We tested a deep-convolutional generative adversarial network for anomaly detection (AnoGAN) on the dataset to understand how convolution works on sound data and the overall trend over the segment time interval of 10 s. The input feature is prepared by converting the log-Mel spectrogram into a JPEG image with a librosa built-in function. Pump ID: 06 is used for the testing at each input SNR value. Therefore, each JPEG image contains the log-Mel spectrogram information for 10 s. The converted JPEG images have a pixel size of 640 × 480 with RGB channels, as shown in Figure 15. The images are then resized to 64 × 64 pixels and normalized to the range of 0 to 1 to fit the GAN. The model is developed with the Python PyTorch library. Training was carried out for 50 epochs. The logistic loss function was used to measure the reconstruction loss.

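A sketch of this image-based preprocessing is shown below; the wav path is a hypothetical example, and the rendering details are our assumptions about how the described conversion can be reproduced:

```python
import librosa
import librosa.display
import matplotlib.pyplot as plt
import numpy as np
import torch
from PIL import Image

# Illustrative reconstruction of the preprocessing pipeline (not the
# authors' exact script). The file path is a hypothetical example.
y, sr = librosa.load("pump/id_06/normal/00000000.wav", sr=None)

# Log-Mel spectrogram of the 10 s segment (frame size 1024, hop size 512,
# 64 Mel filters, as in the benchmark setup discussed in Section 5).
mel = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=1024,
                                     hop_length=512, n_mels=64)
log_mel = librosa.power_to_db(mel, ref=np.max)

# Render the spectrogram to a 640 x 480 RGB image, mirroring the JPEG step.
fig = plt.figure(figsize=(6.4, 4.8), dpi=100)   # 640 x 480 pixels
ax = fig.add_axes([0.0, 0.0, 1.0, 1.0])         # fill the whole canvas
librosa.display.specshow(log_mel, sr=sr, hop_length=512, ax=ax)
ax.set_axis_off()
fig.savefig("segment.jpg")
plt.close(fig)

# Resize to 64 x 64 and normalize to [0, 1] to fit the GAN input.
img = Image.open("segment.jpg").convert("RGB").resize((64, 64))
x = torch.from_numpy(np.asarray(img, dtype=np.float32) / 255.0)
x = x.permute(2, 0, 1).unsqueeze(0)             # shape: (1, 3, 64, 64)
```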
Figure 15. Example of a log-Mel spectrogram of pump ID: 06 at 6 dB SNR in the normal condition (left) and the anomalous condition (right) in image format.

Table 5 shows the results of the AnoGAN. The AUC is lower than 0.5, which indicates that AnoGAN does not work on this dataset. One potential reason is that compressing the JPEG image from 640 × 480 to 64 × 64 lost the data in short time intervals. The other possible reason is that the overall 10 s of data are too large to depict the operating information.
Table 5. AUC evaluated by AnoGAN on the pump ID: 06.

            Input SNR
Model ID    6 dB    0 dB    −6 dB
ID06        0.44    0.46    0.41
Table 6 shows the AUC results for the various preprocessing methods and loss functions in the autoencoder–decoder neural network on the sound data of the pump ID: 06 at an SNR of −6 dB. The proposed schematics using UKF and MSE with the L2 regularization term showed an improvement of the AUC for the noisy pump data (ID: 06 at SNR −6 dB) from 0.7633 (baseline) to 0.7909 (using MSE with L2 regularization). The results imply that the data preprocessing by the adaptive filters has an impact on the performance of anomaly detection using a neural network; hence, the loss function should be designed in accordance with the design of the applied adaptive filters.
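Our reading of this loss variant can be sketched as follows; lambda_l2 is an assumed hyperparameter value:

```python
import torch
import torch.nn as nn

# Minimal sketch of "MSE with the L2 regularization term" compared in
# Table 6 (our interpretation, not the authors' exact code): reconstruction
# MSE plus a Tikhonov-style L2 penalty on the autoencoder weights.
def mse_with_l2(model: nn.Module, x: torch.Tensor, x_hat: torch.Tensor,
                lambda_l2: float = 1e-4) -> torch.Tensor:
    mse = nn.functional.mse_loss(x_hat, x)
    l2 = sum(p.pow(2.0).sum() for p in model.parameters())
    return mse + lambda_l2 * l2
```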
Table 6. Summary of AUC for the various preprocessing methods and loss functions in the autoencoder–decoder neural network on the sound data of the pump ID: 06 at SNR of −6 dB. KF—Kalman Filter. UKF—Unscented Kalman Filter. MSE—mean square error.
Pre-Processing       Loss Function                   AUC
Raw (unprocessed)    MSE (Baseline)                  0.7633 ± 0.0239
Raw (unprocessed)    MSE                             0.7644 ± 0.0165
Raw (unprocessed)    MSE with L2 Regularization      0.7909 ± 0.0192
KF                   MSE                             0.7764 ± 0.0250
KF                   MSE with L2 Regularization      0.7898 ± 0.0200
UKF                  MSE                             0.7644 ± 0.0165
UKF                  MSE with L2 Regularization      0.7909 ± 0.0192

4.7. Analysis of Misclassifications
Among the normal-condition dataset, the segment successfully detected as the normal condition with the minimum reconstruction error of 2848 was 00000659.wav. On the other hand, the segment we mistakenly detected as an anomalous condition, with the highest reconstruction error of 6214, was 00000038.wav. These sound data are shown in Figure 16. The data from 00000659.wav showed a momentary loud sound at 4 s elapsed.

Figure 16. Examples of spectrum images of the normal condition data for pump ID 06 at SNR −6 dB: 00000659.wav, successfully detected as the normal condition (left); 00000038.wav, wrongly detected as an anomalous condition (right).

Likewise, among the anomalous-condition dataset, the segment successfully detected as an anomalous condition with the highest reconstruction error of 6736 was 00000077.wav. The anomalous segment incorrectly detected as a normal condition, with the lowest reconstruction error of 2738, was 00000005.wav. These sound data are visually shown in Figure 17. In the case of 00000077.wav, somewhat periodic peaks every 2 s can be observed. This periodic anomalous information enabled the autoencoder to detect the anomaly. In contrast, the case of 00000005.wav shows that the signal information is covered by background noise.

Figure 17. Examples of spectrum images of the anomalous condition data for pump ID 06 at SNR −6 dB: 00000077.wav, successfully detected as an anomalous condition (left); 00000005.wav, wrongly detected as a normal condition (right).

5. Discussion and Comparison with Similar Works
Purohit et al. [58] presented the benchmark performance of unsupervised anomaly detection for the dataset using an autoencoder-based model, assuming that anomalous data cannot be reconstructed from the compressed representation layer of a model trained with normal condition data only. In the benchmark experimental setup, the log-Mel spectrogram is considered as the input feature. The spectrogram is based on the following conditions: frame size 1024, hop size 512, and 64 Mel filters. This generates 313 frames in time and 64 cells in the frequency domain, where the total features are 313 × 64 in one segment of 10 s sound data. Five frames in time are combined to initiate a 320-dimensional input feature vector. Therefore, one input feature represents a 50/313 s time domain. The rest of the normal segments form the test dataset.
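This framing step can be sketched as follows; the sliding-window stride is an assumption, since the text states only that five frames are combined:

```python
import numpy as np

# Illustrative framing of the benchmark input features: stack five
# consecutive 64-bin log-Mel frames into 320-dimensional vectors.
def make_input_vectors(log_mel: np.ndarray, n_stack: int = 5) -> np.ndarray:
    """log_mel: array of shape (64, 313) for one 10 s segment."""
    n_mels, n_frames = log_mel.shape
    vectors = [log_mel[:, i:i + n_stack].T.reshape(-1)     # (320,)
               for i in range(n_frames - n_stack + 1)]
    return np.stack(vectors)                               # (309, 320)
```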
The training of the model is conducted using normal condition sound data, and the test is conducted using anomalous condition data and normal condition sound data, excluding the data used for training. The performance of anomaly detection is evaluated by the Area Under the Curve (AUC). They concluded that nonstationary machinery, such as slide rails and valves, and noisy data, that is, low input SNR in this context, are the key challenges in the anomaly detection of this machinery. The impact of noise on performance is implied in Table 7. As an instance of stationary machines, the pump of ID 00 with 6 dB input SNR showed an AUC of 0.84, while −6 dB input SNR showed an AUC of 0.58. The machine ID: 02 showed different behavior, but a reason was not stated in the literature.
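The AUC evaluation itself reduces to scoring each test segment and comparing the scores against the condition labels, as in the sketch below, where placeholder scores stand in for reconstruction errors:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Illustrative AUC computation: reconstruction errors act as anomaly
# scores; labels mark anomalous (1) versus normal (0) segments.
rng = np.random.default_rng(0)
scores = np.concatenate([rng.normal(1.0, 0.3, 100),    # normal segments
                         rng.normal(1.6, 0.3, 100)])   # anomalous segments
labels = np.concatenate([np.zeros(100), np.ones(100)])
print(roc_auc_score(labels, scores))                   # threshold-free metric
```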

Table 7. Comparison of the AUC values of pumps with ID: 00, 02, 04, and 06 at the input SNR of 6 dB, 0 dB, and −6 dB.

            Input SNR [36]             Input SNR (This Paper)
Model ID    6 dB    0 dB    −6 dB      6 dB      0 dB      −6 dB
ID00        0.84    0.65    0.58       0.8212    0.6792    0.6741
ID02        0.45    0.46    0.52       0.5938    0.5576    0.5293
ID04        0.99    0.95    0.93       0.9979    0.9753    0.9226
ID06        0.94    0.76    0.61       0.9281    0.7854    0.6518

6. Conclusions and Future Work

In this study, we proposed an anomaly detection system for the analysis of real-life industrial machinery failure sounds. To our knowledge, few studies are focusing on the
relationship between the data pre-processing and cost functions in neural network architecture. The proposed system consists of the preprocessing component, which applies the Unscented Kalman Filter (UKF) for state estimation, and of the anomaly detection component, which has an autoencoder–decoder neural network with Tikhonov regularization (diagonal loading).
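A minimal sketch of such UKF-based preprocessing, using the filterpy library, is given below; all model and noise parameters are assumptions for illustration, as the paper does not list them, and a random-walk state with a direct observation model is assumed:

```python
import numpy as np
from filterpy.kalman import MerweScaledSigmaPoints, UnscentedKalmanFilter

# Illustrative UKF smoothing of a 1D signal (assumed random-walk state
# transition and identity observation; assumed noise covariances).
points = MerweScaledSigmaPoints(n=1, alpha=0.1, beta=2.0, kappa=0.0)
ukf = UnscentedKalmanFilter(dim_x=1, dim_z=1, dt=1.0 / 16000,
                            fx=lambda x, dt: x,   # random-walk transition
                            hx=lambda x: x,       # direct observation
                            points=points)
ukf.Q = np.array([[1e-3]])    # assumed process noise covariance
ukf.R = np.array([[1e-1]])    # assumed measurement noise covariance

samples = np.random.default_rng(0).normal(size=100)   # placeholder signal
filtered = []
for z in samples:
    ukf.predict()
    ukf.update(np.array([z]))
    filtered.append(float(ukf.x[0]))
```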
The results implied that the data preprocessing by the adaptive filters impacts the
performance of anomaly detection using a neural network; hence, the loss function should
be designed in accordance with the design of the applied adaptive filters.
The autoencoder–decoder model showed superior performance compared to other
classification techniques in noisy data analysis.
The results of this study suggest that the acoustic detection of failures could be used for Predictive Maintenance [61] of industrial machinery in the context of Industry 4.0. The incorporation of new acoustic sensor technologies combined with deep learning methods can be used to avoid the premature replacement of equipment, saving maintenance costs, improving machining process safety, increasing the availability of equipment, and maintaining acceptable levels of performance [2]. The predictive maintenance system in smart factories based on acoustic failure pattern recognition can serve as an early warning system for managers, especially in high-risk industrial businesses. The ability to detect weak signals with potentially substantial strategic implications is a welcome benefit of process automation in the corporate world. Its key benefit is real-time management and planning, which helps to cut down on the costs of production downtime [62].
Future work will focus on modeling deep neural networks that reflect local neighborhood relationships, and on feature engineering for noise reduction in the low-SNR sound dataset. We will explore a deep convolutional neural network approach applied to short-time data instead of the overall 10 s data, and modifications to the loss function to reflect neighborhood relationships in the manifold learning of the autoencoder (a metric learning approach). Furthermore, we aim to investigate methods applicable to robust speaker identification, especially those oriented at noisy environments, which might further help improve the quality of acoustic fault detection within industrial environments.

Author Contributions: Conceptualization, R.M.; methodology, R.M.; software, Y.T.; validation,


R.M. and R.D.; formal analysis, R.M. and R.D.; investigation, Y.T. and R.M.; resources, R.M.; data
curation, Y.T.; writing—original draft preparation, Y.T. and R.M.; writing—review and editing, R.D.;
visualization, Y.T. and R.M.; supervision, R.M.; funding acquisition, R.D. All authors have read and
agreed to the published version of the manuscript.
Funding: This research did not receive external funding.
Data Availability Statement: The MIMII dataset is openly available at: https://zenodo.org/record/
3384388 (accessed on 5 July 2021).
Acknowledgments: The authors express their gratitude to the creators of the MIMII dataset for
making the data available for research: Harsh Purohit, Ryo Tanabe, Kenji Ichige, Takashi Endo, Yuki
Nikaido, Kaori Suefusa, and Yohei Kawaguchi.
Conflicts of Interest: The authors declare that they have no conflict of interest.

References
1. Chandola, V.; Banerjee, A.; Kumar, V. Anomaly detection: A survey. ACM Comput. Surv. 2009, 41, 1–58. [CrossRef]
2. Çınar, Z.M.; Nuhu, A.A.; Zeeshan, Q.; Korhan, O.; Asmael, M.; Safaei, B. Machine learning in predictive maintenance towards
sustainable smart manufacturing in industry 4.0. Sustainability 2020, 12, 8211. [CrossRef]
3. An, Q.; Tao, Z.; Xu, X.; El Mansori, M.; Chen, M. A data-driven model for milling tool remaining useful life prediction with
convolutional and stacked LSTM network. Measurement 2020, 15, 107461. [CrossRef]
4. Lv, Y.; Liu, Y.; Jing, W.; Woźniak, M.; Damaševičius, R.; Scherer, R.; Wei, W. Quality control of the continuous hot pressing process
of medium density fiberboard using fuzzy failure mode and effects analysis. Appl. Sci. 2020, 10, 4627. [CrossRef]
5. Kawaguchi, Y.; Endo, T. How can we detect anomalies from subsampled audio signals? In Proceedings of the 2017 IEEE
International Workshop on Machine Learning for Signal Processing, Tokyo, Japan, 25–28 September 2017. [CrossRef]
6. Kawaguchi, Y. Anomaly detection based on feature reconstruction from subsampled audio signals. In Proceedings of the 26th
European Signal Processing Conference (EUSIPCO), Rome, Italy, 3–8 September 2018; pp. 2538–2542.
7. Marchie, E.; Vesperini, F.; Eyben, F.; Squartini, S.; Schuller, B. A novel approach for automatic acoustic novelty detection using a
denoising autoencoder with bidirectional LSTM neural networks. In Proceedings of the International Conference on Acoustics,
Speech, and Signal Processing (ICASSP), South Brisbane, Australia, 19–24 April 2015; pp. 1996–2000.
8. Koizumi, Y.; Saito, S.; Uematsu, H.; Harada, N. Optimizing acoustic feature extractor for anomalous sound detection based
on Neyman-Pearson lemma. In Proceedings of the 25th European Signal Processing Conference (EUSIPCO), Kos, Greece, 28
August–2 September 2017; pp. 698–702.
9. Licitra, G.; Fredianelli, L.; Petri, D.; Vigotti, M.A. Annoyance evaluation due to overall railway noise and vibration in Pisa urban
areas. Sci. Total Environ. 2016, 568, 1315–1325. [CrossRef]
10. Miedema, H.M.E.; Oudshoorn, C.G.M. Annoyance from transportation noise: Relationships with exposure metrics DNL and
DENL and their confidence intervals. Environ. Health Perspect. 2001, 109, 409–416. [CrossRef]
11. Vukić, L.; Fredianelli, L.; Plazibat, V. Seafarers’ Perception and attitudes towards noise emission on board ships. Int. J. Environ.
Res. Public Health 2021, 18, 6671. [CrossRef]
12. Rossi, L.; Prato, A.; Lesina, L.; Schiavi, A. Effects of low-frequency noise on human cognitive performances in laboratory. Build.
Acoust. 2018, 25, 17–33. [CrossRef]
13. Minichilli, F.; Gorini, F.; Ascari, E.; Bianchi, F.; Coi, A.; Fredianelli, L.; Licitra, G.; Manzoli, F.; Mezzasalma, L.; Cori, L. Annoyance
judgment and measurements of environmental noise: A focus on Italian secondary schools. Int. J. Environ. Res. Public Health 2018,
15, 208. [CrossRef]
14. Erickson, L.C.; Newman, R.S. Influences of background noise on infants and children. Curr. Dir. Psychol. Sci. 2017, 26, 451–457.
[CrossRef]
15. Dratva, J.; Phuleria, H.C.; Foraster, M.; Gaspoz, J.M.; Keidel, D.; Künzli, N.; Liu, L.J.; Pons, M.; Zemp, E.; Gerbase, M.W.; et al.
Transportation noise and blood pressure in a population-based sample of adults. Environ. Health Perspect. 2012, 120, 50–55.
[CrossRef]
16. Petri, D.; Licitra, G.; Vigotti, M.A.; Fredianelli, L. Effects of exposure to road, railway, airport and recreational noise on blood
pressure and hypertension. Int. J. Environ. Res. Public Health 2021, 18, 9145. [CrossRef]
17. Babisch, W.; Beule, B.; Schust, M.; Kersten, N.; Ising, H. Traffic noise and risk of myocardial infarction. Epidemiology 2005, 16,
33–40. [CrossRef]
18. Ajitha, P.; Chandra, E. Survey on outliers detection in distributed data mining for big data. J. Basic Appl. Sci. Res. 2015, 5, 31–38.
19. Calabrese, F.; Regattieri, A.; Bortolini, M.; Gamberi, M.; Pilati, F. Predictive maintenance: A novel framework for a data-driven,
semi-supervised, and partially online prognostic health management application in industries. Appl. Sci. 2021, 11, 3380.
[CrossRef]
20. Tanuska, P.; Spendla, L.; Kebisek, M.; Duris, R.; Stremy, M. Smart anomaly detection and prediction for assembly process
maintenance in compliance with industry 4.0. Sensors 2021, 21, 2376. [CrossRef]
21. Peng, C.-Y.; Raihany, U.; Kuo, S.-W.; Chen, Y.-Z. Sound detection monitoring tool in CNC milling sounds by K-means clustering
algorithm. Sensors 2021, 21, 4288. [CrossRef]
22. Kim, D.; Lee, S.; Kim, D. An applicable predictive maintenance framework for the absence of run-to-failure data. Appl. Sci. 2021,
11, 5180. [CrossRef]
23. Skoczylas, A.; Stefaniak, P.; Anufriiev, S.; Jachnik, B. Belt conveyors rollers diagnostics based on acoustic signal collected using
autonomous legged inspection robot. Appl. Sci. 2021, 11, 2299. [CrossRef]
24. Ho, S.K.; Nedunuri, H.C.; Balachandran, W.; Kanfoud, J.; Gan, T.-H. Monitoring of industrial machine using a novel blind feature
extraction approach. Appl. Sci. 2021, 11, 5792. [CrossRef]
25. Mey, O.; Schneider, A.; Enge-Rosenblatt, O.; Mayer, D.; Schmidt, C.; Klein, S.; Herrmann, H.-G. Condition monitoring of drive
trains by data fusion of acoustic emission and vibration sensors. Processes 2021, 9, 1108. [CrossRef]
26. Serradilla, O.; Zugasti, E.; Ramirez de Okariz, J.; Rodriguez, J.; Zurutuza, U. Adaptable and explainable predictive maintenance:
Semi-supervised deep learning for anomaly detection and diagnosis in press machine data. Appl. Sci. 2021, 11, 7376. [CrossRef]
27. Wei, Y.; Li, Y.; Xu, M.; Huang, W. A review of early fault diagnosis approaches and their applications in rotating machinery.
Entropy 2019, 21, 409. [CrossRef] [PubMed]
28. Saufi, S.R.; Ahmad, Z.A.B.; Leong, M.S.; Lim, M.H. Challenges and opportunities of deep Learning models for machinery fault
detection and diagnosis: A Review. IEEE Access 2019, 7, 122644. [CrossRef]
29. Wang, D.; Brown, G.J. Computational Auditory Scene Analysis: Principles, Algorithms, and Applications; Wiley/IEEE Press: Piscataway,
NJ, USA, 2006.
30. Williamson, D.S.; Wang, D. Speech dereverberation and denoising using complex ratio masks. In Proceedings of the 2017
IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), New Orleans, LA, USA, 5–9 March 2017;
pp. 5590–5594. [CrossRef]
31. Ayhan, B.; Kwan, C. Robust speaker identification algorithms and results in noisy environments; Lecture Notes in Computer
Science. In Proceedings of the Advances in Neural Networks—ISNN 2018, Minsk, Belarus, 25–28 June 2018; Huang, T., Lv, J., Sun, C.,
Tuzikov, A., Eds.; Springer: Cham, Switzerland; Volume 10878, pp. 443–450.
32. Zhang, M.; Guo, J.; Li, X.; Jin, R. Data-driven anomaly detection approach for time-series streaming data. Sensors 2020, 20, 5646.
[CrossRef] [PubMed]
33. Pittino, F.; Puggl, M.; Moldaschl, T.; Hirschl, C. Automatic anomaly detection on in-production manufacturing machines using
statistical learning methods. Sensors 2020, 20, 2344. [CrossRef] [PubMed]
34. Hyvarinen, A.; Karhunen, H.; Oja, E. Independent Component Analysis; Wiley-Interscience: Hoboken, NJ, USA, 2001.
35. Ikeda, S.; Toyama, K. Independent component analysis for noisy data—MEG data analysis. Neural Netw. 2000, 13, 1063–1074.
[CrossRef]
36. Koganeyama, M. An effective evaluation function for ICA to separate train noise from telluric current data. In Proceedings of the
4th International Symposium on Independent Component Analysis and Blind Signal Separation (ICA2003), Nara, Japan, 1–4
April 2003; pp. 837–842.
37. Damaševičius, R.; Napoli, C.; Sidekerskienė, T.; Woźniak, M. IMF mode demixing in EMD for jitter analysis. J. Comput. Sci. 2017,
22, 240–252. [CrossRef]
38. Kebabsa, T.; Ouelaa, N.; Djebala, A. Experimental vibratory analysis of a fan motor in industrial environment. Int. J. Adv. Manuf.
Technol. 2018, 98, 2439–2447. [CrossRef]
39. Garnier, J.; Solna, K. Applications of random matrix theory for sensor array imaging with measurement noise. Random Matrices
2014, 65, 223–245.
40. Gu, X.; Akoglu, L.; Rinaldo, A. Statistical analysis of nearest neighbor methods for anomaly detection. In Proceedings of the
33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, BC, Canada, 8–14 December 2019;
pp. 10921–10931.
41. Scholkopf, B.; Williamson, R.; Smola, A.; Shawe-Taylor, J.; Platt, J. Support vector method for novelty detection. In Proceedings
of the 12th International Conference on Neural Information Processing Systems (NIPS '99), Denver, CO, USA, 29 November–4
December 1999; pp. 582–588.
42. Hsu, J.; Wang, Y.; Lin, K.; Chen, M.; Hsu, J.H. Wind turbine fault diagnosis and predictive maintenance through statistical process
control and machine learning. IEEE Access 2020, 8, 23427–23439. [CrossRef]
43. Toma, R.N.; Prosvirin, A.E.; Kim, J. Bearing fault diagnosis of induction motors using a genetic algorithm and machine learning
classifiers. Sensors 2020, 20, 1884. [CrossRef]
44. Sakurada, M.; Yairi, T. Anomaly detection using autoencoders with nonlinear dimensionality reduction. In Proceedings of the
MLSDA 2014 2nd Workshop on Machine Learning for Sensory Data Analysis, Gold Coast, Australia, 2 December 2014; pp. 4–11.
45. Ruff, L.; Vandermeulen, R.; Goernitz, N.; Deecke, L.; Siddiqui, S.A.; Binder, A.; Müller, E.; Kloft, M. Deep one-class classification.
In Proceedings of the 35th International Conference on Machine Learning, Stockholm, Sweden, 10–15 July 2018; pp. 4393–4402.
46. Luwei, K.C.; Yunusa-Kaltungo, A.; Sha’aban, Y.A. Integrated fault detection framework for classifying rotating machine faults
using frequency domain data fusion and artificial neural networks. Machines 2018, 6, 59. [CrossRef]
47. Zhao, H.; Liu, H.; Hu, W.; Yan, X. Anomaly detection and fault analysis of wind turbine components based on deep learning
network. Renew. Energy 2018, 127, 825–834. [CrossRef]
48. Dongo, Y.; Il, D.Y. Residual error based anomaly detection using auto-encoder in SMD machine sound. Sensors 2018, 18, 5.
49. Cheng, Y.; Zhu, H.; Wu, J.; Shao, X. Machine health monitoring using adaptive kernel spectral clustering and deep long short-term
memory recurrent neural networks. IEEE Trans. Ind. Inform. 2019, 15, 987–997. [CrossRef]
50. Li, M.; Wang, S.; Fang, S.; Zhao, J. Anomaly detection of wind turbines based on deep small-world neural network. Appl. Sci.
2020, 10, 1243. [CrossRef]
51. Chalapathy, R.; Chawla, S. Deep learning for anomaly detection: A Survey. arXiv 2019, arXiv:1901.03407v2.
52. Goodfellow, I.J.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative adversarial
nets. Adv. Neural Inform. Process. Syst. 2014, 27, 1–9.
53. Schlegl, T.; Seeböck, P.; Waldstein, S.M.; Schmidt-Erfurth, U.; Langs, G. Unsupervised anomaly detection with generative
adversarial networks to guide maker discovery. In Proceedings of the Information Processing in Medical Imaging 2017, Boone,
NC, USA, 25–30 June 2017; pp. 146–157.
54. Schlegl, T.; Seeböck, P.; Waldstein, S.M.; Langs, G.; Schmidt-Erfurth, U. f-AnoGAN: Fast unsupervised anomaly detection with
generative adversarial networks. Med. Image Anal. 2019, 54, 30–44. [CrossRef]
55. Wu, J.; Zhao, Z.; Sun, C.; Yan, R.; Chen, X. Fault-attention generative probabilistic adversarial autoencoder for machine anomaly
detection. IEEE Trans. Ind. Inform. 2020, 16, 7479–7488. [CrossRef]
56. Zhang, G.; Xiao, H.; Jiang, J.; Liu, Q.; Liu, Y.; Wang, L. A Multi-index generative adversarial network for tool wear detection with
imbalanced data. Complexity 2020, 2020, 5831632. [CrossRef]
57. Zenodo Website. MIMII Dataset: Sound Dataset for Malfunctioning Industrial Machine Investigation and Inspection. Available
online: https://zenodo.org/record/3384388 (accessed on 28 December 2019).
58. Purohit, H.; Tanabe, R.; Ichige, K.; Endo, T.; Nikaido, Y.; Suefusa, K.; Kawaguchi, Y. MIMII dataset: Sound dataset for malfunc-
tioning industrial machine investigation and inspection. In Proceedings of the 4th Workshop on Detection and Classification of
Acoustic Scenes and Events (DCASE), New York, NY, USA, 25–26 October 2019; pp. 209–213.
59. Van der Maaten, L.; Hinton, G. Visualizing data using t-SNE. J. Mach. Learn. Res. 2008, 9, 2579–2605.
60. Breunig, M.M.; Kriegel, H.P.; Ng, R.T.; Sander, J. LOF: Identifying density-based local outliers. In Proceedings of the International
Conference on Management of Data, Dallas, TX, USA, 15–18 May 2000; pp. 93–104.
61. Cardoso, D.; Ferreira, L. Application of predictive maintenance concepts using artificial intelligence tools. Appl. Sci. 2021, 11, 18.
[CrossRef]
62. Pech, M.; Vrchota, J.; Bednář, J. Predictive maintenance and intelligent sensors in smart factory: Review. Sensors 2021, 21, 1470.
[CrossRef]
