1. Introduction

In the background of industrial intelligence, detecting surface defects on products in industrial scenarios is essential to reduce production costs, improve production efficiency, and guarantee product quality. The detection of surface defects is a problem of locating abnormal regions in images, such as scratches and smudges. However, in practical applications, anomaly detection with traditional supervised learning is difficult due to the low probability of abnormal samples and the diverse forms of anomalies. Therefore, methods based on semi-supervised techniques, which require only normal samples in the training phase, have more significant advantages for surface defect detection in practical applications.

Based on semi-supervised techniques, most image surface defect detection models attempt to explore the general patterns of normal samples efficiently. For example, reconstruction models based on the autoencoder (AE) (Bergmann et al., 2018; Tang et al., 2020) or the generative adversarial network (GAN) (Akcay et al., 2018; Schlegl et al., 2017a; Zenati et al., 2018; Schlegl et al., 2017b) aim to reconstruct normal images with minimal error and locate anomalies based on the reconstruction error. However, due to the powerful generalization ability of convolutional neural networks (CNNs) (Wheeler and Karimi, 2021; Chen et al., 2017), abnormal regions may also be reconstructed correctly in the inference phase, which clearly violates the basic assumptions of the reconstruction models. Recently, embedding-based models (Zheng et al., 2021; Roth et al., 2021; Cohen and Hoshen, 2020; Defard et al., 2021) have shown better anomaly detection performance than reconstruction-based models. The fundamental principle of embedding-based models is feature matching between the test and normal samples. Although such models require little training time, they need to perform complex feature-matching operations in the inference phase, which incurs high computational costs for model inference. In addition, such models are not trained on anomaly-specific datasets and directly use pre-trained parameters for feature extraction and anomaly detection, which is not sufficiently adaptable to the anomaly detection task.

Given the shortcomings of existing methods, this paper proposes an end-to-end memory-based segmentation network (MemSeg) to accomplish product surface defect detection. Instead of reconstructing the input images, MemSeg determines the abnormal regions in the images end-to-end. Additionally, our model does not entirely rely on a pre-trained model for feature extraction and defect detection, which alleviates the problem of inconsistent distribution between the source and target domains. The design of MemSeg is based on the observation
Table 1
Main advantages of MemSeg and mainstream methods. ✓ denotes that the method has an advantage in this item.
Method Speed Accuracy Training End-to-end
Reconstruction-based ✓ ✓
Simulation-based ✓ ✓ ✓
Embedding-based ✓
Ours ✓ ✓ ✓ ✓
the potential differences between normal and abnormal samples, some works (Zavrtanik et al., 2021a; Li et al., 2021; Song et al., 2021) attempted to generate artificially simulated abnormal samples during training. Specifically, DRAEM (Zavrtanik et al., 2021a) superimposes additional texture images as noise onto normal images to generate abnormal regions; this type of data augmentation aims to create textural anomalies. CutPaste (Li et al., 2021) and AnoSeg (Song et al., 2021) use an augmentation method similar to copy and paste: a small rectangular area is randomly copied from the input image and randomly pasted back onto the image to simulate abnormal samples, and pasting rectangular patches of different sizes, aspect ratios, and rotation angles creates structural anomalies. As a means of data augmentation, the existing anomaly simulation methods one-sidedly consider only structural anomalies or only textural anomalies. At the same time, for some datasets there is a problem of low simulation efficiency because the target foreground and background in the images cannot be distinguished well. The anomaly simulation strategy used by MemSeg overcomes these shortcomings.

In addition, despite the introduction of simulated abnormal samples, AnoSeg and DRAEM still need to complete the reconstruction process of the input image; CutPaste only completes defect detection at the image level, and defect localization at the pixel level is implemented by GradCAM or Gaussian density estimation. More directly, with the help of a well-designed anomaly simulation strategy, MemSeg does not need the reconstruction of the input image as an auxiliary task for model learning and completes pixel-level defect localization end-to-end.

2.3. Embedding-based methods

Embedding-based methods (Zheng et al., 2021; Roth et al., 2021; Cohen and Hoshen, 2020; Defard et al., 2021) usually use a network pre-trained on ImageNet (Deng et al., 2009) to extract the high-level features of the original images (without training on anomaly-specific datasets), and the anomaly score is obtained by calculating the distance of the features between the test sample and the normal sample to locate the abnormal region. FYD (Zheng et al., 2021) designs a two-stage coarse-to-fine feature alignment network that learns robust feature distributions of normal images; SPADE (Cohen and Hoshen, 2020) extends the KNN anomaly detection method to the pixel level and detects anomalies in the images through the pixel-level correspondence between the test image and normal images; PaDiM (Defard et al., 2021) uses a pre-trained CNN to extract the patch embeddings of the input image and uses a multivariate Gaussian distribution to obtain the probability representation of the normal samples. Due to their simplicity and effectiveness, embedding-based methods are widely used, but they usually require a complex process of feature matching in the inference phase, which greatly limits the inference speed of models. The memory module in MemSeg is still based on the principle of embedding-based methods, but through the design of a more efficient feature-matching algorithm it improves the precision of the model without adding too much computational cost.

3. Method

This section demonstrates our novel framework to detect and localize fine-grained anomalies. An overview of MemSeg is shown in Fig. 2. MemSeg uses U-Net (Ronneberger et al., 2015) as a framework to complete a semantic segmentation task with the help of simulated abnormal samples and memory information in the training phase, and it localizes abnormal regions in images end-to-end in the inference phase. MemSeg consists of several essential parts, which we describe in the following order: generation of abnormal samples by artificial simulation (Section 3.1), generation of memory information and spatial attention maps (Section 3.2), the multi-scale feature fusion module (MSFF Module) for the fusion of memory information and high-level features of images (Section 3.3), and the loss functions (Section 3.4).

3.1. Anomaly simulation strategy

In industrial scenarios, anomalies occur in various forms, and it is impossible to cover all of them when performing data collection, which limits modeling with supervised learning methods. However, in the semi-supervised framework, using only normal samples and no comparisons with abnormal samples is not sufficient for the model to learn what the normal patterns are. In this paper, inspired by DRAEM (Zavrtanik et al., 2021a), we design a more effective strategy to simulate abnormal samples and introduce them during the training of MemSeg to accomplish self-supervised learning. MemSeg summarizes the patterns of normal samples by comparing them with non-normal patterns, mitigating the drawbacks of semi-supervised learning. As shown in Fig. 3, the anomaly simulation strategy proposed in this paper is divided into three main steps.

In the first step, a two-dimensional Perlin noise (Perlin, 1985) P is generated, and P is then binarized by a threshold T to obtain the mask M_P. The Perlin noise has several random peaks, and the mask M_P generated from it can extract contiguous blocks of regions in the image. At the same time, the main body of some industrial components occupies only a small proportion of the acquired image; if anomaly simulation is performed directly without processing, it is easy to generate noise in the background part of the image, which increases the difference between the data distributions of simulated and real abnormal samples and is not conducive to the model learning effective discriminative information. We therefore adopt a foreground enhancement strategy for this type of image: the input image I is binarized to obtain the mask M_I, and the noise generated in the binarization process is removed using the opening or closing operation. After that, the final mask image M is obtained by taking the element-wise product of these two masks.

In the second step, the element-wise product of the mask image M and the noise image I_n is computed to obtain the region of interest (ROI) in I_n defined by M. Consistent with DRAEM (Zavrtanik et al., 2021a), MemSeg introduces a transparency factor δ in this process to balance the fusion of the original image and the noisy image, so that the patterns of simulated anomalies are closer to real anomalies. The noisy foreground image I_n′ is therefore generated using the following equation:

I_n′ = δ(M ⊙ I_n) + (1 − δ)(M ⊙ I)    (1)

For the noisy image I_n, we want its maximum transparency to be higher to increase the difficulty of model learning and thus improve the robustness of the model. Therefore, δ in Eq. (1) is sampled uniformly at random from [0.15, 1].

In the third step, the mask image M is inverted to obtain M̄, the element-wise product of M̄ and the original image I is computed to obtain the image I′, and according to

I_A = M̄ ⊙ I + I_n′    (2)

the data-augmented image I_A, namely the simulated abnormal image, is obtained. I_A takes the original input image I as the background and the ROI of the noise image I_n extracted by the mask image M as the foreground.

The noisy image I_n comes from two sources: one part comes from the DTD texture dataset (Cimpoi et al., 2014) and aims to simulate textural anomalies; the other part comes from the input image itself and aims to simulate structural anomalies. For the simulation of structural anomalies, we first apply random adjustments of mirror symmetry, rotation, brightness, saturation, and hue to the input image I. The preliminarily processed image is then uniformly divided into a 4 × 8 grid whose cells are randomly rearranged to obtain the disordered image I_n.

With the above anomaly simulation strategy, MemSeg obtains simulated abnormal samples from both textural and structural perspectives, and most abnormal regions are generated on the target foreground, which maximizes the similarity between the simulated abnormal samples and the real abnormal samples.
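To make the three steps concrete, the following Python sketch illustrates one possible implementation of the strategy. It is a minimal sketch under stated assumptions, not the authors' released code: the smooth_noise helper is only a stand-in for true Perlin noise, the foreground mask is obtained with a fixed grey-level threshold, and all names (simulate_anomaly, fg_thresh, and so on) are our own.

```python
import numpy as np
from scipy import ndimage

def smooth_noise(h, w, scale=8, seed=None):
    # Stand-in for 2-D Perlin noise: a coarse random grid, upsampled by
    # nearest-neighbour repetition and lightly blurred, gives contiguous blobs.
    # H and W are assumed to be multiples of `scale` (e.g. 256 x 256 inputs).
    rng = np.random.default_rng(seed)
    coarse = rng.random((h // scale, w // scale))
    noise = np.kron(coarse, np.ones((scale, scale)))
    return ndimage.gaussian_filter(noise, sigma=scale / 2)

def simulate_anomaly(image, noise_source, threshold=0.5, fg_thresh=0.2, seed=None):
    """Three-step anomaly simulation: Perlin-style mask with foreground
    enhancement, transparency blending (Eq. (1)), and superposition (Eq. (2)).
    `image` and `noise_source` are float arrays in [0, 1] with shape (H, W, C)."""
    rng = np.random.default_rng(seed)
    h, w, _ = image.shape

    # Step 1: binarize the noise field to get M_P, binarize the image to get M_I,
    # clean M_I with a morphological opening, and intersect the two masks.
    m_p = smooth_noise(h, w, seed=seed) > threshold
    m_i = image.mean(axis=2) > fg_thresh              # rough foreground estimate
    m_i = ndimage.binary_opening(m_i, iterations=2)   # remove binarization noise
    m = (m_p & m_i)[..., None].astype(image.dtype)    # final mask M, shape (H, W, 1)

    # Step 2: extract the ROI of the noise image and blend it with the original
    # image inside the mask, using a random transparency factor delta in [0.15, 1].
    delta = rng.uniform(0.15, 1.0)
    i_n_prime = delta * (m * noise_source) + (1 - delta) * (m * image)   # Eq. (1)

    # Step 3: keep the original image outside the mask and paste the noisy
    # foreground inside it.
    i_a = (1 - m) * image + i_n_prime                                    # Eq. (2)
    return i_a, m.squeeze(-1).astype(np.float32)
```

Under this sketch, the returned mask doubles as the pixel-level ground truth S used by the training constraints in Section 3.4.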
Fig. 2. An overview of MemSeg. MemSeg is based on the U-Net architecture and uses a pre-trained ResNet18 (He et al., 2016) as the encoder. From the perspective of differences and commonalities, MemSeg introduces simulated abnormal samples and a memory module to assist the model learning in a more targeted way and thus accomplishes the semi-supervised surface defect detection task in an end-to-end approach. Meanwhile, to fully fuse the memory information with the high-level features of the input image, MemSeg introduces a multi-scale feature fusion module (MSFF Module) and a novel spatial attention module, which significantly improves the precision of anomaly localization.
Fig. 3. The three steps of our anomaly simulation strategy. In the first step, the mask image 𝑀 is generated using Perlin noise and the target foreground; in the second step, the
ROI defined by 𝑀 in the noise image 𝐼𝑛 is extracted to generate the noise foreground image 𝐼𝑛′ ; in the third step, the noise foreground image is superimposed on the original
image to obtain the simulated abnormal image 𝐼𝐴 .
3.2. Memory module and spatial attention maps

Memory Module. For humans, our recognition of anomalies is predicated on knowing what is normal, and abnormal regions are identified by comparing the test image with the normal images in our memory. Inspired by the human learning process and by embedding-based methods, MemSeg uses a small number of normal samples as memory samples and extracts high-level features of the memory samples as memory information using a pre-trained encoder (ResNet18, He et al., 2016) to assist the model training.

To obtain the memory information, we first randomly select N normal images from the training data as memory samples and input them to the encoder to obtain features of dimensions N × 64 × 64 × 64, N × 128 × 32 × 32, and N × 256 × 16 × 16 from block 1, block 2, and block 3 of ResNet18, respectively. These features with different resolutions together constitute the memory information MI. It needs to be emphasized that, to ensure the unification of the memory information and the high-level features of the input images, we always freeze the model parameters of block 1, block 2, and block 3 in ResNet18, but the rest of the model is still trainable.

Given an input image in the training or inference phase, as shown in Fig. 2, the encoder also extracts high-level features of the input image to obtain features with dimensions of 64 × 64 × 64, 128 × 32 × 32, and 256 × 16 × 16. These features with different resolutions together constitute the information of the input image II. After that, the L2 distance between II and all the memory information MI is calculated, so the N difference information DI between the input image and the memory samples is obtained:

DI = ⋃_{i=1}^{N} ‖MI_i − II‖_2    (3)
where N is the number of memory samples. For the N difference information, the minimum sum of all elements in each DI is taken as the criterion to obtain the best difference information DI* between II and MI; that is,

DI* = argmin_{DI_i ∈ DI} Σ_{x ∈ DI_i} x    (4)

where i ∈ [1, N]. The best difference information DI* contains the differences between the input sample and its most similar memory sample; the larger the difference value at a position, the higher the probability that the corresponding region of the input image is abnormal.

Subsequently, the best difference information DI* is concatenated with the high-level features of the input image II in the channel dimension to obtain the concatenated information CI_1, CI_2 and CI_3. Finally, the concatenated information goes through the multi-scale feature fusion module for feature fusion, and the fused features flow to the decoder through the skip connections of U-Net.
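As a concrete illustration of Eqs. (3) and (4), the following PyTorch sketch computes the difference information at a single resolution level. The tensor layout, the batching, and the name best_difference are our own assumptions rather than the official implementation.

```python
import torch

def best_difference(ii: torch.Tensor, mi: torch.Tensor) -> torch.Tensor:
    """Eqs. (3) and (4) for one resolution level.

    ii: features of the input images, shape (B, C, H, W).
    mi: frozen memory features of N normal samples, shape (N, C, H, W).
    Returns DI*, the element-wise difference to the most similar memory
    sample, shape (B, C, H, W).
    """
    # Eq. (3): element-wise L2 distance (absolute difference) between II
    # and every memory sample.
    diff = torch.abs(ii.unsqueeze(1) - mi.unsqueeze(0))       # (B, N, C, H, W)

    # Eq. (4): for every input image, pick the memory sample whose summed
    # difference is smallest, i.e. its most similar memory sample.
    scores = diff.sum(dim=(2, 3, 4))                          # (B, N)
    best = scores.argmin(dim=1)                               # (B,)
    di_star = diff[torch.arange(ii.size(0)), best]            # (B, C, H, W)
    return di_star

# The concatenated information is then the channel-wise concatenation of
# DI* and II, which the MSFF module consumes:
# ci = torch.cat([di_star, ii], dim=1)                        # (B, 2C, H, W)
```

In MemSeg this computation is carried out at the three resolutions (64 × 64, 32 × 32 and 16 × 16), giving CI_1, CI_2 and CI_3.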
Spatial Attention Maps. It is evident from observations and experiments (Section 4.6) that the best difference information DI* has an important influence on the localization of abnormal regions. To fully use the difference information, MemSeg extracts three spatial attention maps from DI*, which are used to reinforce the guesses of the best difference information on the abnormal regions.

For the features of the three different dimensions in DI*, the mean values are calculated in the channel dimension, and three feature maps of size 16 × 16, 32 × 32, and 64 × 64 are obtained. The 16 × 16 feature map is directly used as the spatial attention map M_3. After M_3 is up-sampled, its element-wise product with the 32 × 32 feature map is computed to obtain M_2; after M_2 is up-sampled, its element-wise product with the 64 × 64 feature map is computed to obtain M_1. As shown in Fig. 2, the spatial attention maps M_1, M_2 and M_3 weight the information obtained after CI_1, CI_2 and CI_3 are processed by the MSFF module, respectively. Mathematically, M_1, M_2 and M_3 are given as follows:

M_3 = (1/C_3) Σ_{i=1}^{C_3} DI*_{3,i}    (5)

M_2 = ((1/C_2) Σ_{i=1}^{C_2} DI*_{2,i}) ⊙ M_3^U    (6)

M_1 = ((1/C_1) Σ_{i=1}^{C_1} DI*_{1,i}) ⊙ M_2^U    (7)

where C_3 denotes the number of channels of DI*_3; DI*_{3,i} denotes the feature map of channel i in DI*_3; and M_3^U and M_2^U denote the feature maps obtained after up-sampling M_3 and M_2, respectively.
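Eqs. (5)–(7) amount to a channel-wise mean followed by a cascade of up-sampling and element-wise products. A short PyTorch sketch of this cascade is given below; the naming is ours and the bilinear up-sampling mode is an assumption.

```python
import torch
import torch.nn.functional as F

def spatial_attention_maps(di3, di2, di1):
    """di3, di2, di1: best difference information DI* at resolutions
    16x16, 32x32 and 64x64, each of shape (B, C, H, W)."""
    m3 = di3.mean(dim=1, keepdim=True)                 # Eq. (5), (B, 1, 16, 16)

    m3_up = F.interpolate(m3, scale_factor=2, mode="bilinear", align_corners=False)
    m2 = di2.mean(dim=1, keepdim=True) * m3_up         # Eq. (6), (B, 1, 32, 32)

    m2_up = F.interpolate(m2, scale_factor=2, mode="bilinear", align_corners=False)
    m1 = di1.mean(dim=1, keepdim=True) * m2_up         # Eq. (7), (B, 1, 64, 64)
    return m1, m2, m3
```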
3.3. Multi-scale feature fusion module

With the help of the memory module, MemSeg obtains the concatenated information CI composed of the input image information II and the best difference information DI*. The direct use of CI suffers from feature redundancy on the one hand; on the other hand, it increases the computational scale of the model and decreases the inference speed. Given the success of multi-scale feature fusion in target detection (Lin et al., 2017a; Chen et al., 2021), an intuitive idea is to fully fuse the visual information and semantic information in the concatenated information CI with the help of a channel attention mechanism and a multi-scale feature fusion strategy.

Our proposed multi-scale feature fusion module is shown in Fig. 4: the concatenated information CI_n (n = 1, 2, 3) is initially fused by a 3 × 3 convolutional layer that maintains the number of channels. Meanwhile, considering that CI_n is a simple concatenation of two kinds of information in the channel dimension, we use coordinate attention (CA) (Hou et al., 2021) to capture the relationship between the channels of CI_n. Then, for the features of different dimensions weighted by coordinate attention, we perform multi-scale information fusion: the feature maps of different dimensions are first aligned in resolution using up-sampling, then aligned in the number of channels using convolution, and finally an element-wise addition is executed to achieve multi-scale feature fusion. The fused features are weighted by the spatial attention maps M_n (n = 1, 2, 3) obtained in Section 3.2 and then fed to the final decoder.
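The sketch below outlines one branch of such a fusion block. It is a structural illustration only: SimpleChannelAttention is a squeeze-and-excitation-style stand-in for the coordinate attention block of Hou et al. (2021), and the layer sizes, names, and exact fusion order are assumptions rather than the paper's architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleChannelAttention(nn.Module):
    """Squeeze-and-excitation-style stand-in for the CA block."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, x):
        w = self.fc(x.mean(dim=(2, 3)))             # global average pool over H, W
        return x * w[:, :, None, None]

class MSFFBranch(nn.Module):
    """Fuse CI_n with an up-sampled coarser feature, then weight by M_n."""
    def __init__(self, channels, coarser_channels):
        super().__init__()
        self.fuse = nn.Sequential(                   # 3x3 conv keeping channel count
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels), nn.ReLU(inplace=True))
        self.attn = SimpleChannelAttention(channels)
        self.align = nn.Conv2d(coarser_channels, channels, 1)   # channel alignment

    def forward(self, ci_n, coarser, m_n):
        x = self.attn(self.fuse(ci_n))
        up = self.align(F.interpolate(coarser, size=ci_n.shape[-2:],
                                      mode="bilinear", align_corners=False))
        return (x + up) * m_n                        # element-wise add, weight by M_n
```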
3.4. Training constraints

To ensure that the prediction of MemSeg is close to its ground truth, we use the L1 loss and the focal loss (Lin et al., 2017b) to guarantee the similarity of all pixels in the image space. Compared with the L2 loss, the segmentation images predicted under the constraint of the L1 loss retain more edge information. Meanwhile, the focal loss alleviates the problem of area imbalance between normal and abnormal regions in images and makes the model focus more on the segmentation of difficult samples, improving the accuracy of abnormal segmentation.

Specifically, MemSeg minimizes the L1 loss L_l1 and the focal loss L_f between the ground truth S of the simulated abnormal image and the prediction Ŝ of the model using (8) and (9), respectively:

L_l1 = ‖S − Ŝ‖_1    (8)

L_f = −α_t (1 − p_t)^γ log(p_t)    (9)

where p_t equals the predicted probability p of the pixel category when the ground truth of the corresponding pixel in S is 1, and p_t = 1 − p when the ground truth of the pixel in S is 0; α_t and γ are hyperparameters that control the degree of weighting.

Finally, combining all constraints, the following objective function is obtained:

L_all = λ_l1 L_l1 + λ_f L_f    (10)

where λ_l1 and λ_f are balancing hyper-parameters. In the training phase, the optimization goal of MemSeg is to minimize the objective function defined by Eq. (10).
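A compact PyTorch sketch of this objective, using the hyper-parameter values reported later in Section 4.2 (γ = 4, λ_l1 = 0.6, λ_f = 0.4) and an assumed α_t = 0.25, is given below; it illustrates Eqs. (8)–(10) and is not the released training code.

```python
import torch

def memseg_loss(pred, target, alpha_t=0.25, gamma=4.0, lambda_l1=0.6, lambda_f=0.4):
    """pred: predicted abnormality probability per pixel, shape (B, 1, H, W), in (0, 1).
    target: pixel-level ground truth S of the simulated abnormal image, same shape."""
    eps = 1e-6

    # Eq. (8): L1 loss between prediction and ground truth, averaged over pixels.
    l1 = torch.mean(torch.abs(target - pred))

    # Eq. (9): focal loss; p_t = p where S = 1 and p_t = 1 - p where S = 0.
    p_t = torch.where(target > 0.5, pred, 1.0 - pred).clamp(min=eps)
    focal = torch.mean(-alpha_t * (1.0 - p_t) ** gamma * torch.log(p_t))

    # Eq. (10): weighted combination of the two constraints.
    return lambda_l1 * l1 + lambda_f * focal
```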
4. Experiments

This section evaluates the performance of MemSeg, as well as the functionality of its different components, on the semi-supervised anomaly detection datasets: the MVTec AD dataset (Bergmann et al., 2019), the BeanTech AD dataset (Mishra et al., 2021), and a toy dataset.

4.1. Datasets and evaluation metric

The MVTec AD dataset (Bergmann et al., 2019) is mainly aimed at the task of semi-supervised surface anomaly detection. It comprises 15 categories, including 5 texture categories and 10 object categories; each category includes approximately 60 to 400 normal samples for training and a mixture of normal and abnormal images for testing, and the test set contains a variety of realistic anomalies with different textures and scales. The BeanTech dataset (Mishra et al., 2021) has 3 categories with 2540 images in total and likewise contains only normal images in the training set.

For the evaluation metric of defect detection, following the works in Zheng et al. (2021), Roth et al. (2021), Cohen and Hoshen (2020), Defard et al. (2021), Zavrtanik et al. (2021a), Li et al. (2021) and Song et al. (2021), we leverage image-level and pixel-level ROC-AUC for performance evaluation.
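Both metrics reduce to the standard ROC-AUC computed over different units (whole images versus individual pixels); a minimal scikit-learn sketch with hypothetical variable names is:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# image_scores: (M,) anomaly score per test image; image_labels: (M,) 0/1 labels.
# score_maps:   (M, H, W) pixel anomaly scores; gt_masks: (M, H, W) 0/1 ground truth.
def evaluate(image_scores, image_labels, score_maps, gt_masks):
    image_auc = roc_auc_score(image_labels, image_scores)
    pixel_auc = roc_auc_score(np.asarray(gt_masks).ravel(),
                              np.asarray(score_maps).ravel())
    return image_auc, pixel_auc
```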
Fig. 4. The multi-scale feature fusion module used by MemSeg. Because CI_n is a concatenation of two kinds of information in the channel dimension and comes from different depths of the encoder, carrying different semantic and visual information, MemSeg uses the channel attention CA-Block and a multi-scale strategy for feature fusion.
4.2. Implementation details

MemSeg is based on the AE model and uses ResNet18 (He et al., 2016) as the encoder. For the decoder part, corresponding to Fig. 2, the up-sampling layer contains a bilinear interpolation layer and a basic convolution block consisting of a convolution layer, batch normalization, and a ReLU activation function. The Conv Layer contains two stacked basic convolution blocks; only the last Conv Layer contains one basic convolution block and a 2-channel convolution layer. The training process of MemSeg is carried out for 2700 iterations, the size of the input image is set to 256 × 256, and the batch size is set to 8, containing 4 normal samples and 4 simulated abnormal samples. When performing anomaly simulation, most categories have an equal probability of using textural anomaly simulation and structural anomaly simulation. We use a grid search to set the hyper-parameters: the learning rate is set to 0.04; γ in the focal loss is set to 4; λ_l1 and λ_f in the objective function are set to 0.6 and 0.4, respectively.

For most categories in both datasets, we randomly select 30 memory samples from the training set to generate the memory information. However, since the orientation of the screws in the MVTec AD dataset is randomly arranged, we increase the number of their memory samples for better feature matching; the sample size of toothbrushes in the training set is very small, so we use only 10 memory samples while ensuring adequate training samples. MemSeg obtains the anomaly score of each pixel in the image in an end-to-end approach, and the mean of the scores of the top 100 most abnormal pixels in the image is used as the anomaly score at the image level.
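The image-level score described above is simply the mean of the 100 largest pixel scores; a one-function sketch (our own naming) is:

```python
import torch

def image_level_score(pixel_scores: torch.Tensor, k: int = 100) -> torch.Tensor:
    """pixel_scores: per-pixel anomaly scores of one image, shape (H, W).
    Returns the mean of the k most abnormal pixels."""
    topk = torch.topk(pixel_scores.flatten(), k).values
    return topk.mean()
```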
4.3. Comparison with existing methods

This subsection compares MemSeg with different methods. The AUC scores of the different methods are listed in Tables 2 and 3. Our method outperforms most existing methods, which demonstrates its effectiveness. For the MVTec AD dataset, at image-level anomaly detection, the mean AUC score of MemSeg for the texture categories and the object categories is 99.8% and 99.4%, respectively, which is better than all the compared models, and MemSeg also has outstanding defect localization performance at the pixel level. For the BeanTech AD dataset, although the anomaly localization of MemSeg at the pixel level is not as good as that of PaDiM (97.1% vs 97.3%), the AUC score of MemSeg at the image level is better than that of all the models in the experiment, which indicates that our model still has an accurate anomaly detection capability on other datasets.

Meanwhile, as shown in Fig. 5, the anomaly localization of MemSeg at the pixel level is closer to the ground truth (GT), and the boundary between the normal and abnormal regions is more precise. This benefits from the end-to-end learning approach adopted by MemSeg, in which the training of the model is directly guided by the pixel-level ground truth of the simulated abnormal samples.

4.4. Impact of the anomaly simulation strategy

To evaluate the effectiveness of the proposed anomaly simulation strategy for image defect detection, we remove the textural anomaly simulation, the structural anomaly simulation, and the foreground enhancement strategy in training, respectively, and compare these three cases with our complete strategy. Table 4 reports the AUC scores at the image and pixel level for these four experiments. The AUC scores decrease when any component of the anomaly simulation strategy is removed, which verifies that our anomaly simulation strategy is not only theoretically interpretable but also performs excellently in experimental validation. Additionally, to justify the choice of Perlin noise for generating anomaly samples, we replaced the Perlin noise with random rectangular noise and empirically limited the length and width of the rectangles to between 15 and 100. As shown in Table 4 (Rect. Noise), using rectangular noise for anomaly region generation has only a slight impact on the performance of defect detection. This shows that even when the shape of the defects used in the training process is significantly different from the shape of real-world defects, MemSeg can still adapt well to defect detection in real scenes, which further demonstrates the robustness of the approach of introducing a self-supervised task to solve the semi-supervised task.

To evaluate the role of the simulated abnormal samples more fully, we also want to know the data distribution of the simulated and real abnormal samples after the training of the model, so we visualize the encoder outputs of simulated abnormal samples, real abnormal samples in the test set, and normal samples using t-SNE (Van der Maaten and Hinton, 2008). As shown in Fig. 6, for most categories there is some overlap in the spatial distribution of simulated abnormal samples and real abnormal samples, while abnormal samples are separated from normal samples, which further proves the validity of our anomaly simulation strategy.
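Such a visualization can be produced with off-the-shelf t-SNE; the sketch below assumes the pooled bottleneck features have already been collected into arrays, and the variable names are ours:

```python
import numpy as np
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

def plot_tsne(features, labels):
    """features: (M, D) pooled encoder features; labels: (M,) in
    {0: normal, 1: simulated abnormal, 2: real abnormal}."""
    emb = TSNE(n_components=2, init="pca", random_state=0).fit_transform(features)
    for value, name in [(0, "normal"), (1, "simulated abnormal"), (2, "real abnormal")]:
        pts = emb[np.asarray(labels) == value]
        plt.scatter(pts[:, 0], pts[:, 1], s=8, label=name)
    plt.legend()
    plt.show()
```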
It is important to emphasize that the features we visualize are generated only at the bottleneck structure of U-Net. Although the separability of features in some categories is not strong in two-dimensional
Table 2
Comparison between MemSeg and different methods on the MVTec AD dataset in terms of ROC-AUC % with the format of (Image-level, Pixel-level).
Category | SPADE (Cohen and Hoshen, 2020) | PaDiM (Defard et al., 2021) | DRAEM (Zavrtanik et al., 2021a) | CutPaste (Li et al., 2021) | P-SVDD (Yi and Yoon, 2020) | P-SVDD-C (Ahn and Kim, 2022) | Ours
Texture
Carpet (–, 97.5) (–, 98.9) (97.0, 95.5) (92.9, 92.6) (93.1, 98.3) (94.4, 92.9) (99.6, 99.2)
Grid (–, 93.7) (–, 94.9) (99.9, 99.7) (94.6, 96.2) (99.9, 97.5) (95.6, 97.2) (100, 99.3)
Leather (–, 97.6) (–, 99.1) (100, 98.6) (90.9, 97.4) (100, 99.5) (96.1, 98.2) (100, 99.7)
Tile (–, 87.4) (–, 91.2) (99.6, 99.2) (97.8, 91.4) (93.4, 90.5) (93.5, 91.9) (100, 99.5)
Wood (–, 85.5) (–, 93.6) (99.1, 96.4) (96.5, 90.8) (98.6, 95.5) (98.0, 92.1) (99.6, 98.0)
Average (–, 92.3) (–, 95.6) (99.1, 97.9) (94.5, 93.7) (97.0, 96.3) (95.5, 94.5) (99.8, 99.1)
Object
Bottle (–, 98.4) (–, 98.1) (99.2, 99.1) (98.6, 98.1) (98.3, 97.6) (99.5, 98.6) (100, 99.3)
Cable (–, 97.2) (–, 95.8) (91.8, 94.7) (90.3, 96.8) (80.6, 90) (97.8, 97.6) (98.2, 97.4)
Capsule (–, 99.0) (–, 98.3) (98.5, 94.3) (76.7, 95.8) (96.2, 97.4) (88.7, 96.3) (100, 99.3)
Hazelnut (–, 99.1) (–, 97.7) (100, 99.7) (92.0, 97.5) (97.3, 97.3) (97.9, 98.2) (100, 98.8)
Metal nut (–, 98.1) (–, 96.7) (98.7, 99.5) (94.0, 98.0) (99.3, 93.1) (96.5, 98.1) (100, 99.3)
Pill (–, 96.5) (–, 94.7) (98.9, 97.6) (86.1, 95.1) (92.4, 95.7) (91.9, 92.4) (99.0, 99.5)
Screw (–, 98.9) (–, 97.4) (93.9, 97.6) (81.3, 95.7) (86.3, 96.7) (83.3, 95.3) (97.8, 98.0)
Toothbrush (–, 97.9) (–, 98.7) (100, 98.1) (100, 98.1) (98.3, 98.1) (95.6, 96.0) (100, 99.4)
Transistor (–, 94.1) (–, 97.2) (93.1, 90.9) (91.5, 97) (95.5, 93.0) (92.1, 93.5) (99.2, 97.3)
Zipper (–, 96.5) (–, 98.2) (100, 98.8) (97.9, 95.1) (99.4, 99.3) (95.9, 96.0) (100, 98.8)
Average (–, 97.57) (–, 97.3) (97.4, 97.0) (90.8, 96.7) (94.3, 95.8) (93.9, 96.2) (99.4, 98.7)
Average (85.5, 96.0) (95.3, 96.7) (98.0, 97.3) (95.2, 96.0) (92.1, 95.7) (94.4, 95.6) (99.56, 98.84)
Table 3
Comparison between our method and different methods on the BeanTech AD dataset in terms of ROC-AUC % with the format of (Image-level, Pixel-level).
Category PatchCore (Roth et al., 2021) SPADE (Cohen and Hoshen, 2020) PaDiM (Defard et al., 2021) P-SVDD (Yi and Yoon, 2020) Ours
01 (90.9, 95.5) (91.4, 97.3) (99.8, 97.0) (95.7, 91.6) (98.7, 98.9)
02 (79.3, 94.7) (71.4, 94.4) (82.0, 96.0) (72.1, 93.6) (87.0, 96.2)
03 (99.8, 99.3) (99.9, 99.1) (99.4, 98.8) (82.1, 91.0) (99.4, 96.3)
Mean (90.0, 96.5) (87.6, 96.9) (93.7, 97.3) (83.3, 92.1) (95.0, 97.1)
Fig. 5. Comparison of MemSeg with PaDiM (Defard et al., 2021) and SPADE (Cohen and Hoshen, 2020) for anomaly localization on the MVTec AD dataset (before thresholding).
Our model has a more precise judgment of the abnormal regions.
Fig. 6. Separability display of normal samples, simulated abnormal samples, and real abnormal samples.
space, our model can still be corrected in the decoder part using the information from the skip connections. Besides, MemSeg does not strictly require the distribution of the simulated abnormal samples to be the same as that of the real abnormal samples. MemSeg is based on the semi-supervised learning framework, and the reason we introduce the simulated abnormal samples during training is simply to make the model explicitly learn the difference between normal and non-normal, so that the model can better summarize the patterns of normal samples.
Table 4
Evaluating the components of our anomaly simulation strategy on the MVTec AD
dataset. The AUC scores are reported for different strategies.
Metric | w/o Texture | w/o Structure | w/o Foreground | Rect. Noise | MemSeg
Image-level | 98.80 | 98.77 | 99.34 | 98.90 | 99.56
Pixel-level | 97.31 | 98.09 | 98.40 | 98.13 | 98.84
Table 5
AUC scores of MemSeg on the MVTec AD dataset when using different loss functions.
L1 loss | Focal loss (Lin et al., 2017b) | Image-level | Pixel-level
✓ |   | 84.82 | 73.38
  | ✓ | 98.92 | 98.64
✓ | ✓ | 99.56 | 98.84

Fig. 8. Effects of the different numbers of memory samples on AUC scores. The vertical coordinates report the mean AUC scores of the 13 categories in the MVTec AD dataset, excluding screw and toothbrush.
Fig. 7. The effect of L1 loss on anomaly localization. When L1 loss and focal loss are
used simultaneously, the edges of the segmentation images obtained by MemSeg are
more precise.
Fig. 9. The generation process of spatial attention maps M_1, M_2, and M_3. This process visually demonstrates the effectiveness of the memory module, multi-scale strategy, and spatial attention for defect localization of images.
Table 6
The AUC scores of MemSeg on the MVTec AD dataset when using different module
components.
Memory | Multi-scale | Spatial attention | Coordinate attention | Image-level | Pixel-level
96.42 96.08
✓ 98.41 98.27
✓ ✓ ✓ 99.08 98.60
✓ ✓ ✓ 99.26 98.67
✓ ✓ ✓ 98.96 98.44
✓ ✓ ✓ ✓ 99.56 98.84
Table 7
AUC scores of PatchCore, SPADE, PaDiM, and MemSeg on the toy dataset.
PatchCore (Roth et al., 2021) SPADE (Cohen and Hoshen, 2020) PaDiM (Defard et al., 2021) Ours
Image-level 99.75 95.75 99.30 99.83
Pixel-level 99.33 98.72 98.70 99.77
4.7. Evaluation with a toy dataset

The noise generated by MemSeg's anomaly simulation strategy is irregular. To verify the ability of MemSeg to generalize to regular noise, we generate a toy dataset using the normal samples in the test set of the MVTec AD dataset. As shown in Fig. 10, the shapes of the generated noise are rectangle, triangle, lightning bolt, star, heart, and circle, and the size, color, position, angle, and aspect ratio of the noise are random. The abnormal samples in the toy dataset are never seen in the training phase of MemSeg. We apply the trained model directly to the toy dataset and compare its performance with three models. The AUC scores of the different models are shown in Table 7. MemSeg achieves precise localization of the abnormal regions with an AUC score close to 100%, which further demonstrates the strong generalization ability of our model in localizing unknown anomalies.

4.8. Inference speed

Compared to reconstruction-based methods (Bergmann et al., 2018; Tang et al., 2020; Akcay et al., 2018; Schlegl et al., 2017a; Zenati et al., 2018; Schlegl et al., 2017b), embedding-based methods (Zheng et al., 2021; Roth et al., 2021; Cohen and Hoshen, 2020; Defard et al., 2021) achieve better performance in semi-supervised image surface defect detection, but this kind of model needs to perform complex feature matching in the inference phase, which makes it difficult to apply in industrial scenarios with high real-time requirements. Therefore, we are also interested in the inference speed of MemSeg. Our experiments are carried out on a PC with an NVIDIA RTX 3090 GPU. Overall, the total number of parameters in MemSeg is 80.12 MB, and the memory requirement is 212.15 MB during inference. Thanks to the fully convolutional network (FCN) structure (Long et al., 2015), MemSeg is friendly to parallel computing and has good scalability. For the inference speed, we calculate the time consumption of PaDiM (Defard et al., 2021) and SPADE (Cohen and Hoshen, 2020) in the inference phase, and the time to process one image is 0.319 s and 0.339 s for these two models, respectively, while the time to process one image

The above experiments demonstrate the effectiveness of MemSeg, as well as the robustness of using a self-supervised task to solve the semi-supervised task. Although MemSeg has good anomaly detection performance on several datasets, there are still some limitations. As shown in Table 2, for the MVTec AD dataset, at the image level, the category with the worst anomaly detection effect is the screw. On the one hand, our model relies more on the spatial alignment of the detection targets, but the orientation of the screws in the dataset is randomly arranged, which makes it difficult to generate effective difference information; on the other hand, some abnormal regions of the screws in the test set are small and difficult to distinguish, so the model is prone to misclassification, which is also reflected in the other models. At the pixel level, the category with the worst anomaly detection effect is the transistor, because when the transistor has global logical anomalies such as misalignment or missing parts, it is difficult for MemSeg to give the accurate location of the anomalies. Meanwhile, although MemSeg uses the foreground enhancement strategy in anomaly simulation, it still makes false positive judgments on background regions during inference, such as the capsule in Fig. 11, which places higher demands on the quality of the dataset. In the future, we can try to address these limitations by increasing the image resolution, adding modules for global relationship modeling, and using better data preprocessing and postprocessing methods.

5. Conclusion

In this paper, we propose an end-to-end memory-based segmentation network to detect surface defects in industrial products. Considering the small intra-class variation of products in the same production line, from the perspective of differences we propose a well-designed anomaly simulation strategy for self-supervised learning of the model, which accounts for the target foreground and for textural and structural anomalies; from the perspective of commonalities we propose a memory module and design an efficient feature-matching algorithm. Through the two points above, combined with the multi-scale feature fusion module and the spatial attention module, we effectively transform
semi-supervised anomaly detection into an end-to-end semantic segmentation task, making semi-supervised image surface defect detection more flexible. Simple but high-performing, MemSeg achieves SOTA performance while meeting the real-time requirements of industrial scenarios. In future work, we are interested in extending this paradigm to address semi-supervised anomaly detection tasks in more scenarios, such as anomaly detection in 3D and medical scenes.

Fig. 11. Analysis of the limitations of MemSeg. Using the screw, transistor, and capsule as examples, we demonstrate the limitations of MemSeg in fine-grained anomaly localization, global judgment, and false positives in the background, respectively.

CRediT authorship contribution statement

Minghui Yang: Conceptualization, Methodology, Software, Validation, Formal analysis, Investigation, Data curation, Writing – original draft, Writing – review & editing, Visualization. Peng Wu: Methodology, Writing – review & editing, Supervision. Hui Feng: Methodology, Writing – review & editing.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability

The data is available online.

Acknowledgment

Thanks are due to Prof. Jing Liu for the support of the experiment and valuable discussions.

References

Ahn, J.Y., Kim, G., 2022. Application of optimal clustering and metric learning to patch-based anomaly detection. Pattern Recognit. Lett.
Akcay, S., Atapour-Abarghouei, A., Breckon, T.P., 2018. GANomaly: Semi-supervised anomaly detection via adversarial training. In: Asian Conference on Computer Vision, pp. 622–637.
Bergmann, P., Fauser, M., Sattlegger, D., Steger, C., 2019. MVTec AD - a comprehensive real-world dataset for unsupervised anomaly detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9592–9600.
Bergmann, P., Löwe, S., Fauser, M., Sattlegger, D., Steger, C., 2018. Improving unsupervised defect segmentation by applying structural similarity to autoencoders. arXiv preprint arXiv:1807.02011.
Chen, S., Cheng, Z., Zhang, L., Zheng, Y., 2021. SnipeDet: Attention-guided pyramidal prediction kernels for generic object detection. Pattern Recognit. Lett. 152, 302–310.
Chen, L.C., Papandreou, G., Kokkinos, I., Murphy, K., Yuille, A.L., 2017. DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. IEEE Trans. Pattern Anal. Mach. Intell. 40 (4), 834–848.
Cimpoi, M., Maji, S., Kokkinos, I., Mohamed, S., Vedaldi, A., 2014. Describing textures in the wild. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3606–3613.
Cohen, N., Hoshen, Y., 2020. Sub-image anomaly detection with deep pyramid correspondences. arXiv preprint arXiv:2005.02357.
Defard, T., Setkov, A., Loesch, A., Audigier, R., 2021. PaDiM: a patch distribution modeling framework for anomaly detection and localization. In: International Conference on Pattern Recognition, pp. 475–489.
Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., Fei-Fei, L., 2009. ImageNet: a large-scale hierarchical image database. In: 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 248–255.
He, K., Zhang, X., Ren, S., Sun, J., 2016. Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778.
Hou, Q., Zhou, D., Feng, J., 2021. Coordinate attention for efficient mobile network design. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 13713–13722.
Li, C.L., Sohn, K., Yoon, J., Pfister, T., 2021. CutPaste: self-supervised learning for anomaly detection and localization. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9664–9674.
Lin, T.Y., Dollár, P., Girshick, R., He, K., Hariharan, B., Belongie, S., 2017a. Feature pyramid networks for object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2117–2125.
Lin, T.Y., Goyal, P., Girshick, R., He, K., Dollár, P., 2017b. Focal loss for dense object detection. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2980–2988.
Long, J., Shelhamer, E., Darrell, T., 2015. Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3431–3440.
Mishra, P., Verk, R., Fornasier, D., Piciarelli, C., Foresti, G.L., 2021. VT-ADL: a vision transformer network for image anomaly detection and localization. arXiv preprint arXiv:2104.10036.
Perlin, K., 1985. An image synthesizer. ACM SIGGRAPH Comput. Graph. 19 (3), 287–296.
Pirnay, J., Chai, K., 2021. Inpainting transformer for anomaly detection. arXiv preprint arXiv:2104.13897.
Ronneberger, O., Fischer, P., Brox, T., 2015. U-Net: convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 234–241.
Roth, K., Pemula, L., Zepeda, J., Schölkopf, B., Brox, T., Gehler, P., 2021. Towards total recall in industrial anomaly detection. arXiv preprint arXiv:2106.08265.
Schlegl, T., Seeböck, P., Waldstein, S.M., Schmidt-Erfurth, U., Langs, G., 2017a. Unsupervised anomaly detection with generative adversarial networks to guide marker discovery. In: International Conference on Information Processing in Medical Imaging, pp. 146–157.
Schlegl, T., Seebock, P., Waldstein, S.M., Schmidt-Erfurth, U., Langs, G., 2017b. Unsupervised anomaly detection with generative adversarial networks to guide marker discovery. In: International Conference on Information Processing in Medical Imaging, pp. 146–157.
Song, J., Kong, K., Park, Y.I., Kim, S.G., Kang, S.J., 2021. AnoSeg: anomaly segmentation network using self-supervised learning. arXiv preprint arXiv:2110.03396.
Tang, T.W., Kuo, W.H., Lan, J.H., Ding, C.F., Hsu, H., Young, H.T., 2020. Anomaly detection neural network with dual auto-encoders GAN and its industrial inspection applications. Sensors 20 (12), 3336.
Van der Maaten, L., Hinton, G., 2008. Visualizing data using t-SNE. J. Mach. Learn. Res. 9 (11).
Wheeler, J.B., Karimi, H.A., 2021. A semantically driven self-supervised algorithm for detecting anomalies in image sets. Comput. Vis. Image Underst. 213.
Yi, J., Yoon, S., 2020. Patch SVDD: patch-level SVDD for anomaly detection and segmentation. In: Proceedings of the Asian Conference on Computer Vision.
Zavrtanik, V., Kristan, M., Skočaj, D., 2021a. DRAEM - a discriminatively trained reconstruction embedding for surface anomaly detection. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 8330–8339.
Zavrtanik, V., Kristan, M., Skočaj, D., 2021b. Reconstruction by inpainting for visual anomaly detection. Pattern Recognit. 112.
Zenati, H., Foo, C.S., Lecouat, B., Manek, G., Chandrasekhar, V.R., 2018. Efficient GAN-based anomaly detection. arXiv preprint arXiv:1802.06222.
Zheng, Y., Wang, X., Deng, R., Bao, T., Zhao, R., Wu, L., 2021. Focus your distribution: coarse-to-fine non-contrastive learning for anomaly detection and localization. arXiv preprint arXiv:2110.04538.