
Pattern Recognition 111 (2021) 107639


Image denoising using complex-valued deep CNN


Yuhui Quan a, Yixin Chen a,1, Yizhen Shao a,1, Huan Teng a, Yong Xu a,∗, Hui Ji b

a School of Computer Science and Engineering, South China University of Technology, China
b Department of Mathematics, National University of Singapore, Singapore
∗ Corresponding author. E-mail address: yxu@[Link] (Y. Xu).
1 Equal contribution.

Article info

Article history: Received 13 February 2020; Revised 3 August 2020; Accepted 6 September 2020; Available online 16 September 2020.

Keywords: Complex-valued operations; Convolutional neural network; Image denoising; Deep learning.

Abstract

While complex-valued transforms have been widely used in image processing and have deep connections to biological vision systems, complex-valued convolutional neural networks (CNNs) have not seen applications in image recovery. This paper investigates the potential of complex-valued CNNs for image denoising. A CNN is developed for image denoising with its key mathematical operations defined in the complex number field, so as to exploit the merits of complex-valued operations, including the compactness of convolution given by the tensor product of 1D complex-valued filters, the nonlinear activation on phase, and the noise robustness of residual blocks. The experimental results show that the proposed complex-valued denoising CNN performs competitively against existing state-of-the-art real-valued denoising CNNs, with better robustness to possible inconsistencies of noise models between training samples and test images. The results also suggest that complex-valued CNNs provide another promising deep-learning-based approach to image denoising and other image recovery tasks.

© 2020 Elsevier Ltd. All rights reserved.

1. Introduction

Image denoising refers to the task of removing the measurement noise from an input image. It is not only of practical importance given the prevalence of photography on mobile devices, but also serves as a key component in most image recovery tasks; see e.g. [1,2]. Inspired by the great success of deep learning in many computer vision applications, there have been extensive studies on deep-learning-based image denoising methods in recent years. Most of these methods are built upon convolutional neural networks (CNNs). Such CNN-based approaches have shown promising performance, provided that the training samples fit the characteristics of the test data well, in terms of both image content and noise characteristics.

1.1. Motivations

All the existing CNN-based methods for image denoising are built upon real-valued CNNs. In recent years, complex-valued neural networks (NNs) have started to receive increasing attention. Many works suggest that using complex numbers in NNs can enhance the representational capacity [3] and lead to other advantages, e.g. easier optimization [4], better generalization performance [5], and noise-robust memory mechanisms [6]. In many recognition tasks, the performance of complex-valued NNs has been very competitive against that of their real-valued counterparts. However, to the best of our knowledge, complex-valued NNs have not been investigated for their potential applications in image processing, which contrasts with the wide adoption of many well-known complex-valued transforms in image processing, e.g. the discrete Fourier transform, the Gabor transform, and the dual-tree complex wavelet transform.

Indeed, complex-valued transforms have deep connections to biological vision and visual perception. It is known that the primate's area V1 (visual cortex) is dominated by complex cells (see e.g. [7]), i.e., cells whose responses are characterized by selectivity to orientation and frequency. Most cells of area V4 were also found to be more similar to V1's complex cells than to simple cells (see e.g. [8]). The receptive fields and responses of complex cells are usually modeled by Gabor wavelets [9]. Furthermore, the so-called phase introduced by the complex-valued representation dominates the perception of visual scenes [10]. Recall that a signal f under a complex-valued transform, denoted by F(f), can be interpreted in terms of two quantities, the magnitude |F(f)| and the phase φ(f):

    F(f) = |F(f)| e^{iφ(f)}.    (1)

It is also known in computer vision that the phase of an image provides sufficient information about the shapes, edges and orientations of objects; most information of an image can be recovered using only the phase of its Fourier transform [11].
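To make the dominance of phase concrete, the following NumPy sketch (ours, not part of the paper's materials) reconstructs an image from only the phase of its Fourier transform, discarding the magnitude, in the spirit of [11]:

```python
import numpy as np

def phase_only_reconstruction(img):
    """Reconstruct an image from only the phase of its Fourier transform,
    replacing the magnitude |F(f)| with a constant. The recovered structure
    (edges, contours) comes from the phase phi(f) alone."""
    spectrum = np.fft.fft2(img)
    phase = np.angle(spectrum)                 # phi(f) of Eq. (1)
    recon = np.fft.ifft2(np.exp(1j * phase))   # unit magnitude everywhere
    return np.real(recon)
```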
The benefits of complex-valued transforms motivated us to investigate the potential of complex-valued NNs for image recovery, and this paper focuses on one core problem: image denoising.

1.2. Merits of complex-valued representation

The main mathematical operations used to build a CNN for image recovery are convolution, activation and residual learning. In the following, we discuss some merits of the complex-valued versions of these operations for image denoising.

Convolution. A 2D complex-valued filter has a special structure which differs from that of its real-valued counterpart. Filters with orientation selectivity are usually preferred in image processing, as local image edges are oriented in different directions. Such filters are usually non-separable, i.e., they cannot be expressed as the tensor product of two 1D real-valued filters, except for the ones with horizontal/vertical orientation. In contrast, the tensor product of two 1D complex-valued filters, denoted by a_1 + i b_1 and a_2 + i b_2, is not separable with regard to its real part and imaginary part:

    (a_1 + i b_1)(a_2 + i b_2) = (a_1 a_2 − b_1 b_2) + i (a_1 b_2 + b_1 a_2).    (2)

In other words, complex numbers allow using the tensor product of two 1D complex-valued filters to simulate 2D non-separable filters with different orientations. See Fig. 1 for an illustration using 1D Gabor filters.

Fig. 1. Complex-valued 2D filters generated by the tensor products of all pairs of seven 1D Gabor filters in C^7.

Such a property leads to a more compact form of 2D non-separable filters with fewer degrees of freedom. More specifically, the tensor product of two 1D complex-valued filters defined in C^L leads to two 2D real-valued filters defined in R^{L×L}: one from the real part and the other from the imaginary part. Thus, complex numbers enable using 4L degrees of freedom to generate two 2D non-separable filters in R^{L×L}, which would require 2L^2 degrees of freedom with real numbers. As a result, complex numbers allow a more compact representation of 2D convolution, which helps avoid overfitting. This can be important when designing CNNs for image denoising: the training samples include both ground-truth images and their noisy versions, and while the set of ground-truth images may be sufficient for training, their noisy counterparts can be insufficient, especially when the distribution of noise is complex or spatially varying, so the amount of noisy data required to train a CNN with good generalizability can be overwhelming.
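The following NumPy sketch illustrates this compactness argument; the Gabor parameters and names are illustrative assumptions of ours, not taken from the paper:

```python
import numpy as np

def gabor_1d(length=7, freq=0.25, sigma=1.5):
    """A 1D complex Gabor filter g(t) = exp(-t^2 / (2 sigma^2)) exp(i 2 pi f t)."""
    t = np.arange(length) - length // 2
    return np.exp(-t**2 / (2.0 * sigma**2)) * np.exp(2j * np.pi * freq * t)

g1 = gabor_1d(freq=0.25)   # L = 7 complex entries, i.e. 2L = 14 real freedoms
g2 = gabor_1d(freq=0.10)   # another 2L = 14 real freedoms
K = np.outer(g1, g2)       # tensor product: a 7x7 complex-valued filter

# The real and imaginary parts of K are two oriented, non-separable 7x7
# real filters, obtained from 4L = 28 freedoms instead of 2L^2 = 98.
K_real, K_imag = np.real(K), np.imag(K)
```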
Activation function. There are many candidate activation functions for complex-valued NNs. One is the generalization of the widely-used Rectified Linear Unit (ReLU) from the real domain to the complex domain, defined as CReLU [12]:

    CReLU(z) = ReLU(ℜ(z)) + i · ReLU(ℑ(z)),    (3)

where ℜ(·) and ℑ(·) denote the real part and imaginary part respectively. Recall that one key quantity introduced by the complex-valued representation is the phase, which the real-valued representation lacks. Consider a complex number as in Fig. 2. The CReLU not only has the same activation mechanism as the real-valued ReLU in terms of magnitude, but also has a quite complicated non-linear activation on the phase of the input. The concatenation of such non-linear operations on phase enables the definition of very sophisticated mappings in the phase domain. In other words, a complex-valued NN allows defining the mapping between noisy images and noise-free images in both the amplitude domain and the phase domain.

Fig. 2. Activation on the amplitude and the phase of the input using CReLU.

Residual learning. The complex-valued representation is also related to the residual learning of NNs. It is shown in [6] that using complex numbers in the memory units can facilitate noise-robust retrieval mechanisms on the associative memory. In fact, the widely-used residual block [13] shares a similar architecture with the memory unit, in the sense that in each block the residual is computed and inserted into the "memory" provided by the identity connection [6]. Such a similarity suggests better robustness of the complex-valued residual block over its real-valued counterpart. It also suggests the potential of the complex-valued representation in residual learning for better robustness to noise model inconsistencies, i.e., cases where the noise characteristics of the training samples differ from those of the test images.

1.3. Contributions and significance

The contributions of our paper are three-fold:

• First work that studies complex-valued CNNs for image denoising. In the past years, real-valued CNNs have been the prominent choice for designing deep-learning-based methods for image recovery. Meanwhile, complex-valued representations and transforms are widely used in image processing, including the Gabor transform and the dual-tree complex wavelet transform, and research in biological vision has shown connections between complex-valued transforms and low-level processing in visual perception. This paper is the first that investigates the potential of complex-valued CNNs for low-level vision tasks such as image denoising, and our study shows that the complex-valued CNN has its merits for this task.

• New design of the essential mathematical operations of a denoising CNN in the complex number field. The success of an NN on a problem lies in both the careful design of the architecture and the appropriate choice of essential operations. We developed a complex-valued CNN for image denoising and defined several basic operations in the complex number field to exploit possible advantages over their real-valued counterparts, namely: (i) a compact representation of 2D filters via the tensor product of 1D complex-valued filters, which helps avoid overfitting; (ii) non-linear activation on phase, which helps improve denoising performance; and (iii) residual blocks with better robustness to the noise model inconsistencies that often occur in practice.

• A practical denoising CNN with state-of-the-art performance and good robustness to noise model inconsistencies. Experimental results on standard datasets show that our complex-valued CNN offers an alternative approach to designing effective denoising CNNs. It performs competitively against state-of-the-art real-valued denoising CNNs when the noise model of the training samples exactly fits that of the test images. Moreover, it has advantages over other methods in robustness to noise model inconsistencies, including the cases where the noise levels of the test images are unknown, where the standard deviation of the noise varies among pixels, and where the noise is a mixture of different types.
1.4. Organization of the work

The remainder of the paper is organized as follows. Section 2 gives a literature review of related image denoising methods and of existing complex-valued NNs designed for other applications. The proposed complex-valued denoising CNN is presented in detail in Section 3. In Section 4, experiments are conducted to evaluate the proposed method and compare it to closely-related methods. Section 5 concludes the paper.

2. Related work

2.1. Image denoising

There is abundant literature on image denoising; this section focuses on the deep-learning-based approaches. The very early approaches modeled image denoising as either a filtering problem or a diffusion process; see [14] for a comprehensive survey. In the last two decades, sparsity-based regularization has become one preferred choice for image denoising; it regularizes a noise-free image by exploiting the sparsity prior of the image under certain transforms, such as the complex-valued ridgelet transform [15], wavelets [16] and adaptive dictionaries [17–19]. Another prominent approach is non-local methods (e.g. [20–23]), which are based on the patch recurrence prior of natural images. The BM3D method [20] is arguably the most popular one; it applies collaborative filtering to similar patches. WNNM [24] and TWSC [25] are two other popular non-local methods that exploit the low-rank structure of similar patches for denoising.

Instead of using the sparsity prior or the patch recurrence prior, some approaches learn image priors from visual data. Portilla et al. [26] proposed to learn a Gaussian scale mixture model on the wavelet coefficients of natural images. Roth et al. [27] proposed to learn a high-order Markov random field for modeling natural images. The classic EPLL approach [28] learns a Gaussian mixture model of image patches. Xu et al. [29] proposed to learn a distribution prior on groups of similar patches. Instead of learning image priors, an alternative is to directly learn the denoising process. Schmidt et al. [30] unfolded the variational model of image denoising into a process with learnable filter and shrinkage parameters. Chen et al. [31] turned the diffusion process into a trainable one. Such approaches can indeed be viewed as training an NN for denoising.

Recently, many NN-based image denoisers have been proposed [1]. Jain et al. [32] trained a shallow CNN for denoising. Burger et al. [33] trained a multi-layer perceptron to denoise image patches. Agostinelli et al. [34] trained a denoising auto-encoder for removing different types of noise. Vemulapalli et al. [35] unfolded the Gaussian conditional random field into a deep NN with automatic estimation of the noise variance. Zhang et al. [36] proposed a deep CNN called DnCNN, with residual learning for blind denoising; DnCNN is trained to map noisy images to the noise, which improves the robustness of the NN to different noise levels, and it has since become the benchmark method for evaluating CNN-based denoisers. Zhang et al. [37] further extended this work to deal with spatially-varying noise by feeding a tunable noise level map as an additional input to the CNN. To obtain more training images for blind denoising, Chen et al. [38] trained a generative adversarial network (GAN) that estimates the noise distribution of the input noisy images and generates noisy image samples as additional training data. Lefkimmiatis [39] inserted non-local filtering layers into the CNN to exploit the inherent patch recurrence of natural images. All the above CNNs are real-valued.

2.2. Complex-valued convolutional networks

The early works on complex-valued NNs mainly focus on the basics of learning; see [3,5,40] for more details. In recent years, there have been extensive studies on complex-valued CNNs. Oyallon and Mallat [41] constructed a learning-free CNN with well-designed complex-valued wavelet filters; the resulting complex-valued CNN has a solid mathematical treatment but limited adaptivity, as it is not learnable. Some mathematical understanding of a trainable complex-valued CNN was then presented in [42]. Practical techniques for building trainable complex-valued CNNs were comprehensively studied and discussed in [3,12]. Several complex-valued CNNs were developed for gaining certain invariances: Chintala et al. [43] proposed a complex-valued CNN with scale invariance; a similar architecture was proposed in [44]; and Worrall et al. [45] replaced regular CNN filters with circular harmonics for invariance to complicated rotations. We note that the applications of the above complex-valued CNNs all focus on recognition tasks.

3. Proposed method

3.1. Framework

The complex-valued CNN proposed in this paper is called CDNet (Complex-valued Denoising Network). The CDNet maps a noisy image Y to a noise-free image X:

    CDNet : Y ∈ R^{M1×M2} → X ∈ R^{M1×M2}.    (4)

See Fig. 3 for the outline of CDNet. Briefly, the input image is passed through 24 sequentially-connected convolutional units. Each convolutional unit except the first contains a complex-valued convolutional layer, sequentially followed by complex-valued batch normalization and the complex-valued ReLU. All the convolutional layers use 64 convolutional kernels. The middle 18 convolutional units are implemented as 9 residual blocks [13] equipped with complex-valued representations for better performance and faster convergence [13].
Fig. 3. Diagram of the framework of CDNet. Abbreviations: Conv for Convolution, ↓2 for downsampling by stride 2, BN for Batch Normalization, RB for Residual Block, and ReLU for Rectified Linear Unit.

To enlarge the receptive field for further improvement and better computational efficiency, we adopt a convolutional/deconvolutional layer with stride 2 in the convolutional units before the first residual block and after the last residual block, for down-scaling/up-scaling the feature maps. Finally, a merging layer is employed to transform the complex-valued features from the last convolutional unit into a real-valued image. We also use a skip connection between the input of the first residual block and the output of the last residual block for preserving image details. In summary, there are five basic blocks in CDNet: (i) the complex-valued convolutional layer; (ii) the complex-valued ReLU; (iii) complex-valued batch normalization; (iv) the complex-valued residual block; and (v) the merging layer.

3.2. Complex-valued convolutional layer

The complex-valued convolutional layer is constructed by simply replacing the real-valued kernels with complex-valued kernels in the convolution process. The layer takes a complex-valued feature cube A as input and outputs another complex-valued feature cube Ã:

    Conv : A ∈ C^{N1×N2×D1} → Ã ∈ C^{N1×N2×D2}.    (5)

More specifically, the layer is composed of D2 convolution operations with the complex-valued filters {K_i ∈ C^{L×L×D1}}_{i=1}^{D2}, each of which extends through the full depth of A (i.e. D1). During the forward pass, each kernel K_i is convolved across the width and height of A as follows:

    (A ∗ K_i)(x, y) = Σ_{x0,y0,z0} A(x − x0, y − y0, z0) K_i(x0, y0, z0),    (6)

which produces a 2-dimensional feature map for K_i. Stacking the feature maps of all filters along the depth dimension forms the full output. In practice, we set L = 3 for all the convolutional layers.

In all residual blocks, the 2D convolution is implemented by two consecutive 1D convolutions. Concretely, we use the following scheme:

    A ∗ K_i → A ∗ k_i^1 ∗ k_i^2,    (7)

where k_i^1 ∈ C^{L×1×D} and k_i^2 ∈ C^{1×L×D}. Such a factorization represents the 2D convolution in a more compact way, i.e., the number of parameters of each convolutional layer is reduced from DL^2 to 2DL, where D is the number of channels.

Many existing CNN toolboxes do not support complex-valued convolutions. We implement the complex-valued convolution using the real-valued convolutions available in existing toolboxes. The complex-valued convolution can be expressed as

    A ∗ K = (ℜ(A) ∗ ℜ(K) − ℑ(A) ∗ ℑ(K)) + i (ℜ(A) ∗ ℑ(K) + ℑ(A) ∗ ℜ(K)).    (8)

It can be seen from (8) that the complex-valued convolution can be implemented by four real-valued convolutions. Thus, each convolutional layer is the mapping

    [ℜ(·) ∈ R^{N1×N2×D}; ℑ(·) ∈ R^{N1×N2×D}] → [ℜ(·) ∈ R^{N1×N2×D}; ℑ(·) ∈ R^{N1×N2×D}],    (9)

which allows using existing real-number-based toolboxes. Such an implementation is similar to a double-width CNN. The main difference is that convolution in the complex number field introduces additional interactions between the real part and the imaginary part of a complex-valued feature. Such interactions could be implemented in a real-valued NN with additional connections, but no real-valued NN would introduce such connections without the motivation from complex-valued convolutions.

The back-propagation through the complex-valued convolution kernels is similar to that of their real-valued counterparts, except that the related operations are defined on complex numbers. More specifically, let K and A denote a complex-valued kernel and an input complex-valued feature map respectively. Let B = A ∗ K and let f(B) be a scalar function on B; this suffices to cover the gradients encountered in the training of complex-valued CNNs. By the chain rule in complex analysis, we have

    ∂f(B)/∂K = (∂f(B)/∂B)(∂B/∂K) = (∂f(B)/∂B) ∗ A.    (10)

Note that ∂f(B)/∂B and A are both complex-valued, and thus ∂f(B)/∂K is also complex-valued, with the form ∂f(B)/∂K = ℜ(∂f(B)/∂K) + i · ℑ(∂f(B)/∂K). Based on (8) we have

    ℜ(∂f(B)/∂K) = ℜ(∂f(B)/∂B) ∗ ℜ(A) − ℑ(∂f(B)/∂B) ∗ ℑ(A),    (11)

    ℑ(∂f(B)/∂K) = ℜ(∂f(B)/∂B) ∗ ℑ(A) + ℑ(∂f(B)/∂B) ∗ ℜ(A).    (12)
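As a minimal illustration of Eqs. (7) and (8) (a sketch of ours using SciPy on single-channel arrays; the paper's actual implementation is in TensorFlow), the complex-valued convolution can be realized with four real-valued convolutions:

```python
import numpy as np
from scipy.signal import convolve2d

def complex_conv2d(A, K):
    """Complex-valued 2D convolution realized by the four real-valued
    convolutions of Eq. (8): A*K = (Ar*Kr - Ai*Ki) + i(Ar*Ki + Ai*Kr)."""
    Ar, Ai = np.real(A), np.imag(A)
    Kr, Ki = np.real(K), np.imag(K)
    real = convolve2d(Ar, Kr, mode="same") - convolve2d(Ai, Ki, mode="same")
    imag = convolve2d(Ar, Ki, mode="same") + convolve2d(Ai, Kr, mode="same")
    return real + 1j * imag

def factorized_complex_conv2d(A, k1, k2):
    """Factorized scheme of Eq. (7): the 2D kernel is replaced by two
    consecutive 1D complex convolutions, k1 of shape (L, 1) and k2 of
    shape (1, L), reducing the parameters from L^2 to 2L per kernel."""
    return complex_conv2d(complex_conv2d(A, k1), k2)
```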
3.3. CReLU for complex numbers

The ReLU is arguably the prominent choice of activation function in CNNs for image recovery, and it is also used in CDNet. As in the real-valued case, a complex-valued ReLU (CReLU) activation is an element-wise mapping

    CReLU : C^{N1×N2×D} → C^{N1×N2×D},    (13)

    CReLU(A)(k) = CReLU(A(k)).    (14)

There are many choices for such CReLUs. In this paper, we propose to use the CReLU of [12], which enables sophisticated non-linear operations on the phase and is defined by

    CReLU(z) = ReLU(ℜ(z)) + i · ReLU(ℑ(z)),    (15)

for a complex-valued vector z. The CReLU applies the ReLU activation to the real part and the imaginary part of the input respectively. It can be seen from Fig. 2 that the CReLU allows four different patterns in the ring of phase. In addition to CReLU, there are also other options. One is the Modular ReLU (ModReLU) [3], defined by

    ModReLU(z) = (|z| + b) · z/|z|  if |z| + b ≥ 0,  and 0 if |z| + b < 0,    (16)

where b ∈ R is a trainable bias parameter. The ModReLU maintains the phase after the activation. It is implemented by feeding the magnitude to the real-valued ReLU, followed by modulation with the original phase. Another choice is the zReLU [3], defined by

    zReLU(z) = z  if φ(z) ∈ [0, π/2],  and 0 otherwise,    (17)

where φ(·) denotes the phase. The zReLU is activated when both the real part and the imaginary part are positive. It is implemented by

    zReLU(z) = ReLU(ℜ(z) · ℑ(z))/ℑ(z) + i · ReLU(ℜ(z) · ℑ(z))/ℜ(z).    (18)

It will be shown in the experiments that, with a simple yet effective nonlinear operation on phase, CReLU yields better results than ModReLU and zReLU.

3.4. Complex-valued batch normalization

Batch normalization is a commonly-used module for better generalization performance as well as better convergence of training, denoted by

    BN : C^{N1×N2×D} → C^{N1×N2×D}.    (19)

We adapt batch normalization to the complex number field by running the standard batch normalization separately on the real and imaginary parts:

    BN(z) = ReBN(ℜ(z)) + i · ReBN(ℑ(z)),    (20)

where ReBN(·) is the standard batch normalization with real-valued input.

3.5. Complex-valued residual block

Residual blocks [13] not only deal with vanishing gradients in back-propagation during training, but also benefit the preservation of image details in denoising by passing earlier features to subsequent layers. The generalization of residual blocks to the complex number field is straightforward; see Fig. 4 for an illustration of the structure of the complex-valued residual block. The residual block shares similarity with the memory unit, in the sense that the residual is computed and inserted into the "memory" provided by the identity connection [6]. It is shown in [6] that introducing complex numbers into the memory units can facilitate noise-robust retrieval mechanisms on the associative memory. Therefore, the extension of the residual block to the complex number field benefits the robustness of the CNN to noise model inconsistencies.

Fig. 4. Diagram of the complex-valued residual block. Abbreviations: Conv for Convolution, RB for Residual Block, and ReLU for Rectified Linear Unit.

3.6. Merging layer

The merging layer transforms the complex-valued feature maps into the output real-valued image:

    Merging : C^{M1×M2×D} → R^{M1×M2}.    (21)

Given a feature cube A ∈ C^{M1×M2×D} as input, we first apply a convolution with an L × L × D kernel to map A to a feature map B ∈ C^{M1×M2}. The conversion of B into a real-valued image X ∈ R^{M1×M2} is done as follows:

    X = sqrt(ℜ(B)^2 + ℑ(B)^2).    (22)

In other words, the conversion keeps only the amplitude of the complex-valued signal.

3.7. Training

Given the training data {(Y_k, X̄_k)}_{k=1}^{K}, where Y_k and X̄_k denote the k-th noisy image (patch) and its ground truth respectively, let θ denote the vector encoding all parameters of the CDNet. The loss function for training is simply the mean squared error:

    ℓ(θ) := (1/K) Σ_{k=1}^{K} ‖CDNet(Y_k; θ) − X̄_k‖_2^2.    (23)

The weights are initialized with the Xavier scheme, and the training loss is optimized by Adam.
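For concreteness, below is a minimal NumPy sketch (ours) of the building blocks defined in Sections 3.3–3.6. It omits the trainable affine parameters of batch normalization and simplifies batch statistics to whole-array statistics, and the layer ordering inside the residual block is our assumption from Fig. 4; it illustrates the definitions rather than reproducing the paper's implementation:

```python
import numpy as np

def relu(x):
    """Real-valued ReLU."""
    return np.maximum(x, 0.0)

def crelu(z):
    """CReLU of Eq. (15): ReLU applied separately to real and imaginary parts."""
    return relu(np.real(z)) + 1j * relu(np.imag(z))

def modrelu(z, b):
    """ModReLU of Eq. (16): thresholds the magnitude by a trainable bias b,
    keeping the phase unchanged."""
    mag = np.abs(z)
    return z * relu(mag + b) / np.maximum(mag, 1e-12)  # guard against |z| = 0

def zrelu(z):
    """zReLU implemented directly from its definition in Eq. (17): pass z
    only when its phase lies in [0, pi/2], i.e. both parts are non-negative."""
    keep = (np.real(z) >= 0.0) & (np.imag(z) >= 0.0)
    return np.where(keep, z, 0.0 + 0.0j)

def complex_batch_norm(z, eps=1e-5):
    """Complex BN of Eq. (20): real-valued normalization run separately on the
    real and imaginary parts (affine terms omitted in this sketch)."""
    def re_bn(x):
        return (x - x.mean()) / np.sqrt(x.var() + eps)
    return re_bn(np.real(z)) + 1j * re_bn(np.imag(z))

def merging_layer(z):
    """Merging of Eq. (22): keep only the amplitude sqrt(Re(z)^2 + Im(z)^2)."""
    return np.abs(z)

def residual_block(x, conv1, conv2):
    """Complex-valued residual block: a learned residual added onto the
    identity 'memory' path (Conv-BN-CReLU-Conv ordering assumed)."""
    return x + conv2(crelu(complex_batch_norm(conv1(x))))
```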
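A minimal TensorFlow 2 training-step sketch for the loss of Eq. (23) (ours; it assumes a Keras-style `model` callable, and maps the Xavier initialization to TensorFlow's GlorotUniform):

```python
import tensorflow as tf

def mse_loss(model, noisy_batch, clean_batch):
    """Mean squared error of Eq. (23), up to a constant factor: the average
    squared difference between the denoised output and the ground truth."""
    pred = model(noisy_batch, training=True)
    return tf.reduce_mean(tf.square(pred - clean_batch))

# The paper decays the learning rate during training (e.g. 1e-2 down to
# 1e-3 or 1e-4 depending on the setting); a fixed start value is used here.
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-2)

@tf.function
def train_step(model, noisy_batch, clean_batch):
    """One Adam step on the MSE loss. Xavier initialization corresponds to
    tf.keras.initializers.GlorotUniform() when building the model's layers."""
    with tf.GradientTape() as tape:
        loss = mse_loss(model, noisy_batch, clean_batch)
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss
```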
4. Experiments

We applied the CDNet to image denoising under different settings, including non-blind removal of additive white Gaussian noise (AWGN), blind AWGN removal, and removal of spatially-varying noise. The quantitative results reported below are average values over multiple runs. The CDNet is implemented in TensorFlow with CUDA acceleration, and the experiments were carried out on a workstation with a 3.2GHz Intel Core i7-8700 CPU, 32GB RAM and an NVIDIA GeForce RTX 2080Ti GPU. Throughout the experiments, all convolutional layers use 3 × 3 kernels with zero padding of length 2. The number of channels is 64 for all convolutional layers, except for the last convolutional layer, where it equals the number of channels of the input image. As described in Section 3, there are in total 24 convolutional layers in CDNet. The depth and width parameters were simply set by trying common values used in existing methods, such that the resulting model has a size similar to standard benchmark methods, e.g. DnCNN; we did not fine-tune the depth and width, as this is very time-consuming.

4.1. Non-blind removal of AWGN

4.1.1. Test methodology

In this setting, we aim at denoising images degraded by AWGN of known noise levels. Since the noise levels are known, the denoisers are separately trained and tested on each noise level. For the training of CDNet, we follow [31,36] for fair comparison, using 400 BSD images [36] of size 180 × 180. As in [36], the images are cut into patches of size 40 × 40 for data augmentation, and 226800 of them are sampled and degraded by AWGN for training. Both low and high noise levels are used, namely σ = 15, 25, 35, 50, 60, 70, 75, 80. For each noise level, the CDNet model is trained for 70 epochs with the learning rate decaying from 1 × 10^{−2} to 1 × 10^{−3}; training takes around 9 h per noise level. For testing, we select two widely-used benchmark datasets, Set12 [36] and BSD68 [27]. There is no overlap between the test images and the training images, and the content of the test images has sufficient variation and differs from that of the training images. All the images used in these experiments are gray-scale. For each of the above noise levels, we corrupt the test images with the corresponding AWGN, and the CDNet trained on that level is used to denoise them. The peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) of the denoised images are used to quantify performance.
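For reference, a small sketch (ours) of the AWGN corruption and the PSNR metric used throughout this protocol, assuming images in the [0, 255] range:

```python
import numpy as np

def add_awgn(clean, sigma, rng=None):
    """Corrupt an image (assumed range [0, 255]) with AWGN of std sigma."""
    rng = rng or np.random.default_rng()
    return clean + sigma * rng.standard_normal(clean.shape)

def psnr(clean, denoised, peak=255.0):
    """Peak signal-to-noise ratio (dB) between ground truth and an estimate."""
    mse = np.mean((np.asarray(clean, float) - np.asarray(denoised, float)) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)
```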
Table 1
Average PSNR(dB) of denoised images by different methods on Set12 in non-blind AWGN
removal.

σ 15 25 35 50 60 70 75 80

BM3D 32.37 29.97 28.40 26.72 25.95 25.25 24.93 24.63


WNNM 32.69 30.25 28.69 27.05 26.20 25.50 25.19 24.91
EPLL 32.14 29.69 28.11 26.47 25.60 24.89 24.59 24.30
TNRD 32.50 30.05 n/a 26.81 n/a n/a n/a n/a
IRCNN 32.77 30.38 28.80 27.14 n/a n/a n/a n/a
DnCNN 32.85 30.43 28.82 27.17 26.27 25.67 25.33 25.01
TWSC 32.60 30.18 28.63 26.96 26.08 25.34 25.00 24.67
FFDNet 32.75 30.43 28.92 27.32 26.54 25.83 25.49 25.22
CDNet 32.87 30.53 28.99 27.38 26.58 25.89 25.58 25.30

Table 2
Average PSNR(dB) of denoised images by different methods on BSD68 in non-blind
AWGN removal.

σ 15 25 35 50 60 70 75 80

BM3D 31.07 28.57 27.08 25.62 25.07 24.52 24.28 24.05


WNNM 31.37 28.83 27.30 25.87 25.16 24.63 24.38 24.15
EPLL 31.21 28.68 27.21 25.67 25.01 24.43 24.18 23.95
TNRD 31.42 28.92 n/a 25.97 n/a n/a n/a n/a
IRCNN 31.63 29.15 27.66 26.19 n/a n/a n/a n/a
DnCNN 31.73 29.23 27.69 26.23 25.41 24.87 24.70 24.44
TWSC 31.28 28.76 27.25 25.77 25.04 24.44 24.17 23.93
SF-20L 31.29 28.82 n/a 26.02 n/a n/a 24.43 n/a
FFDNet 31.63 29.19 27.73 26.29 25.62 25.05 24.79 24.55
CDNet 31.74 29.28 27.77 26.36 25.67 25.10 24.85 24.63

Table 3
Average SSIM of denoised images by different methods on Set12 in non-blind AWGN removal.

σ 15 25 35 50 60 70 75 80

BM3D 0.8963 0.8509 0.8111 0.7661 0.7377 0.7088 0.6980 0.6860


WNNM 0.8938 0.8457 0.8071 0.7562 0.7289 0.7039 0.6932 0.6838
EPLL 0.8938 0.8457 0.8071 0.7562 0.7289 0.7039 0.6932 0.6838
IRCNN 0.9008 0.8601 0.8256 0.7804 n/a n/a n/a n/a
DnCNN 0.9027 0.8618 0.8259 0.7827 0.7369 0.7026 0.6842 0.6710
TWSC 0.8989 0.8549 0.8192 0.7731 0.7454 0.7200 0.7086 0.6974
FFDNet 0.9029 0.8641 0.8316 0.7906 0.7673 0.7451 0.7332 0.7214
CDNet 0.9034 0.8646 0.8328 0.7924 0.7696 0.7494 0.7393 0.7308

4.1.2. Results and comparison

We compare our CDNet with both classic and state-of-the-art methods, including BM3D [20], WNNM [24], EPLL [28], TNRD [31], DnCNN [36], IRCNN [2], SF-20L [46], UNLNet [39], TWSC [25] and FFDNet [37]. These methods cover different types of image denoisers: BM3D, WNNM, EPLL and TWSC are four representative traditional methods, while the others are recent deep-learning-based methods. The reported results of the compared methods are quoted from the published works whenever possible; otherwise they are produced by the codes published by the original authors. The testing and (where applicable) training of these compared methods are done in the same manner as ours, which makes the comparison fair.

Tables 1 and 2 show the PSNR values of all the compared methods at different noise levels on the Set12 and BSD68 datasets respectively. The corresponding SSIM values are given in Tables 3 and 4. It can be seen that our CDNet performs better than the other compared methods on both light AWGN and heavy AWGN. We show some denoising results in Figs. 5 and 6 for visual comparison.
Table 4
Average SSIM of denoised images by different methods on BSD68 in non-blind AWGN removal.

σ 15 25 35 50 60 70 75 80

BM3D 0.8744 0.8044 0.7511 0.6931 0.6627 0.6363 0.6248 0.6137


WNNM 0.8780 0.8100 0.7553 0.6984 0.6660 0.6458 0.6333 0.6213
EPLL 0.8824 0.8120 0.7558 0.6915 0.6581 0.6307 0.6181 0.6073
IRCNN 0.8881 0.8249 0.7746 0.7171 n/a n/a n/a n/a
DnCNN 0.8906 0.8278 0.7765 0.7189 0.6422 0.5998 0.5785 0.5657
TWSC 0.8782 0.8077 0.7530 0.6903 0.6573 0.6293 0.6168 0.6050
FFDNet 0.8902 0.8295 0.7815 0.7261 0.6959 0.6697 0.6579 0.6443
CDNet 0.8916 0.8314 0.7833 0.7272 0.6974 0.6737 0.6619 0.6518

Fig. 5. Denoising results of an image from BSD68 dataset in non-blind AWGN removal with noise level σ = 60.

Fig. 6. Denoising results of image “starfish” in non-blind AWGN removal with noise level σ = 60.

4.2. Blind AWGN removal

4.2.1. Test methodology

In practice, blind AWGN removal (i.e. denoising without knowing the noise level) is more valuable. We evaluate our CDNet on blind AWGN removal on the previously-used Set12 and BSD68 datasets. The noisy images are generated as follows: given a clean image, a noise level σ is randomly picked from [5,80], and AWGN with that σ is added to the image. For the training of CDNet for blind denoising, we follow the scheme used in [39], which divides the noise level range σ ∈ [5,80] into three intervals, [5,30], [30,55] and [55,80]; the CDNet is then trained on these intervals respectively. The aforementioned 400 BSD images are used for training. As in the previous experiment, the images are cut into 40 × 40 patches for data augmentation, and 226800 of them are sampled and degraded by AWGN with levels randomly chosen from the interval. The CDNet model is trained for 140 epochs with the learning rate decaying from 1 × 10^{−2} to 1 × 10^{−4}; training takes around 15 h. For testing, we generate images corrupted by AWGN with σ = 15, 25, 35, 50, 60, 70, 75, 80 respectively and report the results for each noise level.

In addition to gray-scale images, we also train our model for denoising color images with known noise levels. We use a color version of the Berkeley segmentation dataset, of which 432 color images are used for training and the remaining 68 images form the test set. As in gray-scale image denoising, the color images are cut into patches of size 40 × 40 for training. The models are trained at three noise levels, σ = 25, 35, 50, with the number of epochs per training fixed at 51. We compare our CDNet with CBM3D (BM3D for color images) and DnCNN [36].
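A sketch (ours) of the per-interval noise-level sampling used to build blind training batches; the interval endpoints follow the protocol above:

```python
import numpy as np

# The three noise-level intervals used for per-interval blind training,
# following the protocol above (after [39]).
INTERVALS = [(5.0, 30.0), (30.0, 55.0), (55.0, 80.0)]

def blind_training_pair(clean_patch, interval, rng=None):
    """Corrupt a clean 40x40 patch with AWGN whose level sigma is drawn
    uniformly from the given interval, so no fixed sigma is ever assumed."""
    rng = rng or np.random.default_rng()
    sigma = rng.uniform(*interval)
    noisy = clean_patch + sigma * rng.standard_normal(clean_patch.shape)
    return noisy, clean_patch
```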
4.2.2. Results and comparison

Results on gray-scale image denoising. We compare our CDNet with two CNN-based denoisers that support blind denoising, DnCNN [36] and UNLNet [39]; BM3D [20] is also used for comparison. The testing and (where applicable) training of these compared methods are done in the same blind manner as ours. The PSNR results are summarized in Tables 5 and 6, and the SSIM results are given in Tables 7 and 8. The results of the compared methods are quoted from Lefkimmiatis [39] whenever available, and otherwise produced by the codes published by the authors of the original works. Since UNLNet has no available results on Set12 and no published code, we compare with it only on BSD68. It can be seen that our method is powerful in the task of blind AWGN removal: on both datasets, our CDNet achieved the best results among all the compared methods. We visualize some denoising results in Figs. 7 and 8. Compared to the other methods, CDNet better handles both textured regions and smooth regions. The improvement of CDNet over the compared methods demonstrates the benefit of using complex numbers in a denoising CNN for better generalization to unknown noise levels, an often-seen type of noise model inconsistency. It can also be observed that the PSNR improvement of CDNet tends to grow as the noise level increases. The reason is probably that higher unknown noise levels cause worse inconsistencies of the noise models, and our CDNet has better robustness to those inconsistencies.
Table 5
Average PSNR(dB) of denoised images by different methods on Set12 in blind AWGN
removal.

σ 15 25 35 50 60 70 75 80

BM3D 32.37 29.97 28.40 26.72 25.95 25.25 24.93 24.63


DnCNN 32.71 30.29 28.69 27.11 26.25 25.53 25.22 24.85
CDNet 32.79 30.47 28.71 27.34 26.46 25.87 25.55 25.29

Table 6
Average PSNR(dB) of denoised images by different methods on BSD68 in blind AWGN
removal.

σ 15 25 35 50 60 70 75 80

BM3D 31.07 28.57 27.08 25.62 25.07 24.52 24.28 24.05


UNLNet 31.47 28.96 27.50 26.04 n/a n/a n/a n/a
DnCNN 31.61 29.11 27.54 26.17 25.44 24.85 24.61 24.32
CDNet 31.64 29.18 27.61 26.27 25.56 25.05 24.78 24.59

Table 7
Average SSIM of denoised images by different methods on Set12 in blind AWGN removal.

σ 15 25 35 50 60 70 75 80

BM3D 0.8963 0.8509 0.8111 0.7661 0.7377 0.7088 0.6980 0.6860


DnCNN 0.8970 0.8499 0.8191 0.7616 0.7417 0.7039 0.6848 0.6671
CDNet 0.9022 0.8616 0.8211 0.7894 0.7634 0.7467 0.0751 0.7272

Table 8
Average SSIM of denoised images by different methods on BSD68 in blind AWGN removal.

σ 15 25 35 50 60 70 75 80

BM3D 0.8744 0.8044 0.7511 0.6931 0.6627 0.6363 0.6248 0.6137


DnCNN 0.8803 0.7977 0.7611 0.6808 0.6616 0.6276 0.6085 0.5894
CDNet 0.8911 0.8228 0.7764 0.7140 0.6956 0.6627 0.6476 0.6411

Fig. 7. Denoising results on two noisy images in blind AWGN removal with noise level σ = 60.

Fig. 8. Denoising results on two noisy images in blind AWGN removal with noise level σ = 75.

Fig. 9. Denoising results on a color image with noise level σ = 50.

Results on color image denoising. Table 9 shows the PSNR and SSIM values of all the compared methods at different noise levels on the color version of the BSD68 dataset. It can be seen that our CDNet is the top performer among all the compared methods. We show some denoising results in Fig. 9 for visual comparison.

Table 9
Average PSNR(dB) and SSIM results of color image denoising by different methods on BSD68 in non-blind AWGN removal.

Method  PSNR (σ = 25 / 35 / 50)   SSIM (σ = 25 / 35 / 50)

CBM3D   30.64 28.83 27.31         0.931 0.901 0.870
DnCNN   31.31 29.65 28.01         0.884 0.844 0.792
CDNet   31.34 29.84 28.14         0.937 0.915 0.882

4.3. Removal of spatially-varying noises

4.3.1. Test methodology

We further evaluate the performance of our CDNet on blindly removing spatially-varying noise. We use a setting similar to [38], with two types of spatially-varying noise considered. The first is AWGN with spatially-varying high noise levels, under two settings: (i) 70% of the pixels corrupted by N(0, 60) and 30% by N(0, 75); and (ii) 50% of the pixels corrupted by N(0, 60), 35% by N(0, 70) and 15% by N(0, 80). The second type is spatially-varying light AWGN/uniform noise: each pixel is degraded either with AWGN of small variance or with uniform noise in the range [−s, s]. We fix the AWGN to be N(0, 1) on 20% of the pixels and N(0, 0.02) on 70% of the pixels, and set s = 5, 10, 15.

In the blind setting, it is complicated to generate training images that contain the above spatially-mixed noises in all possible combinations. Therefore, we do not re-train our model but instead directly use the one trained for blind AWGN removal during the test. This indeed tests the generalizability and transferability of a trained CNN to the processing of other noise models.

Table 10
Average PSNR(dB) of denoised images on the Set12 and BSD68 datasets by different methods in removing spatially-varying noises. Columns: Mixed AWGN (Setting#1, Setting#2) and AWGN + Uniform (s = 5, 10, 15).

Set12   BM3D   25.55  25.46   40.48  40.20  37.32
        EPLL   25.16  25.05   41.24  40.82  37.89
        WNNM   24.85  24.74   40.21  39.22  36.19
        DnCNN  25.91  25.77   41.14  40.25  39.51
        CDNet  26.15  26.04   44.13  42.52  40.66
BSD68   BM3D   24.79  24.68   41.21  40.68  36.72
        EPLL   24.68  24.58   42.55  41.74  37.73
        WNNM   24.14  24.06   41.02  40.05  35.73
        DnCNN  25.09  24.99   41.89  40.94  39.72
        CDNet  25.31  25.22   45.78  43.21  40.74
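A sketch (ours) of how the two kinds of spatially-varying corruption above can be synthesized; since the paper's N(0, ·) notation is ambiguous in this extraction, the heavy settings below treat the second parameter as a standard deviation and the light settings treat it as a variance:

```python
import numpy as np

def mixed_awgn(clean, fractions, sigmas, rng=None):
    """Spatially-varying AWGN: each pixel's noise level is drawn from `sigmas`
    with probabilities `fractions`. Setting (i): fractions=(0.7, 0.3),
    sigmas=(60, 75); Setting (ii): fractions=(0.5, 0.35, 0.15),
    sigmas=(60, 70, 80)."""
    rng = rng or np.random.default_rng()
    levels = rng.choice(np.asarray(sigmas, float), size=clean.shape, p=fractions)
    return clean + levels * rng.standard_normal(clean.shape)

def light_awgn_plus_uniform(clean, s, rng=None):
    """Light AWGN/uniform mixture: N(0, 1) on 20% of pixels, N(0, 0.02) on 70%,
    and uniform noise in [-s, s] on the remaining 10% (variances here)."""
    rng = rng or np.random.default_rng()
    u = rng.uniform(size=clean.shape)
    gauss_hi = np.sqrt(1.0) * rng.standard_normal(clean.shape)
    gauss_lo = np.sqrt(0.02) * rng.standard_normal(clean.shape)
    uniform = rng.uniform(-s, s, size=clean.shape)
    noise = np.where(u < 0.2, gauss_hi, np.where(u < 0.9, gauss_lo, uniform))
    return clean + noise
```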
4.3.2. Results and comparison

For comparison, we select BM3D [20], WNNM [24], EPLL [28] and DnCNN [36]. The former three are training-free methods, and we run their published codes with their parameters finely tuned. Regarding DnCNN, for a fair comparison with ours, we also use its pre-trained model from the blind AWGN removal for the test. Table 10 summarizes the PSNR results on the removal of spatially-varying noises. Some denoising results are shown in Figs. 10 and 11. Both the PSNR results and the visual results demonstrate the superior performance of CDNet over the other approaches when generalized to handle noise with characteristics different from those of the training samples. Such improved generalizability comes from the better robustness of CDNet to noise model inconsistencies.

Fig. 10. Denoising results of an image from BSD68 dataset in spatially-varying AWGN removal with Setting 1.

Fig. 11. Denoising results of an image from BSD68 dataset in spatially-varying AWGN removal with Setting 2.

4.4. More discussions

4.4.1. Effectiveness of 1D convolution

Recall that in CDNet the convolutional layers of residual blocks are built upon 1D complex-valued convolutions. We replace such convolutional layers with ones that directly use 2D complex-valued convolutions and then test the performance of the modified CDNet in non-blind AWGN removal. Across all results, the PSNR changes are bounded in [−0.06dB, 0.06dB], and the original CDNet outperforms the modified one on more than half of the results. In other words, the complex-valued CNN allows using compact 1D convolutions which have comparable expressibility and even better generalization performance than the 2D ones. See Table 11 for some results.

Table 11
Average PSNR(dB) by CDNet with different modifications in blind AWGN denoising with noise level σ = 60. 'Original': CDNet without modifications; '2D Conv': replace 1D convolutions with 2D ones in residual blocks; 'ModReLU': replace CReLU with ModReLU; 'zReLU': replace CReLU with zReLU; 'Real': replace all complex-valued units with real-valued ones with double the number of channels.

Dataset  Original  2D Conv  zReLU  ModReLU  Real

Set12    26.46     26.43    22.53  26.23    26.11
BSD68    25.56     25.55    22.69  25.40    25.28

4.4.2. ReLU selection

The definition of the CReLU is not unique. Recall that there are another two choices: ModReLU defined by (16) and zReLU defined by (17). We are interested in how these activations perform in denoising. Thus, we replace all CReLUs in CDNet with ModReLUs and zReLUs respectively, and re-conduct the denoising experiments in blind AWGN removal.

See Table 11 for some results. The ModReLU performed worse than CReLU, with a 0.15dB–0.3dB PSNR gap. The reason is probably that the ModReLU keeps the phase unchanged, which limits the expressibility of the complex-valued CNN for denoising. Note that the phase indeed encodes the main image structures and may be corrupted by noise, which should be deliberately treated in denoising. The zReLU performed even much worse than CReLU. We note that the expressibility of zReLU is not as good as that of CReLU, considering that zReLU generates only two different patterns in the ring of phase, with limited operations on the phase; recall from Fig. 2 that CReLU generates four different patterns with richer phase operations. Another disadvantage of zReLU is that its implementation is more complicated than the other two ReLUs, which may increase the difficulty of optimizing the resulting loss.

4.4.3. Benefits of complex-valued architecture

We evaluate the benefits of using complex numbers in a denoising CNN by comparing it to a real-valued version of CDNet. The real-valued version is constructed by replacing all complex-valued units in CDNet with real-valued ones; the number of channels in each convolution is doubled for fairness. The comparison is done on blind AWGN removal with σ = 60, in which the PSNR of the real-valued version is 0.35dB/0.28dB lower than the original CDNet on the Set12/BSD68 dataset. See Table 11 for the results and Fig. 12 for some visual comparison. While a larger real-valued model can gain better expressibility, the side-effect is possible overfitting. In contrast, complex-valued NNs implicitly impose additional regularization on the convolution processes, which helps alleviate overfitting. In other words, CDNet is not simply a double-width real-valued CNN; it has its own specific characteristics, which can lead to better denoising results than its real-valued counterpart. We also evaluated the running time: CDNet is 1.4 times slower than its real-valued counterpart.

Fig. 12. Denoising results of image "Lena" in blind AWGN removal with noise level σ = 60.

5. Conclusion

In this paper, we proposed the CDNet, a complex-valued CNN for image denoising. Introducing complex-valued essential operations to a CNN-based denoiser has several merits: a compact form of 2D non-separable convolution, non-linear activation on phase, and better noise robustness of residual blocks. By exploiting these merits, the proposed CDNet showed good performance on non-blind AWGN removal, as well as advantages on blind AWGN removal and blind removal of noise with spatially-varying standard deviations.
In the past, many studies have shown that complex-valued CNNs can benefit high-level vision tasks such as image recognition, but none had investigated their potential in low-level vision tasks. Our work is the first to show the potential of the complex-valued CNN in a fundamental low-level task, i.e. image denoising. The results in this paper provide strong inspiration for the development of complex-valued CNNs for other low-level vision tasks. Though our method was only tested on real-valued images, with small modifications it can be directly applied to processing complex-valued signals.

In the future, we would like to extend the proposed CDNet to other image recovery problems, especially those involving complex-valued images. In addition, we would like to further refine the architecture and operations of the CDNet for more performance gain in image recovery. One possible direction is designing other non-linear activation functions on phase and introducing convolution-based operations on phase. The design of such functions and operations is challenging: the direct calculation of phase is not numerically stable when the corresponding magnitude is small, and phase is wrapped to [0, 2π) (or [−π, π)) to resolve its periodicity ambiguity. As a result, the NN may suffer from instability of back-propagation and related computational issues in gradient-descent-based training. Thus, it is a better option to design the activation functions and operations on phase without explicitly computing phase as input. As phase has a clear physical meaning, it is highly non-trivial to design such functions and operations with strong physical motivations.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work is supported by National Natural Science Foundation of China (61872151, U1611461), Natural Science Foundation of Guangdong Province (2020A1515011128), and Science and Technology Program of Guangzhou (201802010055). Hui Ji would also like to acknowledge the support of Singapore MOE AcRF (MOE2017-T2-2-156). The authors would like to thank Peikang Lin from South China University of Technology for his help with the experiments.

References

[1] I. Hong, Y. Hwang, D. Kim, Efficient deep learning of image denoising using patch complexity local divide and deep conquer, Pattern Recognit. 96 (2019) 106945.
[2] K. Zhang, W. Zuo, S. Gu, L. Zhang, Learning deep CNN denoiser prior for image restoration, in: Proc. IEEE Conf. Comput. Vision Pattern Recognition, vol. 2, 2017.
[3] N. Guberman, On complex valued convolutional neural networks, arXiv:1602.09046 (2016).
[4] T. Nitta, On the critical points of the complex-valued neural network, in: Proc. Int. Conf. Neural Info. Process., vol. 3, IEEE, 2002, pp. 1099–1103.
[5] A. Hirose, S. Yoshida, Generalization characteristics of complex-valued feedforward neural networks in relation to signal coherence, IEEE Trans. Neural Netw. Learn. Syst. 23 (4) (2012) 541–551.
[6] I. Danihelka, G. Wayne, B. Uria, N. Kalchbrenner, A. Graves, Associative long short-term memory, arXiv:1602.03032 (2016).
[7] B.M. Dow, et al., Functional classes of cells and their laminar distribution in monkey visual cortex, J. Neurophysiol. 37 (1974) 927–946.
[8] J.L. Gallant, J. Braun, D.C. Van Essen, Selectivity for polar, hyperbolic, and cartesian gratings in macaque visual cortex, Science 259 (5091) (1993) 100–103.
[9] J.G. Daugman, Uncertainty relation for resolution in space, spatial frequency, and orientation optimized by two-dimensional visual cortical filters, J. Opt. Soc. Am. A 2 (7) (1985) 1160–1169.
[10] I.E. Gordon, Theories of Visual Perception, Psychology Press, 2004.
[11] A.V. Oppenheim, J.S. Lim, The importance of phase in signals, Proc. IEEE 69 (5) (1981) 529–541.
[12] C. Trabelsi, O. Bilaniuk, Y. Zhang, D. Serdyuk, S. Subramanian, J.F. Santos, S. Mehri, N. Rostamzadeh, Y. Bengio, C.J. Pal, Deep complex networks, in: Proc. Int. Conf. Learning Representations, 2018.
[13] K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, in: Proc. IEEE Conf. Comput. Vision Pattern Recognition, 2016, pp. 770–778.
[14] Y. Zhang, H. Cheng, J. Huang, X. Tang, An effective and objective criterion for evaluating the performance of denoising filters, Pattern Recognit. 45 (7) (2012) 2743–2757.
[15] G. Chen, B. Kégl, Image denoising with complex ridgelets, Pattern Recognit. 40 (2) (2007) 578–585.
[16] G. Chen, T. Bui, A. Krzyźak, Image denoising with neighbour dependency and customized wavelet and threshold, Pattern Recognit. 38 (1) (2005) 115–124.
[17] Z. Hou, Adaptive singular value decomposition in wavelet domain for image denoising, Pattern Recognit. 36 (8) (2003) 1747–1763.
[18] M. Elad, M. Aharon, Image denoising via sparse and redundant representations over learned dictionaries, IEEE Trans. Image Process. 15 (12) (2006) 3736–3745.
[19] J. Wang, M. Wang, X. Hu, S. Yan, Visual data denoising with a unified Schatten-p norm and Lq norm regularized principal component pursuit, Pattern Recognit. 48 (10) (2015) 3135–3144.
[20] K. Dabov, A. Foi, V. Katkovnik, K. Egiazarian, Image denoising by sparse 3-D transform-domain collaborative filtering, IEEE Trans. Image Process. 16 (8) (2007) 2080–2095.
[21] Z. Sun, S. Chen, L. Qiao, A general non-local denoising model using multi-kernel-induced measures, Pattern Recognit. 47 (4) (2014) 1751–1763.
[22] Y. Quan, H. Ji, Z. Shen, Data-driven multi-scale non-local wavelet frame construction and image recovery, J. Sci. Comput. 63 (2) (2015) 307–329.
[23] H. Li, C.Y. Suen, A novel non-local means image denoising method based on grey theory, Pattern Recognit. 49 (2016) 237–248.
[24] S. Gu, Q. Xie, D. Meng, W. Zuo, X. Feng, L. Zhang, Weighted nuclear norm minimization and its applications to low level vision, Int. J. Comput. Vis. 121 (2) (2017) 183–208.
[25] J. Xu, L. Zhang, D. Zhang, A trilateral weighted sparse coding scheme for real-world image denoising, in: Proc. European Conf. Comput. Vision, 2018.
[26] J. Portilla, V. Strela, M.J. Wainwright, E.P. Simoncelli, Image denoising using scale mixtures of Gaussians in the wavelet domain, IEEE Trans. Image Process. 12 (11) (2003) 1338–1351.
[27] S. Roth, M.J. Black, Fields of experts, Int. J. Comput. Vis. 82 (2) (2009) 205.
[28] D. Zoran, Y. Weiss, From learning models of natural image patches to whole image restoration, in: Proc. IEEE Int. Conf. Comput. Vision, IEEE, 2011, pp. 479–486.
[29] J. Xu, L. Zhang, W. Zuo, D. Zhang, X. Feng, Patch group based nonlocal self-similarity prior learning for image denoising, in: Proc. IEEE Int. Conf. Comput. Vision, 2015, pp. 244–252.
[30] U. Schmidt, S. Roth, Shrinkage fields for effective image restoration, in: Proc. IEEE Conf. Comput. Vision Pattern Recognition, 2014, pp. 2774–2781.
[31] Y. Chen, T. Pock, Trainable nonlinear reaction diffusion: a flexible framework for fast and effective image restoration, IEEE Trans. Pattern Anal. Mach. Intell. 39 (6) (2017) 1256–1272.
[32] V. Jain, S. Seung, Natural image denoising with convolutional networks, in: Advances in Neural Inform. Process. Syst., 2009, pp. 769–776.
[33] H.C. Burger, C.J. Schuler, S. Harmeling, Image denoising: can plain neural networks compete with BM3D? in: Proc. IEEE Conf. Comput. Vision Pattern Recognition, IEEE, 2012, pp. 2392–2399.
[34] F. Agostinelli, M.R. Anderson, H. Lee, Adaptive multi-column deep neural networks with application to robust image denoising, in: Advances in Neural Inform. Process. Syst., 2013, pp. 1493–1501.
[35] R. Vemulapalli, O. Tuzel, M.-Y. Liu, Deep Gaussian conditional random field network: a model-based deep network for discriminative denoising, in: Proc. IEEE Conf. Comput. Vision Pattern Recognition, 2016, pp. 4801–4809.
[36] K. Zhang, W. Zuo, Y. Chen, D. Meng, L. Zhang, Beyond a Gaussian denoiser: residual learning of deep CNN for image denoising, IEEE Trans. Image Proc. 26 (7) (2017) 3142–3155.
[37] K. Zhang, W. Zuo, L. Zhang, FFDNet: toward a fast and flexible solution for CNN based image denoising, IEEE Trans. Image Proc. (2018).
[38] J. Chen, J. Chen, H. Chao, M. Yang, Image blind denoising with generative adversarial network based noise modeling, in: Proc. IEEE Conf. Comput. Vision Pattern Recognition, 2018, pp. 3155–3164.
[39] S. Lefkimmiatis, Universal denoising networks: a novel CNN architecture for image denoising, in: Proc. IEEE Conf. Comput. Vision Pattern Recognition, 2018, pp. 3204–3213.
[40] A. Hirose, Complex-valued neural networks: an introduction, in: Complex-Valued Neural Netw.: Theories and App., World Scientific, 2003, pp. 1–6.
[41] E. Oyallon, S. Mallat, Deep roto-translation scattering for object classification, in: Proc. IEEE Conf. Comput. Vision Pattern Recognition, 2015, pp. 2865–2873.
[42] M. Tygert, J. Bruna, S. Chintala, Y. LeCun, S. Piantino, A. Szlam, A mathematical motivation for complex-valued convolutional networks, Neural Comput. 28 (5) (2016) 815–825.
[43] S. Chintala, A. Szlam, Y. Tian, M. Tygert, W. Zaremba, et al., Scale-invariant learning and convolutional networks, Appl. Comput. Harmonic Anal. 42 (1) (2017) 154–166.
[44] M. Wilmanski, C. Kreucher, A. Hero, Complex input convolutional neural networks for wide angle SAR ATR, in: IEEE Global Conf. Signal Inf. Process., IEEE, 2016, pp. 1037–1041.
[45] D.E. Worrall, S.J. Garbin, D. Turmukhambetov, G.J. Brostow, Harmonic networks: deep translation and rotation equivariance, in: Proc. IEEE Conf. Comput. Vision Pattern Recognition, IEEE, 2017, pp. 7168–7177.
[46] C. Godard, K. Matzen, M. Uyttendaele, Deep burst denoising, in: Proc. European Conf. Comput. Vision, 2018.

Yuhui Quan received the Ph.D. degree in computer science from South China University of Technology in 2013. He worked as a postdoctoral research fellow in Mathematics at National University of Singapore from 2013 to 2016. He is currently an associate professor at the School of Computer Science and Engineering of South China University of Technology. His research interests include computer vision, image processing and sparse representation.

Yixin Chen received the [Link]. degree in network engineering from South China University of Technology in 2017. He is currently an M.A. candidate at South China University of Technology. His research interests include computer vision, image processing, and sparse coding.

Yizhen Shao received the [Link]. degree in computer science from South China University of Technology in 2018. He is currently an M.A. candidate at South China University of Technology. His research interests include computer vision, image processing, and sparse coding.

Huan Teng received the [Link]. degree in computer science from South China University of Technology in 2018. He is currently an M.A. candidate at South China University of Technology. His research interests include computer vision, image processing, and sparse coding.

Yong Xu received the B.S., M.S., and Ph.D. degrees in mathematics from Nanjing University, Nanjing, China, in 1993, 1996, and 1999, respectively. He was a Postdoctoral Research Fellow of computer science with South China University of Technology, Guangzhou, China, from 1999 to 2001, where he became a Faculty Member and where he is currently a Professor with the School of Computer Science and Engineering. His current research interests include image analysis, video recognition, and image quality assessment.

Hui Ji received the [Link]. degree in mathematics from Nanjing University in China, the [Link]. degree in Mathematics from National University of Singapore and the Ph.D. degree in Computer Science from the University of Maryland, College Park. In 2006, he joined National University of Singapore as an assistant professor in Mathematics. Currently, he is an associate professor in mathematics at National University of Singapore. His research interests include computational harmonic analysis, optimization, computational vision, image processing and biological imaging.