Complex CNNs for Image Denoising

Yuhui Quan, Yixin Chen, Yizhen Shao, Huan Teng, Yong Xu, Hui Ji

Pattern Recognition 111 (2021) 107639

Article history: Received 13 February 2020; Revised 3 August 2020; Accepted 6 September 2020; Available online 16 September 2020.

Keywords: Complex-valued operations; Convolutional neural network; Image denoising; Deep learning

Abstract: While complex-valued transforms have been widely used in image processing and have deep connections to biological vision systems, complex-valued convolutional neural networks (CNNs) have not yet been applied to image recovery. This paper investigates the potential of complex-valued CNNs for image denoising. A CNN is developed for image denoising with its key mathematical operations defined in the complex number field, so as to exploit the merits of complex-valued operations, including the compactness of convolution given by the tensor product of 1D complex-valued filters, nonlinear activation on phase, and the noise robustness of residual blocks. The experimental results show that the proposed complex-valued denoising CNN performs competitively against existing state-of-the-art real-valued denoising CNNs, with better robustness to possible inconsistencies of noise models between training samples and test images. The results also suggest that complex-valued CNNs provide another promising deep-learning-based approach to image denoising and other image recovery tasks.

© 2020 Elsevier Ltd. All rights reserved.
1.4. Organization of the work

The remainder of the paper is organized as follows. Section 2 reviews related image denoising methods and existing complex-valued NNs designed for other applications. The proposed complex-valued denoising CNN is presented in detail in Section 3. Section 4 presents the experimental evaluation of the proposed method and the comparison with other closely-related methods. Section 5 concludes the paper.

2. Related work

2.1. Image denoising

There is abundant literature on image denoising; this section focuses on deep-learning-based approaches. Very early approaches modeled image denoising as either a filtering problem or a diffusion process; see [14] for a comprehensive survey. In the last two decades, sparsity-based regularization has become a preferred choice for image denoising: a noise-free image is regularized by exploiting its sparsity prior under certain transforms, such as the complex-valued ridgelet transform [15], wavelets [16] and adaptive dictionaries [17–19]. Another prominent line is non-local methods (e.g. [20–23]), which are based on the patch recurrence prior of natural images. The BM3D method [20], which applies collaborative filtering to similar patches, is arguably the most popular one. WNNM [24] and TWSC [25] are two other popular non-local methods that exploit the low-rank structure of similar patches for denoising.

Instead of using the sparsity prior or the patch recurrence prior, some approaches learn image priors from visual data. Portilla et al. [26] proposed to learn a Gaussian scale mixture model on the wavelet coefficients of natural images. Roth and Black [27] proposed to learn a high-order Markov random field for modeling natural images. The classic EPLL approach [28] learns a Gaussian mixture model of image patches. Xu et al. [29] proposed to learn a distribution prior on groups of similar patches. Instead of learning image priors, an alternative approach is to directly learn the denoising process. Schmidt and Roth [30] unfolded the variational model of image denoising into a process with learnable filter and shrinkage parameters. Chen and Pock [31] turned the diffusion process into a trainable one. Such approaches can indeed be viewed as training an NN for denoising.

Recently, many NN-based image denoisers have been proposed [1]. Jain and Seung [32] trained a shallow CNN for denoising. Burger et al. [33] trained a multi-layer perceptron to denoise image patches. Agostinelli et al. [34] trained a denoising auto-encoder for removing different types of noise. Vemulapalli et al. [35] unfolded the Gaussian conditional random field into a deep NN with automatic estimation of the noise variance. Zhang et al. [36] proposed a deep CNN called DnCNN with residual learning for blind denoising. DnCNN is trained to map noisy images to the noise, which improves the robustness of the NN to different noise levels. Nowadays, DnCNN has become the benchmark method for evaluating CNN-based denoisers. Zhang et al. [37] further extended this work to deal with spatially-varying noise by combining a tunable noise level map into the input of the CNN. To obtain more training images for blind denoising, Chen et al. [38] trained a generative adversarial network (GAN) that estimates the noise distribution of the input noisy images and generates noisy image samples as additional training data. Lefkimmiatis [39] inserted non-local filtering layers into the CNN to exploit the inherent patch recurrence of natural images. All the above CNNs are real-valued.

2.2. Complex-valued convolutional networks

Early works on complex-valued NNs mainly focused on the basics of learning; see [3,5,40] for more details. In recent years, complex-valued CNNs have been studied extensively. Oyallon and Mallat [41] constructed a learning-free CNN from well-designed complex-valued wavelet filters. The resulting complex-valued CNN admits a rigorous mathematical treatment, but has limited adaptivity as it is not learnable. Some mathematical understanding of a trainable complex-valued CNN was later presented in [42]. Practical techniques for building trainable complex-valued CNNs were comprehensively studied and discussed in [3,12]. Several complex-valued CNNs have been developed to gain certain invariances: Chintala et al. [43] proposed a complex-valued CNN with scale invariance, and a similar architecture was proposed in [44]; Worrall et al. [45] replaced regular CNN filters with circular harmonics to achieve invariance to complicated rotations. We note that all the above complex-valued CNNs target recognition tasks.

3. Proposed method

3.1. Framework

The complex-valued CNN proposed in this paper is called CDNet (Complex-valued Denoising Network). CDNet maps a noisy image Y to a noise-free image X:

    CDNet : Y ∈ ℝ^(M1×M2) → X ∈ ℝ^(M1×M2).    (4)

See Fig. 3 for an outline of CDNet. Briefly, the input image is passed through 24 sequentially-connected convolutional units. Every convolutional unit except the first contains a complex-valued convolutional layer followed by complex-valued batch normalization and a complex-valued ReLU. All the convolutional layers use 64 convolutional kernels. The middle 18 convolutional units are implemented as 9 residual blocks [13] equipped with complex-valued representations, for better performance and faster convergence.
Fig. 3. Diagram of the framework of CDNet. Abbreviations: Conv for convolution, ↓2 for downsampling by stride 2, BN for batch normalization, RB for residual block, and ReLU for rectified linear unit.
To enlarge the receptive field, for further improvement and better computational efficiency, we adopt a convolutional/deconvolutional layer with stride 2 in the convolutional units before the first residual block and after the last residual block, which down-scales/up-scales the feature maps. Finally, a merging layer transforms the complex-valued features of the last convolutional unit into a real-valued image. We also use a skip connection from the input of the first residual block to the output of the last residual block to preserve image details. In summary, there are five basic blocks in CDNet: (i) the complex-valued convolutional layer; (ii) the complex-valued ReLU; (iii) complex-valued batch normalization; (iv) the complex-valued residual block; and (v) the merging layer.
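To make this topology concrete, the following is a minimal PyTorch sketch of the data flow, using ordinary real-valued layers as stand-ins for the complex-valued blocks detailed in the following subsections. The class names and any hyper-parameters not stated above (e.g. the exact placement of BN/ReLU) are illustrative assumptions, not the authors' released code.

    import torch
    import torch.nn as nn

    class ResBlock(nn.Module):
        # One residual block = two convolutional units plus an identity shortcut.
        def __init__(self, ch=64):
            super().__init__()
            self.body = nn.Sequential(
                nn.Conv2d(ch, ch, 3, padding=1), nn.BatchNorm2d(ch), nn.ReLU(True),
                nn.Conv2d(ch, ch, 3, padding=1), nn.BatchNorm2d(ch), nn.ReLU(True))
        def forward(self, x):
            return x + self.body(x)

    class CDNetSkeleton(nn.Module):
        # Head conv -> stride-2 down-scaling -> 9 residual blocks (the middle
        # 18 units) -> stride-2 up-scaling -> merging layer, with a long skip
        # connection around the residual blocks.
        def __init__(self, ch=64):
            super().__init__()
            self.head = nn.Conv2d(1, ch, 3, padding=1)
            self.down = nn.Sequential(
                nn.Conv2d(ch, ch, 3, stride=2, padding=1),
                nn.BatchNorm2d(ch), nn.ReLU(True))
            self.blocks = nn.Sequential(*[ResBlock(ch) for _ in range(9)])
            self.up = nn.Sequential(
                nn.ConvTranspose2d(ch, ch, 3, stride=2, padding=1, output_padding=1),
                nn.BatchNorm2d(ch), nn.ReLU(True))
            self.merge = nn.Conv2d(ch, 1, 3, padding=1)   # features -> real image
        def forward(self, y):
            f = self.down(self.head(y))
            f = f + self.blocks(f)          # skip: input of first RB + output of last RB
            return self.merge(self.up(f))

    x = torch.randn(1, 1, 40, 40)           # one 40 x 40 training patch
    print(CDNetSkeleton()(x).shape)         # torch.Size([1, 1, 40, 40])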
3.2. Complex-valued convolutional layer

The complex-valued convolutional layer is constructed by simply replacing the real-valued kernels with complex-valued kernels in the convolution process. The layer takes a complex-valued feature cube A as input and outputs another complex-valued feature cube Ã:

    Conv : A ∈ ℂ^(N1×N2×D1) → Ã ∈ ℂ^(N1×N2×D2).    (5)

More specifically, the layer is composed of D2 convolution operations with complex-valued filters {K_i ∈ ℂ^(L×L×D1)}_{i=1}^{D2}, each extending through the full depth D1 of A. During the forward pass, each kernel K_i is convolved across the width and height of A as follows:

    (A ∗ K_i)(x, y) = Σ_{x0,y0,z0} A(x − x0, y − y0, z0) K_i(x0, y0, z0),    (6)

which produces a 2-dimensional feature map for K_i. Stacking the feature maps of all filters along the depth dimension forms the full output. In practice, we set L = 3 for all the convolutional layers.

In all residual blocks, the 2D convolution is implemented by two consecutive 1D convolutions. Concretely, we use the following scheme:

    A ∗ K_i → A ∗ k_i^1 ∗ k_i^2,    (7)

where k_i^1 ∈ ℂ^(L×1×D) and k_i^2 ∈ ℂ^(1×L×D). Such a factorization represents the 2D convolution in a more compact way: the number of parameters of each convolutional layer is reduced from DL^2 to 2DL, where D is the number of channels.
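As a quick check of this saving under the paper's setting (L = 3, D = 64), and of how the two 1D passes compose, consider the sketch below. The kernel shapes follow the (out, in, height, width) layout of common toolboxes; for a complex filter, the real and imaginary parts would each be factorized in the same way.

    import torch
    import torch.nn.functional as F

    D, L = 64, 3
    print("params per 2D filter:", D * L * L)   # D * L^2 = 576
    print("params per 1D pair  :", 2 * D * L)   # 2 * D * L = 384

    a  = torch.randn(1, D, 40, 40)              # one (real or imaginary) feature cube
    k1 = torch.randn(1, D, L, 1)                # vertical 1D kernel k^1
    k2 = torch.randn(1, 1, 1, L)                # horizontal 1D kernel k^2
    out = F.conv2d(F.conv2d(a, k1, padding=(1, 0)), k2, padding=(0, 1))
    print(out.shape)                            # torch.Size([1, 1, 40, 40])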
Many existing CNN toolboxes do not support complex-valued convolutions, so we implement the complex-valued convolution using the real-valued convolutions available in existing toolboxes. The complex-valued convolution can be expressed as

    A ∗ K = (ℜ(A) ∗ ℜ(K) − ℑ(A) ∗ ℑ(K)) + (ℜ(A) ∗ ℑ(K) + ℑ(A) ∗ ℜ(K)) i.    (8)

It can be seen from (8) that the complex-valued convolution can be implemented by four real-valued convolutions. Thus, each convolutional layer is the mapping

    [ℜ(·) ∈ ℝ^(N1×N2×D), ℑ(·) ∈ ℝ^(N1×N2×D)] → [ℜ(·) ∈ ℝ^(N1×N2×D), ℑ(·) ∈ ℝ^(N1×N2×D)],    (9)

which allows using existing real-number-based toolboxes. Note that such an implementation resembles a double-width CNN. The main difference is that convolution in the complex number field introduces additional interactions between the real part and the imaginary part of a complex-valued feature. Such interactions could be implemented in a real-valued NN with additional connections, but no real-valued NN would introduce such connections without the motivation from complex-valued convolutions.
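The following is a minimal sketch of (8) built on a toolbox's real-valued convolution; tensor shapes follow the usual (batch, channels, height, width) layout, and all variable names are ours. Since the forward pass consists of real convolutions only, the toolbox's automatic differentiation also reproduces the kernel gradients derived next.

    import torch
    import torch.nn.functional as F

    def complex_conv2d(a_re, a_im, k_re, k_im, padding=1):
        # Eq. (8): A*K = (Re(A)*Re(K) - Im(A)*Im(K))
        #               + i (Re(A)*Im(K) + Im(A)*Re(K)),
        # realized with four real-valued convolutions.
        b_re = F.conv2d(a_re, k_re, padding=padding) - F.conv2d(a_im, k_im, padding=padding)
        b_im = F.conv2d(a_re, k_im, padding=padding) + F.conv2d(a_im, k_re, padding=padding)
        return b_re, b_im

    a_re, a_im = torch.randn(1, 64, 40, 40), torch.randn(1, 64, 40, 40)
    k_re, k_im = torch.randn(64, 64, 3, 3), torch.randn(64, 64, 3, 3)
    b_re, b_im = complex_conv2d(a_re, a_im, k_re, k_im)
    print(b_re.shape, b_im.shape)   # torch.Size([1, 64, 40, 40]) twice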
The back-propagation through complex-valued convolution kernels is similar to that of their real-valued counterparts, except that the related operations are defined on complex numbers. More specifically, let K and A denote a complex-valued kernel and an input complex-valued feature map respectively, let B = A ∗ K, and let f(B) be a scalar function of B. This setting covers the gradient calculations encountered in the training of complex-valued CNNs. By the chain rule in complex analysis, we have

    ∂f(B)/∂K = (∂f(B)/∂B)(∂B/∂K) = (∂f(B)/∂B) ∗ A.    (10)

Note that ∂f(B)/∂B and A are both complex-valued, and thus ∂f(B)/∂K is also complex-valued, of the form ∂f(B)/∂K = ℜ(∂f(B)/∂K) + i · ℑ(∂f(B)/∂K). Based on (8), we have

    ℜ(∂f(B)/∂K) = ℜ(∂f(B)/∂B) ∗ ℜ(A) − ℑ(∂f(B)/∂B) ∗ ℑ(A),    (11)
    ℑ(∂f(B)/∂K) = ℜ(∂f(B)/∂B) ∗ ℑ(A) + ℑ(∂f(B)/∂B) ∗ ℜ(A).    (12)

3.3. CReLU for complex numbers

The ReLU is arguably the most prominent choice of activation function in CNNs for image recovery, and it is also used in CDNet. Like its real-valued counterpart, the complex-valued ReLU (CReLU) activation is an element-wise mapping

    CReLU : ℂ^(N1×N2×D) → ℂ^(N1×N2×D),    (13)

    CReLU(A)(k) = CReLU(A(k)).    (14)

There are many possible choices for such a CReLU. In this paper, we adopt the CReLU of [12], which enables sophisticated nonlinear operations on the phase and is defined by

    CReLU(z) = ReLU(ℜ(z)) + i · ReLU(ℑ(z)),    (15)

for a complex number z. The CReLU applies the ReLU activation to the real part and the imaginary part of the input respectively.

Fig. 2. Activation on the amplitude and the phase of the input using CReLU.

It can be seen from Fig. 2 that the CReLU allows four different patterns in the ring of phase. In addition to CReLU, two other choices are ModReLU, defined by (16), and zReLU, defined by (17).
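A sketch of (15) on the (real, imaginary) representation is given below; the toy input visits one point in each quadrant of the complex plane, making the four phase patterns explicit.

    import torch

    def crelu(z_re, z_im):
        # Eq. (15): CReLU(z) = ReLU(Re(z)) + i ReLU(Im(z)), element-wise.
        return torch.relu(z_re), torch.relu(z_im)

    # Depending on the signs of (Re(z), Im(z)), the output keeps z, projects it
    # onto the imaginary or the real axis, or zeroes it: four phase patterns.
    z_re = torch.tensor([1.0,  1.0, -1.0, -1.0])
    z_im = torch.tensor([1.0, -1.0,  1.0, -1.0])
    print(crelu(z_re, z_im))
    # (tensor([1., 1., 0., 0.]), tensor([1., 0., 1., 0.]))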
Residual blocks [13] not only deal with vanishing gradients in back-propagation during training, but also benefit the preservation of image details in denoising by passing previous features to subsequent layers. The generalization of residual blocks to the complex number field is straightforward. See Fig. 4 for an illustration of the structure of the complex-valued residual block. It is noted that the residual block shares similarity with the memory unit, in that the residual is computed and inserted into the "memory" provided by the identity connection [6]. It is shown in [6] that introducing complex numbers into memory units can facilitate noise-robust retrieval mechanisms on the associative memory. Therefore, the extension of the residual block to the complex number field is beneficial to the robustness of the CNN to noise model inconsistencies.
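A minimal sketch of such a complex-valued residual block, under the four-real-convolutions implementation of Section 3.2, is shown below; batch normalization and the separable 1D kernels are omitted for brevity, and all names are illustrative.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class ComplexResBlock(nn.Module):
        # The identity shortcut plays the role of the "memory"; the complex-valued
        # body computes the residual that is inserted into it.
        def __init__(self, ch=64):
            super().__init__()
            self.k_re = nn.Parameter(0.01 * torch.randn(ch, ch, 3, 3))
            self.k_im = nn.Parameter(0.01 * torch.randn(ch, ch, 3, 3))
        def forward(self, x_re, x_im):
            r_re = F.conv2d(x_re, self.k_re, padding=1) - F.conv2d(x_im, self.k_im, padding=1)
            r_im = F.conv2d(x_re, self.k_im, padding=1) + F.conv2d(x_im, self.k_re, padding=1)
            r_re, r_im = torch.relu(r_re), torch.relu(r_im)   # CReLU of Eq. (15)
            return x_re + r_re, x_im + r_im                   # identity + residual

    blk = ComplexResBlock()
    y_re, y_im = blk(torch.randn(1, 64, 20, 20), torch.randn(1, 64, 20, 20))
    print(y_re.shape)   # torch.Size([1, 64, 20, 20])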
4.1. Non-blind AWGN removal

4.1.1. Test methodology

In this setting, we aim at denoising images degraded by AWGN of known noise levels. Since the noise levels are known, the denoisers are separately trained and tested on each noise level. For the training of CDNet, we follow [31,36] for fair comparison and use 400 BSD images [36] of size 180 × 180. Same as [36], the images are cut into patches of size 40 × 40 for data augmentation, and 226800 of them are sampled and degraded by AWGN for training. Both low and high noise levels are used for training, including σ = 15, 25, 35, 50, 60, 70, 75, 80. For each noise level, the CDNet model is trained for 70 epochs with the learning rate decaying from 1 × 10^−2 to 1 × 10^−3. The training takes around 9 h per noise level. For testing, we select two widely-used benchmark datasets, Set12 [36] and BSD68 [27].
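A sketch of this training schedule follows. The paper states only the epoch count and the end-point learning rates; the optimizer (Adam) and the geometric decay shape are our assumptions for illustration.

    import torch

    model = torch.nn.Conv2d(1, 1, 3, padding=1)     # stand-in for CDNet
    opt = torch.optim.Adam(model.parameters(), lr=1e-2)
    epochs = 70                                     # 140 in the blind setting
    sched = torch.optim.lr_scheduler.LambdaLR(
        opt, lr_lambda=lambda e: (1e-3 / 1e-2) ** (e / (epochs - 1)))
    for epoch in range(epochs):
        # ... one pass over the 226800 sampled 40 x 40 noisy/clean patch pairs ...
        opt.step()          # placeholder for the actual inner training loop
        sched.step()
    print(opt.param_groups[0]["lr"])                # ~1e-3 after the last epoch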
Table 1. Average PSNR(dB) of denoised images by different methods on Set12 in non-blind AWGN removal (σ = 15, 25, 35, 50, 60, 70, 75, 80).
Table 2. Average PSNR(dB) of denoised images by different methods on BSD68 in non-blind AWGN removal (σ = 15, 25, 35, 50, 60, 70, 75, 80).
Table 3. Average SSIM of denoised images by different methods on Set12 in non-blind AWGN removal (σ = 15, 25, 35, 50, 60, 70, 75, 80).
There is no overlap between the test images and the training images; the content of the test images has sufficient variation and differs from that of the training images. All images used in these experiments are gray-scale. For each of the above noise levels, we corrupt the test images with the corresponding AWGN, and the CDNet trained on that level is used to denoise them. The peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) of the denoised images are used to quantify performance.

4.1.2. Results and comparison

We compare our CDNet with both classic and state-of-the-art methods, including BM3D [20], WNNM [24], EPLL [28], TNRD [31], DnCNN [36], IRCNN [2], SF-20L [46], UNLNet [39], TWSC [25] and FFDNet [37]. These methods cover different types of image denoisers: BM3D, WNNM, EPLL and TWSC are four representative traditional methods, while the others are recent deep-learning-based methods. The reported results of the compared methods are quoted from the published works whenever possible, and otherwise produced by the codes published by the original authors. The testing and (where applicable) training of the compared methods are done in the same manner as ours, which makes the comparison fair.

Tables 1 and 2 show the PSNR values of all the compared methods at different noise levels on the Set12 and BSD68 datasets respectively. The corresponding SSIM values are given in Tables 3 and 4. It can be seen that our CDNet outperforms the other compared methods on both light and heavy AWGN. We show some denoising results in Figs. 5 and 6 for visual comparison.

4.2. Blind AWGN removal

4.2.1. Test methodology

In practice, blind AWGN removal (i.e. denoising without knowing the noise level) is more valuable. We evaluate our CDNet on blind AWGN removal on the previously-used Set12 and BSD68 datasets. The noisy images are generated as follows: given a clean image, the noise level σ is randomly picked from [5,80], and AWGN with that σ is added to the image. For training CDNet for blind denoising, we follow the scheme used in [39], which divides the noise level range σ ∈ [5,80] into three intervals: [5,30], [30,55] and [55,80]. CDNet is then trained on each interval respectively. The aforementioned 400 BSD images are used for training. Similar to the previous experiment, the images are cut into 40 × 40 patches for data augmentation, and 226800 of them are sampled and degraded by AWGN with levels randomly chosen in the interval. The CDNet model is trained for 140 epochs with the learning rate decaying from 1 × 10^−2 to 1 × 10^−4. The training takes around 15 h. At test time, we generate images corrupted by AWGN with σ = 15, 25, 35, 50, 60, 70, 75, 80 respectively and report the results for each noise level.

In addition to gray-scale images, we also train our model for denoising color images with known noise levels.
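The noisy-image generation and evaluation protocol used throughout these experiments reduces to a few lines of NumPy, as sketched below on the 0–255 intensity scale; we read "randomly picked from [5,80]" in the blind setting as uniform sampling, and everything else is generic.

    import numpy as np

    def add_awgn(img, sigma):
        # img: clean image in [0, 255]; sigma: AWGN standard deviation (same scale).
        return img + sigma * np.random.randn(*img.shape)

    def psnr(clean, denoised, peak=255.0):
        mse = np.mean((clean.astype(np.float64) - denoised.astype(np.float64)) ** 2)
        return 10.0 * np.log10(peak ** 2 / mse)

    clean = 255.0 * np.random.rand(180, 180)     # stand-in for a BSD image
    noisy = add_awgn(clean, sigma=25)            # non-blind: sigma is known
    sigma_blind = np.random.uniform(5, 80)       # blind: sigma drawn from [5, 80]
    print(round(psnr(clean, noisy), 2))          # PSNR of the noisy input itself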
Table 4. Average SSIM of denoised images by different methods on BSD68 in non-blind AWGN removal (σ = 15, 25, 35, 50, 60, 70, 75, 80).
Fig. 5. Denoising results of an image from BSD68 dataset in non-blind AWGN removal with noise level σ = 60.
Fig. 6. Denoising results of image “starfish” in non-blind AWGN removal with noise level σ = 60.
We use a color version of the Berkeley segmentation dataset, of which 432 color images are used for training and the remaining 68 images form the test set. As in gray-scale image denoising, the color images are cut into patches of size 40 × 40 for training. The models are trained at three noise levels, σ = 25, 35, 50, and the number of epochs per training is fixed at 51. We compare our CDNet with CBM3D (BM3D for color images) and DnCNN [36].

4.2.2. Results and comparison

Results on gray-scale image denoising. We compare our CDNet with two CNN-based denoisers that support blind denoising, DnCNN [36] and UNLNet [39]; BM3D [20] is also used for comparison. The testing and (where applicable) training of the compared methods are done in the same blind manner as ours. The PSNR results are summarized in Tables 5 and 6, and the SSIM results in Tables 7 and 8. The results of the compared methods are quoted from Lefkimmiatis [39] whenever available, and otherwise produced with the codes published by the authors of the original works. Since UNLNet has no available results on Set12 and no published code, we compare with it only on BSD68. It can be seen that our method is powerful in the task of blind AWGN removal: on both datasets, CDNet achieves the best results among all the compared methods. We visualize some denoising results in Figs. 7 and 8. Compared to the other methods, CDNet better handles both textured regions and smooth regions. The improvement of CDNet over the compared methods demonstrates the benefit of using complex numbers in a denoising CNN for better generalization to unknown noise levels, an often-seen type of noise model inconsistency. It can also be observed that the PSNR improvement of CDNet tends to grow as the noise level increases. The reason is probably that higher unknown noise levels cause worse inconsistencies of noise models, and CDNet has better robustness to those inconsistencies.
Table 5. Average PSNR(dB) of denoised images by different methods on Set12 in blind AWGN removal (σ = 15, 25, 35, 50, 60, 70, 75, 80).
Table 6. Average PSNR(dB) of denoised images by different methods on BSD68 in blind AWGN removal (σ = 15, 25, 35, 50, 60, 70, 75, 80).
Table 7. Average SSIM of denoised images by different methods on Set12 in blind AWGN removal (σ = 15, 25, 35, 50, 60, 70, 75, 80).
Table 8. Average SSIM of denoised images by different methods on BSD68 in blind AWGN removal (σ = 15, 25, 35, 50, 60, 70, 75, 80).
Fig. 7. Denoising results on two noisy images in blind AWGN removal with noise level σ = 60.
Fig. 8. Denoising results on two noisy images in blind AWGN removal with noise level σ = 75.
Results on color image denoising. Table 9 shows the PSNR and SSIM values of all the compared methods at different noise levels on the color version of BSD68. It can be seen that our CDNet is the top performer among all compared methods. We show some denoising results in Fig. 9 for visual comparison.

Table 9. Average PSNR(dB) and SSIM of color image denoising by different methods on BSD68 in non-blind AWGN removal.

                    PSNR                       SSIM
Method     σ=25     σ=35     σ=50      σ=25     σ=35     σ=50
CBM3D      30.64    28.83    27.31     0.931    0.901    0.870
DnCNN      31.31    29.65    28.01     0.884    0.844    0.792
CDNet      31.34    29.84    28.14     0.937    0.915    0.882

4.3. Removal of spatially-varying noises

4.3.1. Test methodology

We further evaluate the performance of our CDNet on blindly removing spatially-varying noises. We use a setting similar to [38], with two types of spatially-varying noises considered. The first type is AWGN with spatially-varying high noise levels, in two settings: (i) 70% of pixels corrupted by N(0, 60) and 30% of pixels corrupted by N(0, 75); and (ii) 50% of pixels corrupted by N(0, 60), 35% of pixels corrupted by N(0, 70) and 15% of pixels corrupted by N(0, 80). The second type is spatially-varying light AWGN/uniform noise: each pixel is degraded either by AWGN of small variance or by uniform noise in the range [−s, s]. We fix the AWGN to be N(0, 1) on 20% of pixels and N(0, 0.02) on 70% of pixels, and set s = 5, 10, 15.

In the blind setting, it is impractical to generate training images containing all possible combinations of the above spatially-mixed noises. Therefore, we do not re-train our model but directly use the one trained for blind AWGN removal at test time. This tests the generalizability and transferability of a trained CNN to the processing of other noise models.
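How such spatially-mixed noise can be synthesized is sketched below in NumPy. We read N(0, v) as Gaussian noise with standard deviation v, in line with the paper's use of σ for noise levels, and the random pixel-partitioning scheme is our illustrative choice.

    import numpy as np

    def spatially_varying_awgn(img, sigmas, fracs):
        # E.g. setting (i): sigmas=(60, 75), fracs=(0.7, 0.3);
        #      setting (ii): sigmas=(60, 70, 80), fracs=(0.5, 0.35, 0.15).
        out = img.astype(np.float64).copy()
        labels = np.random.choice(len(sigmas), size=img.shape, p=fracs)
        for i, s in enumerate(sigmas):
            m = labels == i
            out[m] += s * np.random.randn(m.sum())
        return out

    def light_awgn_uniform(img, s):
        # Second type: N(0, 1) on 20% of pixels, N(0, 0.02) on 70%,
        # uniform noise in [-s, s] on the remaining 10%.
        out = img.astype(np.float64).copy()
        labels = np.random.choice(3, size=img.shape, p=(0.2, 0.7, 0.1))
        out[labels == 0] += 1.0 * np.random.randn((labels == 0).sum())
        out[labels == 1] += 0.02 * np.random.randn((labels == 1).sum())
        out[labels == 2] += np.random.uniform(-s, s, (labels == 2).sum())
        return out

    img = 255.0 * np.random.rand(180, 180)
    noisy1 = spatially_varying_awgn(img, sigmas=(60, 75), fracs=(0.7, 0.3))
    noisy2 = light_awgn_uniform(img, s=10)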
4.3.2. Results and comparison

For comparison, we select BM3D [20], WNNM [24], EPLL [28] and DnCNN [36]. The former three are training-free methods, and we run their published codes with their parameters finely tuned. For DnCNN, for a fair comparison with ours, we likewise use its model pre-trained for blind AWGN removal in the test. Table 10 summarizes the PSNR results on the removal of spatially-varying noises; some denoising results are shown in Figs. 10 and 11. Both the PSNR results and the visual results demonstrate the superior performance of CDNet over the other approaches when generalized to noise with characteristics different from those of the training samples. Such improved generalizability comes from the better robustness of CDNet to noise model inconsistencies.

Table 10. Average PSNR(dB) of denoised images on Set12 and BSD68 by different methods in removing spatially-varying noises. The first two columns correspond to the high-level AWGN settings (i) and (ii); the last three to the light AWGN/uniform noise with s = 5, 10, 15.

Dataset   Method   (i)      (ii)     s=5      s=10     s=15
Set12     BM3D     25.55    25.46    40.48    40.20    37.32
          EPLL     25.16    25.05    41.24    40.82    37.89
          WNNM     24.85    24.74    40.21    39.22    36.19
          DnCNN    25.91    25.77    41.14    40.25    39.51
          CDNet    26.15    26.04    44.13    42.52    40.66
BSD68     BM3D     24.79    24.68    41.21    40.68    36.72
          EPLL     24.68    24.58    42.55    41.74    37.73
          WNNM     24.14    24.06    41.02    40.05    35.73
          DnCNN    25.09    24.99    41.89    40.94    39.72
          CDNet    25.31    25.22    45.78    43.21    40.74

4.4. More discussions

4.4.1. Effectiveness of 1D convolution

Recall that in CDNet the convolutional layers of the residual blocks are built upon 1D complex-valued convolutions. We replace these layers with ones that directly use 2D complex-valued convolutions and test the performance of the modified CDNet in non-blind AWGN removal. Across all results, the PSNR changes are bounded in [−0.06dB, 0.06dB], and the original CDNet outperforms the modified one on more than half of the results. In other words, the complex-valued CNN allows using compact 1D convolutions that have comparable expressibility and even better generalization performance than the 2D ones. See Table 11 for some results.

Table 11. Average PSNR(dB) by CDNet with different modifications in blind AWGN denoising with noise level σ = 60. 'Original': CDNet without modifications; '2D Conv': 1D convolutions replaced by 2D ones in the residual blocks; 'ModReLU': CReLU replaced by ModReLU; 'zReLU': CReLU replaced by zReLU; 'Real': all complex-valued units replaced by real-valued ones with double the number of channels.

Dataset   Original   2D Conv   zReLU   ModReLU   Real
Set12     26.46      26.43     22.53   26.23     26.11
BSD68     25.56      25.55     22.69   25.40     25.28

4.4.2. ReLU selection

The definition of the CReLU is not unique. Recall that there are two other choices: ModReLU, defined by (16), and zReLU, defined by (17). We are interested in how these CReLUs perform in denoising. Thus, we replace all CReLUs in CDNet with ModReLUs and zReLUs respectively, and re-run the denoising experiments in blind AWGN removal.
Fig. 10. Denoising results of an image from BSD68 dataset in spatially-varying AWGN removal with Setting 1.
Fig. 11. Denoising results of an image from BSD68 dataset in spatially-varying AWGN removal with Setting 2.
Fig. 12. Denoising results of image “Lena” in blind AWGN removal with noise level σ = 60.
See Table 11 for some results. ModReLU performed worse than CReLU, with a 0.15dB–0.3dB PSNR gap. The reason is probably that ModReLU keeps the phase unchanged, which limits the expressibility of the complex-valued CNN for denoising. Note that the phase encodes the main image structures and may be corrupted by noise, so it should be deliberately treated in denoising. zReLU performed much worse than CReLU. We note that the expressibility of zReLU is not as good as that of CReLU, considering that zReLU generates only two different patterns in the ring of phase, with limited operations on the phase; recall from Fig. 2 that CReLU generates four different patterns with richer phase operations. Another disadvantage of zReLU is that its implementation is more complicated than the other two ReLUs, which may increase the difficulty of optimizing the resulting loss.
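Eqs. (16) and (17) are not reproduced in this excerpt; the sketch below uses the standard definitions of ModReLU and zReLU from the complex-valued network literature [6,12], which match the behavior discussed here: ModReLU shrinks the modulus while keeping the phase, and zReLU passes only inputs whose phase lies in the first quadrant (two patterns: pass or zero).

    import numpy as np

    def modrelu(z, b=0.0):
        # ModReLU (cf. [6,12]): ReLU(|z| + b) * z / |z|; the bias b shrinks the
        # modulus while the phase of z is left unchanged.
        m = np.abs(z)
        return np.where(m + b > 0, (m + b) * z / np.maximum(m, 1e-12), 0.0)

    def zrelu(z):
        # zReLU (cf. [12]): keep z only when its phase lies in [0, pi/2],
        # i.e. both the real and the imaginary parts are non-negative.
        return np.where((z.real >= 0) & (z.imag >= 0), z, 0.0)

    z = np.array([1 + 1j, -1 + 1j, -1 - 1j, 1 - 1j])
    print(zrelu(z))             # only the first entry survives
    print(modrelu(z, b=-0.5))   # moduli shrunk by 0.5, phases preserved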
4.4.3. Benefits of complex-valued architecture

We evaluate the benefits of using complex numbers in a denoising CNN by comparing CDNet to a real-valued version of it. The real-valued version is constructed by replacing all complex-valued units in CDNet with real-valued ones; the number of channels in each convolution is doubled for fairness. The comparison is done on blind AWGN removal with σ = 60, in which the PSNR of the real-valued version is 0.35dB/0.28dB lower than the original CDNet on the Set12/BSD68 dataset. See Table 11 for the results and Fig. 12 for a visual comparison. While a larger real-valued model can gain expressibility, the side-effect is possible overfitting. In contrast, complex-valued NNs implicitly impose additional regularization on the convolution processes, which helps alleviate overfitting. In other words, CDNet is not simply a double-width real-valued CNN; it has its own specific characteristics, and these characteristics lead to better denoising results than its real-valued counterpart. We also evaluated the running time: CDNet is 1.4 times slower than its real-valued counterpart.

5. Conclusion

In this paper, we proposed CDNet, a complex-valued CNN for image denoising. Introducing complex-valued essential operations into a CNN-based denoiser has several merits: a compact form of 2D non-separable convolution, non-linear activation on phase, and better noise robustness of residual blocks. By exploiting these merits, the proposed CDNet showed good performance on non-blind AWGN removal, as well as advantages on blind AWGN removal and on blind removal of noise with spatially-varying standard deviations.
In the past, many studies have shown that complex-valued CNNs can benefit high-level vision tasks such as image recognition, but none had investigated their potential in low-level vision tasks. Our work is the first to show the potential of complex-valued CNNs on a fundamental low-level task, i.e. image denoising. The results in this paper provide strong inspiration for the development of complex-valued CNNs for other low-level vision tasks. Though our method was only tested on real-valued images, with small modifications it can be directly applied to processing complex-valued signals.

In the future, we would like to extend the proposed CDNet to solving other image recovery problems, especially those involving complex-valued images. In addition, we would like to further refine the architecture and operations of CDNet for more performance gain in image recovery. One possible direction is designing other non-linear activation functions on phase and introducing convolution-based operations on phase. The design of such functions and operations is challenging. Recall that the direct calculation of phase is not numerically stable when the corresponding value is small, and that phase is wrapped to [0, 2π] (or [−π, π]) to resolve its periodicity ambiguity. As a result, an NN that calls phase explicitly may suffer from instability of back-propagation and related computational issues in gradient-descent-based training. Thus, it is a better option to design the activation functions and operations on phase without explicitly taking phase as input. As phase has clear physical meaning, it is highly non-trivial to design such functions and operations with strong physical motivation.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work is supported by the National Natural Science Foundation of China (61872151, U1611461), the Natural Science Foundation of Guangdong Province (2020A1515011128), and the Science and Technology Program of Guangzhou (201802010055). Hui Ji would also like to acknowledge the support of Singapore MOE AcRF (MOE2017-T2-2-156). The authors would like to thank Peikang Lin from South China University of Technology for his help with the experiments.

References

[1] I. Hong, Y. Hwang, D. Kim, Efficient deep learning of image denoising using patch complexity local divide and deep conquer, Pattern Recognit. 96 (2019) 106945.
[2] K. Zhang, W. Zuo, S. Gu, L. Zhang, Learning deep CNN denoiser prior for image restoration, in: Proc. IEEE Conf. Comput. Vision Pattern Recognition, 2017.
[3] N. Guberman, On complex valued convolutional neural networks, arXiv:1602.09046 (2016).
[4] T. Nitta, On the critical points of the complex-valued neural network, in: Proc. Int. Conf. Neural Info. Process., vol. 3, IEEE, 2002, pp. 1099–1103.
[5] A. Hirose, S. Yoshida, Generalization characteristics of complex-valued feedforward neural networks in relation to signal coherence, IEEE Trans. Neural Netw. Learn. Syst. 23 (4) (2012) 541–551.
[6] I. Danihelka, G. Wayne, B. Uria, N. Kalchbrenner, A. Graves, Associative long short-term memory, arXiv:1602.03032 (2016).
[7] B.M. Dow, et al., Functional classes of cells and their laminar distribution in monkey visual cortex, J. Neurophysiol. 37 (1974) 927–946.
[8] J.L. Gallant, J. Braun, D.C. Van Essen, Selectivity for polar, hyperbolic, and cartesian gratings in macaque visual cortex, Science 259 (5091) (1993) 100–103.
[9] J.G. Daugman, Uncertainty relation for resolution in space, spatial frequency, and orientation optimized by two-dimensional visual cortical filters, J. Opt. Soc. Am. A 2 (7) (1985) 1160–1169.
[10] I.E. Gordon, Theories of Visual Perception, Psychology Press, 2004.
[11] A.V. Oppenheim, J.S. Lim, The importance of phase in signals, Proc. IEEE 69 (5) (1981) 529–541.
[12] C. Trabelsi, O. Bilaniuk, Y. Zhang, D. Serdyuk, S. Subramanian, J.F. Santos, S. Mehri, N. Rostamzadeh, Y. Bengio, C.J. Pal, Deep complex networks, in: Proc. Int. Conf. Learning Representations, 2018.
[13] K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, in: Proc. IEEE Conf. Comput. Vision Pattern Recognition, 2016, pp. 770–778.
[14] Y. Zhang, H. Cheng, J. Huang, X. Tang, An effective and objective criterion for evaluating the performance of denoising filters, Pattern Recognit. 45 (7) (2012) 2743–2757.
[15] G. Chen, B. Kégl, Image denoising with complex ridgelets, Pattern Recognit. 40 (2) (2007) 578–585.
[16] G. Chen, T. Bui, A. Krzyźak, Image denoising with neighbour dependency and customized wavelet and threshold, Pattern Recognit. 38 (1) (2005) 115–124.
[17] Z. Hou, Adaptive singular value decomposition in wavelet domain for image denoising, Pattern Recognit. 36 (8) (2003) 1747–1763.
[18] M. Elad, M. Aharon, Image denoising via sparse and redundant representations over learned dictionaries, IEEE Trans. Image Process. 15 (12) (2006) 3736–3745.
[19] J. Wang, M. Wang, X. Hu, S. Yan, Visual data denoising with a unified Schatten-p norm and Lq norm regularized principal component pursuit, Pattern Recognit. 48 (10) (2015) 3135–3144.
[20] K. Dabov, A. Foi, V. Katkovnik, K. Egiazarian, Image denoising by sparse 3-D transform-domain collaborative filtering, IEEE Trans. Image Process. 16 (8) (2007) 2080–2095.
[21] Z. Sun, S. Chen, L. Qiao, A general non-local denoising model using multi-kernel-induced measures, Pattern Recognit. 47 (4) (2014) 1751–1763.
[22] Y. Quan, H. Ji, Z. Shen, Data-driven multi-scale non-local wavelet frame construction and image recovery, J. Sci. Comput. 63 (2) (2015) 307–329.
[23] H. Li, C.Y. Suen, A novel non-local means image denoising method based on grey theory, Pattern Recognit. 49 (2016) 237–248.
[24] S. Gu, Q. Xie, D. Meng, W. Zuo, X. Feng, L. Zhang, Weighted nuclear norm minimization and its applications to low level vision, Int. J. Comput. Vis. 121 (2) (2017) 183–208.
[25] J. Xu, L. Zhang, D. Zhang, A trilateral weighted sparse coding scheme for real-world image denoising, in: Proc. European Conf. Comput. Vision, 2018.
[26] J. Portilla, V. Strela, M.J. Wainwright, E.P. Simoncelli, Image denoising using scale mixtures of Gaussians in the wavelet domain, IEEE Trans. Image Process. 12 (11) (2003) 1338–1351.
[27] S. Roth, M.J. Black, Fields of experts, Int. J. Comput. Vis. 82 (2) (2009) 205.
[28] D. Zoran, Y. Weiss, From learning models of natural image patches to whole image restoration, in: Proc. IEEE Int. Conf. Comput. Vision, IEEE, 2011, pp. 479–486.
[29] J. Xu, L. Zhang, W. Zuo, D. Zhang, X. Feng, Patch group based nonlocal self-similarity prior learning for image denoising, in: Proc. IEEE Int. Conf. Comput. Vision, 2015, pp. 244–252.
[30] U. Schmidt, S. Roth, Shrinkage fields for effective image restoration, in: Proc. IEEE Conf. Comput. Vision Pattern Recognition, 2014, pp. 2774–2781.
[31] Y. Chen, T. Pock, Trainable nonlinear reaction diffusion: a flexible framework for fast and effective image restoration, IEEE Trans. Pattern Anal. Mach. Intell. 39 (6) (2017) 1256–1272.
[32] V. Jain, S. Seung, Natural image denoising with convolutional networks, in: Advances in Neural Inform. Process. Syst., 2009, pp. 769–776.
[33] H.C. Burger, C.J. Schuler, S. Harmeling, Image denoising: can plain neural networks compete with BM3D? in: Proc. IEEE Conf. Comput. Vision Pattern Recognition, IEEE, 2012, pp. 2392–2399.
[34] F. Agostinelli, M.R. Anderson, H. Lee, Adaptive multi-column deep neural networks with application to robust image denoising, in: Advances in Neural Inform. Process. Syst., 2013, pp. 1493–1501.
[35] R. Vemulapalli, O. Tuzel, M.-Y. Liu, Deep Gaussian conditional random field network: a model-based deep network for discriminative denoising, in: Proc. IEEE Conf. Comput. Vision Pattern Recognition, 2016, pp. 4801–4809.
[36] K. Zhang, W. Zuo, Y. Chen, D. Meng, L. Zhang, Beyond a Gaussian denoiser: residual learning of deep CNN for image denoising, IEEE Trans. Image Process. 26 (7) (2017) 3142–3155.
[37] K. Zhang, W. Zuo, L. Zhang, FFDNet: toward a fast and flexible solution for CNN based image denoising, IEEE Trans. Image Process. (2018).
[38] J. Chen, J. Chen, H. Chao, M. Yang, Image blind denoising with generative adversarial network based noise modeling, in: Proc. IEEE Conf. Comput. Vision Pattern Recognition, 2018, pp. 3155–3164.
[39] S. Lefkimmiatis, Universal denoising networks: a novel CNN architecture for image denoising, in: Proc. IEEE Conf. Comput. Vision Pattern Recognition, 2018, pp. 3204–3213.
[40] A. Hirose, Complex-valued neural networks: an introduction, in: Complex-Valued Neural Netw.: Theories and App., World Scientific, 2003, pp. 1–6.
[41] E. Oyallon, S. Mallat, Deep roto-translation scattering for object classification, in: Proc. IEEE Conf. Comput. Vision Pattern Recognition, 2015, pp. 2865–2873.
[42] M. Tygert, J. Bruna, S. Chintala, Y. LeCun, S. Piantino, A. Szlam, A mathematical motivation for complex-valued convolutional networks, Neural Comput. 28 (5) (2016) 815–825.
[43] S. Chintala, A. Szlam, Y. Tian, M. Tygert, W. Zaremba, et al., Scale-invariant learning and convolutional networks, Appl. Comput. Harmonic Anal. 42 (1) (2017) 154–166.
[44] M. Wilmanski, C. Kreucher, A. Hero, Complex input convolutional neural networks for wide angle SAR ATR, in: IEEE Global Conf. Signal Inf. Process., IEEE, 2016, pp. 1037–1041.
[45] D.E. Worrall, S.J. Garbin, D. Turmukhambetov, G.J. Brostow, Harmonic networks: deep translation and rotation equivariance, in: Proc. IEEE Conf. Comput. Vision Pattern Recognition, IEEE, 2017, pp. 7168–7177.
[46] C. Godard, K. Matzen, M. Uyttendaele, Deep burst denoising, in: Proc. European Conf. Comput. Vision, 2018.

Yuhui Quan received the Ph.D. degree in computer science from South China University of Technology in 2013. He worked as a postdoctoral research fellow in mathematics at the National University of Singapore from 2013 to 2016. He is currently an associate professor at the School of Computer Science and Engineering in South China University of Technology. His research interests include computer vision, image processing and sparse representation.

Yixin Chen received the [Link]. degree in network engineering from South China University of Technology in 2017. He is currently an M.A. candidate at South China University of Technology. His research interests include computer vision, image processing, and sparse coding.

Yizhen Shao received the [Link]. degree in computer science from South China University of Technology in 2018. He is currently an M.A. candidate at South China University of Technology. His research interests include computer vision, image processing, and sparse coding.

Huan Teng received the [Link]. degree in computer science from South China University of Technology in 2018. He is currently an M.A. candidate at South China University of Technology. His research interests include computer vision, image processing, and sparse coding.

Yong Xu received the B.S., M.S., and Ph.D. degrees in mathematics from Nanjing University, Nanjing, China, in 1993, 1996, and 1999, respectively. He was a Postdoctoral Research Fellow of computer science with South China University of Technology, Guangzhou, China, from 1999 to 2001, where he became a Faculty Member and where he is currently a Professor with the School of Computer Science and Engineering. His current research interests include image analysis, video recognition, and image quality assessment.

Hui Ji received the [Link]. degree in mathematics from Nanjing University in China, the [Link]. degree in mathematics from the National University of Singapore and the Ph.D. degree in computer science from the University of Maryland, College Park. In 2006, he joined the National University of Singapore as an assistant professor in mathematics. Currently, he is an associate professor in mathematics at the National University of Singapore. His research interests include computational harmonic analysis, optimization, computational vision, image processing and biological imaging.