IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, VOL. 61, 2023, 5606115

R2FD2: Fast and Robust Matching of Multimodal Remote Sensing Images via Repeatable Feature Detector and Rotation-Invariant Feature Descriptor

Bai Zhu, Chao Yang, Jinkun Dai, Jianwei Fan, Yao Qin, Student Member, IEEE, and Yuanxin Ye, Member, IEEE

Abstract— Identifying feature correspondences between multimodal images is facing enormous challenges because of the significant differences both in radiation and geometry. To address these problems, we propose a novel feature matching method (named R2FD2) that is robust to radiation and rotation differences, which consists of a repeatable feature detector and a rotation-invariant feature descriptor. In the first stage, a repeatable feature detector called the multichannel autocorrelation of the log-Gabor (MALG) is presented for feature detection, which combines the multichannel autocorrelation strategy with the log-Gabor wavelets to detect interest points (IPs) with high repeatability and uniform distribution. In the second stage, a rotation-invariant feature descriptor is constructed, named the rotation-invariant maximum index map of the log-Gabor (RMLG), which includes the fast assignment of dominant orientation and construction of feature representation. In the process of fast assignment of dominant orientation, a rotation-invariant maximum index map (RMIM) is built to address rotation deformations. Then, the proposed RMLG incorporates the rotation-invariant RMIM with the spatial configuration of DAISY to improve the resistance of RMLG to radiation and rotation variances. Finally, we conduct experiments to validate the matching performance of our R2FD2 utilizing different types of multimodal image datasets. Experimental results show that the proposed R2FD2 outperforms five state-of-the-art feature matching methods. Moreover, our R2FD2 achieves the accuracy of matching within two pixels and has a great advantage in matching efficiency over contrastive methods.

Index Terms— Multichannel autocorrelation of the log-Gabor (MALG), multimodal image matching, R2FD2, rotation-invariant maximum index map (RMIM), rotation-invariant maximum index map of the log-Gabor (RMLG).

I. INTRODUCTION

With the launch of numerous multisensor integrated stereo observation facilities from spaceborne, airborne, and terrestrial platforms, a large number of multimodal remote sensing images (MRSIs) can be obtained at different times by different sensors [1], such as infrared, multispectral, hyperspectral, and synthetic aperture radar (SAR) images [2], [3], [4], [5], [6]. Also, the integration of MRSIs can provide complementary information for diverse applications in the field of bundle block adjustment [7], image fusion [8], and change detection [9]. There is a prerequisite for the integration of MRSIs, that is, the process of robust MRSI matching (MRSIM) is indispensable.

Generally, MRSIM aims to automatically identify accurate correspondences or control points (CPs) between two or more multimodal images [10], [11]. However, there are extensive differences in scale, rotation, and radiation among MRSIs, and the image quality of different sensors is easily disturbed by noise, clouds, and blur, which are inevitable problems and entail enormous challenges for the reliable matching of MRSIs. Fig. 1 exemplarily shows these challenges of multimodal image matching mentioned above.

Fig. 1. Challenges of MRSIM.

Manuscript received 12 December 2022; revised 13 February 2023; accepted 28 March 2023. Date of publication 5 April 2023; date of current version 17 April 2023. This work was supported by the National Natural Science Foundation of China under Grant 42271446 and Grant 41971281. (Corresponding author: Yuanxin Ye.)
Bai Zhu, Chao Yang, Jinkun Dai, and Yuanxin Ye are with the Faculty of Geosciences and Environmental Engineering, Southwest Jiaotong University, Chengdu 610031, China, and also with the State-Province Joint Engineering Laboratory of Spatial Information Technology for High-Speed Railway Safety, Southwest Jiaotong University, Chengdu 611756, China (e-mail: [email protected]; [email protected]).
Jianwei Fan is with the School of Computer and Information Technology, Xinyang Normal University, Xinyang 464000, China (e-mail: [email protected]).
Yao Qin is with the Northwest Institute of Nuclear Technology, Xi'an 710025, China (e-mail: [email protected]).
Digital Object Identifier 10.1109/TGRS.2023.3264610
1558-0644 © 2023 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See https://www.ieee.org/publications/rights/index.html for more information.

Authorized licensed use limited to: Lovely Professional University - Phagwara. Downloaded on September 26,2024 at 07:57:35 UTC from IEEE Xplore. Restrictions apply.

In view of the above challenges, numerous methods have been proposed for multimodal matching in the past two decades. These multimodal matching methods can be generally grouped into three categories: area-based methods, feature-based methods, and learning-based methods [12], [13]. With the development of deep learning technology, learning-based matching methods exhibit excellent matching performance and have developed into a pipeline that cannot be ignored in the field of MRSIM [14]. Wang et al. [15] presented an effective deep neural network to optimize the whole processing (learning mapping function) through information feedback, and transfer learning was used to improve their framework's performance and efficiency. To contrast different stages of feature matching, Hughes et al. [16] proposed a fully automated SAR–optical matching framework that was composed of a goodness network, correspondence network, and outlier reduction network, and each of these subnetworks has been proven to individually improve the matching performance. Furthermore, Zhou et al. [17] employed deep learning techniques to refine structure features and designed the multiscale convolutional gradient features (MCGFs) by utilizing a shallow pseudo-Siamese network. Similarly, Quan et al. [18] exploited more similar features using a self-distillation feature learning network (SDNet) for optimization enhancement of deep networks, which achieved robust matching of optical–SAR images. Ye et al. [19] designed a multiscale framework without costly ground truth labels and a novel loss function paradigm based on structural similarity, which can directly learn the end-to-end mapping from multimodal image pairs to their transformation parameters. Also, their matching framework has steady performance and is robust to nonlinear radiometric differences (NRD) between multimodal image pairs.

Although learning-based matching methods can significantly improve their resistance to geometric and radiation distortion by extracting finer common features than traditional handcrafted features, the main limitations of this pipeline are also significant. On the one hand, supervised learning-based methods often rely on a large amount of training data [15], [16], [17], [18], and the transferability of the trained model is poor, resulting in their matching performance generally dropping sharply on different test datasets. On the other hand, although unsupervised learning methods can overcome the dependence on training data, the process of converting various parameters is very complex [19], and their efficiency depends on the basic configuration of the computer infrastructure. These deficiencies limit the widespread application of the learning-based pipeline in multimodal matching fields.

Generally, traditional area-based methods identify correspondences by selecting some classical similarity metrics to evaluate the similarity of intensity information within a template window. There are three commonly used similarity metrics in the spatial domain: sum of squared differences (SSD), normalized cross correlation (NCC), and mutual information (MI). In addition, phase correlation is the most commonly used similarity metric in the frequency domain because of its illumination invariance [20]. Recently, a structure feature-based pipeline has been developed for NRD between multimodal images. The methods of this pipeline evaluate the similarity of generated features rather than intensity information by using the above similarity metrics (i.e., SSD, NCC, and phase correlation). Histogram of orientated phase congruency (HOPC) [21], phase congruency structural descriptor (PCSD) [22], channel features of orientated gradients (CFOG) [23], and optical-SAR-phase correlation (PC) [24] are the most representative ones. However, area-based matching methods are very sensitive to geometric distortions (i.e., scale and rotation deformations) between images and usually require georeferencing implementation to eliminate the significant global geometric distortions [25].

In contrast, feature-based matching methods rely on the salient and distinctive features (i.e., points, lines, and regions) between images and are more robust to geometric distortions and NRD compared with area-based methods [26]. Among these methods, point features are the most common local invariant features in the remote sensing domain. This matching pipeline usually consists of two key components: feature detection and feature description. In the past several decades, feature matching methods of monomodal images have been well-studied, and many classical feature detectors and feature descriptors have been developed. These traditional feature detectors detect salient features between images based on the gradient information of images, such as Moravec [27], Harris [28], differences of Gaussian (DoG) [29], and features from accelerated segment test (FAST) [30]. Nevertheless, these gradient-based detectors find it difficult to detect interest points (IPs) with high repeatability among multimodal images. According to the inherent properties of optical and SAR images, Xiang et al. [31] constructed two Harris scale spaces to extract IPs by designing consistent gradients for optical and SAR images utilizing the multiscale Sobel and multiscale ratio of exponentially weighted averages operators, respectively. Furthermore, some studies have found that the use of the phase congruency (PC) model can effectively resist significant NRD and extract more stable and repeatable IPs than using only the gradient information. Ye et al. [32] combined the minimum moment of PC with the Laplacian of Gaussian (MMPC-Lap) to detect stable IPs in image scale space. Subsequently, Li et al. [33] detected corner feature points and edge feature points on the minimum moment map and maximum moment map of the PC, respectively. Although these PC-based feature detectors have a certain resistance to NRD between multimodal images, they come at the cost of high computational complexity.

Once the feature detection of MRSIs is completed, corresponding local invariant feature descriptors must be explored. Similarly, the construction of many well-known feature descriptors also utilizes the gradient information of images, and hence, they cannot achieve robust matching of MRSIs with both geometric distortion and radiation differences. Scale-invariant feature transform (SIFT) [29], gradient location and orientation histogram (GLOH) [34], DAISY [35], and their improved variants [31], [36], [37] are the most representative feature descriptors. As shown in Fig. 1, in particular,


such significant intensity differences and severe speckle noise in multimodal images will further decline the matching performance of these gradient-based descriptors, accompanied by the difficulty of identifying accurate correspondences.

A recently popular pipeline for radiation-robust description is structural features, because it is more resistant to modality variations than the gradient-based description already discussed in the above literature. With a number of descriptors derived from structural features having been developed for multimodal image matching, the most commonly used feature descriptors can be divided into two categories. The former is based on local self-similar (LSS) descriptors utilizing a log-polar spatial structure as feature descriptors, which can effectively capture the internal geometric composition of self-similarities within local image patches and are less sensitive to significant NRD to a certain extent [38], [39], [40], [41]. Ye and Shan [40] introduced the LSS descriptor as a new similarity metric to detect correspondences for the matching of multispectral remote sensing images. Based on LSS, a shape descriptor named dense local self-similarity (DLSS) was further designed for optical and SAR image matching [41]. Sedaghat and Mohammadi [38] improved the distinctiveness of the histogram of oriented self-similarity (HOSS) descriptor by adding directional attributes to image patches where the self-similarity values are computed. For the problem of LSS' computational complexity, Xiong et al. [39] proposed a feature descriptor named oriented self-similarity (OSS) that uses offset mean filtering to calculate the self-similarity features quickly based on the symmetry of the self-similarity. However, there still exist limitations with these descriptors because the relatively low discriminative capability of LSS descriptors may lead to the inability to maintain robust matching performance in some multimodal matching cases [22].

Another structural feature for radiation-robust description utilizes the PC model, which is based on the position perception feature of the maximum Fourier component [42]. Given that the PC model is more robust to illumination and contrast changes compared with gradient information, many PC-based descriptors have been developed [32], [33], [43], [44]. Ye et al. [32] presented a local HOPC (LHOPC) descriptor by combining the extended PC model and the arrangement of DAISY. Li et al. [33] developed a radiation-variation-insensitive feature transform (RIFT) method, and a maximum index map (MIM) was introduced based on the PC model for feature description. Xiang et al. [44] improved different PC models to construct features for the matching of optical and SAR images. In a similar work, Fan et al. [43] designed a multiscale PC descriptor, named multiscale adaptive binning phase congruency (MABPC), which uses an adaptive binning spatial structure to encode multiscale phase congruency features while improving its robustness to address geometric and radiometric discrepancies. Nevertheless, in the process of feature description, the above methods either lack rotation invariance [44], rely on time-consuming loop traversal based on the log-Gabor convolution sequence to achieve rotation invariance [33], or estimate the dominant orientation by combining an orientation histogram with local PC features, which is also time-consuming and easily prone to generate outliers [32], [43], [45], vastly affecting the final matching performance.

Although numerous efforts have been made to enhance the robustness of MRSIM, the current feature detectors are still not efficacious in the aspect of IP repeatability, and feature descriptors remain challenging in rotation invariance. To address the aforementioned limitations of pivotal components in feature matching, we present a robust and efficient feature-based method (called R2FD2) for multimodal matching in this work. First, to improve the repeatability of feature detection, we construct a repeatable feature detector called the multichannel autocorrelation of the log-Gabor (MALG). The MALG detector combines the multichannel autocorrelation strategy with the log-Gabor wavelets and is capable of extracting evenly distributed IPs with high repeatability. Subsequently, we build a rotation-invariant feature descriptor named the rotation-invariant maximum index map of the log-Gabor (RMLG). The RMLG descriptor consists of a fast assignment strategy of dominant orientation and an advanced descriptor configuration. In the process of fast assignment of dominant orientation, we propose a novel rotation-invariant MIM (RMIM) to achieve reliable rotation invariance. Then, the RMLG descriptor incorporates the rotation-invariant RMIM with the spatial configuration of DAISY to depict discriminative features of multimodal images, which aims to construct a feature representation that is as robust as possible against the differences in radiation and rotation.

The following is a summary of the main contributions.
1) A repeatable feature detector called MALG is defined to detect evenly distributed IPs with high repeatability.
2) A rotation-invariant feature descriptor named RMLG is constructed based on the RMIM with rotation invariance and the spatial configuration of DAISY.
3) The presented R2FD2 matching method, consisting of the MALG detector and RMLG descriptor, is quantitatively and qualitatively evaluated against existing state-of-the-art methods using various types of MRSIs.

The remainder of this article is organized as follows. The proposed multimodal feature matching method is introduced in Section II, with an emphasis on the construction of the MALG detector and RMLG descriptor. Section III examines and evaluates the matching performance of the proposed R2FD2 by conducting experiments on various multimodal image pairs. Finally, the conclusion is summarized in Section IV.

II. METHODOLOGY

In this section, a fast and robust method (named R2FD2), involving the MALG detector and the RMLG descriptor, is proposed to improve the matching performance of multimodal images. Specifically, the MALG detector is first presented to detect IPs with high repeatability between multimodal image pairs. Then, the RMLG descriptor is employed to robustly depict the local invariant characteristics of detected IPs. The flowchart of the proposed R2FD2 is shown in Fig. 2, which is then elaborated in detail.


Fig. 2. Flowchart of the proposed R2FD2.
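The pipeline in Fig. 2 ends with a correspondence search between the two descriptor sets. The paper's own matching and outlier-removal details are not part of this excerpt; purely as an illustration of that final step, a minimal mutual nearest-neighbor matcher (a hypothetical helper, not the authors' implementation) could be sketched as follows:

```python
import numpy as np

def mutual_nn_match(desc_ref, desc_sen):
    """Match two descriptor sets (one row per IP) by mutual nearest neighbors:
    keep pair (i, j) only if j is i's closest match and i is j's closest match."""
    # Pairwise Euclidean distances between reference and sensed descriptors.
    dists = np.linalg.norm(desc_ref[:, None, :] - desc_sen[None, :, :], axis=2)
    nn_of_ref = dists.argmin(axis=1)   # best sensed index for each reference IP
    nn_of_sen = dists.argmin(axis=0)   # best reference index for each sensed IP
    return [(i, j) for i, j in enumerate(nn_of_ref) if nn_of_sen[j] == i]
```

Cross-checking in both directions discards many one-sided, ambiguous matches before any geometric verification is applied.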

A. Construction of MALG Detector

As mentioned above, the PC model is more resistant to significant NRD between multimodal images compared with gradient information, and there have been relevant studies [32], [33] using the PC model to extract stable IPs. Nevertheless, these detectors using the minimum moment or maximum moment of PC may cause loss of multidirectional features and are computationally expensive, because the moments are the weighted responses of PC in different orientations and represent the moment changes with the orientation [46]. Also, the responses of PC in different orientations are calculated by making use of log-Gabor wavelets because of their good antinoise and edge extraction performance [47]. In order to improve the reliability of the feature detector while ensuring the high repeatability of IPs, in this article, the MALG detector is proposed by incorporating the multichannel autocorrelation strategy with the log-Gabor wavelets for IP detection.

Given that good noise suppression and edge preservation are two crucial characteristics of an excellent feature detector [48], the 2-D log-Gabor wavelets are employed during the construction of the proposed MALG detector. They can provide a useful description of edge feature information for multiple orientations and at multiple scales from multimodal image pairs, which is suitable for describing the local structure of multimodal images. Generally, a 2-D log-Gabor filter is expressed as follows:

\[
LG_{s,o}(f, \theta) = \exp\left(-\frac{(\log(f/F_s))^2}{2(\log \beta)^2}\right) \exp\left(-\frac{(\theta - \theta_o)^2}{2\delta_\theta^2}\right) \tag{1}
\]

where $o$ and $s$ represent the orientation and scale of the log-Gabor filter, respectively; $\beta$ determines the bandwidth of the filter; $f$ and $F_s$ define the frequency and central frequency of the filter, respectively; $\delta_\theta$ is the angular bandwidth; and $\theta_o$ represents the filter's orientation.

Since the 2-D log-Gabor is a frequency domain filter, its expression in the space domain can be obtained by inverse Fourier transform based on the corresponding frequency response of log-Gabor filters in polar coordinates [49]. Therefore, the 2-D log-Gabor function in the space domain can be typically decomposed into an even-symmetric filter and an odd-symmetric filter, which is defined as follows:

\[
LG(x, y, s, o) = LG_{\mathrm{even}}(x, y, s, o) + i \cdot LG_{\mathrm{odd}}(x, y, s, o) \tag{2}
\]

where the real component $LG_{\mathrm{even}}(x, y, s, o)$ and the imaginary component $LG_{\mathrm{odd}}(x, y, s, o)$ represent the even- and odd-symmetric filters, respectively, of the log-Gabor wavelets at scale $s$ with orientation $o$.

Accordingly, the space response components $E(x, y, s, o)$ and $O(x, y, s, o)$ of log-Gabor filters can be yielded by convolving the image $I(x, y)$ with the two even- and odd-symmetric filters

\[
\begin{cases}
E(x, y, s, o) = I(x, y) * LG_{\mathrm{even}}(x, y, s, o) \\
O(x, y, s, o) = I(x, y) * LG_{\mathrm{odd}}(x, y, s, o).
\end{cases} \tag{3}
\]

Then, the amplitudes of log-Gabor for all $N_s$ scales are summed at orientation $o$ to obtain the multichannel log-Gabor features; formally, the multichannel log-Gabor features are defined as follows:

\[
\begin{cases}
A(x, y, s, o) = \sqrt{E^2(x, y, s, o) + O^2(x, y, s, o)} \\
A = \left\{A_i(x, y, o)\right\}_{i=1}^{N_o}, \quad A_i(x, y, o) = \sum_{s=1}^{N_s} A(x, y, s, o)
\end{cases} \tag{4}
\]

where $A(x, y, s, o)$ is the amplitude component of $I(x, y)$ at scale $s$ and orientation $o$, $N_s$ represents the number of scales for the log-Gabor filter banks, $N_o$ represents the number of orientations for the log-Gabor filter banks, and $\sum$ is summed

over the log-Gabor filter banks on different scales for the orientation $o$. Also, $A_i(x, y, o)$ equals the amplitude responses of the log-Gabor at the location $(x, y)$ for orientation $o$, and $i = 1, 2, \ldots, N_o$. In this article, $N_s = 4$ and $N_o = 6$ are fixed values.

For the multichannel log-Gabor features $A_i(x, y, o)$, their self-similarity for log-Gabor features of each orientation after a shift $(\Delta x, \Delta y)$ at the location $(x, y)$ can be yielded by the following autocorrelation function:

\[
CA_o = \left\{CA_i(x, y, \Delta x, \Delta y, o)\right\}_{i=1}^{N_o} = \sum_{(u,v) \in W(x,y)} w(u, v)\left[A_o(u + \Delta x, v + \Delta y, o) - A_o(u, v, o)\right]^2 \tag{5}
\]

where $W(x, y)$ is a window centered at the location $(x, y)$. Also, $w(u, v)$ is a weighting function, which is either a constant or a Gaussian weighting function. According to the Taylor expansion, the first-order approximation is performed after shifting $(\Delta x, \Delta y)$ for the log-Gabor feature of each channel

\[
\begin{aligned}
A_o(u + \Delta x, v + \Delta y, o) &= A_o(u, v, o) + A_x^o(u, v, o)\Delta x + A_y^o(u, v, o)\Delta y + O(\Delta x^2 + \Delta y^2) \\
&\approx A_o(u, v, o) + A_x^o(u, v, o)\Delta x + A_y^o(u, v, o)\Delta y
\end{aligned} \tag{6}
\]

where $A_x^o$ and $A_y^o$ are the partial derivatives of the log-Gabor feature in the corresponding orientation $o$. Therefore, the above autocorrelation function for each orientation can be simplified as

\[
CA_o = \left\{CA_i(x, y, \Delta x, \Delta y, o)\right\}_{i=1}^{N_o} = [\Delta x, \Delta y]\, M(x, y, o)\, [\Delta x, \Delta y]^{T} \tag{7}
\]

with $M(x, y, o)$ denoting the autocorrelation matrix of orientation $o$ defined as

\[
M(x, y, o) = \begin{bmatrix}
\sum_{W} w \left(A_x^o(x, y, o)\right)^2 & \sum_{W} w\, A_x^o(x, y, o) A_y^o(x, y, o) \\
\sum_{W} w\, A_x^o(x, y, o) A_y^o(x, y, o) & \sum_{W} w \left(A_y^o(x, y, o)\right)^2
\end{bmatrix}. \tag{8}
\]

Then, autocorrelation features in all orientations are combined to obtain a comprehensive autocorrelation matrix (denoted as $M_{\mathrm{com}}$) with multidirectional features, as given in (9).

The autocorrelation response value $R$ of the multichannel log-Gabor features for each pixel is calculated by utilizing the comprehensive autocorrelation matrix $M_{\mathrm{com}}$

\[
R = \det\left[M_{\mathrm{com}}(x, y)\right] - \alpha \left[\operatorname{trace} M_{\mathrm{com}}(x, y)\right]^2 \tag{10}
\]

where $\det[M_{\mathrm{com}}(x, y)]$ is the determinant of matrix $M_{\mathrm{com}}$ and $\operatorname{trace} M_{\mathrm{com}}(x, y)$ represents the trace of matrix $M_{\mathrm{com}}$. Also, $\alpha$ is a constant, ranging from 0.04 to 0.06. Finally, the local maximum extrema are first extracted as candidate IPs, while nonmaximum suppression is carried out to discard adjacent IPs; that is, the first $N$ local extrema with the largest response values will be selected as the final IPs by our MALG detector.

Fig. 3. Schematic of extracted IPs by the proposed MALG detector. (a) Optical–infrared image pairs (repeatability = 55.87%). (b) Optical–depth image pairs (repeatability = 44.10%). (c) Optical–SAR image pairs (repeatability = 36.79%).

Moreover, Fig. 3 presents three illustrations of the IPs extracted by the proposed MALG detector; specifically, Fig. 3(a)–(c) shows the IPs extracted from optical–infrared, optical–depth, and optical–SAR image pairs, respectively. As seen, our MALG detector is capable of extracting IPs with high repeatability and uniform distribution between multimodal image pairs. The definition of repeatability and further performance evaluation of MALG are given in Section III-B.

B. Establishment of RMLG Descriptor

Once repeatable IPs have been extracted, the next critical step is to design a robust feature descriptor with the intent of increasing the distinctiveness of features. The feature descriptor usually consists of two components: assignment of dominant orientation and construction of feature representation. However, as mentioned earlier, the gradient-based descriptors are very sensitive to NRD, and existing structural feature-based descriptors either rely on time-consuming loop traversal based on the log-Gabor convolution sequence to achieve rotation invariance [33] or complicatedly assign the dominant orientation by combining an orientation histogram with local PC features [32], [43], [45]. Therefore, it is difficult for these descriptors to achieve fast and robust multimodal image matching. To overcome these problems, we first propose a fast strategy for assigning dominant orientation and further employ the spatial configuration of DAISY for feature representation, and the above two components are combined to generate the final RMLG descriptor. More details regarding the proposed RMLG descriptor are provided as follows.
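Both the detector and the descriptor operate on the multichannel log-Gabor amplitudes of (4). As a rough illustration of (1)–(4), a numpy sketch of the frequency-domain filter bank and the per-orientation amplitudes could look as follows; the wavelength and bandwidth defaults are conventional log-Gabor choices, not values given in the paper:

```python
import numpy as np

def log_gabor_bank(rows, cols, n_scales=4, n_orients=6,
                   min_wavelength=3.0, scale_mult=2.1,
                   beta=0.55, delta_theta=np.pi / 8):
    """Frequency-domain 2-D log-Gabor filters of (1), one per (scale, orientation)."""
    fy, fx = np.meshgrid(np.fft.fftfreq(rows), np.fft.fftfreq(cols), indexing="ij")
    radius = np.hypot(fx, fy)
    radius[0, 0] = 1.0                        # avoid log(0) at the DC component
    theta = np.arctan2(-fy, fx)
    bank = np.empty((n_scales, n_orients, rows, cols))
    for s in range(n_scales):
        f_s = 1.0 / (min_wavelength * scale_mult ** s)   # central frequency F_s
        radial = np.exp(-np.log(radius / f_s) ** 2 / (2 * np.log(beta) ** 2))
        radial[0, 0] = 0.0                    # zero response to the DC component
        for o in range(n_orients):
            theta_o = o * np.pi / n_orients   # filter orientation theta_o
            d = np.arctan2(np.sin(theta - theta_o), np.cos(theta - theta_o))
            bank[s, o] = radial * np.exp(-d ** 2 / (2 * delta_theta ** 2))
    return bank

def multichannel_amplitudes(image, bank):
    """A_i(x, y, o) of (4): log-Gabor amplitudes summed over all scales."""
    F = np.fft.fft2(image)
    n_scales, n_orients = bank.shape[:2]
    A = np.zeros((n_orients,) + image.shape)
    for o in range(n_orients):
        for s in range(n_scales):
            response = np.fft.ifft2(F * bank[s, o])   # E + iO of (2) and (3)
            A[o] += np.abs(response)                  # sqrt(E^2 + O^2) of (4)
    return A
```

Because the angular Gaussian covers only one half-plane around each $\theta_o$, the real and imaginary parts of the inverse transform play the roles of the even- and odd-symmetric responses, so the magnitude directly gives the amplitude of (4).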


1) Fast Assignment of Dominant Orientation: From the previous equation (4), we can get the multichannel log-Gabor features $A$, that is, the log-Gabor response sequence $\{A_i(x, y, o)\}_{i=1}^{N_o}$. Nevertheless, the log-Gabor response sequence does not possess rotation invariance compared with the gradient map. This means that rotating the log-Gabor response sequence will not yield the corresponding log-Gabor response sequence for the rotated image patch. To obtain rotation invariance, the MIM and circular effect are proposed by means of loop traversal [33]. The calculation of MIM is given as follows:

\[
\mathrm{MIM}(x, y) = \arg\max_o \left\{A_i(x, y, o)\right\}_{i=1}^{N_o} \tag{11}
\]

where $\arg\max_o$ represents the orientation index corresponding to the maximum value in the log-Gabor response sequence $\{A_i(x, y, o)\}_{i=1}^{N_o}$.

Furthermore, Yu et al. [45] assigned the dominant orientation by combining the orientation histogram with the amplitudes and orientations of PC, and auxiliary orientations were also estimated in the same way as in SIFT. Then, the corresponding MIM patch was rotated by the dominant or auxiliary orientations; subsequently, the index of the rotated MIM patch for the reference and the sensed image was cyclically shifted by $k_{\mathrm{ref}}$ and $k_{\mathrm{sen}}$ positions, respectively

\[
k_{\mathrm{ref}} = \mathrm{round}\left(\frac{\mathrm{rotation}}{180^\circ/N_o}\right) \tag{12}
\]

\[
k_{\mathrm{sen}} = \begin{cases}
\mathrm{ceil}\left(\dfrac{\mathrm{rotation}}{180^\circ/N_o}\right) \\[6pt]
\mathrm{floor}\left(\dfrac{\mathrm{rotation}}{180^\circ/N_o}\right)
\end{cases} \tag{13}
\]

where round indicates the rounding operation and ceil and floor represent the round-up and round-down operations, respectively.

Specifically, one feature vector was constructed for each orientation of IPs in the reference image, and two feature vectors were constructed for each orientation of IPs in the sensed image. The aforementioned loop traversal strategy to achieve rotation invariance was very time-consuming [33]. Meanwhile, Yu et al.'s [45] estimation of the dominant orientation required calculating the complex amplitudes and orientations of PC, increasing the auxiliary directions of feature points, and designing two feature vectors for each orientation of IPs in the sensed image, which further leads to the time-consuming nature of their descriptor.

We note that the essence of the orientation histogram in the SIFT descriptor is to count the gradient amplitude and orientation of the pixels in the neighborhood, and the orientation corresponding to the peak of the histogram represents the dominant direction of IPs. The range of the gradient orientation is $[0^\circ, 360^\circ]$ and is continuous, while the value range of MIM based on the log-Gabor response sequence is $[1, N_o]$ and is very discrete. Therefore, there are many redundant orientation estimations based on the orientation histogram, because the index of the rotated MIM patch needs to be cyclically shifted by $k_{\mathrm{ref}}$ and $k_{\mathrm{sen}}$ positions in the reconstruction of MIM.

Inspired by the orientation histogram of SIFT and combined with the above analysis, we design a fast assignment strategy for dominant orientation, and a novel MIM with rotation invariance is obtained by a statistical measure based on the MIM. This strategy can avoid the process of weighting the histogram calculation by trilinear interpolation, which calculates the weight of each pixel of the spatial and directional bins. Specifically, the fast assignment strategy of dominant orientation is calculated as follows.

The essence of the orientation histogram is to count the gradient amplitude and orientation of the pixels in the neighborhood, while MIM itself has the directional characteristics of the log-Gabor convolution sequence. Hence, we directly count the value with the most occurrences in the MIM (denoted as $C_{\mathrm{MIM}}$) and use it as the dominant orientation of IPs, which can be expressed as follows:

\[
\begin{cases}
C_{\mathrm{MIM}} = \mathrm{mode}\left[\mathrm{MIM}(x, y)\right] \\[4pt]
\mathrm{DO} = C_{\mathrm{MIM}} \cdot \dfrac{180^\circ}{N_o}
\end{cases} \tag{14}
\]

where mode represents the operation to calculate the sample mode in MIM, that is, the value that appears most times in MIM. Also, DO represents the dominant orientation. Fig. 4 shows the feasibility of the above strategy for calculating dominant orientation with different rotated images. Given a reference image without rotation, its corresponding sensed image without rotation, and the sensed image rotated by $90^\circ$, we select a pair of corresponding IPs between these images, and then, their dominant orientations are computed. It is not difficult to find that the dominant orientations given by the proposed strategy are the same, and this example preliminarily indicates that the proposed strategy of dominant orientation is feasible.

What follows is the reconstruction of MIM; $C_{\mathrm{MIM}}$ is used to calculate the new MIM based on the following equation:

\[
\begin{cases}
\mathrm{MIM}_{\mathrm{new}}(x, y) = \mathrm{MIM}(x, y) - C_{\mathrm{MIM}} + 1 \\
\mathrm{MIM}_{\mathrm{new}}(x, y) = \mathrm{MIM}_{\mathrm{new}}(x, y) + N_o, \quad \mathrm{MIM}_{\mathrm{new}}(x, y) < 1.
\end{cases} \tag{15}
\]

Actually, $\mathrm{MIM}_{\mathrm{new}}(x, y)$ represents the new MIM that is recalculated by circularly shifting the $C_{\mathrm{MIM}}$th layer of the log-Gabor convolution sequence to be the first layer of the log-Gabor convolution sequence. Finally, the novel MIM with rotation invariance (named RMIM) can be obtained by rotating the recalculated MIM by the dominant orientation

\[
\mathrm{RMIM} = \mathrm{rotate}\left[\mathrm{MIM}_{\mathrm{new}}(x, y), \mathrm{DO}\right]. \tag{16}
\]

" #
(x, (x, (x,
P No P o 2
 P No P o o
A
w x y, o) A y, o)A y, o)
Mcom (x, y) = P No P1  1 P Nwo P x y
. (9)
w A x (x, y, o)A y (x, y, o) w A y (x, y, o)
o o o 2

1 1

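The circular reindexing in (15) can be sketched in a few lines. The following is a minimal pure-Python illustration, not the authors' implementation: the helper names, the use of the histogram mode as CMIM, and the toy 3 × 3 patch are assumptions for demonstration.

```python
from collections import Counter

NO = 6  # number of log-Gabor orientations, No

def mode_index(mim):
    """CMIM: the orientation index that appears most often in the local MIM."""
    counts = Counter(v for row in mim for v in row)
    return counts.most_common(1)[0][0]

def reindex_mim(mim, cmim, no=NO):
    """Apply (15): shift indices so that layer CMIM becomes layer 1,
    wrapping values below 1 back into the range [1, no]."""
    out = []
    for row in mim:
        new_row = []
        for v in row:
            v_new = v - cmim + 1
            if v_new < 1:
                v_new += no  # circular wrap of the orientation sequence
            new_row.append(v_new)
        out.append(new_row)
    return out

mim = [[1, 2, 3],
       [4, 4, 4],
       [5, 6, 4]]
cmim = mode_index(mim)
print(cmim)                    # 4 is the most frequent index
print(reindex_mim(mim, cmim))  # all values remain in [1, 6]
```

The final RMIM of (16) would then be obtained by rotating this reindexed map by the dominant orientation with any image-rotation routine.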
Authorized licensed use limited to: Lovely Professional University - Phagwara. Downloaded on September 26,2024 at 07:57:35 UTC from IEEE Xplore. Restrictions apply.
ZHU et al.: R2 FD2 : FAST AND ROBUST MATCHING OF MRSIS VIA REPEATABLE FEATURE DETECTOR 5606115

Fig. 4. Schematic of the proposed strategy for calculating dominant orientation. The bar charts represent the MIM histograms of the local regions around the IPs. Dominant orientation of (a) reference image patch without rotation, (b) sensed image patch without rotation, and (c) rotated sensed image patch with 90° rotation.

To verify the rotation invariance of the proposed RMIM more intuitively, we perform a contrast experiment to compare the rotation invariance of the MIM and RMIM, as shown in Fig. 5. Fig. 5(f) is obtained by rotating Fig. 5(a) anticlockwise by 30°; Fig. 5(b) and (g) are the MIM of Fig. 5(a) and (f), respectively; Fig. 5(c) and (h) are the MIMnew of Fig. 5(a) and (f), respectively; and Fig. 5(d) and (i) are the RMIM of Fig. 5(a) and (f), respectively. Fig. 5(e) shows the difference in MIM between Fig. 5(b) and (g); Fig. 5(j) shows the difference in RMIM between Fig. 5(d) and (i). There are significant differences in MIM between Fig. 5(b) and (g), and most of the values of Fig. 5(e) are not close to zero. Nevertheless, most of the values of Fig. 5(j) are close to zero, and the similarity of RMIM between Fig. 5(d) and (i) is obvious. This substantially indicates that the RMIM generated by our proposed fast assignment strategy of dominant orientation is rotationally invariant. A further evaluation of the RMLG's rotation invariance is provided in Section III-C.

2) RMLG Descriptor Representation: After building the RMIM with rotation invariance, the follow-up critical step is to generate a unique feature description utilizing the RMIM. In the construction of feature descriptors, a reasonable spatial arrangement for feature description is crucial. Different spatial arrangements for feature description have been proposed, the most representative of which are the square grid in SIFT [29], the log-polar grid in GLOH [34], and the circular grid in DAISY [35]. Fig. 6 shows these different spatial arrangements. Moreover, combining Fig. 5(d), (i), and (j), we can easily find that the rotation invariance of the proposed RMIM is extremely weak in the edge region, and its rotation invariance mainly exists in the circular neighborhood near the center point of the RMIM map. Therefore, it may be more effective to use a DAISY-style spatial arrangement (i.e., a circular grid) for the construction of our RMLG descriptor based on a distribution-histogram technique. Because the values of RMIM range from 1 to No (No = 6) and the final feature vector is obtained by concatenating all the histograms, the dimension of the feature vector is Bins × No.

To further verify the above conjecture, we use the aforementioned spatial arrangements (i.e., SIFT-style, GLOH-style, and DAISY-style) to construct the RMLG descriptor on the RMIM. Since the dimensions of the feature vectors constructed by the different spatial arrangements differ, in order to make a fair comparison with the spatial arrangement of DAISY, we improved the spatial arrangements of SIFT and GLOH so that the dimensions of all generated feature vectors are about 150.

Specifically, the initial and improved spatial arrangements of SIFT are 4 × 4 and 5 × 5 square grids, so 96-D and 150-D feature vectors can be obtained utilizing the values of RMIM ranging from 1 to No (No = 6). The improved spatial arrangement of GLOH follows the configuration of Yu et al. [45], which yields a 144-D feature vector. The RMLG descriptor combining the RMIM with the arrangement of DAISY is a 150-D feature vector. We conduct a comparison experiment of the RMLG descriptor with different spatial arrangements on three different multimodal datasets (optical–infrared, optical–LiDAR, and optical–SAR) for a total of 60 image pairs (more details of these datasets are introduced in Section III-A). Table I gives the average number of correct matches (NCM) of the RMLG descriptor with different spatial arrangements. It is obvious that the RMLG descriptor with the spatial arrangement of DAISY yields the best matching performance.

As a consequence, the proposed RMLG is finally constructed by applying the arrangement of DAISY, arriving at a 150-D feature vector, because the spatial arrangement of DAISY has been shown to outperform the other spatial arrangements (e.g., SIFT and GLOH). A more detailed analysis of the RMLG's matching performance is presented in Section III.

C. R2FD2 Feature Matching

In this article, the proposed R2FD2 matching method is composed of the constructed MALG detector and RMLG descriptor. First, IPs with high repeatability are detected from a reference image and a sensed image by the proposed MALG detector, and then their local invariant features are calculated utilizing the proposed RMLG descriptor. Finally, the commonly used nearest neighbor distance ratio (NNDR) [29] matching strategy is employed to identify initial correspondences between reference and sensed image pairs, and the fast sample consensus (FSC) [50] technique is performed to remove outliers for determining reliable correspondences.
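The NNDR screening in the pipeline above follows Lowe's ratio test [29]. A minimal sketch under stated assumptions — list-based descriptors, brute-force search, and an illustrative ratio threshold of 0.8 (the text does not specify the value used):

```python
import math

def nndr_match(desc_ref, desc_sen, ratio=0.8):
    """Keep (i, j) only if the nearest sensed descriptor j is clearly
    closer to reference descriptor i than the second nearest one."""
    matches = []
    for i, d in enumerate(desc_ref):
        # distance to every sensed descriptor, smallest first
        dists = sorted((math.dist(d, e), j) for j, e in enumerate(desc_sen))
        if len(dists) >= 2 and dists[0][0] < ratio * dists[1][0]:
            matches.append((i, dists[0][1]))
    return matches

ref = [[0.0, 0.0], [1.0, 1.0]]
sen = [[0.1, 0.0], [1.0, 0.9], [5.0, 5.0]]
print(nndr_match(ref, sen))  # -> [(0, 0), (1, 1)]
```

An outlier-rejection step such as FSC [50] (or RANSAC) would then prune these initial correspondences.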

Fig. 5. Comparison of rotation invariance for the MIM and RMIM. (a) Input image. (b) MIM of (a). (c) MIMnew of (a). (d) RMIM of (a). (e) Errors map
of MIM. (f) Rotated 30◦ image. (g) MIM of (f). (h) MIMnew of (f). (i) RMIM of (f). (j) Errors map of RMIM.

Fig. 6. Three different spatial arrangements for feature description. (a) SIFT.
(b) GLOH. (c) DAISY.

III. EXPERIMENTAL EVALUATION AND ANALYSIS

In this section, to validate the matching performance of the proposed R2FD2 matching method, we selected different types of MRSIs datasets (e.g., optical–infrared, optical–LiDAR, and optical–SAR image pairs) for qualitative and quantitative evaluation. We first introduced the experimental settings, including the predefined parameters and the details of the MRSIs datasets. Next, we evaluated the repeatability performance of the MALG detector and the rotation invariance of the RMLG descriptor. Finally, the matching results of R2FD2 were presented and analyzed by comparing them with five state-of-the-art matching methods (including HOSS [38], RIFT [33], rotation-invariant amplitudes of log-Gabor orientation histograms (RI-ALGH) [45], multiscale histogram of local main orientation (MS-HLMO) [51], and histogram of the orientation of weighted phase (HOWP) [52]) for verifying the robustness and effectiveness of R2FD2.

Fig. 7. Samples of MRSIs datasets. Samples of (a) optical–infrared image pairs, (b) optical–LiDAR image pairs, and (c) optical–SAR image pairs.

A. Experimental Settings

We collected three types of MRSIs datasets for qualitative and quantitative evaluations, including optical–infrared, optical–LiDAR, and optical–SAR datasets. Each type of dataset consisted of 20 image pairs for a total of 60 multimodal image pairs. The size of the image pairs ranged from 450 × 450 pixels to 750 × 750 pixels. These experimental multimodal datasets included various high, medium, and low spatial resolution images, covering both urban and suburban areas, and there are significant radiation differences between the multimodal image pairs. These challenges can comprehensively test the robustness and adaptability of the proposed matching method. Several sample image pairs of each type of dataset are shown in Fig. 7. It is worth noting that since our R2FD2 is not currently scale-invariant, the resolution of each image pair is resampled to the same ground sample distance (GSD).

The numbers of scales and orientations for the log-Gabor wavelets draw on the settings applied in related research [33], [45], as well as on a comprehensive consideration of the computational complexity and matching performance of the proposed method. Therefore, Ns and No were set to 4 and 6, respectively. Also, the RMLG descriptor of our R2FD2 was

TABLE I
AVERAGE MATCHING PERFORMANCE OF THE RMLG DESCRIPTOR WITH DIFFERENT SPATIAL ARRANGEMENTS

implemented by employing the arrangement of DAISY, and therefore, the local region of feature description consisted of many centrosymmetric circles of different sizes. These circles were located on a concentric structure of three layers with different radii, and eight circles were uniformly distributed in each layer, which finally generated our R2FD2 with 150-D features.

The codes of HOSS, MS-HLMO, and HOWP were provided by the corresponding authors' websites. Since the code released for RIFT was not rotation-invariant, we reproduced it based on the relevant content of RIFT's paper. Similarly, we reproduced RI-ALGH's rotation invariance (excluding the scale invariance component) based on the description in the article for comparison because it had no public code, and our MALG detector was used for IP extraction before RI-ALGH feature description in subsequent experiments. The parameters of all comparison methods were set to the optimal settings recommended in the corresponding articles. All experiments were implemented by using MATLAB 2020a on a personal computer with an Intel Core i7-10750H CPU at 2.6 GHz and 16-GB RAM.

B. Repeatability Evaluation of MALG Detector

To verify the repeatability performance of our proposed MALG on feature detection, we conducted a comparison experiment of IPs' repeatability on the aforesaid multimodal datasets (e.g., optical–infrared, optical–LiDAR, and optical–SAR) for a total of 60 image pairs. In the process of IP detection, the repeatability of IPs is a very important index. Generally speaking, the higher the repeatability rate of IPs between two images, the more IPs can potentially be matched as correspondences [34]. The calculation of IPs' repeatability is represented by the following equation:

    Repeatability = Ncor / (0.5 ∗ (Nref + Nsen))    (17)

where Nref and Nsen represent the numbers of IPs detected from the reference image and sensed image, respectively. Ncor represents the number of correspondences whose location error (denoted as Eloc) is smaller than a certain threshold (three pixels) under the prior mapping relationship between the reference and sensed images, which can be expressed as follows:

    Eloc = |ref(x, y) − Ptruth ∗ sen(x, y)|    (18)

where ref(x, y) and sen(x, y) represent the extracted IPs on the reference and sensed images, respectively, and Ptruth is the projective model, which can be obtained by using manually selected correspondences.

TABLE II
COMPARISON RESULTS OF IPS' REPEATABILITY

Specifically, 5000 IPs were obtained by our MALG detector and by the FAST detector on the maximum and minimum moments of PC (denoted as FASTMPC) [33]. Table II gives the average repeatability rate of the IPs extracted by the two detectors on different types of multimodal images. Fig. 8 shows the comparison results of the repeatability and distributions of IPs between the FASTMPC and our MALG detector, and a representative example of each type of dataset is presented. It is not difficult to find that the average repeatability rate of IPs by our MALG detector is about five percentage points higher than that of the FASTMPC detector. What is more, the MALG detector can extract more evenly distributed IPs in multimodal image pairs compared with the FASTMPC detector, which was prone to the phenomenon of IP gathering.

As a consequence, our proposed MALG detector, combining the multichannel log-Gabor features and the autocorrelation strategy, not only improves the repeatability of IPs but also makes the distribution of IPs more uniform, which can lay a foundation for improving the robustness of subsequent feature matching.

C. Rotation Invariance Evaluation of RMLG Descriptor

In Section II-C, we preliminarily demonstrated that the proposed RMIM is invariant to image rotation. However, the calculation of the dominant orientation and the RMIM both depend on the six orientations (i.e., 0, π/6, 2π/6, 3π/6, 4π/6, and 5π/6) of the log-Gabor filters, so a natural question is whether the RMIM can still maintain its rotation invariance if the rotation angle of an image pair is not in the vicinity of these six orientations.

In response to this issue, in this section, we further verified the rotation invariance of RMLG. The specific verification
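Equations (17) and (18) translate into a short routine. This sketch makes two simplifying assumptions not stated in the text — a greedy one-to-one pairing and a Euclidean form of the location error — and the point sets and identity `project` function are toy inputs.

```python
def repeatability(ref_pts, sen_pts, project, thresh=3.0):
    """Ncor / (0.5 * (Nref + Nsen)) as in (17); a pair counts toward Ncor
    when its location error (18) is below `thresh` pixels."""
    n_cor = 0
    used = set()
    for rx, ry in ref_pts:
        for j, (sx, sy) in enumerate(sen_pts):
            if j in used:
                continue
            px, py = project(sx, sy)  # Ptruth maps sensed -> reference frame
            if ((rx - px) ** 2 + (ry - py) ** 2) ** 0.5 <= thresh:
                n_cor += 1
                used.add(j)
                break
    return n_cor / (0.5 * (len(ref_pts) + len(sen_pts)))

identity = lambda x, y: (x, y)
ref = [(10, 10), (50, 50), (90, 40)]
sen = [(11, 9), (52, 51), (200, 200)]
print(repeatability(ref, sen, identity))  # two of three pairs agree -> 0.666...
```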

Fig. 8. Comparison of the repeatability and distributions of IPs between the FASTMPC and our MALG detector. Contrastive example of extracted IPs for
(a) optical–infrared image pairs, (b) optical–LiDAR image pairs, and (c) optical–SAR image pairs.

Fig. 9. Rotation invariance test of R2FD2 on different types of MRSIs datasets as the rotation angles vary from 0° to 360°. NCMs of (a) first tested image pairs and (b) second tested image pairs.

Fig. 10. Visualization of matching and registration results. Matching and registration results of (a) 30° rotation, (b) 130° rotation, (c) 240° rotation, and (d) 350° rotation.

process is given as follows. First, we randomly selected two sets of image pairs without rotation from the above MRSIs datasets for experimentation. Then, one image of each selected pair was rotated from 0° to 360° with an interval of 10°, and a total of 72 rotated images were obtained. These rotated images and their corresponding optical images constitute 72 pairs of test cases, which are finally matched by our R2FD2. The two examples of rotation invariance tests for our R2FD2 are shown in Fig. 9.

The NCMs are marked with red dots; it can be clearly seen that the NCMs of all rotated angles were not less than 100, and more than half of the NCMs were greater than 300. What is more, the matching success rate (SR) of all rotation angles was up to 100%, which further verifies that our R2FD2 can maintain rotation invariance in the range of [0°, 360°]. Fig. 10 shows the matching results of several groups of rotation angles (30°, 130°, 240°, and 350°) and their corresponding registration results. As can be seen, the distribution of correspondences is relatively uniform, and the checkerboard maps of the registration results have been aligned correctly.

D. Matching Performance Evaluation of R2FD2

In this section, we compared our R2FD2 with five state-of-the-art matching methods: HOSS, RIFT, RI-ALGH, MS-HLMO, and HOWP. Each image pair of the above MRSIs datasets was rotated from 0° to 180° with an interval of 10°, and a total of 1140 (19 ∗ 20 ∗ 3) rotated images were obtained for experimentation. The correct matches of each image pair were manually determined by selecting 10–20 evenly distributed correspondences to estimate the projective model (denoted as Ptruth). The matched correspondences with residuals less than three pixels were considered as the correct matches by utilizing the estimated projective model Ptruth. For quantitative evaluation, we employed four criteria to evaluate the performance of each matching method in terms of NCM, SR, root-mean-square error (RMSE), and running time (RT). Among them, NCM represents the number of correspondences correctly matched. If the NCM was less than ten, the corresponding image pair was marked as a matching failure. The RMSE can be calculated as follows:

    RMSE = sqrt( (1/N) Σ_{i=1..N} (R(x, y) − Ptruth ∗ S(x, y))² )    (19)

where R(x, y) and S(x, y) represent the correct matches of the reference and sensed images, respectively, Ptruth is the projective model, and N represents the NCM. The smaller the RMSE, the higher the accuracy of the matching
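Equation (19) can be checked with a few lines of code; the toy `shift` model standing in for Ptruth and the sample points are assumptions for illustration.

```python
import math

def rmse(ref_pts, sen_pts, project):
    """Root-mean-square residual over correct matches, as in (19)."""
    assert ref_pts and len(ref_pts) == len(sen_pts)
    total = 0.0
    for (rx, ry), (sx, sy) in zip(ref_pts, sen_pts):
        px, py = project(sx, sy)  # Ptruth applied to a sensed-image point
        total += (rx - px) ** 2 + (ry - py) ** 2
    return math.sqrt(total / len(ref_pts))

shift = lambda x, y: (x + 1.0, y)  # toy stand-in for the projective model
print(rmse([(1.0, 0.0), (4.0, 3.0)], [(0.0, 0.0), (3.0, 3.0)], shift))  # -> 0.0
```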

Fig. 11. Comparisons of average NCM criteria for different matching methods. Average NCM of (a) optical–infrared datasets, (b) optical–LiDAR datasets,
and (c) optical–SAR datasets.

Fig. 12. Comparisons of SR criteria for different matching methods. SR of (a) optical–infrared datasets, (b) optical–LiDAR datasets, and (c) optical–SAR
datasets.

method. The definition of SR is given as follows:

    SR = (Σ_i I(Pi) / T) ∗ 100%    (20)

    I(Pi) = { 1,  NCM(Pi) ≥ 10
            { 0,  else    (21)

where T represents the total number of image pairs of a multimodal image set, I(Pi) represents a logical value, 1 represents a successful matching trial, and 0 represents a failed matching trial. NCM(Pi) represents the NCM of the ith image pair. SR is the ratio of the number of image pairs that are successfully matched to the total number of image pairs. The larger the values of NCM and SR, the stronger the robustness of the corresponding matching method.

As shown in Fig. 11, the comparison results of the average NCM criterion for the different matching methods on each multimodal image dataset are demonstrated. The average NCM refers to the average of all NCMs of a total of 19 sets of images generated by each image pair with an interval of 10° from 0° to 180°. As can be seen, MS-HLMO matched the fewest NCMs for all types of multimodal image pairs, followed by HOSS. This may be related to the fact that MS-HLMO utilizes the Harris-based function to detect IPs, which usually results in fewer extracted IPs than other detectors such as the MALG and FASTMPC detectors. The average NCM criteria of HOSS and RIFT were comparable on the optical–infrared and optical–LiDAR datasets, while the average NCMs of HOSS were extremely reduced on the optical–SAR dataset and RIFT achieved more NCMs than HOSS. This could be because the relatively low discriminative capability of the LSS-based HOSS description led to the inability to maintain robust matching performance on the optical–SAR dataset with significant NRD. In contrast, it is obvious that our R2FD2 outperformed the other methods in the average NCM criterion and obtained the most matches on all types of multimodal image pairs, followed by HOWP and RI-ALGH. This indicates that the features detected by our MALG are more repeatable and the features described by our RMLG are more discriminative.

Fig. 12 shows the comparison results of the SR criterion for the different matching methods, where the SR of HOSS was the worst; there are even cases where the SR of HOSS was zero on the optical–SAR datasets, followed by MS-HLMO and RIFT. RI-ALGH and HOWP had comparable performance regarding the SR criterion on each type of dataset. On the whole, our R2FD2 obtained the highest SR on all the datasets, and the matching SR of R2FD2 reached 100% on most datasets and close to 100% on a few image pairs.

To further evaluate the accuracy of the different matching methods, Fig. 13 shows the comparison results of the average RMSE criterion, where the RMSE was set to five to indicate a failed match. Similar to the average NCM, the average RMSE refers to the average of all RMSEs of a total of 19 sets of images generated by each image pair with an interval of 10° from 0° to 180°. It can be seen from Fig. 13 that our R2FD2 yielded the
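The success-rate criterion of (20) and (21) reduces to a one-pass count; the NCM values below are toy inputs for illustration.

```python
def success_rate(ncm_per_pair, min_ncm=10):
    """SR = (sum of I(P_i) / T) * 100, with I(P_i) = 1 iff NCM(P_i) >= 10."""
    t = len(ncm_per_pair)
    hits = sum(1 for ncm in ncm_per_pair if ncm >= min_ncm)  # sum of I(P_i)
    return hits / t * 100.0

print(success_rate([150, 42, 9, 10]))  # three of four pairs succeed -> 75.0
```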

Fig. 13. Comparisons of average RMSE criteria for different matching methods. RMSE of (a) optical–infrared datasets, (b) optical–LiDAR datasets, and
(c) optical–SAR datasets.

Fig. 14. Correspondence visualization of R2 FD2 . Matching results of (a) optical–infrared datasets, (b) optical–LiDAR datasets, and (c) optical–SAR datasets.

TABLE III
COMPARISONS OF RT CRITERIA FOR EACH MATCHING METHOD

best results on the criterion of average RMSE and achieved a matching accuracy of fewer than two pixels for all datasets. It was followed by HOWP and RI-ALGH, whose RMSEs were relatively worse than that of our R2FD2. Nevertheless, HOSS and MS-HLMO were both likely to exhibit the worst performance on the criterion of average RMSE in different cases. These experimental results further illustrate the validity of the proposed approaches compared to the state-of-the-art methods and show that the rotation invariance achieved by the RMLG descriptor is more reliable than that of the others.

Table III gives the average RT of each compared method for the whole dataset, which was implemented on a laptop with an i7-10750H CPU at 2.6 GHz and 16-GB RAM. The average RT of our R2FD2 was the fastest, and the time consumption was about 11 s. The efficiency of HOWP was second best, RIFT ranked third, HOSS and MS-HLMO fourth, and RI-ALGH last. Specifically, the RT of our R2FD2 was about 9 times, 6.5 times, 5 times, and 1.4 times faster than those of RI-ALGH, MS-HLMO (HOSS), RIFT, and HOWP, respectively. It is obvious that our R2FD2 has a great advantage in matching efficiency, which is attributed to the fast assignment of the dominant orientation and the construction of the RMLG descriptor using the RMIM with rotation invariance.

Furthermore, we carried out qualitative evaluations of R2FD2 by displaying correct correspondences and registration results for visual inspection. Fig. 14 shows more matching results of our R2FD2; at least four image pairs were randomly selected from each multimodal dataset, and different rotation deformations in the range of [0°, 360°] were applied. Fig. 15 shows the corresponding registration results of Fig. 14


Fig. 15. Checkboard visualization of R2 FD2 . Registration results of (a) optical–infrared datasets, (b) optical–LiDAR datasets, and (c) optical–SAR datasets.

by using checkerboard maps. Each edge of all the checkerboard maps is well aligned without obvious misalignment, which further verifies the satisfactory generality of R2FD2.

Overall, these evaluations and the coherence analysis proved that our R2FD2 achieved high computational efficiency and that its effectiveness in resisting significant radiation and rotation differences among multimodal images was far superior to that of the state-of-the-art feature matching methods. The excellent matching performance of R2FD2 is mainly due to the following two reasons. On the one hand, the feature detection of R2FD2 adopts the novel MALG detector, and MALG has the excellent property of high repeatability and uniform distribution for IP detection, which is rather advantageous for subsequent matching. On the other hand, the feature description of R2FD2 utilizes the discriminative RMLG descriptor, and RMLG integrates the rotation-invariant RMIM with the arrangement of DAISY to depict more discriminative invariant features, which lays a foundation for fast and robust matching.

IV. CONCLUSION

In this article, a novel feature matching method (named R2FD2) was presented for MRSIM, involving both the repeatable MALG detector and the rotation-invariant RMLG descriptor. The MALG detector was first designed by integrating the multichannel autocorrelation strategy with the log-Gabor wavelets for IP extraction. In this way, the IPs extracted by MALG generally had a high repetition rate and were evenly distributed in multimodal images. Then, the fast assignment strategy of the dominant orientation was proposed to establish the novel RMIM with rotation invariance. Subsequently, the RMLG descriptor was constructed by incorporating the rotation-invariant RMIM with the spatial arrangement of DAISY for feature representation. Qualitative and quantitative experiments were performed by utilizing different types of MRSIs datasets (optical–infrared, optical–LiDAR, and optical–SAR image pairs) to evaluate the matching performance of our R2FD2. The experimental results demonstrated that the proposed R2FD2 outperformed five state-of-the-art feature matching methods (i.e., HOSS, RIFT, RI-ALGH, MS-HLMO, and HOWP) in all criteria (including NCM, SR, RMSE, and RT). As a result, our R2FD2 is capable of reliably achieving fast and robust feature matching for MRSIs.

Although the proposed R2FD2 exhibited superior adaptation to rotation and radiation differences for multimodal feature matching, it was sensitive to scale distortions between


multimodal images because it did not address the question of [21] Y. Ye, J. Shan, L. Bruzzone, and L. Shen, “Robust registration of
scale invariance. Accordingly, our future research will include multimodal remote sensing images based on structural similarity,” IEEE
Trans. Geosci. Remote Sens., vol. 55, no. 5, pp. 2941–2958, Mar. 2017.
the exploration of these limitations more deeply. For example, [22] J. Fan, Y. Wu, M. Li, W. Liang, and Y. Cao, “SAR and optical
it is of great significance to establish a suitable scale space for image registration using nonlinear diffusion and phase congruency
achieving scale invariance, such as co-occurrence scale space structural descriptor,” IEEE Trans. Geosci. Remote Sens., vol. 56, no. 9,
pp. 5368–5379, Sep. 2018.
[53], nonlinear diffusion scale space [22], and Gaussian scale [23] Y. Ye, L. Bruzzone, J. Shan, F. Bovolo, and Q. Zhu, “Fast and robust
space [54]. matching for multimodal remote sensing image registration,” IEEE
Trans. Geosci. Remote Sens., vol. 57, no. 11, pp. 9059–9070, Nov. 2019.
R EFERENCES [24] Y. Xiang, R. Tao, and H. You, “OS-PC: Combining feature repre-
sentation and 3-D phase correlation for subpixel optical and SAR
[1] Y. Zhang, Z. Zhang, and J. Gong, “Generalized photogrammetry
image registration,” IEEE Trans. Geosci. Remote Sens., vol. 58, no. 9,
of spaceborne, airborne and terrestrial multi-source remote sensing
pp. 6451–6466, Mar. 2020.
datasets,” Acta Geodaetica et Cartographica Sinica, vol. 50, no. 1,
pp. 1–11, 2021. [25] Y. Ye, B. Zhu, T. Tang, C. Yang, Q. Xu, and G. Zhang, “A robust
multimodal remote sensing image registration method and system using
[2] J. Ma, Y. Ma, and C. Li, “Infrared and visible image fusion methods and
steerable filters with first- and second-order gradients,” ISPRS J. Pho-
applications: A survey,” Inf. Fusion, vol. 45, pp. 153–178, Jan. 2019.
togramm. Remote Sens., vol. 188, pp. 331–350, Jun. 2022.
[3] L.-J. Deng, M. Feng, and X.-C. Tai, “The fusion of panchromatic and
[26] A. Sedaghat and N. Mohammadi, “Uniform competency-based local
multispectral remote sensing images via tensor-based sparse modeling
feature extraction for remote sensing images,” ISPRS J. Photogram.
and hyper-Laplacian prior,” Inf. Fusion, vol. 52, pp. 76–89, Dec. 2019.
Remote Sens., vol. 135, pp. 142–157, Jan. 2018.
[4] F. Luo, Z. Zou, J. Liu, and Z. Lin, “Dimensionality reduction and
[27] H. P. Moravec, Obstacle Avoidance and Navigation in the Real World
classification of hyperspectral image via multistructure unified discrim-
by a Seeing Robot Rover. Stanford, CA, USA: Stanford Univ., 1980.
inative embedding,” IEEE Trans. Geosci. Remote Sens., vol. 60, 2022,
Art. no. 5517916. [28] C. Harris and M. Stephens, “A combined corner and edge detector,” in
[5] S. Hao, W. Wang, Y. Ye, T. Nie, and L. Bruzzone, “Two-stream deep Proc. Alvey Vis. Conf., vol. 15, Manchester, U.K., 1988, pp. 5210–5244.
architecture for hyperspectral image classification,” IEEE Trans. Geosci. [29] D. G. Lowe, “Distinctive image features from scale-invariant keypoints,”
Remote Sens., vol. 56, no. 4, pp. 2349–2361, Apr. 2018. Int. J. Comput. Vis., vol. 60, no. 2, pp. 91–110, 2004.
[6] Y. Ye, W. Liu, L. Zhou, T. Peng, and Q. Xu, “An unsupervised SAR and [30] E. Rosten, R. Porter, and T. Drummond, “Faster and better: A machine
optical image fusion network based on structure-texture decomposition,” learning approach to corner detection,” IEEE Trans. Pattern Anal. Mach.
IEEE Geosci. Remote Sens. Lett., vol. 19, pp. 1–5, 2022. Intell., vol. 32, no. 1, pp. 105–119, Jan. 2010.
ZHU et al.: R2 FD2 : FAST AND ROBUST MATCHING OF MRSIS VIA REPEATABLE FEATURE DETECTOR 5606115
Bai Zhu received the B.S. degree from the Faculty of Geosciences and Environmental Engineering, Southwest Jiaotong University, Chengdu, China, in 2019, where he is currently pursuing the Ph.D. degree in surveying and mapping science and technology.
His research is mainly focused on remote sensing image processing, multimodal image matching, image registration, and feature extraction.

Chao Yang received the B.S. degree from the Faculty of Geosciences and Environmental Engineering, Southwest Jiaotong University, Chengdu, China, in 2019, where he is currently pursuing the Ph.D. degree in surveying and mapping science and technology.
His research interests include image matching, deep learning, and image processing.

Jinkun Dai received the B.S. degree from the Faculty of Geosciences and Environmental Engineering, Southwest Jiaotong University, Chengdu, China, in 2022, where he is currently pursuing the M.S. degree in surveying and mapping science and technology.
His research interests include image matching, image fusion, and classification.

Jianwei Fan received the B.S. degree in electronic information science and technology from the Henan University of Science and Technology, Luoyang, China, in 2011, and the Ph.D. degree in pattern recognition and intelligent systems from Xidian University, Xi'an, China, in 2017.
He is currently a Lecturer with the School of Computer and Information Technology, Xinyang Normal University, Xinyang, China. His main research interests include remote sensing image processing, image registration, and feature extraction.

Yao Qin (Student Member, IEEE) received the B.S. degree in information engineering from Shanghai Jiaotong University, Shanghai, China, in 2013, and the M.S. and Ph.D. degrees in information and communication engineering from the College of Electronic Science, National University of Defense Technology (NUDT), Changsha, China, in 2015 and 2019, respectively.
He was a Visiting Ph.D. with the Remote Sensing Laboratory, Department of Information Engineering and Computer Science, University of Trento, Trento, Italy. He has been a Research Assistant with the Northwest Institute of Nuclear Technology, Xi'an, China, since 2020. His research interests include infrared small target detection, hyperspectral image classification and clustering, and domain adaptation.

Yuanxin Ye (Member, IEEE) received the B.S. degree in remote sensing science and technology from Southwest Jiaotong University, Chengdu, China, in 2008, and the Ph.D. degree in photogrammetry and remote sensing from Wuhan University, Wuhan, China, in 2013.
He is currently a Research Fellow with the Faculty of Geosciences and Environmental Engineering, Southwest Jiaotong University, Chengdu. His research interests include remote sensing image processing, image registration, change detection, and object detection.
Dr. Ye received "the International Society for Photogrammetry and Remote Sensing (ISPRS) Prizes for Best Papers by Young Authors" at the 23rd ISPRS Congress in Prague, in 2016, and "the Best Youth Oral Paper Award" at ISPRS Geospatial Week 2017 in Wuhan.