Graph Convolutional Autoencoder Model for the Shape Coding and Cognition of Buildings in Maps

To cite this article: Xiongfeng Yan, Tinghua Ai, Min Yang & Xiaohua Tong (2020): Graph convolutional autoencoder model for the shape coding and cognition of buildings in maps, International Journal of Geographical Information Science, DOI: 10.1080/13658816.2020.1768260
RESEARCH ARTICLE
1. Introduction
Spatial cognition aims to understand and explain how people acquire, organize, utilize,
and revise information in spatial awareness activities, which is an important research field
in cognitive science and psychology (Mark et al. 1999). It can help people establish spatial
concepts and address the high-order spatial properties of geospatial objects, phenomena,
or events in the map space, including shapes, patterns, movements, trends, evolution, and
interrelationships (Mennis et al. 2000, Ai et al. 2013). Research on spatial cognition can be
broadly divided into two aspects. The first aspect is the characteristics of cognitive
subjects, i.e. cognitive behaviors or laws, such as the Gestalt principles, which state that
people naturally tend to pursue the integrity or continuity of the overall structure and
then gradually investigate local details (Wertheimer 1938, Li et al. 2004). The second
aspect is the characteristics of cognitive objects, i.e. the features or properties of
a geospatial object itself, including geometric, topological, or semantic information.
Graph neural networks have been developed to analyze graph-structured data (Bronstein et al. 2017). These networks require defining
an effective convolution. There are two potential solutions: the spatial approach
(Niepert et al. 2016) and the spectral approach (Shuman et al. 2016). In the spatial
approach, the convolution is directly performed in the vertex domain, whereas in the
spectral approach, it is realized through graph Fourier transform and convolution
theorem. Further details on the two approaches can be found in the review by
Bronstein et al. (2017). Similar to 2D convolution, the graph spectral convolution has
advantages such as fast computation, few parameters, and spatially local connectivity
(Defferrard et al. 2016). Therefore, it can be combined with computational models, such
as neural networks or autoencoder networks, to build a learning architecture. This
method has achieved exciting results for the pattern classification of spatial groups
(Yan et al. 2019). This study will attempt to apply it to the shape analysis of individual
geospatial objects.
Buildings are pivotal components of map spaces and play an important role in
applications such as urban modeling (Steiniger et al. 2008, Henn et al. 2012), high-
definition mapping and navigation (Zhang et al. 2013). As buildings exhibit salient visual
characteristics, such as perpendicular corners and symmetrical axes, processing tasks for
building data are based on shape analysis and cognition, e.g. simplification (Yan et al.
2017), change detection (Zhou et al. 2018), data updating (Yang et al. 2018b), and quality
assessment (Xu et al. 2017a). This study takes a 2D vector building as an example and
extracts its inherent features using a graph deep learning approach, to support the
mechanism and formalization of spatial cognition. As a boundary-based representation of
building shapes has advantages such as low redundancy, high precision, and information
intensiveness, the study first uses boundary points as vertices to construct a graph
structure of the building shape and extracts the vertex features based on the boundary
structure. Subsequently, an autoencoder learning model is constructed by combining the
graph convolution and deep autoencoder architecture; unlabeled data are used for the
unsupervised training to realize shape coding.
The remainder of this paper is organized as follows. Section 2 details the framework for
coding the building shape using graph deep learning. Section 3 presents the experiments
and results, including detailed analyses. Section 4 discusses the application of shape
coding to scenarios such as shape retrieval and shape matching. Finally, Section 5
concludes the paper.
Figure 1. Overall framework for coding the shape of a building using graph deep learning. The colors
assigned to the vertex (or edge) on the graph indicate the feature values (or weights).
● Graph construction. The boundary points of the building are used as vertices to
construct a graph structure, and the description features for each vertex are
extracted based on its neighborhood structures.
● Graph operation. A novel graph convolution is introduced based on the Fourier
transform and convolution theorem, providing a method to integrate the multiple
features of a vertex and its neighbors.
● Graph learning. The graph convolution and autoencoder are combined to build
a graph learning model, and unsupervised training is performed to make it capable
of encoding the building shape.
Through the above process, the building shape in a 2D plane is encoded as a 1D vector,
and the similarity between two different shapes is measured by the distance between
their vectors, including the Euclidean and cosine distances.
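As an illustration of this similarity measurement, the two distances can be computed between coding vectors as follows; this is a minimal sketch, and the vector values are invented for demonstration:

```python
import numpy as np

def euclidean_distance(a, b):
    """Euclidean distance between two shape-coding vectors."""
    return float(np.linalg.norm(np.asarray(a, dtype=float) - np.asarray(b, dtype=float)))

def cosine_distance(a, b):
    """Cosine distance (1 - cosine similarity) between two coding vectors."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical 4-dimensional codings of two buildings
c1 = np.array([0.2, 0.8, -0.5, 0.1])
c2 = np.array([0.25, 0.75, -0.45, 0.05])
d_euc = euclidean_distance(c1, c2)
d_cos = cosine_distance(c1, c2)
```

The Euclidean distance reflects absolute differences between codings, while the cosine distance is invariant to their magnitudes.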
Figure 2. Graph of a building shape and the features extracted for its vertices. The colors assigned to
the vertex (or edge) on the graph indicate the feature values (or weights).
\bar{S}_A^s = S_A^s / S_0 \qquad (1)

\bar{L}_A^s = L_A^s / \sqrt{S_0} \qquad (2)

\bar{H}_A^s = H_A^s / \sqrt{S_0} \qquad (3)

\bar{D}_A^s = D_A^s / \sqrt{S_0} \qquad (4)

\bar{R}_A^s = R_A^s / \sqrt{S_0} \qquad (5)
Different neighboring orders s (i.e. corresponding to different vertices B and C) form
different local and regional triangular structures. If the neighborhood sizes from order 1 to
Z are considered, a 7Z-dimensional feature p_i can be extracted for each vertex i,
expressed as
p_i = \left( \bar{S}_i^s, \bar{L}_i^s, \alpha_i^s, \bar{H}_i^s, \bar{D}_i^s, \bar{R}_i^s, \beta_i^s \right), \quad s \in \{1, 2, \ldots, Z\} \qquad (6)

The feature matrix P^{N \times 7Z} for the entire shape graph thus has dimension N \times 7Z, where N is the number of vertices.
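As a rough sketch of this feature extraction, one can compute triangle-based measures for each vertex A with its order-s neighbors B and C. The specific geometric definitions below (triangle area, base length, apex angle, height, midpoint distance, circumradius, and a signed convexity angle) are plausible assumptions for illustration, not the authors' exact formulas:

```python
import numpy as np

def cross2(u, v):
    """z-component of the 2D cross product."""
    return u[0] * v[1] - u[1] * v[0]

def polygon_area(pts):
    """Shoelace area of a closed polygon given as an (N, 2) array."""
    x, y = pts[:, 0], pts[:, 1]
    return 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))

def vertex_features(pts, i, s, s0):
    """Seven assumed features for vertex i at neighborhood order s."""
    n = len(pts)
    a, b, c = pts[i], pts[(i - s) % n], pts[(i + s) % n]
    ab, ac, bc = b - a, c - a, c - b
    area = 0.5 * abs(cross2(ab, ac))                     # triangle area S
    base = np.linalg.norm(bc)                            # base length L
    alpha = np.arccos(np.clip(np.dot(ab, ac)
            / (np.linalg.norm(ab) * np.linalg.norm(ac)), -1.0, 1.0))  # apex angle
    height = 2.0 * area / base if base > 0 else 0.0      # height H from A to BC
    dist = np.linalg.norm(a - (b + c) / 2.0)             # distance D to midpoint of BC
    radius = (np.linalg.norm(ab) * np.linalg.norm(ac) * base
              / (4.0 * area)) if area > 0 else 0.0       # circumradius R
    beta = np.copysign(alpha, cross2(ab, ac))            # signed angle (convexity)
    # Areas are normalized by S0, lengths by sqrt(S0); angles are kept as-is
    rt = np.sqrt(s0)
    return np.array([area / s0, base / rt, alpha, height / rt,
                     dist / rt, radius / rt, beta])

def shape_features(pts, orders):
    """Assemble the N x 7Z feature matrix over all neighborhood orders."""
    s0 = polygon_area(pts)
    return np.array([np.concatenate([vertex_features(pts, i, s, s0)
                                     for s in orders])
                     for i in range(len(pts))])
```

For a boundary of N points and orders {1, 2, ..., Z}, `shape_features` returns an N × 7Z matrix, matching the dimensionality described above.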
\Phi(\Lambda) = \sum_{k=0}^{K} \theta_k \, T_k(\tilde{\Lambda}) \qquad (9)
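Such a K-order polynomial filter can be evaluated without an eigendecomposition by using the Chebyshev recurrence T_0(x) = 1, T_1(x) = x, T_k(x) = 2x T_{k-1}(x) − T_{k-2}(x) on the rescaled Laplacian, as in Defferrard et al. (2016). A minimal sketch, assuming λ_max ≈ 2 for the normalized Laplacian:

```python
import numpy as np

def scaled_laplacian(W):
    """Normalized graph Laplacian rescaled to roughly [-1, 1]."""
    d = W.sum(axis=1)
    d_inv_sqrt = np.zeros_like(d)
    nz = d > 0
    d_inv_sqrt[nz] = 1.0 / np.sqrt(d[nz])
    L = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    return L - np.eye(len(W))   # approximates 2L/lambda_max - I with lambda_max = 2

def chebyshev_filter(L_tilde, x, theta):
    """Apply y = sum_k theta_k T_k(L_tilde) x via the Chebyshev recurrence."""
    t_prev, t_curr = x, L_tilde @ x          # T_0 x and T_1 x
    y = theta[0] * t_prev
    if len(theta) > 1:
        y = y + theta[1] * t_curr
    for k in range(2, len(theta)):
        t_prev, t_curr = t_curr, 2 * (L_tilde @ t_curr) - t_prev
        y = y + theta[k] * t_curr
    return y
```

Each recurrence step only multiplies by the sparse Laplacian, which is what makes the spectral convolution fast and spatially localized.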
Figure 3. Architecture of a graph convolutional autoencoder for coding building shapes. The output
and input are graph structures containing eight vertices with 2D features; the encoder and decoder
consist of two convolutional layers (including 2 × 3 and 3 × 2 K-order polynomial kernels, respectively)
and two pooling (or up-sampling) layers. The colors assigned to the vertex (or edge) on the graph
indicate the feature values (or weights).
The encoder and decoder contain multiple hidden convolutional layers with the
following layer-wise propagation rule based on graph convolution and nonlinear
activation:
H_j^{[l+1]} = \sigma\!\left( \sum_{i=1}^{F_{in}} \sum_{k=0}^{K} \theta_{i,j,k}^{[l]} \, T_k(\tilde{L}) \, H_i^{[l]} + b_j^{[l]} \right) \qquad (11)

where \sigma(\cdot) denotes a nonlinear activation function; H_i^{[l]} and H_j^{[l+1]} denote the ith input graph of the lth layer activations and the jth output graph of the (l+1)th layer activations, respectively; \theta_{i,j,k}^{[l]} and b_j^{[l]} are the trainable F_{in} \times F_{out} set of K-order polynomial coefficient vectors and the 1 \times F_{out} bias vector in the lth layer, respectively.
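A minimal dense sketch of this propagation rule (not the authors' implementation; the F_in input feature graphs are stored as rows of an (F_in, N) array, and the scaled Laplacian is assumed precomputed):

```python
import numpy as np

def gconv_layer(H, L_tilde, theta, b, act=np.tanh):
    """One graph-convolutional layer: for each output graph j,
    sum over input graphs i and Chebyshev orders k of
    theta[i, j, k] * T_k(L_tilde) @ H[i], plus bias, then activation.
    H: (F_in, N); theta: (F_in, F_out, K+1); b: (F_out,)."""
    f_in, f_out, k1 = theta.shape
    n = L_tilde.shape[0]
    out = np.zeros((f_out, n))
    for i in range(f_in):
        # Chebyshev basis T_k(L_tilde) applied to the i-th feature graph
        t_prev, t_curr = H[i], L_tilde @ H[i]
        basis = [t_prev, t_curr]
        for _ in range(2, k1):
            t_prev, t_curr = t_curr, 2 * (L_tilde @ t_curr) - t_prev
            basis.append(t_curr)
        for j in range(f_out):
            for k in range(k1):
                out[j] += theta[i, j, k] * basis[k]
    return act(out + b[:, None])
```

Stacking such layers, with pooling between them, yields the encoder; the decoder mirrors this structure with up-sampling.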
According to Equation (11), the convolutional layer weights the multiple features of the
vertices to obtain an integrated representation. In addition, a pooling operation is
performed in the encoder part to further combine the features of the neighboring vertices
and reduce the graph size (i.e. the number of vertices), and an up-sampling operation is
performed in the decoder part to ensure that the graph size is consistent with the input.
Since the boundary points are linearly connected, the pooling and up-sampling operations proposed by Defferrard et al. (2016) were employed, with the same efficiency as for 1D sequence data. To share the weights for batch training, the sizes of the input graph structures should be consistent; this can be achieved by interpolation or padding.
The model is optimized by minimizing the mean squared error between the input
features P^{N \times 7Z} of the graph vertices and the reconstructed features \hat{P}^{N \times 7Z}. The parameters
of the convolution kernels and the biases of the activation functions are updated using
the back propagation (BP) algorithm; the calculation of their gradients can be found in
previous reports (Defferrard et al. 2016, Yan et al. 2019).
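The reconstruction objective can be illustrated on a toy linear autoencoder trained by gradient descent; this is a schematic stand-in for the GCAE, with randomly generated features in place of the real vertex features, and all sizes invented:

```python
import numpy as np

rng = np.random.default_rng(0)
P = rng.normal(size=(50, 8))             # stand-in for the N x 7Z vertex features

# Toy linear autoencoder: encode 8 -> 2 dims, decode 2 -> 8
W_enc = rng.normal(scale=0.1, size=(8, 2))
W_dec = rng.normal(scale=0.1, size=(2, 8))

def mse(A, B):
    """Mean squared reconstruction error."""
    return float(np.mean((A - B) ** 2))

init_loss = mse(P @ W_enc @ W_dec, P)
lr = 0.02
for _ in range(500):
    Z = P @ W_enc                        # coding
    P_hat = Z @ W_dec                    # reconstruction
    err = (P_hat - P) / len(P)           # MSE gradient w.r.t. P_hat (constants folded into lr)
    g_dec = Z.T @ err                    # backpropagated gradients
    g_enc = P.T @ (err @ W_dec.T)
    W_dec -= lr * g_dec
    W_enc -= lr * g_enc

final_loss = mse(P @ W_enc @ W_dec, P)
```

The loss decreases as the reconstruction improves; in the GCAE the same principle applies, with the graph-convolutional encoder and decoder replacing the linear maps.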
The model focuses on the intermediate coding instead of the output part. Constraints
are usually added to the coding for specific purposes, such as the reduction of dimension-
ality. For example, in the model shown in Figure 3, the number of vertices is eventually
compressed to two by the two pooling layers, and each vertex contains only 2D features
after the spatial integration of the graph convolution. This means that the original 16-dimensional feature in this example is reduced to a 4-dimensional feature, thus integrating and compressing the building information to yield a new shape coding.
2017); therefore, they were used as a cognitive classification reference in this study.
Experimental datasets were selected from OpenStreetMap, containing 10 types of typical
standard shapes including E-shape, T-shape, and U-shape. For diversity, the building
shapes were selected from areas with different geographical characteristics, e.g. urban,
suburban, and rural areas.
Labels are not required for training the proposed GCAE model; however, to evaluate
the performance of shape coding, the selected buildings were labeled manually by three
participants, i.e. to indicate which shape the buildings belong to. Each type consists of 500
shapes, each of which was used as a training sample for a total of 5,000 samples. Table 1
lists the 10 standard shapes and 10 example training shapes.
Because of the complexity of the shapes and the accuracy of data acquisition, the
number of boundary points in the different buildings may vary significantly.
Therefore, the study first simplified the buildings using the Douglas–Peucker algo-
rithm (Douglas and Peucker 1973) with a conservative and empirical threshold of
0.1 m; subsequently, an equally spaced interpolation was performed with a total of
64 boundary points. Additionally, the starting point of each building was adjusted to
the vertex in the bottom-left corner. The vertex features were normalized using the
Z-score method.
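The preprocessing described above (after the initial Douglas–Peucker simplification, which is omitted here) can be sketched as follows; the function names are illustrative, not from the paper:

```python
import numpy as np

def resample_boundary(pts, n=64):
    """Resample a closed polygon boundary to n equally spaced points."""
    pts = np.asarray(pts, dtype=float)
    closed = np.vstack([pts, pts[:1]])                  # close the ring
    seg = np.linalg.norm(np.diff(closed, axis=0), axis=1)
    cum = np.concatenate([[0.0], np.cumsum(seg)])       # arc length at each vertex
    targets = np.linspace(0.0, cum[-1], n, endpoint=False)
    out = np.empty((n, 2))
    for k, t in enumerate(targets):
        i = np.searchsorted(cum, t, side="right") - 1
        frac = (t - cum[i]) / seg[i] if seg[i] > 0 else 0.0
        out[k] = closed[i] + frac * (closed[i + 1] - closed[i])
    return out

def roll_to_bottom_left(pts):
    """Make the bottom-left-most point the starting vertex."""
    start = np.lexsort((pts[:, 0], pts[:, 1]))[0]       # min y, ties broken by min x
    return np.roll(pts, -start, axis=0)

def zscore(features):
    """Z-score normalization per feature column."""
    return (features - features.mean(axis=0)) / (features.std(axis=0) + 1e-12)
```

Applying `resample_boundary` with n=64 followed by `roll_to_bottom_left` yields a fixed-size, consistently oriented vertex sequence, and `zscore` standardizes the extracted features.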
Table 1. The 10 standard shapes (E-, F-, H-, I-, L-, O-, T-, U-, Y-, and Z-shape), showing the standard shape and an example training shape for each class (shape drawings omitted).
Figure 4. Architecture and parameters of the GCAE model used in our experiments.
Y-shapes. This result demonstrates that the shape similarities between buildings calcu-
lated using the shape coding are generally accurate. However, because of the differences
in coding quality and data uncertainty, there are some overlaps in the coding of different
types of shapes: Y-shapes appear in the cluster of T-shapes in the area A of Figure 5.
Notably, several separate sub-clusters of the same shapes are also formed, probably
because of the rotation of the buildings. For example, I-shaped buildings form two sub-
clusters on the left side of Figure 5, where in the area B, the long axis of the buildings is
nearly vertical, whereas the long axis of the buildings in the area C is almost horizontal. To
further verify this conjecture, this study trained and visualized the shape coding using
adjusted building data, i.e. the orientation of all the buildings was manually adjusted to
coincide with the orientation of their cognitive letters, as shown in Figure 6. The separa-
tion of the clusters was thus remarkably improved. Most types of shapes form fairly
independent clusters; a few T-shaped and Z-shaped buildings are separated; and
L-shaped buildings, although forming two sub-clusters, are still relatively close to each
other and far from the other clusters. This shows that the GCAE model is sensitive to
variations in the orientation.
● Nearest neighbor (NN). NN refers to the ratio of the number of shapes whose closest
neighbor belongs to the same type to the total number of training shapes.
● First tier (FT) and second tier (ST). For one shape, FT and ST refer to the percentages of
shapes that belong to the same type as it does in its top (t − 1) and 2(t − 1) neighbors,
respectively, where t is the number of shapes of the same type in the training set, i.e. 500 in this study.
● Discounted cumulative gain (DCG). For one shape, the similarities to all other shapes are
computed and sorted in descending order. The higher a shape is ranked, the stronger
the influence of its relevance or irrelevance. The relevance is 1 when the
types are the same; otherwise, it is 0.
All the metrics range from 0 to 1, and a higher value indicates a better performance.
More details about these metrics have been reported by Shilane et al. (2004).
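Under the definitions above, all four metrics can be computed from a pairwise distance matrix. The following is a simplified sketch of our reading of Shilane et al. (2004), with ties broken arbitrarily and every class assumed to have at least two members:

```python
import numpy as np

def retrieval_metrics(D, labels):
    """NN, FT, ST and DCG averaged over all queries, from an n x n
    distance matrix D and per-shape class labels."""
    n = len(labels)
    labels = np.asarray(labels)
    nn = ft = st = dcg = 0.0
    for q in range(n):
        order = np.argsort(D[q], kind="stable")
        order = order[order != q]                    # drop the query itself
        rel = (labels[order] == labels[q]).astype(float)
        t = int(rel.sum()) + 1                       # class size incl. the query
        nn += rel[0]                                 # closest neighbor same type?
        ft += rel[:t - 1].sum() / (t - 1)            # hits in top (t-1)
        st += rel[:2 * (t - 1)].sum() / (t - 1)      # hits in top 2(t-1)
        disc = np.concatenate(([1.0],
                1.0 / np.log2(np.arange(2, len(rel) + 1))))
        dcg += (rel * disc).sum() / disc[:t - 1].sum()  # normalized DCG
    return {"NN": nn / n, "FT": ft / n, "ST": st / n, "DCG": dcg / n}
```

A perfectly separated coding (all same-type shapes closer than any other type) scores 1.0 on every metric.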
In this study, the FT, ST, and DCG values of the 10 standard shapes were counted and
averaged. As baselines for the GCAE model, two conventional methods were also implemen-
ted, namely the Fourier descriptor (FD) (Ai et al. 2013) and turning function (TF) (Arkin et al.
1991), where the expansion order of the Fourier descriptor was set to 6. Table 2 lists the results.
The proposed GCAE model outperforms both baselines in all four metrics for the original
and the orientation-adjusted building data. The GCAE model significantly outperforms the TF method, likely
because the TF method considers only one turning feature, whereas the GCAE model
integrates multiple features of local and regional structures. However, the advantage of
the TF method is that it is not sensitive to the orientation of the buildings. This is reasonable,
as it traverses all the boundary points to find the minimum value when calculating the
similarity distance. The FD method is very sensitive to the orientation of the shape, as
evident from its poor performance on the buildings whose orientations were not adjusted. Furthermore,
as artificial structures, buildings have characteristics such as few vertices, right angles, and discontinuous bends.
Approximating these requires a higher-order Fourier expansion, which is
computationally expensive and may degrade performance.
Figure 7 shows the precision–recall curves of the different methods for the two sets of
building data by averaging the ten standard shapes. The shape coding obtained using the
proposed GCAE model outperforms the other two methods in terms of the shape
similarity measurement, consistent with the previous analysis.
Because pooling layers are used, the number of vertices in the coding representation
varies with the model depth. To ensure the consistency of the information compression
ratio between the different models, the coding dimension was fixed to 32 by adjusting the
number of convolution kernels in the last convolutional layer of the encoder. Table 3
reports the model performance for the original building data when the depth changes
from 1 to 5. The performance initially improves with increasing depth, but deteriorates
sharply when the depth reaches 5. There is a recovery when the number of convolution
kernels is reduced. This indicates that increasing the depth improves the feature extrac-
tion capability of the model to a certain extent; however, it increases the complexity and
may cause training difficulties.
Figure 8 shows the training loss curves at different neighborhood size settings when
extracting the vertex features. The losses under settings {1} and {1, 2, 4, 8} are the most
significant when converging, followed by {2}, {2, 4}, and {2, 4, 8}; {4} and {8} have the lowest
loss. These results reveal that the loss value depends mainly on the smallest neighborhood size and the number of features: the smaller the smallest neighborhood size, the more difficult it is to reconstruct the shape and the higher the loss.
Although a lower loss implies that the reconstructed shape features are closer to the
original ones, it does not represent the merits of the shape coding. Although more
information can be provided to the model by increasing the number of features, it brings
redundancy issues and places higher requirements on the compression capability. Table 4
gives the evaluation metrics for the different settings. A careful analysis shows that
although the settings {2} and {4} may be slightly advantageous, the difference in perfor-
mance is small. This conclusion is also appropriate for the dimensional parameters of the
shape coding: although the loss can be reduced by employing a higher coding dimension,
this weakens the information compression capability of the model, and the shape may
even be reconstructed by simply copying the input.
Table 5. Top seven retrieved buildings for the ten standard shapes (values are the similarity distances of the retrieved buildings; the retrieved building drawings are omitted).

E-shape: 0.763, 0.991, 1.027, 1.043, 1.046, 1.124, 1.151
F-shape: 0.786, 0.837, 0.874, 0.895, 0.905, 0.918, 0.931
H-shape: 0.358, 0.661, 0.677, 0.686, 0.736, 0.824, 1.107
I-shape: 0.019, 0.026, 0.044, 0.049, 0.049, 0.052, 0.057
L-shape: 0.288, 0.326, 0.359, 0.426, 0.436, 0.440, 0.449
O-shape: 0.097, 0.116, 0.117, 0.137, 0.151, 0.322, 0.329
T-shape: 0.537, 0.710, 0.714, 0.809, 0.828, 0.872, 0.873
U-shape: 0.390, 0.415, 0.479, 0.506, 0.621, 0.643, 0.672
Y-shape: 0.531, 0.535, 0.623, 0.791, 0.812, 0.840, 0.869
Z-shape: 0.722, 0.817, 0.847, 0.857, 0.875, 0.932, 0.949
capability for local characteristics. Furthermore, the F-shaped and Y-shaped retrieval
results show that, after retrieving all the similar shapes in the experimental area, the
GCAE model tends to favor shapes with more details; overall, the results
are in accordance with human visual cognition.
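The retrieval procedure itself reduces to ranking all coding vectors by their distance to a query coding; a minimal sketch with invented codings:

```python
import numpy as np

def retrieve_top_k(query_code, codes, k=7):
    """Indices and distances of the k codings closest to the query."""
    d = np.linalg.norm(codes - query_code, axis=1)
    idx = np.argsort(d, kind="stable")[:k]
    return idx, d[idx]

# Invented 4-dimensional codings for a small shape library
codes = np.array([[0.1, 0.9, 0.0, 0.2],
                  [0.8, 0.1, 0.3, 0.7],
                  [0.2, 0.8, 0.1, 0.3],
                  [0.9, 0.2, 0.4, 0.6]])
idx, dist = retrieve_top_k(np.array([0.1, 0.9, 0.0, 0.2]), codes, k=2)
```

The returned distances are the values reported alongside the retrieved shapes in Table 5.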
4.4. Discussions
The proposed shape coding has several merits: (1) It is able to identify the local and global
characteristics in a cognitively compliant manner, as indicated by the shape retrieval test.
This can be probably attributed to the fact that the method considers multiple features
extracted based on the local and regional structures and provides a learning mechanism
for integrating these features. (2) Its shape similarity measurement has the meaning of
geometric difference, which is an important property for some downstream applications.
For example, in cartographic generalization, when a template shape is used to replace
a building for simplification, the geometric accuracy should be guaranteed in addition to
having similar shapes (Yan et al. 2017).
Experimental observations show that this method also has drawbacks, notably its sensitivity
to the orientation of buildings. Possible improvements include adding a spatial transformer network
(STN) (Jaderberg et al. 2015) module or introducing cognitive results as supervised knowledge for training. In addition, this study concentrated on planar buildings without holes.
The shape cognition of holed buildings (Xu et al. 2017b) and of 3D buildings using graph
deep learning is conceived as part of future work.
Table 6. Shape similarity distance (SD) and matching distance (MD) between standard shapes and buildings (each column corresponds to one of nine example buildings, shown as drawings in the original table).

E-shape  SD: 0.671, 1.257, 2.133, 2.157, 1.668, 2.193, 1.610, 2.370, 2.216
         MD: 0.224, 0.239, 0.360, 0.276, 0.352, 0.345, 0.330, 0.540, 0.425
F-shape  SD: 1.881, 0.152, 1.921, 2.113, 1.166, 1.636, 1.200, 1.744, 2.144
         MD: 0.353, 0.140, 0.365, 0.306, 0.238, 0.258, 0.434, 0.468, 0.407
H-shape  SD: 2.367, 2.207, 0.648, 2.262, 2.182, 1.881, 2.338, 2.035, 1.366
         MD: 0.465, 0.459, 0.237, 0.490, 0.534, 0.446, 0.539, 0.554, 0.393
I-shape  SD: 2.983, 2.151, 2.160, 0.793, 2.129, 1.368, 1.898, 2.389, 2.333
         MD: 0.385, 0.334, 0.237, 0.194, 0.398, 0.297, 0.253, 0.589, 0.401
L-shape  SD: 1.984, 1.178, 1.955, 2.074, 0.193, 1.279, 1.046, 1.851, 2.020
         MD: 0.422, 0.247, 0.457, 0.385, 0.078, 0.401, 0.343, 0.622, 0.448
T-shape  SD: 2.069, 1.415, 1.782, 1.639, 1.531, 1.006, 1.942, 1.772, 2.005
         MD: 0.378, 0.257, 0.339, 0.277, 0.428, 0.115, 0.470, 0.423, 0.428
U-shape  SD: 1.866, 1.316, 2.133, 2.011, 0.961, 1.568, 0.695, 1.877, 2.247
         MD: 0.402, 0.372, 0.404, 0.264, 0.270, 0.461, 0.157, 0.681, 0.528
Y-shape  SD: 2.530, 1.654, 1.849, 2.310, 1.521, 1.128, 1.742, 0.220, 1.777
         MD: 0.551, 0.450, 0.481, 0.550, 0.635, 0.402, 0.623, 0.187, 0.515
Z-shape  SD: 2.593, 1.979, 2.017, 1.793, 1.823, 1.923, 1.949, 1.930, 1.945
         MD: 0.635, 0.629, 0.598, 0.480, 0.664, 0.588, 0.500, 0.673, 0.683
Figure 10. Correlation analysis between the similarity distance and matching distance.
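The positive relationship shown in Figure 10 can be spot-checked directly from Table 6; for instance, the Pearson correlation between the SD and MD values of the E-shape row is positive:

```python
import numpy as np

# SD and MD values for the E-shape row of Table 6
sd = np.array([0.671, 1.257, 2.133, 2.157, 1.668, 2.193, 1.610, 2.370, 2.216])
md = np.array([0.224, 0.239, 0.360, 0.276, 0.352, 0.345, 0.330, 0.540, 0.425])

r = float(np.corrcoef(sd, md)[0, 1])   # Pearson correlation coefficient
```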
This study provides insights into the use of cutting-edge artificial intelligence (AI)
techniques to solve conventional spatial cognitive problems in the map space. This
study deals with vector-structured geospatial data rather than the raster-based data used in
previous studies, since the gap between deep learning and graph-structured data has
been filled by the graph convolution. In this sense, graph deep learning has the potential
to pave a new way for various tasks in the map space. For example, the latest
literature shows that it has emerged in spatial pattern recognition (Yan et al. 2019) and
traffic modeling (Zhang et al. 2020).
5. Conclusions
Shape coding and representation is a classical and challenging issue in spatial cognition.
This study presented a graph convolutional autoencoder (GCAE) for the shape coding of
vector building data by combining graph convolution and autoencoder architecture. The
GCAE model can receive multiple features extracted from local and regional structures of
building shape and then learn shape coding through unsupervised training. Visual and
quantitative experiments confirmed that the proposed method has a high discriminative
ability for building shapes and outperforms existing methods in terms of similarity
measurements. Further, experiments conducted on real-world data showed that the
coding effectively captures the local and global characteristics of building shapes,
providing a basis for applications such as shape retrieval and matching. The pros and cons of the
proposed GCAE method were highlighted, including some issues that can be solved in the
future.
Acknowledgments
Special thanks go to the editor and anonymous reviewers for their insightful comments and
constructive suggestions that substantially improved the quality of the paper.
Disclosure statement
No potential conflict of interest was reported by the author(s).
Funding
This work was supported by the National Natural Science Foundation of China [41531180,
41871377]; National Key Research and Development Program of China [2017YFB0503500].
Notes on contributors
Xiongfeng Yan received the B.S. and Ph.D. degrees in cartography from Wuhan University in 2015
and 2019, respectively. He is currently a Postdoctoral Researcher with the College of Surveying and Geo-
Informatics, Tongji University, Shanghai, China. His research interests include cartography and
machine learning with special focus on the graph-structured spatial data.
Tinghua Ai is a Professor at the School of Resource and Environmental Sciences, Wuhan University,
Wuhan, China. He received the Ph.D. degree in cartography from the Wuhan Technical University of
Surveying and Mapping in 2000. His research interests include multi-scale representation of spatial
data, map generalization, spatial cognition, and spatial big data analysis.
Min Yang is an Associate Professor at the School of Resource and Environmental Sciences, Wuhan
University, Wuhan, China. He received the B.S. and Ph.D. degrees in cartography from Wuhan
University in 2007 and 2013, respectively. His research interests include change detection of spatial
data, map generalization, and spatial big data analysis.
Xiaohua Tong is a Professor at the College of Surveying and Geo-Informatics, Tongji University,
Shanghai, China. He received the Ph.D. degree in geoscience from Tongji University in 1999. His
research interests include photogrammetry and remote sensing, trust in spatial data, and image
processing for high-resolution satellite images.
ORCID
Xiongfeng Yan http://orcid.org/0000-0003-4748-464X
Tinghua Ai http://orcid.org/0000-0002-6581-9872
Min Yang http://orcid.org/0000-0003-1973-527X
References
Abadi, M., et al., 2015. TensorFlow: large-scale machine learning on heterogeneous systems.
Software. Available from: tensorflow.org
Adamek, T. and O'Connor, N.E., 2004. A multiscale representation method for nonrigid shapes with
a single closed contour. IEEE Transactions on Circuits and Systems for Video Technology, 14 (5),
742–753. doi:10.1109/TCSVT.2004.826776.
Ai, T., et al., 2013. A shape analysis and template matching of building features by the Fourier
transform method. Computers, Environment and Urban Systems, 41 (5), 219–233. doi:10.1016/j.
compenvurbsys.2013.07.002.
Alajlan, N., et al., 2007. Shape retrieval using triangle-area representation and dynamic space
warping. Pattern Recognition, 40 (7), 1911–1920. doi:10.1016/j.patcog.2006.12.005.
Anselin, L., 1995. Local indicators of spatial association—LISA. Geographical Analysis, 27 (2), 93–115.
doi:10.1111/j.1538-4632.1995.tb00338.x.
Arkin, E.M., et al., 1991. An efficiently computable metric for comparing polygonal shapes. IEEE
Transactions on Pattern Analysis and Machine Intelligence, 13 (2), 209–216. doi:10.1109/
34.75509.
Basaraner, M. and Cetinkaya, S., 2017. Performance of shape indices and classification schemes for
characterising perceptual shape complexity of building footprints in GIS. International Journal of
Geographical Information Science, 31 (10), 1952–1977. doi:10.1080/13658816.2017.1346257.
Belongie, S., Malik, J., and Puzicha, J., 2002. Shape matching and object recognition using shape
contexts. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24 (4), 509–522.
doi:10.1109/34.993558.
Bronstein, M.M., et al., 2017. Geometric deep learning: going beyond Euclidean data. IEEE Signal
Processing Magazine, 34 (4), 18–42. doi:10.1109/MSP.2017.2693418.
Defferrard, M., Bresson, X., and Vandergheynst, P., 2016. Convolutional neural networks on graphs
with fast localized spectral filtering. In: Proceedings of International Conference on Neural
Information Processing Systems (NIPS). Barcelona, Spain, 3844–3852.
Deng, L. and Yu, D., 2014. Deep learning: methods and applications. Foundations and Trends in
Signal Processing, 7 (3–4), 197–387. doi:10.1561/2000000039.
DeVries, P.M.R., et al., 2018. Deep learning of aftershock patterns following large earthquakes.
Nature, 560 (7720), 632–634. doi:10.1038/s41586-018-0438-y.
Douglas, D.H. and Peucker, T.K., 1973. Algorithms for the reduction of the number of points required to
represent a digitized line or its caricature. Cartographica: The International Journal for Geographic
Information and Geovisualization, 10 (2), 112–122. doi:10.3138/FM57-6770-U75U-7727.
Egenhofer, M.J., 1997. Query processing in spatial-query-by-sketch. Journal of Visual Languages and
Computing, 8 (4), 403–424. doi:10.1006/jvlc.1997.0054.
Feng, Y., Thiemann, F., and Sester, M., 2019. Learning cartographic building generalization with
deep convolutional neural networks. International Journal of Geo-Information, 8 (6), 258.
doi:10.3390/ijgi8060258.
Hammond, D.K., Vandergheynst, P., and Gribonval, R., 2011. Wavelets on graphs via spectral graph
theory. Applied and Computational Harmonic Analysis, 30 (2), 129–150. doi:10.1016/j.
acha.2010.04.005.
Henn, A., et al., 2012. Automatic classification of building types in 3D city models. Geoinformatica, 16
(2), 281–306. doi:10.1007/s10707-011-0131-x.
Jaderberg, M., et al., 2015. Spatial transformer networks. In: Proceedings of International Conference
on Neural Information Processing Systems (NIPS). Montreal, Canada, 2017–2025.
Kingma, D.P. and Ba, J., 2015. Adam: a method for stochastic optimization. In: Proceedings of
international Conference on Learning Representations (ICLR). San Diego, California, USA.
Kipf, T.N. and Welling, M., 2017. Semi-supervised classification with graph convolutional networks.
In: Proceedings of International Conference on Learning Representations (ICLR). Toulon, France.
LeCun, Y., et al., 1998. Gradient-based learning applied to document recognition. Proceedings of the
IEEE, 86 (11), 2278–2324. doi:10.1109/5.726791.
LeCun, Y., Bengio, Y., and Hinton, G., 2015. Deep learning. Nature, 521 (7553), 436–444. doi:10.1038/
nature14539.
Li, W., Goodchild, M.F., and Church, R., 2013. An efficient measure of compactness for
two-dimensional shapes and its application in regionalization problems. International Journal of
Geographical Information Science, 27 (6), 1227–1250. doi:10.1080/13658816.2012.752093.
Li, Z., et al., 2004. Automated building generalization based on urban morphology and Gestalt
theory. International Journal of Geographical Information Science, 18 (5), 513–534. doi:10.1080/
13658810410001702021.
Mark, D.M., et al., 1999. Cognitive models of geographical space. International Journal of
Geographical Information Science, 13 (8), 747–774. doi:10.1080/136588199241003.
Mennis, J.L., Peuquet, D.J., and Qian, L., 2000. A conceptual framework for incorporating cognitive
principles into geographical database representation. International Journal of Geographical
Information Science, 14 (6), 501–520. doi:10.1080/136588100415710.
Mokhtarian, F. and Mackworth, A.K., 1992. A theory of multiscale, curvature-based shape represen-
tation for planar curves. IEEE Transactions on Pattern Analysis and Machine Intelligence, 14 (8),
789–805. doi:10.1109/34.149591.
Niepert, M., Ahmed, M., and Kutzkov, K., 2016. Learning convolutional neural networks for graphs. In:
Proceedings of International Conference on International Conference on Machine Learning (ICML).
New York, USA, 2014–2023.
Qi, C.R., et al., 2017. Pointnet: deep learning on point sets for 3d classification and segmentation. In:
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, Hawaii,
USA, 652–660.
Reichstein, M., et al., 2019. Deep learning and process understanding for data-driven Earth system
science. Nature, 566 (7743), 195–204. doi:10.1038/s41586-019-0912-1.
Ritter, S., et al., 2017. Cognitive psychology for deep neural networks: a shape bias case study. In:
Proceedings of the 34th International Conference on Machine Learning (ICML). Sydney, Australia,
2940–2949.
Sandryhaila, A. and Moura, J.M.F., 2014. Discrete signal processing on graphs: frequency analysis.
IEEE Transactions on Signal Processing, 62 (12), 3042–3054. doi:10.1109/TSP.2014.2321121.
Schmidhuber, J., 2014. Deep learning in neural networks: an overview. Neural Networks, 61, 85–117.
doi:10.1016/j.neunet.2014.09.003
Sedlmeier, A. and Feld, S., 2018. Learning indoor space perception. Journal of Location Based
Services, 12 (3–4), 179–214. doi:10.1080/17489725.2018.1539255.
Shilane, P., et al., 2004. The Princeton shape benchmark. In: Proceedings Shape Modeling Applications.
Genova, Italy, 167–178.
Shuman, D.I., Ricaud, B., and Vandergheynst, P., 2016. Vertex-frequency analysis on graphs. Applied
and Computational Harmonic Analysis, 40 (2), 260–291. doi:10.1016/j.acha.2015.02.005.
Steiniger, S., et al., 2008. An approach for the classification of urban building structures based on
discriminant analysis techniques. Transactions in GIS, 12 (1), 31–59. doi:10.1111/j.1467-
9671.2008.01085.x.
Teague, M.R., 1980. Image analysis via the general theory of moments. Journal of the Optical Society
of America, 70 (8), 920–930. doi:10.1364/JOSA.70.000920.
Tobler, W.R., 1970. A computer movie simulating urban growth in the Detroit region. Economic
Geography, 46, 234–240. doi:10.2307/143141
Touya, G., Zhang, X., and Lokhat, I., 2019. Is deep learning the new agent for map generalization?
International Journal of Cartography, 5 (2–3), 142–157. doi:10.1080/23729333.2019.1613071.
van der Maaten, L. and Hinton, G., 2008. Visualizing data using t-SNE. Journal of Machine Learning
Research, 9, 2579–2605.
Wertheimer, M., 1938. Laws of organization in perceptual forms. In: W.D. Ellis, ed. A source book of
Gestalt psychology. London: Routledge & Kegan Paul, 71–88.
Xu, Y., et al., 2017a. Quality assessment of building footprint data using a deep autoencoder
network. International Journal of Geographical Information Science, 31 (10), 1929–1951.
doi:10.1080/13658816.2017.1341632.
Xu, Y., et al., 2017b. Shape similarity measurement model for holed polygons based on position
graphs and Fourier descriptors. International Journal of Geographical Information Science, 31 (2),
253–279. doi:10.1080/13658816.2016.1192637.
Yan, X., et al., 2019. A graph convolutional neural network for classification of building patterns
using spatial vector data. ISPRS Journal of Photogrammetry and Remote Sensing, 150, 259–273.
doi:10.1016/j.isprsjprs.2019.02.010.
Yan, X., Ai, T., and Zhang, X., 2017. Template matching and simplification method for building
features based on shape cognition. ISPRS International Journal of Geo-Information, 6 (8), 250.
doi:10.3390/ijgi6080250.
Yang, C., Wei, H., and Yu, Q., 2018a. A novel method for 2D nonrigid partial shape matching.
Neurocomputing, 275, 1160–1176. doi:10.1016/j.neucom.2017.09.067
Yang, M., et al., 2018b. A map-algebra-based method for automatic change detection and spatial
data updating across multiple scales. Transactions in GIS, 22 (2), 435–454. doi:10.1111/tgis.12320.
Yao, D., et al., 2018. Learning deep representation for trajectory clustering. Expert Systems, 35 (2),
e12252. doi:10.1111/exsy.12252.
Zhang, L., et al., 2013. A spatial cognition-based urban building clustering approach and its
applications. International Journal of Geographical Information Science, 27 (4), 721–740.
doi:10.1080/13658816.2012.700518.
Zhang, Y., et al., 2020. A novel residual graph convolution deep learning model for short-term
network-based traffic forecasting. International Journal of Geographical Information Science, 34
(5), 969–995. doi:10.1080/13658816.2019.1697879.
Zhou, X., et al., 2018. Change detection for building footprints with different levels of detail using
combined shape and pattern analysis. ISPRS International Journal of Geo-Information, 7 (10), 406.
doi:10.3390/ijgi7100406.
Zhu, D., et al., 2020. Spatial interpolation using conditional generative adversarial neural networks.
International Journal of Geographical Information Science, 34 (4), 735–758. doi:10.1080/
13658816.2019.1599122.