
International Journal of Geographical Information Science
Journal homepage: https://www.tandfonline.com/loi/tgis20

Graph convolutional autoencoder model for the shape coding and cognition of buildings in maps

Xiongfeng Yan, Tinghua Ai, Min Yang & Xiaohua Tong

To cite this article: Xiongfeng Yan, Tinghua Ai, Min Yang & Xiaohua Tong (2020): Graph convolutional autoencoder model for the shape coding and cognition of buildings in maps, International Journal of Geographical Information Science, DOI: 10.1080/13658816.2020.1768260

To link to this article: https://doi.org/10.1080/13658816.2020.1768260

Published online: 25 May 2020.


RESEARCH ARTICLE

Graph convolutional autoencoder model for the shape coding and cognition of buildings in maps

Xiongfeng Yan (a,b), Tinghua Ai (b), Min Yang (b) and Xiaohua Tong (a)

(a) College of Surveying and Geo-Informatics, Tongji University, Shanghai, China; (b) School of Resource and Environmental Sciences, Wuhan University, Wuhan, China

ABSTRACT
The shape of a geospatial object is an important characteristic and a significant factor in spatial cognition. Existing shape representation methods for vector-structured objects in the map space are mainly based on geometric and statistical measures. Considering that shape is complicated and cognitively related, this study develops a learning strategy to combine multiple features extracted from its boundary and obtain a reasonable shape representation. Taking building data as an example, this study first models the shape of a building using a graph structure and extracts multiple features for each vertex based on the local and regional structures. A graph convolutional autoencoder (GCAE) model comprising graph convolution and autoencoder architecture is proposed to analyze the modeled graph and realize shape coding through unsupervised learning. Experiments show that the GCAE model can produce a cognitively compliant shape coding, with the ability to distinguish different shapes. It outperforms existing methods in terms of similarity measurements. Furthermore, the shape coding is experimentally proven to be effective in representing the local and global characteristics of building shape in application scenarios such as shape retrieval and matching.

ARTICLE HISTORY
Received 6 September 2019
Accepted 8 May 2020

KEYWORDS
Shape coding; graph convolutional autoencoder (GCAE); spatial cognition; deep learning; vector building data

1. Introduction
Spatial cognition aims to understand and explain how people acquire, organize, utilize,
and revise information in spatial awareness activities, which is an important research field
in cognitive science and psychology (Mark et al. 1999). It can help people establish spatial
concepts and address the high-order spatial properties of geospatial objects, phenomena,
or events in the map space, including shapes, patterns, movements, trends, evolution, and
interrelationships (Mennis et al. 2000, Ai et al. 2013). Research on spatial cognition can be
broadly divided into two aspects. The first aspect is the characteristics of cognitive
subjects, i.e. cognitive behaviors or laws, such as the Gestalt principles, which state that
people naturally tend to pursue the integrity or continuity of the overall structure and
then gradually investigate local details (Wertheimer 1938, Li et al. 2004). The second
aspect is the characteristics of cognitive objects, i.e. the features or properties of
a geospatial object itself, including geometric, topological, or semantical information.

CONTACT Tinghua Ai [email protected]


© 2020 Informa UK Limited, trading as Taylor & Francis Group

Shape is an essential characteristic of cognitive objects, which effectively expresses the


form and distribution of geospatial objects and conveys the evolution and interaction of
geographical phenomena. In the map space, shape representation depends on data
organization. For rasterized objects, the concept of shape can be realized by integrating
a set of pixels or raster grids, such as the chain codes of recorded boundaries or the
statistics of subdivided units (Teague 1980). For vector-structured objects, shape refers to
a pattern formed by a polygon and typically has two representation methods: region
representation and boundary representation. The region representation method is
a measure based on the overall characteristic of the region, such as compactness (Li
et al. 2013); or a comparison with special shapes such as the smallest bounding rectangle
(SBR) or the equal area circle (EAC) (Basaraner and Cetinkaya 2017). The boundary
representation method approximates the shape by a string or a function such as the
curvature (Mokhtarian and Mackworth 1992), shape context (Belongie et al. 2002), Fourier
descriptor (Ai et al. 2013), or turning function (Arkin et al. 1991). In addition, researchers
have proposed multi-scale and multi-feature methods to describe the local and global
characteristics in detail, e.g. the multi-scale convexity concavity (MCC) (Adamek and
O’connor 2004), triangle area representation (TAR) (Alajlan et al. 2007), and triangular
centroid distances (TCDs) (Yang et al. 2018a). Although intuitive and computationally
stable, these methods are mainly based on geometric and statistical measures.
Considering that shape is extremely complicated and cognitively related, it is necessary
to develop a model for deep analysis using reasoning or learning strategies.
In recent years, deep learning has produced unprecedented results in computer vision,
speech recognition, and natural language processing. In the field of geoscience, it has
been increasingly used to understand spatial processes and extract geographical insights
(Reichstein et al. 2019), such as spatial interpolation (Zhu et al. 2020), aftershock prediction
(DeVries et al. 2018), cartographic generalization (Feng et al. 2019, Touya et al. 2019), and
indoor space perception (Sedlmeier and Feld 2018). Deep learning is essentially repre-
sentation learning with multiple levels of representations (LeCun et al. 2015). It combines
multiple simple but nonlinear modules (i.e. the network layers) and learns a way to
transform raw input into an ideal output representation (Schmidhuber 2014). Inspired
by spatial association (Anselin 1995) and the first law of geography (Tobler 1970), shape
representation in the map space involves considering local structures formed by neigh-
boring points, tangents, or arcs and then combining them to achieve a cognitive result of
the overall structure. Deep learning has a strong representation ability for such local visual
features. The excellent performance of convolutional neural networks (CNNs) can be
mainly attributed to the local correlation preservation, which is implemented with
a small kernel size and techniques such as weight sharing and pooling (LeCun et al.
1998). Experimental studies have shown that deep learning exhibits a shape bias,
enabling objects to be preferentially distinguished by shape rather than color (Ritter
et al. 2017).
However, previous studies mainly dealt with underlying grid-like, array, or unordered
data, such as 3D models (Qi et al. 2017) and 1D trajectories (Yao et al. 2018), and rarely
applied deep learning to 2D vector shapes in the map space. This is because vector-
structured shapes as input data do not easily satisfy the requirement of regularity, which
is essential for deep learning models. To this end, some researchers have proposed
neural networks on graphs, with the aim of extending deep learning technologies to

analyze graph-structured data (Bronstein et al. 2017). These networks require defining
an effective convolution. There are two potential solutions: the spatial approach
(Niepert et al. 2016) and the spectral approach (Shuman et al. 2016). In the spatial
approach, the convolution is directly performed in the vertex domain, whereas in the
spectral approach, it is realized through graph Fourier transform and convolution
theorem. Further details on the two approaches can be found in the review by
Bronstein et al. (2017). Similar to 2D convolution, the graph spectral convolution has
advantages such as fast computation, few parameters, and spatially local connectivity
(Defferrard et al. 2016). Therefore, it can be combined with computational models, such
as neural networks or autoencoder networks, to build a learning architecture. This
method has achieved exciting results for the pattern classification of spatial groups
(Yan et al. 2019). This study will attempt to apply it to the shape analysis of individual
geospatial objects.
Buildings are pivotal components of map spaces and play an important role in
applications such as urban modeling (Steiniger et al. 2008, Henn et al. 2012), high-
definition mapping and navigation (Zhang et al. 2013). As buildings exhibit salient visual
characteristics, such as perpendicular corners and symmetrical axes, processing tasks for
building data are based on shape analysis and cognition, e.g. simplification (Yan et al.
2017), change detection (Zhou et al. 2018), data updating (Yang et al. 2018b), and quality
assessment (Xu et al. 2017a). This study takes a 2D vector building as an example and
extracts its inherent features using a graph deep learning approach, to support the
mechanism and formalization of spatial cognition. As boundary-based representation of
building shapes has advantages such as low redundancy, high precision, and information richness, the study first uses boundary points as vertices to construct a graph
structure of the building shape and extracts the vertex features based on the boundary
structure. Subsequently, an autoencoder learning model is constructed by combining the
graph convolution and deep autoencoder architecture; unlabeled data are used for the
unsupervised training to realize shape coding.
The remainder of this paper is organized as follows. Section 2 details the framework for
coding the building shape using graph deep learning. Section 3 presents the experiments
and results, including detailed analyses. Section 4 discusses the application of shape
coding to scenarios such as shape retrieval and shape matching. Finally, Section 5
concludes the paper.

2. Graph convolutional autoencoder for the coding of building shapes


Based on the graph structure constructed from the boundary points of a building, this
study uses an autoencoder on graphs to learn a reasonable coding and realize shape
similarity measurement and cognitive representation.

2.1. Framework for coding the shape of a building


The overall framework for coding the shape of a building using graph deep learning
consists of three parts, as shown in Figure 1.

Figure 1. Overall framework for coding the shape of a building using graph deep learning. The colors
assigned to the vertex (or edge) on the graph indicate the feature values (or weights).

● Graph construction. The boundary points of the building are used as vertices to
construct a graph structure, and the description features for each vertex are
extracted based on its neighborhood structures.
● Graph operation. A novel graph convolution is introduced based on the Fourier
transform and convolution theorem, providing a method to integrate the multiple
features of a vertex and its neighbors.
● Graph learning. The graph convolution and autoencoder are combined to build
a graph learning model, and unsupervised training is performed to make it capable
of encoding the building shape.

Through the above process, the building shape in a 2D plane is encoded as a 1D vector,
and the similarity between two different shapes is measured by the distance between
their vectors, including the Euclidean and cosine distances.
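As a minimal sketch of this similarity measurement, the two distances between coding vectors can be computed as follows (the variable names are ours, not the paper's):

```python
import numpy as np

def euclidean_distance(a, b):
    """Euclidean distance between two shape-coding vectors."""
    return float(np.linalg.norm(a - b))

def cosine_distance(a, b):
    """Cosine distance: 0 for identical directions, up to 2 for opposite ones."""
    return float(1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Two identical codings have zero distance under both measures.
code_a = np.array([0.2, 0.8, 0.1, 0.5])
code_b = np.array([0.2, 0.8, 0.1, 0.5])
assert euclidean_distance(code_a, code_b) == 0.0
```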

2.2. Graph construction


A graph is an ideal tool for modeling spatial objects. It is defined as G = (V, E, W), where V and E are finite sets of |V| = N vertices and |E| = M edges, and W ∈ R^{N×N} is an adjacency matrix that encodes the edge weights. Each vertex includes several features representing the graph functions (also called signals).
A graph of a building connected by discrete points (Figure 2(a)) can be naturally
constructed using these points as vertices and the connection relationship as edges
(Figure 2(b)). Specifically, the length of an edge is considered as its weight.
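This construction can be sketched as follows, assuming the boundary points are given in order and the polygon is closed (function name is ours):

```python
import numpy as np

def build_shape_graph(points):
    """Adjacency matrix of a closed building boundary.

    points: (N, 2) array of boundary vertices in order. Consecutive
    vertices (and the last-first pair) are connected by an edge, and
    each edge is weighted by its length, as described in the text.
    """
    n = len(points)
    W = np.zeros((n, n))
    for i in range(n):
        j = (i + 1) % n  # next vertex along the closed boundary
        length = np.linalg.norm(points[j] - points[i])
        W[i, j] = W[j, i] = length
    return W

# Unit square: four vertices, four edges of length 1.
square = np.array([[0, 0], [1, 0], [1, 1], [0, 1]], dtype=float)
W = build_shape_graph(square)
```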
Feature extraction for the vertices is a key point. Considering that shape boundary
conveys rich cognitive information through the contextual vertices, such as the curvature,
a local structure composed of a vertex and its neighbors is used (Figure 2(c)). Additionally,
the general characteristics of the region can be first identified by a visual perception of the
shape via the Gestalt principles, followed by a detailed boundary content analysis; hence,
a regional structure composed of the neighbors and the center point is also used (Figure 2
(d)). Seven specific indicators are summarized below.

Figure 2. Graph of a building shape and the features extracted for its vertices. The colors assigned to
the vertex (or edge) on the graph indicate the feature values (or weights).

2.2.1. Features of local structure

As depicted in Figure 2(c), vertex A with its front and rear s-order (e.g. 1st-order in this case) neighbors B and C, respectively, constitutes a triangle ΔABC. Its geometrical parameters are employed as the descriptive features for A, including: (1) triangle area S_A^s, (2) segment length L_A^s between the vertices B and C, and (3) turning angle α_A^s. The sign of α_A^s depends on the concavity or convexity of the arc BAC. If the segment BA rotates clockwise to the segment AC, the arc BAC is convex and α_A^s is positive; otherwise, it is negative.

The shape features should be invariant under shifting, scaling, and rotation. As the shape changes in scale, the triangle area and segment length also change; therefore, the features S_A^s and L_A^s need to be normalized. S_A^s is normalized by dividing it by the building area S_0, and L_A^s by dividing it by the arithmetic square root of S_0:

    \bar{S}_A^s = S_A^s / S_0    (1)

    \bar{L}_A^s = L_A^s / \sqrt{S_0}    (2)

2.2.2. Features of regional structure

As depicted in Figure 2(d), the s-order neighbors B and C of vertex A, together with the center point O, constitute a triangle ΔOBC. With the lengths of the segments OB, OC, and BC taken as d_1, d_2, and d_3, respectively, the regional features of A include: (1) triangle area H_A^s, (2) semi-perimeter D_A^s = (d_1 + d_2 + d_3)/2, (3) radius R_A^s = H_A^s / D_A^s, and (4) turning angle β_A^s. Similarly, the features H_A^s, D_A^s, and R_A^s are normalized as follows:

    \bar{H}_A^s = H_A^s / S_0    (3)

    \bar{D}_A^s = D_A^s / \sqrt{S_0}    (4)

    \bar{R}_A^s = R_A^s / \sqrt{S_0}    (5)

Different neighboring orders s (i.e. corresponding to different vertices B and C) form different local and regional triangular structures. If the neighborhood sizes from order 1 to Z are considered, a 7Z-dimensional feature p_i can be extracted for each vertex i, expressed as

    p_i = [\bar{S}_i^s, \bar{L}_i^s, \alpha_i^s, \bar{H}_i^s, \bar{D}_i^s, \bar{R}_i^s, \beta_i^s], \quad s \in \{1, 2, \ldots, Z\}    (6)

The dimension of the features P_{N×7Z} for the entire shape graph is N × 7Z, where N is the number of vertices.
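The seven per-vertex descriptors can be sketched as follows. This is a minimal NumPy illustration of Equations (1)-(5); the exact sign conventions for the turning angles are our assumption (the paper ties the sign of α to the convexity of arc BAC):

```python
import numpy as np

def signed_area(p, q, r):
    """Signed area of triangle pqr; positive for counter-clockwise order."""
    return 0.5 * ((q[0] - p[0]) * (r[1] - p[1]) - (r[0] - p[0]) * (q[1] - p[1]))

def angle_between(u, v):
    """Unsigned angle between direction vectors u and v."""
    c = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.arccos(np.clip(c, -1.0, 1.0))

def vertex_features(points, center, s0, i, s):
    """Seven descriptors of vertex i at neighborhood order s, normalized
    per Equations (1)-(5). Angle sign conventions are assumptions."""
    n = len(points)
    B, A, C = points[(i - s) % n], points[i], points[(i + s) % n]
    # Local structure: triangle BAC (Figure 2(c)).
    area_bac = signed_area(B, A, C)
    S_loc = abs(area_bac) / s0                    # Eq. (1)
    L_loc = np.linalg.norm(C - B) / np.sqrt(s0)   # Eq. (2)
    alpha = np.sign(-area_bac) * angle_between(A - B, C - A)  # signed turning angle
    # Regional structure: triangle OBC with center point O (Figure 2(d)).
    d1 = np.linalg.norm(B - center)
    d2 = np.linalg.norm(C - center)
    d3 = np.linalg.norm(C - B)
    H = abs(signed_area(center, B, C))
    D = (d1 + d2 + d3) / 2.0                      # semi-perimeter
    R = H / D if D > 0 else 0.0                   # radius H/D
    beta = angle_between(B - center, C - center)
    return np.array([S_loc, L_loc, alpha,
                     H / s0, D / np.sqrt(s0), R / np.sqrt(s0), beta])

# Stacking vertex_features over s = 1..Z yields the 7Z-dimensional p_i.
square = np.array([[0, 0], [1, 0], [1, 1], [0, 1]], dtype=float)
f0 = vertex_features(square, square.mean(axis=0), 1.0, 0, 1)
```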

2.3. Convolution operation on graphs

It is difficult to directly define a convolution in the vertex domain because of the non-stationarity of the vertex neighborhoods. This study introduces a graph convolution performed in the frequency domain. The mathematical idea is to use the eigenvectors {χ_l}_{l=0}^{N−1} of the Laplacian matrix L, defined as L = D − W, where D = diag(d_0, ..., d_{N−1}) is the diagonal matrix formed by the degrees d_i = Σ_j W_{i,j} of vertex i, as the decomposition bases to linearly transform the graph-structured data (Sandryhaila and Moura 2014). The transform is \hat{f}(λ_l) = X^T f, and its inverse is f(n) = X \hat{f}, where X = [χ_0, ..., χ_{N−1}] represents the eigenvector matrix. As {χ_l}_{l=0}^{N−1} satisfies orthogonality, which is analogous to the complex exponential e^{iωt}, this transform is also termed the graph Fourier transform.

According to the convolution theorem, the Fourier transform of a convolution of two functions equals the pointwise product of their Fourier transforms. Hence, the convolution between two graph functions (Shuman et al. 2016) can be defined as

    f \ast g = X ((X^T f) \odot (X^T g))    (7)

where \odot denotes the element-wise product.

The function g and its Fourier transform X^T g are characterized as a convolution kernel, which can also be expressed as a set of free parameters {δ_l}_{l=0}^{N−1} or as a function Φ(Λ) of the eigenvalues {λ_l}_{l=0}^{N−1} in the Fourier domain, yielding the convolution

    f \ast g = X \, \mathrm{diag}(\delta_0, \ldots, \delta_{N-1}) X^T f = X \Phi(\Lambda) X^T f    (8)

where Λ = diag(λ_0, ..., λ_{N−1}) represents the diagonal matrix of the eigenvalues.
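Equations (7)-(8) can be illustrated numerically on a small graph. The sketch below filters a signal on a 4-cycle graph; the spectral kernel exp(−λ) is an arbitrary choice for illustration, not the paper's kernel:

```python
import numpy as np

# 4-cycle graph and its combinatorial Laplacian L = D - W.
W = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)
D = np.diag(W.sum(axis=1))
L = D - W
lam, X = np.linalg.eigh(L)        # eigenvalues and orthonormal eigenvectors

f = np.array([1.0, 0.0, 0.0, 0.0])  # a graph signal
f_hat = X.T @ f                      # graph Fourier transform
phi = np.exp(-lam)                   # an arbitrary spectral kernel Phi(Lambda)
filtered = X @ (phi * f_hat)         # Eq. (8): X Phi(Lambda) X^T f
```

Because X is orthonormal, applying the inverse transform `X @ f_hat` recovers the original signal exactly.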


However, Equation (8) requires an eigendecomposition, which incurs considerable computational cost (Defferrard et al. 2016, Kipf and Welling 2017). To this end, Hammond et al. (2011) proposed a fast convolution by representing Φ(Λ) as a truncated expansion in terms of the Chebyshev polynomials T_k(x) up to the K-th order:

    \Phi(\Lambda) = \sum_{k=0}^{K} \theta_k T_k(\tilde{\Lambda})    (9)

with the scaled \tilde{\Lambda} = 2\Lambda/\lambda_{max} - I_N, where λ_max and I_N denote the maximum eigenvalue of L and an identity matrix of size N, respectively, and θ_k is the k-th coefficient. T_k(x) is recursively defined as T_k(x) = 2x T_{k−1}(x) − T_{k−2}(x), with T_0(x) = 1 and T_1(x) = x.

As X \tilde{\Lambda}^k X^T = \tilde{L}^k, the convolution can be rewritten as

    f \ast g = X \left( \sum_{k=0}^{K} \theta_k T_k(\tilde{\Lambda}) \right) X^T f = \sum_{k=0}^{K} \theta_k T_k(\tilde{L}) f    (10)

where \tilde{L} = 2L/\lambda_{max} - I_N. Note that the operation is realized by multiplications with L, thus avoiding the eigendecomposition and accelerating the computation. Moreover, Hammond et al. (2011) proved that, for any two vertices i and j on a graph G, if d_G(i, j) > K, then (L^K)_{i,j} = 0, where d_G(i, j) is the minimum number of edges connecting i and j. This indicates that the convolution implements spatial localization, such that the features of a vertex are related only to its K-order neighboring vertices (Shuman et al. 2016).
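The fast Chebyshev filtering of Equation (10) can be sketched with the three-term recursion; the sketch below cross-checks the recursion against the explicit spectral form (graph and coefficients are arbitrary test values):

```python
import numpy as np

def chebyshev_filter(L, f, theta):
    """Eq. (10): sum_k theta_k T_k(L_tilde) f via the Chebyshev recursion,
    using only matrix-vector products with L (no eigendecomposition)."""
    lam_max = np.linalg.eigvalsh(L).max()
    L_t = 2.0 * L / lam_max - np.eye(len(L))  # spectrum rescaled to [-1, 1]
    t_prev, t_curr = f, L_t @ f               # T_0(L~)f, T_1(L~)f
    out = theta[0] * t_prev + theta[1] * t_curr
    for k in range(2, len(theta)):
        t_prev, t_curr = t_curr, 2.0 * L_t @ t_curr - t_prev  # T_k = 2x T_{k-1} - T_{k-2}
        out = out + theta[k] * t_curr
    return out

# 4-cycle graph Laplacian, a test signal, and K = 2 (three coefficients).
W = np.array([[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0]], float)
L = np.diag(W.sum(1)) - W
f = np.array([1.0, -2.0, 0.5, 3.0])
theta = [0.5, 1.0, -0.3]
y = chebyshev_filter(L, f, theta)

# Explicit spectral form X Phi(Lambda~) X^T f for comparison.
lam, X = np.linalg.eigh(L)
lam_t = 2.0 * lam / lam.max() - 1.0
phi = np.polynomial.chebyshev.chebval(lam_t, theta)
```

Both computations give the same result, which is the point of Equation (10): the recursion avoids the eigendecomposition entirely.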

2.4. Graph convolutional autoencoder model


The basic idea of the autoencoder is to learn a low-dimensional representation (coding)
for the input data in an unsupervised manner. The training objective is to realize
a reconstruction as close as possible to the original data rather than labels (Deng and
Yu 2014).
The introduced graph convolution has merits such as efficient computation and local connectivity, making it possible to combine it with the autoencoder and construct a learnable and trainable model, namely a graph convolutional autoencoder (GCAE), that can encode building shapes. The GCAE model comprises five parts: input, encoder, coding representation, decoder, and output. Figure 3 shows an example architecture.

Figure 3. Architecture of a graph convolutional autoencoder for coding building shapes. The output
and input are graph structures containing eight vertices with 2D features; the encoder and decoder
consist of two convolutional layers (including 2 × 3 and 3 × 2 K-order polynomial kernels, respectively)
and two pooling (or up-sampling) layers. The colors assigned to the vertex (or edge) on the graph
indicate the feature values (or weights).

The encoder and decoder contain multiple hidden convolutional layers with the following layer-wise propagation rule based on graph convolution and nonlinear activation:

    H_j^{[l+1]} = \sigma \left( \sum_{i=1}^{F_{in}} \left( \sum_{k=0}^{K} (\theta_{i,j})_k T_k(\tilde{L}) \right) H_i^{[l]} + b_j^{[l]} \right)    (11)

where σ(·) denotes a nonlinear activation function; H_i^{[l]} and H_j^{[l+1]} denote the i-th input graph of the l-th layer activations and the j-th output graph of the (l+1)-th layer activations, respectively; (θ_{i,j})_k and b_j^{[l]} are the trainable F_in × F_out vector of the K-order polynomial coefficients and the 1 × F_out vector of the bias in the l-th layer, respectively.
According to Equation (11), the convolutional layer weights the multiple features of the
vertices to obtain an integrated representation. In addition, a pooling operation is
performed in the encoder part to further combine the features of the neighboring vertices
and reduce the graph size (i.e. the number of vertices), and an up-sampling operation is
performed in the decoder part to ensure that the graph size is consistent with the input.
Since the boundary points are linearly connected, the pooling and up-sampling opera-
tions proposed by Defferrard et al. (2016) were employed with the same efficiency as the
1D sequence data. To share the weights for batch training, the size of the input graph
structure should be consistent; this can be achieved by interpolation or padding.
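A forward pass of one such layer can be sketched in NumPy (the paper's implementation uses TensorFlow; the tridiagonal matrix below is only a stand-in for a real scaled Laplacian, and all shapes follow the Figure 3 example):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gconv_layer(H, L_t, theta, b):
    """One hidden layer following Eq. (11).

    H: (N, F_in) vertex features; L_t: scaled Laplacian L~;
    theta: (K+1, F_in, F_out) Chebyshev coefficients; b: (F_out,) bias.
    """
    t_prev, t_curr = H, L_t @ H  # T_0(L~)H, T_1(L~)H
    out = t_prev @ theta[0] + t_curr @ theta[1]
    for k in range(2, theta.shape[0]):
        t_prev, t_curr = t_curr, 2.0 * L_t @ t_curr - t_prev
        out = out + t_curr @ theta[k]
    return sigmoid(out + b)

rng = np.random.default_rng(0)
N, F_in, F_out, K = 8, 2, 3, 3  # eight vertices with 2D features, as in Figure 3
# Stand-in scaled Laplacian (any symmetric N x N matrix works for the sketch).
L_t = np.eye(N) - 0.5 * np.eye(N, k=1) - 0.5 * np.eye(N, k=-1)
H = rng.standard_normal((N, F_in))
theta = rng.standard_normal((K + 1, F_in, F_out))
b = np.zeros(F_out)
H_next = gconv_layer(H, L_t, theta, b)
```

The layer maps (N, F_in) features to (N, F_out) features, weighting each vertex's features together with those of its K-order neighbors, exactly the integration described above.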
The model is optimized by minimizing the mean squared error between the input features P_{N×7Z} of the graph vertices and the reconstructed features \hat{P}_{N×7Z}. The parameters
of the convolution kernels and the biases of the activation functions are updated using
the back propagation (BP) algorithm; the calculation of their gradients can be found in
previous reports (Defferrard et al. 2016, Yan et al. 2019).
The model focuses on the intermediate coding instead of the output part. Constraints
are usually added to the coding for specific purposes, such as the reduction of dimension-
ality. For example, in the model shown in Figure 3, the number of vertices is eventually
compressed to two by the two pooling layers, and each vertex contains only 2D features
after the spatial integration of the graph convolution. This means that the original 16-
dimensional feature in this example is reduced to a 4-dimensional feature, thus integrat-
ing and compressing the building information to yield a new shape coding.

3. Experiments and analyses of shape coding


The proposed GCAE model was implemented using Python in TensorFlow (Abadi et al.
2015). This section reports the results of experiments conducted on large building
datasets to obtain shape coding, which will be first evaluated visually and then based
on some quality metrics. The sensitivity of the model to the parameters will be also
discussed.

3.1. Experimental datasets


Since the shapes of English letters are common in buildings, they can be used as
a simplified but cognitively enhanced representation to replace buildings (Yan et al.

2017); therefore, they were used as a cognitive classification reference in this study.
Experimental datasets were selected from OpenStreetMap, containing 10 types of typical
standard shapes including E-shape, T-shape, and U-shape. For diversity, the building
shapes were selected from areas with different geographical characteristics, e.g. urban,
suburban, and rural areas.
Labels are not required for training the proposed GCAE model; however, to evaluate
the performance of shape coding, the selected buildings were labeled manually by three
participants, i.e. to indicate which shape the buildings belong to. Each type consists of 500
shapes, each of which was used as a training sample for a total of 5,000 samples. Table 1
lists the 10 standard shapes and 10 example training shapes.
Because of differences in shape complexity and data-acquisition accuracy, the number of boundary points may vary significantly among buildings.
Therefore, the study first simplified the buildings using the Douglas–Peucker algo-
rithm (Douglas and Peucker 1973) with a conservative and empirical threshold of
0.1 m; subsequently, an equally spaced interpolation was performed with a total of
64 boundary points. Additionally, the starting point of each building was adjusted to
the vertex in the bottom-left corner. The vertex features were normalized using the
Z-score method.
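The resampling and normalization steps can be sketched as follows (the Douglas-Peucker simplification is assumed to have been applied already; function names are ours):

```python
import numpy as np

def resample_boundary(points, n=64):
    """Resample a closed polygon boundary to n equally spaced points
    along its perimeter, as in the preprocessing step."""
    pts = np.vstack([points, points[:1]])           # close the ring
    seg = np.linalg.norm(np.diff(pts, axis=0), axis=1)
    cum = np.concatenate([[0.0], np.cumsum(seg)])   # arc length at each vertex
    targets = np.linspace(0.0, cum[-1], n, endpoint=False)
    x = np.interp(targets, cum, pts[:, 0])
    y = np.interp(targets, cum, pts[:, 1])
    return np.column_stack([x, y])

def zscore(features):
    """Z-score normalization of a vertex feature matrix (assumes no
    feature column is constant)."""
    return (features - features.mean(axis=0)) / features.std(axis=0)

# A unit square resampled to 8 points: consecutive spacing is 0.5 everywhere.
square = np.array([[0, 0], [1, 0], [1, 1], [0, 1]], dtype=float)
pts = resample_boundary(square, n=8)
```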

3.2. Model architecture and parameter settings


Figure 4 shows the architecture and parameters of the GCAE model used in this
experiment. The neighborhood sizes were set to {2, 4}, meaning that the input was
a graph structure consisting of 64 vertices with 14-dimensional features. The encoder
part contained three convolutional layers, where the numbers of convolution kernels
were 32, 12, and 4, and the polynomial order was 3. Three pooling layers were also
included, where the window size of pooling was 2, and the stride was 1. The shape
coding was a 32-dimensional vector. This coding passed through a decoder that was
symmetrical to the encoder, and was finally decoded into a graph structure consistent
with the input.
The Sigmoid function was applied for activating the convolutional layers. The model
was optimized using the Adam solver (Kingma and Ba 2015) with a learning rate of 0.005.
The mini-batch size was 50, and the model was trained for 500 epochs.

Table 1. Examples of 10 types of building shapes in the training data.

Class: E-shape, F-shape, H-shape, I-shape, L-shape, O-shape, T-shape, U-shape, Y-shape, Z-shape

[The table's "Standard shape" and "Training shape" rows show an example polygon for each class; the images are not reproduced in this extraction.]

Figure 4. Architecture and parameters of the GCAE model used in our experiments.

3.3. Visualization of shape coding


Ideally, the building shapes should exhibit a distribution pattern in which the same types
of shapes are adjacent to each other and different types of shapes are distant from each
other in the coding space. The study used the t-SNE algorithm (van der Maaten and
Hinton 2008) to visualize the trained 32-dimensional shape coding, as shown in Figure 5.
As shown, the same types of building shapes (i.e. indicated by the same color) are
gathered together to form clusters, and the distributions of the shapes between different
types can be clearly distinguished. Similar building shapes are located relatively close to
each other, such as E-shapes and F-shapes, I-shapes and O-shapes, and T-shapes and

Figure 5. Visualization of shape coding for building data.



Y-shapes. This result demonstrates that the shape similarities between buildings calcu-
lated using the shape coding are generally accurate. However, because of the differences
in coding quality and data uncertainty, there are some overlaps in the coding of different
types of shapes: Y-shapes appear in the cluster of T-shapes in the area A of Figure 5.
Notably, several separate sub-clusters of the same shapes are also formed, probably
because of the rotation of the buildings. For example, I-shaped buildings form two sub-
clusters on the left side of Figure 5, where in the area B, the long axis of the buildings is
nearly vertical, whereas the long axis of the buildings in the area C is almost horizontal. To
further verify this conjecture, this study trained and visualized the shape coding using
adjusted building data, i.e. the orientation of all the buildings was manually adjusted to
coincide with the orientation of their cognitive letters, as shown in Figure 6. The separa-
tion of the clusters was thus remarkably improved. Most types of shapes form fairly
independent clusters; a few T-shaped and Z-shaped buildings are separated; and
L-shaped buildings, although forming two sub-clusters, are still relatively close to each
other and far from the other clusters. This shows that the GCAE model is sensitive to
variations in the orientation.

3.4. Performance of shape similarity measurement


Through the visual analysis, a preliminary conclusion is that the GCAE model can produce
shape coding that conforms to the cognition and has high discriminative ability. Further,
this study employed the following metrics to quantitatively assess the coding:

● Nearest neighbor (NN). NN refers to the ratio of the number of shapes whose closest neighbor belongs to the same type to the total number of training shapes.

Figure 6. Visualization of shape coding for building data after rotation.



● First tier (FT) and second tier (ST). For one shape, FT and ST refer to the percentages of same-type shapes among its top (t − 1) and 2(t − 1) neighbors, respectively, where t is the number of shapes of the same type in the training data, i.e. 500 in this study.
● Discounted cumulative gain (DCG). For one shape, the similarities to all other shapes are computed and sorted in descending order. The earlier a shape appears in the sorted list, the stronger the influence of its relevance or irrelevance. The relevance is 1 when the types are the same; otherwise, it is 0.

All the metrics range from 0 to 1, and a higher value indicates a better performance.
More details about these metrics have been reported by Shilane et al. (2004).
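These metrics can be sketched for a single query shape given a vector of distances to all shapes (the DCG normalization by the ideal ordering follows the common convention and is our assumption; function names are ours):

```python
import numpy as np

def tier_scores(query_idx, labels, dist_row, t):
    """First/second tier for one query: fraction of same-type shapes
    among the top (t-1) and 2(t-1) nearest neighbors (query excluded)."""
    order = np.argsort(dist_row)
    order = order[order != query_idx]  # drop the query itself
    same = labels[order] == labels[query_idx]
    ft = same[:t - 1].sum() / (t - 1)
    st = same[:2 * (t - 1)].sum() / (t - 1)
    return ft, st

def dcg(query_idx, labels, dist_row):
    """Discounted cumulative gain with binary relevance, normalized by
    the ideal DCG (normalization convention is an assumption)."""
    order = np.argsort(dist_row)
    order = order[order != query_idx]
    rel = (labels[order] == labels[query_idx]).astype(float)
    discounts = 1.0 / np.log2(np.arange(2, len(rel) + 2))
    ideal = np.sort(rel)[::-1]
    return float((rel * discounts).sum() / (ideal * discounts).sum())

# Tiny example: two classes of two shapes each; query is shape 0.
labels = np.array([0, 0, 1, 1])
dist = np.array([0.0, 0.1, 0.9, 0.8])
ft, st = tier_scores(0, labels, dist, t=2)
```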
In this study, the FT, ST, and DCG values of the 10 standard shapes were counted and
averaged. As baselines for the GCAE model, two conventional methods were also implemen-
ted, namely the Fourier descriptor (FD) (Ai et al. 2013) and turning function (TF) (Arkin et al.
1991), where the expansion order of the Fourier descriptor was set to 6. Table 2 lists the results.
The proposed GCAE model outperforms the others in all four metrics for both the original and rotated building data. It significantly outperforms the TF method, likely because the TF method considers only one turning feature, whereas the GCAE model integrates multiple features of local and regional structures. However, the TF method has the advantage of being insensitive to the orientation of the buildings; this is reasonable, as it traverses all the boundary points to find the minimum value when calculating the similarity distance. The FD method is very sensitive to the orientation of the shape, as evident from its poor performance on the unrotated buildings. Furthermore, as artificial features, buildings have characteristics such as few vertices, right angles, and discontinuous bends. Approximating these requires a higher-order Fourier expansion, which is computationally expensive and may affect performance.
Figure 7 shows the precision–recall curves of the different methods for the two sets of
building data, averaged over the ten standard shapes. The shape coding obtained using
the proposed GCAE model outperforms the other two methods in the shape similarity
measurement, consistent with the previous analysis.

3.5. Parameter sensitivity analysis


The proposed GCAE model was experimentally investigated in terms of its sensitivity to
the number of convolutional layers (i.e. model depth) and neighborhood sizes for vertex
feature extraction.

Table 2. Performance of shape similarity measurement using different methods. The higher the
value, the better the performance.

Method                     NN     FT     ST     DCG
(a) Original building data
Turning function (TF)      0.978  0.310  0.416  0.820
Fourier descriptor (FD)    0.968  0.250  0.339  0.777
GCAE model                 0.989  0.466  0.591  0.874
(b) Building data after rotation
Turning function (TF)      0.991  0.417  0.526  0.867
Fourier descriptor (FD)    0.996  0.726  0.894  0.965
GCAE model                 0.997  0.868  0.934  0.986
INTERNATIONAL JOURNAL OF GEOGRAPHICAL INFORMATION SCIENCE 13

Figure 7. Precision–recall curves of different methods.

Because pooling layers were used, the number of vertices in the coding representation
varies with the model depth. To ensure a consistent information compression ratio
between the different models, the coding dimension was fixed at 32 by adjusting the
number of convolution kernels in the last convolutional layer of the encoder. Table 3
reports the model performance on the original building data as the depth changes from
1 to 5. The performance initially improves with increasing depth but deteriorates sharply
when the depth reaches 5, and recovers when the number of convolution kernels is
reduced. This indicates that increasing the depth improves the feature extraction cap-
ability of the model to a certain extent; however, it also increases model complexity and
may cause training difficulties.
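The kernel settings listed in Table 3 are consistent with a pooling step that halves the vertex count after every convolutional block, starting from N = 64 boundary points. The helper below is a hypothetical check of this bookkeeping, not part of the model itself.

```python
def coding_dim(n_points, kernels):
    """Coding dimension after len(kernels) conv + pool blocks, assuming
    each pooling step halves the number of vertices (an assumption that
    matches the kernel settings in Table 3)."""
    vertices = n_points >> len(kernels)   # 64 -> 32 -> 16 -> 8 -> 4 -> 2
    return vertices * kernels[-1]         # vertices x channels of last layer

# Every depth setting in Table 3 keeps the coding dimension at 32.
for kernels in ([1], [24, 2], [24, 24, 4],
                [24, 24, 24, 8], [24, 24, 24, 24, 16]):
    assert coding_dim(64, kernels) == 32
```

This explains why the last layer's kernel count doubles with each added block: it exactly compensates for the halved vertex count, holding the compression ratio fixed across depths.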
Figure 8 shows the training loss curves for different neighborhood size settings when
extracting the vertex features. The losses under settings {1} and {1, 2, 4, 8} are the highest
at convergence, followed by {2}, {2, 4}, and {2, 4, 8}; {4} and {8} have the lowest loss.
These results suggest that the loss value depends mainly on the smallest neighborhood
size and the number of features: the larger the smallest neighborhood size and the fewer
the features, the less difficult the shape reconstruction and the lower the loss.
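To illustrate what a neighborhood size setting such as {2, 4} means, the sketch below computes, for each boundary vertex, one feature per neighborhood size k: the angle formed at the vertex by its k-th predecessor and k-th successor along the closed boundary. This angle feature is a hypothetical stand-in for the paper's actual vertex features, used only to show the multi-scale indexing.

```python
import math

def multiscale_features(points, sizes=(2, 4)):
    """One feature per neighborhood size for each vertex of a closed
    boundary: the absolute angle at the vertex between the directions to
    its k-th predecessor and k-th successor (illustrative feature only)."""
    n = len(points)
    feats = []
    for i, (x, y) in enumerate(points):
        row = []
        for k in sizes:
            (xa, ya) = points[(i - k) % n]   # k steps backward on the ring
            (xb, yb) = points[(i + k) % n]   # k steps forward on the ring
            a1 = math.atan2(ya - y, xa - x)
            a2 = math.atan2(yb - y, xb - x)
            row.append(abs(math.remainder(a2 - a1, 2 * math.pi)))
        feats.append(row)
    return feats
```

With a larger k, the feature summarizes a wider stretch of the boundary, which is why small-k settings force the decoder to reproduce fine local detail and converge to higher loss.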
Although a lower loss implies that the reconstructed shape features are closer to the
original ones, it does not by itself indicate the quality of the shape coding. Increasing
the number of features provides more information to the model, but it also introduces
redundancy and places higher demands on the compression capability. Table 4 gives the
evaluation metrics for the different settings. A careful analysis shows that although the
settings {2} and {4} may be slightly advantageous, the differences in performance are
small. This conclusion also applies to the dimension of the shape coding: although the
loss can be reduced by employing a higher coding dimension,

Table 3. Model performance varying with depth.

Model depth  Convolution kernels     NN     FT     ST     DCG
1            {1}                     0.872  0.314  0.407  0.795
2            {24, 2}                 0.982  0.410  0.500  0.843
3            {24, 24, 4}             0.987  0.433  0.574  0.867
4            {24, 24, 24, 8}         0.992  0.482  0.626  0.885
5            {24, 24, 24, 24, 16}    0.635  0.259  0.401  0.768
5            {12, 12, 12, 12, 16}    0.924  0.411  0.567  0.829

Figure 8. Training loss curves for different neighborhood size settings.

Table 4. Model performance under different neighborhood size settings.

Neighborhood sizes  NN     FT     ST     DCG
{1}                 0.973  0.469  0.601  0.877
{2}                 0.978  0.509  0.643  0.884
{4}                 0.987  0.483  0.648  0.876
{8}                 0.963  0.453  0.606  0.875
{2, 4}              0.980  0.505  0.604  0.882
{4, 8}              0.981  0.497  0.634  0.885
{2, 4, 8}           0.980  0.470  0.641  0.881
{1, 2, 4, 8}        0.974  0.494  0.610  0.882

this weakens the information compression capability of the model; in the extreme case,
the shape can be reconstructed by simply copying the input.

4. Applications of shape coding


In this section, experiments are presented that apply the shape coding to scenarios such
as shape retrieval and shape matching.

4.1. Experimental dataset


A large-scale topographic map of downtown Shanghai, China, was used as the experi-
mental area, as shown in Figure 9. The area is approximately 3.5 km × 2.7 km and
contains 2,751 building shapes, with many I-shaped and L-shaped buildings as well as
typical E-shaped, H-shaped, and O-shaped buildings. The number of building boundary
points N was set to 64, the neighborhood sizes were set to {2, 4}, and the model
architecture was the same as that shown in Figure 5. Because the GCAE model may be
sensitive to orientation, the orientation of each building was automatically adjusted by
referring to the long-axis orientation of its SBR.
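The SBR-based orientation adjustment can be sketched as follows. Both helpers are hypothetical, not the authors' code; testing one candidate orientation per polygon edge finds the minimum-area bounding rectangle exactly for convex footprints and is a simplification otherwise.

```python
import math

def sbr_long_axis_angle(points):
    """Angle (radians) of the long axis of the smallest bounding rectangle,
    found by testing one candidate orientation per polygon edge."""
    best = (float('inf'), 0.0)   # (rectangle area, long-axis angle)
    n = len(points)
    for i in range(n):
        (x0, y0), (x1, y1) = points[i], points[(i + 1) % n]
        theta = math.atan2(y1 - y0, x1 - x0)
        c, s = math.cos(-theta), math.sin(-theta)   # align edge with x-axis
        xs = [c * x - s * y for x, y in points]
        ys = [s * x + c * y for x, y in points]
        w, h = max(xs) - min(xs), max(ys) - min(ys)
        if w * h < best[0]:
            # the long axis follows the wider side of the rectangle
            best = (w * h, theta if w >= h else theta + math.pi / 2)
    return best[1]

def rotate(points, angle):
    """Rotate points by -angle, making the SBR long axis horizontal."""
    c, s = math.cos(-angle), math.sin(-angle)
    return [(c * x - s * y, s * x + c * y) for x, y in points]
```

Normalizing every building with `rotate(points, sbr_long_axis_angle(points))` before encoding is one way to realize the orientation adjustment described above.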

Figure 9. Experimental area in Shanghai, China.

4.2. Shape retrieval


The objective of shape-based spatial retrieval is to find targets matching a predefined
shape (also known as a template) in a GIS database from the perspective of psycholo-
gical cognition. For example, users draw a sketch of a shape they have in mind and
retrieve matching objects from the database (Egenhofer 1997).
One of the key problems in retrieving building shapes is the measurement of shape
similarity, which is solved here by the GCAE coding. The study first performed unsuper-
vised training on all the buildings to obtain a shape coding database. Next, 10 typical
building shapes were used to retrieve similar buildings from this database. Table 5
records the seven shapes most similar to each standard shape, together with the
Euclidean distances between their codings.
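Retrieval over the coding database reduces to a nearest-neighbor scan in Euclidean space. The sketch below (with a hypothetical function name) shows the idea for the top-7 case used in Table 5.

```python
def top_k(codes, query, k=7):
    """Indices of the k codings closest to `query` in Euclidean distance,
    in ascending order of distance (linear scan over the coding database)."""
    dists = [(sum((a - b) ** 2 for a, b in zip(vec, query)) ** 0.5, i)
             for i, vec in enumerate(codes)]
    return [i for _, i in sorted(dists)[:k]]
```

For the roughly 2,751 codings in this experiment a linear scan is cheap; a spatial index would only matter at much larger database sizes.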
As listed in Table 5, the top seven shapes retrieved for each standard shape are visually
similar to it, indicating that the GCAE model can be effectively applied to the shape
retrieval scenario. For simple shapes, such as the I-shape, L-shape, and U-shape, there
are many similar buildings in the experimental area; therefore, the retrieved results are
very similar and the distance values are small. The three sets of retrieved results also
show that the sensitivity of the model to orientation is alleviated by the SBR-based
adjustment. Among the E-shaped results, the sixth retrieved building does not, cogni-
tively, belong to the standard E-shape; however, it is visually similar and the distance is
small, which indicates that the shape coding accurately represents the overall character-
istics. The fifth E-shaped, seventh H-shaped, and seventh Z-shaped retrieved shapes are
partially similar to their standard shapes, indicating that the coding also has a good
representational

Table 5. Top seven retrieved buildings for the ten standard shapes (Euclidean distances of the
shape codings; the retrieved building footprints are shown as images in the original).

Retrieval target  Top 1  Top 2  Top 3  Top 4  Top 5  Top 6  Top 7
E-shape           0.763  0.991  1.027  1.043  1.046  1.124  1.151
F-shape           0.786  0.837  0.874  0.895  0.905  0.918  0.931
H-shape           0.358  0.661  0.677  0.686  0.736  0.824  1.107
I-shape           0.019  0.026  0.044  0.049  0.049  0.052  0.057
L-shape           0.288  0.326  0.359  0.426  0.436  0.440  0.449
O-shape           0.097  0.116  0.117  0.137  0.151  0.322  0.329
T-shape           0.537  0.710  0.714  0.809  0.828  0.872  0.873
U-shape           0.390  0.415  0.479  0.506  0.621  0.643  0.672
Y-shape           0.531  0.535  0.623  0.791  0.812  0.840  0.869
Z-shape           0.722  0.817  0.847  0.857  0.875  0.932  0.949

capability for local characteristics. Furthermore, the F-shaped and Y-shaped results show
that, after retrieving all the similar shapes in the experimental area, the GCAE model
tends to favor shapes with more detail; overall, the results accord with human visual
cognition.

4.3. Shape matching


Shape matching further analyzes the geometric differences between shapes on the basis
of the similarity measurement. The differences can be quantified by the matching
distance, which compares the areas of the intersection and union of two shapes A and
B (Li et al. 2013):

MD(A, B) = 1 − Area(A ∩ B) / Area(A ∪ B)    (12)
For each of the ten standard shapes, several buildings with similar shapes were selected
from the experimental area. A matching strategy that forces the standard shape's SBR to
coincide with the building's SBR was used to adjust the orientation, position, and size,
thereby minimizing their geometric differences (Ai et al. 2013). Finally, the similarity
distances (SD) and matching distances (MD) between the standard shapes and the
buildings were calculated using the GCAE model and Equation (12), respectively, as
shown in Table 6.
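Equation (12) requires the intersection and union areas of two polygons. Exact polygon clipping is involved, so the sketch below approximates both areas on a regular sampling grid with a ray-casting point-in-polygon test; it is illustrative only, and both function names are hypothetical.

```python
def inside(pt, poly):
    """Even-odd ray-casting test for a point in a simple polygon."""
    x, y = pt
    hit, n = False, len(poly)
    for i in range(n):
        (x0, y0), (x1, y1) = poly[i], poly[(i + 1) % n]
        # count crossings of a horizontal ray extending to the right
        if (y0 > y) != (y1 > y) and x < x0 + (y - y0) * (x1 - x0) / (y1 - y0):
            hit = not hit
    return hit

def matching_distance(a, b, step=0.05):
    """Grid approximation of MD(A, B) = 1 - Area(A∩B)/Area(A∪B)."""
    xs = [p[0] for p in a + b]
    ys = [p[1] for p in a + b]
    inter = union = 0
    x = min(xs)
    while x < max(xs):
        y = min(ys)
        while y < max(ys):
            ia, ib = inside((x, y), a), inside((x, y), b)
            inter += ia and ib
            union += ia or ib
            y += step
        x += step
    return 1 - inter / union
```

Identical polygons give MD = 0; disjoint polygons approach MD = 1, matching the interpretation of MD as a normalized geometric difference.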
Table 6 shows that, except for the Z-shape, each building's most similar standard shape
also has the smallest matching distance to it. For these buildings, the similarity distance
to the most similar standard shape is clearly lower than to the others, consistent with
visual cognition. For the Z-shaped building, the shape similarity differs markedly from
the matching distance and from visual cognition. An in-depth analysis revealed that one
possible reason for this incorrect matching is the adjustment of building orientation
during data preprocessing, as the SBR orientations of the building and the standard
shape are inconsistent.
Figure 10 presents the correlations between the similarity distances and matching
distances of two buildings to the corresponding standard shapes. The two metrics are
positively correlated, with coefficients ranging from 0.7 to 1.0. This result indicates that,
to a certain extent, the smaller the similarity distance calculated using the GCAE model,
the better the matching.
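The reported coefficients can be reproduced from the SD and MD rows of Table 6 with a plain Pearson correlation; `pearson` below is a hypothetical helper, not the authors' code.

```python
def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# E-shape row of Table 6: SD and MD across the nine test buildings.
sd = [0.671, 1.257, 2.133, 2.157, 1.668, 2.193, 1.610, 2.370, 2.216]
md = [0.224, 0.239, 0.360, 0.276, 0.352, 0.345, 0.330, 0.540, 0.425]
r = pearson(sd, md)
```

For this row the coefficient falls inside the 0.7 to 1.0 range stated above, confirming the positive relation between the two metrics.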

4.4. Discussion
The proposed shape coding has several merits: (1) It identifies local and global char-
acteristics in a cognitively compliant manner, as indicated by the shape retrieval test.
This can probably be attributed to the fact that the method considers multiple features
extracted from local and regional structures and provides a learning mechanism for
integrating them. (2) Its similarity measurement carries the meaning of geometric
difference, an important property for some downstream applications. For example, in
cartographic generalization, when a template shape is used to replace a building for
simplification, geometric accuracy should be guaranteed in addition to shape similarity
(Yan et al. 2017).
Experimental observations also reveal a drawback: the method is sensitive to orienta-
tion. Possible improvements include adding a spatial transformer network (STN)
(Jaderberg et al. 2015) module or introducing cognitive results as supervised knowledge
for training. In addition, this study concentrated on planar buildings without holes; the
shape cognition of holed buildings (Xu et al. 2017b) and 3D buildings using graph deep
learning is conceived as future work.

Table 6. Similarity distance (SD) and matching distance (MD) between the standard shapes and
nine test buildings (the building footprints are shown as images in the original).

Standard shape       values for the nine buildings
E-shape  SD  0.671  1.257  2.133  2.157  1.668  2.193  1.610  2.370  2.216
         MD  0.224  0.239  0.360  0.276  0.352  0.345  0.330  0.540  0.425
F-shape  SD  1.881  0.152  1.921  2.113  1.166  1.636  1.200  1.744  2.144
         MD  0.353  0.140  0.365  0.306  0.238  0.258  0.434  0.468  0.407
H-shape  SD  2.367  2.207  0.648  2.262  2.182  1.881  2.338  2.035  1.366
         MD  0.465  0.459  0.237  0.490  0.534  0.446  0.539  0.554  0.393
I-shape  SD  2.983  2.151  2.160  0.793  2.129  1.368  1.898  2.389  2.333
         MD  0.385  0.334  0.237  0.194  0.398  0.297  0.253  0.589  0.401
L-shape  SD  1.984  1.178  1.955  2.074  0.193  1.279  1.046  1.851  2.020
         MD  0.422  0.247  0.457  0.385  0.078  0.401  0.343  0.622  0.448
T-shape  SD  2.069  1.415  1.782  1.639  1.531  1.006  1.942  1.772  2.005
         MD  0.378  0.257  0.339  0.277  0.428  0.115  0.470  0.423  0.428
U-shape  SD  1.866  1.316  2.133  2.011  0.961  1.568  0.695  1.877  2.247
         MD  0.402  0.372  0.404  0.264  0.270  0.461  0.157  0.681  0.528
Y-shape  SD  2.530  1.654  1.849  2.310  1.521  1.128  1.742  0.220  1.777
         MD  0.551  0.450  0.481  0.550  0.635  0.402  0.623  0.187  0.515
Z-shape  SD  2.593  1.979  2.017  1.793  1.823  1.923  1.949  1.930  1.945
         MD  0.635  0.629  0.598  0.480  0.664  0.588  0.500  0.673  0.683

Figure 10. Correlation analysis between the similarity distance and matching distance.

This study provides insights into the use of cutting-edge artificial intelligence (AI)
techniques to solve conventional spatial cognitive problems in the map space. In
particular, it deals with vector-structured geospatial data rather than the raster-based
data used in previous studies, since the gap between deep learning and graph-
structured data has been filled by graph convolution. In this sense, graph deep learning
may open a new path for various tasks in the map space. For example, the latest
literature shows that it has already been applied to spatial pattern recognition (Yan et al.
2019) and traffic modeling (Zhang et al. 2020).

5. Conclusions
Shape coding and representation is a classical and challenging issue in spatial cognition.
This study presented a graph convolutional autoencoder (GCAE) model for the shape
coding of vector building data by combining graph convolution with the autoencoder
architecture. The GCAE model receives multiple features extracted from the local and
regional structures of a building shape and learns its shape coding through unsuper-
vised training. Visual and quantitative experiments confirmed that the proposed method
has a high discriminative ability for building shapes and outperforms existing methods
in similarity measurement. Further experiments on real-world data showed that the
coding effectively captures the local and global characteristics of building shapes,
providing a basis for applications such as shape retrieval and matching. The pros and
cons of the proposed GCAE method were highlighted, including some issues to be
addressed in future work.

Acknowledgments
Special thanks go to the editor and anonymous reviewers for their insightful comments and
constructive suggestions that substantially improved the quality of the paper.

Data and codes availability statement


The data and codes that support the findings of this study are available in Figshare at http://doi.org/
10.6084/m9.figshare.11742507.

Disclosure statement
No potential conflict of interest was reported by the author(s).

Funding
This work was supported by the National Natural Science Foundation of China [41531180,
41871377]; National Key Research and Development Program of China [2017YFB0503500].

Notes on contributors
Xiongfeng Yan received the B.S. and Ph.D. degrees in cartography from Wuhan University in 2015
and 2019, respectively. He is currently a Postdoctoral with the College of Surveying and Geo-
Informatics, Tongji University, Shanghai, China. His research interests include cartography and
machine learning with special focus on the graph-structured spatial data.
Tinghua Ai is a Professor at the School of Resource and Environmental Sciences, Wuhan University,
Wuhan, China. He received the Ph.D. degree in cartography from the Wuhan Technical University of
Surveying and Mapping in 2000. His research interests include multi-scale representation of spatial
data, map generalization, spatial cognition, and spatial big data analysis.
Min Yang is an Associate Professor at the School of Resource and Environmental Sciences, Wuhan
University, Wuhan, China. He received the B.S. and Ph.D. degrees in cartography from Wuhan
University in 2007 and 2013, respectively. His research interests include change detection of spatial
data, map generalization, and spatial big data analysis.
Xiaohua Tong is a Professor at the College of Surveying and Geo-Informatics, Tongji University,
Shanghai, China. He received the Ph.D. degree in geoscience from Tongji University in 1999. His
research interests include photogrammetry and remote sensing, trust in spatial data, and image
processing for high-resolution satellite images.

ORCID
Xiongfeng Yan http://orcid.org/0000-0003-4748-464X
Tinghua Ai http://orcid.org/0000-0002-6581-9872
Min Yang http://orcid.org/0000-0003-1973-527X

References
Abadi, M., et al., 2015. TensorFlow: large-scale machine learning on heterogeneous systems.
Software. Available from: tensorflow.org
Adamek, T. and O'Connor, N.E., 2004. A multiscale representation method for nonrigid shapes with
a single closed contour. IEEE Transactions on Circuits and Systems for Video Technology, 14 (5),
742–753. doi:10.1109/TCSVT.2004.826776.
Ai, T., et al., 2013. A shape analysis and template matching of building features by the Fourier
transform method. Computers, Environment and Urban Systems, 41 (5), 219–233. doi:10.1016/j.
compenvurbsys.2013.07.002.

Alajlan, N., et al., 2007. Shape retrieval using triangle-area representation and dynamic space
warping. Pattern Recognition, 40 (7), 1911–1920. doi:10.1016/j.patcog.2006.12.005.
Anselin, L., 1995. Local indicators of spatial association—LISA. Geographical Analysis, 27 (2), 93–115.
doi:10.1111/j.1538-4632.1995.tb00338.x.
Arkin, E.M., et al., 1991. An efficiently computable metric for comparing polygonal shapes. IEEE
Transactions on Pattern Analysis and Machine Intelligence, 13 (2), 209–216. doi:10.1109/
34.75509.
Basaraner, M. and Cetinkaya, S., 2017. Performance of shape indices and classification schemes for
characterising perceptual shape complexity of building footprints in GIS. International Journal of
Geographical Information Science, 31 (10), 1952–1977. doi:10.1080/13658816.2017.1346257.
Belongie, S., Malik, J., and Puzicha, J., 2002. Shape matching and object recognition using shape
contexts. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24 (4), 509–522.
doi:10.1109/34.993558.
Bronstein, M.M., et al., 2017. Geometric deep learning: going beyond Euclidean data. IEEE Signal
Processing Magazine, 34 (4), 18–42. doi:10.1109/MSP.2017.2693418.
Defferrard, M., Bresson, X., and Vandergheynst, P., 2016. Convolutional neural networks on graphs
with fast localized spectral filtering. In: Proceedings of International Conference on Neural
Information Processing Systems (NIPS). Barcelona, Spain, 3844–3852.
Deng, D. and Yu, D., 2014. Deep learning: methods and applications. Foundations and Trends in
Signal Processing, 7 (3–4), 197–387. doi:10.1561/2000000039.
DeVries, P.M.R., et al., 2018. Deep learning of aftershock patterns following large earthquakes.
Nature, 560 (7720), 632–634. doi:10.1038/s41586-018-0438-y.
Douglas, D.H. and Peucker, T.K., 1973. Algorithms for the reduction of the number of points required to
represent a digitized line or its caricature. Cartographica: The International Journal for Geographic
Information and Geovisualization, 10 (2), 112–122. doi:10.3138/FM57-6770-U75U-7727.
Egenhofer, M.J., 1997. Query processing in spatial-query-by-sketch. Journal of Visual Languages and
Computing, 8 (4), 403–424. doi:10.1006/jvlc.1997.0054.
Feng, Y., Thiemann, F., and Sester, M., 2019. Learning cartographic building generalization with
deep convolutional neural networks. International Journal of Geo-Information, 8 (6), 258.
doi:10.3390/ijgi8060258.
Hammond, D.K., Vandergheynst, P., and Gribonval, R., 2011. Wavelets on graphs via spectral graph
theory. Applied and Computational Harmonic Analysis, 30 (2), 129–150. doi:10.1016/j.
acha.2010.04.005.
Henn, A., et al., 2012. Automatic classification of building types in 3D city models. Geoinformatica, 16
(2), 281–306. doi:10.1007/s10707-011-0131-x.
Jaderberg, M., et al., 2015. Spatial transformer networks. In: Proceedings of International Conference
on Neural Information Processing Systems (NIPS). Montreal, Canada, 2017–2025.
Kingma, D.P. and Ba, J., 2015. Adam: a method for stochastic optimization. In: Proceedings of
international Conference on Learning Representations (ICLR). San Diego, California, USA.
Kipf, T.N. and Welling, M., 2017. Semi-supervised classification with graph convolutional networks.
In: Proceedings of International Conference on Learning Representations (ICLR). Toulon, France.
LeCun, Y., et al., 1998. Gradient-based learning applied to document recognition. Proceedings of the
IEEE, 86 (11), 2278–2324. doi:10.1109/5.726791.
LeCun, Y., Bengio, Y., and Hinton, G., 2015. Deep learning. Nature, 521 (7553), 436–444. doi:10.1038/
nature14539.
Li, W., Goodchild, M.F., and Church, R., 2013. An efficient measure of compactness for
two-dimensional shapes and its application in regionalization problems. International Journal of
Geographical Information Science, 27 (6), 1227–1250. doi:10.1080/13658816.2012.752093.
Li, Z., et al., 2004. Automated building generalization based on urban morphology and Gestalt
theory. International Journal of Geographical Information Science, 18 (5), 513–534. doi:10.1080/
13658810410001702021.
Mark, D.M., et al., 1999. Cognitive models of geographical space. International Journal of
Geographical Information Science, 13 (8), 747–774. doi:10.1080/136588199241003.

Mennis, J.L., Peuquet, D.J., and Qian, L., 2000. A conceptual framework for incorporating cognitive
principles into geographical database representation. International Journal of Geographical
Information Science, 14 (6), 501–520. doi:10.1080/136588100415710.
Mokhtarian, F. and Mackworth, A.K., 1992. A theory of multiscale, curvature-based shape represen-
tation for planar curves. IEEE Transactions on Pattern Analysis and Machine Intelligence, 14 (8),
789–805. doi:10.1109/34.149591.
Niepert, M., Ahmed, M., and Kutzkov, K., 2016. Learning convolutional neural networks for graphs. In:
Proceedings of International Conference on International Conference on Machine Learning (ICML).
New York, USA, 2014–2023.
Qi, C.R., et al., 2017. Pointnet: deep learning on point sets for 3d classification and segmentation. In:
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, Hawaii,
USA, 652–660.
Reichstein, M., et al., 2019. Deep learning and process understanding for data-driven Earth system
science. Nature, 566 (7743), 195–204. doi:10.1038/s41586-019-0912-1.
Ritter, S., et al., 2017. Cognitive psychology for deep neural networks: a shape bias case study. In:
Proceedings of the 34th International Conference on Machine Learning (ICML). Sydney, Australia,
2940–2949.
Sandryhaila, A. and Moura, J.M.F., 2014. Discrete signal processing on graphs: frequency analysis.
IEEE Transactions on Signal Processing, 62 (12), 3042–3054. doi:10.1109/TSP.2014.2321121.
Schmidhuber, J., 2014. Deep learning in neural networks: an overview. Neural Networks, 61, 85–117.
doi:10.1016/j.neunet.2014.09.003
Sedlmeier, A. and Feld, S., 2018. Learning indoor space perception. Journal of Location Based
Services, 12 (3–4), 179–214. doi:10.1080/17489725.2018.1539255.
Shilane, P., et al., 2004. The Princeton shape benchmark. In: Proceedings Shape Modeling Applications.
Genova, Italy, 167–178.
Shuman, D.I., Ricaud, B., and Vandergheynst, P., 2016. Vertex-frequency analysis on graphs. Applied
and Computational Harmonic Analysis, 40 (2), 260–291. doi:10.1016/j.acha.2015.02.005.
Steiniger, S., et al., 2008. An approach for the classification of urban building structures based on
discriminant analysis techniques. Transactions in GIS, 12 (1), 31–59. doi:10.1111/j.1467-
9671.2008.01085.x.
Teague, M.R., 1980. Image analysis via the general theory of moments. Journal of the Optical Society
of America, 70 (8), 920–930. doi:10.1364/JOSA.70.000920.
Tobler, W.R., 1970. A computer movie simulating urban growth in the Detroit region. Economic
Geography, 46, 234–240. doi:10.2307/143141
Touya, G., Zhang, X., and Lokhat, I., 2019. Is deep learning the new agent for map generalization?
International Journal of Cartography, 5 (2–3), 142–157. doi:10.1080/23729333.2019.1613071.
van der Maaten, L. and Hinton, G., 2008. Visualizing data using t-SNE. Journal of Machine Learning
Research, 9, 2579–2605.
Wertheimer, M., 1938. Laws of organization in perceptual forms. In: W.D. Ellis, ed. A source book of
Gestalt psychology. London: Routledge & Kegan Paul, 71–88.
Xu, Y., et al., 2017a. Quality assessment of building footprint data using a deep autoencoder
network. International Journal of Geographical Information Science, 31 (10), 1929–1951.
doi:10.1080/13658816.2017.1341632.
Xu, Y., et al., 2017b. Shape similarity measurement model for holed polygons based on position
graphs and Fourier descriptors. International Journal of Geographical Information Science, 31 (2),
253–279. doi:10.1080/13658816.2016.1192637.
Yan, X., et al., 2019. A graph convolutional neural network for classification of building patterns
using spatial vector data. ISPRS Journal of Photogrammetry and Remote Sensing, 150, 259–273.
doi:10.1016/j.isprsjprs.2019.02.010.
Yan, X., Ai, T., and Zhang, X., 2017. Template matching and simplification method for building
features based on shape cognition. ISPRS International Journal of Geo-Information, 6 (8), 250.
doi:10.3390/ijgi6080250.
Yang, C., Wei, H., and Yu, Q., 2018a. A novel method for 2D nonrigid partial shape matching.
Neurocomputing, 275, 1160–1176. doi:10.1016/j.neucom.2017.09.067

Yang, M., et al., 2018b. A map-algebra-based method for automatic change detection and spatial
data updating across multiple scales. Transactions in GIS, 22 (2), 435–454. doi:10.1111/tgis.12320.
Yao, D., et al., 2018. Learning deep representation for trajectory clustering. Expert Systems, 35 (2),
e12252. doi:10.1111/exsy.12252.
Zhang, L., et al., 2013. A spatial cognition-based urban building clustering approach and its
applications. International Journal of Geographical Information Science, 27 (4), 721–740.
doi:10.1080/13658816.2012.700518.
Zhang, Y., et al., 2020. A novel residual graph convolution deep learning model for short-term
network-based traffic forecasting. International Journal of Geographical Information Science, 34
(5), 969–995. doi:10.1080/13658816.2019.1697879.
Zhou, X., et al., 2018. Change detection for building footprints with different levels of detail using
combined shape and pattern analysis. ISPRS International Journal of Geo-Information, 7 (10), 406.
doi:10.3390/ijgi7100406.
Zhu, D., et al., 2020. Spatial interpolation using conditional generative adversarial neural networks.
International Journal of Geographical Information Science, 34 (4), 735–758. doi:10.1080/
13658816.2019.1599122.
