
Positional Encoder Graph Neural Networks for Geographic Data

Konstantin Klemmer (Microsoft Research)    Nathan Safir (University of Georgia)    Daniel B. Neill (New York University)

Proceedings of the 26th International Conference on Artificial Intelligence and Statistics (AISTATS) 2023, Valencia, Spain. PMLR: Volume 206. Copyright 2023 by the author(s).

arXiv:2111.10144v3 [cs.LG] 15 Feb 2023
Abstract

Graph neural networks (GNNs) provide a powerful and scalable solution for modeling continuous spatial data. However, they often rely on Euclidean distances to construct the input graphs. This assumption can be unrealistic in many real-world settings, where the spatial structure is more complex and explicitly non-Euclidean (e.g., road networks). Here, we propose PE-GNN, a new framework that incorporates spatial context and correlation explicitly into the models. Building on recent advances in geospatial auxiliary task learning and semantic spatial embeddings, our proposed method (1) learns a context-aware vector encoding of the geographic coordinates and (2) predicts spatial autocorrelation in the data in parallel with the main task. On spatial interpolation and regression tasks, we show the effectiveness of our approach, improving performance over different state-of-the-art GNN approaches. Our approach not only vastly improves over the GNN baselines, but can match Gaussian processes, the most commonly utilized method for spatial interpolation problems.

1 Introduction

Geographic data is characterized by a natural geometric structure, which often defines the observed spatial pattern. While traditional neural network approaches have no built-in way to account for spatial dynamics, graph neural networks (GNNs) can represent spatial structures graphically. Recent years have seen many applications leveraging GNNs for modeling tasks in the geographic domain, such as inferring properties of a point-of-interest (Zhu et al., 2020) or predicting the speed of traffic at a certain location (Chen et al., 2019). Nonetheless, as we show in this study, GNNs are not necessarily sufficient for modeling complex spatial effects: spatial context can differ at each location, which may be reflected in the relationship with its spatial neighborhood. The study of spatial context and dependencies has attracted increasing attention in the machine learning community, with studies on spatial context embeddings (Mai et al., 2020b; Yin et al., 2019) and spatially explicit auxiliary task learning (Klemmer and Neill, 2021).

Here, we seek to merge these streams of research. We propose the positional encoder graph neural network (PE-GNN), a flexible approach for better encoding spatial context into GNN-based predictive models. PE-GNN is highly modular and can work with any GNN backbone. It contains a positional encoder (PE) (Vaswani et al., 2017; Mai et al., 2020b), which learns a contextual embedding for point coordinates throughout training. The embedding returned by PE is concatenated with other node features to provide the training data for the GNN operator. PE-GNN further predicts the local spatial autocorrelation of the output as an auxiliary task in parallel to the main objective, expanding the approach proposed by Klemmer and Neill (2021) to continuous spatial coordinates. We train PE-GNN by constructing a novel training graph, based on k-nearest neighborhood, from a randomly sampled batch of points at each training step. This forces PE to learn generalizable features, as the same point coordinate might have different spatial neighbors at different training steps. Distances between nodes are reflected as edge weights. This training approach also leads us to compute a "shuffled" Moran's I, implicitly nudging the model to learn a general representation of spatial autocorrelation that works across varying neighbor sets. Over a range of spatial regression tasks, we show that PE-GNN consistently improves the performance of different GNN backbones.

Our contributions can be summarized as follows:

• We propose PE-GNN, a novel GNN architecture including a positional encoder that learns spatial context embeddings for each point coordinate to improve predictions.
• We propose a novel way of training the positional encoder (PE): While Mai et al. (2020b) train PE in an unsupervised fashion and Mai et al. (2020a) use PE in a joint embedding with a data-dependent, secondary encoder (e.g., a text encoder), we use the output of PE concatenated with other node features to directly predict an outcome variable. PE learns through backpropagation on the main regression loss in an end-to-end fashion. Training PE thus takes into account not only the eventual variable of interest, but also further contextual information at the current location and its relation to other points. Within PE-GNN, spatial information is thus represented both through the constructed graph and the learned PE embeddings.

• We expand the Moran's I auxiliary task learning framework proposed by Klemmer and Neill (2021) to continuous spatial coordinates.

• Our training strategy involves the creation of a new training graph at each training step from the current, random point batch. This enables learning of a more generalizable PE embedding and allows computation of a "shuffled" Moran's I, which accounts for different neighbors at different training steps, thus tackling the well-known scale sensitivity of Moran's I.

• To the best of our knowledge, PE-GNN is the first GNN-based approach that is competitive with Gaussian processes on pure spatial interpolation tasks, i.e., predicting a (continuous) output based solely on spatial coordinates, as well as substantially improving GNN performance on all predictive tasks.

2 Related work

2.1 Traditional and neural-network-based spatial regression modeling

Our work considers the problem of modeling geospatial data. This poses a distinct challenge, as standard regression models (such as OLS) fail to address the spatial nature of the data, which can result in spatially correlated residuals. To address this, spatial lag models (Anselin et al., 2001) add a spatial lag term to the regression equation that is proportional to the dependent variable values of nearby observations, assigned by a weight matrix. Likewise, kernel regression takes a weighted average of nearby points when predicting the dependent variable. The most popular off-the-shelf methods for modeling continuous spatial data are based on Gaussian processes (Datta et al., 2016). Recently, there has been a rise of research on applications of neural network models for spatial modeling tasks. More specifically, graph neural networks (GNNs) are often used for these tasks, with the spatial data represented graphically. In particular, they offer flexibility and scalability advantages over traditional spatial modeling approaches. Specific GNN operators including graph convolutions (Kipf and Welling, 2017), graph attention (Veličković et al., 2018) and GraphSAGE (Hamilton et al., 2017) are powerful methods for inference and representation learning with spatial data. Recently, GNN approaches tailored to the specific complexities of geospatial data have been developed. The authors of Kriging Convolutional Networks (Appleby et al., 2020) propose using GNNs to perform a modified kriging task. Hamilton et al. (2017) apply GNNs to a spatio-temporal kriging task, recovering data from unsampled nodes on an input graph. We look to extend this line of research by providing stronger, explicit capacities for GNNs to learn spatial structures. Additionally, our proposed method is highly modular and can be combined with any GNN backbone.

2.2 Spatial context embeddings for geographic data

Through many decades of research on spatial patterns, a myriad of measures, metrics, and statistics have been developed to cover a broad range of spatial interactions. All of these measures seek to transform spatial locations, with optional associated features, into some meaningful embedding, for example, a theoretical distribution of the locations or a measure of spatial association. The most common metric for continuous geographic data is the Moran's I statistic, developed by Anselin (1995). Moran's I measures local and global spatial autocorrelation and acts as a detector of spatial clusters and outliers. The metric has also motivated several methodological expansions, like local spatial heteroskedasticity (Ord and Getis, 2012) and local spatial dispersion (Westerholt et al., 2018). Measures of spatial autocorrelation have already been shown to be useful for improving neural network models through auxiliary task learning (Klemmer and Neill, 2021), model selection (Klemmer et al., 2019), embedding losses (Klemmer et al., 2022) and localized representation learning (Fu et al., 2019). Beyond these traditional metrics, recent years have seen the emergence of neural-network-based embeddings for geographic information. Wang et al. (2017) use kernel embeddings to learn social media user locations. Fu et al. (2019) devise an approach using local point-of-interest (POI) information to learn region embeddings and integrate similarities between neighboring regions to learn mobile check-ins. Yin et al. (2019) develop GPS2Vec, an embedding approach for latitude-longitude coordinates, based on a grid cell encoding and spatial context (e.g., tweets and images). Mai et al. (2020b) developed Space2Vec, another latitude-longitude embedding that does not require further context like tweets or POIs. Space2Vec transforms the input coordinates using sinusoidal functions and then re-projects them into a desired output space using linear layers. In follow-up work, Mai et al. (2020a) first propose the direct integration of Space2Vec into downstream tasks and show its potential with experiments on spatial semantic lifting and geographic question answering. In this study, we propose to generalize their approach to any geospatial regression task by conveniently integrating Space2Vec embeddings into GNNs.
3 Method

3.1 Graph Neural Networks with Geographic Data

We now present PE-GNN, using Graph Convolutional Networks (GCNs) as an example backbone. Let us first define a datapoint p_i = {y_i, x_i, c_i}, where y_i is a continuous target variable (scalar), x_i is a vector of predictive features and c_i is a vector of point coordinates (latitude / longitude pairs). We use the great-circle distance d_ij = haversin(c_i, c_j) between point coordinates to create a graph of all points in the set, using a k-nearest-neighbor approach to define each point's neighborhood. The graph G = (V, E) consists of a set of vertices (or nodes) V = {v_1, ..., v_n} and a set of edges E = {e_1, ..., e_m} as assigned by the adjacency matrix A. Each vertex i ∈ V has respective node features x_i and target variable y_i. While the adjacency matrix A usually comes as a binary matrix (with values of 1 indicating adjacency and values of 0 otherwise), one can account for different distances between nodes and use point distances d_ij or kernel transformations thereof (Appleby et al., 2020) to weight A. Given a degree matrix D and an identity matrix I, the normalized adjacency matrix Ā is defined as:

Ā = D^{-1/2} (A + I) D^{-1/2}    (1)

As proposed by Kipf and Welling (2017), a GCN layer can now be defined as:

H^{(l)} = σ(Ā H^{(l-1)} W^{(l)}),  l = 1, ..., L    (2)

where σ(·) describes an activation function (e.g., ReLU) and W^{(l)} is a weight matrix parametrizing GCN layer l. The input for the first GCN layer, H^{(0)}, is given by the feature matrix X containing all node feature vectors x_1, ..., x_n. The assembled GCN predicts the output Ŷ = GCN(X, Θ_GCN), parametrized by Θ_GCN.
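To make this concrete, the following is a minimal sketch (not the authors' implementation; function names and the binary, unweighted adjacency are illustrative assumptions) of building the haversine-based k-nearest-neighbor graph and the normalized adjacency of Equation 1:

```python
import numpy as np

def haversine(c1, c2, radius_km=6371.0):
    # Great-circle distance between two (lat, lon) points given in degrees.
    lat1, lon1, lat2, lon2 = np.radians([c1[0], c1[1], c2[0], c2[1]])
    a = np.sin((lat2 - lat1) / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2
    return 2 * radius_km * np.arcsin(np.sqrt(a))

def knn_adjacency(coords, k=5):
    # Binary k-nearest-neighbor adjacency A from pairwise great-circle distances.
    n = len(coords)
    dist = np.array([[haversine(coords[i], coords[j]) for j in range(n)] for i in range(n)])
    A = np.zeros((n, n))
    for i in range(n):
        neighbors = np.argsort(dist[i])[1:k + 1]   # skip the point itself (distance 0)
        A[i, neighbors] = 1.0                      # could instead store dist[i, neighbors] as edge weights
    return A

def normalized_adjacency(A):
    # Eq. (1): A_bar = D^{-1/2} (A + I) D^{-1/2}, with D the degree matrix of A + I.
    A_tilde = A + np.eye(len(A))
    d_inv_sqrt = 1.0 / np.sqrt(A_tilde.sum(axis=1))
    return A_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
```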
3.2 Context-aware spatial coordinate embeddings

Traditionally, the only intuition for spatial context in GCNs stems from connections between nodes, which allow for graph convolutions akin to pixel convolutions with image data. This can restrict the capacity of the GCN to capture spatial patterns: While defining good neighborhood structures can be crucial for GCN performance, this often comes down to somewhat arbitrary choices like selecting the k nearest neighbors of each node. Without prior knowledge on the underlying data, the process of setting the right neighborhood parameters may require extensive testing. Furthermore, a single value of k might not be best for all nodes: different locations might be more or less dependent on their neighbors. Assuming that no underlying graph connecting point locations is known, one would typically construct a graph using the distance (Euclidean or other) between pairs of points. In many real-world settings (e.g., points-of-interest along a road network) this assumption is unrealistic and may lead to poorly defined neighborhoods. Lastly, GCNs contain no intrinsic tool to transform point coordinates into a different (latent) space that might be more informative for representing the spatial structure, with respect to the particular problem the GCN is trying to solve.

As such, GCNs can struggle with tasks that explicitly require learning of complex spatial dependencies, as we confirm in our experiments. We propose a novel approach to overcome these difficulties, by devising a new positional encoder module that learns a flexible spatial context encoding for each geographic location. Given a batch of datapoints, we create the spatial coordinate matrix C from individual point coordinates c_1, ..., c_n and define a positional encoder PE(C, λ_min, λ_max, Θ_PE) = NN(ST(C, λ_min, λ_max), Θ_PE), consisting of a sinusoidal transform ST(λ_min, λ_max) and a fully-connected neural network NN(Θ_PE), parametrized by Θ_PE. Following the intuition of transformers (Vaswani et al., 2017) for geographic coordinates (Mai et al., 2020b), the sinusoidal transform is a concatenation of scale-sensitive sinusoidal functions at different frequencies, so that

ST(C, λ_min, λ_max) = [ST_0(C, λ_min, λ_max); ...; ST_{S-1}(C, λ_min, λ_max)]    (3)

with S being the total number of grid scales and λ_min and λ_max setting the minimum and maximum grid scale (comparable to the lengthscale parameter of a kernel). The scale-specific encoder ST_s(C, λ_min, λ_max) = [ST_{s,1}(C, λ_min, λ_max); ST_{s,2}(C, λ_min, λ_max)] processes the spatial dimensions v (e.g., latitude and longitude) of C separately, so that

ST_{s,v}(C, λ_min, λ_max) = [cos(C^[v] / (λ_min · g^{s/(S-1)})); sin(C^[v] / (λ_min · g^{s/(S-1)}))],  ∀s ∈ {0, ..., S-1}, ∀v ∈ {1, 2}    (4)

where g = λ_max / λ_min. The output of ST is then fed through the fully-connected neural network NN(Θ_PE) to transform it into the desired vector space shape, creating the coordinate embedding matrix C^emb = PE(C, λ_min, λ_max, Θ_PE).
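The following is a minimal sketch of the sinusoidal transform of Equations 3 and 4 followed by the fully-connected projection; module names and hyperparameter defaults are our own illustrative assumptions rather than the authors' released code:

```python
import torch
import torch.nn as nn

class SinusoidalTransform(nn.Module):
    # Scale-sensitive sinusoidal transform ST (Eqs. 3-4), applied to each coordinate dimension.
    def __init__(self, num_scales=16, lambda_min=1e-2, lambda_max=1.0):
        super().__init__()
        self.num_scales = num_scales
        self.lambda_min = lambda_min
        self.g = lambda_max / lambda_min

    def forward(self, coords):
        # coords: (n, 2) latitude / longitude pairs (assumed rescaled to a common range).
        feats = []
        for s in range(self.num_scales):
            scale = self.lambda_min * self.g ** (s / max(self.num_scales - 1, 1))
            feats.append(torch.cos(coords / scale))
            feats.append(torch.sin(coords / scale))
        return torch.cat(feats, dim=-1)            # shape: (n, 4 * num_scales)

class PositionalEncoder(nn.Module):
    # ST followed by a single fully-connected layer with sigmoid activation (cf. Sec. 3.4).
    def __init__(self, emb_dim=64, num_scales=16, lambda_min=1e-2, lambda_max=1.0):
        super().__init__()
        self.st = SinusoidalTransform(num_scales, lambda_min, lambda_max)
        self.nn = nn.Sequential(nn.Linear(4 * num_scales, emb_dim), nn.Sigmoid())

    def forward(self, coords):
        return self.nn(self.st(coords))            # C_emb: (n, emb_dim)
```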
the right neighborhood parameters may require extensive 3.3 Auxiliary learning of spatial autocorrelation
testing. Furthermore, a single value of k might not be best
for all nodes: different locations might be more or less de- Geographic data often exhibit spatial autocorrelation: ob-
pendent on their neighbors. Assuming that no underlying servations are related, in some shape or form, to their geo-

Figure 1: PE-GCN compared to the GCN baseline: PE-GCN contains (1) a positional encoder network, which learns a spatial context embedding throughout training that is concatenated with node-level features, and (2) an auxiliary learner, which predicts the spatial autocorrelation of the outcome variable simultaneously with the main regression task.

Spatial autocorrelation can be measured using the Moran's I metric of local spatial autocorrelation (Anselin, 1995). Moran's I captures localized homogeneity and outliers, functioning as a detector of spatial clustering and spatial change patterns. In the context of our problem, the Moran's I measure of spatial autocorrelation for outcome variable y_i is defined as:

I_i = (n − 1) · (y_i − ȳ) / (Σ_{j=1}^{n} (y_j − ȳ)²) · Σ_{j=1, j≠i}^{n} a_{i,j} (y_j − ȳ)    (5)

where a_{i,j} ∈ A denotes adjacency of observations i and j. As proposed by Klemmer and Neill (2021), predicting the Moran's I metric of the output can be used as an auxiliary task during training. Auxiliary task learning (Suddarth and Kergosien, 1990) is a special case of multi-task learning, where one learning algorithm tackles two or more tasks at once. In auxiliary task learning, we are only interested in the predictions of one task; however, adding additional, auxiliary tasks to the learner might improve performance on the primary problem: the auxiliary task can add context to the learning problem that can help solve the main problem. This approach is commonly used, for example, in reinforcement learning (Flet-Berliac and Preux, 2019) or computer vision (Hou et al., 2019; Jaderberg et al., 2017).

Translated to our GCN setting, we seek to predict the outcome Y and its local Moran's I metric I(Y) using the same network, so that [Ŷ, Î(Y)] = GCN(X). As Klemmer and Neill (2021) note, the local Moran's I metric is scale-sensitive and, due to its restriction to local neighborhoods, can miss out on longer-distance spatial effects (Feng et al., 2019; Meng et al., 2014). But while Klemmer and Neill (2021) propose to compute the Moran's I at different resolutions, the GCN setting allows for a different, novel approach to overcome this issue: Rather than constructing the graph of training points a priori, we opt for a procedure where in each training step, n_batch points are sampled from the training data as batch B. A graph with corresponding adjacency matrix A_B is constructed for the batch and the Moran's I metric of the outcome variable, I(Y_B), is computed. This approach brings a unique advantage: When training with (randomly shuffled) batches, points may have different neighbors in different training iterations. The Moran's I for point i can thus change throughout iterations, reflecting a differing set of more distant or closer neighbors. This also naturally helps to tackle the scale sensitivity of Moran's I. Altogether, we refer to this altered Moran's I as the "shuffled Moran's I".
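A minimal sketch of how the local Moran's I of Equation 5 can be computed for a batch from its adjacency matrix; the function name and the dense-matrix formulation are illustrative assumptions:

```python
import numpy as np

def local_morans_i(y, A):
    # Local Moran's I (Eq. 5) of outcome y under a (binary or weighted) adjacency matrix A.
    y = np.asarray(y, dtype=float)
    A = np.array(A, dtype=float, copy=True)
    np.fill_diagonal(A, 0.0)                 # the weighted sum in Eq. 5 excludes j = i
    n = len(y)
    z = y - y.mean()                         # deviations from the mean outcome
    return (n - 1) * z * (A @ z) / (z ** 2).sum()
```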
3.4 Positional Encoder Graph Neural Network (PE-GNN)

We now assemble the different modules of our method and introduce the Positional Encoder Graph Neural Network (PE-GNN). The whole modeling pipeline of PE-GNN compared to a naive GNN approach is pictured in Figure 1. Sticking to the GCN example, PE-GCN is constructed as follows: Assuming a batch B of randomly sampled points p_1, ..., p_{n_batch} ∈ B, a spatial graph is constructed from the point coordinates c_1, ..., c_{n_batch} using k-nearest neighborhood, resulting in adjacency matrix A_B. The point coordinates are then fed through the positional encoder PE(Θ_PE), consisting of the sinusoidal transform ST and a single fully-connected layer with sigmoid activation, embedding the 2d coordinates in a custom latent space and returning vector embeddings c_1^emb, ..., c_{n_batch}^emb = C_B^emb. The neural network allows for explicit learning of spatial context, reflected in the vector embedding. We then concatenate the positional encoder output with the node features to create the input for the first GCN layer:

H^{(0)} = concat(X_B, C_B^emb)    (6)

The subsequent layers follow according to Equation 2.
Note that this approach is distinctly different from Mai et al. (2020a), who learn a specific joint embedding between the geographic coordinates and potential other inputs (e.g., text data). Our approach allows for separate treatment of geographic coordinates and potential other predictors, allowing a higher degree of flexibility: PE-GCN can be deployed for any regression task that is geo-referenced in the form of latitude-longitude coordinates. Lastly, to integrate the Moran's I auxiliary task, we compute the metric I(Y_B) for our outcome variable Y_B at the beginning of each training step according to Equation 5, using spatial weights from A_B. Prediction is then facilitated by creating two prediction heads, here linear layers, while the graph operation layers (e.g., GCN layers) are shared between tasks. Finally, we obtain predicted values Ŷ_B and Î(Y_B). The loss of PE-GCN can be computed with any regression criterion, for example mean squared error (MSE):

L = MSE(Ŷ_B, Y_B) + λ · MSE(Î(Y_B), I(Y_B))    (7)

where λ denotes the auxiliary task weight. The final model is denoted as M_{Θ_PE, Θ_GCN}. Algorithm 1 describes a training cycle.

Algorithm 1 PE-GNN Training
Require: M, λ, k, tsteps, nbatch hyperparameters
1: Initialize model M with random weights and hyperparameter λ
2: Set optimizer with hyperparameters
3: for number of training steps (tsteps) do
4:     Sample minibatch B of nbatch points with features X_B, coordinates C_B and outcome Y_B
5:     Construct a spatial graph with adjacency matrix A_B from coordinates C_B using k-nearest neighbors
6:     Using spatial adjacency A_B, compute the Moran's I of the output as I(Y_B)
7:     Predict outcomes [Ŷ_B, Î(Y_B)] = M_{Θ_PE, Θ_GCN}(X_B, C_B, A_B)
8:     Compute loss L(Y_B, I(Y_B), Ŷ_B, Î(Y_B), λ)
9:     Update the parameters Θ_GCN, Θ_PE of model M using stochastic gradient descent
10: return M

We begin training by initializing our model M, for example a PE-GCN, with random weights and potential hyperparameters (e.g., the PE embedding dimension), and defining our optimizer. We then start the training cycle: At each training step, we first sample a minibatch B of points from our training data. These points come as features X_B, point coordinates C_B and outcome variables Y_B. We construct a graph from the spatial coordinates C_B using k-nearest neighborhood, obtaining an adjacency matrix A_B. Next, we use A_B as the spatial weight matrix to compute local Moran's I values I(Y_B) from Y_B. As minibatches are randomly sampled, this creates a "shuffled" version of the metric. We then run inputs X_B, C_B, A_B through the two-headed model M_{Θ_PE, Θ_GCN}, obtaining predictions Ŷ_B, Î(Y_B). We then compute the loss L(Y_B, I(Y_B), Ŷ_B, Î(Y_B), λ), weighing the Moran's I auxiliary task according to the weight parameter λ. Lastly, we use the loss L to update our model parameters Θ_GCN, Θ_PE according to stochastic gradient descent. Training is conducted for tsteps steps, after which the final model M is returned.
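To tie the modules together, the following is a minimal, hedged sketch of a two-headed PE-GCN and one training cycle of Algorithm 1 in PyTorch / PyTorch Geometric. It reuses the PositionalEncoder and local_morans_i sketches from Sections 3.2 and 3.3, assumes torch-cluster is available for knn_graph, and for brevity builds an unweighted Euclidean kNN graph rather than the distance-weighted great-circle graph described above; all names and hyperparameter values are illustrative, not the authors' reference implementation.

```python
import torch
import torch.nn as nn
from torch_geometric.nn import GCNConv, knn_graph

class PEGCN(nn.Module):
    # Two shared GCN layers, a positional encoder, and separate heads for y and I(y).
    def __init__(self, num_features, emb_dim=64, hidden_dim=64):
        super().__init__()
        self.pe = PositionalEncoder(emb_dim=emb_dim)      # sketched in Sec. 3.2
        self.conv1 = GCNConv(num_features + emb_dim, hidden_dim)
        self.conv2 = GCNConv(hidden_dim, hidden_dim)
        self.dropout = nn.Dropout(0.5)
        self.head_y = nn.Linear(hidden_dim, 1)            # main regression head
        self.head_moran = nn.Linear(hidden_dim, 1)        # auxiliary Moran's I head

    def forward(self, x, coords, edge_index):
        h = torch.cat([x, self.pe(coords)], dim=-1)       # Eq. (6)
        h = self.dropout(torch.relu(self.conv1(h, edge_index)))
        h = torch.relu(self.conv2(h, edge_index))
        return self.head_y(h).squeeze(-1), self.head_moran(h).squeeze(-1)

# One training cycle of Algorithm 1 (x: (n, d), coords: (n, 2), y: (n,) tensors).
model = PEGCN(num_features=x.size(1))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
mse = nn.MSELoss()
lam, k, n_batch, tsteps = 0.5, 5, 2048, 1000

for step in range(tsteps):
    idx = torch.randperm(len(y))[:n_batch]                # random minibatch B
    xb, cb, yb = x[idx], coords[idx], y[idx]
    edge_index = knn_graph(cb, k=k)                       # batch-specific kNN graph
    A = torch.zeros(len(idx), len(idx))
    A[edge_index[1], edge_index[0]] = 1.0                 # dense adjacency for Moran's I
    moran_b = torch.as_tensor(local_morans_i(yb.numpy(), A.numpy()), dtype=torch.float)
    y_hat, moran_hat = model(xb, cb, edge_index)
    loss = mse(y_hat, yb) + lam * mse(moran_hat, moran_b) # Eq. (7)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```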
PE-GNN, with any GNN backbone, helps to tackle many of the particular challenges of geographic data: While our approach still includes the somewhat arbitrary choice of k nearest neighbors to define the spatial graph, the proposed positional encoder network is not bound by this restriction, as it does not operate on the graph. This enables separate learning of context-aware embeddings for each coordinate, accounting for neighbors at any potential distance within the batch. While the spatial graph still relies on a predefined distance measure, the positional encoder embeds latitude and longitude values in a high-dimensional latent space. These high-dimensional coordinates are able to reflect spatial complexities much more flexibly and, added as node features, can communicate these throughout the learning process. Batched PE-GNN training is not conducted on a single graph, but on a new graph consisting of randomly sampled training points at each iteration. As such, at different iterations, focus is put on the relationships between different clusters of points. This helps our method to generalize better, rather than just memorizing neighborhood structures. Lastly, the differing training batches also help us to compute a "shuffled" version of the Moran's I metric, capturing autocorrelation at the same location for different (closer or more distant), random neighborhoods.

4 Experiments

4.1 Data

We evaluate PE-GNN and baseline competitors on four real-world geographic datasets of different spatial resolutions (regional, continental and global):

California Housing: This dataset contains the prices of over 20,000 California houses from the 1990 U.S. census (Kelley Pace and Barry, 2003). The regression task at hand is to predict house prices y using features x (e.g., house age, number of bedrooms) and location c. California Housing is a standard dataset for the assessment of spatial autocorrelation.

Election: This dataset contains the election results of over 3,000 counties in the United States (Jia and Benson, 2020). The regression task here is to predict election outcomes y using socio-demographic and economic features (e.g., median income, education) x and county locations c.

Air Temperature: The air temperature dataset (Hooker et al., 2018) contains the coordinates of 3,000 weather stations around the globe. For this regression task we seek to predict mean temperatures y from a single node feature x, mean precipitation, and location c.

3d Road: The 3d road dataset (Kaul et al., 2013) provides 3-dimensional spatial coordinates (latitude, longitude and altitude) of the road network in Jutland, Denmark. The dataset comprises over 430,000 points and can be used for interpolating altitude y using only latitude and longitude coordinates c (no node features x).

(a) Real values and predictions using GraphSAGE and PE-GraphSAGE.

(b) Test error curves of GCN, GAT and GraphSAGE based models, measured by the MSE metric.

Figure 2: Visualizing predictive performance on the California Housing dataset.

Model                              | Cali. Housing     | Election          | Air Temp.         | 3d Road
                                   | MSE      MAE      | MSE      MAE      | MSE      MAE      | MSE      MAE
GCN (Kipf and Welling, 2017)       | 0.0558   0.1874   | 0.0034   0.0249   | 0.0225   0.1175   | 0.0169   0.1029
PE-GCN, λ = 0                      | 0.0161   0.0868   | 0.0032   0.0241   | 0.0040   0.0432   | 0.0031   0.0396
PE-GCN, λ = 0.25                   | 0.0155   0.0882   | 0.0032   0.0236   | 0.0037   0.0417   | 0.0032   0.0416
PE-GCN, λ = 0.5                    | 0.0156   0.0885   | 0.0031   0.0241   | 0.0036   0.0401   | 0.0033   0.0421
PE-GCN, λ = 0.75                   | 0.0160   0.0907   | 0.0031   0.0240   | 0.0040   0.0429   | 0.0033   0.0424
GAT (Veličković et al., 2018)      | 0.0558   0.1877   | 0.0034   0.0249   | 0.0226   0.1165   | 0.0178   0.0998
PE-GAT, λ = 0                      | 0.0159   0.0918   | 0.0032   0.0234   | 0.0039   0.0429   | 0.0060   0.0537
PE-GAT, λ = 0.25                   | 0.0161   0.0867   | 0.0032   0.0235   | 0.0040   0.0417   | 0.0058   0.0530
PE-GAT, λ = 0.5                    | 0.0162   0.0897   | 0.0032   0.0238   | 0.0045   0.0465   | 0.0061   0.0548
PE-GAT, λ = 0.75                   | 0.0162   0.0873   | 0.0032   0.0237   | 0.0041   0.0429   | 0.0062   0.0562
GraphSAGE (Hamilton et al., 2017)  | 0.0558   0.1874   | 0.0034   0.0249   | 0.0274   0.1326   | 0.0180   0.0998
PE-GraphSAGE, λ = 0                | 0.0157   0.0896   | 0.0032   0.0237   | 0.0039   0.0428   | 0.0060   0.0534
PE-GraphSAGE, λ = 0.25             | 0.0097   0.0664   | 0.0032   0.0242   | 0.0040   0.0418   | 0.0059   0.0534
PE-GraphSAGE, λ = 0.5              | 0.0100   0.0682   | 0.0033   0.0239   | 0.0043   0.0461   | 0.0060   0.0536
PE-GraphSAGE, λ = 0.75             | 0.0100   0.0661   | 0.0032   0.0241   | 0.0036   0.0399   | 0.0058   0.0541
KCN (Appleby et al., 2020)         | 0.0292   0.1405   | 0.0367   0.1875   | 0.0143   0.0927   | 0.0081   0.0758
PE-KCN, λ = 0                      | 0.0288   0.1274   | 0.0598   0.2387   | 0.0648   0.2385   | 0.0025   0.0310
PE-KCN, λ = 0.25                   | 0.0324   0.1380   | 0.0172   0.1246   | 0.0059   0.0593   | 0.0037   0.0474
PE-KCN, λ = 0.5                    | 0.0237   0.1117   | 0.0072   0.0714   | 0.0077   0.0664   | 0.0077   0.0642
PE-KCN, λ = 0.75                   | 0.0260   0.1194   | 0.0063   0.0681   | 0.0122   0.0852   | 0.0110   0.0755
Approximate GP                     | 0.0353   0.1382   | 0.0031   0.0348   | 0.0481   0.0498   | 0.0080   0.0657
Exact GP                           | 0.0132   0.0736   | 0.0022   0.0253   | 0.0084   0.0458   | -        -

Table 1: Spatial interpolation: Test MSE and MAE scores from four different datasets, using four different GNN backbones with and without our proposed architecture.

4.2 Experimental setup

We compare PE-GNN with four different graph neural network backbones: the original GCN formulation (Kipf and Welling, 2017), graph attention mechanisms (GAT) (Veličković et al., 2018) and GraphSAGE (Hamilton et al., 2017). We also use Kriging Convolutional Networks (KCN) (Appleby et al., 2020), which differs from GCN primarily in two ways: it transforms the distance-weighted adjacency matrix A using a Gaussian kernel and adds the outcome variable and features of neighboring points to the features of each node.

Model                              | Cali. Housing     | Election          | Air Temp.
                                   | MSE      MAE      | MSE      MAE      | MSE      MAE
GCN                                | 0.0185   0.1006   | 0.0025   0.0211   | 0.0225   0.1175
PE-GCN, λ = 0                      | 0.0143   0.0814   | 0.0026   0.0213   | 0.0040   0.0432
PE-GCN, λ = 0.25                   | 0.0143   0.0816   | 0.0026   0.0213   | 0.0037   0.0417
PE-GCN, λ = 0.5                    | 0.0143   0.0828   | 0.0027   0.0217   | 0.0036   0.0401
PE-GCN, λ = 0.75                   | 0.0147   0.0815   | 0.0027   0.0219   | 0.0040   0.0429
GAT                                | 0.0183   0.0969   | 0.0024   0.0211   | 0.0226   0.1165
PE-GAT, λ = 0                      | 0.0144   0.0836   | 0.0028   0.0218   | 0.0039   0.0429
PE-GAT, λ = 0.25                   | 0.0141   0.0817   | 0.0028   0.0219   | 0.0040   0.0417
PE-GAT, λ = 0.5                    | 0.0155   0.0851   | 0.0030   0.0225   | 0.0045   0.0465
PE-GAT, λ = 0.75                   | 0.0145   0.0824   | 0.0029   0.0223   | 0.0041   0.0429
G.SAGE                             | 0.0131   0.0798   | 0.0007   0.0127   | 0.0219   0.1153
PE-G.SAGE, λ = 0                   | 0.0099   0.0667   | 0.0011   0.0154   | 0.0037   0.0422
PE-G.SAGE, λ = 0.25                | 0.0098   0.0648   | 0.0010   0.0152   | 0.0029   0.0381
PE-G.SAGE, λ = 0.5                 | 0.0098   0.0679   | 0.0012   0.0157   | 0.0037   0.0445
PE-G.SAGE, λ = 0.75                | 0.0114   0.0766   | 0.0012   0.0152   | 0.0038   0.0459
KCN                                | 0.0292   0.1405   | 0.0367   0.1875   | 0.0143   0.0927
PE-KCN, λ = 0                      | 0.0288   0.1274   | 0.0598   0.2387   | 0.0648   0.2385
PE-KCN, λ = 0.25                   | 0.0324   0.1380   | 0.0172   0.1246   | 0.0059   0.0593
PE-KCN, λ = 0.5                    | 0.0237   0.1117   | 0.0072   0.0714   | 0.0077   0.0664
PE-KCN, λ = 0.75                   | 0.0260   0.1194   | 0.0063   0.0681   | 0.0122   0.0852
Approximate GP                     | 0.0195   0.1008   | 0.0050   0.0371   | 0.0481   0.0498
Exact GP                           | 0.0036   0.0375   | 0.0006   0.0139   | 0.0084   0.0458

Table 2: Spatial regression: Test MSE and MAE scores from three different datasets, using four different GNN backbones with and without our proposed architecture.

(a) California Housing.

(b) 3d Road.

Figure 3: MSE bar plots of mean performance and 2σ confidence intervals obtained from 10 different training checkpoints.

Test set points can only access neighbors from the training set to extract these features. We compare the naive version of all these approaches to the same four backbone architectures augmented with our PE-GNN modules. Beyond GNN-based approaches, we also compare PE-GNN to the most popular method for modeling continuous spatial data: Gaussian processes. For all approaches, we compare a range of different training settings and hyperparameters, as discussed below.

To allow for a fair comparison between the different approaches, we equip all models with the same architecture, consisting of two GCN / GAT / GraphSAGE layers with ReLU activation and dropout, followed by linear-layer regression heads. The KCN model also uses GCN layers, following the author specifications. We found that adding additional layers to the GNNs did not increase their capacity for processing raw latitude / longitude coordinates. We test four different auxiliary task weights λ = {0, 0.25, 0.5, 0.75}, where λ = 0 implies no auxiliary task.
Spatial graphs are constructed assuming k = 5 nearest neighbors, following rigorous testing. This also confirms findings from previous work (Appleby et al., 2020; Jia and Benson, 2020). We include a sensitivity analysis of the k parameter and different batch sizes in our results section. Training for the GNN models is conducted using PyTorch (Paszke et al., 2019) and PyTorch Geometric (Fey and Lenssen, 2019). We use the Adam algorithm (Kingma and Ba, 2015) to optimize our models and the mean squared error (MSE) loss. Gaussian process models (exact and approximate) are trained using GPyTorch (Gardner et al., 2018). Due to the size of the dataset, we only provide an approximate GP result for 3d Road. All training is conducted on a single CPU. On the Cali. Housing dataset (n > 20,000), training times for one step (no batched training) are as follows: PE-GCN = 0.23s (0.24s with aux. task), PE-GAT = 0.38s, PE-GraphSAGE = 0.33s, PE-KCN = 0.41s, exact GP = 0.77s. Results are averaged over 100 training steps. The code for PE-GNN and our experiments can be accessed here: https://github.com/konstantinklemmer/pe-gnn.
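For reference, a minimal GPyTorch sketch of an exact GP baseline fit to coordinates alone; the kernel choice (a scaled RBF over the two coordinate dimensions), the tensor names train_coords and train_y, and the training loop are our own assumptions, as the exact GP configuration is not detailed here:

```python
import torch
import gpytorch

class ExactGPBaseline(gpytorch.models.ExactGP):
    # Exact GP over (lat, lon) inputs with a constant mean and scaled RBF kernel (assumed).
    def __init__(self, train_x, train_y, likelihood):
        super().__init__(train_x, train_y, likelihood)
        self.mean_module = gpytorch.means.ConstantMean()
        self.covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernel(ard_num_dims=2))

    def forward(self, x):
        return gpytorch.distributions.MultivariateNormal(self.mean_module(x), self.covar_module(x))

# train_coords: (n, 2) tensor of coordinates, train_y: (n,) tensor of outcomes (assumed).
likelihood = gpytorch.likelihoods.GaussianLikelihood()
gp = ExactGPBaseline(train_coords, train_y, likelihood)
mll = gpytorch.mlls.ExactMarginalLogLikelihood(likelihood, gp)
optimizer = torch.optim.Adam(gp.parameters(), lr=0.1)

gp.train(); likelihood.train()
for _ in range(100):
    optimizer.zero_grad()
    loss = -mll(gp(train_coords), train_y)   # maximize the marginal log likelihood
    loss.backward()
    optimizer.step()
```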
4.3 Results

4.3.1 Predictive performance

We test our methods on two tasks: spatial interpolation, predicting outcomes from spatial coordinates alone, and spatial regression, where other node features are available in addition to the latitude / longitude coordinates. The results of our experiments are shown in Tables 1 and 2. For all models, we provide mean squared error (MSE) and mean absolute error (MAE) metrics on held-out test data. For the spatial interpolation task, we observe that the PE-GNN approaches consistently and vastly improve performance for all four backbone architectures across the California Housing, Air Temperature and 3d Road datasets and, by a small margin, for the Election dataset. For the spatial regression task, we observe that the PE-GNN approaches consistently and substantially improve performance for all four backbone architectures on the California Housing and Air Temperature datasets. Performance remains unchanged or decreases by very small margins on the Election dataset, except for the KCN backbone, which benefits tremendously from the PE-GNN approach, particularly with auxiliary tasks.

Generally, PE-GNN substantially improves over baselines in regression and interpolation settings. Most of the improvement can be attributed to the positional encoder; however, the auxiliary task learning also has substantial beneficial effects in some settings, especially for the KCN models. The best setting for the task weight hyperparameter λ seems to heavily depend on the data, which confirms findings by Klemmer and Neill (2021). To our knowledge, PE-GNN is the first GNN-based learning approach that can compete with Gaussian processes on simple spatial interpolation baselines, though especially exact GPs still sometimes have the edge. PE-GNN is substantially more scalable than exact GPs, which rely on expensive pairwise distance calculations across the full training dataset. Due to this problem, we do not run an exact GP baseline for the high-dimensional 3d Road dataset. For KCN models, we observe a proneness to overfitting. As the authors of KCN mention, this effect diminishes in large enough data domains (Appleby et al., 2020). For example, KCNs are the best performing method on the 3d Road dataset, by far our largest experimental dataset. Here, we also observe that in cases when KCN learns well, PE-KCN can still improve its performance. The KCN experiments also highlight the strongest effects of the Moran's I auxiliary task: In cases when KCN overfits (Election and Cali. Housing datasets), PE-KCN without the auxiliary task (λ = 0) is not sufficient to overcome the problem. However, adding the auxiliary task can mitigate most of the overfitting issue. This directly confirms a theory of Klemmer and Neill (2021) on the beneficial effects of auxiliary learning of spatial autocorrelation. Regarding the question of spatial scale, we find no systematic variation in PE-GNN performance between applications with regional (California Housing, 3d Road), continental (Election) and global (Air Temperature) spatial coverage. PE-GNN performance depends on the difficulty of the task at hand and the complexity of the present spatial dependencies.

We also assess the robustness of PE-GNN training cycles. Figure 3 highlights the confidence intervals of PE-GNN models with GCN, GAT and GraphSAGE backbones trained on the California Housing and 3d Road datasets, obtained from 10 different training cycles. We can see that training runs exhibit only little variability. These findings thus confirm that PE-GNN can consistently outperform naive GNN baselines.

Figure 4: Predictive performance of PE-GCN and PE-GAT models on the California Housing dataset, using different values of k for constructing nearest-neighbor graphs and different batch sizes (bs).
4.3.2 Sensitivity analyses

Figure 4 highlights some results from our sensitivity analyses of the k and n_batch (batch size) parameters. After rigorous testing, we opt for a k = 5 nearest-neighbor approach to create the spatial graph and compute the shuffled Moran's I across all models. We chose n_batch = 2048 for the Cali. Housing and 3d Road datasets and n_batch = 1024 for the Election and Air Temperature datasets. Note that while our experiments focus on batched training to highlight the applicability of PE-GNN to high-dimensional geospatial datasets, we also tested our approach with non-batched training on the smaller datasets (Election, Air Temperature, California Housing). We found only marginal performance differences between these settings.

Figure 5: Automatic learning of loss weights via task uncertainty on the Air Temp. dataset with PE-GCN. The left graphic shows the training loss (MSE), while the right graphic shows the main and auxiliary task weight parameters σ_main and σ_aux. The training steps are given on the x-axis.

4.3.3 Learning auxiliary loss weights using task uncertainty

Lastly, following work by Cipolla et al. (2018) and Klemmer and Neill (2021), we provide an intuition for automatically selecting the Moran's I auxiliary task weight using task uncertainty. This eliminates the need to manually tune and select the λ parameter. The approach, first proposed by Cipolla et al. (2018), formalizes the idea by first defining a probabilistic multi-task regression problem with a main and an auxiliary task as:

p(Ŷ_main, Ŷ_aux | f(X)) = p(Ŷ_main | f(X)) · p(Ŷ_aux | f(X))    (8)

with Ŷ_main, Ŷ_aux giving the main and auxiliary task predictions. Following maximum likelihood estimation, the regression objective function is given as min L(σ_main, σ_aux), where

L(σ_main, σ_aux) = −log p(Ŷ_main, Ŷ_aux | f(X)) = 1/(2σ_main²) · L_main + 1/(2σ_aux²) · L_aux + (log σ_main + log σ_aux)    (9)

with σ_main and σ_aux defining the model noise parameters. By minimizing this objective, we learn the relative weight, or contribution, of the main and auxiliary task to the combined loss. The last term of the loss prevents it from moving towards infinity and acts as a regularizer. While this approach performs on par with a well-selected λ parameter, it eliminates the need to manually tune and select λ. Figure 5 highlights the learning of the σ_main and σ_aux loss weights using PE-GCN and the Air Temperature dataset.
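A minimal sketch of this uncertainty-based weighting as a PyTorch module, parameterized with log standard deviations for numerical stability; class and attribute names are our own:

```python
import torch
import torch.nn as nn

class UncertaintyWeighting(nn.Module):
    # Learnable task weights via homoscedastic task uncertainty (Eq. 9).
    def __init__(self):
        super().__init__()
        self.log_sigma_main = nn.Parameter(torch.zeros(()))
        self.log_sigma_aux = nn.Parameter(torch.zeros(()))

    def forward(self, loss_main, loss_aux):
        # 1 / (2 * sigma^2) = 0.5 * exp(-2 * log_sigma)
        w_main = 0.5 * torch.exp(-2.0 * self.log_sigma_main)
        w_aux = 0.5 * torch.exp(-2.0 * self.log_sigma_aux)
        return w_main * loss_main + w_aux * loss_aux + self.log_sigma_main + self.log_sigma_aux
```

In practice, these two parameters are simply added to the optimizer alongside the PE-GNN weights, replacing the fixed λ of Equation 7.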
5 Conclusion

With PE-GNN, we introduce a flexible, modular GNN-based learning framework for geographic data. PE-GNN leverages recent findings in embedding spatial context into neural networks to improve predictive models. Our empirical findings confirm a strong performance. This study highlights how domain expertise can help improve machine learning models for applications with distinct characteristics. We hope to build on the foundations of PE-GNN to develop further methods for geospatial machine learning.

References

Luc Anselin. 1995. Local Indicators of Spatial Association—LISA. Geographical Analysis 27, 2 (sep 1995), 93–115. https://doi.org/10.1111/j.1538-4632.1995.tb00338.x

Luc Anselin et al. 2001. Spatial econometrics. A Companion to Theoretical Econometrics (2001), 310–330.

Gabriel Appleby, Linfeng Liu, and Li Ping Liu. 2020. Kriging convolutional networks. In AAAI 2020 - 34th AAAI Conference on Artificial Intelligence, Vol. 34. AAAI Press, 3187–3194. https://doi.org/10.1609/aaai.v34i04.5716

Cen Chen, Kenli Li, Sin G. Teo, Xiaofeng Zou, Kang Wang, Jie Wang, and Zeng Zeng. 2019. Gated residual recurrent graph neural networks for traffic prediction. In 33rd AAAI Conference on Artificial Intelligence, AAAI 2019, 31st Innovative Applications of Artificial Intelligence Conference, IAAI 2019 and the 9th AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2019, Vol. 33. AAAI Press, 485–492. https://doi.org/10.1609/aaai.v33i01.3301485

Roberto Cipolla, Yarin Gal, and Alex Kendall. 2018. Multi-task Learning Using Uncertainty to Weigh Losses for Scene Geometry and Semantics. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition. https://doi.org/10.1109/CVPR.2018.00781 arXiv:1705.07115

Abhirup Datta, Sudipto Banerjee, Andrew O. Finley, and Alan E. Gelfand. 2016. Hierarchical Nearest-Neighbor Gaussian Process Models for Large Geostatistical Datasets. J. Amer. Statist. Assoc. 111, 514 (apr 2016), 800–812. https://doi.org/10.1080/01621459.2015.1044091
Yongjiu Feng, Lijuan Chen, and Xinjun Chen. 2019. The impact of spatial scale on local Moran's I clustering of annual fishing effort for Dosidicus gigas offshore Peru. Journal of Oceanology and Limnology 37, 1 (jan 2019), 330–343. https://doi.org/10.1007/s00343-019-7316-9

Matthias Fey and Jan Eric Lenssen. 2019. Fast Graph Representation Learning with PyTorch Geometric. (mar 2019). arXiv:1903.02428 http://arxiv.org/abs/1903.02428

Yannis Flet-Berliac and Philippe Preux. 2019. MERL: Multi-Head Reinforcement Learning. In NeurIPS 2019 - Deep Reinforcement Learning Workshop. arXiv:1909.11939 http://arxiv.org/abs/1909.11939

Yanjie Fu, Pengyang Wang, Jiadi Du, Le Wu, and Xiaolin Li. 2019. Efficient region embedding with multi-view spatial networks: A perspective of locality-constrained spatial autocorrelations. In 33rd AAAI Conference on Artificial Intelligence, AAAI 2019, 31st Innovative Applications of Artificial Intelligence Conference, IAAI 2019 and the 9th AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2019, Vol. 33. AAAI Press, 906–913. https://doi.org/10.1609/aaai.v33i01.3301906

Jacob R. Gardner, Geoff Pleiss, David Bindel, Kilian Q. Weinberger, and Andrew Gordon Wilson. 2018. GPyTorch: Blackbox Matrix-Matrix Gaussian Process Inference with GPU Acceleration. In Advances in Neural Information Processing Systems (NeurIPS). arXiv:1809.11165 http://arxiv.org/abs/1809.11165

William L. Hamilton, Rex Ying, and Jure Leskovec. 2017. Inductive representation learning on large graphs. In Advances in Neural Information Processing Systems, Vol. 2017-December. Neural Information Processing Systems Foundation, 1025–1035. arXiv:1706.02216 http://arxiv.org/abs/1706.02216

Josh Hooker, Gregory Duveiller, and Alessandro Cescatti. 2018. Data descriptor: A global dataset of air temperature derived from satellite remote sensing and weather stations. Scientific Data 5, 1 (nov 2018), 1–11. https://doi.org/10.1038/sdata.2018.246

Yuenan Hou, Zheng Ma, Chunxiao Liu, and Chen Change Loy. 2019. Learning to Steer by Mimicking Features from Heterogeneous Auxiliary Networks. Proceedings of the AAAI Conference on Artificial Intelligence 33, 01 (jul 2019), 8433–8440. https://doi.org/10.1609/aaai.v33i01.33018433 arXiv:1811.02759

Max Jaderberg, Volodymyr Mnih, Wojciech Marian Czarnecki, Tom Schaul, Joel Z. Leibo, David Silver, and Koray Kavukcuoglu. 2017. Reinforcement learning with unsupervised auxiliary tasks. In International Conference on Learning Representations (ICLR). arXiv:1611.05397 https://youtu.be/Uz-zGYrYEjA

Junteng Jia and Austion R. Benson. 2020. Residual Correlation in Graph Neural Network Regression. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Association for Computing Machinery, New York, NY, USA, 588–598. https://doi.org/10.1145/3394486.3403101 arXiv:2002.08274

Manohar Kaul, Bin Yang, and Christian S. Jensen. 2013. Building accurate 3D spatial networks to enable next generation intelligent transportation systems. In Proceedings - IEEE International Conference on Mobile Data Management. https://doi.org/10.1109/MDM.2013.24

R. Kelley Pace and Ronald Barry. 2003. Sparse spatial autoregressions. Statistics & Probability Letters 33, 3 (may 2003), 291–297. https://doi.org/10.1016/s0167-7152(96)00140-x

Diederik P. Kingma and Jimmy Lei Ba. 2015. Adam: A method for stochastic optimization. In 3rd International Conference on Learning Representations, ICLR 2015 - Conference Track Proceedings. arXiv:1412.6980

Thomas N. Kipf and Max Welling. 2017. Semi-supervised classification with graph convolutional networks. In 5th International Conference on Learning Representations, ICLR 2017 - Conference Track Proceedings. International Conference on Learning Representations, ICLR. arXiv:1609.02907 http://arxiv.org/abs/1609.02907

Konstantin Klemmer, Adriano Koshiyama, and Sebastian Flennerhag. 2019. Augmenting correlation structures in spatial data using deep generative models. arXiv:1905.09796 (2019). http://arxiv.org/abs/1905.09796

Konstantin Klemmer and Daniel B. Neill. 2021. Auxiliary-task learning for geographic data with autoregressive embeddings. In SIGSPATIAL: Proceedings of the ACM International Symposium on Advances in Geographic Information Systems.

Konstantin Klemmer, Tianlin Xu, Beatrice Acciaio, and Daniel B. Neill. 2022. SPATE-GAN: Improved Generative Modeling of Dynamic Spatio-Temporal Patterns with an Autoregressive Embedding Loss. In AAAI 2022 - 36th AAAI Conference on Artificial Intelligence. arXiv:2109.15044v1

Gengchen Mai, Krzysztof Janowicz, Ling Cai, Rui Zhu, Blake Regalia, Bo Yan, Meilin Shi, and Ni Lao. 2020a. SE-KGE: A location-aware Knowledge Graph Embedding model for Geographic Question Answering and Spatial Semantic Lifting. Transactions in GIS 24, 3 (2020), 623–655. https://doi.org/10.1111/TGIS.12629
Gengchen Mai, Krzysztof Janowicz, Bo Yan, Rui Zhu, Ling Cai, and Ni Lao. 2020b. Multi-Scale Representation Learning for Spatial Feature Distributions using Grid Cells. In International Conference on Learning Representations (ICLR). arXiv:2003.00824 http://arxiv.org/abs/2003.00824

Yan Meng, Chao Lin, Weihong Cui, and Jian Yao. 2014. Scale selection based on Moran's I for segmentation of high resolution remotely sensed images. In International Geoscience and Remote Sensing Symposium (IGARSS). Institute of Electrical and Electronics Engineers Inc., 4895–4898. https://doi.org/10.1109/IGARSS.2014.6947592

J. Keith Ord and Arthur Getis. 2012. Local spatial heteroscedasticity (LOSH). Annals of Regional Science 48, 2 (apr 2012), 529–539. https://doi.org/10.1007/s00168-011-0492-y

Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, Alban Desmaison, Andreas Köpf, Edward Yang, Zach DeVito, Martin Raison, Alykhan Tejani, Sasank Chilamkurthy, Benoit Steiner, Lu Fang, Junjie Bai, and Soumith Chintala. 2019. PyTorch: An imperative style, high-performance deep learning library. In Advances in Neural Information Processing Systems, Vol. 32. arXiv:1912.01703

S. C. Suddarth and Y. L. Kergosien. 1990. Rule-injection hints as a means of improving network performance and learning time. In Lecture Notes in Computer Science, Vol. 412 LNCS. Springer Verlag, 120–129. https://doi.org/10.1007/3-540-52255-7_33

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Advances in Neural Information Processing Systems, Vol. 2017-December, 5999–6009. arXiv:1706.03762 https://research.google/pubs/pub46201/

Petar Veličković, Arantxa Casanova, Pietro Liò, Guillem Cucurull, Adriana Romero, and Yoshua Bengio. 2018. Graph attention networks. In 6th International Conference on Learning Representations, ICLR 2018 - Conference Track Proceedings. International Conference on Learning Representations, ICLR. arXiv:1710.10903 https://arxiv.org/abs/1710.10903v3

Fengjiao Wang, Chun Ta Lu, Yongzhi Qu, and Philip S. Yu. 2017. Collective geographical embedding for geolocating social network users. In Lecture Notes in Computer Science, Vol. 10234 LNAI. Springer Verlag, 599–611. https://doi.org/10.1007/978-3-319-57454-7_47

Rene Westerholt, Bernd Resch, Franz Benjamin Mocnik, and Dirk Hoffmeister. 2018. A statistical test on the local effects of spatially structured variance. International Journal of Geographical Information Science 32, 3 (mar 2018), 571–600. https://doi.org/10.1080/13658816.2017.1402914

Yifang Yin, Zhenguang Liu, Ying Zhang, Sheng Wang, Rajiv Ratn Shah, and Roger Zimmermann. 2019. GPS2Vec: Towards generating worldwide GPS embeddings. In SIGSPATIAL: Proceedings of the ACM International Symposium on Advances in Geographic Information Systems. Association for Computing Machinery, New York, NY, USA, 416–419. https://doi.org/10.1145/3347146.3359067

Di Zhu, Fan Zhang, Shengyin Wang, Yaoli Wang, Ximeng Cheng, Zhou Huang, and Yu Liu. 2020. Understanding Place Characteristics in Geographic Contexts through Graph Convolutional Neural Networks. Annals of the American Association of Geographers 110, 2 (mar 2020), 408–420. https://doi.org/10.1080/24694452.2019.1694403