



A physics-informed and attention-based graph learning approach for regional electric vehicle charging demand prediction

Haohao Qu, Haoxuan Kuang, Jun Li, Linlin You*, Member, IEEE

H. Qu, H. Kuang, J. Li, and L. You are with the School of Intelligent Systems Engineering, Sun Yat-sen University, Guangzhou, 510006, CN. H. Qu is also with the Department of Computing, The Hong Kong Polytechnic University, Hong Kong, 100872, CN.
*Corresponding author, e-mail: youllin@[Link]

Preprint, September 2023. arXiv:2309.05259.

Abstract—Along with the proliferation of electric vehicles (EVs), optimizing the use of EV charging space can significantly alleviate the growing load on intelligent transportation systems. As the foundation of such an optimization, a spatiotemporal method for EV charging demand prediction in urban areas is required. Although several solutions have been proposed by using data-driven deep learning methods, these performance-oriented methods may suffer from misinterpretations, i.e., they fail to correctly handle the inverse relationship between charging demands and prices. To tackle the emerging challenges of training an accurate and interpretable prediction model, this paper proposes a novel approach that integrates graph and temporal attention mechanisms for feature extraction and uses physics-informed meta-learning in the model pre-training step for knowledge transfer. Evaluation results on a dataset of 18,013 EV charging piles in Shenzhen, China, show that the proposed approach, named PAG, can achieve state-of-the-art forecasting performance and the ability to understand the adaptive changes in charging demands caused by price fluctuations.

Index Terms—Electric vehicle charging, spatio-temporal prediction, graph attention networks, meta-learning

[Figure 1: number of in-use piles and charging price (CNY/kWh) over one week (June 19-26, 2022), contrasting the pattern a data-driven model may learn ("the higher the price, the greater the demand") with the physical/economic law ("the higher the price, the lower the demand").]

Fig. 1. Misinterpretation of EV charging demand caused by price fluctuations.


I. INTRODUCTION

Driven by the public concern about climate change and the support of government policy, the global sales of electric vehicles (EVs) have kept rising strongly over the past few years [1]. According to the International Energy Agency (IEA), global public spending on EV subsidies and incentives doubled to nearly USD 30 billion in 2021 [2]. Such rapid growth not only resulted in a net reduction of 40 million tons of carbon dioxide but also led to increased loads on urban transportation systems [3]. In this context, various EV charging-related smart services are developed and applied to facilitate the development of Intelligent Transportation Systems (ITS); e.g., with accurate regional EV charging demand prediction, proper parking recommendations can be provided to en-route drivers to mitigate mileage anxiety [4], and dynamic pricing schemes can be applied by regulators to improve energy efficiency [5].

With the development of urban road networks, spatiotemporal EV charging demand prediction is of growing interest to accommodate the increasing connectivity among urban areas (i.e., zones). As such, two challenges are emerging. First, compared to conventional methods, feature extraction capabilities shall be enhanced to capture not only the temporal patterns hidden in the time series data but also the spillover effects propagating among neighboring areas. More importantly, existing data-driven models, which mainly focus on reducing forecasting errors, suffer from misinterpretations as illustrated in Fig. 1, where a higher price may be interpreted by the model to imply a higher demand, because only partially observed data are used without the knowledge captured by common physical/economic laws. Even though such data-driven models can still perform well in most cases, this misinterpretation is a fatal flaw for decision-making processes, which should causally reflect the fact that "a higher price will impact the overall demand".

In order to make full use of spatial and temporal data, various studies have been proposed to design lower-error models by integrating graph convolutional networks (GCNs) with recurrent neural networks (RNNs) [6], [7], but they are criticized for the inflexibility and ambiguity in weight assignment due to the limitations of convolutional and recurrent operations. Recently, a more advanced technique, called the attention mechanism [8], has been widely discussed and utilized to enhance such integration with a stronger feature representation capability [9], [10]. However, the corresponding networks still cannot cover the dependencies and interactions between EV charging-related factors. As for misinterpretations, Physics-informed Neural Networks (PINNs) [11] and meta-learning [12] are two feasible solutions. Specifically, PINNs can encode explicit laws of physics, while meta-learning can facilitate the transfer of knowledge learned from related domains. However, an approach that integrates PINNs, meta-learning, GCNs, and the attention mechanism is still missing to support accurate and unbiased knowledge learning.

To fill the gap, this paper proposes a physics-informed and attention-based graph learning approach for the prediction of EV charging demand, called PAG, which combines Physics-Informed Meta-Learning (PIML), the Graph Attention Network (GAT), and Temporal Pattern Attention (TPA) for model pre-training, graph embedding, and multivariate temporal decoding, respectively. Specifically, on the one hand, the typical convolutional and recursive operations are replaced by the two attention mechanisms to enhance the flexibility and interpretability of the prediction model. On the other hand, the pre-training step (i.e., PIML) processes tuning samples generated according to predefined physical/economic laws together with the observed samples to extract common knowledge through meta-learning, with potential misinterpretations remedied.

Moreover, through a rigorous evaluation based on a real-world dataset with samples of 18,013 EV charging piles in Shenzhen, China, from 19 June to 18 July 2022 (30 days), the efficiency and effectiveness of the proposed model are demonstrated. Specifically, as shown by the results, PAG can reduce forecasting errors by about 6.57% in four metrics, and compared to other state-of-the-art baselines, it can also provide correct responses to price fluctuations.

In summary, the main contributions of this paper include:
1) An effective prediction model combining two attention mechanisms, i.e., GAT and TPA, is designed for EV charging demand prediction.
2) An informed model pre-training method, called PIML, is proposed to tackle the intrinsic misinterpretations of conventional data-driven methods and to mine the correct relationship between the charging demands and prices.
3) An integration of GAT, TPA, and PIML enables knowledge adaptation and propagation to build an outstanding predictor with high accuracy and correct interpretation.

The remainder of this paper is structured as follows. Section II summarizes related challenges and solutions about EV charging demand prediction. Then, Section III introduces the proposed approach PAG, which is evaluated in Section IV. Finally, conclusions and future works are drawn in Section V.

II. LITERATURE REVIEW

In this section, related challenges together with solutions for EV charging demand prediction are summarized.

A. Emerging challenges

In general, to create an efficient and effective model with temporospatial information embedded for EV charging demand prediction and analysis, several challenges are emerging, which include:
• Feature extraction: EV charging stations are scattered all over the city and correlated with each other, as the temporal demands can only be fulfilled once and the price and location differences may influence their utilization. By considering charging stations as nodes, their relationships (such as distances) as edges, time-varying demand as sequences, and influencing factors as features (see the sketch after this list), how to capture the hidden patterns in such a high-dimensional tensor with heterogeneous components is the key issue in EV charging demand prediction.
• Biased information: The data-driven models for EV charging demand prediction can suffer from an intrinsic problem of misinterpretation, i.e., "the higher the price, the higher the occupancy" as illustrated by Fig. 1, which is caused by the partially observed data associated with mass-deployed pricing strategies that reduce peak charging demands by raising prices. In the case that unbiased data are hard to obtain, how to counterbalance the influence of biased information is an important problem to ensure the correctness of forecasting results.
• Knowledge learning: The relationship between EV charging demand and related factors varies across time and place. When introducing different physical and economic laws to equip data-driven models [13], a unified and scalable method is required to learn the knowledge fairly from observed and tuning samples (generated from physical and economic laws).
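The first challenge above concerns how the scattered stations and their relations are organized before any learning happens. The following minimal sketch (not the authors' code; the zone ids, neighbor lists, and feature order are illustrative assumptions) shows one way to arrange the zones as an adjacency matrix and a node-feature tensor:

```python
# Zones become nodes, adjacent zones become edges, and each node carries a
# time series of occupancy and price features.
import numpy as np

N_ZONES, N_FEATURES, N_STEPS = 247, 2, 8640   # zones, {occupancy, price}, 5-min steps

# hypothetical adjacency list: zone id -> ids of zones sharing a border
zone_neighbors = {0: [1, 5], 1: [0, 2], 2: [1], 5: [0]}          # ... truncated

A = np.zeros((N_ZONES, N_ZONES), dtype=np.float32)               # adjacency matrix
for i, nbrs in zone_neighbors.items():
    for j in nbrs:
        A[i, j] = A[j, i] = 1.0                                   # undirected edges

X = np.zeros((N_ZONES, N_FEATURES, N_STEPS), dtype=np.float32)   # X[i, j, t]
# X[:, 0, :] would hold pile occupancy and X[:, 1, :] the charging price,
# filled from the raw 5-minute records.
```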

B. Related solutions

To tackle the aforementioned challenges, several solutions have been proposed. Initially, to address graph embedding and multivariate time-series forecasting, scholars mainly focused on statistical reasoning to model graph structures and infer future statuses separately [14], [15]. For instance, a spatiotemporal model is proposed in [16] to evaluate the impact of a large-scale deployment of plug-in electric vehicles (EVs) on urban distribution networks. Although these statistical reasoning-based models can be computationally efficient and easy to interpret, they are still limited in expressing high-dimensional and non-linear features.

More recently, with the rapid development of deep learning techniques, the integration of Graph Neural Networks (GNNs) and Recurrent Neural Networks (RNNs) has been emphasized to enable accurate spatiotemporal prediction in transportation systems, such as parking occupancy [17], traffic speed [18], flow [19], and also EV charging demand predictions [20], [21]; e.g., SGCN [22] combines a graph convolutional network (GCN) and a gated recurrent unit (GRU) to extract the spatial and temporal features of the operating status at EV charging stations, respectively, to better assist the prediction. However, the typical convolutional and recurrent structures can be crippled by their inflexibility and ambiguity in weight assignment. Later on, a more advanced technique, known as the attention mechanism [8], and its variants [9], [10] showed the potential to refine the typical network structures, due to their strong feature representation capabilities. Solutions in spatiotemporal prediction include AST-GAT [23] and STANN [24]. Both of them use spatial and temporal attention mechanisms to assign appropriate weights to the dependencies between road segments and time steps. Nevertheless, a similar integration of spatiotemporal attention is still missing to capture the interaction of EV charging-related factors.

More importantly, although these performance-oriented deep learning methods have made significant contributions to the prediction tasks in intelligent transportation systems, they still have key weaknesses in addressing misinterpretations when full samples cannot be sensed. To address such a problem, Physics-Informed Neural Networks (PINNs) [11] have been proposed and discussed, which can regulate model training processes by introducing an extra loss based on given physical laws; e.g., an accurate model for urban traffic state estimation can be trained with a hybrid loss function that measures not only the spatiotemporal dependence but also the internal law of traffic flow propagation [13]. However, knowledge learning within PINNs remains challenging, especially when the laws cannot be explicitly expressed.

As a potential solution, meta-learning [12] starts to be utilized, which can transfer knowledge from other domains and migrate gradients to update the global model, making the learning process smoother [25]. Taking a study on urban traffic dynamics prediction as an example [19], it shows that even a simple integration of a Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) can achieve state-of-the-art performance after being pre-trained by meta-learning. This advanced technique provides inspiration to address misinterpretations in EV charging demand prediction, but its limitation is the need to acquire representative data for pre-training, which is expensive and laborious.

In summary, even though the combination of GNNs and RNNs has made significant achievements in prediction tasks to assist intelligent transportation systems, the introduction of attention-based mechanisms provides additional capabilities in improving feature representation. Moreover, PINNs and meta-learning can enable knowledge transfer to remedy the impact of misinterpretation; however, how to balance the uncertainty and the learning cost is still challenging. Therefore, this paper proposes a novel approach, called PAG, which integrates GAT, TPA, and PIML to build an accurate and robust model for EV charging demand prediction.

III. PROPOSED APPROACH

As illustrated in Fig. 2, the proposed approach consists of three modules, namely a) a graph embedding module, which transfers spatial relationships between nodes to dense vectors based on GAT; b) a multivariate decoder module, which processes hop-wise dense vectors through a TPA-LSTM network to produce the predicted value; and c) a model pre-training module, which optimizes the network parameters to avoid misinterpretations by applying a law-based pseudo-sampling and meta-learning strategy. In the following sections, the prediction problem is first defined, and then the details of each module are described.

A. Problem definition

For consistency, the paper uses lowercase letters (e.g., x) to represent scalars, bold lowercase/Greek letters with an upper arrow (e.g., x⃗ and α⃗) to denote column vectors, bold-face uppercase letters (e.g., X) to denote matrices, and uppercase calligraphic letters (e.g., 𝒳) to denote sets or high-order tensors.

Moreover, given a matrix X, X(i, :) and X(:, j) represent its i-th row and j-th column, respectively, and the (i-th, j-th) entry of X can be denoted as X(i, j). X^T and x⃗^T are used to represent the transposes of X and x⃗, respectively. Besides, R is used to represent the data space, e.g., X ∈ R^{M×N} means that X has two dimensions of M and N. The element-wise product of x⃗ and y⃗ is represented as x⃗ ⊗ y⃗, and their concatenation is noted as x⃗∥y⃗.

In this paper, the EV charging demands are distributed spatiotemporally in the urban area, and the demands are not only directly related to past charging demands but also indirectly influenced by other demand-sensitive features, e.g., charging prices. Therefore, the problem can be defined as a multivariate spatiotemporal forecasting problem with the following settings (a small sketch follows this list):
1) Data structure: The studied urban area can be structured as a graph, which has N zones as its nodes and the links of their centroids as edges. Accordingly, a three-dimensional tensor 𝒳 is used to manage the data; e.g., 𝒳(i, j, t) represents the j-th feature of the i-th node at time t;
2) Time-series input: To support the training of data-driven models, the sliced data with a window size w are used, which is denoted as {x⃗_{t−w}, x⃗_{t−w+1}, ..., x⃗_{t−1}};
3) Task objective: It is to predict the future distribution of EV charging demand, i.e., y⃗ = 𝒳(:, o, t+Δ) = o⃗_{t+Δ}, where Δ is a fixed time interval for forecasting, and o⃗_{t+Δ} represents the charging demands of all studied traffic zones at time (t+Δ).

[Figure 2: the raw data (occupancy, graph, and other features such as price) are concatenated into high-dimensional features; (a) stacked multi-head GAT layers with Conv2d and residual mixing (1−β, β) produce 1-hop to M-hop embeddings; (b) an LSTM followed by a TPA network and an MLP outputs the predicted value; (c) a parallel pre-training path generates tuning samples from laws and feeds gradients back to the learnable network parameters.]

Fig. 2. Overall structure of the proposed approach, which consists of (a) Graph Embedding Module, (b) Multivariate Decoder Module, and (c) Model Pre-training Module.

B. Graph Embedding Module

To better extract the internal and external relationships among node features, the graph embedding module is designed with two networks, i.e., a CNN and a GAT. First, the CNN is used to extract the temporal features by employing a 2-dimensional convolution kernel [26], whose width is equal to the number of features in each node. Note that its height and stride are two hyperparameters (by default 2 and 1, respectively). Second, to model the spatial relationships among the studied zones (i.e., districts), a GAT with a masked multi-head attention mechanism is used to strengthen the extraction of neighbor features.

As for the propagation process of each GAT layer, the module processes the node features (i.e., X = {x⃗_1, x⃗_2, ..., x⃗_N}, x⃗_i ∈ R^F, where N is the number of nodes and F is the feature number of each node) into the output (i.e., X′ = {x⃗′_1, x⃗′_2, ..., x⃗′_N}, x⃗′_i ∈ R^{F′}) in three steps, namely:

Step 1: For node i, its similarity with each neighbor j ∈ N_i (the neighbors of node i in the graph) is calculated according to Formula (1),

$$e_{ij} = a([\mathbf{W}\vec{x}_i \,\|\, \mathbf{W}\vec{x}_j]) = \mathrm{LeakyReLU}\big(\vec{a}^{T}[\mathbf{W}\vec{x}_i \,\|\, \mathbf{W}\vec{x}_j]\big) \tag{1}$$

where e_{ij} represents the similarity between nodes i and j; ∥ is the concatenation operation; W ∈ R^{F′×F} denotes a shared linear transformation that changes the number of input features from F to F′ for each node; and a : R^{F′} × R^{F′} → R represents a shared attention mechanism, which applies the LeakyReLU function [27] to map the concatenated high-dimensional features with a weight vector a⃗ ∈ R^{2F′} to generate the similarity score.

Step 2: For node i, its attention coefficient for each neighbor j ∈ N_i is calculated according to Formula (2),

$$\alpha_{ij} = \mathrm{softmax}_j(e_{ij}) = \frac{\exp(e_{ij})}{\sum_{n \in \mathcal{N}_i}\exp(e_{in})} \tag{2}$$

where α_{ij} represents the attention coefficient of nodes i and j. The attention coefficient indicates the feature importance of node j to node i, and the softmax function [28] is used to normalize it.

Step 3: For node i, K independent attention mechanisms are performed and averaged to generate x⃗′_i, the output of the GAT layer, according to Formula (3),

$$\vec{x}'_i = \sigma\Big(\frac{1}{K}\sum_{k=1}^{K}\sum_{j \in \mathcal{N}_i}\alpha^{k}_{ij}\mathbf{W}^{k}\vec{x}_j\Big) \tag{3}$$

where σ is the activation function (e.g., Sigmoid or ReLU).

As shown in Fig. 2 (a), by stacking multiple GAT layers, the spatial information of nodes (i.e., zones in the area) can be captured and propagated among their neighbors; e.g., if M GAT layers are used, each node can obtain the knowledge within its M-hop neighbors. Moreover, to tackle the over-smoothing problem caused by the stacked GAT layers, a momentum residual connection [29], [30] is introduced. Specifically, for node i, the final output of the graph embedding module x⃗″_i can be calculated according to Formula (4),

$$\vec{x}''_i = \big\Vert_{m=1}^{M}\big[(1-\beta)\,\vec{x}'^{\,m}_i + \beta\,\vec{x}'^{\,m-1}_i\big], \qquad \vec{x}'^{\,0}_i = \vec{x}_i \tag{4}$$

where x⃗″_i has MF features and β is the residual coefficient. Note that due to the data frame constraint of the residual connection, the numbers of features F and F′ before and after the GAT are set to be the same.
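To make the propagation steps concrete, the sketch below implements a single masked multi-head GAT layer along the lines of Formulas (1)-(3); it is a simplified illustration (dense adjacency, default LeakyReLU slope, random initialization), not the authors' implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleGATLayer(nn.Module):
    def __init__(self, f_in: int, f_out: int, heads: int = 4):
        super().__init__()
        self.heads = heads
        self.W = nn.ModuleList([nn.Linear(f_in, f_out, bias=False) for _ in range(heads)])
        self.a = nn.ParameterList([nn.Parameter(torch.randn(2 * f_out)) for _ in range(heads)])

    def forward(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # x: (N, F_in); adj: (N, N) with 1 where j is a neighbour of i (self-loops included)
        outputs = []
        for k in range(self.heads):
            h = self.W[k](x)                                     # W^k x, shape (N, F_out)
            a1, a2 = self.a[k][: h.size(1)], self.a[k][h.size(1):]
            # e_ij = LeakyReLU(a^T [W x_i || W x_j])  -> Formula (1), split into two dot products
            e = F.leaky_relu(h @ a1.unsqueeze(1) + (h @ a2.unsqueeze(1)).T)   # (N, N)
            e = e.masked_fill(adj == 0, float("-inf"))           # masked attention
            alpha = torch.softmax(e, dim=1)                      # Formula (2)
            outputs.append(alpha @ h)                            # weighted neighbour aggregation
        return torch.sigmoid(torch.stack(outputs).mean(dim=0))   # Formula (3), averaged heads

# toy usage: a 5-node chain graph with self-loops
adj = torch.eye(5) + torch.diag(torch.ones(4), 1) + torch.diag(torch.ones(4), -1)
out = SimpleGATLayer(f_in=2, f_out=2, heads=4)(torch.rand(5, 2), adj)
```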

[Figure 3: M CNN filters are applied to the hidden states h⃗_{t−w}, ..., h⃗_{t−1} to produce an M-hop feature map H^C (entries h^C_{1,1}, ..., h^C_{M,M}); a scoring function with W_α yields the attention vectors α⃗_1, ..., α⃗_M, which are combined with the average-pooled vector h⃗^P_t = (h^P_1, ..., h^P_M) and W_P on top of the LSTM outputs.]

Fig. 3. Calculation process of the multivariate temporal decoder module.

C. Multivariate Decoder Module

It adopts TPA-LSTM [9] as the decoder to learn the multivariate time-series patterns from the prepared features x⃗″_{i,t}, which represent the features of node i at time t. Note that since there is no node-wise computation in the decoder module, x⃗″_{i,t} is simplified as x⃗_t in this section.

Within the decoder, as shown in Fig. 2 (b), first, a Long Short-Term Memory (LSTM) layer [31] is used to extract temporal patterns. Specifically, given an input x⃗_t ∈ R^{MF}, the hidden state h⃗_t and the cell state c⃗_t at time t can be computed according to Formula (5),

$$\begin{aligned}
\vec{u}_t &= \sigma(\mathbf{W}_{uu}\vec{x}_t + b_{uu} + \mathbf{W}_{hu}\vec{h}_{t-1} + b_{hu}) \\
\vec{f}_t &= \sigma(\mathbf{W}_{uf}\vec{x}_t + b_{uf} + \mathbf{W}_{hf}\vec{h}_{t-1} + b_{hf}) \\
\vec{g}_t &= \tanh(\mathbf{W}_{ug}\vec{x}_t + b_{ug} + \mathbf{W}_{hg}\vec{h}_{t-1} + b_{hg}) \\
\vec{q}_t &= \sigma(\mathbf{W}_{uq}\vec{x}_t + b_{uq} + \mathbf{W}_{hq}\vec{h}_{t-1} + b_{hq}) \\
\vec{c}_t &= \vec{f}_t \odot \vec{c}_{t-1} + \vec{u}_t \odot \vec{g}_t \\
\vec{h}_t &= \vec{q}_t \odot \tanh(\vec{c}_t)
\end{aligned} \tag{5}$$

where h⃗_{t−1} is the hidden state of the LSTM layer at time t−1 or the initial hidden state at time 0; u⃗_t, f⃗_t, g⃗_t, and q⃗_t are the input, forget, cell, and output gates, respectively; W and b are weights and biases; σ is the sigmoid function; and ⊙ is the Hadamard product. Assuming that h⃗_t ∈ R^{MF}, then u⃗_t, f⃗_t, g⃗_t, q⃗_t, and c⃗_t ∈ R^{MF}; W_{uu}, W_{uf}, W_{ug}, W_{uq} ∈ R^{MF×MF}; and W_{hu}, W_{hf}, W_{hg}, W_{hq} ∈ R^{MF×MF}.

Second, by applying M CNN filters C_m ∈ R^{F×w} with a stride length of F on the hidden states {h⃗_{t−w}, h⃗_{t−w+1}, ..., h⃗_{t−1}}, and an average pooling P_α ∈ R^{F×1} with a stride length of F on h⃗_t, the computational process of TPA can be revised to create attention on both hop-wise and sequence-wise features, as illustrated in Fig. 3. Mathematically, the m-th attention score can be calculated according to Formula (6),

$$\vec{\alpha}_m = \sigma\big(\mathbf{P}_{\alpha}^{T}\vec{h}_t \odot \Vert_{m=1}^{M}[\mathbf{C}_m^{T}\mathbf{H}_m]\big) = \sigma\big(\vec{h}^{P}_t \odot \Vert_{m=1}^{M}[h^{C}_{m,m}]\big) \tag{6}$$

where σ is the sigmoid activation function instead of softmax, because more than one variable is expected to support the forecasting; H_m represents the features with respect to the m-th hop; and h⃗^P_t and h^C_{m,m} are the feature vector and the m-th scalar after the pooling and convolution calculations P_α and C_m, respectively. In total, there are M × M attention scores, α⃗_m ∈ R^M, m = 1, 2, ..., M.

Finally, two linear transformation layers, denoted as W_P and W_α ∈ R^M, respectively, are introduced to integrate the information into the prediction value y of each node, as defined in Formula (7),

$$y = \mathbf{W}_P^{T}\big(\mathbf{W}_{\alpha}\mathbf{H}^{C}\boldsymbol{\alpha} + \vec{h}^{P}_t\big) \tag{7}$$

where H^C is the hidden feature map from time (t−w) to (t−1) and α ∈ R^{M×M} contains their attention scores.
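The following condensed sketch illustrates the TPA-LSTM idea behind the decoder, following the original TPA of Shih et al. [9] with sigmoid scoring; the hop-wise revision of Formulas (6)-(7) is not reproduced exactly, and all layer sizes are assumptions:

```python
import torch
import torch.nn as nn

class TinyTPADecoder(nn.Module):
    def __init__(self, n_features: int, hidden: int = 32, n_filters: int = 8, window: int = 12):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)          # Formula (5)
        self.filters = nn.Parameter(torch.randn(n_filters, window - 1))    # temporal CNN filters
        self.W_a = nn.Parameter(torch.randn(n_filters, hidden))            # scoring matrix
        self.out = nn.Linear(hidden + n_filters, 1)                        # prediction head

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, window, n_features)
        H, _ = self.lstm(x)                                # (batch, window, hidden)
        h_t = H[:, -1, :]                                  # last hidden state
        # per-dimension temporal convolution: HC[b, i, k] = sum_t filters[k, t] * H[b, t, i]
        HC = torch.einsum("kt,bti->bik", self.filters, H[:, :-1, :])       # (batch, hidden, k)
        # row-wise sigmoid scores between each pattern row and h_t (sigmoid, not softmax)
        alpha = torch.sigmoid(torch.einsum("bik,km,bm->bi", HC, self.W_a, h_t))
        v = torch.einsum("bi,bik->bk", alpha, HC)          # weighted pattern context
        return self.out(torch.cat([h_t, v], dim=-1)).squeeze(-1)           # predicted demand

y_hat = TinyTPADecoder(n_features=8)(torch.rand(4, 12, 8))   # a batch of 4 windows
```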

D. Model Pre-training Module

[Figure 4: starting from a random initialization, pre-training on tuning samples generated from physics/economics laws moves the parameters to an optimal initialization inside the region permitted by the laws, from which fine-tuning leads to a well-trained model.]

Fig. 4. Idea behind the model pre-training framework of Physics-Informed Meta-Learning.

As illustrated in Fig. 4, the idea behind the model pre-training module can be explained as follows. Assume that a neural network is trained to detect the position of a ball by throwing it randomly. If the observations are not enough or not all situations are experienced, the trained model will most likely sit close to the edge of the "Laws" region (i.e., the blue area in the figure) instead of its center. Such misinterpretations are difficult to spot because these local optima usually perform well in most cases. However, misinterpretations in Deep Neural Networks (DNNs) can lead to fatal errors, i.e., in this study, "a higher price can bring higher charging demands". To address such misinterpretations, a scalable model pre-training strategy is proposed by combining Physics-informed Neural Networks and meta-learning.

In general, the key idea is to generate multiple groups of tuning samples based on physical or economic laws, e.g., a consistent price increase will lead to a demand decrease, and then use them to pre-train the model based on meta-learning methods, e.g., First-order Model-Agnostic Meta-Learning (FOMAML). To address the misinterpretation of the charging price p, a set of normally distributed impulses Δp is generated, and then the corresponding responses Δy are calculated based on the physical or economic laws F. For example, according to a recent investigation [32], the price elasticity of demand for EV public charging in Beijing (the capital city of China) is approximately −1.48, which can be formulated by (Δy/y)/(Δp/p) = −1.48. Given this, the tuning samples of node i can be generated according to Formula (8).

$$\Delta y_i = \mathcal{F}(\Delta p_i) = -1.48\,(\Delta p_i / p_i)\,y_i \tag{8}$$

Furthermore, the spillover effect between adjacent areas caused by price fluctuations needs to be addressed as well. In general, a fluctuation may impact local and neighboring charging demands differently (i.e., a continuous or intolerable increase of charging prices leads to a local demand decrease but a neighboring demand increase). Therefore, assuming that there are N neighbors, the tuning sample of neighbor j can be generated according to Formula (9).

$$\Delta y_j = -\frac{1}{N}\,\Delta y_i \tag{9}$$

The two kinds of tuning samples can be merged to form the tuning dataset. Even though such pairs of impulses and responses cannot fully express the complex patterns in the real world, they are sufficient to incline the initial parameters toward the sensible optimum. Notably, tuning-sample buffers that reflect the knowledge of other restriction laws can be created by repeating the above steps.

As shown in Fig. 2 (c), a hybrid pre-training strategy based on meta-learning is adopted. First, in each learning iteration, the proportion of tuning samples is gradually reduced by adding observed samples. Second, First-order Model-Agnostic Meta-Learning (FOMAML) is used to learn knowledge from multiple sample buffers simultaneously.

To be specific, assuming that there are S buffers, the objective function of FOMAML, which seeks a set of initial parameters minimizing the learner loss, is defined by Formula (10),

$$\min_{\phi} L(\phi) = \sum_{s=1}^{S} l_s(\theta_s) \tag{10}$$

where θ is the temporary parameter related to the initial parameter φ; θ_s represents the temporary parameter in buffer B_s; l_s(θ_s) is the error calculated by the loss function of the model; and L(φ) is the total after-training loss of the initial parameter φ. Furthermore, its gradient function can be written as Formula (11),

$$\nabla_{\phi} L(\phi) = \sum_{s=1}^{S}\nabla_{\phi}\, l_s(\theta_s) = \sum_{s=1}^{S}\frac{\partial l_s}{\partial \theta_s}\frac{\partial \theta_s}{\partial \phi} \tag{11}$$

where ∇_φ L(φ) is the gradient of L(φ) with respect to φ; ∇_φ l_s(θ_s) is the gradient of l_s(θ_s) with respect to φ; and ∇_θ l_s(θ_s) is the gradient of l_s(θ_s) with respect to θ.

Because the temporary parameter θ_s is updated from the global initial parameter φ in buffer B_s, mathematically θ_s = φ − ε∇_φ l(φ), where ε is the learning rate, the term ∂θ_s/∂φ in Formula (11) can be rewritten as Formula (12), in which the second-derivative term is omitted to simplify the calculation. The detailed proofs and evaluations of this first-order approximation of meta-learning can be found in [12].

$$\frac{\partial \theta_s}{\partial \phi} = 1 - \varepsilon\,\frac{\partial \nabla_{\phi} l(\phi)}{\partial \phi} \approx 1 \tag{12}$$

Therefore, the gradient function of FOMAML can be defined as in Formula (13).

$$\nabla_{\phi} L(\phi) \approx \sum_{s=1}^{S}\frac{\partial l_s}{\partial \theta_s} = \sum_{s=1}^{S}\nabla_{\theta}\, l_s(\theta_s) \tag{13}$$

On this basis, the pre-training module updates φ_e (the initial parameter of the e-th epoch) to φ_{e+1} (the initial parameter of the next epoch). The samples in the buffers are divided into two parts, i.e., Support and Query sets, for obtaining the intermediate parameters (θ_1, ..., θ_S) and the insightful gradients ∇_φ L(φ), respectively. Finally, φ_{e+1} is updated according to Formula (14), where λ is the learning rate.

$$\phi_{e+1} = \phi_e - \lambda\,\frac{\nabla_{\phi} L(\phi)}{S} \tag{14}$$

In general, the pre-training module can be invoked in every epoch until the stop condition is reached, which could be a fixed epoch number or a predefined loss boundary.

E. Algorithm of the Proposed Approach

The proposed approach first runs the pre-training module on the tuning samples (which are created by the pre-defined physical laws) to prepare optimal initial parameters, and then updates the model by using the training samples (a.k.a. the observed samples). Accordingly, the algorithm of the model pre-training module is described in Algorithm 1. Note that it is essential to set an appropriate number of pre-training rounds; otherwise, it may lead to model curing, i.e., over-fitting to the tuning samples.

Algorithm 1 The model pre-training module
Require: A group of real samples, S laws of physics and economics;
1: Generate S tuning-sample buffers according to the given laws, divide the buffers by time, and prepare a Support set R^1 and a Query set R^2 for each buffer;
2: Initialize the model φ and the learning rate λ;
3: for each epoch do
4:   for each buffer s (in parallel) do
5:     Compute g^s_1 = ∇_φ l_s(φ, R^1_s);
6:     Update φ to θ_s with g^s_1;
7:     Obtain g^s_2 = ∇_θ l_s(θ_s, R^2_s);
8:     Gradually reduce the proportion of tuning samples in each buffer by mixing in real samples;
9:   end for
10:  Integrate all the g^s_2, where s = 1, 2, ..., S;
11:  Update the global model φ;
12: end for
13: Return a set of pre-trained model parameters φ′
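A minimal sketch of how this pre-training step could be wired together is given below: tuning samples are derived from a price-elasticity law as in Formula (8), and a first-order MAML-style outer update averages the per-buffer Query gradients as in Formulas (13)-(14). The model interface, buffer format, and learning rates are assumptions for illustration only.

```python
import copy
import torch
import torch.nn as nn

def elasticity_tuning_samples(y, p, elasticity=-1.48, n=256):
    """Normally distributed price impulses and law-based responses Δy = e·(Δp/p)·y (Formula (8)).
    A companion buffer could split -Δy/N across the N neighbours as in Formula (9)."""
    dp = 0.1 * p * torch.randn(n, *p.shape)          # price impulses
    dy = elasticity * (dp / p) * y                   # demand responses
    return p + dp, y + dy                            # tuning inputs and targets

def fomaml_pretrain(model, buffers, inner_lr=0.005, outer_lr=0.005, epochs=200):
    """buffers: list of ((x_support, y_support), (x_query, y_query)), one per law."""
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        outer_grads = [torch.zeros_like(p) for p in model.parameters()]
        for support, query in buffers:
            fast = copy.deepcopy(model)              # temporary parameters θ_s
            opt = torch.optim.SGD(fast.parameters(), lr=inner_lr)
            xs, ys = support
            opt.zero_grad(); loss_fn(fast(xs), ys).backward(); opt.step()   # inner step on Support
            xq, yq = query
            fast.zero_grad(); loss_fn(fast(xq), yq).backward()              # gradient on Query
            for g, p in zip(outer_grads, fast.parameters()):
                g += p.grad        # first-order approximation: ∇_φ L ≈ Σ ∇_θ l_s(θ_s), Formula (13)
        with torch.no_grad():
            for p, g in zip(model.parameters(), outer_grads):
                p -= outer_lr * g / len(buffers)     # Formula (14)
    return model
```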

[Figure 5: map of the 247 studied traffic zones colored by number of charging piles (classes: no data or 0, 2, 6, 50, 120, 400), with zones using time-based pricing highlighted; scale bar 0-20 km.]

Fig. 5. Spatial distribution of public EV charging piles in Shenzhen, China. The studied area contains 247 traffic zones.

Based on the above-described pre-training process, the overall learning procedure of the proposed approach can be defined as described in Algorithm 2. Specifically, according to the hyperparameters, the prediction model is first initialized by the pre-training. After the pre-trained model is created, it is further customized by the observed samples. Once the model is ready, it can be used to predict the EV charging demand of a given zone in the studied area.

Algorithm 2 The proposed approach PAG
Require: Hyperparameters (e.g., learning rate, epoch number, loss function, optimizer, and laws of physics and economics), and training samples (a.k.a. observed samples).
1: Initialize the prediction model φ
2: Pre-train the model φ according to Algorithm 1 to obtain optimal initial network parameters φ′
3: Fine-tune the pre-trained model φ′ on the training set with observed samples
4: Predict the near-future EV charging demand

In summary, first, the proposed approach utilizes GAT and TPA to construct an integrated network, which enhances the ability to extract features from spatiotemporal information. Moreover, it introduces the physics-informed pre-training method based on meta-learning to remedy the impacts of misinterpretations. As such, the approach can achieve outstanding performance, as evaluated in the following section.

IV. MODEL EVALUATION

In this section, the proposed approach is evaluated on an EV charging dataset collected in Shenzhen and compared with other representative prediction models. Moreover, ablation experiments are conducted to verify the necessity of each part of the proposed model. Note that the datasets and code used in this paper are available upon request.

A. Evaluation preparation

1) Evaluation data: The data used in this study are drawn from a publicly available mobile application, which provides the real-time availability of charging piles (i.e., idle or not). Within Shenzhen, China, a total of 18,013 charging piles are covered during the studied period from 19 June to 18 July 2022 (30 days). Besides, the pricing schemes for the studied charging piles are also collected. For analytical purposes, the data are organized as pile occupancy (i.e., demands) and charging price, which are updated every five minutes for the 247 traffic zones introduced by the sixth Residential Travel Survey of Shenzhen [33]. As shown in Fig. 5, 57 of the zones use a time-based pricing scheme for EV charging, while the remainder use a fixed pricing scheme. From a spatial perspective, all the studied traffic zones are connected with their adjacent neighborhoods to form a graph dataset with 247 nodes and 1006 edges. From a temporal perspective, the evaluation data with a total of 8640 timestamps are divided chronologically into training, validation, and test sets, i.e., Day 1-18, Day 19-24, and Day 25-30, respectively. Moreover, each method is configured to run separately to predict the charging demand at four different time intervals, i.e., 15, 30, 45, and 60 minutes. Note that since the temporal resolution of the dataset is 5 minutes, the corresponding Δ (the fixed time interval for forecasting used in the method) are 3, 6, 9, and 12, respectively.
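Assuming 5-minute timestamps indexed from 0 over the 30 studied days, the chronological split and the horizon-to-Δ mapping described above could be expressed as follows (the index boundaries are illustrative, not the authors' exact slicing):

```python
STEPS_PER_DAY = 24 * 12                                        # 288 five-minute intervals per day

train_idx = range(0, 18 * STEPS_PER_DAY)                       # Day 1-18
val_idx   = range(18 * STEPS_PER_DAY, 24 * STEPS_PER_DAY)      # Day 19-24
test_idx  = range(24 * STEPS_PER_DAY, 30 * STEPS_PER_DAY)      # Day 25-30

# prediction horizons in minutes mapped to Δ in 5-minute steps
horizons = {15: 3, 30: 6, 45: 9, 60: 12}
```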

TABLE I
THE LIST OF RUNNING CONFIGURATIONS OF COMPARED METHODS.

Model | Setting | Value | Description
NNs | input | (512, 247, 12, 2) | (batch, node, sequence, feature)
NNs | output | (512, 247, 1) | (batch size, node number, 1)
NNs | seed | 2023 | The random seed for parameter initialization
NNs | epoch | 1000 | Early stop if no loss decline for 100 consecutive epochs
NNs | loss function | MSE | A widely used loss function for regression tasks
NNs | optimizer | Adam | Weight decay = 0.00001, learning rate = 0.001
PAG | β | 0.5 | The ratio of the residual connection
PAG | λ | 0.005 | The learning rate of pre-training
PAG | γ | 1 − e/1000 | Proportion of pseudo-samples in the e-th round of pre-training
PAG | optimizer | BGD | The gradient integration method in pre-training
GNNs* | M | 2 | The number of graph propagation layers
GATs** | K | 4 | The number of attention heads
STGCN, DCRNN | – | – | Referenced from the original papers
KNN | (n_neighbors, leaf_size) | (5, 30) | Implemented with Scikit-learn (sklearn)
Lasso | alpha | 1 × 10⁻⁹ | Implemented with Scikit-learn (sklearn)
VAR | number of variables | 247 × 12 × 2 | Namely the number of nodes, sequence length, and features

* GCN, GAT, GCN-LSTM, AST-GAT, PAG-, and PAG.
** GAT, AST-GAT, PAG-, and PAG.
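For readability, the PAG-related rows of Table I can be restated as a configuration dictionary of the kind that is typically fed to a training script; this mirrors the reported values and is not the authors' actual configuration file:

```python
PAG_CONFIG = {
    "input_shape": (512, 247, 12, 2),    # (batch, node, sequence, feature)
    "output_shape": (512, 247, 1),
    "seed": 2023,
    "max_epochs": 1000,                  # early stop after 100 epochs without improvement
    "loss": "MSE",
    "optimizer": {"name": "Adam", "lr": 0.001, "weight_decay": 1e-5},
    "residual_beta": 0.5,                # β in Formula (4)
    "pretrain_lr": 0.005,                # λ
    "pseudo_sample_ratio": "1 - e/1000", # γ, decays with pre-training round e
    "gat_layers_M": 2,
    "attention_heads_K": 4,
}
```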

2) The setting of the proposed approach: PAG is configured as follows. First, the window size w is 12 intervals, which means the model looks back 60 minutes for prediction, as the average travel time for private cars in cities is usually less than one hour [34]. Second, the two hyper-parameters, i.e., the number of attention heads K and the number of GAT layers M, are set according to previous studies. Specifically, relevant research on graph propagation [35] found that M should be much smaller than the radius of the studied graph, and another study on the attention mechanism [10] suggested that K can be equal to a multiple of the feature number. Based on pre-experiments, the most appropriate M and K are 2 and 4. Third, for the residual connection between GAT layers, β in Formula (4) is set to 0.5. Fourth, two tuning-sample buffers are generated based on two price elasticities for EV charging, namely 1) −1.48, the price elasticity of demand for EV charging in Beijing [32]; and 2) −0.228, the average price elasticity of electricity demand for households across the globe covering the period from 1950 to 2014 [36]. To meet the requirement of meta-learning pre-training, the buffers are further divided into two parts by time, i.e., a Support set (Day 1-12) and a Query set (Day 13-24). Moreover, the epoch numbers of the pre-training and fine-tuning processes are set to 200 and 1000, respectively, with an early stopping mechanism that triggers if the validation loss does not decrease for 100 consecutive epochs. Fifth, Mean Square Error (MSE) is used as the loss function. Finally, to improve the learning performance, Adam [37] is used with a mini-batch size of 512, a learning rate of 0.001, and a weight decay of 0.00001.

3) Compared methods: Three statistical models and seven neural networks (NNs) are used as baselines. Specifically, the three statistical models include Vector Auto-Regression (VAR) [38], the Least absolute shrinkage and selection operator (Lasso) [39], and K-Nearest Neighbors (KNN) [40]. The NNs can be categorized into four groups, namely 1) a typical NN-based model: the Fully Connected Neural Network (FCNN) [41]; 2) an RNN-based model: Long Short-Term Memory (LSTM) [31]; 3) two GNN-based models: Graph Convolutional Networks (GCN) [42] and Graph Attention Networks (GAT) [10]; and 4) four spatiotemporal models: GCN-LSTM [17], STGCN [43], DCRNN [44], and AST-GAT [23]. In addition, the proposed model without pre-training, called PAG-, is used as a compared model as well. For a fair comparison, the specific configurations of the compared models are based on the suggestions of their authors, while the shared training settings are set to the same values as mentioned in the previous subsection, i.e., the retrospective window size w, maximum epoch number, early stop mechanism, batch size, loss function, and optimizer. In summary, the important running configurations of these models are listed in Table I.

4) Running environment: Four common metrics for regression are used, i.e., Root Mean Squared Error (RMSE), Mean Absolute Percentage Error (MAPE), Relative Absolute Error (RAE), and Mean Absolute Error (MAE). The result is the average over all nodes (i.e., the 247 studied traffic zones). All the experiments are conducted on a Windows workstation with an NVIDIA Quadro RTX 4000 GPU and an Intel(R) Core(TM) i9-10900K CPU with 64 GB RAM.
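The four metrics can be implemented in a few lines; in the sketch below, y and y_hat are arrays of true and predicted occupancy, and MAPE/RAE are returned as fractions (multiply by 100 for percentages):

```python
import numpy as np

def rmse(y, y_hat):
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))

def mae(y, y_hat):
    return float(np.mean(np.abs(y - y_hat)))

def mape(y, y_hat, eps=1e-8):
    # eps avoids division by zero for empty stations
    return float(np.mean(np.abs((y - y_hat) / (y + eps))))

def rae(y, y_hat):
    return float(np.sum(np.abs(y - y_hat)) / np.sum(np.abs(y - np.mean(y))))
```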

TABLE II
PERFORMANCE COMPARISON IN DIFFERENT PREDICTION INTERVALS.

Metric (10⁻²): RMSE (15min, 30min, 45min, 60min, average) | MAPE (15min, 30min, 45min, 60min, average)
VAR:      5.56  8.50 11.20 12.51  9.44 | 53.57 64.02 65.93 67.03 62.64
Lasso:    4.35  6.33  7.97  9.34  7.00 | 12.28 19.43 25.64 31.56 22.23
KNN:      4.59  6.50  7.91  9.09  7.02 | 11.56 16.60 20.79 24.64 18.40
FCNN:     3.37  5.74  7.43  8.92  6.37 |  9.74 15.86 22.37 28.35 19.08
LSTM:     3.59  5.65  8.54  9.01  6.70 |  8.91 14.94 21.00 24.15 17.25
GCN:      4.20  6.64  8.13  8.06  6.76 | 40.28 46.15 49.35 52.03 46.96
GAT:      3.39  5.74  7.41  8.89  6.36 | 10.24 16.74 23.62 29.38 20.00
GCN-LSTM: 3.36  5.79  6.34  7.74  5.81 | 11.72 16.92 22.48 28.17 19.32
STGCN:    3.55  5.58  6.91  8.74  6.20 | 31.41 32.58 43.07 49.62 39.17
DCRNN:    3.72  5.40  6.79  7.82  5.93 | 11.32 16.04 19.94 24.68 18.00
AST-GAT:  3.41  5.38  6.58  7.54  5.73 |  9.93 18.00 17.72 27.69 18.33
PAG:      3.02  5.16  6.52  7.21  5.48 |  9.34 14.57 17.55 26.01 16.87

Metric (10⁻²): RAE (15min, 30min, 45min, 60min, average) | MAE (15min, 30min, 45min, 60min, average)
VAR:      45.09 56.48 61.20 62.01 56.19 | 8.03 10.05 10.87 11.02 9.99
Lasso:    14.12 21.04 26.45 31.33 23.24 | 2.51  3.74  4.70  5.64 4.15
KNN:      13.97 20.42 25.43 29.70 22.38 | 2.49  3.63  4.52  5.42 4.02
FCNN:     11.06 18.64 24.16 29.34 20.80 | 1.97  3.32  4.30  5.29 3.72
LSTM:     10.66 18.14 26.78 29.07 21.16 | 1.90  3.23  4.76  6.36 4.06
GCN:      35.70 41.38 45.54 45.69 42.08 | 6.35  7.36  8.10  9.05 7.72
GAT:      11.30 18.90 24.54 29.57 21.08 | 2.01  3.36  4.36  5.35 3.77
GCN-LSTM: 11.10 18.80 24.00 29.32 21.80 | 5.70  3.17  4.56  6.06 4.87
STGCN:    28.06 31.87 36.81 40.45 34.30 | 4.99  5.67  6.54  7.34 6.14
DCRNN:    12.80 18.97 23.81 28.94 21.13 | 2.28  3.37  4.23  5.13 3.75
AST-GAT:  11.39 18.70 22.30 28.74 20.28 | 2.03  3.33  3.96  4.62 3.49
PAG:      10.58 17.64 22.10 28.21 19.63 | 1.88  3.14  3.93  4.35 3.33

B. Evaluation results and discussions

The performance of the evaluated methods is analyzed in three aspects, namely 1) the forecasting error, to illustrate how well the model predicts the future; 2) the ablation results, to show the role and necessity of each part of the proposed model; and 3) the model interpretations, to demonstrate how plausibly the model handles price fluctuations.

1) Forecasting Error: As shown in Table II, the evaluation metrics of the compared models are summarized. The result shows that the recurrent neural network LSTM outperforms both the three statistical models (i.e., VAR, Lasso, and KNN) and the simple neural network FCNN, while the typical graph learning model GCN has a poor performance in MAPE, RAE, and MAE, and GAT, which is more capable in weight allocation than GCN, still performs poorly. By contrast, the combination of RNNs and GNNs (i.e., GCN-LSTM, DCRNN, and AST-GAT) is superior to the recurrent network (i.e., LSTM). It suggests that graph knowledge needs to be coupled with temporal patterns for better performance. Furthermore, the attention-based baseline (i.e., AST-GAT) outperforms the two convolution-based models (i.e., GCN-LSTM and DCRNN), illustrating the superiority of the attention-based network structure in spatio-temporal feature representation.

The result also shows that the proposed approach PAG outperforms the other methods in all four metrics with lower forecasting errors, specifically, 0.0548 in RMSE, 16.87% in MAPE, 19.63% in RAE, and 0.0333 in MAE. Compared to VAR, a widely-used statistical model for data analysis, the proposed model has an improvement of 61.3% on average. Furthermore, PAG performs better than LSTM with 18.21%, 13.17%, 11.37%, and 17.98% improvements in RMSE, MAPE, RAE, and MAE, respectively. When compared to the state-of-the-art spatio-temporal prediction models (i.e., AST-GAT and DCRNN), the improvement becomes 6.00% in RMSE, 7.14% in MAPE, 5.14% in RAE, and 8.01% in MAE. These comparisons illustrate that the graph embedding and multivariate decoding modules can work jointly to achieve optimal performance.

2) Ablation Experiment: The ablation experiment is set up by eliminating each module of PAG (i.e., the pre-training step PIML, the graph embedding module GAT, and the multivariate decoder TPA) one by one, and the results are summarized in Table III. First, it can be concluded from the table that all the proposed modules are necessary for PAG to achieve outstanding performance. Specifically, TPA accounts for the largest contribution, with an RMSE degradation of 50.84% when removed, compared to the complete model (i.e., the rows noted as PAG), and GAT comes second with a 40.27% RMSE degradation. To better illustrate the differences, the average degradation of each module is presented in Fig. 6, in which the bars of TPA and GAT make up the majority, while the ones of PIML are marginal.

To sum up, first, the combination of GAT and TPA is efficient and effective. Second, the underlying spatiotemporal features cannot be extracted by graph or temporal attention alone, as the models without GAT or TPA give results similar to FCNN, as shown in Table II. Finally, besides the average improvement achieved by PIML of about 4.18%, the main contribution of PIML is to reduce misinterpretation, which is evaluated and discussed in the following subsection.

TABLE III
RESULTS OF ABLATION EXPERIMENT.

Interval | Model        | RMSE  | MAPE  | RAE   | MAE
15min    | PAG          | 3.02  | 9.34  | 10.58 | 1.88
15min    | without Meta | 3.11  | 9.98  | 11.12 | 1.95
15min    | without GAT  | 4.44  | 16.94 | 15.56 | 3.19
15min    | without TPA  | 5.08  | 17.62 | 16.45 | 3.31
30min    | PAG          | 5.16  | 14.94 | 18.14 | 3.23
30min    | without Meta | 5.33  | 15.28 | 18.76 | 3.34
30min    | without GAT  | 7.61  | 20.14 | 23.19 | 6.33
30min    | without TPA  | 8.14  | 21.24 | 24.77 | 6.79
45min    | PAG          | 6.52  | 17.55 | 22.10 | 3.93
45min    | without Meta | 6.87  | 18.07 | 23.66 | 4.30
45min    | without GAT  | 8.68  | 23.55 | 27.94 | 7.49
45min    | without TPA  | 8.73  | 23.88 | 28.72 | 7.59
60min    | PAG          | 7.21  | 26.01 | 28.21 | 4.35
60min    | without Meta | 7.41  | 26.23 | 29.23 | 4.53
60min    | without GAT  | 9.61  | 31.41 | 34.58 | 8.76
60min    | without TPA  | 10.36 | 32.47 | 35.66 | 8.93

[Figure 6: percent stacked bars (0-100%) of the average degradation in RMSE, MAPE, RAE, and MAE attributed to removing PIML, GAT, and TPA.]

Fig. 6. Percent stacked bars of average performance degradation compared to the full model.

[Figure 7: response of occupancy (Δy/y) versus impulse of price (Δp/p) for GCN-LSTM, AST-GAT, PAG-, and PAG.]

Fig. 7. Responses of EV charging occupancy to the std impulses of prices.

3) Interpretation: In this part, whether PAG has a proper interpretation of the relationship between EV charging demands and prices is discussed. 57 of the studied zones have dynamic pricing, i.e., the areas with red boxes shown in Fig. 5. Moreover, the 30-minute prediction model is used for this test, because the average commuting time in Shenzhen is about 36 minutes according to the 2022 China Major Cities Commuting Monitoring Report [45]. A typical model (i.e., GCN-LSTM) and a recently developed model (i.e., AST-GAT) for spatiotemporal prediction are used for comparison.

Two types of price impulses are added for the prediction, namely 1) pairs of negative and positive standard deviations of the local charging prices, i.e., Δp = std(p); and 2) fixed percentiles of the local charging prices, i.e., {±10%, ±20%, ±30%}. First, Fig. 7 shows the responses of EV charging occupancy to the standard-deviation impulses of electricity prices. It can be seen that the demand responses of GCN-LSTM and AST-GAT follow the same trend encoded in the price impulses, supporting the argument that the existing models over-fit the sample distribution and misinterpret the relationship between EV charging demand and price.

Furthermore, although the attention-based model PAG- is aware of the reverse impact, it reacts erratically, up to approximately 1000% and down to −1000%, while the price fluctuation ranges between ±40%. Such a drastic reaction is clearly misbehaved, because energy consumption is usually inelastic to price according to the relevant literature, e.g., natural gas for households [46], gasoline for vehicle miles traveled [47], and residential electricity demand [36]. Similar phenomena can be seen in Fig. 8, where each box shows the distribution of demand responses to the percentile impulses for the 57 traffic zones with time-varying pricing schemes. In contrast, the pre-trained model PAG gives reasonable responses, as shown in Fig. 7, where PAG has the most concentrated distribution of responses with absolute values ranging from 0 to 0.6, and also in Fig. 8, where PAG has a minimal average response of about 75% of the corresponding impulse strength. All these results indicate that PAG can interpret an inelastic demand for EV charging with respect to price fluctuations in the studied city (i.e., Shenzhen).

Third, the price-fluctuation impacts on EV charging demand in 1- and 2-hop neighbors are presented in Fig. 9, where the demand changes in 1-hop adjacent zones have the same sign as the local price fluctuations. It means that an increase in charging prices in one area will raise EV charging demand in its neighborhoods, and vice versa, while in its 2-hop neighbors the direction of the impacts is less certain. Taking a traffic zone in the central business district of Shenzhen as an example, the local and neighborhood responses to a 30% price change are illustrated in Fig. 10. It can be seen that the corresponding increases in 1-hop neighborhood demand can offset the reductions in local demand with a ratio of more than 90%, indicating that surrounding areas can be used as alternatives for the related demands. In contrast, it shows that 2-hop neighbors may not be substitutes to accommodate the shifting demands because their responses to price fluctuations are marginal. Given the above results, it can be concluded that the proposed model obtains a correct understanding of spillover effects, reducing the misinterpretations that commonly exist in current methods.

In summary, the proposed approach can significantly reduce the prediction errors with the best scores in all four metrics, i.e., RMSE 0.0548, MAPE 16.87%, RAE 19.63%, and MAE 0.0333. In particular, the combination of GAT and TPA for graph embedding and multivariate spatiotemporal decoding is shown to be effective according to the ablation experiments. Moreover, although the gains from the pre-training module are marginal, it can prevent misinterpretations and properly handle the impacts of price fluctuations on charging demands.
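The impulse-response test described in this subsection can be sketched as follows: the price channel of a test window is perturbed by a fixed percentage and the relative change of the predicted occupancy is recorded. The model interface, feature layout, and price index are assumptions for illustration.

```python
import torch

def price_response(model, window, price_idx=1, impulse=0.30):
    """window: (node, feature, w) tensor; returns the relative demand response Δy/y."""
    model.eval()
    with torch.no_grad():
        y_base = model(window.unsqueeze(0))                 # baseline prediction
        perturbed = window.clone()
        perturbed[:, price_idx, :] *= (1.0 + impulse)       # Δp/p = +30%
        y_pert = model(perturbed.unsqueeze(0))
    return (y_pert - y_base) / (y_base.abs() + 1e-8)        # expected to be negative locally
```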

[Figure 8: box plots of the occupancy response (Δy/y) under price fluctuations (Δp/p) from −0.3 to +0.3 for GCN-LSTM, AST-GAT, PAG-, and PAG.]

Fig. 8. Responses of EV charging occupancy to the percentile impulses of prices.

[Figure 9: occupancy response (Δy/y) of 1-hop and 2-hop neighboring areas under price fluctuations (Δp/p) from −0.3 to +0.3.]

Fig. 9. The proposed model's responses in neighboring areas to price fluctuations.

[Figure 10: map of a central-business-district zone (scale bar 0-2 km) under Δp/p = +30%; the local response is about −0.2625, the 1-hop neighbors respond between roughly +0.03 and +0.05, and the 2-hop neighbors respond only on the order of 10⁻⁶ to 10⁻⁵.]

Fig. 10. Example of neighboring responses in the central business district of the studied city, Shenzhen.
V. CONCLUSIONS AND FUTURE WORKS

EV charging demand prediction plays an important role in improving the "smartness" of urban power grids and intelligent transportation systems, e.g., directing EV drivers to available parking spots to improve the user experience and adjusting charging prices to redirect the charging demand to surrounding areas. This paper proposes a novel approach, named PAG, to train a prediction model with misinterpretations addressed and performance improved. Specifically, PAG combines graph and temporal attention mechanisms for graph embedding and multivariate time-series decoding, and it also introduces a model pre-training method based on physics-informed meta-learning. As shown by the evaluation results, PAG can not only outperform the state-of-the-art methods by approximately 6.57% on average in forecasting errors but also obtain a correct understanding of spillover effects, reducing the misinterpretations between charging demands and prices that commonly exist in current methods.

Although PAG can achieve state-of-the-art performance, it still has certain limitations and can be improved in future work. First, the focus of this work remains on the relationship between EV charging demand and price. However, there are still many misinterpretations in current deep learning-based models that assist intelligent transportation systems, especially in spatiotemporal scenarios. Further investigations and corresponding optimizations of the stacked model are needed. Moreover, the model pre-training needs to be carefully monitored to obtain the correct knowledge, and the introduction of an automated critique module trained by Reinforcement Learning may be one way to address this issue. Finally, the approach can be enhanced to quantify the spillover effects, so that regulators and managers can better detect the impacts of related policies and optimize the running of intelligent transportation systems.

REFERENCES

[1] S. Powell, G. V. Cezar, L. Min, I. M. L. Azevedo, and R. Rajagopal, "Charging infrastructure access and operation to reduce the grid impacts of deep electric vehicle adoption," Nature Energy, vol. 7, no. 10, pp. 932–945, 2022.
[2] IEA, "Global EV outlook 2022." [Online]. Available: [Link]org/reports/global-ev-outlook-2022
[3] Y. Ren, X. Sun, P. Wolfram, S. Zhao, X. Tang, Y. Kang, D. Zhao, and X. Zheng, "Hidden delays of climate mitigation benefits in the race for electric vehicle deployment," Nature Communications, vol. 14, no. 1, p. 3164, 2023.

[4] Z. Tian, T. Jung, Y. Wang, F. Zhang, L. Tu, C. Xu, C. Tian, and X.-Y. Li, "Real-time charging station recommendation system for electric-vehicle taxis," IEEE Transactions on Intelligent Transportation Systems, vol. 17, no. 11, pp. 3098–3109, 2016.
[5] C. Fang, H. Lu, Y. Hong, S. Liu, and J. Chang, "Dynamic pricing for electric vehicle extreme fast charging," IEEE Transactions on Intelligent Transportation Systems, vol. 22, no. 1, pp. 531–541, 2021.
[6] Y. Bao, J. Huang, Q. Shen, Y. Cao, W. Ding, Z. Shi, and Q. Shi, "Spatial–temporal complex graph convolution network for traffic flow prediction," Engineering Applications of Artificial Intelligence, vol. 121, p. 106044, 2023.
[7] I. Ullah, K. Liu, T. Yamamoto, M. Zahid, and A. Jamal, "Modeling of machine learning with shap approach for electric vehicle charging station choice behavior prediction," Travel Behaviour and Society, vol. 31, pp. 78–92, 2023.
[8] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin, "Attention is all you need," in Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, December 4-9, 2017, Long Beach, CA, USA, I. Guyon, U. von Luxburg, S. Bengio, H. M. Wallach, R. Fergus, S. V. N. Vishwanathan, and R. Garnett, Eds., 2017, pp. 5998–6008.
[9] S.-Y. Shih, F.-K. Sun, and H.-y. Lee, "Temporal pattern attention for multivariate time series forecasting," Machine Learning, vol. 108, no. 8, pp. 1421–1441, 2019.
[10] P. Velickovic, G. Cucurull, A. Casanova, A. Romero, P. Li, and Y. Bengio, "Graph attention networks," in 6th International Conference on Learning Representations (ICLR). [Link], 2018.
[11] M. Raissi, P. Perdikaris, and G. Karniadakis, "Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations," Journal of Computational Physics, vol. 378, pp. 686–707, 2019.
[12] C. Finn, P. Abbeel, and S. Levine, "Model-agnostic meta-learning for fast adaptation of deep networks," in the 34th International Conference on Machine Learning (ICML), vol. 70, 2017, pp. 1126–1135.
[13] Z. Shi, Y. Chen, J. Liu, D. Fan, and C. Liang, "Physics-informed spatiotemporal learning framework for urban traffic state estimation," Journal of Transportation Engineering, Part A: Systems, vol. 149, no. 7, p. 04023056, 2023.
[14] Z. Zhou and T. Lin, "Spatial and temporal model for electric vehicle rapid charging demand," in 2012 IEEE Vehicle Power and Propulsion Conference (VPPC), Seoul, South Korea, Oct. 9-12, 2012, pp. 345–348.
[15] L. Knapen, B. Kochan, T. Bellemans, D. Janssens, and G. Wets, "Activity-based modeling to predict spatial and temporal power demand of electric vehicles in Flanders, Belgium," Transportation Research Record, vol. 2287, no. 1, pp. 146–154, 2012.
[16] Y. Mu, J. Wu, N. Jenkins, H. Jia, and C. Wang, "A spatial–temporal model for grid impact analysis of plug-in electric vehicles," Applied Energy, vol. 114, pp. 456–465, 2014.
[17] S. Yang, W. Ma, X. Pi, and S. Qian, "A deep learning approach to real-time parking occupancy prediction in transportation networks incorporating multiple spatio-temporal data sources," Transportation Research Part C: Emerging Technologies, vol. 107, pp. 248–265, 2019.
[18] X. Xu, T. Zhang, C. Xu, Z. Cui, and J. Yang, "Spatial–temporal tensor graph convolutional network for traffic speed prediction," IEEE Transactions on Intelligent Transportation Systems, vol. 24, no. 1, pp. 92–103, 2023.
[19] Y. Zhang, Y. Li, X. Zhou, J. Luo, and Z.-L. Zhang, "Urban traffic dynam-
[24] L. N. Do, H. L. Vu, B. Q. Vo, Z. Liu, and D. Phung, "An effective spatial-temporal attention based neural network for traffic flow prediction," Transportation Research Part C: Emerging Technologies, vol. 108, pp. 12–28, 2019.
[25] S. J. Pan and Q. Yang, "A survey on transfer learning," IEEE Transactions on Knowledge and Data Engineering, vol. 22, no. 10, pp. 1345–1359, 2010.
[26] Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard, and L. D. Jackel, "Backpropagation applied to handwritten zip code recognition," Neural Computation, vol. 1, no. 4, pp. 541–551, 1989.
[27] A. L. Maas, A. Y. Hannun, A. Y. Ng et al., "Rectifier nonlinearities improve neural network acoustic models," in Proc. ICML, vol. 30, no. 1, Atlanta, Georgia, USA, 2013, p. 3.
[28] S. R. Dubey, S. K. Singh, and B. B. Chaudhuri, "Activation functions in deep learning: A comprehensive survey and benchmark," Neurocomputing, 2022.
[29] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778.
[30] J. Klicpera, A. Bojchevski, and S. Gunnemann, "Predict then propagate: Graph neural networks meet personalized pagerank," in 7th International Conference on Learning Representations (ICLR), New Orleans, LA, USA, May 6-9, 2019. [Link], 2019.
[31] S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Computation, vol. 9, no. 8, pp. 1735–1780, 1997.
[32] Z. Bao, Z. Hu, D. M. Kammen, and Y. Su, "Data-driven approach for analyzing spatiotemporal price elasticities of EV public charging demands based on conditional random fields," IEEE Transactions on Smart Grid, vol. 12, no. 5, pp. 4363–4376, 2021.
[33] J. Zhou and L. Ma, "Analysis on the evolution characteristics of Shenzhen residents' travel structure and the enlightenment of public transport development policy," Urban Mass Transit, vol. 24, 2021.
[34] H. Tenkanen and T. Toivonen, "Longitudinal spatial dataset on travel times and distances by different travel modes in Helsinki region," Scientific Data, vol. 7, no. 1, 2020.
[35] X. He, K. Deng, X. Wang, Y. Li, Y. Zhang, and M. Wang, "LightGCN: Simplifying and powering graph convolution network for recommendation," in Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, ser. SIGIR '20. New York, NY, USA: Association for Computing Machinery, 2020, pp. 639–648.
[36] X. Zhu, L. Li, K. Zhou, X. Zhang, and S. Yang, "A meta-analysis on the price elasticity and income elasticity of residential electricity demand," Journal of Cleaner Production, vol. 201, pp. 169–177, 2018.
[37] D. P. Kingma and J. Ba, "Adam: A method for stochastic optimization," in 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings, Y. Bengio and Y. LeCun, Eds., 2015.
[38] A. Inoue and L. Kilian, "Inference on impulse response functions in structural VAR models," Journal of Econometrics, vol. 177, no. 1, pp. 1–13, 2013.
[39] R. Tibshirani, "Regression shrinkage and selection via the lasso: A retrospective," Journal of the Royal Statistical Society Series B: Statistical Methodology, vol. 73, no. 3, pp. 273–282, 2011.
[40] T. Cover and P. Hart, "Nearest neighbor pattern classification," IEEE Transactions on Information Theory, vol. 13, no. 1, pp. 21–27, 1967.
[41] K.-Y. Hsu, H.-Y. Li, and D. Psaltis, "Holographic implementation of a fully connected neural network," Proceedings of the IEEE, vol. 78, no. 10, pp. 1637–1645, 1990.
[42] T. N. Kipf and M. Welling, "Semi-supervised classification with graph
ics prediction—a continuous spatial-temporal meta-learning approach,” convolutional networks,” in 5th International Conference on Learning
ACM Transactions on Intelligent Systems and Technology, vol. 13, no. 2, Representations (ICLR), Toulon, France, April 24-26, 2017, Conference
jan 2022. Track Proceedings. [Link], 2017.
[20] Y. Xiang, Z. Jiang, C. Gu, F. Teng, X. Wei, and Y. Wang, “Electric [43] B. Yu, H. Yin, and Z. Zhu, “Spatio-temporal graph convolutional
vehicle charging in smart grid: A spatial-temporal simulation method,” networks: A deep learning framework for traffic forecasting,” in Pro-
Energy, vol. 189, p. 116221, 2019. ceedings of the Twenty-Seventh International Joint Conference on Ar-
[21] T. Yi, C. Zhang, T. Lin, and J. Liu, “Research on the spatial-temporal tificial Intelligence, IJCAI 2018, July 13-19, 2018, Stockholm, Sweden.
distribution of electric vehicle charging load demand: A case study in [Link], 2018, pp. 3634–3640.
china,” Journal of Cleaner Production, vol. 242, p. 118457, 2020. [44] Y. Li, R. Yu, C. Shahabi, and Y. Liu, “Diffusion convolutional re-
[22] S. Su, Y. Li, Q. Chen, M. Xia, K. Yamashita, and J. Jurasz, “Operating current neural network: Data-driven traffic forecasting,” in 6th Inter-
status prediction model at ev charging stations with fusing spatiotempo- national Conference on Learning Representations, (ICLR), Vancouver,
ral graph convolutional network,” IEEE Transactions on Transportation BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings.
Electrification, vol. 9, no. 1, pp. 114–129, 2023. [Link], 2018.
[23] D. Li and J. Lasenby, “Spatiotemporal attention-based graph convolution [45] CAUPD, “the 2022 annual commuting monitoring report for major
network for segment-level traffic prediction,” IEEE Transactions on cities in china,” 2022. [Online]. Available: [Link]
Intelligent Transportation Systems, vol. 23, no. 7, pp. 8337–8345, 2022. cms/report/2022tongqin
IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, VOL. XX, NO. X, XX 2021 13

[46] P. J. Burke and H. Yang, “The price and income elasticities of natural
gas demand: International evidence,” Energy Economics, vol. 59, pp.
466–474, 2016.
[47] N. Rivers and B. Schaufele, “Gasoline price and new vehicle fuel
efficiency: Evidence from canada,” Energy Economics, vol. 68, pp. 454–
465, 2017.

Haohao Qu is currently a PhD student in the Department of Computing (COMP), The Hong Kong Polytechnic University. He received the B.E. and M.E. degrees from the School of Intelligent Systems Engineering, Sun Yat-Sen University, People's Republic of China, in 2019 and 2022, respectively. His research interests include Meta-learning, Graph Neural Networks, and their applications in Intelligent Transportation Systems and Recommendation Systems. He has authored and co-authored innovative works in top-tier journals (e.g., TITS) and international conferences (e.g., KSEM, UIC, and ICTAI). More information about him can be found at [Link]

Haoxuan Kuang received his B.E. degree from the School of Intelligent Systems Engineering, Sun Yat-sen University, China, in 2022, where he is currently working toward the M.E. degree in Transportation Engineering. His research interests include artificial intelligence, intelligent transportation systems, and traffic status prediction.

Dr. Jun Li is an Associate Professor at the School of Intelligent Systems Engineering, Sun Yat-Sen University. He received his Ph.D. in Civil Engineering from Nagoya University, his master's degree from the Asian Institute of Technology, and his BSc from Tsinghua University. His research interests include travel behavior analysis, transportation economic analysis, and system optimization.

Dr. Linlin You is an Associate Professor at the School of Intelligent Systems Engineering, Sun Yat-sen University, and a Research Affiliate at the Intelligent Transportation System Lab, Massachusetts Institute of Technology. He was a Senior Postdoc at the Singapore-MIT Alliance for Research and Technology and a Research Fellow at the Singapore University of Technology and Design. He received his Ph.D. in Computer Science from the University of Pavia, his dual master's degrees with honors in Software Engineering and Computer Science from the Harbin Institute of Technology and the University of Pavia, and his BSc in Software Engineering from the Harbin Institute of Technology. He has authored or co-authored more than 40 journal and conference papers in the research fields of Internet of Things, Smart Cities, Autonomous Transportation Systems, Multi-source Data Fusion, and Federated Learning.
