A Physics-Informed and Attention-Based Graph Learn
All content following this page was uploaded by Haohao Qu on 15 September 2023.
... significantly alleviate the growing load on intelligent transportation systems. As the foundation to achieve such an optimization, a spatiotemporal method for EV charging demand prediction in urban areas is required. Although several solutions have been proposed by using data-driven deep learning methods, these performance-oriented methods may misinterpret the inverse relationship between charging demands and prices. To tackle the emerging challenges of training an accurate and interpretable prediction model, this paper proposes a novel approach that integrates graph and temporal attention mechanisms for feature extraction and uses physics-informed meta-learning in the model pre-training step for knowledge transfer. Evaluation results on a dataset of 18,013 EV charging piles in Shenzhen, China, show that the proposed approach, named PAG, can achieve state-of-the-art forecasting performance and the ability to understand the adaptive changes in charging demands caused by price fluctuations.

Index Terms—Electric vehicle charging, spatio-temporal prediction, graph attention networks, meta-learning

[Fig. 1: time-series plot with two vertical axes over one week (Mon. to Sun.; horizontal axis "Time (June 19-26, 2022)"). Annotations contrast the Data-driven Model ("the higher the price, the greater the demand") with Physical/Economic Laws ("the higher the price, the lower the demand"). Caption not recovered.]
integration with a stronger feature representation capability [9], [10]. However, the corresponding network still cannot cover the dependencies and interactions between EV charging-related factors. As for misinterpretations, Physics-informed Neural Networks (PINNs) [11] and Meta-learning [12] are two feasible solutions. Specifically, PINNs can encode explicit laws of physics, while meta-learning can facilitate the transfer of knowledge learned from related domains. However, an approach that integrates PINN, Meta-learning, GCN, and attention mechanisms is still missing to support accurate and unbiased knowledge learning.

To fill the gap, this paper proposes a physics-informed and attention-based graph learning approach for the prediction of EV charging demand, called PAG, which combines Physics-Informed Meta-Learning (PIML), Graph Attentional Network (GAT), and Temporal Pattern Attention (TPA) for model pre-training, graph embedding, and multivariate temporal decoding, respectively. Specifically, on the one hand, the typical convolutional and recursive operations are replaced by the two attention mechanisms to enhance the flexibility and interpretability of the prediction model. On the other hand, the pre-training step (i.e., PIML) processes tuning samples generated according to predefined physical/economic laws together with the observed samples to extract common knowledge through meta-learning, with potential misinterpretations remedied.

Moreover, through a rigorous evaluation based on a real-world dataset with samples of 18,013 EV charging piles in Shenzhen, China, from 19 June to 18 July 2022 (30 days), the efficiency and effectiveness of the proposed model are demonstrated. Specifically, as shown by the results, PAG can reduce forecasting errors by about 6.57% in four metrics, and compared to other state-of-the-art baselines, it can also provide correct responses to price fluctuations.

In summary, the main contributions of this paper include:
1) An effective prediction model combining two attention mechanisms, i.e., GAT and TPA, is designed for EV charging demand prediction.
2) An informed model pre-training method, called PIML, is proposed to tackle the intrinsic misinterpretations of conventional data-driven methods and to mine the correct relationship between the charging demands and prices.
3) An integration of GAT, TPA, and PIML enables knowledge adaptation and propagation to build an outstanding predictor with high accuracy and correct interpretation.

The remainder of this paper is structured as follows. Section II summarizes related challenges and solutions about EV charging demand prediction. Then, Section III introduces the proposed approach PAG, which is evaluated in Section IV. Finally, conclusions and future works are drawn in Section V.

II. LITERATURE REVIEW

In this section, related challenges together with solutions for EV charging demand prediction are summarized.

A. Emerging challenges

In general, to create an efficient and effective model with temporospatial information embedded for EV charging demand prediction and analysis, several challenges are emerging, which include:
• Feature extraction: EV charging stations are scattered all over the city and correlated with each other, as the temporal demands can only be fulfilled once and the price and location differences may influence their utilization. By considering charging stations as nodes, their relationships (such as distances) as edges, time-varying demand as sequences, and influencing factors as features, how to capture the hidden patterns in such a high-dimensional tensor with heterogeneous components is the key issue in EV charging demand prediction.
• Biased information: The data-driven models for EV charging demand prediction can suffer from an intrinsic problem of misinterpretation, i.e., “the higher the price, the higher the occupancy” as illustrated by Fig. 1, which is caused by the partially observed data associated with mass-deployed pricing strategies of reducing peak charging demands by raising prices. When unbiased data is hard to obtain, how to counterbalance the influence of biased information is an important problem to ensure the correctness of forecasting results.
• Knowledge learning: The relationship between EV charging demand and related factors varies across time and place. When introducing different physical and economic laws to equip data-driven models [13], a unified and scalable method is required to learn the knowledge fairly from observed and tuning samples (generated from physical and economic laws).

B. Related solutions

To tackle the aforementioned challenges, several solutions are proposed. Initially, to address graph embedding and multivariate time-series forecasting, scholars mainly focus on statistical reasoning to model graph structures and infer future statuses, separately [14], [15]. For instance, a spatiotemporal model is proposed [16] to evaluate the impact of large-scale deployment of plug-in electric vehicles (EVs) on urban distribution networks. Although these statistical reasoning-based models can be computationally efficient and easy to interpret, they are still limited in expressing high-dimensional and non-linear features.

More recently, with the rapid development of deep learning techniques, the integration of Graph Neural Networks (GNNs) and Recurrent Neural Networks (RNNs) is emphasized to enable accurate spatiotemporal prediction in transportation systems, such as parking occupancy [17], traffic speed [18], flow [19], and also EV charging demand predictions [20], [21]. For example, SGCN [22] combines a graph convolutional network (GCN) and a gated recurrent unit (GRU) to extract spatial and temporal features, respectively, of the operating status at EV charging stations to better assist the prediction. However, the typical convolutional and recurrent structures can be crippled by their inflexibility and ambiguity in weight assignment. Later on, a more advanced technique, known as the attention mechanism [8], and its variants [9], [10] show potential in refining the typical network structures, due to their strong
IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, VOL. XX, NO. X, XX 2021 3
Fig. 2. Overall structure of the proposed approach, which consists of (a) Graph Embedding Module, (b) Multivariate Decoder Module, and (c) Model Pre-training Module.
where $e_{ij}$ represents the similarity between nodes $i$ and $j$; $\Vert$ is the concatenation operation; $\mathbf{W} \in \mathbb{R}^{F' \times F}$ denotes a shared linear transformation that changes the number of input features from $F$ to $F'$ for each node; and $a : \mathbb{R}^{F'} \times \mathbb{R}^{F'} \to \mathbb{R}$ represents a shared attention mechanism, which applies the LeakyReLU function [27] to map the concatenated high-dimensional features with a weight vector $\vec{\mathbf{a}} \in \mathbb{R}^{2F'}$ to generate the similarity score.

Step 2: For node $i$, its attention coefficient for each neighbor $j \in \mathcal{N}_i$ is calculated according to Formula (2).

$$\alpha_{ij} = \operatorname{softmax}_j(e_{ij}) = \frac{\exp(e_{ij})}{\sum_{n \in \mathcal{N}_i} \exp(e_{in})} \tag{2}$$

where $\alpha_{ij}$ represents the attention coefficient of nodes $i$ and $j$. The attention coefficient indicates the feature importance of node $j$ to node $i$, and the softmax function [28] is used to normalize it.

Step 3: For node $i$, $K$ independent attention mechanisms are performed and averaged to generate $\mathbf{x}_i'$, the output of the GAT layer, according to Formula (3).

$$\mathbf{x}_i' = \sigma\Big(\frac{1}{K} \sum_{k=1}^{K} \sum_{j \in \mathcal{N}_i} \alpha_{ij}^{k} \mathbf{W}^{k} \mathbf{x}_j\Big) \tag{3}$$

where $\sigma$ is the activation function (e.g., Sigmoid or ReLU).

As shown in Fig. 2 (a), by stacking multiple GAT layers, the spatial information of nodes (i.e., zones in the area) can be captured and propagated among their neighbors, e.g., if $M$ GAT layers are used, each node can obtain the knowledge within $M$-hop neighbors. Moreover, to tackle the over-smoothing problem caused by the stacked GAT layers, a momentum residual connection [29], [30] is introduced. Specifically, for node $i$, the final output of the graph embedding module $\mathbf{x}_i''$ can be calculated according to Formula (4).

$$\mathbf{x}_i'' = \big\Vert_{m=1}^{M} \big[(1 - \beta)\,\mathbf{x}_i'^{\,m} + \beta\,\mathbf{x}_i'^{\,m-1}\big], \quad \mathbf{x}_i'^{\,0} = \mathbf{x}_i \tag{4}$$

where $\mathbf{x}_i''$ has $MF$ features and $\beta$ is the residual coefficient. Note that due to the data frame constraint of the residual connection, the numbers of features $F$ and $F'$ before and after GAT are set to be the same.

C. Multivariate Decoder Module

This module adopts TPA-LSTM [9] as the decoder to learn the multivariate time series patterns from the prepared features $\mathbf{x}_{i,t}''$, which represents the features of node $i$ at time $t$. Note that since there is no node-wise computation in the decoder module, $\mathbf{x}_{i,t}''$ is simplified as $\mathbf{x}_t$ in this section.

Within the decoder, as shown in Fig. 2 (b), first, a Long Short-Term Memory (LSTM) layer [31] is used to extract temporal patterns. Specifically, given an input $\mathbf{x}_t \in \mathbb{R}^{MF}$, the hidden state $\mathbf{h}_t$ and the cell state $\mathbf{c}_t$ at time $t$ can be computed according to Formula (5).

$$\begin{aligned}
\mathbf{u}_t &= \sigma(\mathbf{W}_{uu}\mathbf{x}_t + b_{uu} + \mathbf{W}_{hu}\mathbf{h}_{t-1} + b_{hu}) \\
\mathbf{f}_t &= \sigma(\mathbf{W}_{uf}\mathbf{x}_t + b_{uf} + \mathbf{W}_{hf}\mathbf{h}_{t-1} + b_{hf}) \\
\mathbf{g}_t &= \tanh(\mathbf{W}_{ug}\mathbf{x}_t + b_{ug} + \mathbf{W}_{hg}\mathbf{h}_{t-1} + b_{hg}) \\
\mathbf{q}_t &= \sigma(\mathbf{W}_{uq}\mathbf{x}_t + b_{uq} + \mathbf{W}_{hq}\mathbf{h}_{t-1} + b_{hq}) \\
\mathbf{c}_t &= \mathbf{f}_t \odot \mathbf{c}_{t-1} + \mathbf{u}_t \odot \mathbf{g}_t \\
\mathbf{h}_t &= \mathbf{q}_t \odot \tanh(\mathbf{c}_t)
\end{aligned} \tag{5}$$

where $\mathbf{h}_{t-1}$ is the hidden state of the LSTM layer at time $t-1$ or the initial hidden state at time 0; $\mathbf{u}_t$, $\mathbf{f}_t$, $\mathbf{g}_t$, and $\mathbf{q}_t$ are the
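input, forget, cell, and output gates, respectively; $\mathbf{W}$ and $b$ are weights and biases; $\sigma$ is the sigmoid function; and $\odot$ is the Hadamard product. Assuming that $\mathbf{h}_t \in \mathbb{R}^{MF}$, then $\mathbf{u}_t$, $\mathbf{f}_t$, $\mathbf{g}_t$, $\mathbf{q}_t$, and $\mathbf{c}_t \in \mathbb{R}^{MF}$; $\mathbf{W}_{uu}$, $\mathbf{W}_{uf}$, $\mathbf{W}_{ug}$, $\mathbf{W}_{uq} \in \mathbb{R}^{MF \times MF}$; $\mathbf{W}_{hu}$, $\mathbf{W}_{hf}$, $\mathbf{W}_{hg}$, and $\mathbf{W}_{hq} \in \mathbb{R}^{MF \times MF}$.

Second, by applying $M$ CNN filters $\mathbf{C}_m \in \mathbb{R}^{F \times w}$ with a stride length of $F$ on the hidden states $\{\mathbf{h}_{t-w}, \mathbf{h}_{t-w+1}, \ldots, \mathbf{h}_{t-1}\}$, and an average pooling $\mathbf{P}_\alpha \in \mathbb{R}^{F \times 1}$ with a stride length of $F$ on $\mathbf{h}_t$, the computational process of TPA can be revised to create attention on both hop-wise and sequence-wise features as illustrated in Fig. 3. Mathematically, the $m$-th attention score can be calculated according to Formula (6).

$$\vec{\alpha}_m = \sigma\big(\mathbf{P}_\alpha^{\mathsf{T}} \mathbf{h}_t \odot \big\Vert_{m=1}^{M} [\mathbf{C}_m^{\mathsf{T}} \mathbf{H}_m]\big) = \sigma\big(\mathbf{h}_t^{P} \odot \big\Vert_{m=1}^{M} [h^{C}_{m,m}]\big) \tag{6}$$

[Fig. 3: diagram of the revised TPA computation. Labels: LSTM, $\mathbf{h}_t$, Average Pooling, $M$-hop Feature Maps $\mathbf{H}^C$ with entries $h^C_{1,1}, \ldots, h^C_{M,M}$, Scoring Function $\mathbf{W}_\alpha$, attention vectors $\vec{\alpha}_1, \ldots, \vec{\alpha}_M$. Caption not recovered.]

Fig. 4. Idea behind the model pre-training framework of Physics-Informed Meta-Learning. (Diagram labels: Physics/Economics Laws, Pre-Training, Fine-tuning, Random Init, Optimal Init, Well-Trained.)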
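The graph embedding steps in Formulas (2) to (4) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. Formula (1) is not part of this excerpt, so the similarity score is assumed to be the standard GAT form $e_{ij} = \mathrm{LeakyReLU}(\vec{\mathbf{a}}^{\mathsf{T}}[\mathbf{W}\mathbf{x}_i \Vert \mathbf{W}\mathbf{x}_j])$. The function names, the sigmoid choice for $\sigma$, the self-loops in the adjacency mask, and the choice to feed the raw output of layer $m$ into layer $m+1$ are all assumptions made for illustration.

```python
import numpy as np

def leaky_relu(z, slope=0.2):
    return np.where(z > 0, z, slope * z)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gat_layer(x, adj, W, a):
    """One K-head GAT layer with averaged heads (Formulas (2) and (3)).

    x:   (N, F) node features
    adj: (N, N) boolean neighborhood mask, self-loops included (assumption)
    W:   (K, F', F) per-head linear maps
    a:   (K, 2F') per-head attention vectors
    """
    K, f_out, _ = W.shape
    agg = np.zeros((x.shape[0], f_out))
    for k in range(K):
        h = x @ W[k].T  # row j holds W^k x_j
        # Assumed score e_ij = LeakyReLU(a^T [W x_i || W x_j]), split into two dot products.
        e = leaky_relu((h @ a[k, :f_out])[:, None] + (h @ a[k, f_out:])[None, :])
        e = np.where(adj, e, -np.inf)             # non-neighbors get zero weight
        e = e - e.max(axis=1, keepdims=True)      # numerical stability
        alpha = np.exp(e)
        alpha = alpha / alpha.sum(axis=1, keepdims=True)  # Formula (2)
        agg += alpha @ h                          # sum_j alpha_ij^k W^k x_j
    return sigmoid(agg / K)                       # Formula (3)

def graph_embedding(x, adj, layers, beta=0.5):
    """Concatenate M residual-mixed hop outputs (Formula (4)); requires F' == F."""
    hops, prev = [], x
    for W, a in layers:
        cur = gat_layer(prev, adj, W, a)
        hops.append((1.0 - beta) * cur + beta * prev)
        prev = cur
    return np.concatenate(hops, axis=1)           # shape (N, M * F)

rng = np.random.default_rng(0)
N, F, K, M = 5, 3, 4, 2                           # toy sizes; M = 2 and K = 4 as in the settings
x = rng.normal(size=(N, F))
adj = np.eye(N, dtype=bool) | (rng.random((N, N)) < 0.4)
layers = [(rng.normal(size=(K, F, F)), rng.normal(size=(K, 2 * F))) for _ in range(M)]
emb = graph_embedding(x, adj, layers)
print(emb.shape)  # (5, 6)
```

The concatenated output has $MF$ columns, which matches the input width $\mathbf{x}_t \in \mathbb{R}^{MF}$ expected by the decoder.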
of China) is approximately −1.48, which can be formulated by $(\Delta y / y)/(\Delta p / p) = -1.48$. Given this, the tuning samples of node $i$ can be generated according to Formula (8).

$$\Delta y_i = \mathcal{F}(\Delta p_i) = -1.48\,(\Delta p_i / p_i)\, y_i \tag{8}$$

Furthermore, the spillover effect between adjacent areas caused by price fluctuations needs to be addressed as well. In general, the fluctuation may impact self and neighboring charging demands differently (i.e., a continuous or intolerable increase of charging prices leads to a self-demand decrease but a neighboring demand increase). Therefore, assuming that there are $N$ neighbors, the tuning sample of neighbor $j$ can be generated according to Formula (9).

$$\Delta y_j = -\frac{1}{N}\,\Delta y_i \tag{9}$$

The two kinds of tuning samples can be merged to form the tuning dataset. Even though such pairs of impulses and responses cannot fully express the complex patterns in the real world, they are sufficient to make the initial parameters inclined to the sensible optimum. Notably, the tuning-sample buffers that reflect knowledge of other restriction laws can be created by repeating the above steps.

As shown in Fig. 2 (c), a hybrid pre-training strategy based on meta-learning is adopted. First, the proportion of tuning samples is gradually reduced in each learning iteration by adding observed samples. Second, First-order Model-agnostic Meta-learning (FOMAML) is used to learn knowledge from multiple sample buffers simultaneously.

To be specific, assuming that there are $S$ buffers, the objective function of FOMAML to find a set of initial parameters minimizing the learner loss is defined by Formula (10).

$$\min L(\phi) = \sum_{s=1}^{S} l_s(\theta_s) \tag{10}$$

where $\theta$ is the temporary parameter related to the initial parameter $\phi$; $\theta_s$ represents the temporary parameter in buffer $\mathcal{B}_s$; $l_s(\theta_s)$ is the error calculated by the loss function of the model; and $L(\phi)$ is the total after-training loss of initial parameter $\phi$. Furthermore, its gradient function can be written as Formula (11), where $\nabla_\phi L(\phi)$ is the gradient of $L(\phi)$ with respect to $\phi$; $\nabla_\phi l_s(\theta_s)$ is the gradient of $l_s(\theta_s)$ with respect to $\phi$; and $\nabla_\theta l_s(\theta_s)$ is the gradient of $l_s(\theta_s)$ with respect to $\theta$.

$$\nabla_\phi L(\phi) = \sum_{s=1}^{S} \nabla_\phi l_s(\theta_s) = \sum_{s=1}^{S} \frac{\partial l_s}{\partial \theta_s} \frac{\partial \theta_s}{\partial \phi} \tag{11}$$

Because the temporary parameter $\theta_s$ is updated from the global initial parameter $\phi$ in buffer $\mathcal{B}_s$, mathematically $\theta_s = \phi - \varepsilon \nabla_\phi l(\phi)$, where $\varepsilon$ is the learning rate, $\partial \theta_s / \partial \phi$ in Formula (11) can be rewritten as Formula (12), in which the second derivative term is omitted to simplify the calculation. The detailed proofs and evaluations of this first-order approximation of meta-learning can be found in [12].

$$\frac{\partial \theta_s}{\partial \phi} = 1 - \frac{\varepsilon \nabla_\phi l(\phi)}{\partial \phi\, \partial \phi} \approx 1 \tag{12}$$

Therefore, the gradient function of FOMAML can be defined in Formula (13).

$$\nabla_\phi L(\phi) \approx \sum_{s=1}^{S} \frac{\partial l_s}{\partial \theta_s} = \sum_{s=1}^{S} \nabla_\theta l_s(\theta_s) \tag{13}$$

On this basis, the pre-training module updates $\phi_e$ (the initial parameter of the $e$-th epoch) to $\phi_{e+1}$ (the initial parameter of the next epoch). The samples in buffers are divided into two parts, i.e., Support and Query sets, for obtaining intermediate parameters $(\theta_1, \ldots, \theta_S)$ and the insightful gradients $\nabla_\phi L(\phi)$, respectively. Finally, $\phi_{e+1}$ is updated according to Formula (14), where $\lambda$ is the learning rate.

$$\phi_{e+1} = \phi_e - \lambda\, \frac{\nabla_\phi L(\phi)}{S} \tag{14}$$

In general, the pre-training module can be invoked in every epoch until the stop condition is reached, which could be a fixed epoch number or a predefined loss boundary.

E. Algorithm of the Proposed Approach

The proposed approach first runs the pre-training module by processing the tuning samples (which are created by the pre-defined physical laws) to prepare optimal initial parameters, and then updates the model by using the training samples (a.k.a. the observed samples). Accordingly, the algorithm of the model pre-training module is described in Algorithm 1. Note that it is essential to set an appropriate number of pre-training rounds; otherwise, it may lead to model curing, i.e., over-fitting to the tuning samples.

Algorithm 1 The model pre-training module
Require: A group of real samples, $S$ laws of physics and economics;
Generate $S$ tuning-sample buffers according to the given laws, divide the buffers by time and prepare Support set $\mathcal{R}_1$ and Query set $\mathcal{R}_2$ for each buffer;
Initialize the model $\phi$ and learning rate $\lambda$;
for each epoch do
    for each buffer $s$ (in parallel) do
        Compute $g_1^s = \nabla_\phi l_s(\phi, \mathcal{R}_1^s)$;
        Update $\phi$ to $\theta_s$ with $g_1^s$;
        Obtain $g_2^s = \nabla_\theta l_s(\theta_s, \mathcal{R}_2^s)$;
        Gradually reduce the proportion of tuning samples in each buffer by mixing in real samples;
    end for
    Integrate all the $g_2^s$, where $s = 1, 2, \ldots, S$
    Update the global model $\phi$
end for
Return a set of pre-trained model parameters $\phi'$

Based on the above-described pre-training process, the overall learning procedure of the proposed approach can be defined and described in Algorithm 2. Specifically, according to the hyperparameters, the prediction model is first initialized by the pre-training. After the pre-trained model is created, it is further customized by the observed samples. Once the model
Fig. 5. Spatial distribution of public EV charging piles in Shenzhen, China. The studied area contains 247 traffic zones.
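The pre-training procedure of Algorithm 1, together with the tuning-sample rules in Formulas (8) and (9) and the first-order update in Formulas (13) and (14), can be sketched on a toy problem. This is an illustrative sketch under stated assumptions, not the authors' code: the learner is a hypothetical one-weight linear model that maps a relative price change to a relative demand change, the buffers are split into Support and Query halves by index rather than by day, the step sizes `eps` and `lam` are arbitrary, and the step that gradually mixes in observed samples is omitted. The two elasticities (−1.48 and −0.228) are the ones named in the experiment settings.

```python
import numpy as np

def tuning_samples(p, y, dp, elasticity, n_neighbors):
    """Self response (Formula (8)) and per-neighbor spillover (Formula (9))."""
    dy_self = elasticity * (dp / p) * y
    dy_neighbor = -dy_self / n_neighbors
    return dy_self, dy_neighbor

def mse_grad(w, X, y):
    """Gradient of the mean squared error of the linear model X @ w."""
    return 2.0 * X.T @ (X @ w - y) / len(y)

def fomaml_pretrain(buffers, phi, eps=2.0, lam=2.0, epochs=200):
    """buffers: list of ((X_support, y_support), (X_query, y_query)) pairs."""
    phi = np.array(phi, dtype=float)
    for _ in range(epochs):
        g2_sum = np.zeros_like(phi)
        for (Xs, ys), (Xq, yq) in buffers:
            theta = phi - eps * mse_grad(phi, Xs, ys)  # inner step on the Support set
            g2_sum += mse_grad(theta, Xq, yq)          # first-order gradient, Formula (13)
        phi = phi - lam * g2_sum / len(buffers)        # Formula (14)
    return phi

# Worked example of Formulas (8) and (9): a 10% price rise at occupancy 0.5
# with 4 neighbors gives about -0.074 locally and about +0.0185 per neighbor.
dy_self, dy_nb = tuning_samples(p=1.0, y=0.5, dp=0.1, elasticity=-1.48, n_neighbors=4)

# One tuning-sample buffer per elasticity, each split into Support and Query halves.
rel_dp = np.linspace(-0.3, 0.3, 48)
buffers = []
for elasticity in (-1.48, -0.228):
    dy, _ = tuning_samples(p=1.0, y=1.0, dp=rel_dp, elasticity=elasticity, n_neighbors=4)
    X = rel_dp[:, None]
    buffers.append(((X[:24], dy[:24]), (X[24:], dy[24:])))

# Start from a deliberately wrong positive slope ("higher price, higher demand").
phi = fomaml_pretrain(buffers, phi=[1.0])
print(phi)
```

In this toy setting both buffers share the same inputs, so the pre-trained weight settles at the mean of the two elasticities, a negative slope. It illustrates how the initial parameters can be pulled toward the sign implied by the economic law before fine-tuning on observed samples.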
TABLE I
THE LIST OF RUNNING CONFIGURATIONS OF COMPARED METHODS.
respectively, are set according to previous studies. Specifically, relevant research on graph propagation [35] found that $M$ should be much smaller than the radius of the studied graph, and another study on attention mechanisms [10] suggested that $K$ can be equal to a multiple of the feature number. Based on pre-experiments, the most appropriate $M$ and $K$ are 2 and 4. Third, for the residual connection between GAT layers, $\beta$ in Formula (4) is set as 0.5. Fourth, two tuning sample buffers are generated based on two price elasticities for EV charging, namely 1) −1.48, the price elasticity of demand for EV charging in Beijing [32]; and 2) −0.228, the average price elasticity of electricity demand for households across the globe covering the period from 1950 to 2014 [36]. To meet the requirement of meta-learning pre-training, the buffers are further divided into two parts by time, i.e., Support set (Day 1-12) and Query set (Day 13-24). Moreover, epoch numbers of the pre-training and fine-tuning processes are set to 200 and 1000, respectively, with an early stopping mechanism: training stops if the validation loss does not decrease for 100 consecutive epochs. Fifth, Mean Square Error (MSE) is used as the loss function. Finally, to improve the learning performance, Adam [37] is used with a mini-batch size of 512, a learning rate of 0.001, and a weight decay of 0.00001.

3) Compared methods: Three statistical models and seven neural networks (NN) are used as baselines. Specifically, the three statistical models include Vector Auto-Regression (VAR) [38], Least absolute shrinkage and selection operator (Lasso) [39], and K-Nearest Neighbor (KNN) [40]. The NNs can be categorized into four groups, namely 1) a typical NN-based model: Fully Connected Neural Network (FCNN) [41]; 2) an RNN-based model: Long Short-Term Memory (LSTM) [31]; 3) two GNN-based models: Graph Convolutional Networks (GCN) [42] and Graph Attention Networks (GAT) [10]; 4) four spatiotemporal models: GCN-LSTM [17], STGCN [43], DCRNN [44], and AST-GAT [23]. In addition, the proposed model without pre-training, called PAG-, is used as a compared model as well. For a fair comparison, the specific configurations of the compared models are based on the suggestions of their authors, while the shared training settings are set to the same values as mentioned in the previous subsection, i.e., the retrospective window size $w$, maximum epoch number, early stop mechanism, batch size, loss function, and optimizer. In summary, the important running configurations of these models are listed in Table I.

4) Running environment: Four common metrics for regression are used, i.e., Root Mean Squared Error (RMSE), Mean Absolute Percentage Error (MAPE), Relative Absolute Error (RAE), and Mean Absolute Error (MAE). The result is the average of all nodes (i.e., the 247 studied traffic zones). All the experiments are conducted on a Windows workstation with an NVIDIA Quadro RTX 4000 GPU and an Intel(R) Core(TM) i9-10900K CPU with 64G RAM.

B. Evaluation results and discussions

The performance of evaluated methods is analyzed in three aspects, namely 1) the forecasting error to illustrate how well the model predicts the future; 2) the ablation results to show the role and necessity of each part of the proposed model; and 3) the model interpretations to demonstrate how plausible the model is in handling price fluctuations.

1) Forecasting Error: The evaluation metrics of compared models are summarized in Table II. The result shows that the recurrent neural network LSTM outperforms both the three statistical models (i.e., VAR, Lasso, and KNN) and the simple neural network FCNN. In contrast, the typical graph learning model GCN performs poorly in MAPE, RAE, and MAE, and GAT, which is more capable in weight allocation than GCN, still performs poorly. By contrast, the combination of RNNs and GNNs (i.e., GCN-LSTM, DCRNN, and AST-GAT) is superior to the recurrent network (i.e., LSTM). It suggests that graph knowledge needs to be coupled with temporal patterns for better performance. Furthermore, the attention-based baseline (i.e., AST-GAT) outperforms the two convolution-based models (i.e., GCN-LSTM
TABLE II
PERFORMANCE COMPARISON IN DIFFERENT PREDICTION INTERVALS.

[Figure residue: chart comparing PAG, PAG-, and AST-GAT on MAE, MAPE, and RMSE. Caption not recovered.]

Fig. 9. The proposed model’s responses in neighboring areas to price fluctuations. (Horizontal axis: Price Fluctuation (∆p/p); vertical axis: Occupancy Response (∆y/y); curves: local, 1-hop, 2-hop.)

Fig. 10. Example of neighboring responses in the central business district of the studied city, Shenzhen. (Map with a 0 to 2 km scale bar; the legend reports ∆y/y for the local, 1-hop, and 2-hop zones.)
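The four regression metrics named in the experiment settings (RMSE, MAPE, RAE, and MAE) are not defined in this excerpt. The sketch below uses their common textbook definitions, which is an assumption about how the paper computes them; the function name `regression_metrics` is hypothetical. Since the reported result is the average over all nodes, the function would be applied per node and the outputs averaged.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Common definitions of RMSE, MAPE, RAE, and MAE for one node's series."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_pred - y_true
    return {
        "RMSE": float(np.sqrt(np.mean(err ** 2))),
        # MAPE is undefined when a true value is zero; nonzero targets are assumed here.
        "MAPE": float(np.mean(np.abs(err) / np.abs(y_true))),
        # RAE compares the absolute error with that of always predicting the mean.
        "RAE": float(np.sum(np.abs(err)) / np.sum(np.abs(y_true - y_true.mean()))),
        "MAE": float(np.mean(np.abs(err))),
    }

scores = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 6.0])
print(scores)
```

For the toy series above, a single error of 2 on the last point gives RMSE 1.0, MAPE 0.125, RAE 0.5, and MAE 0.5.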
introduces a model pre-training module based on physics-informed meta-learning. As shown by the evaluation results, PAG can not only outperform the state-of-the-art methods by approximately 6.57% on average in forecasting errors but also obtain a correct understanding of spillover effects to reduce the misinterpretations between charging demands and prices that commonly exist in current methods.

Although PAG can achieve state-of-the-art performance, it still has certain limitations and can be improved in future work. First, the focus of this work remains on the relationship between EV charging demand and price. However, there are still many misinterpretations in the current deep learning-based models that assist intelligent transportation systems, especially in spatiotemporal scenarios. Further investigations and corresponding optimizations on the stacked model are needed. Moreover, the model pre-training needs to be carefully monitored to obtain the correct knowledge, and the introduction of an automated critique module trained by Reinforcement Learning may be one way to address this issue. Finally, the approach can be enhanced to quantify the spillover effects, so that regulators and managers can better detect the impacts of related policies and optimize the running of intelligent transportation systems.

REFERENCES

[1] S. Powell, G. V. Cezar, L. Min, I. M. L. Azevedo, and R. Rajagopal, “Charging infrastructure access and operation to reduce the grid impacts of deep electric vehicle adoption,” Nature Energy, vol. 7, no. 10, pp. 932–945, 2022.
[2] IEA, “Global EV outlook 2022.” [Online]. Available: [Link] org/reports/global-ev-outlook-2022
[3] Y. Ren, X. Sun, P. Wolfram, S. Zhao, X. Tang, Y. Kang, D. Zhao, and X. Zheng, “Hidden delays of climate mitigation benefits in the race for electric vehicle deployment,” Nature Communications, vol. 14, no. 1, p. 3164, 2023.
[4] Z. Tian, T. Jung, Y. Wang, F. Zhang, L. Tu, C. Xu, C. Tian, and X.-Y. Li, “Real-time charging station recommendation system for electric-vehicle taxis,” IEEE Transactions on Intelligent Transportation Systems, vol. 17, no. 11, pp. 3098–3109, 2016.
[5] C. Fang, H. Lu, Y. Hong, S. Liu, and J. Chang, “Dynamic pricing for electric vehicle extreme fast charging,” IEEE Transactions on Intelligent Transportation Systems, vol. 22, no. 1, pp. 531–541, 2021.
[6] Y. Bao, J. Huang, Q. Shen, Y. Cao, W. Ding, Z. Shi, and Q. Shi, “Spatial–temporal complex graph convolution network for traffic flow prediction,” Engineering Applications of Artificial Intelligence, vol. 121, p. 106044, 2023.
[7] I. Ullah, K. Liu, T. Yamamoto, M. Zahid, and A. Jamal, “Modeling of machine learning with SHAP approach for electric vehicle charging station choice behavior prediction,” Travel Behaviour and Society, vol. 31, pp. 78–92, 2023.
[8] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin, “Attention is all you need,” in Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, December 4-9, 2017, Long Beach, CA, USA, I. Guyon, U. von Luxburg, S. Bengio, H. M. Wallach, R. Fergus, S. V. N. Vishwanathan, and R. Garnett, Eds., 2017, pp. 5998–6008.
[9] S.-Y. Shih, F.-K. Sun, and H.-y. Lee, “Temporal pattern attention for multivariate time series forecasting,” Machine Learning, vol. 108, no. 8, pp. 1421–1441, 2019.
[10] P. Velickovic, G. Cucurull, A. Casanova, A. Romero, P. Li, and Y. Bengio, “Graph attention networks,” in 6th International Conference on Learning Representations (ICLR). [Link], 2018.
[11] M. Raissi, P. Perdikaris, and G. Karniadakis, “Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations,” Journal of Computational Physics, vol. 378, pp. 686–707, 2019.
[12] C. Finn, P. Abbeel, and S. Levine, “Model-agnostic meta-learning for fast adaptation of deep networks,” in the 34th International Conference on Machine Learning (ICML), vol. 70, 2017, pp. 1126–1135.
[13] Z. Shi, Y. Chen, J. Liu, D. Fan, and C. Liang, “Physics-informed spatiotemporal learning framework for urban traffic state estimation,” Journal of Transportation Engineering, Part A: Systems, vol. 149, no. 7, p. 04023056, 2023.
[14] Z. Zhou and T. Lin, “Spatial and temporal model for electric vehicle rapid charging demand,” in 2012 IEEE Vehicle Power and Propulsion Conference (VPPC), Seoul, South Korea, Oct. 9-12, 2012. IEEE, 2012, pp. 345–348.
[15] L. Knapen, B. Kochan, T. Bellemans, D. Janssens, and G. Wets, “Activity-based modeling to predict spatial and temporal power demand of electric vehicles in Flanders, Belgium,” Transportation Research Record, vol. 2287, no. 1, pp. 146–154, 2012.
[16] Y. Mu, J. Wu, N. Jenkins, H. Jia, and C. Wang, “A spatial–temporal model for grid impact analysis of plug-in electric vehicles,” Applied Energy, vol. 114, pp. 456–465, 2014.
[17] S. Yang, W. Ma, X. Pi, and S. Qian, “A deep learning approach to real-time parking occupancy prediction in transportation networks incorporating multiple spatio-temporal data sources,” Transportation Research Part C: Emerging Technologies, vol. 107, pp. 248–265, 2019.
[18] X. Xu, T. Zhang, C. Xu, Z. Cui, and J. Yang, “Spatial–temporal tensor graph convolutional network for traffic speed prediction,” IEEE Transactions on Intelligent Transportation Systems, vol. 24, no. 1, pp. 92–103, 2023.
[19] Y. Zhang, Y. Li, X. Zhou, J. Luo, and Z.-L. Zhang, “Urban traffic dynamics prediction—a continuous spatial-temporal meta-learning approach,” ACM Transactions on Intelligent Systems and Technology, vol. 13, no. 2, Jan. 2022.
[20] Y. Xiang, Z. Jiang, C. Gu, F. Teng, X. Wei, and Y. Wang, “Electric vehicle charging in smart grid: A spatial-temporal simulation method,” Energy, vol. 189, p. 116221, 2019.
[21] T. Yi, C. Zhang, T. Lin, and J. Liu, “Research on the spatial-temporal distribution of electric vehicle charging load demand: A case study in China,” Journal of Cleaner Production, vol. 242, p. 118457, 2020.
[22] S. Su, Y. Li, Q. Chen, M. Xia, K. Yamashita, and J. Jurasz, “Operating status prediction model at EV charging stations with fusing spatiotemporal graph convolutional network,” IEEE Transactions on Transportation Electrification, vol. 9, no. 1, pp. 114–129, 2023.
[23] D. Li and J. Lasenby, “Spatiotemporal attention-based graph convolution network for segment-level traffic prediction,” IEEE Transactions on Intelligent Transportation Systems, vol. 23, no. 7, pp. 8337–8345, 2022.
[24] L. N. Do, H. L. Vu, B. Q. Vo, Z. Liu, and D. Phung, “An effective spatial-temporal attention based neural network for traffic flow prediction,” Transportation Research Part C: Emerging Technologies, vol. 108, pp. 12–28, 2019.
[25] S. J. Pan and Q. Yang, “A survey on transfer learning,” IEEE Transactions on Knowledge and Data Engineering, vol. 22, no. 10, pp. 1345–1359, 2010.
[26] Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard, and L. D. Jackel, “Backpropagation applied to handwritten zip code recognition,” Neural Computation, vol. 1, no. 4, pp. 541–551, 1989.
[27] A. L. Maas, A. Y. Hannun, A. Y. Ng et al., “Rectifier nonlinearities improve neural network acoustic models,” in Proc. ICML, vol. 30, no. 1. Atlanta, Georgia, USA, 2013, p. 3.
[28] S. R. Dubey, S. K. Singh, and B. B. Chaudhuri, “Activation functions in deep learning: A comprehensive survey and benchmark,” Neurocomputing, 2022.
[29] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778.
[30] J. Klicpera, A. Bojchevski, and S. Gunnemann, “Predict then propagate: Graph neural networks meet personalized PageRank,” in 7th International Conference on Learning Representations (ICLR), New Orleans, LA, USA, May 6-9, 2019. [Link], 2019.
[31] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural Computation, vol. 9, no. 8, pp. 1735–1780, 1997.
[32] Z. Bao, Z. Hu, D. M. Kammen, and Y. Su, “Data-driven approach for analyzing spatiotemporal price elasticities of EV public charging demands based on conditional random fields,” IEEE Transactions on Smart Grid, vol. 12, no. 5, pp. 4363–4376, 2021.
[33] J. Zhou and L. Ma, “Analysis on the evolution characteristics of Shenzhen residents’ travel structure and the enlightenment of public transport development policy,” Urban Mass Transit, vol. 24, 2021.
[34] H. Tenkanen and T. Toivonen, “Longitudinal spatial dataset on travel times and distances by different travel modes in Helsinki region,” Scientific Data, vol. 7, no. 1, 2020.
[35] X. He, K. Deng, X. Wang, Y. Li, Y. Zhang, and M. Wang, “LightGCN: Simplifying and powering graph convolution network for recommendation,” in Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, ser. SIGIR ’20. New York, NY, USA: Association for Computing Machinery, 2020, pp. 639–648.
[36] X. Zhu, L. Li, K. Zhou, X. Zhang, and S. Yang, “A meta-analysis on the price elasticity and income elasticity of residential electricity demand,” Journal of Cleaner Production, vol. 201, pp. 169–177, 2018.
[37] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” in 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings, Y. Bengio and Y. LeCun, Eds., 2015.
[38] A. Inoue and L. Kilian, “Inference on impulse response functions in structural VAR models,” Journal of Econometrics, vol. 177, no. 1, pp. 1–13, 2013.
[39] R. Tibshirani, “Regression shrinkage and selection via the lasso: A retrospective,” Journal of the Royal Statistical Society Series B: Statistical Methodology, vol. 73, no. 3, pp. 273–282, 2011.
[40] T. Cover and P. Hart, “Nearest neighbor pattern classification,” IEEE Transactions on Information Theory, vol. 13, no. 1, pp. 21–27, 1967.
[41] K.-Y. Hsu, H.-Y. Li, and D. Psaltis, “Holographic implementation of a fully connected neural network,” Proceedings of the IEEE, vol. 78, no. 10, pp. 1637–1645, 1990.
[42] T. N. Kipf and M. Welling, “Semi-supervised classification with graph convolutional networks,” in 5th International Conference on Learning Representations (ICLR), Toulon, France, April 24-26, 2017, Conference Track Proceedings. [Link], 2017.
[43] B. Yu, H. Yin, and Z. Zhu, “Spatio-temporal graph convolutional networks: A deep learning framework for traffic forecasting,” in Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, IJCAI 2018, July 13-19, 2018, Stockholm, Sweden. [Link], 2018, pp. 3634–3640.
[44] Y. Li, R. Yu, C. Shahabi, and Y. Liu, “Diffusion convolutional recurrent neural network: Data-driven traffic forecasting,” in 6th International Conference on Learning Representations (ICLR), Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings. [Link], 2018.
[45] CAUPD, “The 2022 annual commuting monitoring report for major cities in China,” 2022. [Online]. Available: [Link] cms/report/2022tongqin
[46] P. J. Burke and H. Yang, “The price and income elasticities of natural gas demand: International evidence,” Energy Economics, vol. 59, pp. 466–474, 2016.
[47] N. Rivers and B. Schaufele, “Gasoline price and new vehicle fuel efficiency: Evidence from Canada,” Energy Economics, vol. 68, pp. 454–465, 2017.