Energy consumption forecasting with deep learning
Yunfan Li
Beijing University of Posts and Telecommunications, 10 Xitucheng Road, Haidian
District, Beijing, 100876, China
[email protected]
Abstract. This research develops deep learning models for predicting household electricity consumption, leveraging a multidimensional time-series dataset that encompasses energy consumption profiles, customer characteristics, and meteorological information. A comprehensive exploration of diverse architectures is conducted, covering variants of recurrent neural networks (RNNs), temporal convolutional networks (TCNs), and traditional autoregressive integrated moving average (ARIMA) models for reference purposes. The empirical findings show that the inclusion of meteorological data substantially enhances forecasting accuracy, with the most favorable outcomes attained by the temporal convolutional network. Additionally, the impact of input duration and prediction steps on model performance is investigated, emphasizing the pivotal role of selecting an appropriate input length and number of prediction steps for predictive precision. Overall, this investigation demonstrates the potential of deep learning for electricity consumption forecasting and presents practical methodologies and recommendations for household electricity consumption prediction.
Keywords: Energy Consumption Prediction, Deep Learning, Long Short-Term Memory, Temporal Convolutional Network, Gated Recurrent Unit.
1. Introduction
With the growing concern regarding the close relationship between energy consumption and
environmental sustainability, forecasting electricity usage is gaining importance as a key aspect of
energy management and planning. Accurately predicting household electricity consumption is not only
crucial for energy companies in terms of resource allocation and grid scheduling but also directly
impacts the energy cost management for household users. Nevertheless, household electricity
consumption forecasting encounters several challenges, including the complexity of multidimensional
time series, the influence of seasonal and weather changes, and the modeling of long-term dependencies.
In this context, deep learning technology has emerged as a forefront research area in electricity
consumption forecasting, owing to its exceptional feature extraction and modeling capabilities. In recent
times, the field of time series forecasting has witnessed notable advancements driven by deep learning
models. These models include recurrent neural networks (RNNs) and their various iterations, as well as
the more recent temporal convolutional networks (TCNs). These models not only excel
at capturing intricate patterns within time series data but also adeptly handle seasonal variations and
nonlinear relationships. Additionally, the introduction of external factors, such as weather data, has
further enhanced the predictive performance of these models.
This paper delves into the approach of constructing deep learning models for predicting household
energy consumption using an extensive multidimensional time-series dataset. The dataset encompasses
energy consumption curves, customer profiles, and weather data, spanning a continuous 12-month
period. We will explore various deep learning architectures and analyze their predictive capabilities
while considering different input features. Through this research, our objective is to provide a more
precise and practical solution for household electricity consumption forecasting, thereby offering robust
support for energy management and planning.
2. Related work
As industrial development continues, electricity demand in China is steadily rising, often
outstripping supply. Accurate power consumption forecasting plays
a crucial role in helping enterprises optimize power resource allocation, enhance power utilization,
reduce wastage, and ultimately save costs. Furthermore, power consumption forecasting aids companies
in planning their power supply more effectively, thereby enhancing the reliability and sustainability of
their power infrastructure and, consequently, the overall power consumption environment.
However, forecasting electricity consumption is a complex task influenced by various factors,
including user demand and weather conditions. Therefore, a straightforward linear prediction model is
inadequate to address this multifaceted challenge. With the ongoing advancements in big data and
artificial intelligence technology [1], an array of forecasting models have emerged and found success
across diverse applications, including power forecasting [2-4], traffic flow forecasting [5-7], network
traffic forecasting [8-11], and financial forecasting [12, 13].
Dai and his research team (Dai Y et al.) introduced an inventive method for short-term power load
prediction. They incorporated a bidirectional long short-term memory (Bi-LSTM) network within a
sequence-to-sequence (Seq2Seq) framework [14]. In the data preprocessing phase, their method
effectively handled holiday load data by employing the random forest feature selection algorithm and
the weighted gray relational projection algorithm. These steps proved beneficial in addressing the
complexities associated with load forecasting during holiday periods.
Criado-Ramón and his team (Criado-Ramón D et al.) conducted an evaluation of models focused on
sequential patterns, particularly for extensive datasets [15]. Their algorithm stands out by utilizing a
genetic algorithm not only to optimize the number of clusters but also to fine-tune all other
hyperparameters of the prediction model. This approach represents a departure from earlier methods that
commonly relied on the cluster validity index (CVI).
In the realm of long-term load forecasting, Wang (Wang K) proposed an ensemble learning-based
model called LSTM-Informer [16]. This model features a two-layer architecture. The lower layer
employs a long short-term memory network (LSTM) to capture short-term temporal correlations within
electric load data. Meanwhile, the upper layer incorporates an Informer model to address long-term
correlations in load forecasting.
Caro E et al. introduced an algorithm for determining optimal weather station selections, including
the optimal number of stations and their specific geographical locations, resulting in a significant
reduction in forecast error [17].
Li J and Wei S presented a manifold learning-based MTLF (mid-term load forecasting)
method designed to extract latent factors of load changes, thereby enhancing the accuracy of MTLF
while significantly reducing computational requirements [18]. The proposed MTLF approach was
rigorously tested on the Independent System Operator (ISO) New England dataset, producing load
forecasts for timeframes of 24, 168, and 720 hours into the future. Numerical results corroborated the
method's superior predictive accuracy compared to many well-established methods, particularly on
medium-term timescales.
Chen and Zhao (Chen Y and Zhao J) introduced an innovative approach that seamlessly integrates
metamodeling and optimization processes within a Predictive Data Framework (PDF) [19]. They also
presented a thorough examination of a meta-modeling algorithm grounded in computer experiments,
along with a heuristic exhaustive search optimization algorithm. Their findings highlight that the
practice of data curation prior to constructing predictive models yields notably superior results compared
to the direct application of machine learning model building.
Wang and Wu (Wang Y and Wu Y) presented a comprehensive model for short-term wind power forecasting based on the T-LSTNet_Markov architecture [20]. Their methodology comprises several essential stages, beginning with the preprocessing and augmentation of the initial dataset. The T-LSTNet model is then applied to the raw wind power data to produce predictions, and an error-correction step combining the k-means approach with a weighted Markov process further improves accuracy. Empirical findings from a case study of a wind farm in the Inner Mongolia Autonomous Region of China demonstrate a notable improvement in forecast precision after the error correction.
3. Data description
This study investigates the prediction of household energy consumption and the development of predictive algorithms. It uses a publicly available dataset, first described in the research article "Evaluating Short-Term Forecasting of Multiple Time Series in IoT Environments" and accessible at https://fordatis.fraunhofer.de/handle/fordatis/215. The dataset comprises three files in xlsx format: 20201015_consumption.xlsx (21.06 MB), 20201015_profiles.xlsx (19.21 kB), and 20201015_weather.xlsx (18 MB). Together they contain hourly energy consumption time series for 499 customers, customer profiles, and weather data; the weather data consist of outside temperature time series, also at hourly resolution, for the region of each customer's location.
The data for this study cover twelve consecutive months and are divided according to the classic
75%:25% split ratio, segregating the data into training and testing sets, as depicted in Figure 1. Under
this division, approximately nine months of data are allocated for model training, with the remaining
three months reserved for model testing. The temporal span of the data extends from January 1st, 2019,
at 00:00 hours to December 31st, 2019, at 23:00 hours, with hourly time intervals. Utilizing this data,
we generated the following line chart, where the x-axis denotes time, and the y-axis denotes energy
consumption. In the chart, the blue region represents the training set data, while the red region denotes
the testing set data. This partitioning approach ensures a comprehensive evaluation of model
performance across different time periods.
Figure 1. Training and test split.
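As a minimal sketch of this preparation step (assuming pandas with xlsx support; the column layout and the names "consumption" and "weather" below are our assumptions, not the dataset's documented schema):

```python
import pandas as pd

# Load the hourly series (sheet layout assumed: a datetime index column
# followed by one column per customer/region).
consumption = pd.read_excel("20201015_consumption.xlsx", index_col=0, parse_dates=True)
weather = pd.read_excel("20201015_weather.xlsx", index_col=0, parse_dates=True)

# Work with one example customer merged with its regional temperature;
# the column names here are our own, not taken from the dataset.
user = pd.DataFrame({
    "consumption": consumption.iloc[:, 0],
    "weather": weather.iloc[:, 0],
}).loc["2019-01-01 00:00":"2019-12-31 23:00"]

# Chronological 75%/25% split: roughly nine months train, three months test.
split_point = int(len(user) * 0.75)
train, test = user.iloc[:split_point], user.iloc[split_point:]
```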
As a foundational baseline, we begin with the naive strategy of using the most recent known value as the forecast for the current time point. We illustrate this technique on an individual user's energy consumption series, which includes anomalous observations. During the training phase, we compare actual values against their immediately preceding values and calculate a series of error metrics: bias, root mean squared error (RMSE), mean squared error (MSE), mean absolute percentage error (MAPE), and mean absolute error (MAE).
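A compact sketch of this naive baseline and the error metrics (the helper below is our own formulation and is reused in the later sketches):

```python
import numpy as np

def error_metrics(y_true, y_pred):
    """Return bias, RMSE, MSE, MAPE (in percent), and MAE for aligned series."""
    err = y_pred - y_true
    mse = np.mean(err ** 2)
    return {
        "bias": np.mean(err),
        "rmse": np.sqrt(mse),
        "mse": mse,
        "mape": 100 * np.mean(np.abs(err) / np.abs(y_true)),
        "mae": np.mean(np.abs(err)),
    }

# Last-value baseline: predict each hour with the previous hour's value.
y = train["consumption"].to_numpy()
print(error_metrics(y[1:], y[:-1]))
```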
Subsequently, we explore the simple moving average method with a sliding window of size five
hours. This method calculates the moving average value within a specified period (e.g., five hours) and
updates it as time progresses. We apply this method to the training set data, calculate predicted values
using the moving average, and compute corresponding error metrics for both the training and testing
sets.
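A sketch of this moving-average forecast with pandas, reusing the error_metrics helper above; the one-step shift keeps each averaging window strictly in the past:

```python
# 5-hour simple moving average, shifted so each forecast uses only past values.
sma = train["consumption"].rolling(window=5).mean().shift(1)
valid = sma.notna()
print(error_metrics(train.loc[valid, "consumption"].to_numpy(),
                    sma[valid].to_numpy()))
```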
Furthermore, we introduce the autoregressive integrated moving average (ARIMA) model, which
leverages past time step values and relevant lag terms to predict future values. We train an ARIMA
model using a specific user's data as an example and utilize the trained model to predict the testing set
data. We evaluate the model's performance using ARIMA's predicted results and the previously
mentioned error metrics.
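With statsmodels, this step might look as follows; the ARIMA order is purely illustrative, since the paper does not report the order it used:

```python
from statsmodels.tsa.arima.model import ARIMA

# Fit on one user's training series and forecast the whole test horizon.
arima = ARIMA(train["consumption"], order=(2, 1, 2)).fit()
pred = arima.forecast(steps=len(test))
print(error_metrics(test["consumption"].to_numpy(), pred.to_numpy()))
```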
To account for climatic variables, we adopt the Vector Autoregression (VAR) model to forecast future energy consumption, following rigorous procedures for data processing and model selection. The aim is to analyze the relationship between consumption patterns and the prevailing weather conditions. We first merge the consumption data and the weather data of the selected user into one uniform dataset, in which 'consumption' and 'weather' serve as input features and 'consumption' is the target variable. We then split the dataset into a training set covering January 1, 2019 to September 30, 2019 and a testing set covering October 1, 2019 to December 31, 2019.
To determine the appropriate lag order for the VAR model, we employ information criteria, including
AIC, BIC, FPE, and HQIC. Based on the results, we select a lag order of 24 as the optimal model
parameter, which we apply when fitting the VAR model. The fitted model's summary statistics include the regression coefficients and the statistical significance of the relationship between consumption and weather.
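A sketch of the lag-order selection and fit with statsmodels, under the same assumed column names as above:

```python
from statsmodels.tsa.api import VAR

var_data = train[["consumption", "weather"]]
model = VAR(var_data)

# Compare AIC, BIC, FPE, and HQIC over candidate lag orders.
print(model.select_order(maxlags=48).summary())

# Fit with the selected lag order (24) and forecast over the test horizon.
results = model.fit(24)
pred = results.forecast(var_data.values[-24:], steps=len(test))
```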
The model results reveal that consumption at the current hour correlates closely with consumption over the preceding hours within the 24-hour lag window and with the weather factors. However, some lag-period regression coefficients are not statistically significant. From the model fit, we gain preliminary insight into the extent of the relationship between consumption and weather.
Next, we predict electricity consumption in the testing set using training set data. Leveraging the
VAR model, we predict hourly consumption based on historical information from the training set data.
We then compute error metrics, including bias, root mean squared error (RMSE), mean squared error (MSE), mean absolute percentage error (MAPE), and mean absolute error (MAE), for the predicted values against the actual values.
The final results demonstrate that the VAR model exhibits certain predictive capabilities for
electricity consumption. However, compared to the previously used ARIMA model, there is an increase
in prediction errors. Given the complexity of prediction and model performance, we will further explore
deep learning methods to potentially enhance prediction.
4. Model description
4.1. Gated recurrent unit (GRU)
The Gated Recurrent Unit (GRU) [21] stands as a variant of recurrent neural networks (RNNs) tailored
for processing sequential data. It effectively addresses the vanishing gradient problem and exhibits
superior performance in capturing long-range dependencies compared to conventional RNNs. GRUs
have demonstrated success across various tasks, including sequence modeling and natural language
processing.
The GRU unit comprises two pivotal components: an update gate and a reset gate, which wield
control over the flow and retention of information. These gates endow GRUs with the capability to
manage long-range dependencies adeptly. The computational process of the GRU can be represented
using the following equations:
1. Reset gate:
$$r_t = \sigma(W_r \cdot [h_{t-1}, x_t])$$
2. Update gate:
$$z_t = \sigma(W_z \cdot [h_{t-1}, x_t])$$
3. Candidate hidden state:
$$\tilde{h}_t = \tanh(W_h \cdot [r_t \odot h_{t-1}, x_t])$$
4. Updated hidden state:
$$h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t$$
Here are the key components:
∙ $x_t$ is the input at the current time step.
∙ $h_{t-1}$ is the hidden state from the preceding time step.
∙ $r_t$ is the reset gate, which regulates the influence of the previous hidden state on the candidate state.
∙ $z_t$ is the update gate, governing the balance between retaining the previous hidden state and adopting the candidate state.
∙ $\tilde{h}_t$ is the candidate hidden state, computed from the current input and the reset gate.
∙ $W_r$, $W_z$, $W_h$ are weight matrices.
∙ $\sigma$ denotes the sigmoid function.
∙ $\odot$ denotes elementwise multiplication.
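As a minimal NumPy sketch of one GRU step implementing these equations (bias terms are omitted, matching the equations above):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, h_prev, W_r, W_z, W_h):
    """One GRU time step following the four equations above (no bias terms)."""
    concat = np.concatenate([h_prev, x_t])                       # [h_{t-1}, x_t]
    r_t = sigmoid(W_r @ concat)                                  # reset gate
    z_t = sigmoid(W_z @ concat)                                  # update gate
    h_cand = np.tanh(W_h @ np.concatenate([r_t * h_prev, x_t]))  # candidate state
    return (1 - z_t) * h_prev + z_t * h_cand                     # new hidden state
```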
4.2. Long short-term memory (LSTM)
Within the field of recurrent neural networks (RNNs), the Long Short-Term Memory (LSTM) model is an important advance. It was designed to address the vanishing gradient problem while capturing complex dependencies within sequential data [22]. LSTM differs from the traditional RNN architecture by introducing a more elaborate memory cell structure, an innovation that has made it ubiquitous across fields such as sequence modeling, language translation, and speech recognition.
Central to the LSTM architecture is a cell state, which functions as a conveyor for information across time steps, together with three crucial gates: the forget gate, the input gate, and the output gate. These gates meticulously manage the flow of information into and out of the cell state.
The computations within LSTM can be elucidated using the following equations:
1. Forget gate:
$$f_t = \sigma(W_f \cdot [h_{t-1}, x_t])$$
2. Input gate:
$$i_t = \sigma(W_i \cdot [h_{t-1}, x_t])$$
3. Candidate cell state:
$$\tilde{C}_t = \tanh(W_C \cdot [h_{t-1}, x_t])$$
4. Updated cell state:
$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t$$
5. Output gate:
$$o_t = \sigma(W_o \cdot [h_{t-1}, x_t])$$
6. Hidden state:
$$h_t = o_t \odot \tanh(C_t)$$
Here are the key components:
∙ $x_t$ is the input at the current time step.
∙ $h_{t-1}$ is the hidden state from the preceding time step.
∙ $f_t$ is the forget gate, determining which information is discarded from the cell state.
∙ $i_t$ is the input gate, governing the introduction of new information into the cell state.
∙ $\tilde{C}_t$ is the candidate cell state, potentially incorporated into the cell state.
∙ $C_t$ is the updated cell state, a blend of the prior cell state and the candidate state.
∙ $o_t$ is the output gate, controlling the extent to which the cell state contributes to the hidden state.
∙ $W_f$, $W_i$, $W_C$, $W_o$ are weight matrices.
∙ $\sigma$ denotes the sigmoid function, used in the gating mechanisms.
∙ $\tanh$ squashes the candidate and cell states into the range $(-1, 1)$.
∙ $\odot$ denotes elementwise multiplication.
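In practice the LSTM forecaster can be assembled from library layers rather than from these equations directly; a minimal Keras sketch follows (the layer size and window length are our assumptions, not the paper's reported configuration):

```python
import tensorflow as tf

window = 24  # hours of history per sample (assumed)

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(window, 2)),  # features: consumption, weather
    tf.keras.layers.LSTM(64),
    tf.keras.layers.Dense(1),                  # next-hour consumption
])
model.compile(optimizer="adam", loss="mse")
```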
4.3. Temporal convolutional network (TCN)
The Temporal Convolutional Network (TCN) [23] is a deep learning model designed for processing sequential data. Unlike typical recurrent models, TCN uses dilated convolutions to capture long-range dependencies within the data, and it has attracted considerable interest for its effectiveness on sequences of varied lengths and its highly parallelizable architecture.
TCN processes the input sequence with a stack of dilated convolutional layers. The dilated convolutions let the model cover a wide context window without a substantial increase in the number of parameters, so it can effectively represent both long-term and short-term dependencies within the sequence.
6
2023 International Conference on Machine Learning and Automation IOP Publishing
Journal of Physics: Conference Series 2711 (2024) 012012 doi:10.1088/1742-6596/2711/1/012012
The pivotal characteristic of TCN lies in its utilization of dilated convolutions. The computational
process can be depicted through the following equation:
$$y_t = \sum_{i=1}^{k} w_i \cdot x_{t-(i-1)\cdot d}$$
Here's the breakdown:
∙ $y_t$ represents the output at time step $t$.
∙ $x_{t-(i-1)\cdot d}$ corresponds to the input elements, each reaching back in time by a multiple of the dilation factor $d$, so the convolution uses only past and current inputs (causal).
∙ $w_i$ stands for the convolutional filter weight at index $i$.
∙ $k$ denotes the kernel size, effectively determining the size of the receptive field.
∙ $d$ serves as the dilation factor, governing the spacing between filter elements.
GRU and LSTM are good at capturing long-term dependencies, but they remain theoretically susceptible to vanishing gradients over very long sequences, making them less desirable for the energy consumption forecasting problem considered in this study. TCN, on the other hand, offers efficient training and strong learning capacity.
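A sketch of such a stack of dilated causal convolutions in Keras (depth, filter count, and kernel size are assumptions; the residual connections of the original TCN [23] are omitted for brevity):

```python
import tensorflow as tf

window = 24
inputs = tf.keras.layers.Input(shape=(window, 2))
x = inputs
# Doubling the dilation rate per layer grows the receptive field exponentially.
for dilation in (1, 2, 4, 8):
    x = tf.keras.layers.Conv1D(32, kernel_size=3, dilation_rate=dilation,
                               padding="causal", activation="relu")(x)
x = tf.keras.layers.GlobalAveragePooling1D()(x)
outputs = tf.keras.layers.Dense(1)(x)
tcn = tf.keras.Model(inputs, outputs)
tcn.compile(optimizer="adam", loss="mse")
```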
5. Experiment and discussion
In this study, we evaluated the performance of several models for forecasting power consumption using three metrics: root mean squared error (RMSE), mean absolute error (MAE), and mean absolute percentage error (MAPE). The models investigated include the Autoregressive Integrated Moving Average (ARIMA), the Simple Moving Average (SMA), the Vector Autoregression (VAR), the Long Short-Term Memory (LSTM), the Gated Recurrent Unit (GRU), and the Temporal Convolutional Network (TCN). Rigorous experimental analysis shows that the TCN model consistently produced the best results across all three metrics, indicating superior fitting ability compared to its contemporaries. We therefore recommend incorporating weather-related information in multi-time-series prediction wherever practicable; in pragmatic applications this is especially important given the influence of meteorological variables on prediction accuracy.
RMSE, the root mean squared error, quantifies the disparity between predicted values and ground truth values:
$$\mathrm{RMSE}(X, h) = \sqrt{\frac{1}{m}\sum_{i=1}^{m}\left(h(x_i) - y_i\right)^2}$$
Conversely, MAE, the mean absolute error, is the mean of the absolute errors:
$$\mathrm{MAE}(X, h) = \frac{1}{m}\sum_{i=1}^{m}\left|h(x_i) - y_i\right|$$
MAPE, the mean absolute percentage error, is sensitive to relative errors and is invariant under a global scaling of the target variable, making it suitable for problems with large differences in target magnitude. Expressed as a percentage:
$$\mathrm{MAPE}(X, h) = \frac{100\%}{m}\sum_{i=1}^{m}\frac{\left|h(x_i) - y_i\right|}{y_i}$$
The evaluation results are summarized in Table 1; for RMSE, MAE, and MAPE, smaller values indicate better model performance. The performance differences between models are evident from Table 1: deep learning models generally outperform the traditional models, and TCN with weather information achieves the smallest errors.
Table 1. Evaluation results of different models.
Model        RMSE   MAE    MAPE (%)
Without weather
Last Value   0.391  0.245  260.359
SMA          0.381  0.271  216.713
ARIMA        0.364  0.255  191.795
LSTM         0.338  0.223  171.605
BiLSTM       0.315  0.224  162.325
GRU          0.303  0.208  158.767
BiGRU        0.304  0.207  159.414
TCN          0.301  0.206  134.407
With weather
VAR          0.341  0.255  250.304
LSTM         0.301  0.221  161.958
BiLSTM       0.313  0.194  144.605
GRU          0.298  0.205  158.021
BiGRU        0.304  0.206  150.240
TCN          0.296  0.191  89.445
Some examples of the forecasting results of TCN with and without weather are shown in Figure 2
and Figure 3.
Figure 2. TCN model performance without weather information.
Figure 3. TCN model performance with weather information.
Subsequently, we evaluated the TCN model in greater depth by analyzing the effect of the input time length and the output prediction step. Figures 4 and 5 show the findings; a sketch of this evaluation loop follows Figure 5.
Figure 4. Influence of the input time length on the TCN model.
Figure 5. Influence of the output prediction step on the TCN model.
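A sketch of this sweep (the windowing helper and the candidate input lengths are our own illustration; evaluate_tcn stands in for rebuilding and scoring the model at each setting):

```python
import numpy as np

def make_windows(series, window, horizon=1):
    """Slice a 1-D series into (window, 1) inputs and horizon-step-ahead targets."""
    X, y = [], []
    for i in range(len(series) - window - horizon + 1):
        X.append(series[i:i + window])
        y.append(series[i + window + horizon - 1])
    return np.array(X)[..., None], np.array(y)

for window in (6, 12, 24, 48, 96):
    X_tr, y_tr = make_windows(train["consumption"].to_numpy(), window)
    X_te, y_te = make_windows(test["consumption"].to_numpy(), window)
    # rmse = evaluate_tcn(X_tr, y_tr, X_te, y_te)  # hypothetical helper
```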
Observing the experimental results, we can draw the following conclusions:
Even without weather data, RMSE, MAE, and MAPE decrease steadily as we move from the naive baselines to the deep learning models, indicating a gradual enhancement in predictive capability. Among these, the TCN model achieved the lowest RMSE, MAPE, and MAE, with GRU following closely, suggesting that TCN and GRU deliver favorable electricity consumption prediction performance in the absence of weather data.
Upon introducing weather data, the predictive capabilities of the models generally improve,
particularly the TCN model, which saw a significant improvement in MAPE from 134.407 to 89.445,
signifying a substantial enhancement in prediction accuracy. This strongly underscores the importance
of incorporating weather data for boosting predictive performance.
Among all models that consider weather data, TCN exhibits the best performance, achieving the
lowest RMSE (0.296), MAPE (89.445), and MAE (0.191).
Comparing models with and without weather data, it becomes evident that models incorporating weather data typically offer more accurate predictions, with the TCN model benefiting the most. Nonetheless, it is worth highlighting that the LSTM and GRU models see comparatively small improvements in RMSE and MAE from the inclusion of weather data.
6. Conclusion
Based on the experimental outcomes of this study, incorporating weather data proves to be an effective approach for enhancing the accuracy of user electricity consumption predictions, and the TCN model combined with weather data delivers the best predictive performance. We therefore advocate integrating the TCN model with weather data. This synergy also presents promising avenues for future research. The first direction is to explore graph-based deep learning models [24] for energy consumption forecasting, as these models have demonstrated effectiveness on similar problems. The second is to implement distributed learning techniques [25], which are well suited to real-world systems and enable scalable, efficient prediction. The third is the joint forecasting of weather and energy consumption [26], a potentially more effective approach that considers the interplay
between these two critical factors. This holistic approach could lead to enhanced predictive capabilities
and improved energy management strategies.
References
[1] Xu, Z., Tang, N., Xu, C., & Cheng, X. (2021). Data science: connotation, methods, technologies, and development. Data Science and Management, 1(1), 32-37.
[2] Jiang, W. (2022). Deep learning based short-term load forecasting incorporating calendar and weather information. Internet Technology Letters, 5(4), e383.
[3] Xu, A., Tian, M. W., Firouzi, B., Alattas, K. A., Mohammadzadeh, A., & Ghaderpour, E. (2022). A new deep learning Restricted Boltzmann Machine for energy consumption forecasting. Sustainability, 14(16), 10081.
[4] Hong, Y., Wang, D., Su, J., Ren, M., Xu, W., Wei, Y., & Yang, Z. (2023). Short-Term Power Load Forecasting in Three Stages Based on CEEMDAN-TGA Model. Sustainability, 15(14), 11123.
[5] Jiang, W., & Zhang, L. (2018). Geospatial data to images: A deep-learning framework for traffic forecasting. Tsinghua Science and Technology, 24(1), 52-64.
[6] Jiang, W., & Luo, J. (2022). Graph neural network for traffic forecasting: A survey. Expert Systems with Applications, 207, 117921.
[7] Jiang, W., Luo, J., He, M., & Gu, W. (2023). Graph Neural Network for Traffic Forecasting: The Research Progress. ISPRS International Journal of Geo-Information, 12(3), 100.
[8] Ferreira, G. O., Ravazzi, C., Dabbene, F., Calafiore, G. C., & Fiore, M. (2023). Forecasting Network Traffic: A Survey and Tutorial with Open-Source Comparative Evaluation. IEEE Access.
[9] Jiang, W. (2022). Cellular traffic prediction with machine learning: A survey. Expert Systems with Applications, 201, 117163.
[10] Jiang, W. (2022). Internet traffic matrix prediction with convolutional LSTM neural network. Internet Technology Letters, 5(2), e322.
[11] Jiang, W. (2022). Internet traffic prediction with deep neural networks. Internet Technology Letters, 5(2), e314.
[12] Kumbure, M. M., Lohrmann, C., Luukka, P., & Porras, J. (2022). Machine learning techniques and data for stock market forecasting: A literature review. Expert Systems with Applications, 197, 116659.
[13] Jiang, W. (2021). Applications of deep learning in stock market prediction: recent progress. Expert Systems with Applications, 184, 115537.
[14] Dai, Y., Yang, X., & Leng, M. (2023). Optimized Seq2Seq model based on multiple methods for short-term power load forecasting. Applied Soft Computing, 142, 110335.
[15] Criado-Ramón, D., Ruiz, L. G. B., & Pegalajar, M. C. (2023). An Improved Pattern Sequence-Based Energy Load Forecast Algorithm Based on Self-Organizing Maps and Artificial Neural Networks. Big Data and Cognitive Computing, 7(2), 92.
[16] Wang, K., Zhang, J., Li, X., & Zhang, Y. (2023). Long-Term Power Load Forecasting Using LSTM-Informer with Ensemble Learning. Electronics, 12(10), 2175.
[17] Caro, E., Juan, J., & Nouhitehrani, S. (2023). Optimal Selection of Weather Stations for Electric Load Forecasting. IEEE Access.
[18] Li, J., Wei, S., & Dai, W. (2021). Combination of manifold learning and deep learning algorithms for mid-term electrical load forecasting. IEEE Transactions on Neural Networks and Learning Systems.
[19] Chen, Y., Zhao, J., Qin, J., Li, H., & Zhang, Z. (2021). A novel pure data-selection framework for day-ahead wind power forecasting. Fundamental Research.
[20] Wang, Y., Wu, Y., Xu, H., Chen, Z., Gao, J., Xu, Z., & Li, L. (2023). A combination predicting methodology based on T-LSTNet_Markov for short-term wind power prediction. Network: Computation in Neural Systems, 1-23.
[21] Chung, J., Gulcehre, C., Cho, K., & Bengio, Y. (2014). Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling. arXiv preprint arXiv:1412.3555.
[22] Hochreiter, S., & Schmidhuber, J. (1997). Long Short-Term Memory. Neural Computation, 9(8), 1735-1780.
[23] Bai, S., Kolter, J. Z., & Koltun, V. (2018). An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling. arXiv preprint arXiv:1803.01271.
[24] Jiang, W. (2022). Graph-based deep learning for communication networks: A survey. Computer Communications, 185, 40-54.
[25] Jiang, W., He, M., & Gu, W. (2022). Internet Traffic Prediction with Distributed Multi-Agent Learning. Applied System Innovation, 5(6), 121.
[26] Jiang, W., & Luo, J. (2022). An evaluation of machine learning and deep learning models for drought prediction using weather data. Journal of Intelligent & Fuzzy Systems, 43(3), 3611-3626.