Powernet: A Smart Energy Forecasting Architecture Based On Neural Networks

Article · October 2020

DOI: 10.1049/iet-smc.2020.0003


IET Research Journals
ISSN 1751-8644
www.ietdl.org

PowerNet: A Smart Energy Forecasting Architecture based on Neural Networks
Yao Cheng¹, Chang Xu², Daisuke Mashima³, Partha P. Biswas³, Geetanjali Chipurupalli³, Bin Zhou³, Yongdong Wu⁴
¹ Institute for Infocomm Research, A*STAR, Singapore.
² School of Computing Engineering, Nanyang Technological University, Singapore.
³ Advanced Digital Sciences Center, Singapore.
⁴ Jinan University, Guangzhou, China.
* E-mail: Daisuke Mashima: daisuke.m@adsc-create.edu.sg

Abstract:
Electricity demand forecasting is a critical task for efficient, reliable and economical operation of the power grid, which is one of the most essential building blocks of smart cities. Accurate forecasting allows grid operators to properly maintain the balance of supply and demand as well as to optimize operational cost for generation and transmission. Several efforts have been made recently to apply various machine learning techniques for this purpose. This article proposes a novel neural network architecture, PowerNet, which can incorporate multiple heterogeneous features such as historical energy consumption data, weather data and calendar information for the demand forecasting task. Using a real-world smart meter dataset, we conduct an extensive evaluation to show the advantages of PowerNet over recently proposed machine learning methods such as Gradient Boosting Tree (GBT), Support Vector Regression (SVR), Random Forest (RF) and Gated Recurrent Unit (GRU). PowerNet demonstrates notable performance in reducing both the median and worst-case prediction errors when forecasting demands of individual residential households. We further provide empirical results concerning two operational considerations that are crucial when using PowerNet in practice: the time horizon the model can predict with decent accuracy and the frequency of training the model to retain its modeling capability. Finally, we briefly discuss a multi-layer anomaly/electricity-theft detection approach based on PowerNet demand forecasting.

IET Research Journals, pp. 1–9
© The Institution of Engineering and Technology 2020

1 Introduction

The smart grid is an integral part of modern smart cities. It is an enhanced electrical grid that takes advantage of sensing and information communication technologies to improve the efficiency, reliability and security of the traditional power grid. Compared to the traditional power grid, entities in the smart grid are able to obtain timely power grid status of many kinds. Smart metering, which is a major improvement brought by the smart grid, facilitates real-time metering and reporting of electricity consumption data. One resulting benefit is that accurate, fine-grained power demand forecasting can be carried out based on such meter measurements. Such forecasting based on the historical data facilitates power generation scheduling and power dispatching in a future period.

Demand forecasting is important in demand management for both power companies and electricity customers [1]. Power companies can allocate proper resources to balance the supply and demand based on the demand forecasting results. They can also adjust the demand response strategy, such as dynamic pricing, to shape the load so as to avoid infrastructure capacity strain or to avoid the additional cost of starting plants operating near their peak. Furthermore, the utilities can detect abnormal meter measurements, caused either by unexpected meter failures or deliberate meter manipulation, by identifying those measurements that do not conform to the predicted/expected values. For electricity customers, power demand forecasting provides their expected power consumption and cost in a future period under a dynamic pricing strategy, so that they can adjust their usage schedule accordingly to achieve a lower cost. Therefore, accurate demand forecasting is of paramount importance for effective and efficient management of the smart grid in a smart urban set-up.

Although demand forecasting has been widely studied for years, attaining high accuracy in forecasting is a challenge as power demand is dependent upon various factors which may have discriminative capability in influencing the demand. With this challenge in mind, we propose a novel forecasting neural network architecture named PowerNet. We take into account a set of features from three heterogeneous dimensions: the historical consumption data, the weather information and the calendar information, all of which are considered influential on electricity customers’ power consumption patterns. In each dimension, a set of features is developed. Thereafter, we introduce our model PowerNet, which is capable of incorporating all the designed features. The key property of PowerNet is the ability to model both sequential data (i.e., historical consumption data) and non-sequential data (i.e., weather and calendar information) in a unified manner. The underpinning idea lies in the use of a recurrent neural network for encoding dependencies implied in sequential data and a multilayer perceptron network for capturing correlations between non-sequential features and predictions. In order to evaluate the effectiveness of our model, we compare PowerNet with four state-of-the-art demand forecasting techniques, which are Gradient Boosting Tree (GBT) [2], Support Vector Regression (SVR) [3], Random Forest (RF) [4][5] and Gated Recurrent Unit (GRU) [6]. We show that the performance of our proposed PowerNet model is competitive in the case studies of predicting demands of smart apartments in a smart city. Furthermore, we tackle two crucial questions that need to be answered when operating PowerNet in practice: how far into the future can the model forecast with reasonable accuracy, and how often should the forecasting model be trained to retain its modeling capability? Lastly, we discuss a multilayer data-driven anomaly detection approach based on PowerNet.

The contributions of this work are summarized below.

• We propose PowerNet, a novel power demand forecasting neural network that captures heterogeneous features in a unified way.
• We compare PowerNet with four representative models adopted in recent research works, i.e., GBT, SVR, RF and GRU. The results show that PowerNet can reduce and bound the error over these
competitors, even in the prediction of individual apartment energy demand.
• We further evaluate the forecasting model under different forecasting durations and re-training frequencies using publicly available datasets.
• We provide a brief discussion on the potential application of PowerNet for anomaly/electricity-theft detection.

Fig. 1: Approach Overview. [Diagram: Dataset (Historical Consumption, Weather Information, Calendar Information) → Feature Design → PowerNet Training → PowerNet Model]

The rest of this paper is organized as follows. In Section 2, we discuss the features to be incorporated into power demand forecasting. Section 3 elaborates the design of PowerNet. We discuss evaluation results, including comparison with state-of-the-art techniques and empirical results to answer the aforementioned questions for practical operation in Section 4, followed by a brief discussion about the application for anomaly detection in Section 5. Related work is discussed in Section 6 and we conclude the paper in Section 7.

2 Feature Design and Dataset

Power consumption patterns are affected by a variety of factors. Thus a demand forecasting mechanism should incorporate such factors as features, in addition to historical energy consumption data. We focus on weather and calendar data. Below, we elaborate on these features and the dataset we have utilized in this paper.

2.1 Energy Usage Dataset

We use the publicly available dataset provided by the University of Massachusetts [7]. It includes two parts, the apartment dataset and the weather dataset.

The apartment dataset contains consumption data for 114 single-family apartments located in Western Massachusetts for the period from 2014 to 2016. The dataset records the demand of every single apartment at a fixed temporal frequency∗. The metering frequency is once every 15 minutes for the years 2014 and 2015 (till December 15), and once every 1 minute for the year 2016. The data is in .csv files, each of which records the power consumption details for one apartment within one year with the apartment ID as its file name.

Together with the power consumption data, hourly weather information during the record period from 2014 to 2016 is also available. Fourteen meteorological attributes are included in the weather dataset, including weather summary, temperature, humidity, cloud cover, wind speed, wind bearing, visibility, pressure, etc. In our experiment, we use the data of 2016 because of its finer granularity in recording frequency as well as the latest consumption pattern it may reflect.

∗ Given the metering interval is fixed, power values are able to represent the power consumption.

2.2 Feature Design

The features used by the existing forecasting models fall into three categories in terms of privacy, i.e., publicly available information (e.g., weather information and calendar information), household private information (e.g., demography) and quasi-private information (e.g., historical consumption data acquired by power utility companies). The quasi-private information here is defined as privacy-related data that is not publicly available. For example, the historical electricity consumption data can be used to infer certain private household characteristics [8][9], but it is only available to the authorized personnel within power utility companies instead of to the public.

Although it is natural that private household data would have a direct influence on the household power demand, e.g., more people living in the house leads to more power demand, in this work we limit the predictors to non-private information for the following reasons. First of all, we would like to involve no household-specific data in the forecasting procedure other than power meter readings due to user privacy concerns. Secondly, some utility companies may have access to household private data such as locations. However, it is not common for utility companies to have other private information, for example, demographic information. Thirdly, a forecasting model independent of house-specific data can be applied to larger scales easily, such as building level or area level.

We develop three categories of features from the dataset, i.e., historical consumption data, weather information and calendar information. Historical consumption data is the actual observation of the prediction target, which directly reflects the consumption pattern. Power utility companies can get this data by reading power meters. Weather information has an influence on the power demand since some appliances are sensitive to weather conditions. For example, the use of an air conditioner depends on the temperature and humidity. Calendar information, such as weekday or weekend, shapes the user consumption behavior in terms of different living/working styles. It indicates the consumption pattern according to the calendar feature and cycle.

Our features based on the above three categories are summarized in Table 1. There are nd + 18 features in total, among which nd features are from historical consumption data, 13 are from weather information and 5 are designed from calendar information. The historical data involves a large number of data points. Therefore, it is necessary to find the nd historical data points that are most correlated with the target forecasting value. To solve this problem, we use the autocorrelation function (ACF), which quantifies the correlation between data points in the same time series at various lags, to find the number of most related lag values, nd.

3 PowerNet

3.1 Overview

Our approach to forecasting power demand is to model the relationship between the target power demand and a set of indicative features. Fig. 1 illustrates the high-level process of our approach. First, we extract the suites of features discussed in Section 2.2 from the datasets. Then, we train the proposed PowerNet model by feeding the feature vectors as input, supervised by the signals from the power demand ground-truth. Fig. 2 shows the architecture of PowerNet, which consists of two major components. The left component (in blue) is designed to model the historical consumption time series data. The idea is to capture the temporal effects of power consumption as future consumption may be correlated to consumption in the recent past. Here, we utilize the Long Short-Term Memory (LSTM) [10] network to encode the correlations between consecutive power demands across time. The right component (in orange) is a Multilayer Perceptron (MLP) [11] model for modeling the non-linearity between the weather and calendar features, and the target power demand. Finally, we aggregate the outputs of these two components and make the final prediction of the target power

Table 1 Features for the power demand forecasting task
Category | Detail
Historical Consumption Data | Consumption data in past nd time slots
Weather Information | Weather summary, weather representation icon name, temperature, apparent temperature, cloud cover, precipitation probability, precipitation intensity, visibility, wind speed, wind bearing, humidity, pressure, dew point
Calendar Information | Day of the month, day of the week, hour of the day, period of the day (i.e., daytime and night time), is weekend (boolean value)
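
Section 2.2 states that the nd historical lags in Table 1 are chosen with the autocorrelation function, without giving the selection rule. The following is a minimal sketch of one way to do it, assuming a plain list of equally spaced readings and a hypothetical cut-off on the absolute ACF value; the function names and the threshold are illustrative assumptions, not the authors' implementation.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of `series` at a positive integer `lag`."""
    n = len(series)
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series)
    if var == 0:
        return 0.0
    cov = sum((series[t] - mean) * (series[t - lag] - mean)
              for t in range(lag, n))
    return cov / var


def select_lags(series, max_lag, threshold=0.5):
    """Lags in 1..max_lag whose |ACF| reaches the (assumed) threshold."""
    return [k for k in range(1, max_lag + 1)
            if abs(autocorrelation(series, k)) >= threshold]


# Toy load curve that repeats every 4 samples.
load = [1.0, 2.0, 3.0, 2.0] * 6
lags = select_lags(load, max_lag=8, threshold=0.5)
n_d = len(lags)  # number of historical readings used as features
```

On this toy series the selected lags are the multiples of half the period, where the readings are strongly (positively or negatively) correlated with the target.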

Fig. 2: The Architecture of PowerNet. [Diagram: left branch, Input Layer (Historical Consumption, e1 ... e6) → Power Consumption Encoding Layer; right branch, Input Layer (Weather & Calendar Info) → Weather & Calendar Fusion Layer; both branches → Aggregation & Prediction Layer → Power Demand]
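
The data flow of Fig. 2 and Eqs. (1) to (3) can be illustrated with a small forward pass. This is a sketch under several assumptions: a single LSTM layer stands in for the stacked LSTM, weights are random and untrained, each of the 18 weather and calendar features is taken to be already encoded as one number, and the input width of W3 is taken as n + d2 (the length of the concatenated vector). All function and parameter names are hypothetical.

```python
import math
import random


def matvec(W, x):
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def add(a, b):
    return [x + y for x, y in zip(a, b)]


def relu(v):
    return [max(0.0, x) for x in v]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def lstm_step(p, e_t, h_prev, s_prev):
    """One LSTM step on a scalar reading e_t; returns (h_t, s_t)."""
    n = len(h_prev)
    gates = add(matvec(p["W"], [e_t] + h_prev), p["b"])  # 4n pre-activations
    i = [sigmoid(g) for g in gates[0:n]]            # input gate
    f = [sigmoid(g) for g in gates[n:2 * n]]        # forget gate
    o = [sigmoid(g) for g in gates[2 * n:3 * n]]    # output gate
    c = [math.tanh(g) for g in gates[3 * n:4 * n]]  # candidate cell state
    s = [fk * sk + ik * ck for fk, sk, ik, ck in zip(f, s_prev, i, c)]
    h = [ok * math.tanh(sk) for ok, sk in zip(o, s)]
    return h, s


def init_params(n, m, d1, d2, d3, seed=0):
    rng = random.Random(seed)
    return {
        "n": n,
        "lstm": {"W": rand_matrix(4 * n, 1 + n, rng), "b": [0.0] * (4 * n)},
        "W1": rand_matrix(d1, m, rng), "b1": [0.0] * d1,
        "W2": rand_matrix(d2, d1, rng), "b2": [0.0] * d2,
        "W3": rand_matrix(d3, n + d2, rng), "b3": [0.0] * d3,
        "W4": rand_matrix(1, d3, rng), "b4": [0.0],
    }


def powernet_forward(p, E, F):
    h, s = [0.0] * p["n"], [0.0] * p["n"]
    for e_t in E:                                   # Eq. (1): encode the series
        h, s = lstm_step(p["lstm"], e_t, h, s)
    hidden = relu(add(matvec(p["W1"], F), p["b1"]))
    h_wc = relu(add(matvec(p["W2"], hidden), p["b2"]))  # Eq. (2): fusion MLP
    z = relu(add(matvec(p["W3"], h + h_wc), p["b3"]))   # concatenate, then ReLU
    return add(matvec(p["W4"], z), p["b4"])[0]      # Eq. (3): scalar prediction


params = init_params(n=8, m=18, d1=16, d2=8, d3=8)
E = [0.4, 0.5, 0.7, 0.6, 0.5, 0.4]  # six past readings (toy values)
F = [0.0] * 18                       # 13 weather + 5 calendar features (toy values)
y_hat = powernet_forward(params, E, F)
```

The last LSTM hidden state has length n, so the concatenation fed to the prediction layer has length n + d2 regardless of how many readings |E| are in the input window.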

demand via a Prediction Layer. In the following, we dissect each component of PowerNet.

3.2 Input Layer

To process the sequential historical energy consumption data and the non-sequential weather & calendar data, the input layer of PowerNet consists of two components, one for each of the data types. The first is a sequence of historical power consumption values denoted by $E = \{e_1, ..., e_t, ..., e_{|E|}\}$, where $|E|$ is the cardinality of $E$ (i.e., the number of meter readings) and each value $e_t \in \mathbb{R}^+$ is a real-valued non-negative power meter reading at time $t$. The second is a vector of weather and calendar features, denoted by $F = [F^w; F^c]$, where $F^w = \{f^w_1, ..., f^w_{|w|}\}$, $F^c = \{f^c_1, ..., f^c_{|c|}\}$, $[;]$ represents vector concatenation, and $|w|$ and $|c|$ are the numbers of the weather and calendar features, respectively, already introduced in Section 2.2.

3.3 Power Consumption Encoding Layer

The utility of this layer is to encode the power consumption time series data by using an LSTM network, a widely-used variant of Recurrent Neural Network (RNN) that can learn long-term dependencies. Different from traditional neural networks that can only take historical energy consumption readings as input, LSTM allows unlimited history information to persist with an internal loop mechanism while avoiding the gradient vanishing problem [12]. Therefore, it has been successfully applied to various areas, e.g., continual prediction [13], language modeling [14] and translation [15]. The core of LSTM is a memory cell that can maintain information across time via a gating mechanism. The LSTM cell maintains a cell state $s$ based on both the current input $e_t$ and the previous output $h_{t-1}$ (i.e., the recurrent input) and then decides what information is to be dropped and what is to be passed on (i.e., $h_t$). We do not detail the gating mechanisms here; they can be found in previous literature [10]. We use LSTM(·) to represent the cell function.

Specifically, we apply a stacked LSTM to every time step of the power consumption time series data $E$,

$$
\begin{aligned}
[h_1\ s_1] &= \mathrm{LSTM}_{stack}(e_1, h_0, s_0) \\
&\ \ \vdots \\
[h_t\ s_t] &= \mathrm{LSTM}_{stack}(e_t, h_{t-1}, s_{t-1}) \\
&\ \ \vdots \\
[h_{|E|}\ s_{|E|}] &= \mathrm{LSTM}_{stack}(e_{|E|}, h_{|E|-1}, s_{|E|-1})
\end{aligned}
\tag{1}
$$

where $h$ and $s$ are the hidden and cell states of LSTM, respectively. Then, the output of $\mathrm{LSTM}_{stack}$ at the last time step, $h_{|E|} \in \mathbb{R}^n$, is used as a final encoding of the entire power consumption series, where $n$ is the LSTM memory size.

3.4 Weather & Calendar Fusion Layer

Here we deal with the input from the weather & calendar feature vectors $F^w$ and $F^c$. Specifically, we jointly model these two feature vectors through an MLP network as follows,

$$h_{wc} = \mathrm{ReLU}(W_2\, \mathrm{ReLU}(W_1 [F^w; F^c] + b_1) + b_2) \tag{2}$$

where $W_1 \in \mathbb{R}^{d_1 \times m}$, $W_2 \in \mathbb{R}^{d_2 \times d_1}$, $b_1 \in \mathbb{R}^{d_1}$, $b_2 \in \mathbb{R}^{d_2}$ are trainable weights, $m = |F^w| + |F^c|$, $d_1$ and $d_2$ are the sizes of hidden units, $[;]$ denotes vector concatenation by column and $h_{wc} \in \mathbb{R}^{d_2}$ is the output encoding of this MLP. ReLU [16] is used as the activation function for introducing non-linearity.

3.5 Aggregation & Prediction Layer

Having both power consumption history and weather & calendar information encoded, we finally aggregate the obtained encoding representations and make the final power demand predictions. Concretely, we concatenate the two encoding representations $h_{|E|}$ and $h_{wc}$ and feed the result into a final feed-forward regression network,

$$\hat{y} = W_4\, \mathrm{ReLU}(W_3 [h_{|E|}; h_{wc}] + b_3) + b_4 \tag{3}$$

where $W_3 \in \mathbb{R}^{d_3 \times (n + d_2)}$ (the concatenation of $h_{|E|} \in \mathbb{R}^n$ and $h_{wc} \in \mathbb{R}^{d_2}$ has $n + d_2$ entries), $b_3 \in \mathbb{R}^{d_3}$, $W_4 \in \mathbb{R}^{1 \times d_3}$, $b_4 \in \mathbb{R}$ are trainable parameters and $d_3$ is the hidden size of the inner layer. Note

that both $W_4$ and $b_4$ of the outer layer have only one hidden unit for producing the final predicted reading value. $\hat{y} \in \mathbb{R}$ is the predicted power consumption reading value.

3.6 Optimization

For model training, we use mean squared error loss (Eq. (4)) with dropout regularization [17],

$$L(W_*, b_*) = \frac{1}{N} \sum_{i=1}^{N} (\hat{y}_i - y_i)^2 \tag{4}$$

where $\hat{y}_i$ and $y_i$ are the $i$th predicted and actual energy consumption values respectively, $N$ is the number of training examples, and $W_*$, $b_*$ are all the aforementioned trainable parameters in our model. In addition, all trainable parameters in the fully-connected layers are regularized by the L2 norm. Finally, Adam (Adaptive Moment Estimation) [18] is used as the optimizer for stochastic gradient descent.

4 Evaluation

This section first compares PowerNet with several representative models used in recent works in terms of two quantitative metrics. Then, we evaluate PowerNet under different settings, including the forecasting frequencies, forecasting periods and the newness of PowerNet.

4.1 Preparation

4.1.1 Baseline: We select four prediction methods utilized in recent works as our baseline models in this work: Gradient Boosting Tree (GBT) [2], Support Vector Regression (SVR) [3], Random Forest (RF) [4][5] and Gated Recurrent Unit (GRU) [6]. For a fair comparison, we apply these models to the same public dataset as described in Section 2.1.

GBT is adopted by Bansal et al. [2] to forecast power consumption. GBT is a supervised learning predictive model which can be used for classification and regression purposes [19][20]. GBT builds the model, i.e., a series of trees, in a step-wise manner. In each step, it adds one tree and keeps the existing trees unchanged. The added tree is the optimal tree obtained by minimizing a predefined loss function. In summary, the prediction model of GBT is formed as an ensemble of weaker prediction models following the core idea of the gradient.

SVM is used in the work by Yu et al. [3] to forecast power usage. SVM is a supervised machine learning algorithm for solving both classification and regression problems [21]. SVM performs classification by seeking the hyperplane that differentiates the two classes to the largest extent, i.e., maximizing the margin. Similarly, regression using SVM, called SVR [22], seeks to optimize the generalization bounds by minimizing a predefined error function. The regression can be conducted in both linear and non-linear manner. The non-linear SVR needs to transform the data into a higher dimensional space so that it is possible to perform the linear separation.

RF, an ensemble machine learning method consisting of many decision trees, has been widely used for classification and regression problems. A decision tree, also termed a Classification And Regression Tree (CART), has many nodes where each node stores a test function to apply on incoming data [4]. In RF, the bagging principle is incorporated, in which observations of a certain sample size (called bootstrap samples) are randomly selected from the training set to fit a regression tree. While a single CART is sensitive to data noise, the bootstrap aggregation is immune to it to a large extent. RF shows competitive performance in time series forecasting as observed in recent studies on load demand forecasting [4][5].

GRU is based on the framework of RNN. A conventional RNN uses the gradient descent method in back-propagation for learning. However, the vanishing gradient in dealing with long time sequences is a common problem encountered in RNN. The vanishing gradient is countered by adding control gates for an information buffer in both LSTM and GRU. In place of the hidden units in RNN, the LSTM network uses LSTM cells consisting of the input gate, output gate and forget gate. GRU is simpler compared to LSTM as it combines the input gate and forget gate into a single gate called the update gate. Wang et al. [6] perform short-term load forecasting using GRU, including factors such as weather, temperature, day, etc.

4.1.2 Evaluation Metric: We introduce two metrics to evaluate the accuracy of the forecasting model, i.e., Mean Square Error (MSE) and Mean Absolute Percentage Error (MAPE). The smaller the error is, the more accurate the model prediction is.

MSE measures the average of the squared errors/deviations as given by Eq. 5. $N_v$ is the total number of forecasting values, $A_t$ denotes the actual value at time $t$ and $F_t$ denotes the forecasting value at time $t$. A smaller MSE value signifies better prediction.

$$MSE = \frac{1}{N_v} \sum_{t=1}^{N_v} (A_t - F_t)^2 \tag{5}$$

Different from MSE, MAPE measures the error in proportion to the absolute value. It expresses the error as a percentage and can be calculated using Eq. 6.

$$MAPE = \frac{100\%}{N_v} \sum_{t=1}^{N_v} \left| \frac{A_t - F_t}{A_t} \right| \tag{6}$$

MSE is more useful in comparison among identical test data as it is the absolute square error value, which depends on the scale of actual values. Compared to MSE, MAPE is more indicative in the comparison between different data since it represents the error as a percentage.

4.2 Comparison with Baselines

In this sub-section, we present empirical results to demonstrate the advantage of PowerNet over the four baseline models. Our PowerNet uses a two-layered LSTM network. The cell memory size for every layer is tuned from the set {64, 128, 256, 512} using grid search. Early stopping is employed when there is no further improvement on the validation set.

Similarly, the parameters for baseline models are also automatically tuned in the same way. For GBT, three parameters are involved, i.e., the number of boosting stages to perform n_estimators, the maximum depth of the individual regression estimators max_depth and the learning rate learning_rate. Its parameter grid is constructed using n_estimators: {50, 100, 150, 200, 250, 300, 350, 400, 450, 500}, max_depth: {1, 2, 3, 4, 5} and learning_rate: {0.001, 0.01, 0.1, 1}. For SVR, three parameters C, kernel and gamma are involved. We construct the parameter grid using C: {0.001, 0.01, 0.1, 1} and kernel: {rbf, linear, poly, sigmoid}; gamma is automatically set to the corresponding kernel coefficient or the reciprocal of the number of features. For RF, we set the number of trees in the forest n_trees, the maximum depth of the tree max_depth and the minimum number of samples for a leaf min_samples_leaf. Like other algorithms, parameters are automatically tuned from the search range n_trees: {50, 100, 150, 200, 250, 300, 350, 400, 450, 500}, max_depth: {80, 90, 100, 110} and min_samples_leaf: {3, 4, 5}. GRU is similar to the LSTM network. The cell memory size for every layer is tuned from the set {32, 64, 96, 128, 256} using grid search.

We use the power consumption data of the past 26 days, i.e., 624 hours, as the training set to train all the models and the next 48 hours of data, i.e., days 27-28, as the validation set. Finally, we make predictions on the test data of days 29-30. Due to space limitations, we demonstrate the results of our model and the four baselines on the data of only a few randomly chosen apartments from different seasons (No. 69 in Spring, No. 91 in Summer and No. 39 in Autumn as seen in Table 2). The plots of predicted consumption patterns against the real consumption for the first two apartments are found in Fig. 3 and Fig. 4. From the figures we can see that the PowerNet model captures trends well and offers accuracy improvement in MAPE for the

Fig. 3: Forecasting results of Apartment 69 (Spring). [Plot: Energy Consumption (kW) versus Time (hour); curves: Original, PowerNet, SVR, GBT, RF, GRU]

Fig. 4: Forecasting results of Apartment 91 (Summer). [Plot: Energy Consumption (kW) versus Time (hour); curves: Original, PowerNet, SVR, GBT, RF, GRU]

Fig. 5: Comparison in Distribution of MAPE. [Plot: MAPE versus Prediction Method (PowerNet, GBT, SVR, RF, GRU)]

Fig. 6: Forecasting results of Aggregated Consumption of 16 apartments (Spring). [Plot: Energy Consumption (kW) versus Time (hour); curves: Original, PowerNet, SVR, GBT, RF, GRU]
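
The MAPE distribution compared in Fig. 5, and the MSE and MAPE values reported in Table 2 and Table 3, follow Eq. (5) and Eq. (6). A minimal sketch of both metrics is given below; MAPE takes the absolute value of each relative error, as its name implies. The function names and the toy numbers are illustrative assumptions, and the MAPE helper assumes no actual value is zero.

```python
def mse(actual, forecast):
    """Mean squared error, Eq. (5)."""
    return sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual)


def mape(actual, forecast):
    """Mean absolute percentage error in percent, Eq. (6).

    Assumes every actual value is non-zero.
    """
    total = sum(abs((a - f) / a) for a, f in zip(actual, forecast))
    return 100.0 * total / len(actual)


# Toy example: three hourly readings in kW and their forecasts.
actual = [1.0, 2.0, 4.0]
forecast = [1.1, 1.8, 4.0]
print(f"MSE  = {mse(actual, forecast):.4f}")
print(f"MAPE = {mape(actual, forecast):.2f}%")
```

Because MAPE divides by the actual reading, hours with near-zero consumption can dominate the average, which is one reason household-level MAPE tends to be high.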

selected datasets by 8%, 34% and 14%, respectively, compared to the second-best model.

We further conduct experiments with more apartment data to compare the accuracy in terms of the distribution of MAPE. We use summer season consumption data for 50 randomly picked apartments and perform the same experiment for each apartment to calculate MAPE. The result is summarized in Fig. 5. As can be seen, overall the MAPE of PowerNet is lower than that of the other competitors. In general, as predicting demand at the individual household level is challenging, MAPE is often high. However, the MAPE of PowerNet is still bounded below 100% and the median is below 50%.

Lastly, we conduct experiments with aggregated energy consumption data. We evaluate accuracy with 2 different aggregation levels, 16 apartments and 114 apartments (i.e., all apartments available in the dataset). For each case, we predict for 48 hours as done in the experiments for individual apartments, and calculate MAPE and MSE. As seen in Table 3, when the aggregation level is low, PowerNet has an advantage over the others. The corresponding plot is found in Fig. 6. On the other hand, when consumption values of all apartments are aggregated, all prediction models except GRU perform reasonably well. Based on these results as well as the results of individual apartment experiments discussed earlier, PowerNet exhibits competitive performance along with mostly better accuracy over the other models evaluated in our set-up. However, we admit that our results may have bias caused by the specific dataset we use in this study, and evaluation with other datasets for generality will be part of our future work.

4.3 Forecasting Period of PowerNet

In general, the accuracy of power demand forecasting deteriorates as the prediction horizon moves farther. Therefore, it is crucial for grid operators to know how far ahead in time PowerNet can predict the demand without a significant drop in accuracy. In this section, we provide empirical results on forecasting accuracy against different forecasting periods using the real-world electricity consumption data. By doing so, grid operators can evaluate whether PowerNet is suitable for certain tasks that require different lengths of prediction period, such as bidding in the day-ahead electricity market and day-ahead electricity scheduling, which require one day-ahead forecasting results [23].

Some features for predicting the power demand in the far future may not be available at the time of prediction. For example, the power consumption of the previous hour is an important feature to predict the power demand for the next hour. If we predict beyond one hour at once, we would not know the actual consumption value for every ‘previous’ hour. Therefore, the prediction in the far future relies on the predicted values prior to that. This carries an inherent risk of error accumulation.

In this experiment, we predict the power demand for the future 30 days at once based on current historical data. We train the model on the aggregated historical data in July and predict the power demand for the following 30 days. The forecasting results are shown in Fig. 7 in red. We can see that the red line follows the original peaks and valleys well at the beginning. However, starting from a point around 550 on the x-axis, the red line totally loses track of the original values. In

Table 2 MAPE and MSE on prediction of individual apartment consumption
Apartment (Season) | Model | MAPE | MSE
69 (Spring) | PowerNet | 7.98% | 0.017
69 (Spring) | SVR | 8.69% | 0.018
69 (Spring) | GBT | 8.84% | 0.019
69 (Spring) | RF | 8.86% | 0.019
69 (Spring) | GRU | 9.80% | 0.026
91 (Summer) | PowerNet | 13.82% | 0.014
91 (Summer) | SVR | 106.75% | 0.016
91 (Summer) | GBT | 22.41% | 0.013
91 (Summer) | RF | 21.38% | 0.013
91 (Summer) | GRU | 21.00% | 0.015
39 (Autumn) | PowerNet | 16.73% | 0.213
39 (Autumn) | SVR | 19.62% | 0.408
39 (Autumn) | GBT | 19.85% | 0.368
39 (Autumn) | RF | 22.83% | 0.449
39 (Autumn) | GRU | 22.83% | 0.449

Table 3 MAPE and MSE on prediction of aggregated consumption
# of Apartment | Model | MAPE | MSE
16 | PowerNet | 13.98% | 0.024
16 | SVR | 15.61% | 0.034
16 | GBT | 14.18% | 0.028
16 | RF | 15.03% | 0.032
16 | GRU | 16.03% | 0.036
114 | PowerNet | 10.00% | 0.012
114 | SVR | 10.94% | 0.014
114 | GBT | 8.48% | 0.009
114 | RF | 9.71% | 0.012
114 | GRU | 15.75% | 0.036

Fig. 7: Forecasting results using predicted and actual values
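
The two curves in Fig. 7 differ only in what is fed back as history at each step: the model's own predictions (Section 4.3) or the actual readings (Section 4.4). A minimal sketch of that difference follows, with a hypothetical, deliberately biased one-step predictor standing in for the trained model; all names and numbers are illustrative assumptions.

```python
def rolling_forecast(one_step, history, actuals, n_lags, use_actuals):
    """Forecast len(actuals) steps ahead, one step at a time.

    use_actuals=False feeds each prediction back into the lag window
    (errors can accumulate); use_actuals=True feeds the true reading.
    """
    window = list(history[-n_lags:])
    preds = []
    for actual in actuals:
        y_hat = one_step(window)
        preds.append(y_hat)
        window = window[1:] + [actual if use_actuals else y_hat]
    return preds


def biased_persistence(window):
    """Hypothetical stand-in model: last reading plus a 5% bias."""
    return window[-1] * 1.05


history = [1.0, 1.0, 1.0]
actuals = [1.0] * 5
on_predictions = rolling_forecast(biased_persistence, history, actuals, 3, False)
on_actuals = rolling_forecast(biased_persistence, history, actuals, 3, True)
```

With predictions fed back, the 5% bias compounds at every step, whereas with actual readings the error stays at 5% per step. This mirrors the gap between the red and blue lines discussed in the text.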

4.4 Model Retraining Interval

For any data-driven model, it is necessary to keep the model up to date by retraining it with fresh data. In particular, power consumption patterns are not fixed, so a trained model becomes obsolete over time, which lowers forecasting accuracy. Thus, the timing of retraining is a crucial tuning parameter in real-world operation: retraining is desired when degradation in prediction is noticed. This subsection empirically investigates an appropriate model retraining interval, i.e., how long a trained PowerNet model can be used with acceptable accuracy. It also provides insight into how often PowerNet should be retrained to capture new power demand characteristics that evolve over time.
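
One possible way to operationalize "retraining is desired when degradation in prediction is noticed" is to track a rolling MAPE over the most recent forecasts and flag retraining once it exceeds a tolerance. The sketch below is an illustration only: the class name `RetrainMonitor`, the default 24-sample window and the 15% tolerance are assumptions chosen for the example, not settings used in the paper.

```python
from collections import deque


class RetrainMonitor:
    """Tracks a rolling MAPE and signals when retraining looks warranted."""

    def __init__(self, window: int = 24, tolerance: float = 0.15) -> None:
        # Keep only the most recent `window` absolute percentage errors.
        self.errors = deque(maxlen=window)
        self.tolerance = tolerance

    def update(self, actual: float, predicted: float) -> bool:
        """Record one (actual, predicted) pair; return True if retraining is due.

        Assumes `actual` is non-zero. No alarm is raised until the window
        is full, so a single early outlier does not trigger retraining.
        """
        self.errors.append(abs(actual - predicted) / abs(actual))
        window_full = len(self.errors) == self.errors.maxlen
        rolling_mape = sum(self.errors) / len(self.errors)
        return window_full and rolling_mape > self.tolerance
```

In use, `update` would be called each time a new meter reading arrives for a time slot that was previously forecast; the tolerance would be chosen from the error budget of the downstream task.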
Fig. 8: Forecasting MAPE using predicted and actual values

order to understand the error quantitatively, we plot MAPE in Fig. 8 in red. We can see from the MAPE plot that the error increases as it goes farther into the future. Specifically, before 24 on the x-axis, the MAPE is at a low level, less than 10%. Thereafter, the MAPE rises to a regional peak of 18% at 52 on the x-axis. Subsequently, the MAPE declines a bit to 16% and maintains the value till 550 on the x-axis, the point from which the error increases sharply. Given the experimental results, we can infer that the model is suitable for forecasting in the day-ahead bidding task and day-ahead electricity scheduling.

This experiment is different from the previous experiment in Section 4.3. The experiment in Section 4.3 focuses on exploring the accuracy fluctuation caused by different lengths of forecasting periods, and it forecasts the power demand for a period at once based on the data on hand at that moment. In contrast, the experiment in this section uses actual data, which eliminates the error accumulation caused by forecasting using estimated feature values. We use the model trained in Section 4.3 and test it using the actual data in August.

The results are shown in Fig. 7 using the blue line. Generally, the prediction based on actual values (the blue line) is better than the prediction based on predicted values (the red line), which is reasonable and in line with the expectation. From the MAPE plot in Fig. 8, the same conclusion can be drawn. We can see that both ‘prediction (use ground-truth)’ and ‘prediction (use estimation)’ show almost

same errors till 15 hours on the x-axis. The latter digresses significantly afterwards. The ‘prediction (use ground-truth)’, i.e., the blue line, maintains itself around 10% MAPE at 36 hours and around 11% till 550 on the x-axis. At the very end, it reaches the largest error of around 13%. In practice, depending on the error tolerance of the prediction task, we can adjust our model by re-training the model with new data. For example, we can re-train the model every 36 hours to capture the new characteristics of the data generated during the 36 hours. Generally, the model can maintain an MAPE of around 11% for more than 3 weeks in the future prediction horizon.

Fig. 9: The MAPE predictions over different electricity theft scenarios characterized by the theft percentage from 10% to 90%. (x-axis: electricity theft percentage; y-axis: MAPE)

Fig. 10: The MAPE predictions over different electricity theft scenarios characterized by the theft percentage from 10% to 30%. (x-axis: electricity theft percentage; y-axis: MAPE)

5 PowerNet for Anomaly Detection

Anomaly detection aims to identify patterns in data that do not conform to the defined normal behavior [24]. Anomaly detection in smart grids focuses on the non-technical loss, which is not caused by the intrinsic loss (technical loss, e.g., transmission loss) in a power system. Electricity theft is the most studied non-technical loss that causes anomalies. Data-driven anomaly detection can be done by modeling the normal consumption behavior and defining a normal region. Any consumption that does not fall within the normal region is considered an anomaly and potentially indicates a problem in the smart grid. The forecasting results from PowerNet can be interpreted differently depending on the task, e.g., as the power demand at some time in the future or as the expected normal consumption at that time. In the latter sense, PowerNet can be used to define the normal consumption behavior based on which further anomaly detection can be carried out.

Normally, for a consumer $u$, the reported consumption $M_r$ should be roughly equal to the actual consumption $M_u$. However, an attacker may be able to manipulate $M_r$, aiming at reducing the electricity bill by making $M_r < M_u$. We conduct a preliminary experiment to understand the performance of PowerNet in capturing electricity theft. We here consider the ‘forecasting using predicted values’ approach and evaluate the deviation from the prediction as the criterion for detection. This is a reasonable design, since it prevents manipulated consumption data from affecting the prediction.

We artificially reduce the power consumption by different theft percentages in the test data to simulate different electricity theft scenarios. Fig. 9 shows the forecasting MAPE results under different theft percentages and Fig. 10 magnifies the first 30% of the x-axis in Fig. 9. We can see from the magnified view (Fig. 10) that when theft is small, the MAPE grows linearly as the percentage of theft grows. However, from the experimental results in Fig. 9, we can see that the overall MAPE increases in an exponential manner. It means that the more the user steals, the larger the deviation between the predicted value $M_p$ and the reported value $M_r$ is. In other words, the more the user steals, the more obvious the deviation is. A reasonable threshold that would trigger an alarm can be inferred from the historical data as well as the tolerance of theft.

Anomaly detection can be deployed in both the substation layer and the individual consumer layer. We discuss how PowerNet can be utilized to detect such anomalies in both layers.

Anomaly detection in substation layer: At the substation level, there is a master meter which measures the aggregated consumption of the whole supply region. The reading of the master meter is denoted as $M_s$. So we have $M_s = \sum_{i=1}^{N_c} M_u^i + TL$, where $N_c$ is the number of consumers in the supply region and $TL$ is the technical loss. The substation can observe $M_r^i$, which is the reported consumption of consumer $i$. We can obtain $TL$ through $TL = M_s - \sum_{i=1}^{N_c} M_u^i$. In the normal case where $M_r^i = M_u^i$, we have $TL = M_s - \sum_{i=1}^{N_c} M_r^i$. In order to detect the anomaly where $M_r^i \neq M_u^i$, we use PowerNet to model the indirectly observed $TL_o$. In the attack case where $M_r^i \neq M_u^i$, a deviation would be observed between the predicted $TL_p$ and the observed $TL_o$. Hence, PowerNet is able to detect the anomaly under a substation supply region by constructing one model for one substation.

Anomaly detection in individual consumer layer: Anomaly detection at the substation level can detect the anomaly but cannot pinpoint which consumer is suspicious. At the individual consumer level, with the help of PowerNet, we can build a model for the consumer $u$ based on his/her historical $M_u$. Once the attacker reduces his/her $M_r$ to make $M_r \neq M_u$, we shall notice that there is a deviation between his/her $M_r$ and $M_p$, which is predicted by PowerNet. In this sense, anomaly detection at the individual consumer layer can complement anomaly detection in the substation layer, as it is able to locate the consumer who is suspiciously reporting false readings.

6 Related Work

The existing works on power demand forecasting can be generally classified into two categories: classic statistical models and modern machine learning algorithms.

In terms of statistical models, time-series models have been used to capture the time-series characteristics of power demand, e.g., ARMA [25][26], ARIMA [27][28][29]. Besides time-series models, Hong et al. [30] adopt multiple linear regression to model the hourly energy demand using seasonality (regarding year, week and day) and temperature information. Their results indicate that complex featuring of the same information results in a more accurate forecasting. Fan and Hyndman [31] use the semi-parametric additive model to explore the non-linear relationship between energy usage data and variables, i.e., calendar variables, consumption observations and temperatures in the short-term time period. Their model demonstrates sensitivity towards the temperature. In addition, conditional kernel density estimation is applied to the power demand forecasting area, which performs well on the data with strong seasonality [32]. However, these models have limitations in incorporating heterogeneous features in a unified way. In contrast, the design of PowerNet creates a neural network that can encode sequential features and single-value features simultaneously.

Regarding the machine-learning models, there are three models widely used for demand forecasting tasks, viz., Decision Tree (DT) [2][33][34], Support Vector Machines (SVM) [3][35][36][37] and Artificial Neural Network (ANN) [38][39][40][41]. DT is used to predict building energy demand levels [34] and analyze the electricity load level based on hourly observations of the electricity load and weather [33]. Later, Bansal et al. [2] use the boosted DT to model and forecast energy consumption so as to create personalized electricity plans for residential consumers based on the usage history. There are also works using SVR, the regression based on

SVM, etc., to forecast power consumption in combination with other techniques such as fuzzy-rough feature selection [37], particle swarm optimization algorithms [36] and chaotic artificial bee colony algorithm [35]. The SVR-based prediction has demonstrated good prediction results [3]. For the third model ANN, Gajowniczek and Zabkowski choose ANN because they believe that time-series analysis is not suitable for their work since they observe high volatility in the data [38]. Zufferey et al. [39] apply time delay neural network and find that the individual consumer’s consumption is harder to predict than an aggregation of multiple consumers. Recently, researchers take advantage of LSTM to forecast building energy load using historical consumption data [40]. Historical load data and ambient temperature are utilized to build a prediction model based on ANN in [41]. Cheng et al. [42] further manage to feed the concatenation of historical data and influence features as a sequential input to the LSTM network. Since they only use the LSTM network, all data are treated as sequential data. Short term demand forecasting using LSTM network based on historical load data and weather information has been proposed in [43]. In this article, historical data are input to the LSTM layer and output of the LSTM layer is combined with weather data which is the output of a fully connected neural network. A similar study is performed in [44] where categorical features like time of the day, holiday flag, etc., are incorporated in addition to the weather data to enhance prediction accuracy for short term load demand using LSTM network. Despite the extensive research carried out in the power demand forecasting area, to the best of our knowledge, there is no such neural network architecture that takes heterogeneous features into consideration to the extent that PowerNet does.

Another line of related work is the anomaly detection in smart grids for non-technical loss such as electricity theft. Bandim et al. [45] introduce an observer meter to observe the meter consumption of a set of users and further identify the tampered meter using the deterministic and statistical approach. Later, Krishna et al. [46] discuss the detection capability based on such extra meters on different attacks. Other than these, linear regression [47], cluster outlier [48][49] and SVM [50][51] are also used to detect the anomaly in smart grids. Biswas et al. [52] perform correlation analysis to pinpoint electricity thieves among a large pool of domestic consumers. Furthermore, Mashima et al. [26] evaluate the effectiveness of several anomaly detection models including the average detector, ARMA-GLR, and non-parametric statistics and Local Outlier Factor (LOF). In this work, we discuss how PowerNet can be used in multiple anomaly detection layers.

7 Conclusion

In this article, we propose PowerNet, a power demand forecasting model based on modern recurrent neural network and multilayer perceptron network, which is capable of incorporating heterogeneous influencing factors in a unified way. It demonstrates improvement in prediction accuracy compared to four state-of-the-art approaches, viz., GBT, SVR, RF and GRU. Further, evaluation under different settings with the real-world dataset is carried out to better understand the model capability and crucial operational considerations in practice, mainly the length of the forecasting period and the model retraining interval. Based on our evaluation results, PowerNet shows advantages in terms of prediction accuracy when the prediction is made for individual and aggregated consumption of a group of households, which is often challenging in practice. Finally, we briefly discuss the usability of PowerNet in anomaly detection tasks in smart metering processes.

Our potential future work includes the evaluation with other smart meter datasets, such as datasets of commercial/industrial electricity consumers. Moreover, development and evaluation of anomaly detection schemes based on PowerNet under more sophisticated attacker models is also an interesting research direction.

Acknowledgement

This research is supported by the National Research Foundation, Prime Minister’s Office, Singapore under the Energy Programme and administrated by the Energy Market Authority (EP Award No. NRF2017EWT-EP003-047 and NRF2014EWT-EIRP002-040). It is also in part supported by National Natural Science Foundation of China (Grant No. 61932011), Guangdong Provincial Key R&D Plan (Grant No. 202020022911500032).

8 References

1 Siano, P.: ‘Demand response and smart grids - a survey’, Renewable and Sustainable Energy Reviews, 2014, 30, pp. 461–478
2 Bansal, A., Rompikuntla, S.K., Gopinadhan, J., Kaur, A., Kazi, Z.A.: ‘Energy consumption forecasting for smart meters’, arXiv preprint arXiv:151205979, 2015
3 Yu, W., An, D., Griffith, D., Yang, Q., Xu, G.: ‘Towards statistical modeling and machine learning based energy usage forecasting in smart grid’, ACM SIGAPP Applied Computing Review, 2015, 15, (1), pp. 6–16
4 Lahouar, A., Slama, J.B.H.: ‘Day-ahead load forecast using random forest and expert input selection’, Energy Conversion and Management, 2015, 103, pp. 1040–1051
5 Moon, J., Kim, Y., Son, M., Hwang, E.: ‘Hybrid short-term load forecasting scheme using random forest and multilayer perceptron’, Energies, 2018, 11, (12), pp. 3283
6 Wang, Y., Liu, M., Bao, Z., Zhang, S.: ‘Short-term load forecasting with multi-source data using gated recurrent unit neural networks’, Energies, 2018, 11, (5), pp. 1138
7 Risinger, E.: ‘Umass smart* dataset - 2017 release’, 2017. http://traces.cs.umass.edu/index.php/Smart/Smart
8 Anderson, B., Lin, S., Newing, A., Bahaj, A., James, P.: ‘Electricity consumption and household characteristics: Implications for census-taking in a smart metered future’, Computers, Environment and Urban Systems, 2016
9 Mashima, D., Serikova, A., Cheng, Y., Chen, B.: ‘Towards quantitative evaluation of privacy protection schemes for electricity usage data sharing’, ICT Express, 2018, 4, (1), pp. 35–41
10 Hochreiter, S., Schmidhuber, J.: ‘Long short-term memory’, Neural computation, 1997, 9, (8), pp. 1735–1780
11 Hornik, K., Stinchcombe, M., White, H.: ‘Multilayer feedforward networks are universal approximators’, Neural networks, 1989, 2, (5), pp. 359–366
12 Greff, K., Srivastava, R.K., Koutník, J., Steunebrink, B.R., Schmidhuber, J.: ‘LSTM: A search space odyssey’, IEEE transactions on neural networks and learning systems, 2016
13 Gers, F.A., Schmidhuber, J., Cummins, F.: ‘Learning to forget: Continual prediction with LSTM’, Neural computation, 2000, 12, (10), pp. 2451–2471
14 Mikolov, T., Karafiát, M., Burget, L., Cernockỳ, J., Khudanpur, S. ‘Recurrent neural network based language model.’. In: Interspeech. vol. 2, 2010. p. 3
15 Sutskever, I., Vinyals, O., Le, Q.V. ‘Sequence to sequence learning with neural networks’. In: Advances in neural information processing systems, 2014. pp. 3104–3112
16 Nair, V., Hinton, G.E. ‘Rectified linear units improve restricted boltzmann machines’. In: Proceedings of the 27th international conference on machine learning (ICML-10), 2010. pp. 807–814
17 Srivastava, N., Hinton, G.E., Krizhevsky, A., Sutskever, I., Salakhutdinov, R.: ‘Dropout: a simple way to prevent neural networks from overfitting.’, Journal of machine learning research, 2014, 15, (1), pp. 1929–1958
18 Kingma, D., Ba, J.: ‘Adam: A method for stochastic optimization’, arXiv preprint arXiv:14126980, 2014
19 Friedman, J.H.: ‘Greedy function approximation: a gradient boosting machine’, Annals of statistics, 2001, pp. 1189–1232
20 Friedman, J.H.: ‘Stochastic gradient boosting’, Computational Statistics & Data Analysis, 2002, 38, (4), pp. 367–378
21 Boser, B.E., Guyon, I.M., Vapnik, V.N. ‘A training algorithm for optimal margin classifiers’. In: Proceedings of the fifth annual workshop on Computational learning theory. ACM, 1992. pp. 144–152
22 Müller, K.R., Smola, A.J., Rätsch, G., Schölkopf, B., Kohlmorgen, J., Vapnik, V. ‘Predicting time series with support vector machines’. In: International Conference on Artificial Neural Networks. Springer, 1997. pp. 999–1004
23 Conejo, A.J., Plazas, M.A., Espinola, R., Molina, A.B.: ‘Day-ahead electricity price forecasting using the wavelet transform and arima models’, IEEE transactions on power systems, 2005, 20, (2), pp. 1035–1042
24 Chandola, V., Banerjee, A., Kumar, V.: ‘Anomaly detection: A survey’, ACM computing surveys (CSUR), 2009, 41, (3), pp. 15
25 Gross, G., Galiana, F.D.: ‘Short-term load forecasting’, Proceedings of the IEEE, 1987, 75, (12), pp. 1558–1573
26 Mashima, D., Cárdenas, A.A. ‘Evaluating electricity theft detectors in smart grid networks’. In: International Workshop on Recent Advances in Intrusion Detection. Springer, 2012. pp. 210–229
27 Alberg, D., Last, M. ‘Short-term load forecasting in smart meters with sliding window-based arima algorithms’. In: Asian Conference on Intelligent Information and Database Systems. Springer, 2017. pp. 299–307
28 Cho, M., Hwang, J., Chen, C. ‘Customer short term load forecasting by using arima transfer function model’. In: Energy Management and Power Delivery, 1995. Proceedings of EMPD’95., 1995 International Conference on. vol. 1. IEEE, 1995. pp. 317–322

29 Nepal, B., Yamaha, M., Yokoe, A., Yamaji, T.: ‘Electricity load forecasting using clustering and ARIMA model for energy management in buildings’, Japan Architectural Review, 2019
30 Hong, T., Gui, M., Baran, M.E., Willis, H.L. ‘Modeling and forecasting hourly electric load by multiple linear regression with interactions’. In: IEEE Power and Energy Society General Meeting. IEEE, 2010. pp. 1–8
31 Fan, S., Hyndman, R.J.: ‘Short-term load forecasting based on a semi-parametric additive model’, IEEE Transactions on Power Systems, 2012, 27, (1), pp. 134–141
32 Arora, S., Taylor, J.W.: ‘Forecasting electricity smart meter data using conditional kernel density estimation’, Omega, 2016, 59, pp. 47–59
33 Gładysz, B., Kuchta, D.: ‘Application of regression trees in the analysis of electricity load’, Badania Operacyjne i Decyzje, 2008, (4), pp. 19–28
34 Yu, Z., Haghighat, F., Fung, B.C., Yoshino, H.: ‘A decision tree method for building energy demand modeling’, Energy and Buildings, 2010, 42, (10), pp. 1637–1646
35 Hong, W.C.: ‘Electric load forecasting by seasonal recurrent SVR (support vector regression) with chaotic artificial bee colony algorithm’, Energy, 2011, 36, (9), pp. 5568–5578
36 Qiu, Z.: ‘Electricity consumption prediction based on data mining techniques with particle swarm optimization’, International Journal of Database Theory and Application, 2013, 6, (5), pp. 153–164
37 Son, H., Kim, C.: ‘Forecasting short-term electricity demand in residential sector based on support vector regression and fuzzy-rough feature selection with particle swarm optimization’, Procedia Engineering, 2015, 118, pp. 1162–1168
38 Gajowniczek, K., Ząbkowski, T.: ‘Short term electricity forecasting using individual smart meter data’, Procedia Computer Science, 2014, 35, pp. 589–597
39 Zufferey, T., Ulbig, A., Koch, S., Hug, G. ‘Forecasting of smart meter time series based on neural networks’. In: Workshop Data Analytics for Renewable Energy Integration (DARE), European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD), Riva del Guarda, 2016. pp. 19–23
40 Marino, D.L., Amarasinghe, K., Manic, M. ‘Building energy load forecasting using deep neural networks’. In: 42nd Annual Conference of the IEEE Industrial Electronics Society, IECON 2016, 2016. pp. 7046–7051
41 Fernandes, K.C., Sardinha, R., Rebelo, S., Singh, R. ‘Electric load analysis and forecasting using artificial neural networks’. In: 2019 3rd International Conference on Trends in Electronics and Informatics (ICOEI). IEEE, 2019. pp. 1274–1278
42 Cheng, Y., Xu, C., Mashima, D., Thing, V.L., Wu, Y. ‘PowerLSTM: Power demand forecasting using long short-term memory neural network’. In: International Conference on Advanced Data Mining and Applications. Springer, 2017. pp. 727–740
43 Kwon, B.S., Park, R.J., Song, K.B.: ‘Short-term load forecasting based on deep neural networks using LSTM layer’, Journal of Electrical Engineering and Technology, 2020
44 Hossain, M.S., Mahmood, H. ‘Short-term load forecasting using an LSTM neural network’. In: 2020 IEEE Power and Energy Conference at Illinois (PECI). IEEE, 2020. pp. 1–6
45 Bandim, C., Alves, J., Pinto, A., Souza, F., Loureiro, M., Magalhaes, C., et al. ‘Identification of energy theft and tampered meters using a central observer meter: a mathematical approach’. In: Transmission and Distribution Conference and Exposition, 2003 IEEE PES. vol. 1. IEEE, 2003. pp. 163–168
46 Krishna, V.B., Lee, K., Weaver, G.A., Iyer, R.K., Sanders, W.H. ‘F-deta: A framework for detecting electricity theft attacks in smart grids’. In: Dependable Systems and Networks (DSN), 2016 46th Annual IEEE/IFIP International Conference on. IEEE, 2016. pp. 407–418
47 Liu, X., Nielsen, P.S.: ‘Regression-based online anomaly detection for smart grid data’, arXiv preprint arXiv:160605781, 2016
48 Menon, D.M., Radhika, N. ‘Anomaly detection in smart grid traffic data for home area network’. In: International Conference on Circuit, Power and Computing Technologies (ICCPCT), 2016. IEEE, 2016. pp. 1–4
49 Chen, C., Cook, D.J.: ‘Energy outlier detection in smart environments.’, Artificial Intelligence and Smarter Living, 2011, 11, pp. 07
50 Nagi, J., Yap, K.S., Tiong, S.K., Ahmed, S.K., Mohamad, M.: ‘Nontechnical loss detection for metered customers in power utility using support vector machines’, IEEE transactions on Power Delivery, 2010, 25, (2), pp. 1162–1171
51 Jokar, P., Arianpoo, N., Leung, V.C.: ‘Electricity theft detection in ami using customers’ consumption patterns’, IEEE Transactions on Smart Grid, 2016, 7, (1), pp. 216–226
52 Biswas, P.P., Cai, H., Zhou, B., Chen, B., Mashima, D., Zheng, V.W.: ‘Electricity theft pinpointing through correlation analysis of master and individual meter readings’, IEEE Transactions on Smart Grid, 2019, pp. 1–1