Sales Forecasting Using Neural Networks
Frank M. Thiesing and Oliver Vornberger
Department of Mathematics and Computer Science
University of Osnabrück
D-49069 Osnabrück, Germany
frank@informat,[Link]
Abstract

Neural networks trained with the back-propagation algorithm are applied to predict the future values of time series that consist of the weekly demand on items in a supermarket. The influencing indicators of prices, advertising campaigns and holidays are taken into consideration. The design and implementation of a neural network forecasting system is described that has been developed as a prototype for the headquarters of a German supermarket company to support the management in the process of determining the expected sale figures. The performance of the networks is evaluated by comparing them to two prediction techniques used in the supermarket now. The comparison shows that neural nets outperform the conventional techniques with regard to the prediction quality.

0-7803-4122-8/97 $10.00 © 1997 IEEE

1. Introduction

A central problem in science is predicting the future of temporal sequences. Examples range from forecasting the weather to anticipating currency exchange rates. The desire to know the future is often the driving force behind the search for laws in science and economics.

In recent years many sophisticated statistical methods have been developed and applied to forecasting problems [8]; however, there are two major drawbacks to these methods. First, for each problem an individual statistical model has to be chosen that makes some assumptions about underlying trends. Second, the power of deterministic data analysis can be exploited for single time series with some hidden regularity (though strange and hard to see, but existent); however, this approach fails for multidimensional time series with mutual non-linear dependencies.

As an answer to the weakness of statistical methods in forecasting multidimensional time series, an alternative approach gains increasing attraction: neural networks [5]. The practicability of using neural networks for economic forecasting has already been demonstrated in a variety of applications, such as stock market and currency exchange rate prediction, market analysis and forecasting time series of political economy [6, 3, 1, 2].

The approaches are based on the idea of training a feed-forward multi-layer network by a supervised training algorithm in order to generalize the mapping between the input and output data, to discover the implicit rules governing the movement of the time series, and to predict its continuation in the future. Most of the proposals deal with one or only a few time series.

In this paper, neural networks trained with the back-propagation algorithm [7] are applied to predict the future values of 20 time series that consist of the weekly demand on items in a German supermarket. An appropriate network architecture will be presented for a mixture of both explanatory and time series forecasting. Unlike many other neural prediction approaches described in the literature, we compare the forecasting quality of the neural network to two prediction techniques currently used in the supermarket. This comparison shows that our approach produces good results.

2. Time Series Considered

The time series used in this paper consist of the sales information of 20 items in a product group of a supermarket. The information about the number of items sold and the sales revenue is on a weekly basis starting in September 1994. There are important influences on the sales that should be taken into consideration: advertising campaigns, sometimes combined with temporary price reductions; holidays shorten the opening hours; the season has an effect on the sales of the considered items.
We take the sales information, prices and advertising campaigns from the cash registers and the marketing team of the supermarket. The holidays are calculated. For the season information we use the time series of the turnover sum in DM of all items of this product group as an indicator. Its behavior over a term of 19 months is shown in figure 1.

Figure 1. Turnover sum in DM of the product group, September 1994 to March 1996 [weekly turnover plot, weeks 36/1994 to 13/1996, values roughly between 1000 and 3000 DM]

We use feed-forward multilayer perceptron (MLP) networks with one hidden layer together with the back-propagation training method. In order to predict the future sales, the past information of n recent weeks is given in the input layer. The only result in the output layer is the sale for the next week.

Due to the purchasing system used in the supermarket there is a gap of one week between the newest sale value and the forecasted week. In addition, the pricing information, advertising campaigns and holidays are already known for the future when the forecast is calculated. This information is also given to the input layer as shown in figure 2.

Figure 2. Input and output of the MLP [diagram: a sliding window of the n = 2 most recent sale values plus the turnover, price, advertising and holiday indicators for past, present and future weeks]

3. Preprocessing the Input Data

An efficient preprocessing of the data is necessary to input it into the net. In general it is better to transform the raw time series data into indicators that represent the underlying information more explicitly. Due to the sigmoidal activation function of the back-propagation algorithm the sales information must be scaled to ]0, 1[. The scaling is necessary to support the back-propagation learning algorithm [4]. We tested several scalings (z_t) for the sale and the turnover time series x = (x_t):

  z_t = (x_t - min(x)) / (max(x) - min(x)) · 0.8 + 0.1,   resp.

  z_t = (x_t - μ) / (c · σ) + 0.5

where min and max are the minimum and maximum values of time series x, and μ and σ are the average and the standard deviation. c is a factor to control the interval of the values.

For the prices the most effective indicator is the price change. So the prices are modeled as follows:

  pri_t :=  0.9 if the price increases within week t,
            0.0 if the price stays equal,
           -0.9 if the price decreases.

For both the time series of holidays and advertising campaigns we tested binary coding and linear aggregation to make them weekly. Their indicators are:

  y_t := 0.9 if there is a holiday resp. advertising within week t, 0.0 otherwise,   resp.

  y_t := (number of advertising days resp. holidays within week t) / 6

(normalized number of special days within week t).
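As an illustration, the scalings and indicator codings above can be collected into a few numpy helper functions. This is a minimal sketch; the function names and the example value c = 4 are our own choices for the illustration, not prescribed here:

```python
import numpy as np

def minmax_scale(x):
    """First scaling: z_t = (x_t - min(x)) / (max(x) - min(x)) * 0.8 + 0.1,
    mapping the series into [0.1, 0.9] inside ]0, 1[."""
    x = np.asarray(x, dtype=float)
    return (x - x.min()) / (x.max() - x.min()) * 0.8 + 0.1

def zscore_scale(x, c=4.0):
    """Second scaling: z_t = (x_t - mu) / (c * sigma) + 0.5.
    The factor c (here 4.0, an assumed example value) controls the interval."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / (c * x.std()) + 0.5

def price_indicator(prices):
    """pri_t: 0.9 on a price increase, -0.9 on a decrease, 0.0 otherwise."""
    d = np.diff(np.asarray(prices, dtype=float), prepend=prices[0])
    return np.where(d > 0, 0.9, np.where(d < 0, -0.9, 0.0))

def binary_indicator(counts):
    """Binary coding y_t: 0.9 if any holiday resp. advertising day
    falls into week t, else 0.0."""
    return np.where(np.asarray(counts) > 0, 0.9, 0.0)

def aggregated_indicator(counts):
    """Linear aggregation y_t: number of special days in week t divided by 6."""
    return np.asarray(counts, dtype=float) / 6.0
```

Either `binary_indicator` or `aggregated_indicator` would be applied to the holiday and advertising series, matching the two codings compared in section 4.3.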
4. Experimental Results

To determine the appropriate configuration of the feed-forward MLP network several parameters have been varied:

- modeling of the input time series
- width of the sliding time window
- number of hidden neurons
- interval of the initial random weights
- training rate and momentum
- number and selection of validation patterns
- dealing with overfitting

To evaluate the efficiency of the neural network approach several tests have been performed. We compare the prediction error of a naive ("Naive") and a statistical prediction method ("MovAvg") to the successive prediction by neural networks ("Neural").

4.1. Naive Prediction

The naive prediction method uses the last known value of the time series of sales as the forecast value for the future. In our terms: x̂_{t+1} := x_{t-1}. This forecasting method is often used by the supermarket's personnel.

4.2. Statistical Prediction

The statistical method is currently being used by the supermarket's headquarters to forecast sales and to guide personnel responsible for purchasing. It calculates the moving average of a maximum of nine recent weeks, after these sale values have been filtered from exceptions and smoothed.

4.3. Neural Prediction

We reached good results for n = 2 recent values of the sale time series in the sliding window. The other inputs are, one neuron each: both the difference of the sale (x'_t = x_t - x_{t-1}) and the turnover of the whole group of items for the last week, and the holiday, advertising and pricing information for the week to be predicted.

Thus, for each item a net with 7 input neurons and 4 hidden neurons is trained for a one-week-ahead forecast with a gap of one week. We reached better results with the binary scaling for the holiday and advertising time series and the μ-σ scaling for sales. The learning rate of the back-propagation algorithm was set to 0.3 with a momentum of 0.1. The initial weights were chosen from [-0.5, 0.5] by chance. The training was validated by 12 patterns and stopped at the minimum error.

4.4. Comparison of Prediction Techniques

The results of the forecasting accuracy are calculated for the successive prediction of the 22 weeks 44/1995 to 13/1996. In these weeks there are influences of many campaigns and Christmas holidays.

To measure the error, the root mean squared error (RMSE) is divided by the mean value (Mean) of the time series. This is done for all the 20 items and the average value is shown in table 1. Theil's U-statistic is calculated as well.

Table 1. Measuring forecasting accuracy, average of 20 items

  Accuracy Measure | Neural | MovAvg | Naive
  RMSE/Mean        |  0.84  |  1.01  |  1.16

Based on the information in table 1 the naive approach is outperformed by the two other methods. For 18 of the 20 items the prediction by the neural network is better than the statistical prediction method.

A close inspection of the time series favored by the statistical approach shows that these are very noisy, without any implicit rules that could be learned by the neural network. Especially one of these items has an average weekly sale of less than 4 items.

Figure 3 shows the predicted values for item 468978 calculated by the statistical and neural approach. The price, advertising and holiday information is included in this figure.

5. Conclusions and Future Research

For a special group of items in a German supermarket, neural nets have been trained to forecast future demands on the basis of the past data augmented with further influences like price changing, advertising campaigns and holiday season information. The experimental results show that neural nets outperform the naive and statistical approaches that are currently being used in the supermarket.
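For illustration, the training configuration reported in section 4.3 (7 input and 4 hidden neurons, one sigmoid output, learning rate 0.3, momentum 0.1, initial weights from [-0.5, 0.5]) can be sketched as a from-scratch on-line back-propagation implementation in numpy. The class layout and the squared-error delta rule are assumptions of this sketch; early stopping on the 12 validation patterns is omitted:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

class MLP:
    """7-4-1 feed-forward net trained by on-line back-propagation
    with momentum, mirroring the configuration in section 4.3."""

    def __init__(self, n_in=7, n_hid=4, rng=None):
        if rng is None:
            rng = np.random.default_rng(0)
        # initial weights drawn uniformly from [-0.5, 0.5]
        self.W1 = rng.uniform(-0.5, 0.5, (n_hid, n_in))
        self.b1 = rng.uniform(-0.5, 0.5, n_hid)
        self.W2 = rng.uniform(-0.5, 0.5, n_hid)
        self.b2 = rng.uniform(-0.5, 0.5)
        # momentum buffers
        self.vW1 = np.zeros_like(self.W1); self.vb1 = np.zeros_like(self.b1)
        self.vW2 = np.zeros_like(self.W2); self.vb2 = 0.0

    def forward(self, x):
        self.h = sigmoid(self.W1 @ x + self.b1)   # hidden activations
        self.y = sigmoid(self.W2 @ self.h + self.b2)  # scalar output
        return self.y

    def train_step(self, x, target, lr=0.3, momentum=0.1):
        y = self.forward(x)
        # deltas for squared error with sigmoid units
        d_out = (y - target) * y * (1.0 - y)
        d_hid = d_out * self.W2 * self.h * (1.0 - self.h)
        # momentum update: v <- momentum * v - lr * gradient
        self.vW2 = momentum * self.vW2 - lr * d_out * self.h
        self.vb2 = momentum * self.vb2 - lr * d_out
        self.vW1 = momentum * self.vW1 - lr * np.outer(d_hid, x)
        self.vb1 = momentum * self.vb1 - lr * d_hid
        self.W2 += self.vW2; self.b2 += self.vb2
        self.W1 += self.vW1; self.b1 += self.vb1
        return 0.5 * (y - target) ** 2
```

In use, each weekly input pattern of the sliding window would be presented in turn, with training stopped at the minimum error on the validation patterns.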
[Plot for item 468978, weeks 13/1995 to 13/1996: price in DM and sale in pieces, with the statistical and neural network sale predictions, advertising campaigns and holidays marked]
Figure 3. Comparison of sale prediction for an item by statistical and neural approach
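The two reference methods and the accuracy measure underlying this comparison can be sketched as follows. The array indexing convention is ours, and the exception filtering and smoothing of the statistical method are deliberately omitted as a simplification:

```python
import numpy as np

def naive_forecast(sales, t):
    """Naive method (4.1): because of the one-week gap, the last known
    value when forecasting week t+1 is x_{t-1}."""
    return sales[t - 1]

def moving_average_forecast(sales, t, max_weeks=9):
    """Simplified statistical method (4.2): mean of at most nine recent
    known weekly sales (without the exception filtering and smoothing)."""
    window = sales[max(0, t - max_weeks):t]
    return float(np.mean(window))

def rmse_over_mean(actual, predicted):
    """Accuracy measure of 4.4: RMSE divided by the mean of the series."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    rmse = np.sqrt(np.mean((actual - predicted) ** 2))
    return rmse / actual.mean()
```

Applying `rmse_over_mean` per item and averaging over the 20 items reproduces the kind of figures shown in table 1.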
Our procedure preprocesses the data of all kinds of time series in the same manner and uses the same network architecture for the prediction of all 20 time series of sales. The parameter optimization is based on all of the time series instead of on one special item.

The program runs as a prototype and handles only a small subset of the supermarket's inventory. Future work will concentrate on the integration of our forecasting tool into the whole enterprise data flow process. Since a huge number of varying products have to be managed, a selection process has to be installed that discriminates between steady time series suitable for conventional methods and chaotic candidates which will be processed by neural nets.

The prototype is part of a forecasting system that is able to take the raw data, do the necessary preprocessing, train the nets and produce an appropriate forecast. The next steps will be the development of additional adaptive transformation techniques and methods to test the significance of inputs, which can be used to reduce the complexity of the nets.

References

[1] K. Chakraborty, K. Mehrotra, C. Mohan, and S. Ranka. Forecasting the behaviour of multivariate time series using neural networks. Neural Networks, 5:961-970, 1992.
[2] B. Freisleben and K. Ripper. Economic forecasting using neural networks. In Proceedings of the 1995 IEEE International Conference on Neural Networks, volume 2, pages 833-838, Perth, W.A., 1995. IEEE.
[3] A. Refenes, M. Azema-Barac, L. Chen, and S. A. Karoussos. Currency exchange rate prediction and neural network design strategies. Neural Computing & Applications, 1(1):46-58, 1993.
[4] H. Rehkugler and H. G. Zimmermann. Neuronale Netze in der Ökonomie (in German; Neural Networks in Economics). Verlag Vahlen, München, 1994.
[5] R. Rojas. Neural Nets. Springer, 1996.
[6] E. Schoneburg. Stock price prediction using neural networks: An empirical test. Neurocomputing, 2(1), 1991.
[7] V. R. Vemuri and R. D. Rogers. Artificial Neural Networks - Forecasting Time Series. IEEE Computer Society Press, 1994.
[8] A. S. Weigend and N. A. Gershenfeld. Time Series Prediction: Forecasting the Future and Understanding the Past. Addison-Wesley, 1994.