Hidden Markov Models for Stock Prediction
Abstract—The stock market presents a challenging environment for accurately predicting future stock prices due to its intricate and ever-changing nature. However, the utilization of advanced methodologies can significantly enhance the precision of stock price predictions. One such method is Hidden Markov Models (HMMs). HMMs are statistical models that can be used to model the behavior of a partially observable system, making them suitable for modeling stock prices based on historical data.

... complex mathematical models and statistical techniques. It is estimated that 50 percent of stock trading volume in the U.S. is currently being driven by algorithmic trading [3].

While algorithmic trading offers significant advantages for institutional investors, retail investors should exercise caution when relying solely on algorithmic strategies. The success of ...
Hybrid approaches combining neural networks with Genetic Algorithms have been utilized to detect temporal patterns [17], [18]. Bollen et al. scraped data from Twitter to forecast the stock market based on user mood in 2011 [19]. In 2005, Hassan et al. employed Hidden Markov Models for time series prediction [20], building upon the concepts introduced in Rabiner's influential tutorial published in 1989 [21]. Their work demonstrated the effectiveness of HMMs, leading to their widespread adoption in subsequent studies [22], [23], [24].

In this project, we aimed to replicate and extend the findings reported by Gupta et al. in their 2012 study [25]. We used the MathWorks Statistics and Machine Learning Toolbox™. Moreover, we decided to make our project open-source [26], hoping to foster collaboration and encourage the research community to build upon our work, replicate our experiments, and explore further enhancements to advance the field.

III. APPROACH

A. Markov Chains

A Markov chain is a stochastic model that represents a sequence of events or states. In our analysis, we will specifically focus on first-order Markov chains, which adhere to the Markov property. Let S = {S_1, S_2, ..., S_n} be the set of all possible states, and X = {X_k | X_k ∈ S, k = 1, ..., T} be the time series of states. The Markov property states that, for any k ≥ 0 and states X_0, ..., X_k:

    P(X_{k+1} = S_j | X_k = S_i, X_{k-1}, ..., X_0) = P(X_{k+1} = S_j | X_k = S_i)        (1)
Let x_k ∈ R^n be the vector containing the probabilities of being in each state at time k; the system evolves according to:

    x_{k+1}^T = x_k^T A        (2)

We refer to the initial distribution of probabilities as π = {π_1, π_2, ..., π_n}.

B. Hidden Markov Models

The model presented above implicitly assumes that each state corresponds to an observable (physical) event. Markov chains have proven to be valuable tools for modeling sequential data in various domains. However, many real-world scenarios involve underlying states that influence observations but are not directly observable. This limitation led to the development of Hidden Markov Models, which extend the basic Markov chain model by introducing hidden or unobservable states that affect the observed data.

The hidden state process of an HMM is a Markov chain, where each state generates an observation having a certain probability distribution that depends only on the state itself. Let O = {O_k | O_k ∈ 𝒪, k = 1, ..., T} be the observed sequence, where 𝒪 is the set of possible observations. The hidden states are assumed to evolve according to Equation 2.

The probability of symbol o ∈ 𝒪 being emitted by state S_i is described by the emission probability function b_i : o ↦ b_i(o) ∈ [0, 1], ∀i.

Both in the current and following sections, we refer to the emission probability matrix B = {b_ij = b_i(o_j)}.

Fig. 2. Hidden Markov Chain example with two states and two possible outputs.
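To make these definitions concrete, the following MATLAB sketch builds a toy two-state model with two possible outputs, in the spirit of Fig. 2. The numerical values of A, B and the initial distribution pi0 are arbitrary illustrations, not values used in our experiments; hmmgenerate is the toolbox routine for sampling a sequence from such a model (note that it assumes the chain starts from state 1).

    % Toy two-state HMM with two possible outputs (values are illustrative only)
    A   = [0.7 0.3;    % a_ij = P(X_{k+1} = S_j | X_k = S_i)
           0.4 0.6];
    B   = [0.9 0.1;    % b_i(o_j): emission probabilities of each state
           0.2 0.8];
    pi0 = [0.5 0.5];   % initial state distribution

    % Propagate the state distribution, x_{k+1}^T = x_k^T * A (Equation 2)
    x = pi0;
    for k = 1:5
        x = x * A;
    end
    disp(x)            % probabilities of being in each state after five steps

    % Sample an observation/state sequence from the model
    [seq, states] = hmmgenerate(10, A, B);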
1) Evaluation Problem: Given the model parameters π, A and B and an observed sequence O, the evaluation problem consists of computing the probability P(O|π, A, B) of the sequence under the model.

The procedure used to solve this problem involves computing the forward variable, denoted as α_k(i), which represents the probability of being in state S_i at time k and observing the sequence up to time k. It is computed recursively for each time step k and state S_i.

At the initialization step, the forward variable for the initial time step k = 1 is calculated as α_1(i) = π_i · b_i(O_1), ∀i, where π_i is the initial probability of being in state S_i and b_i(O_1) is the emission probability for the first observation O_1.

For each subsequent time step k > 1, the forward variable α_k(i) is computed as the sum of the probabilities of all possible paths that could have generated the observations up to time k and reached state S_i. This is expressed by the following formula:

    α_k(i) = Σ_{j=1}^{n} α_{k-1}(j) · a_{ji} · b_i(O_k),  ∀i        (3)

By recursively calculating the forward variables for each time step, the algorithm evaluates the probability of observing the entire sequence, given the HMM parameters. Finally, to obtain the overall probability of the observed sequence, the forward procedure computes the sum of the forward variables at the final time step T:

    P(O|π, A, B) = Σ_{i=1}^{n} α_T(i)        (4)
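A direct MATLAB transcription of Equations 3 and 4 is sketched below. The function name and variable layout are our own illustrative choices; O is assumed to be a vector of observation symbol indices.

    function p = forward_probability(O, A, B, pi0)
    % O: 1xT vector of observation indices; A: nxn transition matrix
    % B: nxM emission matrix, B(i,o) = b_i(o); pi0: 1xn initial distribution
        n = size(A, 1);
        T = numel(O);
        alpha = zeros(T, n);
        alpha(1, :) = pi0 .* B(:, O(1))';        % initialization: alpha_1(i) = pi_i * b_i(O_1)
        for k = 2:T
            alpha(k, :) = (alpha(k-1, :) * A) .* B(:, O(k))';   % Equation 3
        end
        p = sum(alpha(T, :));                    % Equation 4: P(O | pi, A, B)
    end

For example, p = forward_probability(seq, A, B, pi0) evaluates the likelihood of the toy sequence sampled above.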
2) Decoding Problem: The decoding problem involves determining the most likely sequence of hidden states given an observation sequence and the model. Given an HMM with the parameters π, A, and B, and an observed sequence O, we want to find the hidden state sequence X that maximizes P(X|O, π, A, B).

The decoding problem is commonly addressed using the Viterbi algorithm. The algorithm works by iteratively calculating the most likely path to each state at each time step, considering both the current observation and the previous states' probabilities. At each time step, it computes the probability of being in each state and tracks the most probable path leading to that state.
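The recursion itself is not spelled out above, so the following is a generic Viterbi sketch consistent with that description; the toolbox function hmmviterbi provides an equivalent, production-ready implementation.

    function path = viterbi_path(O, A, B, pi0)
    % Most likely hidden state sequence (illustrative sketch of the Viterbi algorithm)
        n = size(A, 1);
        T = numel(O);
        delta = zeros(T, n);    % best path probability ending in each state
        psi   = zeros(T, n);    % back-pointers to the previous state on that path
        delta(1, :) = pi0 .* B(:, O(1))';
        for k = 2:T
            for j = 1:n
                [delta(k, j), psi(k, j)] = max(delta(k-1, :) .* A(:, j)');
                delta(k, j) = delta(k, j) * B(j, O(k));
            end
        end
        % Backtrack from the most probable final state
        path = zeros(1, T);
        [~, path(T)] = max(delta(T, :));
        for k = T-1:-1:1
            path(k) = psi(k+1, path(k+1));
        end
    end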
3) Parameter Estimation Problem: The parameter estimation problem in HMMs involves adjusting the model parameters to maximize the probability of an observed sequence. Given an observation sequence O, we want to estimate the optimal values for the parameters π, A, and B that maximize P(O|π, A, B).

The solution to this problem is typically addressed using the forward-backward algorithm, also known as the Baum-Welch algorithm. The Baum-Welch algorithm is a specific implementation of the Expectation-Maximization (EM) algorithm, tailored for HMMs. It iteratively performs three main steps: a forward, a backward, and an update step. In the forward step, the algorithm computes the probability of being in a specific state at each time step, given the observed sequence up to that point. This is the evaluation problem explained before.

In the backward step, the algorithm computes the probability of observing the remaining part of the sequence from a given state at each time step. It complements the forward procedure by computing the backward variable, denoted as β_k(i), which represents the probability of being in state S_i at time k and observing the remaining sequence from time k + 1 to the end. The backward variable is computed recursively for each time step k and state S_i. At the initialization step, the backward variable for the final time step is set as β_T(i) = 1, ∀i.

For each time step k < T, the backward variable β_k(i) is computed as the sum of the probabilities of all possible paths that can be generated from state S_i at time k to the end of the sequence. This is expressed by the following formula:

    β_k(i) = Σ_{j=1}^{n} β_{k+1}(j) · a_{ij} · b_j(O_{k+1})        (5)

After computing the forward and backward variables, they are used to refine the parameters of the HMM based on the observed sequence. In fact, let ξ_k(i, j) be the probability of being in state S_i at time k and state S_j at time k + 1, given the model and the observation sequence. It can be computed as:

    ξ_k(i, j) = P(X_k = i, X_{k+1} = j, O | π, A, B) / P(O | π, A, B)
              = [α_k(i) · a_{ij} · b_j(O_{k+1}) · β_{k+1}(j)] / [Σ_{r=1}^{n} Σ_{h=1}^{n} α_k(r) · a_{rh} · b_h(O_{k+1}) · β_{k+1}(h)]        (6)

Let us also define γ_k(i) as the probability of being in state S_i at time k, given the observation sequence and the model:

    γ_k(i) = P(X_k = i | O, π, A, B) = [α_k(i) · β_k(i)] / [Σ_{j=1}^{n} α_k(j) · β_k(j)]        (7)

We can relate those quantities:

    γ_k(i) = Σ_{j=1}^{n} ξ_k(i, j)        (8)

Moreover, Σ_{k=1}^{T-1} γ_k(i) represents the expected number of transitions from state S_i, and Σ_{k=1}^{T-1} ξ_k(i, j) represents the expected number of transitions from state S_i to state S_j.

The update step involves estimating the new values for the initial state probabilities, transition probabilities, and emission probabilities. For the initial state probabilities, the updated values are calculated as the normalized forward-backward probabilities at the initial time step:

    π_i = γ_1(i)        (9)

The transition probabilities are updated based on the expected number of transitions between states, normalized over the expected number of transitions from each state:

    a_{ij} = Σ_{k=1}^{T-1} ξ_k(i, j) / Σ_{k=1}^{T-1} γ_k(i)        (10)

The estimate for the emission probability b_j(o), the probability of observing symbol o from state S_j, is calculated by evaluating the expected number of times in which o is emitted from state S_j.
This value is then normalized by the expected number of times the model is in state S_j:

    b_j(o) = Σ_{k=1}^{T} γ_k(j) · δ(O_k, o) / Σ_{k=1}^{T} γ_k(j)        (11)

where δ(O_k, o) is the Kronecker delta function that evaluates to 1 when O_k is equal to the observed symbol o, and 0 otherwise.

This process iteratively refines the model parameters until convergence, i.e., until they stabilize and the likelihood of the sequence is maximized. Choosing the initial conditions wisely is crucial, as it can significantly impact the estimation of the HMM parameters. By selecting appropriate initial conditions, the Baum-Welch algorithm is more likely to converge to parameters that provide accurate modeling of the underlying dynamics.
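Written out in MATLAB, the re-estimation step (Equations 9-11) could look as follows. This is an illustrative sketch assuming gamma and xi have already been computed from the forward and backward variables for a single observation sequence; B here denotes the current emission matrix estimate, and all variable names are our own.

    % Re-estimation step of the Baum-Welch algorithm (Equations 9-11).
    % gamma: T x n, xi: (T-1) x n x n, O: 1 x T observation symbol indices.
    n = size(gamma, 2);
    M = size(B, 2);                    % number of possible emission symbols
    T = numel(O);
    piNew = gamma(1, :);                                           % Equation 9
    ANew  = squeeze(sum(xi, 1)) ./ sum(gamma(1:T-1, :), 1)';       % Equation 10
    BNew  = zeros(n, M);
    for o = 1:M
        BNew(:, o) = sum(gamma(O == o, :), 1)' ./ sum(gamma, 1)';  % Equation 11
    end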
The following MATLAB functions were utilized for the training and testing of the HMM: hmmtrain, hmmdecode and fitgmdist.

The model was trained and tested on the historical daily prices of Apple, IBM and Dell stocks that are publicly available on the web. Each observation in our dataset comprises three distinct values representing the daily fractional change, fractional high, and fractional low prices:

    O_k = ( (Close − Open)/Open , (High − Open)/Open , (Open − Low)/Open )

    O_k := (fracChange, fracHigh, fracLow)        (13)

The array O_k is a three-dimensional array consisting of real values. Since the probability of guessing any real value is mathematically zero, it was necessary to discretize the observations. The number of points used for the discretization is specified in Table I.
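As an illustration of this preprocessing, the sketch below computes the three fractional features from OHLC price vectors and maps them to a finite symbol set with discretize. The grid sizes and feature ranges shown are assumptions made for the example; the values actually used are those reported in Table I.

    % Compute the three observation features from daily OHLC price vectors and
    % map them onto a finite grid (illustrative grid sizes and ranges).
    fracChange = (Close - Open) ./ Open;
    fracHigh   = (High  - Open) ./ Open;
    fracLow    = (Open  - Low)  ./ Open;

    nChange = 50; nHigh = 10; nLow = 10;          % assumed discretization points
    edgesC = linspace(-0.1, 0.1, nChange + 1);    % assumed feature ranges;
    edgesH = linspace(0, 0.1, nHigh + 1);         % out-of-range values would
    edgesL = linspace(0, 0.1, nLow + 1);          % need clamping in practice

    iC = discretize(fracChange, edgesC);
    iH = discretize(fracHigh,   edgesH);
    iL = discretize(fracLow,    edgesL);

    % Combine the three indices into one emission symbol in 1..nChange*nHigh*nLow
    symbol = sub2ind([nChange nHigh nLow], iC, iH, iL);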
The probability density function obtained from fitgmdist served as the initial estimate for the emission matrix. Moreover, we assigned a uniform distribution of probabilities as the initial estimate for the transition matrix.

The training data for the HMM is constructed by using a rolling window approach. In this approach, each observation sequence spans a fixed duration of 10 days; we refer to this duration as latency. The window is shifted incrementally along the training period: the first sequence captures observations from the initial time point, while each following sequence incorporates new observations by sliding the window by one day. The dataset is then provided as input to the MATLAB function hmmtrain. This function utilizes the Baum-Welch algorithm to estimate the transition and emission matrices. These estimations are initialized with the initial guesses discussed previously.
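A minimal sketch of this rolling-window construction and of the call to hmmtrain is given below; symbol is the discretized observation sequence from the previous step, and trGuess and emitGuess stand for the initial transition and emission estimates discussed above (names are our own).

    % Build 10-day observation sequences with a rolling window and train the HMM.
    latency = 10;
    nSeq = numel(symbol) - latency + 1;
    seqs = zeros(nSeq, latency);
    for s = 1:nSeq
        seqs(s, :) = symbol(s : s + latency - 1);   % window slid forward by one day
    end
    % hmmtrain accepts a matrix with one observation sequence per row
    [estTR, estEMIT] = hmmtrain(seqs, trGuess, emitGuess);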
B. Prediction

Following the training phase, we proceeded to test our model by predicting the daily stock close prices for different time frames. For each day d within the target period, the prediction process involved the following steps, sketched in code below:
1) We considered the last latency − 1 = 9 observations available. These observations represent the preceding 9 days.
2) Next, we appended each possible output for the current day d, creating a 10-day sequence. This sequence now encompasses the 9 historical observations and one potential observation for the next day. There are n_fc · n_fH · n_fL possibilities for the current day.
3) We computed the probability of each sequence being generated by our trained model. Finally, we selected the observation with the highest likelihood as the observation for the next day.
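The scoring loop of steps 2 and 3 can be sketched as follows; hist9 denotes the 9 most recent observed symbols, and the second output of hmmdecode provides the log-likelihood of each candidate 10-day sequence under the trained model (variable names are illustrative).

    % Score every candidate observation for day d by appending it to the previous
    % 9 days and evaluating the sequence likelihood under the trained model.
    nSymbols = size(estEMIT, 2);          % n_fc * n_fH * n_fL possible outputs
    logL = -inf(1, nSymbols);
    for o = 1:nSymbols
        cand = [hist9, o];                               % 10-day hypothetical sequence
        [~, logL(o)] = hmmdecode(cand, estTR, estEMIT);  % log P(O | model)
    end
    [~, best] = max(logL);                % predicted symbol for day d

The predicted fractional change, and hence the predicted close price, can then be recovered by mapping the winning symbol back to its bin center with ind2sub.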
In certain cases, situations may arise where the probability of emitting the historical observations, along with any hypothetical observation, becomes 0 or very close to 0. This can occur due to numerical errors or limitations inherent in the model. However, we have found that incorporating a dynamic window can enhance the performance of our model.

To address the issue, we have modified the prediction algorithm as follows: if the highest probability obtained is 0, we repeat the prediction algorithm while gradually reducing the latency by one day. By reducing the latency, we aim to find a viable solution where the emitted probabilities are non-zero. This process of reducing the latency is repeated iteratively until a solution is found, while ensuring that the historical sequence remains sufficiently long. In our case, we have set a minimum requirement of four days for the historical sequence.
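A sketch of this fallback is shown below, again with illustrative variable names; it simply repeats the scoring loop with a progressively shorter history until a sequence with non-zero likelihood is found or the minimum history length of four days is reached.

    % Dynamic window: if every candidate sequence has zero likelihood,
    % retry with a shorter history, down to a minimum of four days.
    minHistory = 4;
    hist = hist9;                          % most recent observations, newest last
    best = 0; bestLogL = -inf;
    while bestLogL == -inf && numel(hist) >= minHistory
        for o = 1:nSymbols
            [~, logL] = hmmdecode([hist, o], estTR, estEMIT);
            if logL > bestLogL
                bestLogL = logL; best = o;
            end
        end
        if bestLogL == -inf
            hist = hist(2:end);            % reduce the latency by one day and retry
        end
    end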
V. RESULTS

We used multiple metrics to evaluate the performance of our models. The Mean Absolute Percentage Error (MAPE) is calculated between the actual and predicted stock closing prices:

    MAPE = (1/n_p) Σ_{i=1}^{n_p} |p_i − c_i| / |c_i| · 100%        (16)

On day i, p_i is the predicted stock closing price, c_i is the actual stock closing price, and n_p is the total number of predictions. Table II shows our results on two different stocks, compared to the results from other papers [25], [24].

TABLE II
COMPARISON OF MEAN ABSOLUTE PERCENTAGE ERROR

Stock Name   Our HMM   HMM-MAP [25]   HMM + Fuzzy Model [24]   ARIMA / ANN
Apple Inc.   1.50      1.51           1.77                     1.80
IBM Corp.    0.68      0.61           0.78                     0.97

Predicting the movement of stock prices for a single day is a challenging task in itself. Achieving accuracy in predicting multiple consecutive days becomes even more difficult and often approaches the realm of impossibility. Recognizing the potential utility of predicting whether the stock value will increase or decrease during the day, we introduced a novel evaluation metric called Directional Prediction Accuracy (DPA). DPA measures the percentage of correct directional predictions, providing valuable information about the accuracy of our model.

    DPA = (1/n_p) Σ_{i=1}^{n_p} δ( sgn(p_i − s_i), sgn(c_i − s_i) ) · 100%        (17)

In Equation 17, n_p is the total number of predictions, δ is the Kronecker delta function, p_i is the predicted stock closing price, c_i the actual closing price, and s_i is the stock opening price. Table III demonstrates that MAPE and DPA are not equivalent measures, as they are in general not correlated.

TABLE III
DIRECTIONAL PREDICTION ACCURACY

Stock Name     MAPE     DPA      Training                    Testing
Apple Inc.     1.50%    52.11%   2003-02-10 to 2004-09-10    2004-09-13 to 2005-01-21
Apple Inc.¹    1.73%    63.27%   2003-02-10 to 2004-09-10    2004-10-13 to 2005-01-21
Apple Inc.     1.05%    53.06%   2021-01-04 to 2022-01-03    2023-01-03 to 2023-06-30
Apple Inc.     1.01%    40.57%   2017-01-03 to 2019-01-03    2023-01-03 to 2023-06-30
IBM Corp.      0.77%    54.55%   2003-02-10 to 2004-09-10    2004-10-13 to 2005-01-21
IBM Corp.      0.82%    57.58%   2003-02-10 to 2004-09-10    2004-10-13 to 2005-01-21
IBM Corp.      0.68%    60.23%   2003-02-10 to 2004-09-10    2004-09-13 to 2005-01-21
IBM Corp.²     0.88%    52.73%   2021-01-04 to 2023-01-03    2023-01-04 to 2023-07-11
Dell Inc.      1.45%    60.32%   2021-01-04 to 2022-01-03    2023-01-03 to 2023-07-11

Figure 4 highlights the difference between the two indicators: the candlestick chart shows the actual prices for AAPL during ...

¹ Figure 3 shows predicted and actual close values for this training run. Green lines indicate predictions that correctly guessed the sign of the fractional change with respect to the opening price. Therefore, 63.27% of the predictions are correct in sign. The high MAPE reflects that the predictions often deviate from the actual close value, despite being correct in sign.
² The results of this training run are shown in Figure 5: since the MAPE is low, predicted close values are, on average, more accurate.
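Given vectors p, c and s of predicted closing, actual closing and opening prices (names assumed for illustration), both metrics reduce to one line of MATLAB each:

    % Evaluation metrics from Equations 16 and 17
    % (p: predicted closes, c: actual closes, s: opening prices, length n_p each)
    MAPE = mean(abs(p - c) ./ abs(c)) * 100;          % Equation 16
    DPA  = mean(sign(p - s) == sign(c - s)) * 100;    % Equation 17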
... further investigation to capitalize on recent market dynamics. Our work contributes to the growing field of stock market prediction and provides valuable insights for traders and financial analysts.

July 2023

REFERENCES

[1] "Institutional investors definition - Nasdaq." [Online]. Available: https://www.nasdaq.com/glossary/i/institutional-investors
[2] "Investor bulletin: Hedge funds." [Online]. Available: https://www.investor.gov/introduction-investing/investing-basics/investment-products/private-investment-funds/hedge-funds
[3] "Algo or algorithmic trading definition - Nasdaq." [Online]. Available: https://www.nasdaq.com/glossary/a/algo-trading
[4] E. F. Fama, "Random walks in stock market prices," Financial Analysts Journal, vol. 51, no. 1, pp. 75–80, 1995.
[5] A. W. Lo and A. C. MacKinlay, "Stock market prices do not follow random walks: Evidence from a simple specification test," The Review of Financial Studies, vol. 1, no. 1, pp. 41–66, 1988.
[6] F. Longin and B. Solnik, "Is the correlation in international equity returns constant: 1960–1990?" Journal of International Money and Finance, vol. 14, no. 1, pp. 3–26, 1995.
[7] Z. Ding, C. W. J. Granger, and R. F. Engle, "A long memory property of stock market returns and a new model," Journal of Empirical Finance, vol. 1, no. 1, pp. 83–106, 1993.
[8] T. Di Matteo, T. Aste, and M. M. Dacorogna, "Scaling behaviors in differently developed markets," Physica A: Statistical Mechanics and its Applications, vol. 324, no. 1, pp. 183–188, 2003.
[9] R. N. Mantegna and H. E. Stanley, "Scaling behaviour in the dynamics of an economic index," Nature, vol. 376, no. 6535, pp. 46–49, 1995.
[10] K. R. French, G. W. Schwert, and R. F. Stambaugh, "Expected stock returns and volatility," Journal of Financial Economics, vol. 19, no. 1, pp. 3–29, 1987.
[11] W. Y. Lee, C. X. Jiang, and D. C. Indro, "Stock market volatility, excess returns, and the role of investor sentiment," Journal of Banking & Finance, vol. 26, no. 12, pp. 2277–2299, 2002.
[12] X. Gabaix, P. Gopikrishnan, V. Plerou, and H. E. Stanley, "A theory of power-law distributions in financial market fluctuations," Nature, vol. 423, no. 6937, pp. 267–270, 2003.
[13] R. N. Mantegna, Z. Palágyi, and H. E. Stanley, "Applications of statistical mechanics to finance," Physica A: Statistical Mechanics and its Applications, vol. 274, no. 1, pp. 216–221, 1999.
[14] S. Mukherjee, E. Osuna, and F. Girosi, "Nonlinear prediction of chaotic time series using support vector machines," in Neural Networks for Signal Processing VII. Proceedings of the 1997 IEEE Signal Processing Society Workshop, 1997, pp. 511–520.
[15] F. E. Tay and L. Cao, "Application of support vector machines in financial time series forecasting," Omega, vol. 29, no. 4, pp. 309–317, 2001.
[16] P.-F. Pai and C.-S. Lin, "A hybrid ARIMA and support vector machines model in stock price forecasting," Omega, vol. 33, no. 6, pp. 497–505, 2005.
[17] H.-J. Kim and K.-S. Shin, "A hybrid approach based on neural networks and genetic algorithms for detecting temporal patterns in stock markets," Applied Soft Computing, vol. 7, no. 2, pp. 569–576, 2007.
[18] K.-J. Kim and I. Han, "Genetic algorithms approach to feature discretization in artificial neural networks for the prediction of stock price index," Expert Systems with Applications, vol. 19, no. 2, pp. 125–132, 2000.
[19] J. Bollen, H. Mao, and X. Zeng, "Twitter mood predicts the stock market," Journal of Computational Science, vol. 2, no. 1, pp. 1–8, 2011.
[20] M. R. Hassan and B. Nath, "Stock market forecasting using hidden Markov model: a new approach," in 5th International Conference on Intelligent Systems Design and Applications (ISDA'05), 2005, pp. 192–196.
[21] L. Rabiner, "A tutorial on hidden Markov models and selected applications in speech recognition," Proceedings of the IEEE, vol. 77, no. 2, pp. 257–286, 1989.
[22] M. Zhang, X. Jiang, Z. Fang, Y. Zeng, and K. Xu, "High-order hidden Markov model for trend prediction in financial time series," Physica A: Statistical Mechanics and its Applications, vol. 517, pp. 1–12, 2019.
[23] Y. Zhou and J. Zheng, "Research on stock market forecasting model based on data mining and Markov chain," in 2022 IEEE 2nd International Conference on Mobile Networks and Wireless Communications (ICMNWC), 2022, pp. 1–5.
[24] M. R. Hassan, "A combination of hidden Markov model and fuzzy model for stock market forecasting," Neurocomputing, vol. 72, no. 16, pp. 3439–3446, 2009.
[25] A. Gupta and B. Dhingra, "Stock market prediction using hidden Markov models," in 2012 Students Conference on Engineering and Systems, 2012, pp. 1–4.
[26] valentinomario/HMM-Stock-Market-Prediction, 2023. [Online]. Available: https://github.com/valentinomario/HMM-Stock-Market-Prediction