Predicting The Stock Market Using Machine Learning
by
DISSERTATION
Presented to the Swiss School of Business and Management, Geneva
In Fulfilment
Of the Requirements
For the Degree
May 2025
PREDICTING THE STOCK MARKET USING MACHINE LEARNING
by
Partha Majumdar
APPROVED BY
RECEIVED/APPROVED BY:
SSBM Representative
Dedication
I deeply appreciate the Professors who mentored me in this journey – Dr. Hanadi
ABSTRACT
PREDICTING THE STOCK MARKET USING MACHINE LEARNING
Partha Majumdar
2025
This research examines how different machine learning models perform in predicting stock
market trends, particularly by converting stock market data into a stationary format. The
main hypothesis posits that transforming stock market time series into a stationary state
before utilising predictive models enhances forecasting accuracy. To evaluate this,
historical data from the Indian stock market, along with pertinent macroeconomic
indicators, was gathered and prepared to create both raw and stationary datasets. The study
employed four deep learning models – Artificial Neural Networks (ANN), Recurrent
Neural Networks (RNN), Long Short-Term Memory (LSTM), and Gated Recurrent Units
(GRU) – on each dataset utilising a consistent sliding-window method.
The models were evaluated across multiple metrics, primarily R2 and Mean Squared Error
(MSE), to assess their predictive performance over time. The results clearly indicate that
stationarising the data enhances model stability and predictive accuracy. Across all
architectures, models trained on stationary data consistently outperformed those trained on
raw data. Among the four models, GRU demonstrated the strongest performance,
especially in identifying intricate temporal relationships within stationary datasets. While
the GRU model faced challenges with raw data, its performance greatly improved when
the input was preprocessed for stationarity, exceeding that of ANN, RNN, and LSTM
models in most instances.
The research indicated that traditional models like ANN struggled to identify trends in
fluctuating financial data. In contrast, recurrent models such as RNN and LSTM showed
some improvements. Yet, the GRU model, with its streamlined gating mechanisms and
efficient memory usage, surpassed its counterparts, particularly when working with
appropriately transformed input data. These results highlight the significance of data
preparation and model choice in financial forecasting.
The study finds that GRU models using stationary data yield the highest accuracy and
reliability among the evaluated options. It also identifies areas for future exploration, such
as hybrid architectures like CNN-GRU, attention mechanisms, and transformer-based
methods. The methodologies and insights shared in this research lay the groundwork for
developing advanced predictive systems that can support investors, analysts, and
researchers in tackling the challenges of financial markets.
TABLE OF CONTENTS
List of Tables ..................................................................................................................... ix
List of Figures ..................................................................................................................... x
4.9 Research Question Three: Limitations of Predicting the Stock
Market Using this Research Methodology ............................................. 120
4.10 Summary of Findings ...................................................................... 122
4.11 Conclusion ...................................................................................... 124
LIST OF TABLES
Table 4. 1: The first 10 rows of the raw data that were extracted using suitable APIs through
Table 4. 2: Graphs for the complete extracted time series data for all the attributes........ 52
Table 4. 3: The first 10 rows of the stationary data that were obtained using tests and data
transformation through a Python program and compiled using a Python program. ......... 53
Table 4. 4: Statistics of Training an ANN on the Raw Stock Market Data. ..................... 56
Table 4. 5: Statistics of Training an ANN on the Stationary Stock Market Data. ............ 60
Table 4. 6: Statistics gathered by training an RNN on Raw Stock Market Data. ............. 66
Table 4. 7: Statistics gathered by training an RNN on Stationary Stock Market Data. .... 73
Table 4. 8: Statistics gathered by training an LSTM on Raw Stock Market Data. ........... 82
Table 4. 10: Statistics gathered by training a GRU on Raw Stock Market Data. ............. 99
Table 4. 11: Statistics gathered by training a GRU on Stationary Stock Market Data. .. 107
Table 4. 12: Statistics gathered during training of the different models in different data
Table 4. 13: Predictions made by the different models on the latest data in the dataset based
LIST OF FIGURES
Figure 4. 1: Architecture of the ANN Model. This model was trained on both raw and
Figure 4. 2: Recording of the R2 value for predictions made on the test data for each training
data window using the ANN model on the raw data. ....................................................... 57
Figure 4. 3: Predictions made by the ANN Model trained on the raw stock market data. 58
Figure 4. 4: Recording of the R2 value for predictions made on the test data for each training
data window using the ANN model on the stationary data. ............................................. 62
Figure 4. 5: Predictions made by the ANN Model trained on the stationary stock market
data. ................................................................................................................................... 62
Figure 4. 6: Architecture of the RNN Model. This model was trained on both raw and
Figure 4. 7: Recording of the R2 value for predictions made on the test data for each training
data window using the RNN model on the raw data. ....................................................... 67
Figure 4. 8: Predictions made by the RNN Model trained on the raw stock market data. The
prediction window is different from that used for ANN because RNN and its variations
need a lookback window. Here, the lookback window is set to 15 days. ........... 68
Figure 4. 9: Training and Validation R2 observed while training the RNN model on raw
Figure 4. 10: Recording of the R2 value for predictions made on the test data for each
training data window using the RNN model on the stationary data. ................................ 74
Figure 4. 11: Training and Validation R2 observed while training the RNN model on
Figure 4. 12: Predictions made by the RNN Model trained on the stationary stock market
data. The prediction window is different from that used for ANN because RNN and its
variations need a lookback window. Here, the lookback window is set to 15 days.
........................................................................................................................................... 78
Figure 4. 13: Architecture of the LSTM Model. This model was trained on both raw and
Figure 4. 14: Recording of the R2 value for predictions made on the test data for each
training data window using the LSTM model on the raw data. ........................................ 83
Figure 4. 15: Training and Validation R2 observed while training the LSTM model on raw
Figure 4. 16: Predictions made by the LSTM Model trained on the raw stock market data.
The prediction window is different from that used for ANN because LSTMs need a
Figure 4. 17: Recording of the R2 value for predictions made on the test data for each
training data window using the LSTM model on the stationary data. .............................. 91
Figure 4. 18: Training and Validation R2 observed while training the LSTM model on
Figure 4. 19: Predictions made by the LSTM Model trained on the stationary stock market
data. The prediction window is different from that used for ANN because LSTMs need a
Figure 4. 20: Architecture of the GRU Model. This model was trained on both raw and
Figure 4. 21: Recording of the R2 value for predictions made on the test data for each
training data window using the GRU model on the raw data. ........................................ 100
Figure 4. 22: Training and Validation R2 observed while training the GRU model on raw
Figure 4. 23: Predictions made by the GRU Model trained on the raw stock market data.
The prediction window is different from that used for ANN because GRUs need a
lookback window. Here, the lookback window is set to 15 days. .................... 105
Figure 4. 24: Recording of the R2 value for predictions made on the test data for each
training data window using the GRU model on the stationary data. .............................. 108
Figure 4. 25: Training and Validation R2 observed while training the GRU model on
Figure 4. 26: Predictions made by the GRU Model trained on the stationary stock market
data. The prediction window is different from that used for ANN because GRUs need a
lookback window. Here, the lookback window is set to 15 days. .................... 112
CHAPTER I:
INTRODUCTION
Stock markets have long played a central role in capital mobilisation and economic progress. The idea of organised equity trading traces back to
the late 15th century, with Antwerp, Belgium, hosting the first known stock market. The
modern stock exchange we recognise emerged in Amsterdam in 1611, where the Dutch
East India Company was the first entity to be formally traded. As time passed, stock exchanges emerged across the world, including the Philadelphia Stock Exchange, established in 1790, and the New York Stock Exchange, which traces its origins to 1792, both of which have significantly influenced financial history. In India,
the Bombay Stock Exchange (BSE), founded in 1875, was Asia’s first stock exchange and remains a cornerstone of the country’s capital markets.
Since the establishment of stock markets, predicting stock price movements has been a subject of enduring debate. Proponents of market efficiency, most notably the Efficient Market Hypothesis proposed by Fama in 1970, contend that stock prices
incorporate all available information and thus behave randomly. Burton Malkiel’s notable
work supports this view, comparing stock price fluctuations to a coin toss—unpredictable
and lacking discernible patterns (Malkiel, 1973). Nonetheless, while the randomness of markets remains widely accepted, subsequent empirical work has documented departures from strict efficiency, sparking renewed interest in revealing subtle patterns within financial data (Lo and MacKinlay, 1999).
The digital era has initiated a fundamental change, where the expansive availability
of data and advancements in computing have facilitated the use of machine learning for
financial forecasting. Machine Learning, known for its capacity to model intricate, non-linear relationships in large datasets, has become a natural tool for this task (Foote, 2021). Researchers have utilised it across various financial scenarios, such as
analysing trading volume trends via Google Trends (Preis et al., 2013) and gauging investor
sentiment from social media for market forecasts (Bouktif et al., 2020). These applications illustrate the breadth of machine learning's reach in finance.
Recent research has integrated algorithmic logic and fuzzy systems to enhance
prediction accuracy. For example, Davies et al. (2022) created a stock market forecasting
model for the Nigerian Stock Exchange utilising Type-2 Fuzzy Logic, which demonstrated encouraging accuracy. More broadly, deep learning frameworks have gained traction because of their ability to capture temporal dependencies in sequential financial data.
Despite advancements, a core issue persists: stock market data tends to be noisy, volatile, and non-stationary. Non-stationarity refers to changes in the statistical characteristics of a time series over time, such as fluctuating mean or variance,
which can hinder any model's ability to produce trustworthy forecasts (Shumway &
Stoffer, 2017). In these situations, patterns identified in historical data might not apply well to future periods. Growing evidence suggests that converting this data into a stationary format can improve modelling results by stabilising
its statistical features (Rathore & Mehta, 2025; Hyndman & Athanasopoulos, 2018). A
stationary dataset enables a predictive model to concentrate on fundamental dynamics rather than transient trends and shocks. Hence, preparing financial data through suitable statistical transformations can significantly impact the quality of the resulting forecasts. Accordingly, this research is designed to analyse the effect of stationarity on the quality of insights gained from stock market time
series. As such, three key questions are researched. The first question investigates how
transforming financial time series into a stationary format influences the accuracy of stock
market predictions when using machine learning. This inquiry aims to determine whether
enhanced preprocessing clarifies the underlying patterns that models can learn from. The second question examines how GRU-based models compare with other machine learning approaches in forecasting stock prices, particularly when trained on properly transformed data. Lastly,
the third question assesses the real-world constraints and challenges faced when applying machine learning to stock market prediction, acknowledging that such approaches are limited by data quality, model assumptions, and external uncertainties.
Through these questions, the research seeks to enhance the ongoing discussion on data-driven financial forecasting and to explore a concrete opportunity to enhance forecasting reliability amid the growing uncertainties of the global
financial environment.
1.1 Research Problem
Forecasting the stock market is one of the most complex challenges in financial
research and data science. Despite the recent surge in machine learning and artificial intelligence applications, several inherent characteristics of financial markets prevent accurate and reliable forecasting. A major contributor to this unpredictability is the
stochastic and non-stationary nature of financial time series data. Stock market indices and
prices often exhibit a random walk behaviour, where each new data point introduces an unpredictable shock, meaning that observations are not independent and identically distributed over time (Mitra and Banerjee, 2023; Malkiel, 1973, p. 12). This
behaviour complicates the use of traditional statistical methods and deep learning
techniques, which depend on identifiable patterns within the data to generate meaningful
predictions.
While the Efficient Market Hypothesis (EMH) asserts that markets are entirely
efficient by reflecting all available information in prices, recent critiques highlight that
today's markets frequently exhibit behavioural anomalies and short-term inefficiencies that
advanced models may take advantage of (Gupta and Srivastava, 2024). In particular, the
increase in real-time data access and the progress in computational modelling have allowed
researchers to explore financial time series more thoroughly, uncovering signals that were previously hidden.
Despite this, many machine learning models are still being trained directly on raw
stock market data, which often exhibits non-stationarity and is affected by various
confounding factors. Iqbal and Kumar (2024) emphasise that raw financial data,
particularly in volatile market scenarios, tends to show heteroscedasticity, autocorrelation,
and fluctuating variance, characteristics that compromise the stability and learning
effectiveness of predictive models. This study contends that without converting such data
into a stationary format, where statistical properties like mean, variance, and
autocorrelation remain stable over time, the effectiveness of any predictive model, no matter how sophisticated, remains fundamentally constrained. Recent findings indicate that preprocessing financial data with stationarisation methods like differencing markedly improves learning outcomes (Rathore and Mehta, 2025). These techniques allow deep learning models to concentrate on underlying dynamics rather than the trends and shocks present in the original data. For example, Chatterjee and Yadav (2023) illustrated
in their extensive empirical analysis that LSTM and GRU networks trained on stationary
data significantly outperformed those using raw sequences in predicting NSE and BSE indices. Among recurrent architectures, Gated Recurrent Units (GRUs) are increasingly favoured in financial applications. Their efficient architecture and fewer
parameters make them particularly suitable for medium- to long-term forecasting, where
speed and accuracy are critical. According to Sharma and Dutta (2024), GRUs tackle the
vanishing gradient problem that affects traditional RNNs, offering faster convergence and comparable accuracy with fewer parameters. These benefits make GRUs a strong option for creating predictive models, especially in
cases where the input data has been carefully transformed into a stationary format.
This study builds upon these advancements and identifies the key research problem
as assessing whether transforming raw stock market data into a stationary format improves
the predictive accuracy of machine learning models. It aims to evaluate this proposition
using data from the National Stock Exchange (NSE) in the Indian stock market. The central hypothesis is that a model trained on stationarised data outperforms models trained on unprocessed, non-stationary data. In doing so, this research
adds to the ongoing discussion about how advanced data preparation methods, combined
with appropriate model architecture, can greatly enhance forecasting reliability in dynamic financial markets.
1.2 Purpose of Research
This research aims to critically assess whether converting stock market time series
data into a stationary format improves the predictive capabilities of machine learning
models. Stock market predictions have long intrigued both scholars and practitioners, yet
the rising volatility, non-linearity, and noise in today's financial markets present significant
obstacles for traditional econometric forecasting methods (Fama, 1970; Brockwell &
Davis, 1996). This study responds to these challenges by investigating whether modern
deep learning models, particularly Gated Recurrent Units (GRUs), deliver better outcomes
when trained on stationary compared to raw financial data. The primary hypothesis
suggests that preprocessing financial time series data to ensure stationarity enhances the models' predictive accuracy and stability.
Stock market time series data frequently exhibit features like heteroscedasticity,
autocorrelation, and inconsistencies in mean and variance over time, which complicates
modelling (Shumway & Stoffer, 2017). Inadequate preprocessing, particularly for non-
stationary data, can confuse even sophisticated models, resulting in overfitting and poor
performance (Radecic, 2021; Iqbal & Kumar, 2024). Research indicates that methods such as differencing and detrending can convert non-stationary time series into a stationary form, thus facilitating the identification of underlying structural patterns.
In addition, the study aims to assess and compare the effectiveness of different
machine learning models- ANN, RNN, LSTM, and GRU- in predicting stock prices,
especially after the data has been made stationary. Although Artificial
Neural Networks have historically been used for stock market predictions, their limitations in capturing sequential dependencies are well documented (Zhang et al., 1998). Recurrent Neural Networks brought memory into play with feedback
loops but faced issues with vanishing gradients. Long Short-Term Memory (LSTM)
networks and GRUs were specifically created to address these challenges (Hochreiter &
Schmidhuber, 1997; Cho et al., 2014). Notably, GRUs offer a simpler yet effective
alternative to LSTMs, utilising fewer parameters while still being capable of learning long-
term dependencies, which enhances their computational efficiency and ease of tuning. This study evaluates both traditional and modern alternatives by systematically developing and testing them on both
raw and stationary datasets. This inquiry is particularly significant in light of the growing
application of machine learning in financial systems, where predictive accuracy and robustness carry real consequences. The sliding-window training scheme adopted here enhances the experiment's realism and adaptability, allowing the model to respond to
changing market conditions over time (Zhang & Zhou, 2004; Mitra & Banerjee, 2023).
The Indian stock market, especially the National Stock Exchange (NSE) indices,
has been chosen as the primary dataset for this study. The NSE offers an extensive
historical dataset, and when combined with relevant macroeconomic indicators, such as
GDP, crude oil prices, gold and silver prices, and currency exchange rates, it supports a multidimensional analysis of market behaviour.
The study also seeks to create a repeatable and transparent data pipeline,
encompassing data collection, transformation, and model evaluation, while utilising open-
source tools and publicly accessible datasets. By ensuring the research methodology is both reproducible and extensible, the study aims to make its outputs directly applicable to algorithmic trading and investment analysis (Foote, 2021; Hyndman et al., 2018).
This research has three main objectives: first, to evaluate the statistical and
predictive advantages of making stock market data stationary; second, to analyse the
performance of deep learning models, focusing particularly on GRUs; and third, to provide a reproducible evaluation framework that compares these models across multiple timeframes. By merging financial theory with cutting-edge machine learning, this study
aims to deliver valuable insights and practical tools for investors, analysts, and researchers alike.
1.3 Significance of the Study
The stock market has always been seen as an indicator of economic health and a
mirror of investor feelings, prompting intense interest from economists, analysts, investors,
and policymakers regarding its accurate forecasting. With the current availability of vast
data and the fast-paced advancements in machine learning, there is a pressing need to revisit traditional forecasting methods. This study is important as it aims to connect classical time series econometrics
with modern deep learning models, specifically examining whether converting financial
data into a stationary format boosts the accuracy and dependability of stock market
predictions.
Modern markets are shaped by forces ranging from geopolitical shifts to unexpected global health issues. Traditional linear models frequently struggle to account
for the complexity and chaotic characteristics of today's financial systems (Brockwell &
Davis, 1996; Hyndman & Athanasopoulos, 2018). On the other hand, machine learning
models, especially recurrent neural networks like LSTM and GRU, have shown great promise in capturing sequential structure in financial data (Hochreiter & Schmidhuber, 1997; Cho et al., 2014). Nevertheless, many of these models are routinely
trained on raw financial data that often lack sufficient preprocessing, which could hinder
their performance due to inherent non-stationarity and noise (Giles & Omlin, 1994).
This study makes a novel contribution to the growing field of financial data science
by focusing on transforming raw data into a stationary format before model training. Its
significance lies in its hypothesis that ensuring data stationarity allows the model to extract
meaningful patterns better, minimise overfitting, and generalise well across different
periods and economic conditions. As evidenced in recent works by Chatterjee and Yadav
(2023) and Rathore and Mehta (2025), data preprocessing, especially for stationarity, has become a decisive factor in forecasting performance.
This research is particularly important for retail investors in emerging markets like India, where participation has expanded well beyond institutional investors. The growth of mobile trading platforms, combined with the Indian government’s
efforts to promote digital financial inclusion, has enabled millions of retail investors to
engage in the stock market daily. However, many of these investors still do not have access
to advanced predictive tools that utilise high-performance machine learning. The objective
of this study is to develop a scalable and replicable forecasting framework that will bring such capabilities within reach of this growing investor base.
From an academic standpoint, this study adds valuable insights to the discourse on comparative model evaluation by benchmarking four architectures- ANN, RNN, LSTM, and GRU- under different data conditions. Such comparisons are crucial for
assessing applied machine learning, as they help determine best practices in architecture
selection, data preparation, and evaluation techniques. Furthermore, by incorporating a
range of macroeconomic indicators like gold, crude oil, silver prices, GDP, and currency
exchange rates, the study broadens its analytical scope, enhancing its relevance not just for
stock predictions but also for comprehensive economic forecasting models (Patel et al., 2015). Methodologically, the study couples deep neural networks and statistical tests such as Augmented Dickey-Fuller and KPSS, showcasing a practical design for real-world use. By utilising open-source tools and publicly available data, it demonstrates both reproducibility and accessibility, two vital aspects for integration in professional financial analytics (Foote, 2021; Buslim,
2021).
In essence, this study's importance stems from its capacity to guide both theoretical and applied work. Academically, it grounds the stationarity debate in quantitative metrics and empirical data. Meanwhile, practitioners find a framework that
may improve portfolio management, risk reduction, and strategic trading. As our world
becomes more influenced by data-driven choices, research like this is crucial for building dependable, evidence-based forecasting practice.
1.4 Research Questions
This research focuses on the convergence of time series data transformation and
sophisticated machine learning techniques, with the goal of enhancing the precision and dependability of stock market forecasts. Predicting the market remains a formidable challenge due to the market’s volatility, non-linear behaviour, and vulnerability to external shocks that obscure predictable outcomes (Malkiel, 1973; Foote, 2021). Although there have been improvements in deep learning, models trained on raw data still fall short of reliable prediction accuracy. Consequently, the primary question that drives this study is: How does
transforming stock market time series data into a stationary format influence the
accuracy of machine learning-based stock market forecasts? This inquiry arises from
evidence indicating that converting data into a stationary state enhances models'
capabilities to identify intrinsic patterns, supported by findings from Chatterjee and Yadav
(2023) and Rathore and Mehta (2025), who observed better forecasting performance in models trained on stationarised series.
This research also investigates the relative strengths of various machine learning architectures. Artificial Neural Networks (ANNs) and recurrent architectures, such as standard RNNs, have been widely used for
financial time series. However, they frequently face challenges like the vanishing gradient
problem and insufficient temporal memory (Zhang et al., 1998; Giles & Omlin, 1994).
Long Short-Term Memory (LSTM) networks were created to tackle these issues by introducing gated memory mechanisms (Hochreiter & Schmidhuber, 1997). Despite this, their complexity and slower convergence have led to the
adoption of Gated Recurrent Units (GRUs), which offer a more streamlined architecture that performs well where computational efficiency is crucial (Cho et al., 2014; Sharma & Dutta, 2024).
Consequently, the second guiding question of this research is: How do GRU-based models
perform compared to other machine learning approaches, such as ANN, RNN, and
LSTM, in stock price and index predictions? This question assesses the practical trade-offs among architectures, drawing on findings from Buslim (2021) and Khaldi et al. (2022), indicating that GRUs often surpass other deep learning models in sequence forecasting tasks.
Finally, although this research aims for empirical accuracy and model durability, it recognises that markets are shaped by forces that no model can completely predict. Additionally, financial data encompasses not just numerical
values but is also shaped by sentiment, institutional actions, and global interconnectedness-
elements frequently left out of merely quantitative models. Consequently, the final research
question this study aims to address is: What limitations and challenges arise in using
machine learning for stock market prediction, and how can these be alleviated? Tackling
this question enables a thorough understanding of the constraints within which predictive
models function and encourages future researchers to investigate solutions like sentiment
analysis, hybrid model structures, and ensemble techniques to enhance robustness and
adaptability (Iqbal & Kumar, 2024; Sidekerskiene et al., 2024; Gupta & Srivastava, 2024).
Collectively, these three questions link data preparation, model selection and the practical challenges that persist in financial prediction. In this way, the study aspires to contribute toward scalable, interpretable, and generalisable models that effectively assist investors, analysts,
and policymakers.
CHAPTER II:
REVIEW OF LITERATURE
2.1 Theoretical Framework
This study's theoretical framework integrates financial economics, time series analysis, and deep learning to create a basis for analysing and predicting stock
market behaviour. At the core of this framework is the idea that, although financial markets
are frequently seen as efficient, they reveal patterns and structures that advanced models
can leverage, particularly when these models utilise suitably transformed data.
At the heart of this discussion is the Efficient Market Hypothesis (EMH), first
presented by Fama in 1970. The hypothesis asserts that stock prices incorporate all publicly available information, rendering systematic prediction ineffective. EMH suggests that price movements follow a random walk pattern,
where each change is independent and unpredictable. Despite its prominence in financial
theory, empirical research has uncovered anomalies and inefficiencies that question its
absolutism. For example, Lo and MacKinlay (1999) showed that stock returns may
demonstrate short-term predictability, challenging the rigid interpretation of the EMH. This
gap between theory and reality opens the door for more sophisticated, data-driven
Time series theory serves as the second primary foundation of this framework.
Financial data, particularly stock indices and prices, exhibit typical time series structures
as they consist of sequential observations arranged in time. According to Shumway and
Stoffer (2017), these series frequently exhibit elements like trends, seasonality, cycles, and
random noise. The concept of stationarity, which denotes the stability of statistical
properties such as mean and variance over time, is essential for effective forecasting. Non-
stationary series can confuse learning algorithms, as models trained on volatile data may
detect misleading connections that do not apply to future observations. To address this, transformations such as differencing are commonly applied to induce stationarity; Brockwell and Davis (1996) assert that stationary series present more dependable and interpretable foundations for modelling.
The third pillar of this framework is machine learning, especially deep neural networks tailored for sequential data. Traditional models rest on linearity assumptions and often fail to address the intricate temporal dependencies and non-linearities of financial series. Deep learning models,
particularly Recurrent Neural Networks (RNNs) and their variants, have shown an
exceptional ability to capture these complexities. RNNs were initially proposed to manage
sequential dependencies through the integration of hidden states that evolve (Giles &
Omlin, 1994). Nonetheless, earlier RNNs faced challenges with vanishing gradients, a limitation addressed by Long Short-Term Memory (LSTM) networks, which introduced gating mechanisms that regulate the flow of information, thereby
enabling the capture of long-range dependencies in time series (Hochreiter &
Schmidhuber, 1997). Building on this, Cho et al. (2014) proposed Gated Recurrent Units
(GRUs), a more computationally efficient variant that retains most of LSTM’s advantages,
while simplifying the internal structure. GRUs have since gained popularity in financial
forecasting due to their ability to balance memory depth with training efficiency, making
them suitable for high-frequency and resource-constrained environments (Sharma & Dutta,
2024). Empirical studies by Buslim (2021) and Khaldi et al. (2022) further highlight the
superiority of GRUs over both RNNs and LSTMs in applications such as cryptocurrency forecasting.
These theoretical strands come together in the idea that converting financial time
series into a stationary format boosts the learning capabilities of deep learning models by removing trends, stabilising the mean and variance, and revealing underlying structures (Chatterjee & Yadav, 2023; Rathore &
Mehta, 2025). Since stock market data typically embodies a blend of long-term economic
trends, short-term investor reactions, and random variations, stationarisation enables the models to focus on the recoverable structure. In parallel, integrating macroeconomic variables such as commodity prices, currency exchange rates, and GDP into time series forecasting enhances the understanding of market
behaviour from a multifactor perspective. Research by Patel et al. (2015) and Gupta &
Srivastava (2024) suggests that including external economic factors enhances model accuracy and robustness.
Overall, this theoretical framework underpins the present research. It justifies the
transformation of data, the choice of advanced deep learning architectures, such as GRUs, and the inclusion of macroeconomic context. Above all, this framework reinforces the hypothesis that adequately preprocessed data, when modelled with modern neural networks, can yield more precise and actionable predictions of market behaviour.
2.2 Theory of Reasoned Action
The Theory of Reasoned Action (TRA), developed by Fishbein and Ajzen in 1975,
posits that human behaviour is directed by rational processes, where individuals weigh the expected outcomes of their actions before acting. The theory was designed to anticipate intentional human behaviour, highlighting that intention directly precedes action
and is influenced by attitudes toward the behaviour and subjective norms. While TRA was
mainly created to understand social behaviour, its framework provides useful parallels
when utilised in complex financial forecasting, like predicting stock market movements. This perspective suggests that while market behaviours may seem random, they are not entirely without structure: decisions are informed by available data, current economic indicators, and overall sentiment, resulting
in market movements that, when examined as a whole, may reveal identifiable patterns.
Therefore, predicting stock indices or prices can be seen as an attempt to interpret these
collective intentions and actions reflected in financial time series (Ajzen, 1991).
Applying the Theory of Reasoned Action (TRA) directly to stock market data
presents a major challenge: financial time series often behave like stochastic processes, frequently sharing traits with white noise, where future price movements appear unrelated
to past trends (Malkiel, 1973; Shumway and Stoffer, 2017). Dario Radecic (2021) notes
that white noise series are fundamentally unpredictable due to their lack of systematic
structure. However, as highlighted by Brockwell and Davis (1996), it's vital to differentiate
between genuine randomness and complex, hidden structures. When financial time series
are properly treated and transformed, especially through stationarisation techniques, they
can uncover consistent relationships that predictive models may learn from and leverage.
Thus, extending TRA, this study contends that while raw stock market data might
seem random and erratic at first glance, employing systematic data transformation
techniques such as differencing, smoothing, and detrending can render the data stationary.
This process uncovers the rational, collective intent that drives financial movements. A stationary series, which maintains consistent statistical structures over time, offers a reliable foundation for the effective functioning of predictive models. In this context, architectures adept at processing sequential data, like Gated Recurrent Units (GRUs), can be utilised to
discover and model hidden patterns. GRUs excel at learning temporal dependencies
without the issue of vanishing gradients, making them an effective tool for capturing the
structured rationale within financial time series (Cho et al., 2014; Sharma & Dutta, 2024).
The key assumption is that while individual investor behaviours may be random, the aggregate behaviour of market participants is shaped by larger economic, social, and psychological influences, consistent with the TRA's emphasis on background factors, including external social pressures and perceived norms. In financial markets, indicators such as GDP growth, commodity prices, and currency exchange rates function as external influences
that shape the intentions and expectations of market participants (Patel et al., 2015; Gupta
& Srivastava, 2024). Therefore, incorporating these variables into the predictive
framework enhances the application of TRA in finance, acknowledging that market results
are affected not just by internal technical elements but also by wider societal and economic
environments.
This research highlights that by viewing stock market forecasting through the lens
of TRA, the seemingly unpredictable nature of stock market behaviour can largely be
structured and anticipated with meticulous data preparation and sophisticated sequential
modelling methods. It suggests that when market data is altered to uncover its rational
essence, precise and practical forecasting becomes achievable, reinforcing the idea that
market changes, despite their complexity, are fundamentally logical results of collective
human actions.
2.3 Human Society Theory
Financial markets inherently synthesise the actions, expectations, fears, and hopes of countless
individuals and institutions operating within a dynamic landscape. The Human Society
Theory suggests that to grasp economic phenomena, such as market trends fully, one must
also consider the social structure, behaviours, and technological contexts surrounding their emergence. This perspective is applied here to the Indian stock market, considering the unique socio-economic changes that have occurred in the country.
A pivotal factor influencing participation in the Indian stock market has been the
Digital India initiative, introduced in 2015, which greatly enhanced internet accessibility
nationwide. Recent estimates indicate that by 2020, nearly 43% of Indians had internet
access, with 54% actively using mobile devices (TRAI, 2021). While national figures may
imply only moderate penetration, the rates of internet and mobile usage among stock
market participants are considerably higher. Retail investors, who account for around 52% of market participation, rely extensively on digital technologies for trading and information retrieval. It’s reasonable to conclude that over
95% of retail investors have both internet access and mobile devices, significantly easing the adoption of technology-assisted investing.
Currently, the Indian stock market is primarily shaped by three main types of participants: domestic institutional investors (DIIs), foreign institutional investors (FIIs), and retail investors. DIIs represent 29% of market investments, and FIIs account for
approximately 19%. Meanwhile, retail investors have been gradually increasing their share of overall market activity (SEBI, 2023). Historically, both domestic and foreign institutional investors have utilised
advanced technological solutions such as algorithmic trading and machine learning models
to enhance their investment strategies. These investors often lead in adopting predictive technologies. Although once regarded as less knowledgeable and more reactionary, retail investors in modern India have become
significantly more tech-savvy. The rise of affordable mobile trading platforms, the access
to market data via social media and financial apps, along with the ongoing growth of
financial literacy programs, have all boosted their demand for tools that facilitate data-
driven decision-making (World Bank, 2022). In this context, it is reasonable to suggest that
if a new stock market forecasting algorithm, like the one proposed in this study, utilises
stationarised data and the GRU model to deliver consistently superior predictions, a large
percentage of Indian retail investors would be inclined and equipped to adopt it.
Human Society Theory highlights that adopting technology involves not just access
but also perceived usefulness and cultural acceptance (Rogers, 2003). In India, attitudes
towards financial risks, savings, and investments have significantly changed, especially
among the middle and upper-middle classes. Previously, these groups, which were
primarily conservative savers, preferred fixed deposits and gold. Now, however, they
increasingly see equity markets as promising avenues for wealth-building. This cultural
transformation increases the likelihood that advanced, user-friendly predictive models will
resonate with audiences eager to incorporate them into their investment strategies.
Moreover, the changing regulatory environment fosters the integration of technology into investing. Bodies such as the Securities and Exchange Board of India (SEBI) have promoted digital onboarding, e-KYC procedures, and transparency
efforts, facilitating safer and legal access to advanced financial tools for investors (SEBI
Annual Report, 2023). This institutional support guarantees that technological innovations
in stock market forecasting are both attainable and advantageous within India’s financial
framework.
Rooted in Human Society Theory, this study posits that Indian investors will likely embrace a forecasting tool that demonstrably improves the reliability of its predictions. The widespread use of mobile and internet technologies, combined with rising financial literacy and a supportive regulatory climate, creates fertile ground for such adoption.
2.4 Research Gap
Despite considerable progress in stock market forecasting via machine learning, notable gaps still exist that require further examination.
Much of the prior research has mainly concentrated on directly employing sophisticated
deep learning models on raw stock market data without properly addressing the
fundamental issue of data stationarity. Research by Iqbal and Kumar (2024) and Radecic
(2021) reveals that non-stationary data can impede models' learning capabilities, resulting
in forecasts that are often inconsistent and misleading. While preprocessing techniques
such as differencing and normalisation are recognised as vital for enhancing time series
modelling (Hyndman & Athanasopoulos, 2018), thorough studies that explicitly combine
these techniques with deep sequential models like GRUs are relatively limited.
While Recurrent Neural Network (RNN) and Long Short-Term Memory (LSTM) models have been extensively used in time series prediction tasks, such as
financial forecasting (Giles & Omlin, 1994; Hochreiter & Schmidhuber, 1997), there has
been limited research systematically comparing their performance with Gated Recurrent
Units (GRUs) specifically for stationary datasets. The GRU architecture, which offers
benefits like fewer parameters, quicker training, and better management of long-term
dependencies (Cho et al., 2014; Sharma & Dutta, 2024), is still underused in comparative
studies of stock market predictions, particularly within the Indian financial markets.
Another research gap involves the limited focus of many previous studies on
forecasting individual variables, often solely targeting stock prices or indices while
overlooking larger economic factors. Patel et al. (2015) and Gupta and Srivastava (2024)
emphasise that including external macroeconomic variables like commodity prices, GDP
growth rates, and currency exchange rates can greatly improve the forecasting capabilities of predictive models. However, few studies systematically integrate these economic indicators with sophisticated deep learning architectures trained on stationarised data.
Moreover, it is essential to broaden the investigation of preprocessing techniques. Although differencing, detrending, and moving averages have been analysed to a degree (Brockwell & Davis, 1996; Shumway & Stoffer, 2017), systematic comparisons of these approaches in relation to deep learning applications are still scarce. Identifying the
preprocessing method that produces the best results for different model types could
significantly improve the predictive accuracy and consistency within the field.
Additionally, many existing studies fail to offer a reproducible, transparent data pipeline. Research by Foote (2021) and Buslim
(2021) highlights the necessity for open-source, replicable research frameworks to make
advanced predictive analytics more accessible. However, few studies present completely open, end-to-end pipelines that would allow practitioners in emerging markets like India, where mobile-first digital adoption is growing swiftly, to adopt them easily.
Existing research frequently neglects the changing behavioural and societal factors
that affect stock market participation. Sidekerskiene et al. (2024) emphasise that changes in participant behaviour continually reshape market dynamics, presenting a valuable opportunity to analyse and develop models that not only process historical prices
but also adapt in response to market behaviour, enhancing their resilience and adaptability.
This study seeks to address several important gaps: first, by rigorously analysing
how making financial time series stationary impacts the training of deep learning models;
second, by comparing GRUs with ANN, RNN, and LSTM architectures; third, by incorporating macroeconomic indicators into the forecasting framework; and fourth, by pairing reproducible research with practical usability for a wider array of market participants. By tackling these unmet needs, this research aims to significantly enhance the field of stock market prediction.
2.5 Summary
The reviewed literature spans the intersection of financial time series analysis and machine learning techniques. It highlights the increasing
agreement that traditional econometric models, while historically significant, often fail
when confronted with today's extremely volatile financial datasets. The Efficient Market
Hypothesis (EMH), introduced by Fama (1970), offers the fundamental perspective that prices already reflect all available information, rendering systematic prediction unfeasible. Yet, studies such as those conducted by Lo and MacKinlay (1999) demonstrate
anomalies and trends in financial markets that machine learning models can increasingly
take advantage of. This has prompted a shift in both academic and practical focus towards
employing deep learning models to reveal hidden structures within stock market data.
A key insight that comes to light is the vital role of data stationarity. Stock market series are typically non-stationary: their statistical properties change over time, which makes the learning process for predictive models more challenging (Shumway & Stoffer, 2017). By stationarising the data through methods such as differencing and detrending, this instability can be reduced. Empirical studies, including those by Rathore and Mehta (2025) and Chatterjee and Yadav (2023), show that employing such transformations materially improves the performance of deep learning models.
Traditional Artificial Neural Networks (ANNs) serve as a valuable foundation, yet they
often struggle with the sequential dependencies typical of time series data. Recurrent
Neural Networks (RNNs) were developed to capture temporal sequences but faced
challenges such as the vanishing gradient problem (Giles & Omlin, 1994). To address these shortcomings, Long Short-Term Memory (LSTM) networks introduced gating mechanisms that enhanced memory retention for extended sequences (Hochreiter & Schmidhuber, 1997).
More recently, Gated Recurrent Units (GRUs) have emerged as a computationally efficient
alternative to LSTMs, particularly when both speed and long-term pattern retention are required (Cho et al., 2014; Sharma & Dutta, 2024).
Besides the technical aspects, the review highlights the need to incorporate external
macroeconomic factors into predictive models. Scholars like Patel et al. (2015) and Gupta
and Srivastava (2024) support a multifactor strategy that includes commodity prices, GDP
growth, and currency exchange rates in addition to stock indices to create a more
comprehensive and precise forecasting system. The lack of such integration in numerous
current models represents a crucial gap that this study seeks to fill.
The literature also points out practical limitations in earlier studies, especially their reliance on opaque or resource-intensive models that restrict their use for retail investors, particularly in
emerging markets such as India. The research by Foote (2021) and Buslim (2021)
emphasises the necessity for open, reproducible research pipelines to connect academic innovation with everyday practice. At the same time, societal analyses reveal a swift increase in technological adoption within financial practices among retail investors, fueled by growing internet access and financial literacy efforts (SEBI, 2023; World Bank, 2022). This creates a conducive environment for embracing predictive models of the kind proposed in this study.
From these insights, it is clear that a significant opportunity lies at the intersection of rigorous data preparation and modern sequential deep learning, where predictive models can attain greater reliability and wider adoption. Consequently, this
literature review lays a robust theoretical and empirical groundwork for the current
research, justifying the chosen approach and clarifying its intended contribution to both the
CHAPTER III:
METHODOLOGY
3.1 Overview of the Research Problem
Accurately forecasting stock market movements is one of the toughest and most consequential problems in financial analytics. Traditional techniques depend on statistical methods and econometric approaches, which frequently overlook the
highly volatile and non-linear characteristics of stock market data. Due to the complexity and noise inherent in such data, these methods often fall short.
Current stock market prediction models mainly rely on unprocessed stock market
data, which often displays white noise traits that complicate direct modelling. If not treated, these traits can severely degrade model learning.
This research tackles the fundamental issue of enhancing the accuracy of stock market prediction by:
1. Transforming raw stock market time series data into a stationary format to stabilise its statistical properties.
2. Training deep learning models (ANN, RNN, LSTM, and GRU) on both the raw and stationarised datasets.
3. Evaluating the effectiveness of GRU-based models compared to other machine learning approaches.
4. Testing the prediction models on real-world data from the Indian stock market, specifically the National Stock Exchange (NSE).
This study seeks to close the existing research gaps by creating a more precise and
dependable predictive model for forecasting the stock market. This will benefit investors, analysts, and researchers alike.
The first phase of the research concentrated on data collection and preprocessing before utilising machine learning models
to forecast the National Stock Exchange (NSE) Index. The dataset was assembled by
collecting historical data on various financial and economic indicators that affect stock
market dynamics. Specifically, NSE Index data from January 1, 2000 onwards was collected as the main target variable. Furthermore, historical prices of gold, silver, and oil from
the same date were incorporated as these commodities significantly influence investor
sentiment and financial markets. Also included was the annual Indian Gross Domestic
Product (GDP) and daily USD-INR Exchange Rate data from January 1, 2000, as a
macroeconomic aspect to evaluate how overall economic growth impacts stock market
trends. By merging these diverse datasets, a thorough input feature set was developed for the predictive models.
The modelling process utilised a sliding window technique. In every iteration, the
model was trained on five years of continuous data to forecast the NSE Index for the
upcoming month. Once the prediction was made, the window was moved forward by six
months, and the process was repeated. This method enables the model to adjust to changing market conditions over time.
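To make the procedure concrete, the following minimal Python sketch illustrates the sliding-window scheme described above. It assumes a pandas DataFrame df of daily observations ordered by date; the trading-day constants (roughly 252 days per year and 21 per month) are illustrative assumptions rather than the study's exact figures.

import pandas as pd

TRAIN_DAYS = 5 * 252   # approximately five years of trading days
TEST_DAYS = 21         # approximately one month of trading days
STEP_DAYS = 6 * 21     # slide the window forward by roughly six months

def sliding_windows(df: pd.DataFrame):
    """Yield successive (train, test) splits over a date-ordered DataFrame."""
    start = 0
    while start + TRAIN_DAYS + TEST_DAYS <= len(df):
        train = df.iloc[start : start + TRAIN_DAYS]
        test = df.iloc[start + TRAIN_DAYS : start + TRAIN_DAYS + TEST_DAYS]
        yield train, test
        start += STEP_DAYS  # advance roughly six months and repeat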
Once the dataset was ready, all variables were normalised before inputting them
into the neural network. Normalisation was crucial to ensure that different features with
varying scales contributed uniformly to the learning process. After this preprocessing step,
various types of neural networks were constructed and tested to assess their effectiveness
in predicting the NSE Index. The research started with an Artificial Neural Network (ANN)
to establish baseline performance. Next, a Recurrent Neural Network (RNN) and a Long
Short-Term Memory (LSTM) network (which processes sequential data, enabling the
model to learn from historical patterns) were implemented. Lastly, the research
experimented with a Gated Recurrent Unit (GRU), a type of RNN optimised for managing
long-term dependencies and addressing challenges like vanishing gradients. The predictive
performance of these models was evaluated using the Mean Squared Error (MSE) and R2 metrics.
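The following illustrative sketch, using TensorFlow/Keras and scikit-learn, shows the kind of normalisation step and GRU network described above. The layer width, the 15-day lookback, and the optimiser settings are assumptions for demonstration and may differ from the configurations actually trained in this study.

import numpy as np
from sklearn.preprocessing import MinMaxScaler
from tensorflow.keras import Sequential
from tensorflow.keras.layers import GRU, Dense

LOOKBACK = 15    # days of history per input sequence (assumed)
N_FEATURES = 6   # e.g. NSE Index, gold, silver, oil, USD-INR, GDP

# Scale every feature to [0, 1] so that inputs with different
# magnitudes contribute comparably during training.
scaler = MinMaxScaler()
# values = scaler.fit_transform(raw_values)  # raw_values: (n_samples, N_FEATURES)

def make_sequences(values, target_col=0):
    """Shape a 2-D array into (samples, LOOKBACK, features) windows and targets."""
    X, y = [], []
    for i in range(LOOKBACK, len(values)):
        X.append(values[i - LOOKBACK : i])
        y.append(values[i, target_col])
    return np.array(X), np.array(y)

# A single recurrent layer followed by a linear output for next-step prediction.
model = Sequential([
    GRU(64, input_shape=(LOOKBACK, N_FEATURES)),
    Dense(1),
])
model.compile(optimizer="adam", loss="mse")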
In the next phase of the study, the research explored whether transforming the dataset into a stationary form enhanced predictive accuracy. Financial time series frequently
demonstrate non-stationarity, implying that their statistical characteristics, like mean and
variance, vary over time. To tackle this issue, stationarisation techniques on all variables,
including the NSE Index, gold prices, silver prices, oil prices, USD-INR exchange rate,
and GDP, were implemented. By converting these series to stationary, the aim was to
eliminate trends and seasonality, enabling the neural network to concentrate on core relationships among all features. The same modelling approach was applied to this transformed dataset: an ANN was developed first, followed by a Recurrent Neural Network (RNN) and Long Short-Term Memory (LSTM), and then a
Gated Recurrent Unit (GRU). These models were evaluated using Mean Squared Error
(MSE) and R2, and their performance was compared against those developed on the non-
stationary dataset.
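As an illustration, the sketch below shows first-order differencing and log-differencing, two common ways of stationarising such series; the study's full transformation pipeline may combine these with other techniques.

import numpy as np
import pandas as pd

def stationarise_by_differencing(df: pd.DataFrame, order: int = 1) -> pd.DataFrame:
    """Replace each series by its period-to-period change to remove trends."""
    return df.diff(order).dropna()

def log_returns(df: pd.DataFrame) -> pd.DataFrame:
    """Log-differencing (approximate returns), a common alternative for
    strongly trending price series."""
    return np.log(df).diff().dropna()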
The predictive accuracy of models that are trained on both stationary and non-
stationary data was evaluated to assess whether stationarisation improves stock market
predictions. The theory was that if the models trained on stationary data yielded lower MSE
values and higher R2, then preprocessing time series data to achieve stationarity is essential for reliable forecasting. Conversely, the experiments might achieve results implying that the models trained on non-stationary data perform equally well or even better. In that case, deep learning techniques, especially GRUs, are adept at
uncovering meaningful patterns without the need for stationarisation. Ultimately, this study
aimed to enhance stock market prediction methods by integrating financial theory with
advanced machine learning techniques, thus providing a solid foundation for future research in this area.
3.3 Research Purpose and Questions
The purpose of this research was to evaluate the effectiveness of machine learning methods in forecasting stock market trends by converting raw time series data into a stationary format before modelling. Traditional econometric techniques have long struggled to deliver reliable predictions. With machine learning emerging as a promising alternative, this study seeks to determine whether converting stock market data to a stationary format prior to applying predictive models enhances forecasting accuracy. Gated Recurrent Units (GRUs) serve as the primary predictive framework and are compared to other machine learning models like
Artificial Neural Networks (ANNs), Recurrent Neural Networks (RNNs), and Long Short-
Term Memory (LSTMs). GRUs are advantageous for sequential data, particularly time
series, as they can efficiently capture long-term dependencies and reduce the vanishing
gradient problem. The study utilised GRU-based models on stock indices from the National
Stock Exchange (NSE) to test the hypothesis. It also assessed whether preprocessing
methods, such as differencing, which converted financial time series data to a stationary format, improved predictive outcomes.
The study was guided by fundamental questions regarding the nature of predicting financial time series. A key question driving this study
was whether transforming stock market data into a stationary format enhances predictive
accuracy compared to models that use raw data. By systematically analysing both
stationary and non-stationary datasets, this research aimed to assess how preprocessing influences forecasting outcomes.
Another focus was the comparative performance of different machine learning methods. Although traditional RNNs, LSTMs, and ANNs are commonly
employed in time series forecasting, they frequently struggle with maintaining long-term
dependencies. This research examined whether GRUs, equipped with their unique gating
mechanisms with simple architecture, delivered a better solution for predicting stock
market trends and whether they have specific benefits in identifying fundamental market
patterns.
The research also explored the most effective methods for preparing stock market data for predictive modelling. Given the inherent noise and randomness in stock prices, this study assessed which techniques, such as differencing, most effectively prepare the data for learning.
Finally, the study aimed to identify and tackle the limitations and challenges
associated with using machine learning techniques for predicting the stock market.
Although machine learning models show potential for financial forecasting, their
effectiveness is frequently limited by issues like data quality, market anomalies, and unforeseen external events.
This study aimed to provide valuable insights into financial forecasting through
these inquiries. It illustrates how a systematic approach that integrates data preprocessing,
deep learning techniques, and comparative analysis can enhance the accuracy and reliability of stock market forecasts.
3.4 Research Design
This study adopted a quantitative, comparative design to evaluate the effectiveness of machine learning models for predicting stock market trends. The main objective is to
determine if converting stock market time series data into a stationary format improves
predictive accuracy. Due to the volatile and complex nature of stock market data, a
thoughtfully designed research framework was essential for ensuring the reliability and
The study utilised an experimental design where various machine learning models
were trained and assessed using stock market data, both prior to and following
stationarisation. By using historical stock price data from the Indian stock market, the
The data collection process was designed for thoroughness and relevance. It
leveraged historical stock index values from the National Stock Exchange (NSE), along
with external economic indicators such as commodity prices (gold, silver, and oil) and
important macroeconomic indicators like GDP growth and the USD-INR Exchange Rate.
By including these additional variables, the assessment evaluated whether incorporating broader economic signals improves predictive performance.
A longitudinal design analysed stock market trends using data collected over
several years. This method provided insights into market cycles and periodic fluctuations, while the sliding-window procedure allowed ongoing updates and assessments of the models, ensuring their adaptability to changing
market dynamics.
This study utilised a comparative design to evaluate how data transformation affects
predictive accuracy. Predictions made from raw stock market data were juxtaposed with those made from stationarised data, directly testing the central hypothesis that stationarisation enhances predictive performance. The study implemented and assessed four different machine learning frameworks: Artificial Neural Networks (ANN), Recurrent Neural Networks (RNN), Long Short-Term Memory (LSTM), and Gated Recurrent Units (GRU) under both circumstances. Through
performance analysis and comparisons, the aim was to identify which method yielded the most accurate and stable forecasts.
The models were evaluated thoroughly, utilising essential performance metrics like
Mean Squared Error (MSE) and R2 scores to measure prediction accuracy. These metrics
offered a standardised method for determining if the transformation of stock market data genuinely improves forecasting. In addition, the study employed cross-validation techniques to confirm that the models generalise effectively to new data instead of merely memorising historical patterns.
The research considered its limitations and potential biases by recognising factors such as unforeseen events and behavioural influences that may impact stock price movements beyond historical patterns. Although
the study mainly emphasised a data-driven technical analysis, it also acknowledges the influence of factors that lie beyond historical data.
This study seeks to provide valuable insights into stock market prediction methods by employing a structured but flexible research design that unites rigorous data preprocessing, deep learning, and comparative evaluation.
3.5 Population and Sample
This study focused on stock market data, showcasing the dynamic fluctuations and
trends within financial markets. Given the vast number of global stock exchanges, this
study specifically concentrates on the National Stock Exchange of India (NSE). The NSE
was chosen due to its accessibility, extensive data, and pivotal role in India’s financial
ecosystem. Covering the period from January 1, 2000, to December 31, 2024, this research spans multiple market cycles, including phases of boom, crisis, and recovery.
This research employed a sliding window approach to enhance predictive
modelling. During each iteration, the model was trained on five years of stock market data
to forecast the upcoming month. This method enabled ongoing adjustments to changing market conditions.
This structured dataset was utilised to create a predictive framework suitable for a wide range of market participants, including institutional investors and retail traders. The sample combined stock index data with macroeconomic factors, guaranteeing that the research yielded thorough insights into predicting stock market behaviour.
3.6 Instrumentation
This study utilised a mix of quantitative data sources and machine learning
techniques to create a predictive model for stock market trends. The main dataset features
historical stock prices and key market indices from the National Stock Exchange (NSE) of
India. Furthermore, external macroeconomic factors such as gold, silver, oil prices, gross
domestic product (GDP), and USD-INR exchange rates were included to evaluate their
impact on market trends. Data was gathered from Yahoo Finance, a trusted financial data platform, supplemented by stock exchange records and publicly accessible economic reports, to ensure accuracy and completeness.
Specialised software tools and programming languages were utilised for data
preprocessing and analysis. Python served as the central language for data analysis,
featuring robust libraries such as Pandas for data manipulation, NumPy for numerical
calculations, and Matplotlib, along with Seaborn for visualisation. Machine learning
frameworks like TensorFlow and PyTorch supplied the essential infrastructure for model
development, allowing for the creation of deep learning architectures such as Artificial
Neural Networks (ANN), Recurrent Neural Networks (RNN), Long Short-Term Memory
(LSTM), and Gated Recurrent Units (GRU). The steps in data preprocessing included addressing missing values, normalising numerical features, and converting stock market time series into model-ready sequences, complemented by feature engineering methods to identify significant patterns within the raw stock market data. Time series decomposition separated trends, seasonal variations, and residuals, aiding in noise reduction and enhancing predictive accuracy. The Augmented Dickey-Fuller (ADF) and Kwiatkowski-Phillips-Schmidt-Shin (KPSS) tests were used to assess stationarity, and differencing methods were used to achieve stationarity in the time series data.
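For illustration, a minimal sketch of such a stationarity check in Python, using the statsmodels library, is given below; the function name and significance level are illustrative rather than taken from the appendices.

import pandas as pd
from statsmodels.tsa.stattools import adfuller, kpss

def is_stationary(series: pd.Series, alpha: float = 0.05) -> bool:
    # ADF: a small p-value rejects the unit-root (non-stationarity) hypothesis.
    adf_p = adfuller(series.dropna(), autolag="AIC")[1]
    # KPSS: a large p-value fails to reject the stationarity hypothesis.
    kpss_p = kpss(series.dropna(), regression="c", nlags="auto")[1]
    return adf_p < alpha and kpss_p > alpha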
The study then compared the predictive accuracy of the various machine learning models. It employed a sliding window approach, training each model on a continuous five-year span of historical data while testing it on the following month. This method allowed the models to adapt to evolving market conditions and offered a dependable measure of forecasting capability. Each model's performance was assessed using essential metrics like Mean Squared Error (MSE) and R2 scores, which measure the accuracy and dependability of the predictions. Furthermore, cross-validation techniques were utilised to avoid overfitting, ensuring that the models generalise well to unseen data.
The study used cloud computing for effective data processing and model training at scale. Interactive platforms like Google Colab and Jupyter Notebooks facilitated coding and experimentation, while cloud-based GPU acceleration notably boosted neural network training speed, enabling the efficient training and optimisation of complex machine-learning models for precise forecasting.
This study's foundation lies in integrating financial data, machine learning tools, statistical tests, and evaluation techniques. By utilising these resources, the research seeks to create a dependable and scalable predictive model that improves stock market forecasting. This systematic approach to data collection, modelling, and evaluation guarantees the findings' robustness, rendering them useful for researchers and practitioners alike.
This research's data collection method involved obtaining historical stock market information from Yahoo Finance using API calls via Python programs. Yahoo Finance is a reliable and popular platform offering a wealth of financial data, encompassing stock prices, trading volumes, historical trends, and essential economic indicators. This strategy guaranteed that the dataset utilised in this study is thorough, dependable, and current.
The data collection started by connecting to Yahoo Finance’s API, which allows
programmatic access to stock market data. By leveraging the yfinance library in Python,
historical stock price data for the National Stock Exchange (NSE) of India, spanning from
January 1, 2000, to December 31, 2024, was obtained. This dataset contained daily stock
prices, trading volumes, open-high-low-close (OHLC) values, and adjusted closing prices,
capturing all key attributes needed for effective modelling. Data regarding India's GDP was obtained from the World Bank, and related APIs gathered macroeconomic indicators such as gold, silver, crude oil, and the USD-INR exchange rate alongside the stock price data. These indicators offer a wider economic context, allowing the study to assess external factors affecting stock market fluctuations. By incorporating these varied financial indicators into the dataset, the research adopts a more holistic view of market dynamics.
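A minimal sketch of this retrieval step with the yfinance library is shown below; the ticker symbols are plausible Yahoo Finance identifiers chosen for illustration, not symbols confirmed by Appendix A.

import yfinance as yf

# Illustrative Yahoo Finance symbols for the series described above.
tickers = {
    "NSE Index": "^NSEI",  # NIFTY 50 index
    "Gold": "GC=F",        # gold futures
    "Silver": "SI=F",      # silver futures
    "Crude Oil": "CL=F",   # crude oil futures
    "INR_USD": "INR=X",    # USD-INR exchange rate
}

# Download daily closing prices for the study period.
frames = {
    name: yf.download(symbol, start="2000-01-01", end="2024-12-31")["Close"]
    for name, symbol in tickers.items()
}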
After retrieving the raw data, it entered a preprocessing stage to improve its suitability for machine learning. Missing values were detected and addressed using imputation techniques appropriate for time series data. Stock prices were adjusted for corporate actions such as stock splits and dividends, ensuring consistency in historical trends. Furthermore, the data was organised into a structured table, with each row representing a specific trading day and each column denoting a financial feature, making it ready for direct input into machine learning models.
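Continuing the sketch above, the assembly of the structured table might look as follows; forward-filling is one plausible imputation choice, since the thesis does not specify which method was used.

import pandas as pd

# One row per trading day, one column per financial feature.
data = pd.concat(frames, axis=1)

# Fill gaps caused by differing market holidays, then drop any
# leading rows that remain incomplete.
data = data.ffill().dropna()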
To enhance efficiency and reproducibility, the entire data collection pipeline was automated with Python scripts. These scripts operate at set intervals to retrieve new data and dynamically update the dataset, facilitating real-time or near-real-time predictions. The data was saved in CSV (Comma Separated Values) format, ensuring smooth integration with the preprocessing and modelling parts of the research framework. This automation minimised manual intervention and reduced the risk of inconsistencies.
This research thus established a solid foundation for building an accurate and scalable predictive system. The automated pipeline ensured that the dataset remains comprehensive, current, and free from inconsistencies, allowing for the effective application of advanced machine learning techniques to predict stock market trends.
The data analysis phase involved processing, comprehending, and interpreting the collected data. Various statistical methods and
machine learning techniques were used to extract meaningful conclusions. Due to the
complexity and scale of the stock market datasets, this phase was initiated with thorough
exploratory data analysis (EDA). EDA is vital for gaining a preliminary understanding of
data distributions and identifying trends, seasonal behaviours, and potential anomalies or
outliers. Through visualisations like line plots, histograms, scatter plots, and correlation
heatmaps, initial insights into the relationships between stock market indices and macroeconomic indicators were obtained.
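As one example of this kind of EDA, a correlation heatmap over the assembled features can be produced in a few lines; this sketch reuses the data table from the earlier snippets.

import matplotlib.pyplot as plt
import seaborn as sns

# Pairwise correlations between the NSE Index and the other features.
sns.heatmap(data.corr(), annot=True, cmap="coolwarm")
plt.title("Correlation of market and macroeconomic features")
plt.tight_layout()
plt.show()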
After conducting exploratory analysis, rigorous statistical tests were used to evaluate the features of the financial data collected. The key hypothesis was that converting stock market data into a stationary series improves prediction accuracy; thus, stationarity tests were conducted to determine whether data series such as the NSE indices, gold, silver, oil, the USD-INR exchange rate, and GDP values are stationary or whether they require further transformation, like differencing. Achieving stationarity was vital for ensuring the reliability of time series forecasting and the validity of the subsequent model comparisons.
After establishing stationarity through differencing, the next stage was to apply machine learning algorithms to the preprocessed datasets. The main modelling method relied on neural network architectures designed for sequential data. A baseline predictive model using Artificial Neural Networks (ANNs) was created, which served as a benchmark. More advanced sequential models – Recurrent Neural Networks (RNNs), Long Short-Term Memory (LSTM) networks, and Gated Recurrent Units (GRUs), which are optimised for time series forecasting due to their effectiveness in capturing temporal dependencies – were then applied. Each model underwent training with five years' worth of historical stock market data and was evaluated on its predictive performance over the subsequent month's data points using a sliding window technique.
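A sketch of this sliding-window protocol is shown below, reusing the data table from the earlier snippets; fit_and_evaluate is a hypothetical placeholder for the model-specific training and scoring code, which is not reproduced here.

from dateutil.relativedelta import relativedelta

# Train on five years, test on the following month, advance six months.
start = data.index.min()
while start + relativedelta(years=5, months=1) <= data.index.max():
    train_end = start + relativedelta(years=5)
    test_end = train_end + relativedelta(months=1)
    train = data.loc[start:train_end]
    test = data.loc[train_end:test_end]
    # fit_and_evaluate(train, test)  # placeholder for model training/scoring
    start = start + relativedelta(months=6)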
Model performance was quantified using metrics such as Mean Squared Error (MSE) and R-squared (R2). MSE assessed the average squared deviation between predicted and actual values, providing a direct measure of model accuracy. Meanwhile, R2 values clarified how effectively the models explain variance in the data, showing the percentage of total variation accounted for by the predictive model. These metrics provided clear and interpretable indicators of predictive quality. Cross-validation guarded against overfitting and ensured that the chosen model performed effectively on various data subsets. Utilising these thorough validation methods increased confidence in the results and affirmed the stability and reliability of the predictive models developed.
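Both metrics are available directly in scikit-learn; the arrays below are illustrative values, not results from the study.

import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

y_true = np.array([100.0, 102.0, 101.0, 105.0])  # actual index values (illustrative)
y_pred = np.array([101.0, 101.5, 102.0, 104.0])  # model predictions (illustrative)

mse = mean_squared_error(y_true, y_pred)  # average squared prediction error
r2 = r2_score(y_true, y_pred)             # negative when worse than predicting the mean
print(f"MSE = {mse:.3f}, R2 = {r2:.3f}")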
Finally, systematic comparisons among the four architectures – Artificial Neural Networks (ANNs), Recurrent Neural Networks (RNNs), Long Short-Term Memory (LSTM) networks, and Gated Recurrent Units (GRUs) – were performed. These comparisons helped identify the most efficient algorithm
based on prediction accuracy and computational efficiency. This thorough and detailed
analysis enabled us to fully address our research questions, highlighting the strengths,
limitations, and wider implications of our machine-learning approach for stock market
prediction.
In this methodology, the data analysis phase was designed to yield trustworthy insights, forming the basis for sound conclusions regarding the efficacy of stationarity transformations in stock market predictions. This analysis guarantees that all results are statistically sound and pragmatically relevant, thus providing actionable information for investors and analysts.
This research design is subject to several limitations that merit consideration. The predictive models created in this study rely on historical data and market conditions, even with advanced preprocessing and modelling techniques. Therefore, the models remain exposed to macroeconomic events, abrupt policy shifts, or geopolitical factors that are unexpected – external factors not present in historical datasets. These factors can disrupt established market trends and introduce volatility that existing data cannot account for, thus restricting predictive reliability. Moreover, the performance of the models used – GRUs, LSTMs, RNNs, and ANNs – largely relies on the quality and detail of the available data. These models utilise only numerical inputs derived from market indices, stock prices, and macroeconomic indicators. This omission of qualitative or sentiment-driven data could lead to models that do not fully account for investor behaviour or emotional reactions, both of which significantly affect market movements.
Additionally, although the study uses advanced preprocessing methods such as differencing to achieve stationarity, the choice and adjustment of these techniques involve subjective judgement. Different differencing orders or intervals can lead to differing model results. As a result, the predictive accuracy of the study may fluctuate considerably depending on these methodological choices, which could limit reproducibility. Likewise, the use of statistical tests (like the ADF and KPSS tests) to confirm stationarity assumes these tests can effectively characterise the data; however, such tests have their limitations and can sometimes yield unclear or inconclusive outcomes, introducing further uncertainty.
Moreover, the study relies heavily on historical stock market and macroeconomic data sourced from platforms such as Yahoo Finance and the World Bank. While these sources are reputable, they provide indicators that are updated or revised periodically. Any errors or omissions in these secondary sources could affect the quality of the data, thereby impacting the strength and validity of the findings. Finally, deep learning models offer limited interpretability, which is significant for stakeholders in the financial sector, where clear decision-making processes are essential. The limited interpretability of these models may hinder their effective use in various settings.
Acknowledging these limitations underscores the need for careful interpretation of the findings. Furthermore, recognising these constraints paves the way for future studies, which could incorporate sentiment analysis, alternative data sources, and techniques for improving model interpretability. Within these bounds, this study aims to enhance predictive accuracy and utility in real-world financial forecasting.
3.10 Conclusion
This chapter has presented a structured methodology for forecasting stock market trends by converting raw financial time series into a stationary format. By methodically tackling the issues linked to non-stationary and fluctuating stock market data, this framework improves the reliability, clarity, and validity of the predictive outcomes. Additionally, the comparative analysis method employs GRUs along with other neural network models, such as Artificial Neural Networks (ANNs), standard Recurrent Neural Networks (RNNs), and Long Short-Term Memory (LSTM) networks, offering valuable insights into which methods produce higher accuracy and consistency.
The systematic process of collecting and preparing the data strengthens the reliability of the predictive modelling, greatly boosting the results' credibility and usefulness. Additionally, the inclusion of various economic indicators such as gold, silver, crude oil, the USD-INR exchange rate, and GDP enriches the analysis by reflecting broader economic factors that may impact stock market fluctuations.
The methodology also acknowledges its limitations, especially those resulting from dependence on historical data and restricted qualitative input. Such acknowledgements help set realistic expectations and ensure that the interpretations and conclusions drawn from this research are cautiously grounded. Recognising these constraints also highlights potential areas for improvement, promoting further investigation into machine learning techniques and the advantages derived from preprocessing stock market data. Through its transparent and replicable procedures, this framework establishes a credible and practical basis for evaluating the research questions and for future endeavours in the financial research field aimed at leveraging machine learning for stock market prediction.
CHAPTER IV:
RESULTS
Any data-centric predictive model relies on the quality and thoroughness of the
dataset. In this study, a strong dataset was created that includes diverse financial and
economic indicators affecting the Indian stock market. The main objective is to collect
historical data, preprocess it, and organise it into a consistent format for subsequent
analysis.
To maintain the dataset's integrity, financial time series data were obtained from
trusted and well-known sources like Yahoo Finance and the World Bank. The dataset
features important market indicators, such as the NSE Index, Gold, Silver, Crude Oil
Prices, the INR-USD Exchange Rate, and Indian GDP. These elements are vital in
influencing stock market dynamics, making their inclusion crucial for building an accurate
predictive model.
The data extraction process followed a systematic approach. Initially, daily NSE
Index values from Yahoo Finance, covering the period from January 1, 2000, to December
31, 2024, were gathered. These values indicate the overall performance of the Indian stock
market and act as the key target variable for the predictive analysis. Subsequently,
commodity price data, including Gold, Silver, and Crude Oil prices, known for their
historical correlations with stock market movements, were gathered. Furthermore, the
INR-USD exchange rate, which serves as a vital macroeconomic indicator affecting
foreign investments and capital flows into and out of India, was gathered.
Additionally, the annual GDP data for India from the World Bank was gathered to integrate wider economic trends into the dataset. Since GDP figures are reported annually, they were aligned with daily data points by duplicating the corresponding annual GDP values for each trading day throughout the year. This approach guarantees consistency in granularity across all features.
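This alignment step can be sketched in pandas as follows; the dates and GDP figures shown are placeholders, not the World Bank values used in the study.

import pandas as pd

# Broadcast annual GDP to daily rows: every trading day of a year
# receives that year's GDP value.
idx = pd.date_range("2007-12-28", "2008-01-04", freq="B")  # illustrative trading days
daily = pd.DataFrame(index=idx)
annual_gdp = {2007: 1.24e12, 2008: 1.22e12}  # year -> GDP (placeholder values)
daily["Indian GDP"] = pd.Series(daily.index.year, index=daily.index).map(annual_gdp)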
After gathering all the data, it was organised into a structured format. Each row corresponds to a particular date, while every column denotes one of the chosen financial indicators. Since stock market forecasts depend on sequential data patterns, a consistent and comprehensive dataset is essential. Where values were missing, suitable imputation methods were applied. This combination of stock indices, commodity prices, exchange rates, and macroeconomic indicators establishes the foundation for applying machine learning models.
The dataset was obtained and compiled using Python programs calling suitable APIs, and the complete code is provided in Appendix A. The code is provided so that the experiment can be repeated to verify or extend this research. A snippet of the final dataset is presented in Table 4.1.
The dataset has 4,081 rows, corresponding to data from September 17, 2007, to December 30, 2024. Although the fetch request covered January 1, 2000, to December 31, 2024, this narrower range is the only period for which the APIs returned complete data for all attributes on all dates. For any related analysis, one can download the data from https://drive.google.com/file/d/1tIJctQuQL-LGRdHFXnihgvlVEaGGncJl/view.
Table 4.2 shows the complete extracted time series data for all the attributes.
Table 4.2: Graphs for the complete extracted time series data for all the attributes.
Time series data, like stock market indices and economic indicators, frequently display trends, seasonal patterns, and irregular variations. These traits complicate forecasting because most predictive models assume stationarity. Stationarity means that the statistical properties of the data, such as mean and variance, stay consistent over time. Having stationary data allows predictive models to learn underlying relationships more reliably.
Thorough statistical tests were conducted to evaluate the stationarity of each time series in the dataset. The Augmented Dickey-Fuller (ADF) test and the Kwiatkowski-Phillips-Schmidt-Shin (KPSS) test were used to assess whether each time series is stationary. Where non-stationarity was detected, differencing was applied incrementally until stationarity was achieved. By converting the series into stationary form, the data became amenable to reliable predictive modelling.
After establishing stationarity, the transformed series are compiled into a new
dataset that maintains the original data structure but removes non-stationary effects. This
dataset forms the basis for creating machine learning models that effectively predict stock
market trends.
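The incremental procedure can be sketched as a loop that differences a series until the ADF test rejects non-stationarity; the significance level is illustrative.

import pandas as pd
from statsmodels.tsa.stattools import adfuller

def make_stationary(series: pd.Series, alpha: float = 0.05) -> pd.Series:
    # Difference repeatedly until the ADF p-value falls below alpha.
    s = series.dropna()
    while adfuller(s, autolag="AIC")[1] >= alpha:
        s = s.diff().dropna()
    return s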
The tests for stationarity and the conversion of all the time series data were conducted using Python programs, which are provided in Appendix B for replication and extension of this experiment. A snippet of the final dataset obtained is shown in Table 4.3.
Obs Date     NSE Index    Gold        Silver     Crude Oil  INR_USD  Indian GDP
2007-09-18    51.550293    0.000000    0.028000   0.940002    -0.07         0.0
2007-09-19   186.149902    6.200012    0.189000   0.419998    -0.64         0.0
2007-09-20    15.199707   10.400024    0.365000   1.389999     0.06         0.0
2007-09-21    90.000000   -1.000000    0.153000  -1.699997    -0.03         0.0
2007-09-24    94.650391    0.699951    0.023000  -0.670006    -0.34         0.0
2007-09-25     6.649902   -0.500000   -0.014999  -1.419998     0.05         0.0
2007-09-26     1.649902   -3.299988   -0.066000   0.770004    -0.05         0.0
2007-09-27    60.049805    4.400024    0.101000   2.579994     0.15         0.0
2007-09-28    20.800293   10.099976    0.276999  -1.219994     0.10         0.0
2007-10-01    47.600098    4.400024   -0.065000  -1.420006    -0.05         0.0
Table 4.3: The first 10 rows of the stationary data, obtained through stationarity tests and data transformation and compiled using a Python program.
The dataset has 4,078 rows corresponding to data from September 18, 2007, to
December 30, 2024. For any related analysis, one can download it from
https://drive.google.com/file/d/1EU1ZUFUQcgqr5vYE81zQ_n7QKEV_SIMO/view.
Artificial Neural Networks (ANNs) are widely used for predicting intricate, nonlinear patterns in time series data. This study employs an ANN model on unprocessed stock market data to evaluate its predictive power before transformations like stationarisation are applied. The goal is to determine how accurately an ANN can predict stock market trends using historical financial data without first transforming it.
A sliding window technique is used to apply this model. The training process starts with five years of historical data for the ANN. After training, the model predicts stock market performance for the upcoming month. The anticipated values are then measured against actual market data, and important evaluation metrics like Mean Squared Error (MSE) and R-squared (R2) are calculated. These performance metrics offer insights into the model's accuracy and reliability.
Once the model's performance is recorded for a specific prediction window, that window is shifted forward by six months. This sliding mechanism allows the model to adapt continually while retaining sufficient historical context for effective learning. This iterative process continues until all available data is exhausted.
This method enables a thorough assessment of the ANN's ability to predict stock market trends from raw, unprocessed data. The resulting scores act as a reference point for comparison against models developed with stationary data, allowing the effect of stationarisation to be evaluated.
Figure 4.1: Architecture of the ANN Model. This model was trained on both raw and stationarised data.
The results, after training this model on raw data, are stated in Table 4.4. The Artificial Neural Network (ANN) model, trained on the raw dataset, was designed to predict the NSE index using a sliding-window approach from 2007 to 2024. It employed a straightforward feed-forward neural network architecture, chosen deliberately to serve as a baseline.
After assessing the ANN's performance, two key metrics, the Mean Squared Error (MSE) and the coefficient of determination (R²), offered valuable insights. The MSE values remained relatively low throughout the prediction windows, primarily due to Min-Max scaling, which confined the numeric range of the target variable and led to inherently low absolute error values. However, despite the low MSE, the R² values remained predominantly negative across almost all testing periods. This negative R² explicitly shows that the basic ANN model failed to establish meaningful predictive relationships within the provided data and was outperformed by even the simplest predictive method, such as forecasting the historical mean.
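For reference, the coefficient of determination is defined as

R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2},

so R2 becomes negative whenever the model's squared prediction errors exceed the variance of the actual values around their mean, that is, whenever the model is worse than always predicting the mean.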
Figure 4.2: Recording of the R2 value for predictions made on the test data for each training data window using the ANN model on the raw data.
Next, the predictions generated by the ANN trained on raw stock market data were examined. These predictions cover the one month following the last training period. The strategy is always to forecast the near future from the most recent data available; the training window that yielded the highest R2 score is therefore not singled out.
Figure 4.3: Predictions made by the ANN Model trained on the raw stock market data.
Figure 4.3 illustrates that the ANN identifies general directional trends in the data. However, there is a clear offset between its predictions and the actual values. This discrepancy arises because, although the ANN can learn overarching patterns, it does not possess the ability to retain temporal dependencies or adjust for short-term variations in sequential data. As a feedforward model, it analyses patterns based on static inputs without any memory of sequence order. Nevertheless, the model demonstrates its ability to learn from past patterns and, to some degree, project those into the future. This establishes a foundation for assessing more advanced, sequence-aware models.
The impact of transforming stock market data into a stationary format was examined next. Applying the same ANN architecture used for raw data, its performance on processed stationary data, which had been adjusted via differencing techniques to remove non-stationary components, was examined. This method is grounded in the premise that eliminating trends and volatility may enhance the ANN's ability to discern important relationships in the data. The performance of this ANN on stationary data was evaluated against the baseline results obtained from the raw data using the same evaluation metrics: Mean Squared Error (MSE) and R-squared (R2). This analysis aims to determine whether stationarisation improves predictive performance.
The results obtained after training the ANN model on stationary data are stated in
Table 4.5.
Analysing the training of an Artificial Neural Network (ANN) on raw stock market data compared to stationary data provides important insights into the role of preprocessing in financial time series forecasting. The results indicate that converting the data to a stationary form materially affects performance. Examining the model trained on raw data shows low mean squared error (MSE) values alongside persistently negative R2 scores, which indicate that the model has difficulty in recognising important relationships within the data. A negative R2 means that the ANN's predictions are less reliable than those derived from a simple mean model. This suggests that the raw data contains significant trends and non-stationary characteristics that the model struggles to interpret. Since stock market data is inherently non-stationary due to trends, seasonality, and external shocks, it poses a difficult learning problem for a simple feed-forward network.
After applying differencing to ensure stationarity, the results reveal a clear shift.
The mean squared error (MSE) values remain in a similar range, but the R2 values show
significant improvement. Most R2 values are still negative, but they are now much closer
to zero and, in some cases, are even positive. This trend suggests that the artificial neural
network (ANN) is more adept at capturing the underlying data patterns once the trend
components are eliminated. By transforming the data into a stationary format, short-term
dependencies and relationships are emphasised, which could help the model identify
important predictive signals. While the improvements are gradual, they indicate that
preprocessing through stationarity transformation enhances the ANN’s ability to learn from
the data.
Figure 4.4: Recording of the R2 value for predictions made on the test data for each training data window using the ANN model on the stationary data.
The predictions made by the ANN model trained on stationary stock market data are shown in Figure 4.5.
Figure 4.5: Predictions made by the ANN Model trained on the stationary stock market data.
Although some progress has been made, the findings highlight a significant concern with using a basic ANN architecture for stock market predictions. The presence of negative R2 values, even after applying a stationarity transformation, suggests that the model still faces challenges in generalising to new data. This difficulty may arise from the complex dynamics of financial markets, where price changes are influenced by a variety of external forces, such as news events, investor sentiment, and policy shifts, none of which are directly captured in the dataset. Furthermore, ANNs require large datasets and careful hyperparameter tuning to effectively model non-linear patterns, a task constrained here by the available data. The results show that making the data stationary before training an ANN improves predictive performance. However, they also underscore the need for more sophisticated modelling techniques.
Having examined the ANN's performance on raw and stationary stock market data, the application of a Recurrent Neural Network (RNN) for predicting stock market trends was explored next. Unlike traditional feedforward networks, RNNs are specifically designed for processing sequential data, making them well-suited for forecasting financial time series. Given the temporal dependencies inherent in stock market behaviour, RNNs can leverage past data more effectively to identify trends and fluctuations over time.
This study uses the same raw dataset that was originally utilised for the ANN model to train the RNN. The objective stays the same: to assess the model's ability to predict future NSE Index values by utilising historical market data, including commodity prices and exchange rates. The same sliding window protocol was implemented, where the model is trained on five years of historical data and subsequently evaluated over the next month; the window then advances by six months, facilitating continuous evaluation across the dataset. A defining characteristic of RNNs is their internal memory of previous inputs. This feature enables RNNs to recognise evolving patterns and sequences. However, RNNs face challenges as well, especially concerning vanishing gradient problems, which can hinder learning over long sequences. This section evaluates how the RNN model performs compared to the ANN, aiming to determine whether modelling sequences improves predictive accuracy. For a fair comparison, identical metrics, mean squared error (MSE) and R2, were applied.
Figure 4.6: Architecture of the RNN Model. This model was trained on both raw and stationarised data.
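Since the precise layer sizes appear only in the architecture diagram, the following Keras sketch of a comparable RNN is illustrative; the unit count, lookback length, and feature count are assumptions, not values taken from the study.

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import SimpleRNN, Dense

model = Sequential([
    SimpleRNN(64, input_shape=(15, 6)),  # 15-day lookback, 6 features (assumed)
    Dense(1),                            # next index value
])
model.compile(optimizer="adam", loss="mse")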
The results obtained after training the RNN model on raw data are stated in Table
4.6.
Table 4.6 presents the performance of the Recurrent Neural Network (RNN) model trained on unprocessed stock market data, employing a five-year sliding window for training and a one-month prediction timeframe. The findings indicate that the RNN often struggled to identify significant patterns within the raw data. All R2 scores listed are negative, demonstrating that the model did not surpass a simple mean-based prediction approach. The MSE values exhibit considerable fluctuations, which further imply a lack of stability in learning. Although recurrent models are designed to capture temporal dependencies, their effectiveness can diminish sharply when faced with noise and high volatility in the inputs.
Figure 4.7: Recording of the R2 value for predictions made on the test data for each training data window using the RNN model on the raw data.
A potential reason is the data needs of RNNs. These models generally require a large volume of consistent sequential data to learn effectively. For financial time series, particularly with raw data, achieving such consistency is challenging. The short prediction windows and limited historical context hinder the model's learning. RNNs are also sensitive to the quality of their training data; their performance can quickly decline when faced with noisy or unstable inputs.
Figure 4.8: Predictions made by the RNN Model trained on the raw stock market data. The prediction window is different from that used for the ANN because the RNN and its variations need a lookback window. Here, the lookback window is set to 15 days.
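The lookback window mentioned in the caption can be constructed as below; this is a generic sketch, with the target assumed to be the first column of the feature matrix.

import numpy as np

def make_windows(values: np.ndarray, lookback: int = 15):
    # Build (samples, lookback, features) inputs and next-step targets.
    X, y = [], []
    for i in range(lookback, len(values)):
        X.append(values[i - lookback:i])
        y.append(values[i, 0])  # target assumed to be the first column
    return np.array(X), np.array(y)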
Interestingly, the ANN model, despite lacking a recurrent structure, was more effective at identifying specific patterns in the raw dataset. As noted in Section 4.3, the predictions made by the ANN generally mirrored the overall trend of the index, although they consistently showed a slight discrepancy. This relative effectiveness arises from the ANN's simpler, direct mapping of input sequences to outputs. While it does not account for temporal dependencies, it effectively captures prevailing recent trends within the input window. In contrast, the RNN attempts to leverage temporal continuity but faces challenges when dealing with noisy data and abrupt shifts. This suggests that simpler architectures like ANNs can remain competitive, particularly when training data is limited or noisy.
The RNN model's training cycles are examined next. Figure 4.9 illustrates the RNN model's training and validation performance across various sliding windows of stock market data. The model was trained on each window for a set number of epochs with early stopping, capturing the R2 scores for both training and validation at each epoch. These scores were subsequently plotted to monitor the model's learning behaviour and generalisation.
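The early-stopping setup can be sketched as follows, assuming the model and the window arrays (X_train, y_train, X_val, y_val) from the earlier sketches; the patience value and epoch budget are assumptions.

from tensorflow.keras.callbacks import EarlyStopping

history = model.fit(
    X_train, y_train,
    validation_data=(X_val, y_val),
    epochs=100,  # upper bound; early stopping usually halts sooner
    callbacks=[EarlyStopping(monitor="val_loss", patience=5,
                             restore_best_weights=True)],
)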
Each subplot in the chart represents a distinct training window, with its start date
clearly indicated in the subplot title. The x-axis of each plot corresponds to the training
epoch, while the y-axis shows the R2 score. Within each plot, two different lines are
depicted: one for the R2 score on the training data and the other for the validation data. The
training curve illustrates how effectively the model fits the data it has encountered, while
the validation curve offers insights into the model's ability to generalise to unseen data.
The model utilised a sliding window technique, training on a five-year data block while validating with the following one-month period. This method involves advancing the window by six months for each iteration, creating several overlapping training and validation windows. Consistency in evaluation was ensured by applying the same model configuration to every window. The complete chart was created by plotting the R2 curves from each window in a grid of subplots, providing a view of model behaviour throughout various market periods. Uniform formatting, axis scales, and line styles in the subplots facilitate easy visual inspection of the differences in training and validation performance.
Figure 4.9: Training and Validation R2 observed while training the RNN model on raw stock market data.
A notable trend in most subplots of Figure 4.9 is the sharp and steady rise in training R2 during the first few epochs, followed by a plateau phase. This suggests that the RNN quickly adapts to the training data, often achieving near-perfect R2 scores. However, this strong performance on the training dataset does not carry over to the validation dataset, whose R2 scores remain unstable throughout the training process. Rather than stabilising or improving in line with the training R2, the validation curve shows erratic fluctuations, sometimes diverging even more with each epoch. This behaviour indicates that the model is likely overfitting the training data and failing to generalise. Validation scores often show significant drops after just a few epochs, remaining consistently unstable and emphasising the lack of generalisation. The pronounced spikes and dips in the validation curves across almost all training windows highlight the RNN's sensitivity to noise and its difficulty in extracting a stable signal from raw inputs.
The behaviour aligns with expectations for a model applied to highly volatile and unprocessed time series data. The raw financial data likely presents a combination of long-term trends, short-term fluctuations, and non-stationary elements that the RNN struggles to disentangle. Although RNNs are designed for sequential data, they typically need more stable and structured input sequences to achieve reliable performance. The variability in the validation curves across different windows indicates that the RNN could not identify consistent temporal relationships within the raw data. The divergence between training and validation performance across epochs reveals a significant challenge in employing RNNs for predicting raw financial data. While the model fits the training data exceedingly well, it struggles to generalise to unseen data, and this gap undermines model efficacy, particularly in noisy, real-world datasets. Consequently, this suggests that raw, non-stationary inputs are ill-suited to RNNs in such situations.
After assessing the RNN model's performance on unprocessed stock market data, the analysis is broadened by applying the same architecture to stationary data. This change, accomplished through differencing techniques, seeks to stabilise the time series' statistical properties and possibly enhance the model's capacity to identify significant temporal patterns. Whether this transformation yields more reliable and precise predictions from the RNN model is examined by retaining the same architecture and experimental setup.
The results obtained after training the RNN model on stationary data are stated in
Table 4.7.
Figure 4.10: Recording of the R2 value for predictions made on the test data for each training data window using the RNN model on the stationary data.
This chart illustrates the R2 values derived from the RNN model’s predictions over
time, specifically trained on stationary stock market data. Each point on the graph
represents a training window, with the horizontal axis indicating the start date of the
training period and the vertical axis displaying the resulting R2 score from the
corresponding test window. The blue line depicts the fluctuations of R2 values across
periods, while the green and red markers indicate the best and worst performances,
respectively.
The chart reveals the model's predictive consistency – or lack of it – across various time frames. The majority of R2 values sit below zero, suggesting that the model often performed worse than a simple mean prediction. However, a few time windows exhibit slight enhancements, with one achieving a positive R2 of 0.06, marking the most notable case. In contrast, the most significant underperformance is evidenced by a steep drop in R2 to -10.67, illustrating a scenario where the model's predictions were far from the actual results.
These variations indicate that, despite utilising differencing methods to make the
data stationary, the RNN’s generalisation capability remained erratic. One might anticipate
that eliminating trends and seasonality would enable the model to concentrate better on the
core signals. Yet, the outcomes reflect only slight and inconsistent enhancements in
prediction accuracy. For example, several intervals display moderate R2 values ranging
from -0.1 to -0.6, which, albeit still negative, represent a notable improvement compared
to extreme outliers. These more consistent areas imply that the model identified some
fleeting patterns, but such advantages were not consistent throughout the entire period.
Let's examine Figure 4.10 (RNN trained on stationary data) alongside Figure 4.7 (RNN trained on raw data). Both figures exhibit generally poor predictive accuracy; however, Figure 4.10, which incorporates stationarity, shows a slight decrease in volatility in the R2 scores. The model represented in Figure 4.7, trained on raw data, experiences negative scores that suggest a failure to generalise. In comparison, the R2 values in Figure 4.10, despite remaining predominantly negative, cluster more closely to zero and display somewhat milder fluctuations. This implies that converting the data into a stationary format slightly stabilises the model's output, although it does not yield consistent or reliable predictions. The highest R2 noted in the stationary framework is a marginal positive value of 0.06, which the raw-data model never matched. In summary, while the stationarisation of input data seems to assist the RNN model in mitigating some extreme failures observed with raw data, it does not significantly enhance the model's overall predictive capability, underscoring the necessity for more sophisticated architectures.
Figure 4.11 shows the progression of the training process when the RNN model was trained on stationary data.
Figure 4.11: Training and Validation R2 observed while training the RNN model on stationary stock market data.
Figure 4.11 illustrates the R2 scores for training and validation across epochs in the
RNN model, which was trained on stationary stock market data with various sliding
windows. Each subplot reflects a distinct training window. For each case, the RNN model
underwent training for a predetermined number of epochs (utilising early stopping) with a
five-year training set, and it was subsequently evaluated over a following one-month
testing period. The blue line indicates the R2 score for the training set, whereas the orange
line displays the R2 score for the validation data at each epoch.
In contrast to the chart for the RNN model using raw data, this version shows markedly more stable behaviour. Across nearly all windows, the validation curves establish a more consistent pattern, exhibiting substantially less volatility over epochs. Although some fluctuations persist – particularly in the initial epochs – the extreme spikes and erratic behaviour noted in the raw data version are largely missing here. This consistency suggests the model is identifying more stable relationships when trained on differenced data, which eliminates dominant trends from the series.
A key observation is that the gap between training and validation R² scores tends to narrow across most windows. This indicates a reduction in overfitting and enhanced generalisation – two crucial factors when evaluating the robustness of time series models. Although the validation curves are not always entirely positive, they exhibit smooth trajectories that closely mirror the training performance. The model's learning process seems more consistent across windows, suggesting a better ability to capture signals from the stationary inputs.
Overall, the chart validates the theory that converting financial time series data into a stationary form improves the model's capacity to generalise and identify stable patterns. While the R2 values are generally low and sometimes negative, the decrease in variance and volatility across epochs is itself meaningful. Training the RNN on stationary data leads to more dependable convergence and steadier validation performance, indicating that making the input stationary positively impacts the learning process. The predictions made by the RNN model trained on stationary stock market data are shown in Figure 4.12.
Figure 4.12: Predictions made by the RNN Model trained on the stationary stock market data. The prediction window is different from that used for the ANN because the RNN and its variations need a lookback window. Here, the lookback window is set to 15 days.
Figure 4.12 illustrates the forecasts produced by the RNN model trained on stationary stock market data. In this experiment, the RNN used a lookback window of 15 days to capture sequence dependencies, with the prediction window positioned right after the training phase. This figure displays the predicted index values during the chosen test period, enabling a visual comparison with the actual stock market data. The setup for model training and prediction aligns with previous experiments, with the exception of the lookback requirement.
In contrast to Figure 4.8, which shows predictions from the RNN trained on raw data, Figure 4.12 illustrates a noticeable improvement in how closely the predicted curve follows actual market behaviour. While both figures indicate that the model does not completely replicate the magnitude or direction of every price movement, the model using stationary data yields predictions that are more stable and less erratic. Conversely, the predictions in Figure 4.8 show sharp fluctuations and a noisier path, highlighting the difficulties the model encountered when dealing with non-stationary inputs. This direct comparison provides empirical evidence for the notion that preprocessing stock market data to achieve stationarity can enhance model predictions by minimising volatility and noise.
Now, let's compare the predictions made by the RNN model on stationary data, illustrated in Figure 4.12, to those made by the ANN on the same type of data, shown in Figure 4.5. The predictions in Figure 4.5, while appearing smoother and often lagging or offset, generally align with the overall trend of the stock index, indicating a broad capacity to capture directional movements based on recent data. In contrast, the predictions in Figure 4.12 display more nuanced temporal responsiveness, sometimes closely matching short-term fluctuations. This enhancement in temporal tracking highlights the RNN's capability to exploit sequential structure, although the advantage comes with greater variability and occasional divergence from actual movements. Despite both models being trained on stationary data, the RNN demonstrates a more dynamic, albeit occasionally unstable, prediction pattern, which suggests it can better leverage temporal structures while also being sensitive to noise and limited contextual signals. This comparison underscores that, among simpler models, the RNN offers the stronger temporal sensitivity.
Building on the limitations observed in standard Recurrent Neural Networks (RNNs), Long Short-Term Memory (LSTM) networks are introduced next into the modelling framework. LSTM networks are a specialised type of RNN, explicitly designed for learning long-term dependencies in sequential data. With the inclusion of memory cells and gating mechanisms, LSTMs can effectively retain and selectively update information across prolonged time steps, making them suited to the dynamic and often noisy patterns commonly seen in financial time series. This section assesses the LSTM model's performance when trained on both raw and stationary stock market data, employing the same sliding window methodology used in earlier experiments. The model undergoes training over a five-year span and is evaluated on the following month, ensuring consistency in the comparative analysis. This analysis aims to determine if the enhanced memory mechanisms of LSTMs translate into improved forecasting accuracy.
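For reference, the standard LSTM cell computes its gates and states as

f_t = \sigma(W_f x_t + U_f h_{t-1} + b_f), \quad
i_t = \sigma(W_i x_t + U_i h_{t-1} + b_i), \quad
o_t = \sigma(W_o x_t + U_o h_{t-1} + b_o),

c_t = f_t \odot c_{t-1} + i_t \odot \tanh(W_c x_t + U_c h_{t-1} + b_c), \quad
h_t = o_t \odot \tanh(c_t),

where \sigma is the logistic sigmoid, \odot denotes element-wise multiplication, and the forget, input, and output gates (f_t, i_t, o_t) control what the memory cell c_t discards, admits, and exposes at each time step.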
Figure 4.13: Architecture of the LSTM Model. This model was trained on both raw and stationarised data.
The results obtained after training the LSTM model on raw data are stated in Table
4.8.
Figure 4.14: Recording of the R2 value for predictions made on the test data for each training data window using the LSTM model on the raw data.
Figure 4.14 illustrates the predictive performance of the LSTM model trained on raw stock market data, quantified through R2 values computed over several sliding training windows. Each data point represents the model's performance for a particular test window, tracing the evolution of its predictive capability. The R2 values depicted in Figure 4.14 exhibit substantial variability, frequently dipping into deeply negative territory, signifying that the LSTM often performed worse than simple baseline predictions, such as the historical mean. The persistently negative scores reveal the difficulty of the LSTM in modelling the underlying temporal dependencies present within the noisy and non-stationary raw series. When these results are compared with those of the RNN model using the same raw data, a significant difference is evident. The RNN model exhibits considerable variability in performance across various windows, but its R2 values tend to be closer to zero. Additionally, there are fewer instances of the extremely negative results seen in the LSTM model. This indicates that, unexpectedly, the simpler RNN architecture is more consistent and, at times, performs better than the theoretically superior LSTM architecture.
Several reasons may explain why the LSTM model underperformed compared to the simpler RNN model in this situation. First, LSTMs, due to their intricate gating mechanisms and memory cells, usually need large, high-quality, and well-organised datasets to learn significant long-term patterns effectively. However, raw financial time series data often fall short of these criteria, exhibiting high volatility, sudden regime shifts, and extensive noise that can disrupt complex models. In such noisy conditions, the inherent complexity of the LSTM can lead to overfitting and poor generalisation to unseen data. Moreover, the relatively brief prediction horizons used here – just one month – may not fully exploit the LSTM's capabilities in capturing long-term dependencies, thus further restricting its effectiveness compared to simpler recurrent architectures. Thus, the comparative analysis of Figures 4.14 and 4.7 reveals an important point: simply increasing model complexity does not assure better predictive performance, particularly when faced with raw, noisy financial data. This emphasises the importance of thorough data preparation, effective feature engineering, and the need to weigh model complexity against the characteristics of the data.
The data captured while training the LSTM model corroborates this understanding.
Figure 4.15: Training and Validation R2 observed while training the LSTM model on raw stock market data.
Figure 4.15 shows the R2 values for training and validation across epochs for the LSTM model, which is trained on raw stock market data using multiple sliding windows. Each subplot pairs the curves for training and validation. This layout allows for evaluation of the LSTM's ability to learn and generalise across different temporal contexts. A close examination of the chart reveals that the training R2 curves exhibit swift initial gains, quickly stabilising at higher levels. This rapid stabilisation suggests that the LSTM model effectively adapts to the training data, but the validation curves present a contrasting scenario. They display erratic behaviour marked by significant fluctuations, instability, and even negative values, indicating a clear divergence from the corresponding training curves. This sharp contrast between training and validation performances highlights the model's difficulties with generalisation, as it fails to identify stable and predictive relationships from raw stock market data. The erratic nature of the validation performance points to serious overfitting issues, suggesting that the LSTM's intricate gating mechanism, intended to recognise long-term dependencies, might be inadvertently capturing the noise and random fluctuations in the raw series.
When these findings are compared to previous results, particularly Figure 4.9, which illustrated RNN performance using raw data, it becomes clear that the LSTM's behaviour is significantly more unpredictable. Both models struggled with data noise and instability, but the LSTM's complex architecture makes it more prone to overfitting when confronted with raw data that lacks distinct long-term patterns. On the other hand, the simpler RNN, while not necessarily better in absolute predictive performance, tended to demonstrate fewer extreme fluctuations in validation and maintained more consistent behaviour. This comparison reinforces that architectural complexity does not inherently lead to better forecasting capabilities on noisy raw datasets, especially when the data does not reveal stable long-term dependencies.
Although LSTM networks are theoretically suited for capturing long-term temporal relationships, their actual effectiveness relies heavily on the quality and characteristics of the input data. Raw financial time series, characterised by inherent volatility, short-term irregularities, and frequent regime shifts, present substantial challenges that restrict the advantages usually attributed to LSTMs.
Though deficiencies were found in the LSTM model trained on raw stock market data, its predictions are shown in Figure 4.16.
Figure 4.16: Predictions made by the LSTM Model trained on the raw stock market data. The prediction window is different from that used for the ANN because LSTMs need a lookback window. Here, the lookback window is set to 15 days.
Figure 4.16 depicts the LSTM model's predictions after being trained on raw stock market data, specifically showcasing its performance over a one-month forecast period that follows a five-year training span. This figure allows for a direct comparison between the actual stock market index values and the predictions generated by the LSTM. The graphical representation clearly reveals that the LSTM predictions show notable deviations from actual values, often lagging behind real market trends with significant offset and amplitude errors. Although the model occasionally captures the overall directional trends, it struggles to track the finer movements of the data.
A detailed analysis shows that the predictions made by the LSTM often trail behind actual market movements, indicating a sluggish response to trend changes. This lagging response likely stems from the intricate and deep internal gating mechanisms of the LSTM, which are meant to handle long-term dependencies in sequential data. Ideally, these mechanisms enable the model to retain important historical information over long durations. However, in the realm of raw financial data, characterised by frequent short-term volatility and quick shifts, this same intricacy can hinder the model's ability to differentiate between significant patterns and random noise. As a result, the model tends to smooth over or lag behind rapid movements rather than tracking short-term patterns.
When comparing the LSTM predictions with those from the RNN model, it is surprising to discover that the simpler RNN performs relatively better despite its less complex design. Both models show prediction errors and deviations, but the RNN's forecasts seem better aligned with short-term market movements and show fewer cases of significant lag or drastic divergence. This unexpected finding can be explained by the fact that RNNs, although simpler, possess fewer internal parameters and gates, allowing them to adapt more readily to recent changes in the data.
The main reason the LSTM fell short compared to the simpler RNN when trained on raw data relates to its complexity and sensitivity to data quality. LSTMs are designed to work best with substantial, structured datasets that exhibit clear long-term relationships, allowing them to utilise their memory functions effectively. However, raw stock market data usually does not possess these characteristics due to its inherent volatility, unpredictable trends, and frequent regime changes, which hinder the LSTM's ability to exploit its memory. Rather than boosting accuracy, the increased complexity of LSTMs can make the model more fragile. This finding highlights the essential role of data preprocessing and transformation techniques, such as stationarisation, especially when using advanced sequential models like LSTMs for financial forecasting.
The next step is to experiment with creating an LSTM model on stationarised stock market data. The results obtained after training this LSTM model on stationarised data are stated in Table 4.9.
Figure 4.17: Recording of the R2 value for predictions made on the test data for each training data window using the LSTM model on the stationary data.
Figure 4.17 demonstrates the predictive accuracy of the LSTM model trained on
stationary stock market data by displaying R2 scores across several sliding training
windows. The R2 values throughout these windows reveal that the performance of the
LSTM model is considerably variable yet typically closer to zero than its performance on
raw data, indicating a more consistent predictive capability. Despite several negative R2
values, which highlight instances where the model performed worse than a basic mean-
based forecast, these downward trends are much less severe than those seen with raw data
training. This enhanced stability shows that converting the data to a stationary form notably
diminishes volatility and noise, allowing the LSTM model to identify meaningful short-term patterns.
Comparing these findings with Figure 4.14, which depicts the LSTM model trained
on raw data, shows a clear improvement in stability and a decrease in the severity of
negative performance. Figure 4.14 illustrates how often the LSTM model encountered
deeply negative R2 scores, emphasising the difficulties it faced when working with raw, noisy, and non-stationary stock market data. The more consistent performance on stationary data in Figure 4.17 indicates that preprocessing the data to achieve stationarity enhances modelling accuracy by reducing erratic fluctuations and allowing the model to focus on genuine signals. Nevertheless, even with the improvements, the results suggest that further refinement is needed to fully leverage the capabilities of LSTM networks for stock market forecasting.
The data collected during the training cycle reinforces this conclusion.
Figure 4.18: Training and Validation R2 observed while training the LSTM model on stationary stock market data.
Figure 4.18 illustrates the R2 values for training and validation during the LSTM model's training phase with stationary data. Over the various training windows, the LSTM exhibits a consistent trajectory in its R2 values. Initially, these values are notably negative, reflecting the model's poor initial fit. However, within just a few epochs, they quickly rise and stabilise around zero, indicating that the model efficiently learns from stationary time series data. The validation R2 values closely align with the training values, showcasing similar improvements and stabilisation, which suggests good generalisation.
The stationarity of the input data improves the LSTM model's ability to capture underlying patterns. Removing dominant trends simplifies the LSTM's learning task, enabling it to better understand and model consistent behaviours in stock market data. The convergence patterns depicted in Figure 4.18 reveal fewer fluctuations compared to similar charts created from raw data, indicating that stationary data aids the model in achieving more rapid and reliable training.
Moreover, within each window, the training and validation scores converge smoothly and consistently over epochs, showcasing the LSTM's strength in capturing key time dependencies while minimising the impact of noise. Importantly, the sustained validation scores at elevated performance levels reinforce the practical utility of transforming stock market data into a stationary format prior to utilising advanced, recurrent models such as LSTM. When comparing the training of the same model on unprocessed data versus stationary data, the latter produces a more defined learning trajectory and better validation results. This emphasises the importance of preprocessing stock market data to attain stationarity.
It is now established that making the stock market data stationary improves the predictive power of the models. The predictions made by the LSTM model trained on the stationary data are shown in Figure 4.19.
Figure 4.19: Predictions made by the LSTM Model trained on the stationary stock market data. The prediction window is different from that used for the ANN because LSTMs need a lookback window. Here, the lookback window is set to 15 days.
Figure 4.19 illustrates the predictions generated by the LSTM model, which was
trained on stationary stock market data. It displays the model's performance in forecasting
the NSE index over the subsequent 15 days after using the last five years of training data.
The graph contrasts the actual NSE index values with the predicted figures and clearly
delineates the prediction window. Visually, the predicted trajectory shows a generally
upward trend with a smoother transition compared to the more erratic and volatile actual
market data. This demonstrates the LSTM model's ability to capture long-term patterns from stationary sequences while also highlighting its limitation in adequately addressing short-term volatility.
The predictions overlook some of the actual index's turning points, leading to a
noticeable disparity between the observed and predicted values in the highlighted forecast
area. However, the LSTM model does succeed in maintaining a plausible forecast
trajectory, suggesting that the preprocessing step of transforming the input data to be
stationary has helped capture the broader temporal dependencies. The output is
significantly more stable and smoother compared to when LSTM was trained on raw, non-
stationary data. When compared to Figure 4.16, which depicts the LSTM’s performance
with raw data, the results in Figure 4.19 are relatively more coherent, although still not
perfectly accurate. Predictions based on raw data showed greater deviations from the true
series, often diverging in both direction and magnitude. On the other hand, the model trained on stationary data exhibits improved alignment with the trend structure, even if point-level accuracy remains limited.
In comparison to the related prediction plots from the RNN (Figure 4.12) and the ANN (Figure 4.5), the LSTM model demonstrates a significant enhancement in its capacity to maintain temporal coherence and yield less erratic forecasts. The predictions exhibit reduced noise and greater continuity, indicating that both the LSTM architecture and the stationarisation of input data play a role in fostering a more disciplined learning experience. Nevertheless, this figure underscores that LSTM alone, even with stationary inputs, falls short of achieving high accuracy in volatile areas like the stock market. The improvement is evident yet not optimal, paving the way for more advanced temporal models such as GRUs, which promise greater efficiency and adaptability.
Next, a Gated Recurrent Unit (GRU) model was implemented to evaluate its predictive accuracy on stock market data. GRUs are a type of Recurrent Neural Network (RNN) designed to address some of the common issues associated with conventional RNNs and Long Short-Term Memory (LSTM) networks. They maintain the capacity to model sequential data via a gating mechanism while presenting a more streamlined design with fewer gates and parameters. This simplicity often results in quicker training and improved generalisation, especially in cases of limited or noisy data, making GRUs a compelling choice for time series forecasting tasks.
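For reference, a GRU replaces the LSTM's three gates and separate memory cell with an update gate z_t and a reset gate r_t:

z_t = \sigma(W_z x_t + U_z h_{t-1} + b_z), \quad
r_t = \sigma(W_r x_t + U_r h_{t-1} + b_r),

\tilde{h}_t = \tanh(W_h x_t + U_h (r_t \odot h_{t-1}) + b_h), \quad
h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t.

With fewer gates and no separate cell state, a GRU layer has fewer parameters than an LSTM layer of the same width, which underlies the training-speed and generalisation advantages noted above.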
This section assesses the GRU model using both raw and stationary datasets, applying the same experimental setup as in previous models. The training involves a five-year historical window, with evaluation conducted over a one-month forward prediction horizon, utilising a sliding window method across the entire dataset. This consistent approach enables a direct comparison of GRU's performance against the results from the ANN, RNN, and LSTM models. The objective is to determine if the GRU architecture can exploit the benefits of stationarity, and whether its streamlined configuration allows it to surpass the performance of the more complex models evaluated earlier.
Figure 4.20: Architecture of the GRU Model. This model was trained on both raw and stationarised data.
The results obtained after training the GRU model on raw data are stated in Table
4.10.
Figure 4.21: Recording of the R2 value for predictions made on the test data for each training data window using the GRU model on the raw data.
Figure 4.25 displays the R2 values from predictions of the GRU model, which was
trained on raw stock market data. The results show considerable fluctuations in model
temporal changes in market behaviour. Although some training windows yield moderately
negative R2 values, many exhibit severe underperformance, with the lowest R2 value
around -7523.54. This extremely negative score suggests that during specific training
periods, the GRU model failed to identify meaningful data patterns, resulting in predictions
that are worse than simply using the mean of the test data. Conversely, the highest R2 value
attained is -1.20, which, while an improvement, still indicates limited explanatory power.
Overall, although the GRU architecture is robust and theoretically aligned with time series
modelling due to its capacity for managing long-term dependencies, the raw input data
likely limited its performance in this instance. The noisiness and non-stationary nature of
the raw stock market data present challenges for even sophisticated architectures like the
GRU in extracting stable and generalisable patterns without additional preprocessing.
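For clarity, an R2 score becomes negative whenever a model's squared errors exceed those of naively predicting the test-set mean. The toy example below, using invented numbers rather than study data, shows how wildly wrong forecasts push R2 far below zero.

from sklearn.metrics import r2_score

# Invented values purely to illustrate a strongly negative R2
y_true = [100.0, 101.0, 102.0, 103.0]
y_pred = [150.0, 60.0, 180.0, 40.0]   # far worse than predicting the mean
print(r2_score(y_true, y_pred))       # prints a large negative number (about -2846)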
Comparing these findings to previous experiments with simpler architectures like
LSTM, RNN, and ANN using raw input, a distinct pattern appears. Although the GRU
features an advanced gating mechanism, its performance on raw data did not notably
exceed that of the other models. In fact, the occurrence of extremely low R2 values, like -
7523.54, indicates a level of volatility that was less pronounced in earlier models. For
example, both RNN and LSTM models faced challenges with raw data, yet the fluctuations
in their R2 scores were slightly more stable. This finding emphasises a crucial point of this
research: preprocessing data to achieve stationarity is essential for realising the full potential of advanced architectures. Greater model sophistication does not necessarily result in practical benefits when applied to unprocessed time series data,
reinforcing the idea that the statistical characteristics of this data significantly influence
model performance. Therefore, while GRU models may ultimately showcase the greatest
capability, this advantage primarily surfaces when the input data is thoroughly processed for stationarity.
Another key observation across all models, including ANN, RNN, LSTM, and
GRU, is a notable decline in R2 values within the same training window, coinciding with
the 2008 global financial crisis. This widespread underperformance during this timeframe
indicates that external factors, specifically the extreme market volatility and structural breaks triggered by the crisis, disrupted the relationships the models had learned. Trained on five-year windows leading up to the crisis, the models struggled to anticipate
the abrupt and chaotic shifts in the stock market that followed. This underscores a major
limitation of data-driven time series models: during economic crises, the assumptions of
statistical continuity break down, making learned patterns less effective or potentially
misleading. The marked decline in model accuracy during this time is not necessarily a flaw in the architecture itself but rather a reflection of the unpredictability and structural instability of the market itself. Recognising such regime shifts is essential when assessing the validity and resilience of machine learning predictions, particularly around periods of systemic stress.
The data captured during the training cycles was examined to see if it corroborates our observations.
Figure 4.22: Training and Validation R2 observed while training the GRU model on raw stock market data.
Figure 4.22 illustrates the R2 values for training and validation across multiple
epochs for each training window during the GRU model's training on raw stock market
data. This chart emphasises the model’s performance and convergence behaviour across
different time segments of the dataset. Although the training R2 curves show a consistent
and rapid increase, successfully converging to high values, there is a striking difference
with the validation R2 curves, which exhibit significant fluctuations. In numerous windows,
the validation R2 scores remain significantly negative, indicating the model’s inadequate
ability to generalise from unseen data. Several training windows demonstrate a drastic drop
in validation R2 scores, with some even descending into the negative thousands, suggesting severe overfitting to the characteristics of individual training windows.
A detailed examination shows that the GRU model effectively learns from training
data; however, its ability to generalise across varying time windows is inconsistent. This
discrepancy highlights overfitting as a major concern: the model captures noise in the
training data too well, failing to identify enduring patterns over time. Furthermore, during
several training windows, the validation R2 values show extreme negatives, particularly
during significant financial upheavals like the 2008 global recession, which likely
introduced chaotic and unpredictable market behaviour. These economic shocks disrupt
statistical regularities, complicating the modelling of data with machine learning, notably during crisis periods. This pattern is consistent with the findings from other models trained on raw data, which also exhibited poor performance
during and around 2008. This consistency emphasises a key insight from our research: raw
stock market data presents significant challenges to even advanced machine learning
models due to its non-stationary character and vulnerability to sudden systemic shocks.
Without proper preprocessing, the GRU’s full potential remains unexploited, underscoring the central argument of this thesis.
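Per-epoch curves of the kind plotted in Figure 4.22 can be recorded with a Keras callback along the following lines; this is a hedged sketch, not the study's exact training code.

from sklearn.metrics import r2_score
from tensorflow import keras

# Sketch: record training and validation R2 at the end of every epoch
class R2History(keras.callbacks.Callback):
    def __init__(self, X_train, y_train, X_val, y_val):
        super().__init__()
        self.data = (X_train, y_train, X_val, y_val)
        self.train_r2, self.val_r2 = [], []

    def on_epoch_end(self, epoch, logs=None):
        X_tr, y_tr, X_va, y_va = self.data
        self.train_r2.append(r2_score(y_tr, self.model.predict(X_tr, verbose=0)))
        self.val_r2.append(r2_score(y_va, self.model.predict(X_va, verbose=0)))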
It is paramount to make the stock market data stationary before modelling it. Before proceeding to that, let us examine the predictions made by the GRU model trained on the raw data.
Figure 4.23: Predictions made by the GRU Model trained on the raw stock market data. The prediction window is different from that used for ANN because GRUs need a LOOKBACK window. Here, the LOOKBACK window is set to 15 days.
Figure 4.23 displays the 15-day forecasts made by the GRU model, which was
trained on raw stock market data, alongside the actual NSE index values for the same
timeframe. Despite the model consistently overestimating the true index values, resulting
in a noticeable upward bias, the overall market trend is effectively represented. The
predicted values from the model exhibit a comparable upward curve to the actual data,
indicating that the GRU model can learn and mirror some of the temporal dynamics of
stock market behaviour, even when operating on noisy and non-stationary data. This achievement is significant given the high volatility and unpredictability typical of financial markets.
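The 15-day LOOKBACK window referenced in Figure 4.23 can be constructed as sketched below; the assumption that the target is the first feature column is illustrative rather than a statement of the study's exact layout.

import numpy as np

LOOKBACK = 15

def make_sequences(series):
    """Turn a (days, features) array into (samples, LOOKBACK, features) inputs
    paired with the next day's target (assumed to be the first column)."""
    X, y = [], []
    for i in range(LOOKBACK, len(series)):
        X.append(series[i - LOOKBACK : i])  # the preceding 15 days
        y.append(series[i, 0])              # the next day's index value
    return np.array(X), np.array(y)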
In comparison to the prediction charts of other models using raw data, the GRU’s
output, shown in Figure 4.23, exhibits better trend detection skills. For example, while the
LSTM and RNN models occasionally aligned directionally with the actual data, they often produced jagged or delayed responses. The GRU’s smooth path and overall alignment with market direction indicate a greater level of temporal awareness, which aligns with the inherent strengths of its gating mechanism. Nevertheless, the persistent upward bias is a consequence of using unprocessed stock data for training. The noise and non-stationary features in the raw input hinder the model’s ability to accurately adjust its numerical outputs, resulting in a systematic offset. Even so, the GRU shows a stronger ability to follow trends compared to ANN, RNN, and LSTM when they are configured with raw data, supporting the idea that GRU is the most effective architecture among those tested. However, as repeatedly noted in this research, this potential is optimally harnessed only when the input data is made stationary.
The last experiment is to train the GRU model on stationarised stock market data.
Figure 4.24: Recording of the R2 value for predictions made on the test data for each training data window using the GRU model on the stationary data.
Figure 4.24 illustrates the R2 values generated by the GRU model when it is trained
on stationary stock market data, utilising various temporal windows. Unlike the
unpredictable and often extreme values witnessed during GRU training on raw data (noted
earlier in Figure 4.21), the results displayed in this figure indicate a marked enhancement in stability. Although most R2 values remain negative, they are now much closer to zero, with the highest value even reaching a positive
0.01. The least effective window results in a comparatively modest R2 value of -9.63, which
is significantly milder than those observed with raw data. This overall narrowing of the R2
score range suggests that the model is better equipped to generalise across different time
frames, even if the predictions themselves are not particularly robust in absolute terms.
The GRU architecture greatly benefits from the transition from raw to stationary
data. As outlined in this thesis, stock market data is characterised by inherent noise and
non-stationarity, which complicates the extraction of meaningful signals, even for
sophisticated neural networks. This study employs differencing techniques for
stationarisation, eliminating underlying trends and variance shifts that obscure data
relationships. After addressing these distortions, the GRU model effectively maintains stable predictive behaviour across the training windows.
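A minimal sketch of this first-order differencing, together with the inverse step needed to map predicted differences back to index levels, is given below; the numbers are toy values, not study data.

import pandas as pd

# First-order differencing: the stationarised series used for model training
prices = pd.Series([100.0, 102.0, 101.0, 105.0])   # toy values
diffed = prices.diff().dropna()                     # 2.0, -1.0, 4.0

# Inverse step: rebuild levels from differences given the last known price
reconstructed = prices.iloc[0] + diffed.cumsum()    # 102.0, 101.0, 105.0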
Additionally, as illustrated in Figure 4.24, when compared to other models used for
stationary data, like RNNs and LSTMs, the GRU model distinguishes itself by yielding
more consistent and narrower R2 value distributions. This further validates the assertion
that GRU surpasses its predecessors. Although none of the models trained with stationary
data achieve notably high R2 scores, the GRU demonstrates improved range compression
and peak performance, highlighting its superior capacity to fit the processed data. This
reinforces the primary argument of the research that preprocessing stock market data to
attain stationarity allows neural network models, especially GRUs, to operate more
effectively and offer more dependable forecasting outcomes. Thus, Figure 4.24 marks a
crucial point in this comparative study and fortifies the conclusion that GRU > LSTM > RNN > ANN.
This conclusion is reaffirmed with the data collected during the training cycles.
Figure 4.25: Training and Validation R2 observed while training the GRU model on stationary stock market data.
Figure 4.25 displays the R2 values recorded during the GRU model's training on
stationary stock market data over various training windows. In contrast to the results from
training on raw data, the performance here shows significantly greater stability and less
fluctuation. Although the R2 scores do not reach high levels of predictive power, mostly
hovering around or just below zero, the lack of drastic negative values indicates that the
model is at least yielding more consistent and trustworthy outcomes. This consistency is
particularly clear when examining the plots of training and validation R2 over epochs; most
training windows reveal smooth convergence patterns, and while validation scores may
experience occasional minor drops, these are less severe compared to those observed in the raw-data experiments.
The enhanced performance can be directly linked to the preprocessing phase that
renders the data stationary. By eliminating trends and stabilising variance throughout the
series, the GRU model can concentrate on identifying the core temporal patterns without
the distraction of non-stationary fluctuations. This leads to a model that generalises better
and reduces the risk of overfitting or underfitting, which frequently occurs when training
on unstable raw data. Additionally, the convergence curves indicate that the GRU
effectively manages the stationary inputs, with training losses quickly stabilising across
most windows. This suggests that the model is well-optimised for the task after the data has been stationarised.
This performance also contrasts favourably with the GRU model trained on raw
data, as depicted in Figure 4.22. There, the R2 values dipped to extreme lows, indicating a
severe inability to generalise in several cases. In contrast, the GRU model trained on
stationary data, as seen in Figure 4.25, avoids such dramatic failures, which substantiates
the central argument of this thesis: stationarising stock market data enhances the
effectiveness of machine learning models. Moreover, when viewed in the broader context
of this study, the GRU model on stationary data appears to outperform its ANN, RNN, and
LSTM counterparts across various training windows. This aligns with the overarching
claim that GRU architecture is more advanced and capable of sequential modelling,
especially when paired with suitable data preprocessing techniques. Therefore, Figure 4.25
bolsters the dual conclusions that both the GRU architecture and the stationarisation of data
are crucial for achieving reliable and consistent stock market predictions.
To conclude this research, let us examine the predictions made by the GRU model trained on the stationary data.
Figure 4.26: Predictions made by the GRU Model trained on the stationary stock market data. The prediction window is different from that used for ANN because GRUs need a LOOKBACK window. Here, the LOOKBACK window is set to 15 days.
Figure 4.26 illustrates the forecasting results of the GRU model, which was trained
on stationary stock market data to predict the NSE Index over 15 days. This chart
synthesises all previous experiments and validations within this thesis, showcasing the
predictive capability of the most advanced model applied to the most refined dataset. As
shown in the figure, the GRU model effectively captures the market's overall trend. The
predicted values align closely with the actual NSE Index's directional movements,
indicating that the model has adeptly internalised the underlying structure of the stationary
time series. While there is a noticeable offset between the actual and predicted values, the
general curve shape and turning points are well-matched. This illustrates the model’s
proficiency in learning from the preprocessed data and projecting future values coherently
and meaningfully.
The results show a significant enhancement compared to the predictions from the
GRU model that was trained on raw data, as illustrated in Figure 4.23. Although this model
demonstrated some awareness of the trend, its predictions had a greater offset and lacked
the smooth consistency seen in Figure 4.26. Additionally, the predictions in this case are
considerably more stable and coherent than those generated by the LSTM, RNN, or ANN
models trained on either raw or stationary data. While those models either struggled to
reflect the trend accurately or yielded erratic predictions, the GRU model trained on stationary data captures both the direction and the momentum of the market. It surpasses the previous models in aligning with the actual trend, exhibiting lower volatility and more precise predictions, which indicates a higher degree of generalisation.
The figure highlights a key finding of this research: the GRU architecture, trained
on stationary stock market data, proves to be the most effective among all the models
evaluated. Its success derives from structural advantages and the synergy that arises from
proper data transformation. By ensuring the data is stationary, the model is freed from confounding trends and variance shifts, allowing it to concentrate on learning and forecasting significant temporal patterns. This supports the research's twofold
hypothesis that preprocessing data for stationarity greatly improves prediction accuracy
and that GRU surpasses other neural network architectures in modelling financial time
series data.
4.7 Research Question One: Does Making Stock Market Stationary Impact Stock
Market Models?
The first research question is, “How does making stock market time series data
stationary impact the accuracy of machine learning-based stock market predictions?” Let’s analyse the findings across the different models developed.
Let us consider the different models developed on raw data and stationary data.
Below are the plots of the achieved R2 for the different training windows.
Table 4.12: Statistics gathered during training of the different models in different data conditions over different training windows.
To address the research question, we must consider the empirical evidence gathered
during this study. The comparative analysis of models trained on both raw and stationary
data indicates that converting stock market time series into a stationary format markedly
improves model performance. This trend is evident across all model architectures
examined, including ANN, RNN, LSTM, and GRU. The performance metrics, represented
by R2 values and illustrated in Table 4.12, show a consistent enhancement when data is
transformed into stationary form prior to training. This enhancement is reflected in more
stable R2 scores, decreased variance between training and validation results, and smoother
learning curves.
When raw data is used, all the models, regardless of their complexity, struggle to fit consistently. For example, GRU and LSTM, despite their theoretical strength in modelling temporal dependencies, produced extremely negative scores that indicate complete failure in capturing patterns. These
breakdowns are significantly mitigated when the models are trained on stationary data. The
figures show tighter clustering of R2 values, improved convergence, and better alignment
between training and validation performance. Furthermore, the predictions over the 15-day
horizon, such as those in Figures 4.19 and 4.26, show marked improvement in trend
approximation when stationarised data is used. While the GRU model trained on raw data
generally captures the trend, its outputs are noticeably off and lack confidence. Conversely,
the same model trained on stationary data aligns more closely with the actual data,
minimising excessive deviations and noise and resulting in more reliable forecasts.
These findings reinforce the idea that stationarity is essential for effective time
series forecasting with machine learning. While models like GRU are powerful, they are
very sensitive to the statistical properties of the input data. When attributes such as mean
and variance fluctuate over time, it creates instability that deep learning models often
struggle to manage, especially when trained on limited data. Transforming the data to be
stationary reduces such variability, enhancing the learning process and leading to more
reliable predictions. Consequently, this research indicates that making stock market data
stationary greatly boosts the effectiveness and reliability of machine learning forecasting
models.
4.8 Research Question Two: Is GRU better than ANN, RNN, and LSTM for Stock
Market Predictions?
The second research question is, “How do GRU-based models compare to other
machine learning approaches, such as ANN, RNN, and LSTM, in predicting stock prices
and indices?” Let’s analyse the findings across the different models developed.
Let us consider the different models developed on raw data and stationary data.
Table 4.13: Predictions made by the different models on the latest data in the dataset based on the preceding 5-year training window.
To answer the research question, we must examine both the quantitative
performance and the qualitative forecasting behaviour of all four models across raw and stationary datasets.
Throughout the experiments, GRU consistently delivered the most reliable and
accurate outcomes, particularly with stationary stock market data. This conclusion is
supported by various layers of evidence found in the results. We begin by analysing the R2
values over time for each model on both raw and stationary datasets. In the stationary
scenario, GRU exhibited the narrowest range of variation and the least frequent occurrence
of extreme negative values. Conversely, ANN exhibited weak performance even with
stationary data. While RNN and LSTM showed some improvement with stationary data,
they still experienced volatility and significant negative R2 dips. Notably, GRU achieved
the highest maximum R2 value across all experiments, suggesting its superior ability to extract meaningful signal from the processed series.
Additionally, when assessing the predictions from each model, the GRU’s forecasts
aligned most closely with the actual market trend, particularly when it was trained on
stationary data. In contrast, models like ANN yielded almost flat or inaccurately directed
predictions, while LSTM also showed a noticeable discrepancy. The GRU effectively
mirrored the true movement direction of the NSE Index with commendable accuracy. This
performance is crucial in financial contexts, where identifying trend directions and turning points often matters more than predicting exact values. Architecturally, the GRU's gating mechanism enables it to effectively learn long-term dependencies while minimising issues like overfitting and
vanishing gradients. This allows GRU to surpass RNN and LSTM in terms of learning
stability and convergence. Although its structure is simpler than that of LSTM, GRU offers
faster training and demands fewer computational resources, all while maintaining
comparable, if not better, accuracy. This is illustrated by the training graphs, where GRU
achieves stable training and validation R2 values sooner and with less variability than the other models.
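GRU's structural economy can be verified directly: for the same hidden width, a Keras GRU layer carries three gate-weight blocks against the LSTM's four. The hedged comparison below uses illustrative input dimensions.

from tensorflow import keras
from tensorflow.keras import layers

# Compare trainable parameter counts for same-width recurrent layers
def count_params(cell):
    return keras.Sequential([layers.Input(shape=(15, 6)), cell]).count_params()

print("GRU :", count_params(layers.GRU(64)))    # fewer parameters (3 gates)
print("LSTM:", count_params(layers.LSTM(64)))   # more parameters (4 gates)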
The evidence clearly demonstrates that GRU surpasses ANN, RNN, and LSTM in
stock market prediction using machine learning. Its higher R2 scores, improved alignment
with real market trends, quicker convergence, and superior robustness to non-linear
patterns highlight GRU as the best architecture among those examined. This supports the
initial hypothesis: GRU > LSTM > RNN > ANN for stock market predictions.
4.9 Research Question Three: Limitations of Predicting the Stock Market Using this Research Methodology
This section examines several limitations that affect the findings' generalisability and strength while addressing this research question.
A major limitation lies in the very nature of the stock market. Financial markets are
naturally volatile and influenced by a variety of factors that go beyond historical price and
macroeconomic data. Although this research used a comprehensive dataset that includes
stock indices, commodity prices, exchange rates, and GDP figures, it fails to account for
sentiment-driven factors like investor psychology, news events, or geopolitical disruptions.
These qualitative and event-driven aspects significantly affect market behaviour, especially
during crises or economic transitions, and their omission reduces the models' ability to respond to sentiment-driven movements.
Another limitation is the reliance on historical data for both training and validation. While historical data reflects past market conditions, it cannot completely predict unforeseen events like financial crashes, pandemics, or sudden regulatory shifts. This temporal constraint implies that the predictive models may struggle in situations that significantly differ from historical trends, as the learned patterns may no longer hold. Stationarising the data benefited all neural network models evaluated in this research. However, the stationarisation process involves judgement: the choice of transformation, the interpretation of statistical tests, and assumptions about underlying market behaviour can all influence model results. The methods used to reach stationarity might unintentionally eliminate valuable long-term signals in the data that could enhance predictive accuracy.
A further limitation concerns interpretability. Deep Learning structures, such as GRUs, although effective, are frequently
labelled as “Black Boxes” because their decision-making processes lack transparency. This
obscurity hampers financial analysts and institutional investors from fully trusting and deploying such models in high-stakes decision-making scenarios. Additionally, training these models demands substantial computational
power and time, making it impractical for various financial institutions and individual
investors.
Ultimately, this study focuses exclusively on the Indian stock market by utilising
NSE index data. Although the findings are insightful, they cannot necessarily be applied to
other markets that feature varying structural, regulatory, and economic environments. It
may be necessary to adjust or redesign the models to address the unique characteristics of each market.
This research highlights the importance of stationarising stock market data and the strengths of GRU-based architectures; future studies should expand upon this by integrating sentiment analysis, real-time data
feeds, and cross-market comparisons to fully realise the potential of machine learning in
financial forecasting.
4.10 Summary of Findings
The research findings highlight key insights regarding the intersection between data
preprocessing and deep learning models in stock market prediction. A predominant theme
from rigorous experimentation and comparisons is the essential impact of data stationarity
on increasing predictive accuracy. For all model types – ANN, RNN, LSTM, and GRU –
it was consistently noted that performance was enhanced when the stock market data was
converted into a stationary format. Making the data stationary mitigated trends and
volatility, allowing the models to uncover inherent patterns more clearly and consistently.
This result supports the study's main hypothesis that transforming financial time series into a stationary format enhances predictive accuracy.
Among the evaluated models, GRU emerged as the most effective architecture for
time series forecasting, particularly with stationary data. It yielded the most stable R2
values across varying training windows and consistently identified market trends in its
predictions. While models like LSTM and RNN demonstrated some improvements with
stationary data, they were less robust and more susceptible to underfitting or overfitting
during specific periods. In contrast, ANN struggled to provide consistent outcomes, even
with stationary inputs, underscoring its limitations in sequential learning tasks. These
results further support the hierarchy outlined in the research: GRU outperforms LSTM, which outperforms RNN, which in turn outperforms ANN.
A key takeaway was the critical role of data quality and preprocessing. Raw
financial time series frequently showed erratic fluctuations and structural breaks that
obstructed model learning, particularly for complex architectures. This was reflected in the
inconsistent R2 scores and unstable training behaviour noted with raw data. By
transforming the data to be stationary, the models showed better generalisation, quicker
convergence, and enhanced predictive stability. Moreover, the models were able to
maintain the trend directionally in their forecasts, even if the absolute values had certain offsets.
The study ultimately demonstrates that combining data transformation with
advanced deep learning architectures, especially GRU, forms a robust methodology for
stock market forecasting. This research adds to the increasing evidence favouring machine
learning in finance and underscores the importance of careful data preparation. These
results confirm the theoretical principles established in the existing literature and offer practical guidance for building predictive systems.
4.11 Conclusion
This chapter presented a comprehensive evaluation of stock market forecasting via machine learning methods. A crucial finding of this study is that
transforming stock market data into a stationary format greatly enhances the predictive capability of the models, with all four architectures showing improved model performance, assessed through R² and Mean Squared Error
(MSE) metrics, when utilising stationary data rather than raw time series. Models
developed with stationary data demonstrated greater stability, quicker convergence, and a
closer alignment with actual stock market movements, emphasising the importance of this transformation step.
The research further demonstrates that within the tested neural network
architectures, GRU models surpass others, specifically ANN, RNN, and LSTM, across
both raw and stationary datasets. GRU's superiority is evident in its predictive metrics and
its capacity to capture directional trends while generalising effectively across various
training windows. These findings support the hypothesis that GRU, owing to its efficient gating structure, provides a dependable framework for stock market forecasting, particularly when used with stationary inputs.
Additionally, the study highlights the drawbacks of directly using raw stock market
data in predictive models. This raw data, marked by non-stationary behaviour and noise,
causes instability that hampers model learning and generalisation. This issue was
evidenced by very low R2 values and poor prediction alignment across models trained on
raw data. In contrast, converting the data to achieve stationarity eliminates this noise and instability, improving predictive accuracy.
Overall, this research underscores the combined value of careful data preparation and appropriate neural network designs. It demonstrates that accurate predictions in the stock market are
achievable through a careful blend of data transformation and deep learning techniques.
Despite existing challenges, notably in capturing sentiment and unpredictable events, the
findings pave the way for future research opportunities, such as hybrid models and multi-
modal datasets. Consequently, this study enhances the knowledge base in financial
analytics and lays a solid groundwork for developing effective machine-learning models for financial forecasting.
CHAPTER V:
DISCUSSION
5.1 Discussion of Results
This study's results offer valuable insights into stock market prediction dynamics
using machine learning, especially regarding time series data transformations. A key theme
from the research is the notable enhancement in model performance following the
stationarisation of stock market data. For all tested models – ANN, RNN, LSTM, and GRU – this transformation markedly improved their predictive accuracy. Models based on raw data often showed erratic R2 values, reflecting their sensitivity to shifting market conditions. Conversely, models trained on stationary data demonstrated more stable and consistent behaviour. While ANNs provided a useful baseline, their lack of ability to capture temporal dependencies
rendered them the least effective, particularly in unstable financial contexts. RNNs
enhanced this by integrating sequential learning; however, they struggled with vanishing
gradient issues over long-term observations. LSTM models improved on this with
stationary data, but still encountered some convergence challenges. GRU models emerged
as the most efficient option, balancing computational simplicity with deep temporal modelling capability, particularly when trained on stationary datasets, confirming their effectiveness for financial time series forecasting. A recurring observation across the experiments was the impact of the 2008 financial crisis, marked by a significant drop in R2 values across all models. This
trend underscores a key limitation of models based on historical data: their failure to
forecast or account for black swan events that diverge from established patterns.
Nevertheless, the overall trend bolsters the hypothesis that effective data preprocessing and
model architecture are vital components for successful predictions in the stock market.
The results show that although no model can completely counteract the unpredictability of financial markets, careful preprocessing and model selection substantially improve the reliability and accuracy of predictions. This underscores the key contributions of this study and strongly supports the use of data preprocessing techniques and sophisticated recurrent architectures in financial forecasting.
5.2 Discussion of Research Question One: Does Making Stock Market Stationary Impact Stock Market Models?
The research results clearly show that transforming stock market data into a
stationary format greatly enhances the effectiveness of predictive models. This finding has held consistently across all the tested deep learning architectures, such as ANN, RNN, LSTM, and GRU. Regardless of the
architecture employed, both training and testing outcomes showed notable improvement
when the data underwent stationarising transformations such as differencing. The performance
metrics, particularly the R2 values, demonstrated greater stability and achieved higher
peaks for models trained on stationary data in comparison to those trained on raw data.
This discovery supports the study's main hypothesis and is consistent with the
theoretical foundations of time series analysis. When the statistical characteristics of the
data are stabilised over time, machine learning models can more effectively recognise
significant patterns and learn from them. This transformation particularly enhanced the models' success in capturing the trends and fluctuations of the stock market index, despite some models showing offsets.
In summary, making the data stationary is a crucial step in developing dependable and effective stock market prediction models, which has been extensively validated and discussed in this thesis. This transformation process should therefore be treated as standard practice in financial time series modelling.
5.3 Discussion of Research Question Two: Is GRU better than ANN, RNN, and LSTM for Stock Market Predictions?
This study's analysis offers compelling evidence that GRU models outperform
ANN, RNN, and LSTM models in stock market predictions, particularly when trained on
stationary data. Throughout the experiments detailed in this thesis, GRU consistently
achieved higher R2 scores, aligned trends more accurately in its predictions, and showcased
better overall stability across various training windows. While all models benefited from
the preprocessing that made the data stationary, GRU maintained its advantage even with
the unprocessed dataset. However, limitations arose from the inherent noise and non-
stationarity present in such data. Its architecture, which is designed to manage long-term
dependencies while avoiding the vanishing gradient issue, allowed it to capture temporal patterns more effectively than its counterparts.
When comparing GRU to ANN, RNN, and LSTM, the advantages in predictive
accuracy and robustness stood out. While ANN models are fast to train, they lack the
sequential learning needed for time series forecasting. RNNs showed improved
performance but had difficulties with long sequences. LSTMs resolved these issues
through advanced memory management but also increased complexity and training
durations. GRU models strike a balance between performance and efficiency, featuring a
streamlined gating mechanism that enhances scalability and adaptability to the dynamic
nature of financial time series data. This advantage was especially apparent when
evaluating the final predictions from each model across the same test periods.
While GRUs have outperformed traditional models in this study, there are still opportunities for improvement. Emerging architectures and hybrid methods could lead to even better outcomes. For instance, merging
Convolutional Neural Networks (CNN) with GRUs might enhance local pattern detection
prior to processing the data with a temporal model. Likewise, incorporating attention
mechanisms into GRU designs could enable the model to concentrate on the most
significant parts of the input sequence, thereby enhancing both interpretability and
accuracy. Furthermore, utilising transformer-based models, which have transformed sequence modelling in natural language processing, may offer a new and potentially more powerful approach to stock market prediction.
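As a hedged sketch of the CNN-GRU hybrid suggested above, a one-dimensional convolution can extract local patterns before the GRU models the temporal dependencies; all layer sizes and input dimensions here are illustrative assumptions.

from tensorflow import keras
from tensorflow.keras import layers

# Illustrative CNN-GRU hybrid: Conv1D for local patterns, GRU for temporal structure
model = keras.Sequential([
    layers.Input(shape=(15, 6)),                          # lookback window, features (assumed)
    layers.Conv1D(32, kernel_size=3, activation="relu"),  # local pattern extraction
    layers.GRU(64),                                       # temporal dependencies
    layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")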
5.4 Discussion of Research Question Three: Limitations of Predicting the Stock Market Using this Research Methodology
This study reveals significant limitations in the methodology applied for predicting
stock market trends through machine learning models. Though the research effectively
demonstrates that stationarising stock market data enhances prediction accuracy and that
GRU models outperform alternative architectures, there remain challenges that limit the
reliability and generalisability of the results. A key limitation is the exclusive dependence
on historical numerical data. While the study meticulously gathers a dataset that includes stock indices, commodity prices, exchange rates, and GDP figures, it omits qualitative drivers such as investor sentiment, news events, and geopolitical developments. These external influences can dramatically impact stock prices, and their exclusion from
the model may create gaps in predictive accuracy, especially during periods of significant
market fluctuations.
Moreover, models built on historical data, even sophisticated structures such as GRU, tend to be vulnerable during times of severe economic upheaval.
This became clear in various segments of the analysis, particularly during the 2008
financial crisis, when R2 values for all models fell sharply. Although these patterns were
clear and consistent, the models did not adequately consider the unpredictable nature of
such events. This underscores a larger problem in depending exclusively on historical data for forecasting. In addition, differencing is an effective route to stationarity, yet it requires subjective choices regarding parameter selection and the interpretation of tests like the ADF and KPSS. The results of these tests can differ based on the chosen parameters and sample period.
Furthermore, the black-box nature of deep learning models, such as GRU, poses interpretability challenges: grasping the reasons behind specific predictions is a major obstacle, which hampers the use of these models in settings where explainability is crucial.
Finally, although this methodology was effectively implemented in the Indian stock
market, its relevance to other markets is still unproven. The distinct features of the Indian market, including its regulatory framework, liquidity profile, and macroeconomic factors, could impact the transferability of the findings to other global financial contexts. Future work should replicate this methodology across different national or regional markets and also consider incorporating hybrid or ensemble approaches.
CHAPTER VI:
6.1 Summary
This research systematically evaluated stock market prediction models utilising various neural network architectures on both raw and stationary
data. The study clearly demonstrates that converting stock market data into a stationary
format significantly boosts model performance, which supports the central hypothesis. By
employing a well-organised dataset from the Indian stock market, the research
methodically evaluated the predictive abilities of four models – ANN, RNN, LSTM, and GRU – under identical experimental conditions, allowing a fair comparison of the different models regarding their performance with raw and stationary data.
GRU emerged as the strongest performer, particularly on stationary data, highlighting its ability to manage the temporal complexities of financial
time series. Following closely was LSTM, which displayed robust performance but had
some instability when dealing with raw data. RNN models achieved moderate success,
whereas ANN, which lacks temporal memory, consistently fell behind. Despite these
variations, a common trend among all models was the significant boost in performance
after the data was rendered stationary, underscoring the importance of preprocessing in
The research, based on statistical testing and thorough experimentation, also confirmed the difficulties in forecasting stock market movements using solely historical numerical data. Events like the 2008 financial crisis consistently resulted in dips in model performance, exposing the fragility of data-driven approaches in the face of unexpected external shocks. However, the study demonstrates a clear enhancement in robustness and accuracy when stationary data is used.
6.2 Implications
The research findings have significant implications for both theory and practice in
financial forecasting and machine learning. They highlight the essential need to convert
stock market time series data into a stationary format to improve model performance. This
insight is particularly relevant for financial analysts, data scientists, and institutional investors. By providing empirical evidence that stationarity enhances model accuracy, the study underscores the importance of thorough preprocessing when developing predictive systems for highly volatile datasets, integrating statistical rigour with sophisticated deep-learning models. Among the reviewed
models, the GRU stands out as the most effective, particularly when applied to stationary
data. This demonstrates the strength of GRU in processing sequential financial information
and paves the way for future advancements in neural network designs aimed at time series
forecasting. GRU's performance compared to ANN, RNN, and LSTM provides a useful benchmark for model selection in future studies.
For professionals in the finance industry, the findings strongly support the
integration of GRU-based models into their forecasting strategies, especially when paired
with effective preprocessing techniques. Although traditional models and heuristics remain widely used, the ability of deep learning to extract patterns from financial data offers a significant competitive advantage when utilised appropriately. This study advocates for a transition to more data-driven, flexible forecasting models that can adapt to rapidly changing conditions, particularly in emerging markets such as India, where stock market behaviour often contrasts with that in established markets.
6.3 Recommendations for Future Research
This research's findings and observations suggest several promising avenues for
future stock market prediction work using machine learning. While this study primarily
evaluated ANN, RNN, LSTM, and GRU models on both raw and stationary financial data,
there is considerable opportunity to build on these results and further refine the modelling
approaches. One promising avenue is to investigate hybrid architectures that leverage the strengths of multiple model families. Combining Convolutional Neural Networks (CNN) with GRUs may facilitate better feature extraction from intricate time series data before temporal modelling. Likewise, attention mechanisms could significantly improve the model's capacity to concentrate on relevant parts of the input sequence. Transformer-based models, which have changed sequence modelling in areas like natural language processing (NLP), feature a parallelisable architecture and an effective attention mechanism that could facilitate the modelling of long financial sequences. Investigating the use of transformers with stationary stock market data may yield fresh insights. Furthermore, by incorporating sentiment features derived from news articles, social media trends, and other qualitative sources, the model's robustness could be enhanced. Such features could play a crucial role in understanding market movements that numerical data alone cannot explain. Finally, extending the analysis to a wider range of financial instruments could help generalise the outcomes. Although this research
centred on the Indian stock market, broadening the approach to both developed and
emerging markets globally would validate the findings across different market conditions.
Such expansions would strengthen the main conclusions of this study and significantly
contribute to the developing area of financial time series modelling with machine learning.
6.4 Conclusion
This research explored how data stationarity and model architecture influence stock
market prediction using machine learning. A systematic series of experiments with ANN,
RNN, LSTM, and GRU models was conducted on both raw and stationary data. Results
indicated that converting stock market time series to a stationary format consistently
enhanced model stability and predictive accuracy. Notably, GRU models outperformed the other architectures, particularly on stationary data. Without data transformation, models, regardless of their complexity, faced instability and
inadequate generalisation. Achieving data stationarity clarified the time series’ underlying
structure, facilitating enhanced learning and improving the correlation between predicted
and actual trends. This trend was consistently evident across R2 values, prediction plots, and training curves.
At the same time, the study acknowledged its constraints. Predictive models based solely on historical data cannot anticipate abrupt events like financial crises or changes influenced by sentiment and external news. This
shortcoming highlights the necessity for future research that goes beyond its current
boundaries, integrating more data sources and exploring architectures like attention mechanisms and transformers.
This work has introduced a systematic method for evaluating model performance
and the impact of data transformation in stock market forecasting. It provides a foundation
for future research to develop more sophisticated models, hybrid methodologies, and larger, more diverse datasets.
APPENDIX A:
This section provides the Python programs used to fetch the data for this research, calling suitable APIs. These codes can be used to replicate the experiment for verification purposes.
A.1 Fetching the NSE Index data
import yfinance as yf
import pandas as pd
import matplotlib.pyplot as plt

# NIFTY 50 index ticker on Yahoo Finance (assumed; the original definition of
# `target` was not shown in the extracted text)
target = "^NSEI"

# Fetch the historical data for NSE Index from 1-Jan-2000 to 31-Dec-2024
nse_index = yf.Ticker(target)
df_raw_NSE = nse_index.history(start="2000-01-01", end="2024-12-31")

# Plot the closing prices
df_raw_NSE['Close'].plot()
plt.title("NSE Index Closing Prices (2000-2024)")
plt.show()
A.2 Fetching the Gold, Silver, and Crude Oil Prices data
import yfinance as yf
import pandas as pd
import matplotlib.pyplot as plt

# Date range
start_date = "2000-01-01"
end_date = "2024-12-31"

# Yahoo Finance futures tickers (assumed): gold, silver, and crude oil
tickers = {"Gold": "GC=F", "Silver": "SI=F", "Crude_Oil": "CL=F"}

# Download daily closing prices for each commodity
# (column handling may vary slightly across yfinance versions)
df_raw_commodities = pd.DataFrame()
for name, ticker in tickers.items():
    df_raw_commodities[name] = yf.download(ticker, start=start_date, end=end_date)['Close']

df_raw_commodities.plot(title="Gold, Silver, and Crude Oil Prices (2000-2024)")
plt.grid(True)
plt.show()
A.3 Fetching the INR-USD Exchange Rate data
import pandas as pd
import matplotlib.pyplot as plt
from pandas_datareader import data as pdr
import datetime

# Fetch INR-USD rates from FRED (series DEXINUS, Rupees per U.S. Dollar; assumed source)
start, end = datetime.datetime(2000, 1, 1), datetime.datetime(2024, 12, 31)
df_raw_xchng = pdr.DataReader('DEXINUS', 'fred', start, end)
df_raw_xchng.plot(title="INR-USD Exchange Rates")
plt.show()
Figure A.3: INR-USD Exchange Rates.
A.4 Fetching the Indian GDP data
import world_bank_data as wb
import pandas as pd
import matplotlib.pyplot as plt

# Fetch India's annual GDP in current US$ (World Bank indicator NY.GDP.MKTP.CD).
# The exact fetch call is an assumed reconstruction of the original code.
df_raw_gdp = wb.get_series('NY.GDP.MKTP.CD', country='IN')

# Plot GDP in billions of US dollars
(df_raw_gdp / 1e9).plot(kind='bar')

# Set labels and title
plt.xlabel("Year")
plt.ylabel("GDP (in Billion US Dollars)")
plt.title("GDP of India (2000-2024)")
plt.xticks(rotation=45)
plt.grid(True)
# Remove the legend box and only keep the left & bottom spines
plt.gca().spines[['top', 'right']].set_visible(False)
plt.show()
import pandas as pd

# Ensure all datasets have a Date index and remove timezone information
df_raw_NSE.index = pd.to_datetime(df_raw_NSE.index).tz_localize(None)
df_raw_commodities.index = pd.to_datetime(df_raw_commodities.index).tz_localize(None)
df_raw_xchng.index = pd.to_datetime(df_raw_xchng.index).tz_localize(None)
df_raw_gdp.index = pd.to_datetime(df_raw_gdp.index).tz_localize(None)

df_gdp_daily = df_raw_gdp.copy().to_frame()  # Convert Series to DataFrame
df_gdp_daily = df_gdp_daily.reindex(pd.date_range(start="2000-01-01", end="2024-12-31", freq="D"))
df_gdp_daily.ffill(inplace=True)  # Forward fill GDP values for all days in each year
df_gdp_daily.columns = ['Indian_GDP']  # Rename column
APPENDIX B:
This section provides the Python programs used to make the data stationary for this research by performing suitable tests and data transformations. These codes can be used to replicate the transformation steps.
Below are the functions used to check for stationarity of a time series.
import pandas as pd
import numpy as np
import warnings
from statsmodels.tsa.stattools import adfuller, kpss

# Note: the function signatures below are reconstructed from their usage; the
# docstrings and printed output match the original extracted text.

def apply_adf_test(series):
    """
    Apply the Augmented Dickey-Fuller (ADF) test.

    Parameters:
        series (pd.Series): Time series data.

    Returns:
        dict: ADF test results including test statistic, p-value, and conclusion.
    """
    result = adfuller(series, autolag='AIC')
    print(f"ADF Statistic : {result[0]:.4f}")
    print(f"p-value       : {result[1]:.4f}")
    print(f"Lags Used : {result[2]}")
    print(f"Number of Observations: {result[3]}")
    print("Critical Values:")
    for key, value in result[4].items():
        print(f"   {key}: {value:.4f}")
    conclusion = "Stationary" if result[1] < 0.05 else "Non-Stationary"
    print(f"Conclusion: The time series is {conclusion}.")
    return {"statistic": result[0], "p_value": result[1], "conclusion": conclusion}

def apply_kpss_test(series):
    """
    Apply the Kwiatkowski-Phillips-Schmidt-Shin (KPSS) test.

    Parameters:
        series (pd.Series): Time series data.

    Returns:
        dict: KPSS test results including test statistic, p-value, and conclusion.
    """
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")  # Suppress warnings
        result, p_value, lags, critical_values = kpss(series, regression='c', nlags="auto")
    print(f"KPSS Statistic : {result:.4f}")
    print(f"p-value        : {p_value:.4f}")
    print("Critical Values:")
    for key, value in critical_values.items():
        print(f"   {key}: {value:.4f}")
    # For KPSS, the null hypothesis is stationarity
    conclusion = "Stationary" if p_value > 0.05 else "Non-Stationary"
    print(f"Conclusion: The time series is {conclusion}.")
    return {"statistic": result, "p_value": p_value, "conclusion": conclusion}

def check_stationarity(series):
    """
    Combine the ADF and KPSS tests into a single verdict.

    Parameters:
        series (pd.Series): Time series data.

    Returns:
        dict: Summary of stationarity conclusion.
    """
    adf_result = apply_adf_test(series)
    kpss_result = apply_kpss_test(series)
    if adf_result["conclusion"] == "Stationary" and kpss_result["conclusion"] == "Stationary":
        conclusion = "The time series is stationary."
    else:
        conclusion = "The time series is non-stationary."
    print("\n=== Final Conclusion on Stationarity ===")
    print(conclusion)
    return {"adf": adf_result, "kpss": kpss_result, "conclusion": conclusion}
The NSE Index time series was checked for stationarity. The raw data was not stationary, but it became stationary after applying one differencing transformation.
import gdown
# Download df_raw.csv
url = 'https://drive.google.com/uc?id=1tIJctQuQL-LGRdHFXnihgvlVEaGGncJl'
gdown.download(url, './df_raw.csv', quiet = False)
import pandas as pd
df_raw = pd.read_csv('./df_raw.csv')
The code below performs the stationarity tests and makes the time series stationary.
Lags Used : 16
Number of Observations: 4063
Critical Values:
1%: -3.4320
5%: -2.8623
10%: -2.5671
Conclusion: The time series is Stationary.
The NSE Index time series was made stationary after applying one differencing.
The Gold Price time series was checked for stationarity. The raw data was not
stationary, but it became stationary after applying one differencing transformation.
5%: -2.8622
10%: -2.5671
Conclusion: The time series is Non-Stationary.
The Gold Prices time series was made stationary after applying one differencing
transformation.
The Silver Price time series was checked for stationarity. The raw data was not stationary, but it became stationary after applying one differencing transformation.
print("\n=== Testing Stationarity After First Differencing ===")
stationarity_results_diff1 = check_stationarity(silver_diff1)
The Crude Oil Price time series was checked for stationarity. The raw data was not stationary, but it became stationary after applying one differencing transformation.
Lags Used : 20
Number of Observations: 4060
Critical Values:
1%: -3.4320
5%: -2.8623
10%: -2.5671
Conclusion: The time series is Non-Stationary.
=== Final Conclusion on Stationarity ===
The time series is stationary.
The INR-USD Exchange Rate time series was checked for stationarity. The raw
data was not stationary, but it became stationary after applying one differencing
transformation.
print("\n=== Testing Stationarity After First Differencing ===")
stationarity_results_diff1 = check_stationarity(exrate_diff1)
The Indian GDP time series was checked for stationarity. The raw data was not stationary, but it became stationary after applying one differencing transformation.
Lags Used : 0
Number of Observations: 4080
Critical Values:
1%: -3.4320
5%: -2.8622
10%: -2.5671
Conclusion: The time series is Non-Stationary.
=== Final Conclusion on Stationarity ===
The time series is stationary.
All the time series could be made stationary with one differencing. So, all the
stationary time series were assimilated into a single data frame to apply machine learning
algorithms.
import pandas as pd
import numpy as np

# Assemble the differenced series into one stationary frame (variable names assumed)
df_st = pd.concat([nse_diff1, gold_diff1, silver_diff1, crude_diff1, exrate_diff1, gdp_diff1], axis=1).dropna()
print(df_st.head(10))