

Forecasting to Classification: Predicting the direction of
stock market price using Xtreme Gradient Boosting

Working Paper, October 2016. DOI: 10.13140/RG.2.2.15294.48968

Shubharthi Dey∗, PESIT South Campus, Bangalore, Karnataka, India (deysubharthi15@[Link])
Yash Kumar†, PESIT South Campus, Bangalore, Karnataka, India (yash.kumar1396@[Link])
Snehanshu Saha‡, PESIT South Campus, Bangalore, Karnataka, India (snehanshusaha@[Link])
Suryoday Basak§, PESIT South Campus, Bangalore, Karnataka, India (suryodaybasak@[Link])

ABSTRACT
Stock market prediction is the art of determining the future value of a company stock or other financial instrument traded on an exchange. It has been a real challenge for analysts and traders to predict the trends of the stock market due to its uncertain nature. Stock prices are likely to be influenced by factors like product demand, sales, manufacturing, investor sentiment, the ruling government, recession, etc. The successful prediction of a stock's future price could yield significant profit. The main aim of this paper is to design an efficient model which accurately predicts the trend of the stock market using eXtreme Gradient Boosting (XGBoost), which has proved to be an efficient algorithm with over 87% accuracy for 60-day and 90-day periods and has proved to be much better than traditional non-ensemble learning techniques. The prediction problem has been reconstructed as a classification problem, and XGBoost turned out to be significantly better than the algorithms found in the literature. The proposed model outperforms all existing forecasting models in the literature and is able to forecast on a long-term basis, an added feature absent in the literature.

Keywords
XGBoost, ensemble learning, exponential smoothing

∗First author and second author share equal credit.
†First author and second author share equal credit.
‡Dr. Saha is the corresponding author.
§Suryoday is with the Center of Applied Mathematical Modeling.

1. INTRODUCTION
The stock market has always been an area of interest for people around the world. Perception and experience are put to good use by non-practitioners to invest in a particular company hoping for a multiplied return. Stock market trend prediction has also been an area of keen interest for statisticians and computer scientists, simply because the area throws up complex modeling questions. There exist methods or algorithms which can predict stock valuation with a fair degree of accuracy. However, a question still persists: if a person decides to buy shares of a particular company, what is the probability that it turns out to be a successful expedition or a mere failure? An informed guess works on a broad basis, i.e., considering the production, sales and demand of the organization in the present scenario, it may be fine to invest in a particular stock. However, it is too much to expect this to work in complex situations, ignoring certain nuanced concepts and factors that govern the market. For example, the political situation in a country may be inefficient and too volatile to handle the economy of the country. A fall in the economy triggers a fall in the stock value of a company. Due to these minute and chaotic parameters, prediction becomes increasingly difficult. Traders tend to invest in a firm which has potential or a history of good returns based on the current situation. However, there always exists a possibility that a company which appears to have incurred a loss may be the potential firm in which traders/investors have continued faith.
Over the past years, there has been enormous research in this field pivoted around statistical machine learning. Different predictive models and algorithms, accurate to a certain degree, have been proposed and tested. Implementation of machine learning techniques is an evolving concept. This is somewhat different from traditional forecasting and diffusion methods. Early models used in stock forecasting involved statistical methods such as time series models and multivariate analysis (Gencay (1999), Timmermann and Granger (2004), Bao and Yang (2008)). Our paper mainly focuses on the machine learning approach of analysis, as it is evident that the historical data set obtained (say, from the date when the company came into existence) is impossible to analyze without data mining methods.
The prediction produced by our proposed algorithm may help people decide whether to invest in a particular company, taking into consideration the chaos and volatility of stocks. We adopted a machine learning approach different from the commonly practiced ones such as Support Vector Machines, Neural Networks, the Naive Bayesian Classifier, Linear Discriminant Analysis, etc. The next section discusses related literature.

2. LITERATURE SURVEY
Recent years witnessed considerable traction in the field of stock market prediction, especially from the machine learning point of view. Prediction of stock market behavior is used to determine future trends. Prediction is usually accomplished by analyzing historic time series data.
Several algorithms have been used in stock prediction, such as SVM, Neural Networks, Linear Discriminant Analysis, Logistic Regression, Linear Regression, KNN and the Naive Bayesian Classifier. It was found that logistic regression was one of the best, with a success rate of 55.65%. Dai and Zhang (2013) used training data from 3M stock data. The data contains daily stock information ranging from 1/9/2008 to 11/8/2013 (1471 data points). Multiple algorithms were chosen to train the prediction system. These algorithms include Logistic Regression, Quadratic Discriminant Analysis and SVM. The algorithms were applied to a next-day model, which predicted the outcome of the stock price on the next day, and a long-term model, which predicted the outcome of the stock price for the next n days. The next-day prediction model produced accuracy results ranging from 44.52% to 58.2%. SVM reported the highest accuracy of 79.3% for the long-term prediction, where the time window taken was [Link]. In one of the published papers [11] on ANN (artificial neural networks), the model used by the authors to forecast the direction of the Japanese stock market gave an accuracy of 81.27%. In the published paper [10] on Random Forest (RF), an ensemble technique is used to predict stock market prices (trend up or down) and returned the highest accuracy of 78.81% for the ith day on the direction of movement of the daily TSE (Tehran Stock Exchange) index.
Ensemble learning algorithms remain largely unexplored, however. The focus of this paper is to implement the ensemble learning technique and to discuss its advantages over non-ensemble techniques. We will be using an ensemble learning method known as XGBoost to build our predictive model. Our model has been trained for 60, 90 and 120 days respectively, and the results were impressive. Moreover, the majority of the related work focused on a time window of 10 to 44 days on average, as most of the authors preferred to use metric classifiers on time series data which is not smoothed. Therefore, those models are unable to learn from the data set when it comes to predicting over a long-term window. Our model first smooths the data and, being built on a non-metric classifier, is capable of predicting accurately over a long-term time window.
The paper will highlight certain critical aspects largely ignored by most of the literature. These include analyzing the non-linearity in the features used for analysis and the futility of employing linear classifiers, and long-run predictions running into 90 days, where related manuscripts considered time windows of up to 44 days. The significant improvement in accuracy obtained by our approach embellishes these claims.
The remainder of the paper is organized as follows. Section 3 contains key definitions. Section 4 describes data preprocessing and feature extraction in detail. Section 5 describes the methods and algorithm used. The results of the applied algorithm are obtained and analyzed in Section 6. The next section is a comparative study establishing the superiority of our proposed algorithm. The authors conclude with a comprehensive summary of the work.

3. KEY DEFINITIONS
Time Series Data: A time series is a series of data points indexed (or listed or graphed) in time order. Most commonly, a time series is a sequence taken at successive equally spaced points in time. Thus it is a sequence of discrete-time data.
Note: We used the Apple and Yahoo data sets. These are time series data sets which are further smoothed exponentially, as discussed in Section 4.
Gradient Boosting: Gradient boosting is a machine learning technique for regression and classification problems which produces a prediction model in the form of an ensemble of weak prediction models, typically decision trees. It builds the model in a stage-wise fashion like other boosting methods do, and it generalizes them by allowing optimization of an arbitrary differentiable loss function.
Technical Indicators: Technical indicators are important parameters that are calculated from time series stock data and that aim to forecast financial market direction. They are tools which are widely used by investors to check for bearish or bullish signals.
Relative Strength Index: The relative strength index (RSI) is calculated as

RSI = 100 − 100 / (1 + RS)

RS = (average gain over past 14 days) / (average loss over past 14 days)

The RSI is classified as a momentum oscillator, measuring the velocity and magnitude of directional price movements. Momentum is the rate of the rise or fall in price. The RSI computes momentum as the ratio of higher closes to lower closes: stocks which have had more or stronger positive changes have a higher RSI than stocks which have had more or stronger negative changes.
The RSI is most typically used on a 14-day timeframe, measured on a scale from 0 to 100, with high and low levels marked at 70 and 30, respectively. Shorter or longer timeframes are used for alternately shorter or longer outlooks. More extreme high and low levels - 80 and 20, or 90 and 10 - occur less frequently but indicate stronger momentum.
Stochastic Oscillator: The stochastic oscillator is given by

%K = 100 * (C − L14) / (H14 − L14)

where C = current closing price, L14 = lowest low over the past 14 days, and H14 = highest high over the past 14 days.
The term stochastic refers to the position of the current price in relation to its price range over a period of time. This method attempts to predict price turning points by comparing the closing price of a security to its price range.
William %R: Williams %R is given by

%R = −100 * (H14 − C) / (H14 − L14)
where C = current closing price, L14 = lowest low over the past 14 days, and H14 = highest high over the past 14 days.
Williams %R ranges from -100 to 0. When its value is above -20, it indicates a sell signal, and when its value is below -80, it indicates a buy signal.
Moving Average Convergence Divergence (MACD): The formula for calculating MACD is:

MACD = EMA12(C) − EMA26(C)
SignalLine = EMA9(MACD)

where MACD = Moving Average Convergence Divergence, C = closing price series, and EMAn = n-day exponential moving average.
When the MACD goes below the SignalLine, it indicates a sell signal. When it goes above the SignalLine, it indicates a buy signal.
Price Rate of Change: It is calculated as follows:

PROC(t) = (C(t) − C(t−n)) / C(t−n)

where PROC(t) = price rate of change at time t and C(t) = closing price at time t. It measures the most recent change in price with respect to the price n days ago.
On Balance Volume: This technical indicator is used to find the buying and selling trends of a stock.

OBV(t) = OBV(t−1) + Vol(t)   if C(t) > C(t−1)
OBV(t) = OBV(t−1) − Vol(t)   if C(t) < C(t−1)
OBV(t) = OBV(t−1)            if C(t) = C(t−1)

where OBV(t) = on-balance volume at time t, Vol(t) = trading volume at time t, and C(t) = closing price at time t.
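The indicators defined above are straightforward to compute from an OHLCV price series. The following sketch, using pandas, illustrates one possible implementation of RSI, %K, Williams %R, MACD, PROC and OBV; the column names ('close', 'high', 'low', 'volume') and the 14-day window are assumptions for illustration and are not taken from the paper.

```python
import numpy as np
import pandas as pd

def technical_indicators(df, n=14):
    """Illustrative computation of the indicators defined above.
    Assumes columns 'close', 'high', 'low', 'volume'."""
    out = pd.DataFrame(index=df.index)

    # RSI = 100 - 100 / (1 + RS), RS = average gain / average loss over n days
    delta = df["close"].diff()
    gain = delta.clip(lower=0).rolling(n).mean()
    loss = (-delta.clip(upper=0)).rolling(n).mean()
    out["rsi"] = 100 - 100 / (1 + gain / loss)

    # Stochastic oscillator %K = 100 * (C - L14) / (H14 - L14)
    l14 = df["low"].rolling(n).min()
    h14 = df["high"].rolling(n).max()
    out["stoch_k"] = 100 * (df["close"] - l14) / (h14 - l14)

    # Williams %R = -100 * (H14 - C) / (H14 - L14)
    out["williams_r"] = -100 * (h14 - df["close"]) / (h14 - l14)

    # MACD = EMA12(C) - EMA26(C); signal line = EMA9(MACD)
    ema12 = df["close"].ewm(span=12, adjust=False).mean()
    ema26 = df["close"].ewm(span=26, adjust=False).mean()
    out["macd"] = ema12 - ema26
    out["macd_signal"] = out["macd"].ewm(span=9, adjust=False).mean()

    # Price rate of change PROC(t) = (C(t) - C(t-n)) / C(t-n)
    out["proc"] = df["close"].pct_change(periods=n)

    # On-balance volume: add volume on up days, subtract on down days
    direction = np.sign(df["close"].diff()).fillna(0)
    out["obv"] = (direction * df["volume"]).cumsum()

    return out
```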
Convex Hull: The convex hull of a set of points X is the subset of points which forms the smallest convex polygon containing all the points in X. A polygon is said to be convex if a line joining any two points of the polygon also lies within the polygon. The significance of the convex hull is explained in Section 5.1.
Linear Separability: Two sets of points X0 and X1 in n-dimensional Euclidean space are said to be linearly separable if there exists an n-dimensional normal vector W of a hyperplane and a scalar k such that every point x ∈ X0 gives W^T x > k and every point x ∈ X1 gives W^T x < k.
Bagging: Bootstrap aggregating, also called bagging, is a machine learning ensemble meta-algorithm designed to improve the stability and accuracy of machine learning algorithms used in statistical classification and regression. It also reduces variance and helps to avoid overfitting. Although it is usually applied to decision tree methods, it can be used with any type of method. Bagging is a special case of the model averaging approach.

4. DATA PREPROCESSING AND FEATURE EXTRACTION
The data set is borrowed from Yahoo Finance and includes the closing price, opening price, high, low and volume (Apple, Yahoo). The time series historical stock data is exponentially smoothed. Exponential smoothing applies more weight to recent observations and exponentially decreasing weights to past observations. The exponentially smoothed statistic of a series Y can be calculated recursively as:

S0 = Y0;   St = α · Yt + (1 − α) · St−1

where α is the smoothing factor, 0 < α < 1. Larger values of α reduce the level of smoothing. When α = 1, the smoothed statistic becomes equal to the actual observation. The smoothed statistic St can be calculated as soon as two observations are available. This smoothing removes random variation or noise from the historical data, allowing the model to easily identify the long-term price trend in the stock price behavior. Technical indicators are then calculated from the exponentially smoothed time series data and are later organized into a feature matrix. The target to be predicted for the ith day is calculated as follows:

target_i = Sign(close_{i+d} − close_i)

where d is the number of days after which the prediction is to be made. When the value of target_i is +1, it indicates that there is a positive shift in the price after d days, and -1 indicates that there is a negative shift after d days. The target_i values are assigned as labels to the ith row of the feature matrix.
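As a concrete illustration, the smoothing recursion and the target labelling above can be written as follows. This is only a sketch: the value of α, the file name and the column layout are assumptions, not settings reported in the paper.

```python
import numpy as np
import pandas as pd

def exponential_smoothing(series, alpha=0.9):
    """S_0 = Y_0; S_t = alpha * Y_t + (1 - alpha) * S_{t-1}.
    pandas' ewm with adjust=False implements exactly this recursion."""
    return series.ewm(alpha=alpha, adjust=False).mean()

def make_targets(close, d=60):
    """target_i = sign(close_{i+d} - close_i): +1 for a rise after d days, -1 for a fall."""
    return np.sign(close.shift(-d) - close)

# Usage sketch: smooth the closing prices, then label each row.
# prices = pd.read_csv("apple.csv")           # hypothetical file with a 'close' column
# smoothed = exponential_smoothing(prices["close"], alpha=0.9)
# labels = make_targets(smoothed, d=60)       # the last d rows are NaN and are dropped
```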
We typically expect to have a huge data set in order to enable the algorithm to recognize the pattern in the data. This analysis becomes cumbersome, with a high possibility of error. Machine learning proposes solutions for handling such huge data in an efficient manner. There is a range of methods, classified into metric and non-metric classifiers based on their working principle.
XGBoost is an ensemble ML technique based on the concept of the decision tree, with some advanced modifications that are designed to differentiate the performance of an XGBoost model from that of a simple decision tree model. We present a brief overview of decision trees from the perspective of XGBoost.
Technical indicators are important parameters that are calculated from time series stock data and aim to forecast financial market direction. They are tools which are widely used by investors to check for bearish or bullish signals. These technical indicators are calculated from the time series stock market data available for predicting the direction of the stock market. The technical indicators used are the RSI indicator, Stochastic Oscillator, Williams %R, Moving Average Convergence Divergence (MACD), Price Rate of Change and On Balance Volume, as defined in the previous section.
Technical indicators calculated using past observations have been used as features. Thus, the order of the dates becomes irrelevant when bagging is performed. We use these indicators that were calculated with t−n data and reuse them to predict the t+1 event. An extreme example is that we use features of day 3 and features of day 30 to predict day 45. However, doing this does not ignore the information embedded in the correlation of consecutive days. The features of the ith day are calculated using the OHLCV data of the past n days. For example, for the sake of simplicity, let us assume we have a technical indicator called F. This indicator is calculated using the closing prices of the past n days, i.e., Close(i-1), Close(i-2), ..., Close(i-n).
Feature extraction is a mechanism that computes numeric or symbolic information from the observations. The main task is to select or combine the features that preserve most of the information and remove the redundant components, in order to improve the efficiency of the subsequent classifiers without degrading their performance.
It is the process of acquiring higher-level information. The dimensionality of the feature space may be reduced by the selection of subsets of good features. Feature extraction plays an important role in improving classification performance and reducing computational complexity. It also improves computational speed, since with fewer features, fewer parameters have to be estimated.
Feature Extraction Algorithm: According to the published paper [12] on feature selection, also known as feature subset selection (FSS) or attribute selection, it is a method to select a feature subset from all the input features to make the constructed model better. In the practical application of machine learning, the quantity of features is normally very large, among which there may exist irrelevant features, or the features may depend on each other. Feature selection can remove irrelevant or redundant features, and thus decrease the number of features to improve the accuracy of the model. Selecting the really relevant features can also simplify the model and make the data generation process easier to understand for the researchers. The XGBoost algorithm is able to rank the various features based on their importance, and based on this rank the high or low importance of each feature is established.

5. METHOD
We begin by presenting a high-level workflow diagram of the method employed.

Fig.1: Illustration of the proposed methodology (workflow: Data Collection → Exponential Smoothing → Feature Extraction → Ensemble Learning → Stock Market Prediction)

The first two blocks in the flowchart are Data Collection and Exponential Smoothing, which have been discussed in Section 4. The third block, Feature Extraction, is basically the selection of features from the data set and the building of derived values so that the data becomes more informative and non-redundant. Once the features are extracted and the feature matrix is prepared, the data is used to train the algorithm, which is done through the process of ensemble learning. After the model recognizes a pattern, 20% of the data set is used to test the robustness of the model, and finally the prediction made is checked for accuracy, specificity and sensitivity, which takes place in the last block of the flowchart.
Surveying various machine learning algorithms was a key motivation, even though some methods and algorithms could have been dispensed with. This explains the reason for describing methods such as SVM or LDA even though their results are not very promising, as they fall under the category of metric classifiers as mentioned in [1]. We reiterate that any learning method is only as good as the data, and without a balanced data set there could not exist any reasonable scrutiny of the efficiency of the methods used in this manuscript or elsewhere. Non-metric classifiers, which include decision trees and boosted trees, bolster the logic behind discouraging "black-box" approaches to data analytics in the context of this problem or otherwise.
Before feeding the training data to the XGBoost model, the two classes of data are tested for linear separability by finding their convex hulls. Linear separability is a property of two sets of data points where the two sets are said to be linearly separable if there exists a hyperplane such that all the points in one set lie on one side of the hyperplane and all the points in the other set lie on the other side. The separability of the training set determines whether the hypothesis space can solve a particular binary classification problem or not. The separability test can provide a set of hypotheses (initial solutions) which can be refined to minimize the generalization error.

5.1 Test for linear separability
In order to check for linear separability, the convex hulls of the two classes are constructed. If the convex hulls intersect each other, then the classes are said to be linearly inseparable. Principal component analysis is performed to reduce the dimensionality of the extracted features to two dimensions. This is done so that the convex hulls can be easily visualized in 2 dimensions. The convex hull test reveals that the classes are not linearly separable, as the convex hulls almost overlap.

Fig.2: Convex Hull Test for Linear Separability
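A sketch of this test is given below, assuming a NumPy feature matrix X and labels y in {-1, +1}; it is an illustration under those assumptions, not the authors' original code.

```python
import numpy as np
from sklearn.decomposition import PCA
from scipy.spatial import ConvexHull, Delaunay

def convex_hull_overlap(X, y):
    """Project the features to 2-D with PCA and check whether the convex hulls
    of the two classes overlap (a rough indication of linear inseparability)."""
    X2 = PCA(n_components=2).fit_transform(X)
    pos, neg = X2[y == 1], X2[y == -1]

    # Hull objects, useful mainly for plotting the two regions.
    hull_pos, hull_neg = ConvexHull(pos), ConvexHull(neg)

    # A point of one class falling inside the other class's hull is a
    # sufficient condition for the hulls to intersect, i.e. the classes
    # are not linearly separable in this 2-D projection.
    tri_pos, tri_neg = Delaunay(pos), Delaunay(neg)
    overlap = (tri_pos.find_simplex(neg) >= 0).any() or \
              (tri_neg.find_simplex(pos) >= 0).any()
    return overlap, hull_pos, hull_neg
```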

This observation leads to the conclusion that Linear Discriminant Analysis cannot be applied to classify our data and hence provides a stronger justification for why XGBoost is used.
Another important reason is that, since each decision tree in the algorithm operates on a random subspace of the feature space, it leads to the automatic selection of the most relevant subset of features.
Since the analysis shows that the data is not linearly separable, metric methods are not preferred. Therefore, the implementation of non-metric methods comes into the picture, which is discussed in the next section.

5.2 Non-Metric Classifiers
A few significant and often used non-metric classifiers include: Decision Tree, Random Forest and Xtreme Gradient Boost.
Decision Tree
Decision trees can be used for various machine learning applications. A decision tree constructs a tree that is used for classification and regression. But trees that are grown really deep to learn highly irregular patterns tend to over-fit the training set. A slight noise in the data may cause the tree to grow in a completely different manner. This is because decision trees have very low bias and high variance. Each of the nodes in the tree is split on the basis of training set attributes. Every node of the tree is then split into child nodes based on certain splitting criteria or decision rules, which determine the allegiance of a particular object (data point) to a feature class. The leaf nodes must be pure nodes once any feature vector that is to be classified reaches a leaf node. Splitting is done on the basis of the highest importance of the attribute, which is determined using Gini impurity or Shannon entropy. Information gain is calculated and the attributes are selected. One significant advantage of decision trees is that both categorical and numerical data can be dealt with; a disadvantage is that decision trees tend to over-fit the training data. In order to prevent overfitting of the model, pruning must be done while constructing the tree or after the tree is constructed.
Gini impurity is used as the function to measure the quality of the split at each node. The Gini impurity at node N is given by

g(N) = Σ_{i≠j} P(wi) P(wj)

where P(wi) is the proportion of the population with class label i. Another function which can be used to judge the quality of a split is Shannon entropy. It measures the disorder in the information content. In decision trees, Shannon entropy is used to measure the unpredictability of the information contained in a particular node of the tree (in this context, it measures how mixed the population in a node is). The entropy of a node N can be calculated as follows:

H(N) = − Σ_{i=1}^{d} P(wi) log2(P(wi))

where d is the number of classes considered and P(wi) is the proportion of the population labeled as i. Entropy is highest when all the classes are contained in equal proportion in the node. It is lowest when there is only one class present in a node (when the node is pure).
The best split is characterized by the highest gain in information or the highest reduction in impurity. The information gain due to a split can be calculated as follows:

ΔI(N) = I(N) − PL · I(NL) − PR · I(NR)

where I(N) is the impurity measure (Gini or Shannon entropy) of node N, PL is the proportion of the population in node N that goes to the left child of N after the split and, similarly, PR is the proportion of the population in node N that goes to the right child after the split. NL and NR are the left and right children of N, respectively.
can be dealt with; a disadvantage is that decision trees tend hk (x) = h(x|θk ) is a predictor of the number of training
to over-fit the training data. In order to prevent over fitting samples. y =+ − 1 is the outcome associated with input data,
of the model pruning must be done, while constructing the x for the final classification function, f (x).
tree or after the tree is constructed. Next, we describe the working of the Xgboost learner by
Gini impurity is used as the function to measure the quality exploiting the key concepts defined above.
of split in each node. Gini impurity at node N is given by
X Xtreme Gradient Boost(XGBoost)
g(N ) = P (wi )P (wj ) XGBoost is another method which comes under non-metric
i6= j
classifier family which is based on the concept of Decision
where P (wi ) is the proportion of the population with class Tree, but there are significant differences between the two.
label i. Another function which can be used to judge the XGBoost is an ensemble of decision trees wherein weighted
quality of split is Shannon Entropy. It measures the disor- combinations of predictors is taken. XGBoost works on the
der in the information content. In Decision trees, Shannon same lines of Random Forest, but there is a difference in
entropy is used to measure the unpredictability in the in- working procedures. The similarities are that the features
formation contained in a particular node of a tree (In this extracted in both the cases is completely random in nature.
context, it measures how mixed the population in a node If n is the total number of attributes in the feature matrix
is). The entropy in a node N can be calculated as follows. then lets say m is the number of attributes which are finally
chosen to determine the split at each node. Here m is related
d
X to n in the following way
H(N ) = − P (wi )log2 (P (wi )) n
i=1 m=
3
where d is number of classes considered and P (wi ) is the XGBoost basically is a collection of weak classifier decision
proportion of the population labeled as i. Entropy is the trees and it primarily focuses to train the new decision tree
highest when all the classes are contained in equal propor- to learn from the errors committed by the previous tree[s].
tion in the node. It is the lowest when there is only one class The learning trees are trained sequentially. Initially, a re-
present in a node (when the node is pure). gression function is drawn which is fitted to the data set and
The best split is characterized by the highest gain in infor- due to random plotting of regression function errors occur
mation or the highest reduction in impurity. The informa- which are referred to as residual errors. Subsequently, a
Subsequently, a plot of all the residual errors is considered and another regression function is fitted to the model; the residual errors that occur in that case are taken care of by the combination of the previous regression function and the current regression function. Continuing in this manner, the regression function becomes more and more complex in nature, and the root mean squared error is observed to be significantly reduced. The following are the basic steps followed while executing XGBoost.

5.3 Algorithm
The following steps are carried out recursively throughout the process.
Step 1: Learn a regression predictor.
Step 2: Compute the error residual.
Step 3: Learn to predict the residual.
The error rate is calculated using the parameters mentioned below. The error in prediction is given by

J = J(y, ŷ),   where   J(y, ŷ) = Σ_i (y[i] − ŷ[i])²

ŷ can be adjusted in order to reduce the error by using the following update:

ŷ[i] = ŷ[i] + α f[i],   where   f[i] ≈ −∇J(y, ŷ)

Each learner estimates the (negative) gradient of the loss function. Gradient descent is used to take a sequence of steps that reduce the loss; the resulting model is a sum of predictors weighted by the step size α. We present the proposed algorithm for XGBoost below.

Algorithm 1 Xtreme Gradient Boosting
1: procedure XtremeGradientBoost(D)        ▷ D is the labeled training data
2:   Initialize the model with a constant value:
       F0(x) = argmin_γ Σ_{i=1}^{n} L(yi, γ)
3:   for m = 1 to M do
4:     Compute the pseudo-residuals
5:     Fit a base learner to the pseudo-residuals:
6:       Ti = new DecisionTree()
7:       features_i = RandomFeatureSelection(Di)
8:       Ti.train(Di, features_i)
9:     Compute the multiplier γm
10:    Update the model
11:  output FM(x)

L(y, γ) is the differentiable loss function and hm(x) is the base learner, connected by the relation Fm(x) = Fm−1(x) + γm hm(x).
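A minimal training sketch following these steps with the xgboost library is given below. The hyperparameter values, the shuffled 80/20 split and the choice of 'rmse' as the tracked evaluation metric (to mirror the train-RMSE curves in Figs. 3 and 4) are illustrative assumptions, not the exact settings reported by the authors; the constructor-level eval_metric argument assumes a recent version of the xgboost package.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

def train_xgboost(X, y, test_size=0.2, seed=42):
    """Train a boosted-tree classifier on the technical-indicator feature matrix.
    y is expected in {-1, +1}; it is mapped to {0, 1} for the classifier."""
    y01 = (y > 0).astype(int)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y01, test_size=test_size, random_state=seed)

    model = XGBClassifier(
        n_estimators=300,      # number of sequentially trained trees
        max_depth=4,           # depth of each weak learner
        learning_rate=0.1,     # step size alpha applied to each new tree
        eval_metric="rmse",    # metric tracked during training
    )
    model.fit(X_train, y_train, eval_set=[(X_train, y_train)], verbose=False)

    accuracy = (model.predict(X_test) == y_test).mean()
    return model, accuracy
```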

5.4 Analysis & Data Discovery


The following aspects deserve some discussion in the context
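A sketch of how such a ranking by Gain, Cover and Frequency can be obtained from a trained booster is shown below; the variable and feature names are assumptions for illustration, not the authors' code.

```python
from xgboost import XGBClassifier

def rank_features(model: XGBClassifier, feature_names):
    """Return per-feature Gain, Cover and Frequency (weight) scores
    from a fitted XGBClassifier."""
    booster = model.get_booster()
    booster.feature_names = list(feature_names)
    ranking = {}
    for metric in ("gain", "cover", "weight"):   # 'weight' is xgboost's name for frequency
        ranking[metric] = booster.get_score(importance_type=metric)
    return ranking

# Usage sketch (model from the previous listing):
# scores = rank_features(model, ["rsi", "stoch_k", "williams_r", "macd", "proc", "obv"])
```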
6. RESULT
The main aim of this paper is to predict the rise and fall of the stock market. Hence, as a measuring parameter, we have used +1 to indicate a rise in stock valuation in the future and -1 to indicate a fall in the prices. The following results were observed after the computation of the data set using XGBoost. We obtain the root mean squared error (RMSE) for the 60-day and 90-day predictions on the Apple Inc. data set as shown below.

Fig. 3: RMSE plot for Apple Inc. data set
Fig. 4 shows the reduction in RMSE for the 28-day, 60-day and 90-day predictions, respectively, for the Yahoo! Inc. data set for each iteration.

Fig. 4: RMSE plot for Yahoo! Inc. data set

It is clear from these graphs that there is a decreasing trend in the RMSE value as the number of iterations increases.
The parameters that are used to evaluate the robustness of a binary classifier are accuracy, precision and recall (also known as sensitivity), along with specificity. These parameters are calculated from the confusion matrix. The formulas to calculate these parameters are given below:

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
specificity = tn / (tn + fp)

where
tp = number of true positive values,
tn = number of true negative values,
fp = number of false positive values,
fn = number of false negative values.
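In code, these four quantities follow directly from the confusion matrix. The brief sketch below assumes label arrays in {-1, +1}, with +1 as the positive class; it is an illustration, not the authors' evaluation code.

```python
import numpy as np

def classification_report(y_true, y_pred):
    """Accuracy, precision, recall and specificity from the confusion matrix,
    with +1 treated as the positive (price-rise) class."""
    tp = np.sum((y_pred == 1) & (y_true == 1))
    tn = np.sum((y_pred == -1) & (y_true == -1))
    fp = np.sum((y_pred == 1) & (y_true == -1))
    fn = np.sum((y_pred == -1) & (y_true == 1))
    return {
        "accuracy":    (tp + tn) / (tp + tn + fp + fn),
        "precision":   tp / (tp + fp),
        "recall":      tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }
```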
The accuracy results for the two data sets are tabulated below:

Days   accuracy   precision   recall     specificity
60     0.879918   0.773997    0.856182   0.890330
90     0.897095   0.756569    0.888198   0.901730

Table 1: Accuracy results for 60 & 90 days for Apple Inc.

Days   accuracy   precision   recall     specificity
28     0.9995     1.0         0.99916    1.0
60     0.99918    0.99915     0.99915    0.99920
90     0.99917    1.0         0.9982     1.0

Table 2: Accuracy results for 28, 60 & 90 days for Yahoo! Inc.

6.1 Area Under ROC Curve
In statistics, a receiver operating characteristic (ROC) curve is a graphical plot that illustrates the performance of a binary classifier system as its discrimination threshold is varied. The curve is created by plotting the true positive rate (TPR) against the false positive rate (FPR) at various threshold settings. The true positive rate is also known as sensitivity, recall or probability of detection [1] in machine learning. The false positive rate is also known as the fall-out or probability of false alarm [1] and can be calculated as (1 - specificity). The ROC curve is thus the sensitivity as a function of fall-out. In general, if the probability distributions for both detection and false alarm are known, the ROC curve can be generated by plotting the cumulative distribution function (the area under the probability distribution from -∞ to the discrimination threshold) of the detection probability on the y-axis versus the cumulative distribution function of the false-alarm probability on the x-axis.
There are four possible outcomes from a binary classifier. If the outcome from a prediction is p and the actual value is also p, then it is called a true positive (TP); however, if the actual value is n, then it is said to be a false positive (FP). Conversely, a true negative (TN) has occurred when both the prediction outcome and the actual value are n, and a false negative (FN) occurs when the prediction outcome is n while the actual value is p. To draw a ROC curve, only the true positive rate (TPR) and false positive rate (FPR) are needed (as functions of some classifier parameter). The TPR defines how many correct positive results occur among all positive samples available during the test. The FPR, on the other hand, defines how many incorrect positive results occur among all negative samples available during the test.
A ROC space is defined by FPR and TPR as the x and y axes respectively, and depicts the relative trade-offs between true positives (benefits) and false positives (costs). Since TPR is equivalent to sensitivity and FPR is equal to 1 - specificity, the ROC graph is sometimes called the sensitivity vs. (1 - specificity) plot. Each prediction result, or instance of a confusion matrix, represents one point in the ROC space. The diagonal divides the ROC space: points above the diagonal represent good classification results (better than random), and points below the line represent poor results (worse than random).
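A sketch of how ROC curves and AUC values of the kind reported below can be produced with scikit-learn is shown here; it assumes the fitted classifier from the Section 5.3 sketch and is not the authors' plotting code.

```python
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, auc

def plot_roc(model, X_test, y_test01):
    """Plot TPR vs FPR for a fitted classifier and report the area under the curve."""
    scores = model.predict_proba(X_test)[:, 1]      # probability of the +1 (rise) class
    fpr, tpr, _ = roc_curve(y_test01, scores)
    roc_auc = auc(fpr, tpr)

    plt.plot(fpr, tpr, label=f"AUC = {roc_auc:.4f}")
    plt.plot([0, 1], [0, 1], linestyle="--", label="random classifier")
    plt.xlabel("False positive rate (1 - specificity)")
    plt.ylabel("True positive rate (sensitivity)")
    plt.legend()
    plt.show()
    return roc_auc
```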
We can infer that our model performed significantly well compared to non-ensemble techniques as well as other ensemble techniques, for the short-term window of 28 days as well as the long-term windows of 60 and 90 days. For the Apple data set, the time windows taken for prediction were 60 and 90 days, which gave better results compared to non-ensemble and other ensemble techniques. The four ROC curves showcase the performance of the algorithm. The diagonal line in each graph is the threshold line: the higher the performance curve is above the diagonal line, the higher the predicted accuracy, and vice versa.

7. DISCUSSION AND CONCLUSION
The algorithm used here has to be checked for its robustness. This is accomplished by checking the accuracy in predicting the result, or it can be achieved by analyzing the receiver operating characteristic (ROC) curve. It is evident that XGBoost has outperformed the metric as well as the other non-metric classifiers in accuracy.
From Fig. 9 and Fig. 10 it is clear that our model gave higher accuracy.
According to the area under the ROC curve obtained for the 60-day and 90-day predictions in Fig. 5 and Fig. 6, the results are clearly better than the previously used machine learning methods (the AUC for the 60-day prediction is 0.8435109, and for the 90-day prediction XGBoost gives an AUC of 0.7127071). Fig. 7 and Fig. 8 (the AUC for the 28-day prediction is 0.94, and for the 60-day prediction XGBoost gives an AUC of 1.0) show the same evidence. Also, the 90-day prediction on the Yahoo! data set gave an AUC of 1.0. Logistic Regression performs poorly compared to SVM and XGBoost. Also, from the bar graph, it is evident that SVM performs well with an accuracy of nearly 80%, but XGBoost nevertheless outperforms SVM in terms of accuracy (≈88%).
In the published paper [10] on ANN (Artificial Neural Networks), which predicted for the ith day, the results showed an accuracy of 81%, which is less than the accuracy obtained from our model. Likewise, in [11], the Random Forest (RF) model could predict the stock market movement direction with an accuracy of 78.81%, which is still less than XGBoost's predicted accuracy.
The remarkably high accuracy of prediction in the case of the Yahoo! Inc. data set could be a matter of concern. A natural suspicion about inherent bias in the training data could arise. However, we have checked the data set and confirm the non-existence of heavy bias in the data set. The proportions of positive and negative data are in the range of 45:55.
The literature survey helps us conclude that ensemble learning algorithms have remained unexploited in the problem of stock market prediction. We have used an ensemble learning method known as XGBoost to build our predictive model. The comparative analysis testifies to the efficacy of our model, as it outperforms the models discussed in the literature survey. We believe that this is due to the lack of proper data processing in [7][15][16]. In this paper, we have performed exponential smoothing, which is a rule-of-thumb technique for smoothing time series data. Exponential smoothing removes random variation in the data and makes the learning process easier. To our surprise, very few papers found in the literature survey exploited the technique of smoothing. Another important reason could be the inherent non-linearity in the data. This fact discourages the use of linear classifiers. However, in [15] the authors used a linear classifier as the supervised learning algorithm, which yielded a highest accuracy of 55.65%. We believe that the use of SVM in [7] and [13] is ill-advised. Due to the fact that the two classes under consideration (rise or fall) are linearly inseparable, researchers are compelled to use SVM with non-linear kernels such as the Gaussian kernel or the Radial Basis Function. Despite the many advantages of SVMs, from a practical point of view they have some limitations. From these arguments we can also conclude why the predictions by these classifiers were limited to a maximum time window of 44 days, which qualifies our learning model to surpass all these metric classifiers in terms of long-term prediction. An important practical question that is not entirely solved is the selection of the kernel function parameters - for Gaussian kernels, the width parameter σ - and the value of ε in the ε-insensitive loss function (Horvath (2003) in Suykens et al.).
Predicting the stock market is really difficult due to its non-linear, dynamic and complex nature. However, in recent years machine learning techniques have proved effective in stock forecasting. Many algorithms, such as SVM, ANN, etc., have been studied for robustness in predicting the stock market. However, ensemble learning methods have remained unexploited in this field. In this paper, we have used the XGBoost classifier to build our predictive model, and our model has produced really impressive results. The model is found to be robust in predicting the future direction of stock movement. The robustness of our model has been evaluated by calculating various parameters such as accuracy, precision, recall and specificity. For both the data sets we have used, i.e., Apple and Yahoo, we were able to achieve accuracy in the range of 87-99% for long-term prediction. ROC curves were also plotted to evaluate our model. The curves demonstrate the fidelity of our model graphically.
Our model can be used for devising new strategies for trading or for performing stock portfolio management, changing stocks according to trend predictions. In the future, we could build boosted-tree models to predict trends for really short time windows, in terms of hours or minutes. Ensembles of different machine learning algorithms can also be checked for robustness in stock prediction. We also recommend exploring the application of deep learning practices in stock forecasting, involving learning weight coefficients on large, directed and layered graphs.

Fig.5: ROC curve for 60 days (Apple Inc.). AUC is 0.8435
Fig.6: ROC curve for 90 days (Apple Inc.). AUC is 0.7127
Fig.7: ROC curve for short-term prediction of 28 days (Yahoo! Inc.). AUC is 0.94
Fig.8: ROC curve for 60 days (Yahoo! Inc.). AUC is 1.0
Fig.9: Result for long-term prediction on Apple Inc.: XGBoost beats all other predictive algorithms reported in the literature.
Fig.10: Result for long-term prediction on Yahoo! Inc.: XGBoost beats all other predictive algorithms reported in the literature by quite a margin.

The proposed model indicates, to the best of our knowledge, the nonlinear nature of the problem and the futility of using linear-discriminant-type machine learning algorithms. The accuracy reported is not pure chance but is based solidly on the understanding that the problem is not linearly separable, and hence the entire suite of SVM-type classifiers or related machine learning algorithms should not work very well. The solution approach adopted is a paradigm shift in this class of problems, and minor modifications may work very well for slight variations in the problem statement.

8. REFERENCES
[1] Das, Shom Prasad and Padhy, Sudersan. Support Vector Machines for Prediction of Future Prices in Indian Stock Market. International Journal of Computer Applications (0975-8887), March 2012.
[2] Chauhan, Bhagwant, Bidave, Umesh, Gangathade, Ajit and Kale, Sachin. Stock Market Prediction Using Artificial Neural Networks. International Journal of Computer Science and Information Technology, Vol. 5(1), 2014, 904-907.
[3] Kar, Abhishek. Stock Market Prediction using Artificial Neural Networks. Y8021.
[4] S. R. Y. Mayankkumar B Patel. Stock prediction using artificial neural network. International Journal of Innovative Research in Science, Engineering, and Technology, 2014.
[5] Mingyue Qiu and Yu Song. Predicting the Direction of Stock Market Index Movement Using an Optimized Artificial Neural Network Model. Published online 2016 May 19. doi: 10.1371/[Link].0155133.
[6] Elisa Siqueira, Thiago Otuki and Newton da Costa Jr. Stock Return and Fundamental Variables: A Discriminant Analysis Approach. Applied Mathematical Sciences, Vol. 6, 2012, no. 115, 5719-5733.
[7] Yuqing Dai and Yuning Zhang (2013). Machine Learning in Stock Price Trend Forecasting. Stanford University.
[8] Hellstrom, T. and Holmstrom, K. (1998). Predictable Patterns in Stock Returns. Technical Report Series IMa-TOM-1997-09.
[9] R. Gencay. Linear, non-linear and essential foreign exchange rate prediction with simple technical trading rules. Journal of International Economics, vol. 47, pp. 91-107, 1999.
[10] A. Timmermann and C. W. Granger. Efficient market hypothesis and forecasting. International Journal of Forecasting, vol. 20, pp. 15-27, 2004.
[11] Sadegh Bafandeh Imandoust and Mohammad Bolandraftar. Forecasting the direction of stock market index movement using three data mining techniques: the case of Tehran Stock Exchange. International Journal of Engineering Research and Applications, ISSN: 2248-9622, Vol. 4, Issue 6 (Version 2), June 2014, pp. 106-117.
[12] Yuqing He, Kamaladdin Fataliyev and Lipo Wang. Feature Selection for Stock Market Analysis.
[13] Phichhang Ou and Hengshan Wang. Prediction of Stock Market Index Movement by Ten Data Mining Techniques.
[14] Khan, W., Ghazanfar, M. A., Asam, M., Iqbal, A., Ahmed, S. and Javed Ali Khan. Predicting Trend In Stock Market Exchange Using Machine Learning Classifiers. [Link] (Lahore), 28(2), 1363-1367, 2016.
[15] Haoming Li, Zhijun Yang and Tianlun Li (2014). Algorithmic Trading Strategy Based On Massive Data Mining. Stanford University.
[16] Xinjie (2014). Stock Trend Prediction With Technical Indicators using SVM. Stanford University.

