SYSS Forecasting Classification
Shubharthi Dey* (PESIT South Campus, Bangalore, Karnataka, India; deysubharthi15@[Link])
Yash Kumar† (PESIT South Campus, Bangalore, Karnataka, India; yash.kumar1396@[Link])
Snehanshu Saha‡ (PESIT South Campus, Bangalore, Karnataka, India; snehanshusaha@[Link])
Suryoday Basak§ (PESIT South Campus, Bangalore, Karnataka, India; suryodaybasak@[Link])
ABSTRACT
Stock market prediction is the art of determining the future value of a company stock or other financial instrument traded on an exchange. It has been a real challenge for analysts and traders to predict the trends of the stock market due to its uncertain nature. Stock prices are likely to be influenced by factors like product demand, sales, manufacture, investors' sentiments, the ruling government, recession, etc. The successful prediction of a stock's future price could yield significant profit. The main aim of this paper is to design an efficient model which accurately predicts the trend of the stock market using eXtreme Gradient Boosting (XGBoost), which has proved to be an efficient algorithm with over 87% accuracy for 60-day and 90-day periods and performs much better than traditional non-ensemble learning techniques. The prediction problem has been reconstructed as a classification problem, and XGBoost turned out to be significantly better than the algorithms found in the literature. The proposed model outperforms existing forecasting models in the literature and is able to forecast on a long-term basis, an added feature absent in the literature.

Keywords
XGBoost, ensemble learning, Exponential smoothing

*First Author and Second Author share equal credit.
†First Author and Second Author share equal credit.
‡Dr. Saha is the corresponding author.
§Suryoday is with the Center of Applied Mathematical Modeling.
1. INTRODUCTION
The stock market has always been an area of interest for people around the world. Perception and experience are put to good use by non-practitioners to invest in a particular company hoping for a multiplied return. Stock market trend prediction has also been an area of keen interest for statisticians and computer scientists, simply because the area throws up complex modeling questions. There exist methods and algorithms which can predict stock valuation with a fair degree of accuracy. However, a question still persists: if a person decides to buy shares of a particular company, what is the probability that it turns out to be a successful expedition or a mere failure? An informed guess works on a broader basis, i.e. considering the production, sales and demand of the organization in the present scenario, it may be fine to invest in a particular stock. However, it is too much to expect this to work in complex situations while ignoring certain nuanced concepts and factors that govern the market. For example, the political situation in a country may be too inefficient and volatile to handle the economy of the country, and a fall in the economy triggers a fall in the stock value of a company. Due to these minute and chaotic parameters, prediction becomes increasingly difficult. Traders tend to invest in a firm which has the potential or a history of good returns based on the current situation. However, there always exists the possibility that a company which appears to have incurred a loss may be the very firm in which traders/investors continue to have faith.

Over the past years, there has been enormous research in this field pivoted around statistical machine learning. Different predictive models and algorithms, each accurate to a certain degree, have been proposed and tested. Implementation of machine learning techniques is an evolving concept, and it is somewhat different from traditional forecasting and diffusion methods. Early models used in stock forecasting involved statistical methods such as time series models and multivariate analysis (Gencay (1999), Timmermann and Granger (2004), Bao and Yang (2008)). Our paper mainly focuses on the machine learning approach to analysis, as it is evident that the historical data set obtained (say, from the date when the company came into existence) is impossible to analyze without data mining methods. The prediction produced by our proposed algorithm may help people decide whether to invest in a particular company, taking into consideration the chaos and volatility of stocks. We adopted a machine learning approach different from the commonly practiced ones such as Support Vector Machines, Neural Networks, the Naive Bayesian Classifier, Linear Discriminant Analysis, etc. The next section discusses related literature.
2. LITERATURE SURVEY
Recent years have witnessed considerable traction in the field of stock market prediction, especially from the machine learning point of view. Prediction of stock market behavior is used to determine future trends, and is usually accomplished by analyzing historic time series data.
Several algorithms have been used in stock prediction, such as SVM, Neural Networks, Linear Discriminant Analysis, Logistic Regression, Linear Regression, KNN and the Naive Bayesian Classifier. It was found that logistic regression was one of the best, with a success rate of 55.65%. Dai and Zhang (2013) used training data from 3M stock data. The data contains daily stock information ranging from 1/9/2008 to 11/8/2013 (1471 data points). Multiple algorithms were chosen to train the prediction system, including Logistic Regression, Quadratic Discriminant Analysis and SVM. The algorithms were applied to a next-day model, which predicted the outcome of the stock price on the next day, and a long-term model, which predicted the outcome of the stock price for the next n days. The next-day prediction model produced accuracy results ranging from 44.52% to 58.2%. SVM reported the highest accuracy of 79.3% for the long-term prediction, where a longer time window was taken. In one of the published papers [11] on ANN (artificial neural networks), used to forecast the direction of the Japanese stock market, the authors report an accuracy of 81.27%. In the published paper [10] on Random Forest (RF), an ensemble technique is used to predict stock market prices (trend up or down), and it returned the highest accuracy of 78.81% for the ith day on the direction of movement in the daily TSE (Tehran Stock Exchange) index.
Ensemble learning algorithms remain largely unexplored, however. The focus of this paper is to implement an ensemble learning technique and to discuss its advantages over non-ensemble techniques. We will be using an ensemble learning method known as XGBoost to build our predictive model. Our model has been trained for 60, 90 and 120 days respectively, and the results were impressive. Moreover, the majority of the related work focused on a time window of 10 to 44 days on average, as most authors preferred to use metric classifiers on time series data which is not smoothed. Therefore, those models are unable to learn from the data set when it comes to predicting over a long-term window. Our model, by contrast, first smooths the data and, being a non-metric classifier, is capable of accurately predicting over a long-term time window.
The paper will highlight certain critical aspects largely ignored by most of the literature. These include analyzing the non-linearity in the features used for analysis and the futility of employing linear classifiers, and long-run predictions running into 90 days, where related manuscripts considered time windows of up to 44 days. The significant improvement in accuracy obtained by our approach supports these claims.
The remainder of the paper is organized as follows. Section 3 contains key definitions. Section 4 describes data preprocessing and feature extraction in detail. Section 5 describes the methods and algorithm used. The results of the applied algorithm are obtained and analyzed in section 6. The next section is a comparative study establishing the superiority of our proposed algorithm. The authors conclude with a comprehensive summary of the work.

3. KEY DEFINITIONS
Time Series Data: A time series is a series of data points indexed (or listed or graphed) in time order. Most commonly, a time series is a sequence taken at successive, equally spaced points in time. Thus it is a sequence of discrete-time data.
Note: We used the Apple and Yahoo data sets. These are time series data sets which are further smoothed exponentially, as discussed in section 4.
Gradient Boosting: Gradient boosting is a machine learning technique for regression and classification problems which produces a prediction model in the form of an ensemble of weak prediction models, typically decision trees. It builds the model in a stage-wise fashion like other boosting methods do, and it generalizes them by allowing optimization of an arbitrary differentiable loss function.
Technical Indicators: Technical indicators are important parameters that are calculated from time series stock data and that aim to forecast financial market direction. They are tools widely used by investors to check for bearish or bullish signals.
Relative Strength Index: The relative strength index (RSI) is calculated as

RSI = 100 − 100 / (1 + RS)
RS = (average gain over past 14 days) / (average loss over past 14 days)

The RSI is classified as a momentum oscillator, measuring the velocity and magnitude of directional price movements. Momentum is the rate of the rise or fall in price. The RSI computes momentum as the ratio of higher closes to lower closes: stocks which have had more or stronger positive changes have a higher RSI than stocks which have had more or stronger negative changes.
The RSI is most typically used on a 14-day timeframe, measured on a scale from 0 to 100, with high and low levels marked at 70 and 30, respectively. Shorter or longer timeframes are used for alternately shorter or longer outlooks. More extreme high and low levels (80 and 20, or 90 and 10) occur less frequently but indicate stronger momentum.
Stochastic Oscillator: The stochastic oscillator is given by

%K = 100 * (C − L14) / (H14 − L14)

where C = current closing price; L14 = lowest low over the past 14 days; H14 = highest high over the past 14 days.
The term stochastic refers to the position of the current price in relation to its price range over a period of time. This method attempts to predict price turning points by comparing the closing price of a security to its price range.
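As a concrete illustration of the two oscillators above, the following sketch computes them from daily price series. It assumes pandas Series named close, low and high and a 14-day window; these names and the rolling-mean form of the average gain/loss are our choices for illustration, not specifications from the paper.

```python
import pandas as pd

def rsi(close: pd.Series, window: int = 14) -> pd.Series:
    """Relative Strength Index: RSI = 100 - 100 / (1 + RS)."""
    delta = close.diff()
    gain = delta.clip(lower=0)        # positive close-to-close changes
    loss = -delta.clip(upper=0)       # magnitudes of negative changes
    avg_gain = gain.rolling(window).mean()
    avg_loss = loss.rolling(window).mean()
    rs = avg_gain / avg_loss
    return 100 - 100 / (1 + rs)

def stochastic_k(close: pd.Series, low: pd.Series, high: pd.Series,
                 window: int = 14) -> pd.Series:
    """Stochastic oscillator: %K = 100 * (C - L14) / (H14 - L14)."""
    lowest_low = low.rolling(window).min()
    highest_high = high.rolling(window).max()
    return 100 * (close - lowest_low) / (highest_high - lowest_low)
```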
William %R: William's %R is given by

%R = −100 * (H14 − C) / (H14 − L14)

where C = current closing price; L14 = lowest low over the past 14 days; H14 = highest high over the past 14 days.
Williams %R ranges from -100 to 0. When its value is above -20, it indicates a sell signal, and when its value is below -80, it indicates a buy signal.
Moving Average Convergence Divergence (MACD): The formula for calculating MACD is:

MACD = EMA_12(C) − EMA_26(C)
SignalLine = EMA_9(MACD)

where MACD = Moving Average Convergence Divergence; C = closing price series; EMA_n = n-day Exponential Moving Average.
When the MACD goes below the SignalLine, it indicates a sell signal. When it goes above the SignalLine, it indicates a buy signal.
Price Rate of Change: It is calculated as follows:

PROC(t) = (C(t) − C(t−n)) / C(t−n)

where PROC(t) = price rate of change at time t; C(t) = closing price at time t. It measures the most recent change in price with respect to the price n days ago.
On Balance Volume: This technical indicator is used to find buying and selling trends of a stock.

OBV(t) = OBV(t−1) + Vol(t)   if C(t) > C(t−1)
OBV(t) = OBV(t−1) − Vol(t)   if C(t) < C(t−1)
OBV(t) = OBV(t−1)            if C(t) = C(t−1)

where OBV(t) = on balance volume at time t; Vol(t) = trading volume at time t; C(t) = closing price at time t.
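Continuing the sketch above, these four indicators can be computed from the same pandas Series (close, low, high) plus a volume Series. The MACD spans 12/26/9 follow the formulas above; the 14-day window for Williams %R and the choice of n = 14 for the rate of change are illustrative assumptions.

```python
import numpy as np
import pandas as pd

def williams_r(close, low, high, window=14):
    """Williams %R = -100 * (H14 - C) / (H14 - L14); ranges from -100 to 0."""
    hh = high.rolling(window).max()
    ll = low.rolling(window).min()
    return -100 * (hh - close) / (hh - ll)

def macd(close, fast=12, slow=26, signal=9):
    """MACD = EMA_12(C) - EMA_26(C); SignalLine = EMA_9(MACD)."""
    macd_line = (close.ewm(span=fast, adjust=False).mean()
                 - close.ewm(span=slow, adjust=False).mean())
    signal_line = macd_line.ewm(span=signal, adjust=False).mean()
    return macd_line, signal_line

def price_rate_of_change(close, n=14):
    """PROC(t) = (C(t) - C(t-n)) / C(t-n)."""
    return (close - close.shift(n)) / close.shift(n)

def on_balance_volume(close, volume):
    """OBV(t) = OBV(t-1) + Vol(t), - Vol(t) or unchanged, by the sign of the close change."""
    direction = np.sign(close.diff()).fillna(0)
    return (direction * volume).cumsum()
```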
Convex Hull: The convex hull of a set of points X is its subset which forms the smallest convex polygon that contains all the points in X. A polygon is said to be convex if a line joining any two points on the polygon also lies on the polygon. The significance of the convex hull is explained in section 6.1.
Linear Separability: Two sets of points X0 and X1 in n-dimensional Euclidean space are said to be linearly separable if there exists an n-dimensional normal vector W of a hyperplane and a scalar k such that every point x ∈ X0 gives W^T x > k and every point x ∈ X1 gives W^T x < k.
Bagging: Bootstrap aggregating, also called bagging, is a machine learning ensemble meta-algorithm designed to improve the stability and accuracy of machine learning algorithms used in statistical classification and regression. It also reduces variance and helps to avoid overfitting. Although it is usually applied to decision tree methods, it can be used with any type of method. Bagging is a special case of the model averaging approach.
4. DATA PREPROCESSING AND FEATURE EXTRACTION
The data set is borrowed from Yahoo Finance and includes the closing price, opening price, high, low and volume (Apple, Yahoo). The time series historical stock data is exponentially smoothed. Exponential smoothing applies more weight to recent observations and exponentially decreasing weights to past observations. The exponentially smoothed statistic of a series Y can be recursively calculated as:

S_0 = Y_0 ;  S_t = α * Y_t + (1 − α) * S_{t−1}

where α is the smoothing factor, 0 < α < 1. Larger values of α reduce the level of smoothing; when α = 1, the smoothed statistic equals the actual observation. The smoothed statistic S_t can be calculated as soon as two observations are available. This smoothing removes random variation or noise from the historical data, allowing the model to easily identify the long-term price trend in the stock price behavior. Technical indicators are then calculated from the exponentially smoothed time series data and later organized into a feature matrix. The target to be predicted on the ith day is calculated as follows:

target_i = Sign(close_{i+d} − close_i)

where d is the number of days after which the prediction is to be made. When the value of target_i is +1, it indicates that there is a positive shift in the price after d days, and -1 indicates that there is a negative shift after d days. The target_i values are assigned as labels to the ith row of the feature matrix.
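A minimal sketch of this preprocessing step is shown below; it assumes a pandas Series of raw closing prices and an illustrative smoothing factor of 0.2 (the paper does not state the α it used).

```python
import numpy as np
import pandas as pd

def exponential_smoothing(y: pd.Series, alpha: float = 0.2) -> pd.Series:
    """S_0 = Y_0; S_t = alpha * Y_t + (1 - alpha) * S_{t-1}."""
    smoothed = np.empty(len(y))
    smoothed[0] = y.iloc[0]
    for t in range(1, len(y)):
        smoothed[t] = alpha * y.iloc[t] + (1 - alpha) * smoothed[t - 1]
    return pd.Series(smoothed, index=y.index)

def make_targets(close: pd.Series, d: int) -> pd.Series:
    """target_i = Sign(close_{i+d} - close_i): +1 for a rise after d days, -1 for a fall."""
    # The last d rows have no d-day-ahead close and should be dropped before training.
    return np.sign(close.shift(-d) - close)
```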
We typically expect to have a huge data set in order to enable the algorithm to recognize the pattern in the data, and such analysis becomes cumbersome with a high possibility of error. Machine learning proposes solutions for handling such huge data in an efficient manner. There is a range of methods, classified into metric and non-metric classifiers based on their working principle.
XGBoost is an ensemble ML technique which is based on the concept of the decision tree, with some advanced modifications that are designed to differentiate the performance of an XGBoost model from that of a simple decision tree model. We present a brief overview of decision trees from the perspective of XGBoost in section 5.
The technical indicators used are the RSI indicator, Stochastic Oscillator, William %R, Moving Average Convergence Divergence (MACD), Price Rate of Change and On Balance Volume, as defined in the previous section. They are calculated from the time series stock market data available for predicting the direction of the stock market.
Technical indicators calculated using past observations have been used as features. Thus, the order of the dates becomes irrelevant when bagging is performed: we use indicators that were calculated with t−n data and reuse them to predict the t+1 event. An extreme example is that we use features of day 3 and features of day 30 to predict day 45. However, doing so does not ignore the information embedded in the correlation of consecutive days, since the features of the ith day are calculated using OHLCV data of the past n days. For example, for the sake of simplicity, let us assume we have a technical indicator called F. This indicator is calculated using the closing price of the past n days, i.e. Close(i−1), Close(i−2), ..., Close(i−n).
Feature extraction is a mechanism that computes numeric or symbolic information from the observation. The main task is to select or combine the features that preserve most of the information and remove the redundant components, in order to improve the efficiency of the subsequent classifiers without degrading their performance. It is the process of acquiring higher-level information. The dimensionality of the feature space may be reduced by the selection of subsets of good features. Feature extraction plays an important role in improving classification performance and reducing computational complexity. It also improves computational speed, because fewer features mean fewer parameters have to be estimated.
Feature Extraction Algorithm: According to the published paper [12], Feature Selection, also known as feature subset selection (FSS) or attribute selection, is a method to select a feature subset from all the input features in order to make the constructed model better. In the practical application of machine learning, the quantity of features is normally very large; there may exist irrelevant features, or the features may depend on each other. Feature selection can remove irrelevant or redundant features and thus decrease the number of features to improve the accuracy of the model. Selecting the truly relevant features can also simplify the model and make the data generation process easier for researchers to understand. The XGBoost algorithm is able to rank the various features based on their importance, and features are kept or dropped based on that rank. A sketch of how the feature matrix is assembled from these indicators is given below.
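The following sketch ties the earlier indicator and smoothing helpers together into the feature matrix with +1/-1 labels. The DataFrame column names ('Close', 'High', 'Low', 'Volume'), the helper names and the 60-day horizon are assumptions carried over from the previous sketches, not the authors' code.

```python
import pandas as pd

def build_feature_matrix(df: pd.DataFrame, d: int = 60) -> pd.DataFrame:
    """Assemble the six technical indicators plus the +1/-1 target for a d-day horizon."""
    close = exponential_smoothing(df["Close"])   # smooth before computing indicators
    high = exponential_smoothing(df["High"])
    low = exponential_smoothing(df["Low"])

    macd_line, _ = macd(close)
    features = pd.DataFrame({
        "rsi": rsi(close),
        "stoch_k": stochastic_k(close, low, high),
        "williams_r": williams_r(close, low, high),
        "macd": macd_line,
        "proc": price_rate_of_change(close),
        "obv": on_balance_volume(close, df["Volume"]),
    })
    features["target"] = make_targets(close, d)
    # Drop indicator warm-up rows and rows with no d-day-ahead close.
    return features.dropna()
```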
5. METHOD
We begin by presenting a high-level work flow diagram of the method employed.

[Work flow diagram with blocks for Data Collection, Exponential Smoothing, Feature Extraction, Ensemble Learning and evaluation of the predictions.]

The first two blocks in the flowchart are Data Collection and Exponential Smoothing, which have been discussed in section 4. The third block, Feature Extraction, is basically the selection of features from the data set and the building of derived values so that the data becomes more informative and non-redundant. Once the features are extracted and the feature matrix is prepared, the data is used to train the algorithm, which is done under the process of ensemble learning. After the model recognizes a pattern, 20% of the data set is used to test the robustness of the model, and finally the prediction is checked for accuracy, specificity and sensitivity, which takes place in the last block of the flowchart.
Surveying various machine learning algorithms was a key motivation, even though some methods and algorithms could easily have been dispensed with. This explains the reason for describing methods such as SVM or LDA even though their results are not very promising, as they fall under the category of metric classifiers, as mentioned in [1]. We reiterate that any learning method is only as good as the data, and without a balanced data set there could not exist any reasonable scrutiny of the efficiency of the methods used in this manuscript or elsewhere. Non-metric classifiers, which include decision trees and boosted trees, bolster the logic behind discouraging "black-box" approaches to data analytics in the context of this problem or otherwise.
Before feeding the training data to the XGBoost model, the two classes of data are tested for linear separability by finding their convex hulls. Linear separability is a property of two sets of data points: the two sets are said to be linearly separable if there exists a hyperplane such that all the points in one set lie on one side of the hyperplane and all the points in the other set lie on the other side. The separability of the training set determines whether the hypothesis space can solve a particular binary classification problem or not. The separability test can provide a set of hypotheses (initial solutions) which can be refined to minimize generalization error. One way to implement such a test is sketched below.

Fig.2: Convex Hull Test for Linear Separability
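The paper performs the separability test by computing convex hulls. A convenient equivalent check (our formulation, not the authors' code) is to ask whether a separating hyperplane W^T x + b exists at all; this is a linear-programming feasibility problem, and it succeeds exactly when the two convex hulls do not overlap.

```python
import numpy as np
from scipy.optimize import linprog

def linearly_separable(X_pos: np.ndarray, X_neg: np.ndarray) -> bool:
    """Feasibility of a strictly separating hyperplane between two point sets.

    Equivalent to the convex hulls of X_pos and X_neg being disjoint.
    """
    d = X_pos.shape[1]
    c = np.zeros(d + 1)   # variables [w_1..w_d, b]; pure feasibility problem, objective 0
    # Require w.x + b >= 1 on one class and w.x + b <= -1 on the other.
    A_ub = np.vstack([
        np.hstack([-X_pos, -np.ones((len(X_pos), 1))]),
        np.hstack([ X_neg,  np.ones((len(X_neg), 1))]),
    ])
    b_ub = -np.ones(len(X_pos) + len(X_neg))
    res = linprog(c, A_ub=A_ub, b_ub=b_ub,
                  bounds=[(None, None)] * (d + 1), method="highs")
    return res.success
```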
The observation concludes that Linear Discriminant Analysis cannot be applied to classify our data, and hence provides a stronger justification for why XGBoost is used. Another important reason is that, since each decision tree in the algorithm operates on a random subspace of the feature space, it leads to the automatic selection of the most relevant subset of features.
Since the analysis finds that the data is not linearly separable, the metric methods are not preferred. Therefore the implementation of non-metric methods comes into the picture, which is discussed in the next section.

5.2 Non-Metric Classifiers
A few significant and often-used non-metric classifiers include:
Decision Tree
Random Forest
Xtreme Gradient Boost
Decision Tree
Decision trees can be used for various machine learning applications. A decision tree constructs a tree that is used for classification and regression, but trees that are grown very deep to learn highly irregular patterns tend to over-fit the training set: a slight noise in the data may cause the tree to grow in a completely different manner. This is because decision trees have very low bias and high variance. Each node in the tree is split on the basis of training set attributes: every node is split into child nodes based on some splitting criterion or decision rule, which determines the allegiance of a particular object (data point) to a feature class. The leaf nodes must be pure nodes by the time any feature vector that is to be classified reaches a leaf node. Splitting is done on the attribute of highest importance, which is determined using Gini impurity or Shannon entropy; the information gain is calculated and the attributes are selected. One significant advantage of decision trees is that both categorical and numerical data can be dealt with; a disadvantage is that decision trees tend to over-fit the training data. In order to prevent over-fitting of the model, pruning must be done, either while constructing the tree or after the tree is constructed.
Gini impurity is used as the function to measure the quality of the split in each node. The Gini impurity at node N is given by

g(N) = Σ_{i≠j} P(w_i) P(w_j)

where P(w_i) is the proportion of the population with class label i. Another function which can be used to judge the quality of a split is the Shannon entropy. It measures the disorder in the information content. In decision trees, Shannon entropy is used to measure the unpredictability of the information contained in a particular node of a tree (in this context, it measures how mixed the population in a node is). The entropy in a node N can be calculated as follows:

H(N) = − Σ_{i=1}^{d} P(w_i) log2(P(w_i))

where d is the number of classes considered and P(w_i) is the proportion of the population labeled as i. Entropy is highest when all the classes are contained in equal proportion in the node, and lowest when only one class is present in a node (when the node is pure).
The best split is characterized by the highest gain in information, or equivalently the highest reduction in impurity. The information gain due to a split can be calculated as follows:

ΔI(N) = I(N) − P_L * I(N_L) − P_R * I(N_R)

where I(N) is the impurity measure (Gini or Shannon entropy) of node N, P_L is the proportion of the population in node N that goes to the left child of N after the split and, similarly, P_R is the proportion that goes to the right child. N_L and N_R are the left and right children of N, respectively.
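These three quantities translate directly into code. The sketch below (plain NumPy, with our own helper names) computes the Gini impurity, the Shannon entropy and the information gain of a candidate split from the class labels of the parent and child nodes.

```python
import numpy as np

def gini(labels: np.ndarray) -> float:
    """g(N) = sum_{i != j} P(w_i) P(w_j) = 1 - sum_i P(w_i)^2."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(1.0 - np.sum(p ** 2))

def entropy(labels: np.ndarray) -> float:
    """H(N) = -sum_i P(w_i) log2 P(w_i)."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def information_gain(parent, left, right, impurity=gini) -> float:
    """Delta I(N) = I(N) - P_L * I(N_L) - P_R * I(N_R)."""
    p_left = len(left) / len(parent)
    p_right = len(right) / len(parent)
    return impurity(parent) - p_left * impurity(left) - p_right * impurity(right)

# Example: a pure split of a perfectly mixed node achieves the maximum gain.
parent = np.array([+1, +1, -1, -1])
print(information_gain(parent, parent[:2], parent[2:]))   # 0.5 with Gini impurity
```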
Assume there are n data points D = {(x_i, y_i)}_{i=1}^{n}, i.e. feature vectors {x_i}_{i=1}^{n} with stated outcomes. Each feature vector is d-dimensional.
1: We define a classification tree where each node is endowed with a binary decision, x_i <= k or not, where k is some threshold. The topmost node in the classification tree contains all the data points, and the data is subdivided among the children of each node as defined by the classification. The process of subdivision continues until every node below has data belonging to one class only. Each node is characterized by the feature x_i and the threshold k, chosen so as to minimize diversity among the children nodes; this diversity is often measured by the Gini impurity.
2: X = (X_1, ..., X_d) is an array of random variables defined on a probability space, called a random vector. The joint distribution of X_1, ..., X_d is a measure µ on R^d, µ(A) = P(X ∈ A), A ⊂ R^d. For example, let x = (x_1, ..., x_d) be an array of data points. Each feature x_i is defined as a random variable with some distribution; then the random vector X has a joint distribution identical to that of the data points x.
3: Let h_k(x) = h(x|θ_k) denote decision tree k, leading to a classifier h_k(x). Thus, a random forest is a classifier based on a family of classifiers h(x|θ_1), ..., h(x|θ_k), built on classification trees with model parameters θ_k randomly chosen from the model random vector θ. Each classifier h_k(x) = h(x|θ_k) is a predictor over the training samples, y = ±1 is the outcome associated with input data x, and f(x) is the final classification function.
Next, we describe the working of the XGBoost learner by exploiting the key concepts defined above.

Xtreme Gradient Boost (XGBoost)
XGBoost is another method from the non-metric classifier family and is based on the concept of the decision tree, but there are significant differences between the two. XGBoost is an ensemble of decision trees in which a weighted combination of predictors is taken. XGBoost works along the same lines as Random Forest, but there is a difference in the working procedure. The similarity is that in both cases the features considered at each split are completely random in nature. If n is the total number of attributes in the feature matrix and m is the number of attributes finally chosen to determine the split at each node, then m is related to n in the following way:

m = n / 3

An illustrative training call using an off-the-shelf implementation is sketched below.
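For completeness, this is roughly how such a model can be trained with the xgboost Python package on the feature matrix from section 4. The hyperparameter values, the shuffled 80/20 split and the variable names are illustrative assumptions, not the exact settings reported by the authors.

```python
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

features = build_feature_matrix(df, d=60)        # df: OHLCV DataFrame from Yahoo Finance
X = features.drop(columns=["target"])
y = (features["target"] > 0).astype(int)         # map {-1, +1} to {0, 1} for the classifier

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = XGBClassifier(
    n_estimators=300,       # number of sequentially boosted trees
    learning_rate=0.1,      # shrinkage applied to each new tree
    max_depth=3,            # shallow, weak base learners
    subsample=0.8,          # row subsampling per tree
    colsample_bytree=0.5,   # random feature subset per tree (cf. m ~ n/3)
)
model.fit(X_train, y_train)
print("held-out accuracy:", model.score(X_test, y_test))
```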
XGBoost is basically a collection of weak decision-tree classifiers, and it primarily focuses on training each new decision tree to learn from the errors committed by the previous tree(s). The learning trees are trained sequentially. Initially, a regression function is drawn and fitted to the data set; because this fit is imperfect, errors occur, which are referred to as residual errors. Subsequently, the residual errors are considered and another regression function is fitted to them; the residual errors that occur in this case are taken care of by the combination of the previous regression function and the current one. Continuing in this manner, the overall regression function becomes more and more complex in nature, and the root mean squared error is observed to be significantly reduced. The following are the basic steps followed while executing XGBoost.

5.3 Algorithm
The following steps are carried out recursively throughout the process.
Step 1: Learn a regression predictor.
Step 2: Compute the error residual.
Step 3: Learn to predict the residual.
The error rate is calculated using the parameters mentioned below. The error in prediction is given by J(y, ŷ), where

J(y, ŷ) = Σ_i (y[i] − ŷ[i])^2

The prediction ŷ can be adjusted in order to reduce the error by using the following update:

ŷ[i] = ŷ[i] + α f[i],   where   f[i] ≈ ∇J(y, ŷ)

Each learner estimates the gradient of the loss function, and gradient descent is used to take a sequence of steps that reduce the error, summing the predictors weighted by the step size α. We present the proposed algorithm for XGBoost below.
Algorithm 1 Xtreme Gradient Boosting
1: procedure XtremeGradientBoost(D)    ▷ D is the labeled training data
2:   Initialize the model with a constant value: F_0(x) = argmin_γ Σ_{i=1}^{n} L(y_i, γ)
3:   for m = 0 to M do
4:     Compute the pseudo-residuals
5:     Fit a base learner to the pseudo-residuals
6:     T_i = new DecisionTree()
7:     features_i = RandomFeatureSelection(D_i)
8:     T_i.train(D_i, features_i)
9:     Compute the multiplier γ_m
10:    Update the model
11:  output F_m(x)
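To make the residual-fitting loop of Algorithm 1 concrete, here is a stripped-down, generic gradient-boosting sketch using scikit-learn regression trees and a squared loss. It is a didactic illustration of Steps 1-3 above, not the xgboost library implementation (which adds regularization, second-order gradient information and other refinements).

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gradient_boost(X, y, n_rounds=100, alpha=0.1, max_depth=3):
    """Fit each new tree to the residuals of the current ensemble (squared loss)."""
    prediction = np.full(len(y), y.mean(), dtype=float)   # F_0: constant model
    trees = []
    for _ in range(n_rounds):
        residual = y - prediction                # pseudo-residuals of the squared loss
        tree = DecisionTreeRegressor(max_depth=max_depth)
        tree.fit(X, residual)                    # learn to predict the residual
        prediction += alpha * tree.predict(X)    # y_hat <- y_hat + alpha * f
        trees.append(tree)
    return trees, y.mean()

def predict_trend(trees, base, X, alpha=0.1):
    """Sum the weighted trees and map the score back to a +1/-1 trend label."""
    score = np.full(len(X), base, dtype=float)
    for tree in trees:
        score += alpha * tree.predict(X)
    return np.sign(score)
```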
The target values "+1" and "-1" have been removed from the feature matrix, as they are used only as training labels. The ID of each record has also been removed, as it adds to the noise and is not significant at all. Generalized Linear Models, for instance, assume that the features are uncorrelated; assuming otherwise may sometimes make prediction less accurate and, most of the time, make interpretation of the model almost impossible under the GLM setting. Fortunately, boosted trees are very robust to such features. This eliminates the exercise of checking for strong or weak correlation among the features.
Feature ranking & Feature correlation: Building a feature importance list is important, as the ranks show the decreasing importance of features and may indicate or suggest omitting a few based on Gain, Cover and Frequency. In this case, the Gain values are the same for all six features; therefore pruning any of the features is not advised.
All 6 technical indicators used are strongly correlated to each other. The correlation is confirmed by performing the Chi-Squared test: higher Chi-Squared values imply better correlation, and the p-values were always greater than 0.1.
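With the xgboost package, the Gain, Cover and Frequency rankings referred to above can be read off the trained booster. The sketch assumes the fitted `model` from the earlier training example; it is an illustration, not the authors' script.

```python
# Rank features by Gain, Cover and Frequency (xgboost calls the last one "weight").
booster = model.get_booster()
for importance_type in ("gain", "cover", "weight"):
    scores = booster.get_score(importance_type=importance_type)
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    print(importance_type, ranked)
```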
One of the limitations of XGBoost is its limited flexibility in handling non-numeric data: if there are categorical or ordinal data, conversion to numeric data is necessary. The data set under investigation has only numeric data, so this did not affect the performance of XGBoost while pruning or splitting.
As figures 3 and 4 show (please refer to the next section, Results), the train-RMSE error decreases. The model learns the data well without exhibiting random fluctuations in the error rate. Let us now proceed to the experimental results, which will confirm the theory and expectations from the predictive model.
6. RESULT
The main aim of this paper is to predict the rise and fall of the stock market. Hence, as a measuring parameter, we have used +1 to indicate a rise in the stock valuation in the future and -1 to indicate a fall in the prices. The following results were observed after the computation on the data set using XGBoost. We obtain the root mean squared error (RMSE) for the 60-day prediction and the 90-day prediction for the Apple Inc. data set, as shown in figures 3 and 4.

Fig.8: ROC curve for 60 days (Yahoo! Inc.). AUC is 1.0
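The accuracy/specificity/sensitivity check described in section 5 can be computed from the held-out predictions along the following lines (a sketch reusing the `model`, `X_test` and `y_test` names assumed in the earlier training example):

```python
from sklearn.metrics import confusion_matrix

y_pred = model.predict(X_test)
tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()

accuracy = (tp + tn) / (tp + tn + fp + fn)
sensitivity = tp / (tp + fn)   # fraction of actual rises predicted as rises
specificity = tn / (tn + fp)   # fraction of actual falls predicted as falls
print(f"accuracy={accuracy:.3f} sensitivity={sensitivity:.3f} specificity={specificity:.3f}")
```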
8. REFERENCES
[1] Das, Shom Prasad and Padhy, Sudersan, Support Vector Machines for Prediction of Future Prices in Indian Stock Market, International Journal of Computer Applications (0975-8887), March 2012.
[2] Chauhan, Bhagwant, Bidave, Umesh, Gangathade, Ajit and Kale, Sachin, Stock Market Prediction Using Artificial Neural Networks, International Journal of Computer Science and Information Technology, Vol. 5(1), 2014, 904-907.