Steps for implementing Time Series Data EDA
1. Data Ingestion
2. EDA of the Data
3. Preprocessing of the Data
4. Model Building
5. Model Evaluation
Data Ingestion Steps:
1. Import the required libraries such as numpy, pandas, matplotlib, seaborn, etc.
2. Load the time series data into a pandas DataFrame
3. Set the datetime column as the index of the DataFrame
4. Check the datatype of the index and convert it to datetime if necessary
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import os
import sys
import warnings
warnings.filterwarnings('ignore')
df=pd.read_csv('TSLA.csv')
df
Date Open High Low Close Volume Dividends Stock Splits
0 2023-01-01 102.264052 102.844516 102.016732 102.375100 190884 0.0 0.0
1 2023-01-02 103.164210 103.568883 103.072105 103.268399 144529 0.0 0.0
2 2023-01-03 104.642948 104.945523 104.396706 104.661726 114590 0.0 0.0
3 2023-01-04 107.383841 107.749974 107.409781 107.514532 144406 0.0 0.0
4 2023-01-05 109.751399 109.687393 108.002799 109.147197 152652 0.0 0.0
... ... ... ... ... ... ... ... ...
360 2023-12-27 274.683259 274.739668 274.622839 274.681922 198906 0.0 0.0
361 2023-12-28 275.187029 275.220635 274.802580 275.070082 171058 0.0 0.0
362 2023-12-29 276.618878 277.740538 276.938281 277.099232 108824 0.0 0.0
363 2023-12-30 277.458843 278.365180 277.325499 277.716507 119610 0.0 0.0
364 2023-12-31 277.943161 278.736790 276.368373 277.682775 106382 0.0 0.0
365 rows × 8 columns
df.isnull().sum()
Date 0
Open 0
High 0
Low 0
Close 0
Volume 0
Dividends 0
Stock Splits 0
dtype: int64
Now we perform univariate analysis on the closing price.
df = df[['Date','Close']]
df
Date Close
0 2023-01-01 102.375100
1 2023-01-02 103.268399
2 2023-01-03 104.661726
3 2023-01-04 107.514532
4 2023-01-05 109.147197
... ... ...
360 2023-12-27 274.681922
361 2023-12-28 275.070082
362 2023-12-29 277.099232
363 2023-12-30 277.716507
364 2023-12-31 277.682775
365 rows × 2 columns
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 365 entries, 0 to 364
Data columns (total 2 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Date 365 non-null object
1 Close 365 non-null float64
dtypes: float64(1), object(1)
memory usage: 5.8+ KB
df["Date"]=pd.to_datetime(df.Date)
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 365 entries, 0 to 364
Data columns (total 2 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Date 365 non-null datetime64[ns]
1 Close 365 non-null float64
dtypes: datetime64[ns](1), float64(1)
memory usage: 5.8 KB
stock_df=df.set_index("Date")
stock_df
Close
Date
2023-01-01 102.375100
2023-01-02 103.268399
2023-01-03 104.661726
2023-01-04 107.514532
2023-01-05 109.147197
... ...
2023-12-27 274.681922
2023-12-28 275.070082
2023-12-29 277.099232
2023-12-30 277.716507
2023-12-31 277.682775
365 rows × 1 columns
Why we convert this column into the index:
1. Retrieval of the data becomes easy
2. Visualization becomes easy
3. Time series libraries such as statsmodels and scipy expect the data to have a datetime index
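As a quick illustration of points 1 and 2, a DatetimeIndex makes label-based retrieval and resampling direct. A minimal sketch on a small synthetic frame (the values here are made up, not the TSLA data):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for stock_df: 10 days of fake closing prices
idx = pd.date_range("2023-01-01", periods=10, freq="D")
demo = pd.DataFrame({"Close": np.arange(100.0, 110.0)}, index=idx)

# With a DatetimeIndex we can slice by date strings (both ends inclusive):
first_week = demo.loc["2023-01-01":"2023-01-07"]
print(len(first_week))  # 7

# Resampling (e.g. weekly mean) also requires a datetime index:
weekly = demo["Close"].resample("W").mean()
print(len(weekly))  # 3 weekly bins
```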
EDA of the Data
1. Summary statistics such as mean, median, mode, etc.
2. Visualize the time series data
3. Stationarity check using the augmented Dickey-Fuller test
4. Check for autocorrelation using the autocorrelation function (ACF)
5. Check for outliers
6. Check the partial autocorrelation function (PACF), which helps choose ARIMA orders
Preprocessing of the data
1. Fill the missing values (not required here)
2. Convert the data into a stationary time series
3. Normalize the data if necessary (not required here)
4. Split the data into train and test sets
5. Clean the data by removing outliers (not required here)
stock_df.describe()
Close
count 365.000000
mean 199.661626
std 51.101389
min 102.375100
25% 147.327615
50% 205.663111
75% 238.942848
max 277.716507
stock_df.head()
Close
Date
2023-01-01 102.375100
2023-01-02 103.268399
2023-01-03 104.661726
2023-01-04 107.514532
2023-01-05 109.147197
plt.plot(stock_df)
plt.show()
plt.hist(stock_df)
plt.show()
sns.histplot(stock_df["Close"], kde=True)  # distplot is deprecated in recent seaborn versions
plt.show()
# plotting close price
plt.style.use('ggplot')
plt.figure(figsize=(18,8))
plt.grid(True)
plt.xlabel('Dates', fontsize = 20)
plt.xticks(fontsize = 15)
plt.ylabel('Close Prices', fontsize = 20)
plt.yticks(fontsize = 15)
plt.plot(stock_df['Close'], linewidth = 3, color = 'blue')
plt.title('Tesla Stock Closing Price', fontsize = 30)
plt.show()
# plotting close price
plt.style.use('ggplot')
plt.figure(figsize=(18,8))
plt.grid(True)
plt.xlabel('Dates', fontsize = 20)
plt.xticks(fontsize = 15)
plt.ylabel('Close Prices', fontsize = 20)
plt.yticks(fontsize = 15)
plt.hist(stock_df['Close'], linewidth = 3, color = 'blue')
plt.title('Tesla Stock Closing Price', fontsize = 30)
plt.show()
# Style and figure size
plt.style.use('ggplot')
plt.figure(figsize=(18, 8))
# Labeling
plt.xlabel('Dates', fontsize=20)
plt.xticks(fontsize=15)
plt.ylabel('Close Prices', fontsize=20)
plt.yticks(fontsize=15)
# Plotting the distribution (Kernel Density Estimate plot)
sns.kdeplot(stock_df['Close'], color='blue', linewidth=3)
# Title
plt.title('Tesla Stock Closing Price Distribution', fontsize=30)
plt.grid(True)
plt.show()
stock_df["Close"]
Close
Date
2023-01-01 102.375100
2023-01-02 103.268399
2023-01-03 104.661726
2023-01-04 107.514532
2023-01-05 109.147197
... ...
2023-12-27 274.681922
2023-12-28 275.070082
2023-12-29 277.099232
2023-12-30 277.716507
2023-12-31 277.682775
365 rows × 1 columns
dtype: float64
# Rolling mean with a window size of 120
rolemean=stock_df["Close"].rolling(120).mean()
rolemean
Close
Date
2023-01-01 NaN
2023-01-02 NaN
2023-01-03 NaN
2023-01-04 NaN
2023-01-05 NaN
... ...
2023-12-27 254.024091
2023-12-28 254.389435
2023-12-29 254.769750
2023-12-30 255.152673
2023-12-31 255.535278
365 rows × 1 columns
dtype: float64
# Rolling standard deviation with a window size of 120
rolestd=stock_df["Close"].rolling(120).std()
rolestd
Close
Date
2023-01-01 NaN
2023-01-02 NaN
2023-01-03 NaN
2023-01-04 NaN
2023-01-05 NaN
... ...
2023-12-27 14.628715
2023-12-28 14.602063
2023-12-29 14.594201
2023-12-30 14.588376
2023-12-31 14.572035
365 rows × 1 columns
dtype: float64
plt.plot(stock_df.Close)
plt.plot(rolemean)
plt.plot(rolestd)
plt.show()
from statsmodels.tsa.stattools import adfuller
adft=adfuller(stock_df['Close'])
pd.Series(adft[0:4],index=["test stats","p value","lag","data points"])
test stats -1.893196
p value 0.335269
lag 0.000000
data points 364.000000
dtype: float64
# Null hypothesis: the data is not stationary
# Alternate hypothesis: the data is stationary
# If p < 0.05: reject the null hypothesis (stationary)
# If p > 0.05: fail to reject the null hypothesis (not stationary)
# Here p = 0.335269 > 0.05, so we fail to reject the null hypothesis: the series is not stationary
def test_stationarity(timeseries):
# Determining rolling statistics
rolmean = timeseries.rolling(48).mean() # rolling mean
rolstd = timeseries.rolling(48).std() # rolling standard deviation
# Plotting rolling statistics
plt.figure(figsize=(18, 8))
plt.grid('both')
plt.plot(timeseries, color='blue', label='Original', linewidth=3)
plt.plot(rolmean, color='red', label='Rolling Mean', linewidth=3)
plt.plot(rolstd, color='black', label='Rolling Std', linewidth=4)
plt.legend(loc='best', fontsize=20, shadow=True, facecolor='lightgray',edgecolor='k')
plt.title('Rolling Mean and Standard Deviation', fontsize=25)
plt.xticks(fontsize=15)
plt.yticks(fontsize=15)
plt.show(block=False)
# Perform Dickey-Fuller test
print("Results of Dickey-Fuller Test:")
adft = adfuller(timeseries, autolag='AIC')
# Displaying the output of the Dickey-Fuller test
output = pd.Series(adft[0:4], index=['Test Statistic', 'p-value', '#Lags Used', 'Number of Observations Used'])
for key, value in adft[4].items():
output[f'Critical Value ({key})'] = value
print(output)
test_stationarity(stock_df.Close)
Results of Dickey-Fuller Test:
Test Statistic -1.893196
p-value 0.335269
#Lags Used 0.000000
Number of Observations Used 364.000000
Critical Value (1%) -3.448443
Critical Value (5%) -2.869513
Critical Value (10%) -2.571018
dtype: float64
# Check for outliers
sns.boxplot(stock_df.Close)
plt.show()
#Time series decomposition
from statsmodels.tsa.seasonal import seasonal_decompose
result = seasonal_decompose(stock_df[["Close"]],period=12)
result.plot()
plt.show()
result.seasonal
seasonal
Date
2023-01-01 -0.049962
2023-01-02 0.098094
2023-01-03 -0.012132
2023-01-04 0.071651
2023-01-05 0.282969
... ...
2023-12-27 -0.049962
2023-12-28 0.098094
2023-12-29 -0.012132
2023-12-30 0.071651
2023-12-31 0.282969
365 rows × 1 columns
dtype: float64
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
plot_acf(stock_df.Close)  # correlation of the series with itself at different lags
plot_pacf(stock_df.Close)
plt.show()
df_close=stock_df["Close"]
df_close
Close
Date
2023-01-01 102.375100
2023-01-02 103.268399
2023-01-03 104.661726
2023-01-04 107.514532
2023-01-05 109.147197
... ...
2023-12-27 274.681922
2023-12-28 275.070082
2023-12-29 277.099232
2023-12-30 277.716507
2023-12-31 277.682775
365 rows × 1 columns
dtype: float64
df_close=df_close.diff()
df_close=df_close.dropna()
test_stationarity(df_close)
Results of Dickey-Fuller Test:
Test Statistic -5.281090
p-value 0.000006
#Lags Used 7.000000
Number of Observations Used 356.000000
Critical Value (1%) -3.448853
Critical Value (5%) -2.869693
Critical Value (10%) -2.571114
dtype: float64
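Since the model will be trained on first differences, its forecasts are differences too; to report them as price levels we have to undo the differencing. A minimal sketch with made-up prices:

```python
import pandas as pd

# Hypothetical price series (not the TSLA data)
prices = pd.Series([100.0, 102.5, 101.0, 104.0, 103.5])

diffed = prices.diff().dropna()   # what we actually model

# Invert the differencing: last known level + cumulative sum of differences
recovered = prices.iloc[0] + diffed.cumsum()
print(recovered.tolist())  # [102.5, 101.0, 104.0, 103.5]
```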
Perform a train/test split of the time series
df_close[0:-60]#training data
Close
Date
2023-01-02 0.893299
2023-01-03 1.393326
2023-01-04 2.852806
2023-01-05 1.632665
2023-01-06 0.547885
... ...
2023-10-28 -0.800922
2023-10-29 2.113884
2023-10-30 0.623045
2023-10-31 -0.615555
2023-11-01 1.324551
304 rows × 1 columns
dtype: float64
df_close[-60:]#testing data
Close
Date
2023-11-02 -0.166372
2023-11-03 -1.044071
2023-11-04 -0.820504
2023-11-05 1.426642
2023-11-06 0.892228
2023-11-07 -0.178966
2023-11-08 1.583613
2023-11-09 -0.351853
2023-11-10 -0.247485
2023-11-11 -0.109094
2023-11-12 0.598929
2023-11-13 0.740941
2023-11-14 0.324071
2023-11-15 0.351657
2023-11-16 0.286591
2023-11-17 -0.604990
2023-11-18 0.541170
2023-11-19 0.030313
2023-11-20 -0.329250
2023-11-21 -0.608250
2023-11-22 0.342926
2023-11-23 0.701763
2023-11-24 1.912550
2023-11-25 0.010383
2023-11-26 1.710726
2023-11-27 1.372687
2023-11-28 -0.686699
2023-11-29 0.954674
2023-11-30 -0.466640
2023-12-01 -2.042970
2023-12-02 0.638001
2023-12-03 -1.164504
2023-12-04 1.029772
2023-12-05 0.085211
2023-12-06 1.994400
2023-12-07 2.125879
2023-12-08 -0.086252
2023-12-09 -0.577820
2023-12-10 -0.268866
2023-12-11 -0.369897
2023-12-12 -0.074961
2023-12-13 0.421395
2023-12-14 0.703201
2023-12-15 1.114858
2023-12-16 0.921257
2023-12-17 -0.380721
2023-12-18 -1.068875
2023-12-19 1.768739
2023-12-20 0.322271
2023-12-21 -0.512357
2023-12-22 0.072897
2023-12-23 -1.323132
2023-12-24 0.106720
2023-12-25 -0.330315
2023-12-26 1.021908
2023-12-27 1.472730
2023-12-28 0.388160
2023-12-29 2.029151
2023-12-30 0.617275
2023-12-31 -0.033732
dtype: float64
# split the data into train and test sets
train_data=df_close[0:-60]
test_data=df_close[-60:]
plt.figure(figsize=(18,8))
plt.grid(True)
plt.xlabel('Dates', fontsize = 20)
plt.ylabel('Closing Prices', fontsize = 20)
plt.xticks(fontsize = 15)
plt.yticks(fontsize = 15)
plt.plot(train_data, 'green', label='Train data', linewidth = 5)
plt.plot(test_data, 'blue', label='Test data', linewidth = 5)
plt.legend(fontsize = 20, shadow=True, facecolor='lightpink', edgecolor = 'k')
plt.show()
Model Building in Time Series
# this time we use the ARIMA model
stock_df["Close"]
Close
Date
2023-01-01 102.375100
2023-01-02 103.268399
2023-01-03 104.661726
2023-01-04 107.514532
2023-01-05 109.147197
... ...
2023-12-27 274.681922
2023-12-28 275.070082
2023-12-29 277.099232
2023-12-30 277.716507
2023-12-31 277.682775
365 rows × 1 columns
dtype: float64
365-60  # note: df_close lost one row to .diff(), so train_data actually has 304 rows
305
import statsmodels.api as sm
from statsmodels.tsa.arima.model import ARIMA
from sklearn.metrics import mean_squared_error
history = [x for x in train_data]
history
[0.8932991355088831,
 1.3933263239742928,
 2.852806278873757,
 ...
 0.6230446831372092,
 -0.6155548195604865,
 1.3245506334627066]
# train an ARIMA model, passing the training data as history
model=ARIMA(history,order=(1,1,1))
model=model.fit()
model.summary()
SARIMAX Results
Dep. Variable: y No. Observations: 304
Model: ARIMA(1, 1, 1) Log Likelihood -444.626
Date: Sat, 02 Nov 2024 AIC 895.251
Time: 00:05:55 BIC 906.392
Sample: 0 HQIC 899.708
- 304
Covariance Type: opg
coef std err z P>|z| [0.025 0.975]
ar.L1 -0.1192 0.054 -2.195 0.028 -0.226 -0.013
ma.L1 -0.9370 0.023 -41.528 0.000 -0.981 -0.893
sigma2 1.0933 0.091 11.989 0.000 0.915 1.272
Ljung-Box (L1) (Q): 0.01 Jarque-Bera (JB): 0.09
Prob(Q): 0.91 Prob(JB): 0.95
Heteroskedasticity (H): 1.07 Skew: 0.00
Prob(H) (two-sided): 0.75 Kurtosis: 2.91
Warnings:
[1] Covariance matrix calculated using the outer product of gradients (complex-step).
len(history)
304
model.forecast()
array([0.56016095])
mean_squared_error([test_data[0]],model.forecast())
0.527849948848813
np.sqrt(mean_squared_error([test_data[0]],model.forecast()))
0.7265328270964864
def train_arima_model(x, y, arima_order):
# prepare training dataset
# make predictions list
history = list(x)
predictions = list()
for t in range(len(y)):
model = ARIMA(history, order=arima_order)
model_fit = model.fit()
yhat = model_fit.forecast()[0]
predictions.append(yhat)
history.append(y[t])
# calculate out of sample error
rmse = np.sqrt(mean_squared_error(y, predictions))
return rmse
# evaluate different combinations of p, d and q values for an ARIMA model to get the best order for ARIMA Model
def evaluate_models(dataset, test, p_values, d_values, q_values):
dataset=dataset.astype('float32')
best_score, best_cfg = float("inf"), None
for p in p_values:
for d in d_values:
for q in q_values:
order = (p,d,q)
try:
rmse=train_arima_model(dataset, test, order)
if rmse<best_score:
best_score, best_cfg = rmse, order
print('ARIMA%s RMSE=%.3f' % (order,rmse))
except Exception:
continue
print('Best ARIMA%s RMSE=%.3f'%(best_cfg,best_score))
p_values=range(0,3)
d_values=range(0,3)
q_values=range(0,3)
evaluate_models(train_data,test_data,p_values,d_values,q_values)
ARIMA(0, 0, 0) RMSE=0.932
ARIMA(0, 0, 1) RMSE=0.940
ARIMA(0, 0, 2) RMSE=0.940
ARIMA(0, 1, 0) RMSE=1.237
ARIMA(0, 1, 1) RMSE=0.933
ARIMA(0, 1, 2) RMSE=0.958
ARIMA(0, 2, 0) RMSE=2.140
ARIMA(0, 2, 1) RMSE=1.239
ARIMA(0, 2, 2) RMSE=0.938
ARIMA(1, 0, 0) RMSE=0.941
ARIMA(1, 0, 1) RMSE=0.941
ARIMA(1, 0, 2) RMSE=0.953
ARIMA(1, 1, 0) RMSE=1.097
ARIMA(1, 1, 1) RMSE=0.955
ARIMA(1, 1, 2) RMSE=0.968
ARIMA(1, 2, 0) RMSE=1.604
ARIMA(1, 2, 1) RMSE=1.098
ARIMA(1, 2, 2) RMSE=0.959
ARIMA(2, 0, 0) RMSE=0.940
ARIMA(2, 0, 1) RMSE=0.953
ARIMA(2, 0, 2) RMSE=0.913
ARIMA(2, 1, 0) RMSE=1.045
ARIMA(2, 1, 1) RMSE=0.960
ARIMA(2, 1, 2) RMSE=0.957
ARIMA(2, 2, 0) RMSE=1.303
ARIMA(2, 2, 1) RMSE=1.047
ARIMA(2, 2, 2) RMSE=0.965
Best ARIMA(2, 0, 2) RMSE=0.913
history=[x for x in train_data]
predictions=list()
for i in range(len(test_data)):
model=ARIMA(history,order=(2,0,0))  # note: the grid search above found (2, 0, 2) best
model=model.fit()
fc=model.forecast()  # one-step-ahead forecast
predictions.append(fc)
history.append(test_data[i])
print(f"my RMSE {np.sqrt(mean_squared_error(test_data,predictions))}")
my RMSE 0.9404476463697088
plt.figure(figsize=(18,8))
plt.grid(True)
plt.plot(range(len(test_data)), test_data,label='True Test Close Value',linewidth = 5)
plt.plot(range(len(predictions)), predictions, label = 'Predictions on test data', linewidth = 5)
plt.xticks(fontsize = 15)
plt.yticks(fontsize = 15)
plt.legend(fontsize = 20, shadow=True, facecolor='lightpink', edgecolor = 'k')
plt.show()
fc_series=pd.Series(predictions,index=test_data.index)
#plot
plt.figure(figsize=(12,5), dpi=100)
plt.plot(train_data, label='Training', color = 'blue')
plt.plot(test_data, label='Test', color = 'green', linewidth = 3)
plt.plot(fc_series, label='Forecast', color = 'red')
plt.title('Forecast vs Actuals on test data')
plt.legend(loc='upper left', fontsize=8)
plt.show()
# forecast the next 60 days with plot_predict and a 95% confidence interval
from statsmodels.graphics.tsaplots import plot_predict
fig=plt.figure(figsize=(18,8))
ax1=fig.add_subplot(111)  # these are the forecasted next 60 days of data
plot_predict(result=model,start=1,end=len(df_close)+60,ax=ax1)
plt.grid("both")
plt.legend(['Forecast', 'Close', '95% confidence interval'], fontsize = 20, shadow=True, facecolor='lightblue', edgecolor = 'k')
plt.show()
history= [x for x in train_data]
predictions = list()
conf_list = list()
for t in range(len(test_data)):
model=sm.tsa.statespace.SARIMAX(history, order = (0,1,0), seasonal_order = (1,1,1,3))
model_fit = model.fit()
fc=model_fit.forecast()
predictions.append(fc)
history.append(test_data[t])
print('RMSE OF SARIMA Model:', np.sqrt(mean_squared_error(test_data, predictions)))  # for comparison, the ARIMA RMSE above was 0.9404
RMSE OF SARIMA Model: 1.2743650895214962
plt.figure(figsize=(18,8))
plt.grid(True)
plt.plot(range(len(test_data)), test_data,label='True Test Close Value',linewidth = 5)
plt.plot(range(len(predictions)), predictions, label = 'Predictions on test data', linewidth = 5)
plt.xticks(fontsize = 15)
plt.yticks(fontsize = 15)
plt.legend(fontsize = 20, shadow=True, facecolor='lightpink', edgecolor = 'k')
plt.show()