Infosys Stock Trend - Ipynb - Colab

Uploaded by

sudeha217

!pip install statsmodels --upgrade

Requirement already satisfied: statsmodels in /usr/local/lib/python3.10/dist-pa


Requirement already satisfied: numpy<3,>=1.22.3 in /usr/local/lib/python3.10/di
Requirement already satisfied: scipy!=1.9.2,>=1.8 in /usr/local/lib/python3.10/
Requirement already satisfied: pandas!=2.1.0,>=1.4 in /usr/local/lib/python3.10
Requirement already satisfied: patsy>=0.5.6 in /usr/local/lib/python3.10/dist-p
Requirement already satisfied: packaging>=21.3 in /usr/local/lib/python3.10/dis
Requirement already satisfied: python-dateutil>=2.8.2 in /usr/local/lib/python3
Requirement already satisfied: pytz>=2020.1 in /usr/local/lib/python3.10/dist-p
Requirement already satisfied: tzdata>=2022.7 in /usr/local/lib/python3.10/dist
Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.10/dist-packa

import yfinance as yf
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.tsa.api import VAR
from nltk.sentiment.vader import SentimentIntensityAnalyzer
from datetime import datetime
import statsmodels.api as sm
from statsmodels.graphics.tsaplots import plot_predict

# Download Infosys data from Yahoo Finance


infy = yf.Ticker("INFY.NS")
infy_data = infy.history(period="max")
infy_data

                                  Open         High          Low        Close    Volume  Dividends
Date
1996-01-01 00:00:00+05:30     0.505942     0.507060     0.503456     0.507060    204800        0.0
1996-01-02 00:00:00+05:30     0.505942     0.508428     0.505010     0.505010    204800        0.0
1996-01-03 00:00:00+05:30     0.508428     0.508428     0.508428     0.508428    102400        0.0
1996-01-04 00:00:00+05:30     0.503456     0.505942     0.503456     0.505072    307200        0.0
1996-01-05 00:00:00+05:30     0.499105     0.499105     0.499105     0.499105     51200        0.0
...                                ...          ...          ...          ...       ...        ...
2024-11-06 00:00:00+05:30  1764.000000  1827.199951  1762.650024  1823.699951  10421494        0.0
2024-11-07 00:00:00+05:30  1812.949951  1825.699951  1787.000000  1803.050049   4550965        0.0
2024-11-08 00:00:00+05:30  1818.000000  1840.599976  1813.150024  1829.949951   4210960        0.0
2024-11-11 00:00:00+05:30  1829.000000  1868.000000  1822.550049  1860.099976   3804234        0.0
2024-11-12 00:00:00+05:30  1871.099976  1881.000000  1861.000000  1868.800049   5012379        0.0

7252 rows × 7 columns

print("Statistical Summary:")
print(infy_data.describe())

Statistical Summary:
              Open         High          Low        Close        Volume  \
count  7252.000000  7252.000000  7252.000000  7252.000000  7.252000e+03
mean    384.448908   388.514027   380.261307   384.389611  1.438291e+07
std     472.494492   476.753623   468.130597   472.460419  1.557419e+07
min       0.485741     0.487296     0.453732     0.485741  0.000000e+00
25%      54.168951    54.909816    52.722418    53.748831  5.657421e+06
50%     215.128774   217.226623   212.346900   215.002235  8.855444e+06
75%     450.623885   455.175605   444.116287   450.242088  1.675935e+07
max    1944.855616  1969.030269  1929.777259  1945.943237  2.766150e+08

         Dividends  Stock Splits
count  7252.000000   7252.000000
mean      0.039321      0.002758
std       0.681979      0.081315
min       0.000000      0.000000
25%       0.000000      0.000000
50%       0.000000      0.000000
75%       0.000000      0.000000
max      21.000000      4.000000

# Plotting Infosys's stock closing prices over time
plt.figure(figsize=(12, 6))
plt.plot(infy_data['Close'], label='Infosys Close Price')
plt.title('Infosys Stock Price Over Time')
plt.xlabel('Date')
plt.ylabel('Close Price (INR)')
plt.legend()
plt.show()

infy_data['Close_pct_change'] = (infy_data['Close'].pct_change() * 100).round(2)

# Boolean index to filter rows where the Close value has dropped by more than 10% f
dropped_rows = infy_data[infy_data['Close_pct_change'] <= -10]
styled_table = dropped_rows.style\
    .set_table_styles([{'selector': 'thead',
                        'props': [('background-color', '#333'),
                                  ('color', 'white')]}])\
    .set_caption('Infosys stock dropped more than 10%')\
    .set_properties(**{'text-align': 'center'})\
    .format({'Close_pct_change': '{:,.2f}%'}) \
    .applymap(lambda x: 'background-color: lightgreen' if x == dropped_rows['Close_
    .applymap(lambda x: 'background-color: lightblue' if x == dropped_rows['Close_p

# Display the styled table
display(styled_table)
<ipython-input-11-29800fb9bc54>:1: FutureWarning: Styler.applymap has been depr
styled_table = dropped_rows.style\
Infosys stock dropped more than 10%

                                 Open        High         Low       Close     Volume  Dividends
Date
2000-05-08 00:00:00+05:30   87.020557   88.447455   76.303640   76.888565   48399424   0.000000
2001-03-02 00:00:00+05:30   58.209847   59.677747   49.655530   50.215355   74829568   0.000000
2001-04-11 00:00:00+05:30   40.645652   41.489443   32.552967   32.552967   41200640   0.000000
2001-04-12 00:00:00+05:30   29.948713   30.744924   27.344455   28.967245   80036608   0.000000
2001-04-27 00:00:00+05:30   36.833433   36.833433   32.461302   32.952785   36926080   0.117188
2001-06-15 00:00:00+05:30   38.355058   38.355058   34.195954   34.537304   66734656   0.000000
2001-07-03 00:00:00+05:30   36.518958   36.559535   32.461296   33.409775   36732608   0.000000
2001-09-17 00:00:00+05:30   29.205028   29.205028   26.541679   26.541679   17588032   0.000000
2001-09-21 00:00:00+05:30   23.536475   23.838772   22.544884   22.544884   28133504   0.000000
2003-04-10 00:00:00+05:30   41.971829   41.971829   30.905598   31.220387  211021376   0.000000
2003-04-11 00:00:00+05:30   29.564530   29.564530   23.545159   27.162411  276615040   0.000000
2004-05-17 00:00:00+05:30   52.142103   52.142103   42.594884   46.660141   31777600   0.000000
2009-05-19 00:00:00+05:30  151.823812  159.254196  137.340109  138.311600   44417904   0.000000
2012-04-13 00:00:00+05:30  231.280494  231.712189  219.440926  220.676315   75255568   0.000000
2013-04-12 00:00:00+05:30  244.943696  244.943696  211.692631  214.300644   98027744   0.000000
2019-10-22 00:00:00+05:30  607.265082  607.265082  560.913998  565.483215   90152532   8.000000
2020-03-23 00:00:00+05:30  480.467198  499.107557  459.024147  468.411041   17146150   0.000000

plt.plot(infy_data['Close'])

# Plot the dropped rows as red dots
plt.plot(dropped_rows['Close'], 'ro')

# Set the chart title and labels
plt.title('Infosys Limited (INFY.NS) Close Price')
plt.xlabel('Date')
plt.ylabel('Price')
plt.show()

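The drop filter used earlier (`Close_pct_change <= -10`) can be checked by hand on a small series. The following is a minimal, dependency-free sketch: the prices are made up, and the helper names `pct_changes` and `big_drops` are hypothetical, not part of the notebook.

```python
def pct_changes(closes):
    """Return day-over-day percentage changes, rounded to 2 decimals.

    The first element has no previous close, so it is reported as None
    (pandas reports NaN in the same position).
    """
    changes = [None]
    for prev, curr in zip(closes, closes[1:]):
        changes.append(round((curr - prev) / prev * 100, 2))
    return changes


def big_drops(dates, closes, threshold=-10.0):
    """Return (date, change) pairs whose daily change is at or below threshold."""
    return [
        (day, change)
        for day, change in zip(dates, pct_changes(closes))
        if change is not None and change <= threshold
    ]


# Hypothetical closing prices, chosen only to exercise the filter.
dates = ["day 1", "day 2", "day 3", "day 4"]
closes = [100.0, 95.0, 85.0, 86.0]

drops = big_drops(dates, closes)
print(drops)
```

Only the move from 95 to 85 crosses the threshold here; the 5% fall on day 2 and the small rebound on day 4 are ignored.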
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf


from statsmodels.tsa.stattools import adfuller
result = adfuller(infy_data['Close'].dropna())
adf_statistic, p_value, used_lag, n_obs, critical_values, icbest = result

print(f"ADF Statistic: {adf_statistic}")


print(f"p-value: {p_value}")
print(f"Critical Values: {critical_values}")
print(f"IC Best: {icbest}")
ADF Statistic: 1.780064005331251
p-value: 0.9983086467163326
Critical Values: {'1%': -3.431256544864883, '5%': -2.8619406218847865, '10%': -
IC Best: 53130.1037315555

# Applying first-order differencing
# (the first row is NaN; it is dropped wherever the differenced series is used)
infy_data['Close_diff'] = infy_data['Close'].diff()

# Re-run the ADF test on the differenced data


result_diff = adfuller(infy_data['Close_diff'].dropna())
adf_statistic_diff, p_value_diff, used_lag_diff, n_obs_diff, critical_values_diff,

# Print the results of the ADF test after differencing


print("ADF Statistic (after differencing):", adf_statistic_diff)
print("p-value (after differencing):", p_value_diff)
print("Critical Values (after differencing):", critical_values_diff)

# Checking stationarity based on the differenced ADF test
if p_value_diff < 0.05:
    print("The data is now stationary after differencing.")
else:
    print("The data is still non-stationary.")
ADF Statistic (after differencing): -14.908596648932555
p-value (after differencing): 1.4700599486043455e-27
Critical Values (after differencing): {'1%': -3.4312567962837206, '5%': -2.8619
The data is now stationary after differencing.
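The same conclusion can be read from the test statistic rather than the p-value. The sketch below uses a hypothetical helper `adf_verdict` and the statistics printed above. The null hypothesis of the ADF test is a unit root, so a statistic more negative than the critical value rejects it.

```python
def adf_verdict(statistic, critical_values, level="5%"):
    """Compare an ADF statistic with the critical value at the given level.

    A statistic below (more negative than) the critical value rejects the
    unit-root null, which is read here as "stationary".
    """
    if statistic < critical_values[level]:
        return "stationary"
    return "non-stationary"


# Critical values copied from the first ADF output above. The 10% entry is cut
# off in that output, so it is left out. The values printed after differencing
# differ only from the seventh decimal place.
critical_levels = {"1%": -3.431256544864883, "5%": -2.8619406218847865}

raw_verdict = adf_verdict(1.780064005331251, critical_levels)
diff_verdict = adf_verdict(-14.908596648932555, critical_levels, level="1%")

print("Close:", raw_verdict)
print("Close_diff:", diff_verdict)
```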
fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(12, 8))
plot_acf(infy_data['Close_diff'].dropna(), lags=40, ax=ax1)
plot_pacf(infy_data['Close_diff'].dropna(), lags=40, ax=ax2)
plt.show()

model = ARIMA(infy_data['Close'], order=(1, 1, 1)) # Assuming p=1, d=1, q=1 based


arima_result = model.fit()

# Print the summary of the ARIMA model


print(arima_result.summary())

fig, ax = plt.subplots(figsize=(10, 6))


ax = infy_data['Close'].plot(ax=ax, label='Original')
fig = plot_predict(arima_result, start="2020-01-01", end=infy_data.index[-1], ax=ax

plt.title("ARIMA Model Forecast of Infosys Close Price")


plt.legend()
plt.show()
/usr/local/lib/python3.10/dist-packages/statsmodels/tsa/base/tsa_model.py:473:
self._init_dates(dates, freq)
SARIMAX Results
==============================================================================
Dep. Variable: Close No. Observations: 7252
Model: ARIMA(1, 1, 1) Log Likelihood -26693.676
Date: Tue, 12 Nov 2024 AIC 53393.352
Time: 19:08:57 BIC 53414.019
Sample: 0 HQIC 53400.461
- 7252
Covariance Type: opg
==============================================================================
coef std err z P>|z| [0.025 0.975]
------------------------------------------------------------------------------
ar.L1 -0.8324 0.025 -32.899 0.000 -0.882 -0.783
ma.L1 0.8659 0.023 37.138 0.000 0.820 0.912
sigma2 92.2848 0.416 222.017 0.000 91.470 93.099
===============================================================================
Ljung-Box (L1) (Q): 0.28 Jarque-Bera (JB): 19812
Prob(Q): 0.60 Prob(JB):
Heteroskedasticity (H): 109.50 Skew: -
Prob(H) (two-sided): 0.00 Kurtosis: 2
===============================================================================

Warnings:
[1] Covariance matrix calculated using the outer product of gradients (complex-
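The information criteria in the summary above can be reproduced from the log likelihood. The sketch below uses the standard definitions with k = 3 estimated parameters (`ar.L1`, `ma.L1`, `sigma2`). The helper name `information_criteria` is hypothetical, and the use of the reported 7252 observations is an assumption, since statsmodels may use an effective sample size that is one smaller after differencing. Either choice matches the reported values to three decimals.

```python
import math


def information_criteria(log_likelihood, n_params, n_obs):
    """Return (AIC, BIC, HQIC) from a log likelihood, using the usual formulas."""
    aic = 2 * n_params - 2 * log_likelihood
    bic = n_params * math.log(n_obs) - 2 * log_likelihood
    hqic = 2 * n_params * math.log(math.log(n_obs)) - 2 * log_likelihood
    return aic, bic, hqic


# Log likelihood and observation count taken from the ARIMA(1, 1, 1) summary.
aic, bic, hqic = information_criteria(-26693.676, n_params=3, n_obs=7252)
print(f"AIC={aic:.3f}  BIC={bic:.3f}  HQIC={hqic:.3f}")
```

The same function can compare candidate orders: with a comparable log likelihood, the model with fewer parameters gets the lower (better) criterion.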
data_var = infy_data[['Close', 'Volume']].dropna()

# Applying first-order differencing to make the data stationary


data_var_diff = data_var.diff().dropna()

# Fit the VAR model on the differenced data


var_model = VAR(data_var_diff)
var_results = var_model.fit(maxlags=15, ic='aic') # Selecting the best lag based o

# Print summary of the VAR model


print(var_results.summary())

# Forecasting for the next 30 steps
forecast_steps = 30

forecast = var_results.forecast(data_var_diff.values[-var_results.k_ar:], steps=for


forecast_index = pd.date_range(start=data_var_diff.index[-1] + pd.Timedelta(days=1)
forecast_df = pd.DataFrame(forecast, index=forecast_index, columns=['Close_forecast

# Plotting the forecasted Close prices


plt.figure(figsize=(12, 6))
plt.plot(data_var_diff['Close'].iloc[-100:], label='Actual Differenced Close Prices
plt.plot(forecast_df['Close_forecast'], label='Forecasted Differenced Close Prices'
plt.title("VAR Model Forecast for Infosys Differenced Close Prices")
plt.xlabel("Date")
plt.ylabel("Differenced Close Price")
plt.legend()
plt.show()
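The VAR model is fitted on differenced data, so its forecast is a sequence of day-to-day changes, not prices. To express it as price levels, the changes are added cumulatively to the last observed close. The following is a minimal sketch: `undifference` is a hypothetical helper and the forecast changes are made up. Only the last close (1868.800049 on 2024-11-12) comes from the data shown earlier.

```python
from itertools import accumulate


def undifference(last_level, diffs):
    """Turn forecast differences into levels by cumulative summation.

    `last_level` is the last observed value of the original series.
    The returned list has one level per forecast difference.
    """
    # accumulate(..., initial=...) yields the starting level first; drop it.
    return list(accumulate(diffs, initial=last_level))[1:]


last_close = 1868.800049            # last observed Close in infy_data
forecast_diffs = [0.5, -0.25, 0.75]  # hypothetical forecast changes

levels = undifference(last_close, forecast_diffs)
print(levels)
```

In the notebook, the first column of `forecast` would play the role of `forecast_diffs`.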

/usr/local/lib/python3.10/dist-packages/statsmodels/tsa/base/tsa_model.py:473:
self._init_dates(dates, freq)
Summary of Regression Results
==================================
Model: VAR
Method: OLS
Date: Tue, 12, Nov, 2024
Time: 19:24:38
--------------------------------------------------------------------
No. of Equations: 2.00000 BIC: 36.8091
Nobs: 7236.00 HQIC: 36.7704
Log likelihood: -153435. FPE: 9.12773e+15
AIC: 36.7501 Det(Omega_mle): 9.05002e+15
--------------------------------------------------------------------
Results for equation Close
=============================================================================
coefficient std. error t-stat prob
-----------------------------------------------------------------------------
const 0.250495 0.113646 2.204 0.028
L1.Close 0.039736 0.011787 3.371 0.001
L1.Volume -0.000000 0.000000 -1.068 0.285
L2.Close -0.025370 0.011804 -2.149 0.032
L2.Volume -0.000000 0.000000 -0.277 0.782
L3.Close 0.011919 0.011811 1.009 0.313
L3.Volume -0.000000 0.000000 -0.628 0.530
L4.Close -0.008895 0.011816 -0.753 0.452
L4.Volume -0.000000 0.000000 -0.294 0.769
L5.Close 0.012165 0.011862 1.026 0.305
L5.Volume -0.000000 0.000000 -0.306 0.759
L6.Close -0.044420 0.011862 -3.745 0.000
L6.Volume -0.000000 0.000000 -0.497 0.619
L7.Close 0.023109 0.011872 1.947 0.052
L7.Volume 0.000000 0.000000 0.022 0.983
L8.Close -0.026218 0.011872 -2.208 0.027
L8.Volume -0.000000 0.000000 -0.266 0.790
L9.Close 0.017885 0.011887 1.505 0.132
L9.Volume -0.000000 0.000000 -0.361 0.718
L10.Close -0.014710 0.011890 -1.237 0.216
L10.Volume -0.000000 0.000000 -0.311 0.756
L11.Close 0.000662 0.011891 0.056 0.956
L11.Volume 0.000000 0.000000 0.168 0.866
L12.Close 0.015761 0.011890 1.326 0.185
L12.Volume -0.000000 0.000000 -0.103 0.918
L13.Close 0.023291 0.011890 1.959 0.050
L13.Volume -0.000000 0.000000 -0.564 0.573
L14.Close -0.002642 0.011890 -0.222 0.824
L14.Volume -0.000000 0.000000 -0.520 0.603
L15.Close 0.006713 0.011882 0.565 0.572
L15.Volume -0.000000 0.000000 -0.505 0.613
=============================================================================

Results for equation Volume
=============================================================================
coefficient std. error t-stat prob
-----------------------------------------------------------------------------
const 6509.961672 116914.784581 0.056 0.956
L1.Close -14017.875052 12126.121474 -1.156 0.248
L1.Volume -0.685753 0.011753 -58.348 0.000
L2.Close -4703.977812 12143.052159 -0.387 0.698
L2.Volume -0.529449 0.014180 -37.338 0.000
L3.Close -4182.859758 12150.343472 -0.344 0.731
L3.Volume -0.437622 0.015433 -28.357 0.000
L4.Close -943.612554 12155.718162 -0.078 0.938
L4.Volume -0.346772 0.016147 -21.475 0.000
L5.Close -2902.126012 12203.205592 -0.238 0.812
L5.Volume -0.284638 0.016558 -17.190 0.000
L6.Close 4261.744782 12203.588836 0.349 0.727
L6.Volume -0.299434 0.016785 -17.840 0.000
L7.Close -2648.054491 12213.124895 -0.217 0.828
L7.Volume -0.259335 0.017023 -15.235 0.000
L8.Close 5473.581957 12212.953055 0.448 0.654
L8.Volume -0.235372 0.017070 -13.789 0.000
L9.Close -5368.862864 12228.650581 -0.439 0.661
L9.Volume -0.178662 0.017022 -10.496 0.000
L10.Close 8269.714469 12232.329517 0.676 0.499
L10.Volume -0.163061 0.016783 -9.716 0.000
L11.Close -2192.363088 12232.540552 -0.179 0.858
L11.Volume -0.153986 0.016556 -9.301 0.000
L12.Close -3139.218801 12232.175702 -0.257 0.797
L12.Volume -0.171137 0.016145 -10.600 0.000
L13.Close 5335.931400 12232.271521 0.436 0.663
L13.Volume -0.115165 0.015430 -7.464 0.000
L14.Close 1240.402887 12232.378255 0.101 0.919
L14.Volume -0.133572 0.014177 -9.422 0.000
L15.Close 2717.072327 12223.372263 0.222 0.824
L15.Volume -0.075875 0.011752 -6.456 0.000
=============================================================================

Correlation matrix of residuals
           Close    Volume
Close   1.000000 -0.032149
Volume -0.032149  1.000000

# Importing necessary libraries for comparison analysis
import yfinance as yf
import pandas as pd
import matplotlib.pyplot as plt
from scipy.stats import f_oneway

# Fetching data for Infosys and other IT companies in the same domain (e.g., TCS, W
tickers = {
    "Infosys": "INFY.NS",
    "TCS": "TCS.NS",
    "Wipro": "WIPRO.NS",
    "HCL Technologies": "HCLTECH.NS"
}

# Downloading historical stock data for all companies
data = {}
for company, ticker in tickers.items():
    data[company] = yf.download(ticker, start="2015-01-01", end="2023-12-31")['Clos

[*********************100%***********************] 1 of 1 completed
[*********************100%***********************] 1 of 1 completed
[*********************100%***********************] 1 of 1 completed
[*********************100%***********************] 1 of 1 completed

stock_data = pd.concat(data, axis=1, join='outer')

#Rename the columns to represent the company name


stock_data.columns = tickers.keys()
stock_data

Infosys TCS Wipro HCL Technologies

Date

2015-01-01 00:00:00+00:00 493.600006 1272.775024 207.150055 401.700012

2015-01-02 00:00:00+00:00 503.299988 1289.724976 208.987549 401.312500

2015-01-05 00:00:00+00:00 498.975006 1270.125000 209.362549 394.562500

2015-01-06 00:00:00+00:00 488.549988 1223.300049 204.468796 384.024994

2015-01-07 00:00:00+00:00 490.887512 1208.849976 202.912552 374.899994

... ... ... ... ...

2023-12-22 00:00:00+00:00 1562.900024 3824.000000 462.649994 1462.699951

2023-12-26 00:00:00+00:00 1543.949951 3795.550049 470.100006 1458.150024

2023-12-27 00:00:00+00:00 1567.099976 3811.199951 470.950012 1472.050049

2023-12-28 00:00:00+00:00 1562.650024 3799.899902 469.450012 1472.449951

2023-12-29 00:00:00+00:00 1542.900024 3793.399902 471.299988 1466.099976

2221 rows × 4 columns


# Plotting stock price trends for all companies
plt.figure(figsize=(14, 7))
for company in stock_data.columns:
    plt.plot(stock_data.index, stock_data[company], label=company)
plt.title("Stock Price Trends of Infosys and Competitors (2015-2023)")
plt.xlabel("Date")
plt.ylabel("Close Price (INR)")
plt.legend()
plt.show()
from scipy.stats import f_oneway

# Performing ANOVA tests: Infosys vs other companies
anova_results = {}

for company in stock_data.columns:
    if company != "Infosys": # Compare Infosys with each company
        anova_result = f_oneway(stock_data["Infosys"].dropna(), stock_data[company]
        anova_results[f"Infosys vs {company}"] = {
            "F-statistic": anova_result.statistic,
            "p-value": anova_result.pvalue
        }

# Displaying the results of ANOVA tests


anova_results

{'Infosys vs TCS': {'F-statistic': 3615.814101251156, 'p-value': 0.0},


'Infosys vs Wipro': {'F-statistic': 3930.6524639233107, 'p-value': 0.0},
'Infosys vs HCL Technologies': {'F-statistic': 413.30666676139117,
'p-value': 6.287628400349884e-88}}

for comparison, result in anova_results.items():
    print(f"{comparison}:")
    print(f" F-statistic: {result['F-statistic']:.4f}")
    print(f" p-value: {result['p-value']:.4f}")
    if result['p-value'] < 0.05:
        print(" -> Significant difference in means.\n")
    else:
        print(" -> No significant difference in means.\n")

Infosys vs TCS:
F-statistic: 3615.8141
p-value: 0.0000
-> Significant difference in means.

Infosys vs Wipro:
F-statistic: 3930.6525
p-value: 0.0000
-> Significant difference in means.

Infosys vs HCL Technologies:


F-statistic: 413.3067
p-value: 0.0000
-> Significant difference in means.
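The F-statistic reported by `f_oneway` is the ratio of between-group to within-group variance of the values passed in, which here are raw closing prices. Because the four stocks trade at very different price levels, a large F mostly reflects that difference in level. The sketch below computes the statistic by hand on made-up numbers, using a hypothetical helper `one_way_anova_f`. It is an illustration of the formula, not a replacement for SciPy.

```python
from statistics import fmean


def one_way_anova_f(*groups):
    """Return the one-way ANOVA F-statistic for two or more groups of numbers."""
    all_values = [x for group in groups for x in group]
    grand_mean = fmean(all_values)
    group_means = [fmean(group) for group in groups]

    k = len(groups)       # number of groups
    n = len(all_values)   # total number of observations

    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(group) * (mean - grand_mean) ** 2
        for group, mean in zip(groups, group_means)
    )
    # Variation of each observation around its own group mean.
    ss_within = sum(
        (x - mean) ** 2
        for group, mean in zip(groups, group_means)
        for x in group
    )
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical groups, small enough to verify by hand.
group_a = [1.0, 2.0, 3.0]
group_b = [2.0, 3.0, 4.0]

f_stat = one_way_anova_f(group_a, group_b)
print(f_stat)
```

Here the group means are 2 and 3, the between-group sum of squares is 1.5 on 1 degree of freedom, and the within-group sum of squares is 4 on 4 degrees of freedom, so F is 1.5.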

import requests
from bs4 import BeautifulSoup
import pandas as pd
# Define the URL for Infosys news on Seeking Alpha
url = 'https://seekingalpha.com/symbol/INFY/news'
headers = {
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KH
}

response = requests.get(url, headers=headers)


if response.status_code == 200:
    print("Request successful!")
else:
    print(f"Failed to retrieve content. Status code: {response.status_code}")

# Check if the request was successful
if response.status_code == 200:
    # Parse the HTML content
    soup = BeautifulSoup(response.content, 'html.parser')

    # Find all article containers
    articles = soup.find_all('li', class_='symbol_item')

    # Initialize lists to store article details
    titles = []
    links = []
    dates = []

    # Extract details for each article
    for article in articles:
        # Extract title
        title = article.find('a', class_='title').get_text(strip=True)
        titles.append(title)

        # Extract link
        link = 'https://seekingalpha.com' + article.find('a', class_='title')['href
        links.append(link)

        # Extract date
        date = article.find('span', class_='date').get_text(strip=True)
        dates.append(date)

    # Create a DataFrame to store the articles
    df = pd.DataFrame({
        'Title': titles,
        'Link': links,
        'Date': dates
    })

    # Display the DataFrame
    print(df)
else:
    print(f'Failed to retrieve content. Status code: {response.status_code}')

Request successful!
Empty DataFrame
Columns: [Title, Link, Date]
Index: []
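The request succeeds but the DataFrame is empty, which means no `li` element with class `symbol_item` was found in the returned HTML. One possible reason (an assumption, not verified here) is that the page builds its article list in the browser, so the raw response contains no such elements. The sketch below uses the standard library's `html.parser` instead of BeautifulSoup, and a hypothetical class `TitleLinkParser`, to test an extraction rule against static snippets before running it on a live page. Both snippets are made up.

```python
from html.parser import HTMLParser


class TitleLinkParser(HTMLParser):
    """Collect (text, href) for every <a> tag whose class list contains 'title'."""

    def __init__(self):
        super().__init__()
        self.items = []
        self._href = None   # href of the anchor currently being read, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        classes = (attr_map.get("class") or "").split()
        if tag == "a" and "title" in classes:
            self._href = attr_map.get("href") or ""
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.items.append(("".join(self._text).strip(), self._href))
            self._href = None


def extract_articles(html):
    """Return a list of (title, href) pairs; empty if nothing matches."""
    parser = TitleLinkParser()
    parser.feed(html)
    parser.close()
    return parser.items


# Hypothetical markup: one snippet with a matching anchor, one empty page shell.
sample = (
    '<ul><li class="symbol_item">'
    '<a class="title" href="/news/1">Infosys Q2 results</a>'
    '</li></ul>'
)
empty_shell = '<div id="root"></div>'

print(extract_articles(sample))
print(extract_articles(empty_shell))
```

Checking `len(articles)` right after `find_all` in the notebook, and printing a short slice of `response.text`, would show in the same way whether the selector matches anything in the live response.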
