FEATURES OF PYTHON PACKAGES:
1. NUMPY
One of the most fundamental packages in Python, NumPy is a general-purpose array-processing
package. It provides high-performance multidimensional array objects and tools
to work with them. NumPy is also an efficient container for generic multidimensional data.
NumPy’s main object is the homogeneous multidimensional array: a table of elements
(usually numbers), all of the same data type, indexed by a tuple of non-negative integers. In NumPy,
dimensions are called axes, and the number of axes is called the rank. NumPy’s array class is
called ndarray, also known by the alias array.
Basic array operations: add, multiply, slice, flatten, reshape, index arrays
Advanced array operations: stack arrays, split into sections, broadcast arrays
Work with DateTime or Linear Algebra
Basic Slicing and Advanced Indexing in NumPy Python.
2. SCIPY
The SciPy library is one of the core packages that make up the SciPy stack. Note that the
SciPy stack and SciPy, the library, are different things. The stack is a collection of tools that
includes NumPy, Matplotlib, Pandas, and SymPy; the SciPy library builds on the NumPy array
object and contains modules for efficient mathematical routines such as linear
algebra, interpolation, optimization, integration, and statistics. Various scientific
computing problems arise while working on data science tasks.
SciPy provides a variety of sub-packages to solve these problems efficiently.
The SciPy library is fast and easy to use.
It operates on NumPy arrays and provides optimized versions of several routines also found
in NumPy.
After the GNU Scientific Library, SciPy is one of the most used scientific libraries.
3. PANDAS
Pandas is an open-source Python package that provides high-performance, easy-to-use
data structures and data analysis tools for labeled data in the Python programming
language. The name is commonly expanded as Python Data Analysis Library. Pandas is well suited to data
wrangling (munging). It is designed for quick and easy data manipulation, reading,
aggregation, and visualization. Pandas takes data from a CSV or TSV file or a SQL database
and creates a Python object with rows and columns called a data frame. The data frame is
very similar to a table in spreadsheet or statistical software such as Excel or SPSS.
Indexing, manipulating, renaming, sorting, and merging data frames
Updating, adding, and deleting columns in a data frame
Imputing missing values and handling missing data (NaN)
Plotting data with a histogram or box plot
4. STATSMODELS
Statsmodels is built for hardcore statistics. The core of the Statsmodels library is
"production ready". Traditional models such as robust linear models and generalized linear models
(GLM) have been around for a long time and have been validated against R and
Stata. It also contains a time series analysis section, which includes vector
autoregression (VAR), AR, and ARMA.
Linear/ Multiple regression – Linear regression is a statistical method for modeling
the linear relationship between a dependent variable and one or more explanatory
variables.
Logistic regression – The logistic model is used in statistics to model the
likelihood of a specific event/class occurring such as win/lose, pass/fail, etc.
Time series analysis – It refers to the analysis of time series data to retrieve
meaningful statistics and many other data characteristics
Statistical tests – Refers to the many statistical tests that can be done using the
Statsmodels Library.
5. JUPYTER
Project Jupyter is a suite of software products used in interactive computing. Packages
under Jupyter project include
Jupyter notebook − A web based interface to programming environments of Python,
Julia, R and many others
QtConsole − Qt based terminal for Jupyter kernels similar to IPython
nbviewer − Facility to share Jupyter notebooks
JupyterLab − Modern web based integrated interface for all products.
IPython, the Python kernel used by Jupyter, provides the following features:
Offers a powerful interactive Python shell.
Acts as the main kernel for Jupyter notebook and other front-end tools of Project
Jupyter.
Possesses object introspection ability. Introspection is the ability to check
properties of an object during runtime.
Syntax highlighting.
Stores the history of interactions.
Tab completion of keywords, variables and function names.
Magic command system useful for controlling Python environment and
performing OS tasks.
PYTHON INSTALLATION
Open the official Python website (https://www.python.org/).
Go to Downloads ==> Windows ==> select a recent release. (Requires Windows 10 or
later.)
Install "python-3.10.6-amd64.exe".
PACKAGE INSTALLATION
Open the command prompt and enter the following command to check whether Python was installed
properly: python --version. If the installation succeeded, it returns the Python version.
Enter the following command to check whether the Python package manager was installed properly:
pip --version.
If the installation succeeded, it returns the version of the package manager.
Enter the following command to install the NumPy library: pip install numpy
Enter the following command to install the SciPy library: pip install scipy
Enter the following command to install the Statsmodels library: pip install statsmodels
Enter the following command to install the Pandas library: pip install pandas
Enter the following command to install Jupyter: pip install jupyter
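The installation can also be checked from inside Python. The following is a minimal sketch using the standard-library importlib.metadata module; the helper name installed_versions is hypothetical and not part of the original procedure.

```python
from importlib import metadata


def installed_versions(packages):
    """Return {package name: version string, or None if not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Packages installed in the steps above
report = installed_versions(["numpy", "scipy", "statsmodels", "pandas", "jupyter"])
for name, version in report.items():
    print(f"{name}: {version if version else 'not installed'}")
```

A package reported as "not installed" can be added with the corresponding pip install command.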
OUTPUT:
PROGRAM:
1. Creating Arrays:
0-D Arrays
Each value in an array is a 0-D array.
import numpy as np
arr = np.array(42)
print(arr)
1-D Arrays
An array that has 0-D arrays as its elements is called a 1-D array.
import numpy as np
arr = np.array([1, 2, 3, 4, 5])
print(arr)
2-D Arrays
An array that has 1-D arrays as its elements is called a 2-D array.
import numpy as np
arr = np.array([[1, 2, 3], [4, 5, 6]])
print(arr)
3-D arrays
An array that has 2-D arrays (matrices) as its elements is called a 3-D array.
import numpy as np
arr = np.array([[[1, 2, 3], [4, 5, 6]], [[1, 2, 3], [4, 5, 6]]])
print(arr)
2. Array Dimensions:
import numpy as np
a = np.array(42)
b = np.array([1, 2, 3, 4, 5])
c = np.array([[1, 2, 3], [4, 5, 6]])
d = np.array([[[1, 2, 3], [4, 5, 6]], [[1, 2, 3], [4, 5, 6]]])
print(a.ndim)
print(b.ndim)
print(c.ndim)
print(d.ndim)
3. Access 2-D Arrays:
To access elements from 2-D arrays we can use comma separated integers
representing the dimension and the index of the element.
import numpy as np
arr = np.array([[1,2,3,4,5], [6,7,8,9,10]])
print('2nd element on 1st row: ', arr[0, 1])
4. Access 3-D Arrays:
To access elements from 3-D arrays we can use comma separated integers
representing the dimensions and the index of the element.
import numpy as np
arr = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])
print(arr[0, 1, 2])
5. Array Slicing:
Slicing in python means taking elements from one given index to another given index.
We pass slice instead of index like this: [start:end]. We can also define the step, like
this: [start:end:step].
import numpy as np
arr = np.array([1, 2, 3, 4, 5, 6, 7])
print(arr[1:5:2])
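As a further illustration of the [start:end:step] form, the sketch below shows what happens when start or end is omitted and when a negative index is used. The variable names are chosen for this example only.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6, 7])

first_three = arr[:3]    # start omitted: begins at index 0
from_index_4 = arr[4:]   # end omitted: runs to the last element
last_two = arr[-2:]      # negative index counts from the end
every_other = arr[::2]   # step of 2 over the whole array

print(first_three)   # [1 2 3]
print(from_index_4)  # [5 6 7]
print(last_two)      # [6 7]
print(every_other)   # [1 3 5 7]
```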
6. Data Types:
NumPy has some extra data types and refers to them with one-character codes, such as i for
integers and u for unsigned integers. The example below uses S, the code for (byte) strings.
import numpy as np
arr = np.array([1, 2, 3, 4], dtype='S')
print(arr)
print(arr.dtype)
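An existing array can also be converted to another data type. The sketch below assumes the astype() method is an acceptable way to do this; it returns a new array and leaves the original unchanged.

```python
import numpy as np

arr = np.array([1.1, 2.9, 3.5])

# 'i' is the one-character code for integers; the decimal part is dropped
as_int = arr.astype('i')
# Any non-zero value converts to True
as_bool = np.array([1, 0, 3]).astype(bool)

print(as_int)        # [1 2 3]
print(as_int.dtype)
print(as_bool)       # [ True False  True]
print(arr)           # the original array is unchanged
```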
7. Copy & View:
import numpy as np
arr = np.array([1, 2, 3, 4, 5])
x = arr.copy()
arr[0] = 42
print(arr)
print(x)
8. Make a view:
import numpy as np
arr = np.array([1, 2, 3, 4, 5])
x = arr.view()
arr[0] = 42
print(arr)
print(x)
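Whether an array is a copy or a view can be checked with its base attribute. The following sketch illustrates this; the names copied and viewed are used only for this example.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])
copied = arr.copy()
viewed = arr.view()

print(copied.base)         # None: the copy owns its data
print(viewed.base is arr)  # True: the view borrows its data from arr

viewed[0] = 99             # writing through the view changes the original
print(arr[0])              # 99
print(copied[0])           # 1
```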
9. Array Shape & Reshaping:
NumPy arrays have an attribute called shape that returns a tuple giving the number of
elements along each dimension.
import numpy as np
arr = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
print(arr.shape)
10. Array Reshaping:
Reshaping means changing the shape of an array. By reshaping we can add or remove
dimensions or change number of elements in each dimension.
import numpy as np
arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])
newarr = arr.reshape(2, 3, 2)
print(newarr)
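One dimension may be left for NumPy to work out by passing -1. The sketch below shows this, along with the error raised when the requested shape cannot hold all the elements.

```python
import numpy as np

arr = np.arange(1, 13)            # 12 elements: 1 .. 12

two_rows = arr.reshape(2, -1)     # -1 is inferred as 6
cube = arr.reshape(2, 2, -1)      # -1 is inferred as 3
flat = cube.reshape(-1)           # back to one dimension

print(two_rows.shape)  # (2, 6)
print(cube.shape)      # (2, 2, 3)
print(flat.shape)      # (12,)

# 12 elements cannot be arranged in rows of 5, so this raises ValueError
try:
    arr.reshape(5, -1)
except ValueError as err:
    print("cannot reshape:", err)
```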
11. Array Iterating:
Iterating means going through elements one by one. Because we deal with multi-dimensional
arrays in NumPy, we can do this using Python's basic for loop.
import numpy as np
arr = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])
for x in arr:
    print(x)
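A plain for loop over a 3-D array yields 2-D arrays, not single numbers. To visit every scalar element, one option is np.nditer, and np.ndenumerate additionally gives the index of each element. A small sketch:

```python
import numpy as np

arr = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])

# nditer visits each scalar element in turn
values = [int(x) for x in np.nditer(arr)]
print(values)  # [1, 2, 3, 4, 5, 6, 7, 8]

# ndenumerate yields (index tuple, value) pairs
pairs = list(np.ndenumerate(arr))
first_index, first_value = pairs[0]
print(first_index, first_value)  # (0, 0, 0) 1
```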
12. Joining Array:
Joining means putting contents of two or more arrays in a single array.
import numpy as np
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])
arr = np.concatenate((arr1, arr2))
print(arr)
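For 2-D arrays the axis argument decides the direction of the join. The sketch below joins two small matrices along rows and along columns, and uses np.stack to join them along a new axis.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

by_rows = np.concatenate((a, b), axis=0)  # b is placed below a
by_cols = np.concatenate((a, b), axis=1)  # b is placed to the right of a
stacked = np.stack((a, b))                # a new leading axis is created

print(by_rows.shape)  # (4, 2)
print(by_cols.shape)  # (2, 4)
print(stacked.shape)  # (2, 2, 2)
```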
13. Splitting Array:
Splitting is reverse operation of Joining. Joining merges multiple arrays into one and
Splitting breaks one array into multiple.
import numpy as np
arr = np.array([1, 2, 3, 4, 5, 6])
newarr = np.array_split(arr, 3)
print(newarr)
14. Searching Arrays:
We can search an array for a certain value and return the indexes that match. To
search an array, use the np.where() function.
import numpy as np
arr = np.array([1, 2, 3, 4, 5, 4, 4])
x = np.where(arr == 4)
print(x)
15. Sorting:
Sorting means putting elements in an ordered sequence. An ordered sequence is any
sequence that has an order corresponding to its elements, such as numeric or alphabetical,
ascending or descending. NumPy provides a function called np.sort() that
returns a sorted copy of the specified array.
import numpy as np
arr = np.array([3, 2, 0, 1])
print(np.sort(arr))
16. Filtering Arrays:
Getting some elements out of an existing array and creating a new array out of them is
called filtering. In NumPy, you filter an array using a boolean index list.
import numpy as np
arr = np.array([41, 42, 43, 44])
x = [True, False, True, False]
newarr = arr[x]
print(newarr)
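In practice the boolean list is usually built from a condition on the array itself instead of being typed by hand. A minimal sketch:

```python
import numpy as np

arr = np.array([41, 42, 43, 44])

mask = arr > 42              # array([False, False,  True,  True])
newarr = arr[mask]
even = arr[arr % 2 == 0]     # the condition can also be written inline

print(mask)
print(newarr)  # [43 44]
print(even)    # [42 44]
```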
OUTPUT:
PROGRAM:
import numpy as np
a = np.array([[1,2,3], [4,5,6], [7,8,9]])
print("The first matrix value is ::>",a)
b = np.array([[2,3,4],[5,6,7], [8,9,10]])
print("The second matrix value is ::>",b)
mul= np.multiply(a,b)
add= np.add(a,b)
sub=np.subtract(a,b)
div=np.divide(a,b)
print("Addition Matrix Resultant is ::>",add)
print("Subtraction Matrix Resultant is ::>",sub)
print("Division Matrix Resultant is ::>",div)
print("Multiplication Matrix Resultant is ::>",mul)
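Note that np.multiply works element by element; it is not the matrix product of linear algebra. The sketch below contrasts the two on small 2x2 matrices chosen for this example, using the @ operator for the matrix product.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

elementwise = np.multiply(a, b)   # a[i, j] * b[i, j]
matrix_product = a @ b            # rows of a times columns of b

print(elementwise)     # [[ 5 12] [21 32]]
print(matrix_product)  # [[19 22] [43 50]]
```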
OUTPUT:
PROGRAM:
import pandas as pd
df = pd.DataFrame({
    'Name': ['Alberto Franco', 'Gino Mcneill', 'Ryan Parkes', 'Eesha Hinton', 'Syed Wharton'],
    'Date_Of_Birth': ['17/05/2002', '16/02/1999', '25/09/1998', '11/05/2002', '15/09/1997'],
    'Age': [18.5, 21.2, 22.5, 22, 23]})
print("Original DataFrame:")
print(df)
df1 = df.copy(deep = True)
df = df.drop([0, 1])
df1 = df1.drop([2])
print("\nNew DataFrames:")
print(df)
print(df1)
print('\n"one_to_one": check if merge keys are unique in both left and right datasets:')
df_one_to_one = pd.merge(df, df1, validate="one_to_one")
print(df_one_to_one)
print('\n"one_to_many" or "1:m": check if merge keys are unique in left dataset:')
df_one_to_many = pd.merge(df, df1, validate="one_to_many")
print(df_one_to_many)
print('\n"many_to_one" or "m:1": check if merge keys are unique in right dataset:')
df_many_to_one = pd.merge(df, df1, validate="many_to_one")
print(df_many_to_one)
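The validate option only becomes visible when a check fails. The following sketch uses two small made-up frames (left and right are illustrative names, not part of the program above) in which the key is repeated on the left side, so "many_to_one" passes while "one_to_one" raises a MergeError.

```python
import pandas as pd
from pandas.errors import MergeError

left = pd.DataFrame({"key": [1, 2, 2], "left_val": ["a", "b", "c"]})
right = pd.DataFrame({"key": [1, 2], "right_val": ["x", "y"]})

# Keys are unique on the right only, so many_to_one passes
merged = pd.merge(left, right, on="key", validate="many_to_one")
print(merged)

# one_to_one fails because key 2 is repeated on the left
try:
    pd.merge(left, right, on="key", validate="one_to_one")
except MergeError as err:
    print("validation failed:", err)
```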
PROGRAM:
#DATA COLLECT
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
dataset=pd.read_csv("iris.txt")
dataset.head()
dataset=pd.read_excel("iris.xlsx")
dataset.head()
dataset=pd.read_csv("iris.csv")
dataset.head()
dataset.info()
dataset.Species.unique()
#EDA
dataset.describe()
dataset.corr(numeric_only=True)
dataset.Species.value_counts()
sns.FacetGrid(dataset, hue="Species", height=6).map(plt.scatter, "Sepal.Length", "Sepal.Width").add_legend()
sns.FacetGrid(dataset, hue="Species", height=6).map(plt.scatter, "Petal.Length", "Petal.Width").add_legend()
sns.pairplot(dataset, hue="Species")
plt.hist(dataset["Sepal.Length"], bins=25);
sns.FacetGrid(dataset, hue="Species", height=6).map(sns.histplot, "Sepal.Width", kde=True).add_legend();
sns.boxplot(x='Species',y='Petal.Length',data=dataset)
#PREPROCESSING
from sklearn.preprocessing import StandardScaler
ss = StandardScaler()
x = dataset.drop(['Species'], axis=1)
y = dataset['Species']
scaler = ss.fit(x)
x_stdscaler = scaler.transform(x)
x_stdscaler
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
y = le.fit_transform(y)
#SPLITTING
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=42)
x_train.value_counts
#MODEL SELECTION
from sklearn.svm import SVC
svc = SVC(kernel="linear")
svc.fit(x_train, y_train)
y_pred = svc.predict(x_test)
y_pred
from sklearn.metrics import accuracy_score
accuracy_score(y_test, y_pred)
#PREDICTION
from sklearn.neighbors import KNeighborsClassifier
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(x_train, y_train)
y_pred = knn.predict(x_test)
accuracy_score(y_test, y_pred)
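The standardization performed in the preprocessing step can be reproduced by hand. The sketch below uses plain NumPy on a few made-up measurements (not the iris file) and assumes the population standard deviation (ddof=0), which is the convention documented for StandardScaler.

```python
import numpy as np

# Four illustrative rows with two features each
x = np.array([[5.1, 3.5],
              [4.9, 3.0],
              [6.3, 2.8],
              [7.2, 3.6]])

mean = x.mean(axis=0)          # per-column mean
std = x.std(axis=0)            # per-column population standard deviation
x_scaled = (x - mean) / std    # z-score: zero mean, unit variance per column

print(x_scaled)
print(x_scaled.mean(axis=0))   # approximately 0 for each column
print(x_scaled.std(axis=0))    # approximately 1 for each column
```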
OUTPUT:
DATASET HEADS:
   Unnamed: 0  Sepal.Length  Sepal.Width  Petal.Length  Petal.Width Species
0           1           5.1          3.5           1.4          0.2  setosa
1           2           4.9          3.0           1.4          0.2  setosa
2           3           4.7          3.2           1.3          0.2  setosa
3           4           4.6          3.1           1.5          0.2  setosa
4           5           5.0          3.6           1.4          0.2  setosa
DATASET INFORMATION:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 150 entries, 0 to 149
Data columns (total 6 columns):
# Column Non-Null Count Dtype
0 Unnamed: 0 150 non-null int64
1 Sepal.Length 150 non-null float64
2 Sepal.Width 150 non-null float64
3 Petal.Length 150 non-null float64
4 Petal.Width 150 non-null float64
5 Species 150 non-null object
dtypes: float64(4), int64(1), object(1)
memory usage: 7.2+ KB
DATASET UNIQUE:
array(['setosa', 'versicolor', 'virginica'], dtype=object)
DATASET SPECIES VALUE COUNTS:
setosa 50
versicolor 50
virginica 50
Name: Species, dtype: int64
DATASET DESCRIPTION:
       Unnamed: 0  Sepal.Length  Sepal.Width  Petal.Length  Petal.Width
count  150.000000    150.000000   150.000000    150.000000   150.000000
mean    75.500000      5.843333     3.057333      3.758000     1.199333
std     43.445368      0.828066     0.435866      1.765298     0.762238
min      1.000000      4.300000     2.000000      1.000000     0.100000
25%     38.250000      5.100000     2.800000      1.600000     0.300000
50%     75.500000      5.800000     3.000000      4.350000     1.300000
75%    112.750000      6.400000     3.300000      5.100000     1.800000
max    150.000000      7.900000     4.400000      6.900000     2.500000
DATASET CORRELATION:
              Unnamed: 0  Sepal.Length  Sepal.Width  Petal.Length  Petal.Width
Unnamed: 0      1.000000      0.716676    -0.402301      0.882637     0.900027
Sepal.Length    0.716676      1.000000    -0.117570      0.871754     0.817941
Sepal.Width    -0.402301     -0.117570     1.000000     -0.428440    -0.366126
Petal.Length    0.882637      0.871754    -0.428440      1.000000     0.962865
Petal.Width     0.900027      0.817941    -0.366126      0.962865     1.000000
SCATTER PLOT:
PAIRPLOT:
HISTOGRAM:
BOXPLOT:
PREPROCESSING:
array([[-1.72054204e+00, -9.00681170e-01,  1.01900435e+00,
        -1.34022653e+00, -1.31544430e+00],
       [-1.69744751e+00, -1.14301691e+00, -1.31979479e-01,
        -1.34022653e+00, -1.31544430e+00],
       [-1.67435299e+00, -1.38535265e+00,  3.28414053e-01,
        -1.39706395e+00, -1.31544430e+00],
       [-1.65125846e+00, -1.50652052e+00,  9.82172869e-02,
        -1.28338910e+00, -1.31544430e+00],
       [-1.58197489e+00, -1.50652052e+00,  7.88807586e-01, ...
       ...
       [-2.42492502e-01, -2.94841818e-01, -3.62176246e-01,
         7.62758269e-01,  7.90670654e-01]])
SPLITTING:
<bound method DataFrame.value_counts of Unnamed: 0 Sepal.Length Sepal.Width Petal.Length Petal.Width
81 82 5.5 2.4 3.7 1.0
133 134 6.3 2.8 5.1 1.5
137 138 6.4 3.1 5.5 1.8
75 76 6.6 3.0 4.4 1.4
109 110 7.2 3.6 6.1 2.5
.. ... ... ... ... ...
71 72 6.1 2.8 4.0 1.3
106 107 4.9 2.5 4.5 1.7
14 15 5.8 4.0 1.2 0.2
92 93 5.8 2.6 4.0 1.2
102 103 7.1 3.0 5.9 2.1
[105 rows x 5 columns]>
MODEL SELECTION:
1.0
PREDICTION:
1.0
PROGRAM:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
df=pd.read_csv("diabetes_csv.csv")
df.head()
df.skin.value_counts()
df.mean(axis=0, numeric_only=True)
print(df.loc[:, 'skin'].mean())
df.mean(axis=1, numeric_only=True)[0:5]
df.median(numeric_only=True)
print(df.loc[:, 'skin'].median())
df.median(axis=1, numeric_only=True)[0:5]
df.mode()
df.std(numeric_only=True)
print(df.loc[:, 'skin'].std())
df.std(axis=1, numeric_only=True)[0:5]
df.var(numeric_only=True)
print(df.skew(numeric_only=True))
df.describe()
df.describe(include='all')
print(df.kurtosis(numeric_only=True))
norm_data = pd.DataFrame(np.random.normal(size=100000))
norm_data.plot(kind="density", figsize=(10, 10));
# Plot black line at mean
plt.vlines(norm_data.mean(), ymin=0, ymax=0.4, linewidth=5.0, color="black");
# Plot red line at median
plt.vlines(norm_data.median(), ymin=0, ymax=0.4, linewidth=2.0, color="red");
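The meaning of the skewness value is easier to see on a tiny example. The sketch below uses two made-up series: a symmetric one, whose skewness is zero and whose mean equals its median, and a right-skewed one, where a single large value pulls the mean above the median.

```python
import pandas as pd

symmetric = pd.Series([1, 2, 3, 4, 5])
right_skewed = pd.Series([1, 1, 2, 2, 3, 20])

# Symmetric data: skewness is zero, mean and median coincide
print(symmetric.skew())
print(symmetric.mean(), symmetric.median())

# Right-skewed data: positive skewness, mean greater than median
print(right_skewed.skew())
print(right_skewed.mean(), right_skewed.median())
```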
OUTPUT:
HEAD DATA:
preg plas pres skin insu mass pedi age class
0 6 148 72 35 0 33.6 0.627 50 tested_positive
1 1 85 66 29 0 26.6 0.351 31 tested_negative
2 8 183 64 0 0 23.3 0.672 32 tested_positive
3 1 89 66 23 94 28.1 0.167 21 tested_negative
4 0 137 40 35 168 43.1 2.288 33 tested_positive
FREQUENCY:
0 227
32 31
30 27
27 23
23 22
33 20
28 20
18 20
31 19
19 18
39 18
29 17
40 16
25 16
MEAN:
20.536458333333332
0 43.153375
1 29.868875
2 38.871500
3 40.283375
4 57.298500
dtype: float64
MODE:
preg plas pres skin insu mass pedi age class
0 1.0 99 70.0 0.0 0.0 32.0 0.254 22.0 tested_negative
1 NaN 100 NaN NaN NaN NaN 0.258 NaN NaN
MEDIAN:
23.0
0 34.30
1 27.80
2 15.65
3 25.55
4 37.50
dtype: float64
STANDARD DEVIATION:
15.952217567727677
0 49.397286
1 31.519803
2 62.253392
3 37.591100
4 61.533847
VARIANCE:
preg 11.354056
plas 1022.248314
pres 374.647271
skin 254.473245
insu 13281.180078
mass 62.159984
pedi 0.109779
age 138.303046
dtype: float64
SKEWNESS:
preg 0.901674
plas 0.173754
pres -1.843608
skin 0.109372
insu 2.272251
dtype: float64
KURTOSIS:
preg 0.159220
plas 0.640780
pres 5.180157
skin -0.520072
insu 7.214260
mass 3.290443
pedi 5.594954
age 0.643159
dtype: float64
GRAPH:
PROGRAM:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
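# This file has no header row, so pandas uses the first data row as column labels (e.g. '35', '33.6')
df=pd.read_csv("pima-indians-diabetes.csv")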
df.head()
df.mean(axis = 0)
print(df.loc[:,'35'].mean())
df.mean(axis = 1)[0:5]
df.median()
print(df.loc[:,'33.6'].median())
df.median(axis = 1)[0:5]
df.mode()
df.std()
print(df.loc[:,'35'].std())
df.std(axis = 1)[0:5]
df.var()
print(df.skew())
print(df.kurtosis())
norm_data = pd.DataFrame(np.random.normal(size=100000))
norm_data.plot(kind="density", figsize=(10, 10));
# Plot black line at mean
plt.vlines(norm_data.mean(), ymin=0, ymax=0.4, linewidth=5.0, color="black");
# Plot red line at median
plt.vlines(norm_data.median(), ymin=0, ymax=0.4, linewidth=2.0, color="red");
OUTPUT:
HEAD DATA:
6 148 72 35 0 33.6 0.627 50 1
0 1 85 66 29 0 26.6 0.351 31 0
1 8 183 64 0 0 23.3 0.672 32 1
2 1 89 66 23 94 28.1 0.167 21 0
3 0 137 40 35 168 43.1 2.288 33 1
4 5 116 74 0 0 25.6 0.201 30 0
MEAN:
20.517601043024772
0 26.550111
1 34.663556
2 35.807444
3 51.043111
4 27.866778
dtype: float64
MODE:
6 148 72 35 0 33.6 0.627 50 1
0 1.0 99 70.0 0.0 0.0 32.0 0.254 22.0 0.0
1 NaN 100 NaN NaN NaN NaN 0.258 NaN NaN
MEDIAN:
32.0
0 26.6
1 8.0
2 23.0
3 35.0
4 5.0
dtype: float64
STANDARD DEVIATION:
15.954059060433842
0 31.119744
1 59.585320
2 37.639873
3 60.541569
4 41.114755
dtype: float64
VARIANCE:
6 11.362809
148 1022.622445
72 375.125415
35 254.532001
0 13290.194335
33.6 62.237755
0.627 0.109890
50 138.116452
1 0.227226
dtype: float64
SKEWNESS:
6 0.903976
148 0.176412
72 -1.841911
35 0.112058
0 2.270630
33.6 -0.427950
0.627 1.921190
50 1.135165
1 0.638949
dtype: float64
KURTOSIS:
6 0.161293
148 0.642992
72 5.168578
35 -0.518325
0 7.205266
33.6 3.282498
0.627 5.593374
50 0.660872
1 -1.595913
dtype: float64
GRAPH:
PROGRAM:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
%matplotlib inline
diabetes=pd.read_csv("C:\\Users\\KSK\\Documents\\diabetes.csv")
diabetes.head()
diabetes = datasets.load_diabetes()
print(diabetes.DESCR)
diabetes.feature_names
# Now we will split the data into the independent and dependent variables
X = diabetes.data[:, np.newaxis, 3]
Y = diabetes.target
# We will split the data into training and testing data
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(X, Y, test_size=0.3)
# Linear Regression
from sklearn.linear_model import LinearRegression
reg = LinearRegression()
reg.fit(x_train, y_train)
y_pred = reg.predict(x_test)
Coef = reg.coef_
print(Coef)
from sklearn.metrics import mean_squared_error, r2_score
MSE = mean_squared_error(y_test, y_pred)
R2 = r2_score(y_test, y_pred)
print(R2, MSE)
plt.scatter(y_pred, y_test)
plt.title('Predicted data vs Real Data')
plt.xlabel('y_pred')
plt.ylabel('y_test')
plt.show()
plt.scatter(x_test, y_test)
plt.plot(x_test, y_pred, linewidth=2)
plt.title('Linear Regression')
plt.xlabel('x_test')
plt.ylabel('y_test')
plt.show()
# Logistic Regression
# Note: the target here is a continuous measure, so the classifier treats
# each distinct value as its own class
from sklearn.linear_model import LogisticRegression
from sklearn import metrics
model = LogisticRegression()
model.fit(x_train, y_train)
y_predict = model.predict(x_test)
model_score = model.score(x_test, y_test)
print(model_score)
print(metrics.confusion_matrix(y_test, y_predict))
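The mean squared error and R2 values reported by scikit-learn can be computed by hand. The sketch below fits a straight line with np.polyfit on five made-up points (not the diabetes data) and applies the usual definitions: MSE is the mean of the squared residuals, and R2 is one minus the ratio of the residual sum of squares to the total sum of squares.

```python
import numpy as np

# Illustrative data lying close to the line y = 2x
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Least-squares fit of a degree-1 polynomial: y = slope * x + intercept
slope, intercept = np.polyfit(x, y, deg=1)
y_pred = slope * x + intercept

mse = np.mean((y - y_pred) ** 2)        # mean squared error
ss_res = np.sum((y - y_pred) ** 2)      # residual sum of squares
ss_tot = np.sum((y - y.mean()) ** 2)    # total sum of squares
r2 = 1 - ss_res / ss_tot                # coefficient of determination

print(slope, intercept)
print(mse, r2)
```

An R2 close to 1 means the line explains most of the variation in y; the value of about 0.16 in the output of the program above indicates a weak fit for the single feature used there.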
OUTPUT:
DIABETES DESCRIPTION:
Diabetes dataset
Ten baseline variables, age, sex, body mass index, average blood
pressure, and six blood serum measurements were obtained for each of n = 442
diabetes patients, as well as the response of interest, a
quantitative measure of disease progression one year after baseline.
**Data Set Characteristics:**
:Number of Instances: 442
:Number of Attributes: First 10 columns are numeric predictive values
:Target: Column 11 is a quantitative measure of disease progression one year after
baseline
:Attribute Information:
- age   age in years
- sex
- bmi   body mass index
- bp    average blood pressure
- s1    tc, total serum cholesterol
- s2    ldl, low-density lipoproteins
- s3    hdl, high-density lipoproteins
- s4    tch, total cholesterol / HDL
- s5    ltg, possibly log of serum triglycerides level
- s6    glu, blood sugar level
COEFFICIENT VALUE:
[731.87600042]
R2 VALUE AND MEAN SQUARED ERROR:
0.16465773342986756 & 4765.090270861111
PREDICTED DATA VS REAL DATA:
LINEAR REGRESSION:
MODEL SCORE FOR LOGISTIC REGRESSION:
0.007518796992481203
CONFUSION MATRIX FOR LOGISTIC REGRESSION:
[[130 17]
[ 38 46]]
PROGRAM:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn import datasets
%matplotlib inline
diabetes = pd.read_csv("C:\\Users\\KSK\\Documents\\FDS LAb\\diabetes.csv")
diabetes.head()
import statsmodels.api as sm
from statsmodels.stats.anova import anova_lm
X = diabetes[["Age", "BMI"]]  # the input variables
y = diabetes["Glucose"]  # the output variable, the one you want to predict
X = sm.add_constant(X)  # add an intercept (beta_0) to the model
# Note the difference in argument order: the response y comes first
model2 = sm.OLS(y, X).fit()
predictions = model2.predict(X)  # make the predictions with the model
# Print out the statistics
model2.summary()
OUTPUT:
HEAD DATA:
   Pregnancies  Glucose  BloodPressure  SkinThickness  Insulin   BMI  DiabetesPedigreeFunction  Age  Outcome
0            6      148             72             35        0  33.6                     0.627   50        1
1            1       85             66             29        0  26.6                     0.351   31        0
2            8      183             64              0        0  23.3                     0.672   32        1
3            1       89             66             23       94  28.1                     0.167   21        0
4            0      137             40             35      168  43.1                     2.288   33        1
OLS Regression Results
Dep. Variable: Glucose R-squared: 0.114
Model: OLS Adj. R-squared: 0.112
Method: Least Squares F-statistic: 49.33
Date: Tue, 08 Nov 2022 Prob (F-statistic): 7.05e-21
Time: 22:28:35 Log-Likelihood: -3703.7
No. Observations: 768 AIC: 7413.
Df Residuals: 765 BIC: 7427.
Df Model: 2
Covariance Type: nonrobust
coef std err t P>|t| [0.025 0.975]
const 70.2952 5.402 13.013 0.000 59.691 80.899
Age 0.6955 0.093 7.514 0.000 0.514 0.877
BMI 0.8589 0.138 6.220 0.000 0.588 1.130
Omnibus: 18.855 Durbin-Watson: 1.836
Prob(Omnibus): 0.000 Jarque-Bera (JB): 38.868
Skew: -0.007 Prob(JB): 3.63e-09
Kurtosis: 4.102 Cond. No. 235.
PROGRAM:
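import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
from scipy.stats import norm
%matplotlib inline
df = pd.read_csv("C:\\Users\\KSK\\Documents\\train.csv")
df.head()
mean = df.loc[:, 'Fare'].mean()
sd = df.loc[:, 'Fare'].std()
# x_axis was not defined in the original listing; the range below
# (three standard deviations either side of the mean) is an assumed choice
x_axis = np.linspace(mean - 3 * sd, mean + 3 * sd, 100)
plt.plot(x_axis, norm.pdf(x_axis, mean, sd))
plt.show()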
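The curve drawn above follows the normal density formula f(x) = exp(-0.5 * ((x - mean) / sd)^2) / (sd * sqrt(2 * pi)). The sketch below evaluates that formula directly with NumPy for a standard normal (mean 0, sd 1, chosen for illustration); the function name normal_pdf is hypothetical.

```python
import numpy as np


def normal_pdf(x, mean, sd):
    """Normal density evaluated at x for the given mean and standard deviation."""
    return np.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))


x_axis = np.linspace(-4, 4, 801)
density = normal_pdf(x_axis, mean=0.0, sd=1.0)

peak = density.max()                             # highest point, at x = mean
area = np.sum(density) * (x_axis[1] - x_axis[0])  # rough numerical integral

print(peak)  # close to 1 / sqrt(2 * pi), about 0.3989
print(area)  # close to 1
```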
OUTPUT:
NORMAL CURVE:
PROGRAM:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
%matplotlib inline
df = pd.read_csv("C:\\Users\\KSK\\Documents\\train.csv")
df.head()
# sns.distplot is deprecated in recent seaborn releases; histplot with kde=True draws a comparable plot
sns.histplot(df["Fare"], kde=True)
sns.histplot(df["Age"], kde=True)
plt.contour(df[["Fare", "Parch"]])
OUTPUT:
DENSITY PLOT:
CONTOUR PLOT:
PROGRAM:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline
df = pd.read_csv("C:\\Users\\KSK\\Documents\\train.csv")
df.head()
plt.figure(figsize=(8, 8))
sns.scatterplot(x="Age", y="Fare", hue="Sex", data=df)
plt.show()
df.corr(numeric_only=True)
# plotting correlation heatmap
dataplot = sns.heatmap(df.corr(numeric_only=True), cmap="YlGnBu", annot=True)
# displaying heatmap
plt.show()
OUTPUT:
SCATTER PLOT:
HEAT MAP:
PROGRAM:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline
df=pd.read_csv("C:\\Users\\KSK\\Documents\\train.csv")
df.head()
plt.hist(df["Fare"])
OUTPUT:
HISTOGRAM:
(array([732., 106., 31., 2., 11., 6., 0., 0., 0., 3.]),
array([ 0. , 51.23292, 102.46584, 153.69876, 204.93168, 256.1646 ,
307.39752, 358.63044, 409.86336, 461.09628, 512.3292 ]),
<BarContainer object of 10 artists>)
PROGRAM:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from mpl_toolkits import mplot3d
%matplotlib inline
df = pd.read_csv("C:\\Users\\KSK\\Documents\\train.csv")
df.head()
fig = plt.figure(figsize=(8, 8))
ax = plt.axes(projection='3d')
# Three-dimensional line
zline = np.linspace(0, 15, 1000)
xline = np.sin(zline)
yline = np.cos(zline)
ax.plot3D(xline, yline, zline, 'gray')
# Three-dimensional scatter plot
zdata = df["Fare"]
xdata = df["Age"]
ydata = df["Parch"]
ax.scatter3D(xdata, ydata, zdata, c=zdata, cmap='Greens');
OUTPUT:
THREE DIMENSIONAL LINES:
THREE DIMENSIONAL SCATTERPLOT:
PROGRAM:
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.basemap import Basemap

# draw_map is called below but was not defined in the original listing; this is an
# assumed minimal helper that draws a shaded-relief background with grid lines
def draw_map(m, scale=0.2):
    m.shadedrelief(scale=scale)
    m.drawparallels(np.linspace(-90, 90, 13))
    m.drawmeridians(np.linspace(-180, 180, 13))

plt.figure(figsize=(8, 8))
m = Basemap(projection='ortho', resolution=None, lat_0=50, lon_0=-100)
m.bluemarble(scale=0.5);
fig = plt.figure(figsize=(8, 8))
m = Basemap(projection='lcc', resolution=None, width=8E6, height=8E6,
            lat_0=45, lon_0=-100,)
m.etopo(scale=0.5, alpha=0.5)
x, y = m(-122.3, 47.6)
plt.plot(x, y, 'ok', markersize=5)
plt.text(x, y, ' Seattle', fontsize=12);
fig = plt.figure(figsize=(8, 6), edgecolor='w')
m = Basemap(projection='cyl', resolution=None, llcrnrlat=-90, urcrnrlat=90,
            llcrnrlon=-180, urcrnrlon=180, )
draw_map(m)
fig = plt.figure(figsize=(8, 6), edgecolor='w')
m = Basemap(projection='moll', resolution=None, lat_0=0, lon_0=0)
draw_map(m)
fig = plt.figure(figsize=(8, 8))
m = Basemap(projection='ortho', resolution=None, lat_0=50, lon_0=0)
draw_map(m);
fig = plt.figure(figsize=(8, 8))
m = Basemap(projection='lcc', resolution=None, lon_0=0, lat_0=50, lat_1=45,
            lat_2=55, width=1.6E7, height=1.2E7)
draw_map(m)
OUTPUT:
ORTHO PROJECTION:
MAPPING LONGITUDE AND LATITUDE:
CYLINDRICAL PROJECTIONS:
PSEUDO-CYLINDRICAL PROJECTIONS:
PERSPECTIVE PROJECTION:
CONIC PROJECTION: