FINAL FDS MANUAL Print

The document provides an overview of several essential Python packages, including NumPy for array processing, SciPy for scientific computations, Pandas for data manipulation, Statsmodels for statistical modeling, and Jupyter for interactive computing. It also outlines the installation process for Python and these packages, along with example code snippets demonstrating basic functionalities such as array creation, manipulation, and data analysis. The document serves as a comprehensive guide for users looking to leverage these tools for data science and statistical analysis.

Uploaded by

durga


FEATURES OF PYTHON PACKAGES:

1. NUMPY
One of the most fundamental packages in Python, NumPy is a general-purpose array-processing
package. It provides high-performance multidimensional array objects and tools
to work with those arrays. NumPy is an efficient container for generic multidimensional data.
NumPy’s main object is the homogeneous multidimensional array: a table of elements
(usually numbers), all of the same data type, indexed by a tuple of non-negative integers. In NumPy,
dimensions are called axes, and the number of axes is called the rank. NumPy’s array class is
called ndarray, also known by the alias array.
• Basic array operations: add, multiply, slice, flatten, reshape, index arrays
• Advanced array operations: stack arrays, split into sections, broadcast arrays
• Work with date-time values or linear algebra
• Basic slicing and advanced indexing
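The terms used above (axes, rank, broadcasting, stacking, splitting) can be illustrated with a minimal sketch. The array values and variable names are made up for illustration and are not taken from the manual.

```python
import numpy as np

# A 2-D array: 2 rows along axis 0 and 3 columns along axis 1.
grid = np.array([[1, 2, 3], [4, 5, 6]])
print(grid.ndim)   # number of axes (the rank)
print(grid.shape)  # number of elements along each axis
print(grid.dtype)  # the single data type shared by all elements

# Broadcasting: the 1-D row is added to every row of grid.
row = np.array([10, 20, 30])
shifted = grid + row
print(shifted)

# Stacking adds the row below grid; splitting at row index 2 separates them again.
stacked = np.vstack([grid, row])
top, bottom = np.split(stacked, [2])
print(top)
print(bottom)
```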

2. SCIPY
The SciPy library is one of the core packages that make up the SciPy stack. There is a
difference between the SciPy stack and SciPy, the library. The SciPy library builds on the NumPy array
object and is part of the stack, which also includes tools such as Matplotlib, Pandas, and SymPy.
The SciPy library contains modules for efficient mathematical routines such as linear
algebra, interpolation, optimization, integration, and statistics. Various problems
of scientific computation arise while working in data science.
• SciPy provides a variety of sub-packages to solve these problems efficiently.
• The SciPy library is fast and easy to use.
• It operates on NumPy arrays and offers optimized implementations of many numerical routines.
• After the GNU Scientific Library, SciPy is one of the most used scientific libraries.
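As a hedged illustration of the sub-packages named above, the sketch below uses three of them (integrate, linalg, optimize) on toy problems. The functions and numbers are chosen only for illustration.

```python
import numpy as np
from scipy import integrate, linalg, optimize

# Integration: area under x**2 between 0 and 1 (the exact value is 1/3).
area, abs_error = integrate.quad(lambda x: x ** 2, 0.0, 1.0)

# Linear algebra: solve the system A @ x = b.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([9.0, 8.0])
solution = linalg.solve(A, b)

# Optimization: find the minimum of (x - 2)**2.
result = optimize.minimize_scalar(lambda x: (x - 2.0) ** 2)

print(area)
print(solution)
print(result.x)
```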

3. PANDAS
Pandas is an open-source Python package that provides high-performance, easy-to-use
data structures and data analysis tools for labeled data in the Python programming
language. Pandas stands for Python Data Analysis Library (the name also derives from the term
"panel data"). Pandas is well suited to data wrangling, also called munging. It is designed for
quick and easy data manipulation, reading, aggregation, and visualization. Pandas takes data
from a CSV or TSV file or a SQL database and creates a Python object with rows and columns
called a data frame. The data frame is very similar to a table in spreadsheet or statistical
software such as Excel or SPSS.

• Indexing, manipulating, renaming, sorting, and merging data frames
• Updating, adding, and deleting columns of a data frame
• Imputing missing values and handling missing data (NaNs)
• Plotting data with a histogram or box plot
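The data frame operations listed above can be sketched on a tiny, made-up table. The column names and values are hypothetical.

```python
import numpy as np
import pandas as pd

# A small data frame with one missing value.
frame = pd.DataFrame({
    "name": ["a", "b", "c", "d"],
    "score": [10.0, np.nan, 30.0, 20.0],
})

# Rename a column.
frame = frame.rename(columns={"score": "marks"})

# Impute the missing value with the column mean (mean of 10, 30, 20).
frame["marks"] = frame["marks"].fillna(frame["marks"].mean())

# Add a column, sort by marks, then delete the added column again.
frame["passed"] = frame["marks"] >= 20
frame = frame.sort_values("marks", ascending=False).reset_index(drop=True)
frame = frame.drop(columns=["passed"])
print(frame)
```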

4. STATSMODELS
Statsmodels is built for rigorous statistics. The core of the Statsmodels library is
“production ready”. Traditional models such as robust linear models and generalized linear models
(GLM) have been around for a long time and have been validated against R and
Stata. It also contains a time series analysis section, which includes vector
autoregression (VAR), AR, and ARMA models.
• Linear/multiple regression – Linear regression is a statistical method for modeling
the linear relationship between a dependent variable and one or more explanatory
variables.
• Logistic regression – The logistic model is used in statistics to model the
likelihood of a specific event or class occurring, such as win/lose or pass/fail.
• Time series analysis – The analysis of time series data to extract
meaningful statistics and other characteristics of the data.
• Statistical tests – The many statistical tests that can be run with the
Statsmodels library.
5. JUPYTER
Project Jupyter is a suite of software products used in interactive computing. Packages
under the Jupyter project include:
Jupyter Notebook − A web-based interface to programming environments for Python,
Julia, R and many others
QtConsole − A Qt-based terminal for Jupyter kernels, similar to IPython
nbviewer − A facility for sharing Jupyter notebooks
JupyterLab − A modern web-based integrated interface for all products
IPython, the default Python kernel of Project Jupyter, has the following features:
• Offers a powerful interactive Python shell.
• Acts as the main kernel for Jupyter Notebook and other front-end tools of Project
Jupyter.
• Supports object introspection, the ability to check
the properties of an object at runtime.
• Syntax highlighting.
• Stores the history of interactions.
• Tab completion of keywords, variables and function names.
• A magic command system for controlling the Python environment and
performing OS tasks.
PYTHON INSTALLATION
• Open the official Python web site (https://www.python.org/).
• Go to Downloads ==> Windows and select a recent release (requires Windows 10 or above).
• Run the installer "python-3.10.6-amd64.exe".

PACKAGE INSTALLATION
Open a command prompt and enter "python --version" to check whether Python was installed
properly. If the installation succeeded, the command prints the Python version.

Enter "pip --version" to check whether the Python package manager was installed properly.
If the installation succeeded, the command prints the version of the package manager.

• Enter the following command to install the NumPy library: pip install numpy
• Enter the following command to install the SciPy library: pip install scipy
• Enter the following command to install the Statsmodels library: pip install statsmodels
• Enter the following command to install the Pandas library: pip install pandas
• Enter the following command to install Jupyter: pip install jupyter
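After running the install commands, one possible way to confirm the result from Python is sketched below. The helper name installed_versions is hypothetical; it relies only on the standard-library importlib.metadata module (Python 3.8 or later) and reports None for any package that is not installed.

```python
from importlib import metadata

# Distribution names as used with "pip install".
PACKAGES = ["numpy", "scipy", "pandas", "statsmodels", "jupyter"]


def installed_versions(names):
    """Return a dict mapping each package name to its version, or None if missing."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


versions = installed_versions(PACKAGES)
for name, version in versions.items():
    print(f"{name}: {version or 'not installed'}")
```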
OUTPUT:
PROGRAM:

1. Creating Arrays:

• 0-D Arrays
Each value in an array is a 0-D array.

import numpy as np
arr = np.array(42)
print(arr)
• 1-D Arrays
An array that has 0-D arrays as its elements is called a 1-D array.

import numpy as np
arr = np.array([1, 2, 3, 4, 5])
print(arr)
• 2-D Arrays
An array that has 1-D arrays as its elements is called a 2-D array.

import numpy as np
arr = np.array([[1, 2, 3], [4, 5, 6]])
print(arr)
• 3-D Arrays
An array that has 2-D arrays (matrices) as its elements is called a 3-D array.

import numpy as np
arr = np.array([[[1, 2, 3], [4, 5, 6]], [[1, 2, 3], [4, 5, 6]]])
print(arr)
2. Array Dimensions:
import numpy as np
a = np.array(42)
b = np.array([1, 2, 3, 4, 5])
c = np.array([[1, 2, 3], [4, 5, 6]])
d = np.array([[[1, 2, 3], [4, 5, 6]], [[1, 2, 3], [4, 5, 6]]])
print(a.ndim)
print(b.ndim)
print(c.ndim)
print(d.ndim)
3. Access 2-D Arrays:
To access elements from 2-D arrays we can use comma separated integers
representing the dimension and the index of the element.

import numpy as np
arr = np.array([[1,2,3,4,5], [6,7,8,9,10]])
print('2nd element on 1st row: ', arr[0, 1])

4. Access 3-D Arrays:

To access elements from 3-D arrays we can use comma separated integers
representing the dimensions and the index of the element.

import numpy as np
arr = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])
print(arr[0, 1, 2])

5. Array Slicing:
Slicing in python means taking elements from one given index to another given index.
We pass slice instead of index like this: [start:end]. We can also define the step, like
this: [start:end:step].

import numpy as np
arr = np.array([1, 2, 3, 4, 5, 6, 7])
print(arr[1:5:2])
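A few further slicing forms, sketched on the same kind of data: omitted bounds, negative indices, a negative step, and a slice per dimension of a 2-D array.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6, 7])

last_three = arr[-3:]     # negative start counts from the end
every_other = arr[::2]    # omitted start and end cover the whole array
reversed_arr = arr[::-1]  # a negative step walks backwards

# In a 2-D array, each dimension gets its own slice.
grid = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])
block = grid[0:2, 1:4]    # both rows, columns 1 to 3

print(last_three)
print(every_other)
print(reversed_arr)
print(block)
```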

6. Data Types:
NumPy has some extra data types and refers to data types with one character, such as i for
integers, u for unsigned integers, and S for (byte) strings.
import numpy as np
arr = np.array([1, 2, 3, 4], dtype='S')
print(arr)
print(arr.dtype)

7. Copy & View:

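import numpy as np
arr = np.array([1, 2, 3, 4, 5])
x = arr.copy()  # the copy owns its own data
arr[0] = 42
print(arr)
print(x)  # not affected by the change to arr

8. Make a view:
import numpy as np
arr = np.array([1, 2, 3, 4, 5])
x = arr.view()  # the view shares its data with arr
arr[0] = 42
print(arr)
print(x)  # shows the change made to arr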
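Whether an array owns its data can be checked through its base attribute. The sketch below is a small illustration of that check; the variable names are chosen here and are not from the manual.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])
copied = arr.copy()
viewed = arr.view()

# A copy owns its data, so base is None; a view points back to the original.
print(copied.base is None)
print(viewed.base is arr)

# Writing through the view changes the original but not the copy.
viewed[1] = 99
print(arr)
print(copied)
```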

9. Array Shape & Reshaping:

Array Shape: NumPy arrays have an attribute called shape that returns a tuple whose
entries give the number of elements along each dimension.
import numpy as np
arr = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
print(arr.shape)

10. Array Reshaping:

Reshaping means changing the shape of an array. By reshaping we can add or remove
dimensions or change number of elements in each dimension.
import numpy as np
arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])
newarr = arr.reshape(2, 3, 2)
print(newarr)
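Reshaping only works when the total number of elements stays the same. The sketch below shows this, together with the use of -1 to let NumPy work out one dimension.

```python
import numpy as np

arr = np.arange(1, 13)  # 12 elements: 1 to 12

# -1 asks NumPy to infer the dimension: 12 elements / 2 rows = 6 columns.
auto = arr.reshape(2, -1)
print(auto.shape)

# reshape(-1) flattens back to one dimension.
flat = auto.reshape(-1)
print(flat.shape)

# 12 elements cannot fill a 5 x 3 array, so NumPy raises ValueError.
try:
    arr.reshape(5, 3)
    reshape_failed = False
except ValueError as error:
    reshape_failed = True
    print("Cannot reshape:", error)
```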

11. Array Iterating:

Iterating means going through elements one by one. As we deal with multi-dimensional
arrays in NumPy, we can do this using the basic for loop of Python.
import numpy as np
arr = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])
for x in arr:
    print(x)

12. Joining Array:

Joining means putting contents of two or more arrays in a single array.
import numpy as np
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])
arr = np.concatenate((arr1, arr2))
print(arr)

13. Splitting Array:

Splitting is the reverse operation of joining. Joining merges multiple arrays into one, and
splitting breaks one array into multiple arrays.
import numpy as np
arr = np.array([1, 2, 3, 4, 5, 6])
newarr = np.array_split(arr, 3)
print(newarr)

14. Searching Arrays:

We can search an array for a certain value and return the indexes that match. To
search an array, use the np.where() function.
import numpy as np
arr = np.array([1, 2, 3, 4, 5, 4, 4])
x = np.where(arr == 4)
print(x)

15. Sorting:
Sorting means putting elements in an ordered sequence. An ordered sequence is any
sequence that has an order corresponding to its elements, such as numeric or alphabetical,
ascending or descending. NumPy provides the function np.sort(), which
returns a sorted copy of the specified array.
import numpy as np
arr = np.array([3, 2, 0, 1])
print(np.sort(arr))
16. Filtering Arrays:
Getting some elements out of an existing array and creating a new array out of them is
called filtering. In NumPy, you filter an array using a boolean index list.
import numpy as np
arr = np.array([41, 42, 43, 44])
x = [True, False, True, False]
newarr = arr[x]
print(newarr)
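In practice the boolean list is rarely written by hand. A common alternative, sketched here on the same array, is to build the mask from a condition.

```python
import numpy as np

arr = np.array([41, 42, 43, 44])

# The comparison produces one True/False entry per element.
mask = arr > 42
print(mask)

# Indexing with the mask keeps only the elements where it is True.
newarr = arr[mask]
print(newarr)

# The condition can also be written directly inside the brackets.
even = arr[arr % 2 == 0]
print(even)
```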

OUTPUT:
PROGRAM:

import numpy as np

a = np.array([[1,2,3], [4,5,6], [7,8,9]])

print("The first matrix value is ::>",a)

b = np.array([[2,3,4],[5,6,7], [8,9,10]])

print("The second matrix value is ::>",b)

mul= np.multiply(a,b)

add= np.add(a,b)

sub=np.subtract(a,b)

div=np.divide(a,b)

print("Addition Matrix Resultant is ::>",add)

print("Subtraction Matrix Resultant is ::>",sub)

print("Division Matrix Resultant is ::>",div)

print("Multiplication Matrix Resultant is ::>",mul)
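Note that np.multiply multiplies element by element. The matrix product in the linear-algebra sense is a different operation, available through the @ operator or np.matmul. The sketch below contrasts the two on small matrices chosen for illustration.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

# Element-wise product: each entry is a[i, j] * b[i, j].
elementwise = np.multiply(a, b)

# Matrix product: rows of a combined with columns of b.
matrix_product = a @ b

print(elementwise)
print(matrix_product)
```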

OUTPUT:
PROGRAM:

import pandas as pd

df = pd.DataFrame({
    'Name': ['Alberto Franco', 'Gino Mcneill', 'Ryan Parkes', 'Eesha Hinton', 'Syed Wharton'],
    'Date_Of_Birth': ['17/05/2002', '16/02/1999', '25/09/1998', '11/05/2002', '15/09/1997'],
    'Age': [18.5, 21.2, 22.5, 22, 23]})
print("Original DataFrame:")
print(df)

df1 = df.copy(deep=True)
df = df.drop([0, 1])
df1 = df1.drop([2])
print("\nNew DataFrames:")
print(df)
print(df1)

# Without an "on" argument, merge joins on all columns the two frames share.
print('\n"one_to_one": check if merge keys are unique in both left and right datasets:')
df_one_to_one = pd.merge(df, df1, validate="one_to_one")
print(df_one_to_one)

print('\n"one_to_many" or "1:m": check if merge keys are unique in left dataset:')
df_one_to_many = pd.merge(df, df1, validate="one_to_many")
print(df_one_to_many)

print('\n"many_to_one" or "m:1": check if merge keys are unique in right dataset:')
df_many_to_one = pd.merge(df, df1, validate="many_to_one")
print(df_many_to_one)
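In the program above every validation passes because all keys are unique. The sketch below, on made-up frames, shows what happens when a validation fails: pandas raises pandas.errors.MergeError.

```python
import pandas as pd

# The left frame repeats key 1; the right frame has unique keys.
left = pd.DataFrame({"key": [1, 1, 2], "left_value": ["a", "b", "c"]})
right = pd.DataFrame({"key": [1, 2], "right_value": ["x", "y"]})

# many_to_one only requires unique keys on the right, so this succeeds.
many_to_one = pd.merge(left, right, on="key", validate="many_to_one")
print(many_to_one)

# one_to_one requires unique keys on both sides, so this fails.
try:
    pd.merge(left, right, on="key", validate="one_to_one")
    one_to_one_ok = True
except pd.errors.MergeError as error:
    one_to_one_ok = False
    print("Validation failed:", error)
```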
PROGRAM:

# DATA COLLECTION
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# The same data can be read from a text, Excel or CSV file.
dataset = pd.read_csv("iris.txt")
dataset.head()
dataset = pd.read_excel("iris.xlsx")
dataset.head()
dataset = pd.read_csv("iris.csv")
dataset.head()
dataset.info()
dataset.Species.unique()

# EDA
dataset.describe()
dataset.corr(numeric_only=True)  # the text column Species is skipped
dataset.Species.value_counts()
# Recent seaborn versions name the FacetGrid size argument "height".
sns.FacetGrid(dataset, hue="Species", height=6).map(plt.scatter, "Sepal.Length", "Sepal.Width").add_legend()
sns.FacetGrid(dataset, hue="Species", height=6).map(plt.scatter, "Petal.Length", "Petal.Width").add_legend()
sns.pairplot(dataset, hue="Species")
plt.hist(dataset["Sepal.Length"], bins=25);
# histplot is the axes-level counterpart of displot and works inside FacetGrid.map.
sns.FacetGrid(dataset, hue="Species", height=6).map(sns.histplot, "Sepal.Width").add_legend();
sns.boxplot(x='Species', y='Petal.Length', data=dataset)

# PREPROCESSING
from sklearn.preprocessing import StandardScaler
ss = StandardScaler()
x = dataset.drop(['Species'], axis=1)
y = dataset['Species']
scaler = ss.fit(x)
x_stdscaler = scaler.transform(x)
x_stdscaler
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
y = le.fit_transform(y)

# SPLITTING
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=42)
x_train.value_counts  # without parentheses this displays the bound method and the frame

# MODEL SELECTION
from sklearn.svm import SVC
svc = SVC(kernel="linear")
svc.fit(x_train, y_train)
y_pred = svc.predict(x_test)
y_pred
from sklearn.metrics import accuracy_score
accuracy_score(y_test, y_pred)

# PREDICTION
from sklearn.neighbors import KNeighborsClassifier
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(x_train, y_train)
y_pred = knn.predict(x_test)
accuracy_score(y_test, y_pred)
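The two preprocessing steps used above can be reproduced with plain NumPy to show what they compute. This is a sketch on made-up values, not the scikit-learn implementation: standardization subtracts the column mean and divides by the column standard deviation, and label encoding replaces each class name with its position in the sorted list of classes.

```python
import numpy as np

# Standardization: (value - column mean) / column standard deviation.
values = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
scaled = (values - values.mean(axis=0)) / values.std(axis=0)
print(scaled)

# Label encoding: each label becomes its index among the sorted unique labels.
labels = np.array(["setosa", "virginica", "versicolor", "setosa"])
classes, encoded = np.unique(labels, return_inverse=True)
print(classes)
print(encoded)
```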

OUTPUT:

DATASET HEADS:

   Unnamed: 0  Sepal.Length  Sepal.Width  Petal.Length  Petal.Width  Species
0           1           5.1          3.5           1.4          0.2   setosa
1           2           4.9          3.0           1.4          0.2   setosa
2           3           4.7          3.2           1.3          0.2   setosa
3           4           4.6          3.1           1.5          0.2   setosa
4           5           5.0          3.6           1.4          0.2   setosa

DATASET INFORMATION:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 150 entries, 0 to 149
Data columns (total 6 columns):
 #   Column        Non-Null Count  Dtype
 0   Unnamed: 0    150 non-null    int64
 1   Sepal.Length  150 non-null    float64
 2   Sepal.Width   150 non-null    float64
 3   Petal.Length  150 non-null    float64
 4   Petal.Width   150 non-null    float64
 5   Species       150 non-null    object
dtypes: float64(4), int64(1), object(1)
memory usage: 7.2+ KB

DATASET UNIQUE:

array(['setosa', 'versicolor', 'virginica'], dtype=object)

DATASET SPECIES VALUE COUNTS:

setosa 50

versicolor 50

virginica 50

Name: Species, dtype: int64

DATASET DESCRIPTION:

       Unnamed: 0  Sepal.Length  Sepal.Width  Petal.Length  Petal.Width
count  150.000000    150.000000   150.000000    150.000000   150.000000
mean    75.500000      5.843333     3.057333      3.758000     1.199333
std     43.445368      0.828066     0.435866      1.765298     0.762238
min      1.000000      4.300000     2.000000      1.000000     0.100000
25%     38.250000      5.100000     2.800000      1.600000     0.300000
50%     75.500000      5.800000     3.000000      4.350000     1.300000
75%    112.750000      6.400000     3.300000      5.100000     1.800000
max    150.000000      7.900000     4.400000      6.900000     2.500000

DATASET CORRELATION:

              Unnamed: 0  Sepal.Length  Sepal.Width  Petal.Length  Petal.Width
Unnamed: 0      1.000000      0.716676    -0.402301      0.882637     0.900027
Sepal.Length    0.716676      1.000000    -0.117570      0.871754     0.817941
Sepal.Width    -0.402301     -0.117570     1.000000     -0.428440    -0.366126
Petal.Length    0.882637      0.871754    -0.428440      1.000000     0.962865
Petal.Width     0.900027      0.817941    -0.366126      0.962865     1.000000

SCATTER PLOT:
PAIRPLOT:

HISTOGRAM:
BOXPLOT:

PREPROCESSING:

array([[-1.72054204e+00, -9.00681170e-01,  1.01900435e+00, -1.34022653e+00, -1.31544430e+00],
       [-1.69744751e+00, -1.14301691e+00, -1.31979479e-01, -1.34022653e+00, -1.31544430e+00],
       [-1.67435299e+00, -1.38535265e+00,  3.28414053e-01, -1.39706395e+00, -1.31544430e+00],
       [-1.65125846e+00, -1.50652052e+00,  9.82172869e-02, -1.28338910e+00, -1.31544430e+00],
       ...,
       [-1.58197489e+00, -1.50652052e+00,  7.88807586e-01, ...
       ...,
       [-2.42492502e-01, -2.94841818e-01, -3.62176246e-01,  7.62758269e-01,  7.90670654e-01]])

SPLITTING:

<bound method DataFrame.value_counts of
     Unnamed: 0  Sepal.Length  Sepal.Width  Petal.Length  Petal.Width
81           82           5.5          2.4           3.7          1.0
133         134           6.3          2.8           5.1          1.5
137         138           6.4          3.1           5.5          1.8
75           76           6.6          3.0           4.4          1.4
109         110           7.2          3.6           6.1          2.5
..          ...           ...          ...           ...          ...
71           72           6.1          2.8           4.0          1.3
106         107           4.9          2.5           4.5          1.7
14           15           5.8          4.0           1.2          0.2
92           93           5.8          2.6           4.0          1.2
102         103           7.1          3.0           5.9          2.1

[105 rows x 5 columns]>

MODEL SELECTION:

1.0

PREDICTION:

1.0

PROGRAM:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

df = pd.read_csv("diabetes_csv.csv")
df.head()
df.skin.value_counts()

# numeric_only=True skips the text column "class"; recent pandas versions require it.
df.mean(axis=0, numeric_only=True)
print(df.loc[:, 'skin'].mean())
df.mean(axis=1, numeric_only=True)[0:5]

df.median(numeric_only=True)
print(df.loc[:, 'skin'].median())
df.median(axis=1, numeric_only=True)[0:5]

df.mode()

df.std(numeric_only=True)
print(df.loc[:, 'skin'].std())
df.std(axis=1, numeric_only=True)[0:5]

df.var(numeric_only=True)
print(df.skew(numeric_only=True))
df.describe()
df.describe(include='all')
print(df.kurtosis(numeric_only=True))

norm_data = pd.DataFrame(np.random.normal(size=100000))
norm_data.plot(kind="density", figsize=(10, 10));
# Plot black line at mean
plt.vlines(norm_data.mean(), ymin=0, ymax=0.4, linewidth=5.0, color="black");
# Plot red line at median
plt.vlines(norm_data.median(), ymin=0, ymax=0.4, linewidth=2.0, color="red");
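The statistics computed by this program are easier to check on a sample small enough to work out by hand. The sketch below uses a made-up series of eight values; note that pandas computes the sample variance and standard deviation (ddof=1) by default.

```python
import pandas as pd

s = pd.Series([2, 4, 4, 4, 5, 5, 7, 9])

mean = s.mean()            # 40 / 8
median = s.median()        # average of the two middle values, 4 and 5
mode = s.mode().tolist()   # the most frequent value(s)
pop_std = s.std(ddof=0)    # population standard deviation: sqrt(32 / 8)
sample_var = s.var()       # sample variance (ddof=1): 32 / 7
skewness = s.skew()        # positive: the long tail is on the right

print(mean, median, mode, pop_std, sample_var, skewness)
```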

OUTPUT:

HEAD DATA:

   preg  plas  pres  skin  insu  mass   pedi  age            class
0     6   148    72    35     0  33.6  0.627   50  tested_positive
1     1    85    66    29     0  26.6  0.351   31  tested_negative
2     8   183    64     0     0  23.3  0.672   32  tested_positive
3     1    89    66    23    94  28.1  0.167   21  tested_negative
4     0   137    40    35   168  43.1  2.288   33  tested_positive

FREQUENCY:

0 227
32 31
30 27
27 23
23 22
33 20
28 20
18 20
31 19
19 18
39 18
29 17
40 16
25 16

MEAN:

20.536458333333332

0 43.153375

1 29.868875

2 38.871500

3 40.283375

4 57.298500

dtype: float64

MODE:
preg plas pres skin insu mass pedi age class

0 1.0 99 70.0 0.0 0.0 32.0 0.254 22.0 tested_negative

1 NaN 100 NaN NaN NaN NaN 0.258 NaN NaN

MEDIAN:

23.0

0 34.30

1 27.80

2 15.65

3 25.55

4 37.50

dtype: float64
STANDARD DEVIATION:

15.952217567727677

0 49.397286

1 31.519803

2 62.253392

3 37.591100

4 61.533847

VARIANCE:

preg 11.354056

plas 1022.248314

pres 374.647271

skin 254.473245

insu 13281.180078

mass 62.159984

pedi 0.109779

age 138.303046

dtype: float64

SKEWNESS:

preg 0.901674

plas 0.173754

pres -1.843608

skin 0.109372

insu 2.272251

dtype: float64

KURTOSIS:
preg 0.159220

plas 0.640780

pres 5.180157

skin -0.520072

insu 7.214260

mass 3.290443

pedi 5.594954

age 0.643159

dtype: float64

GRAPH:
PROGRAM:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# The CSV file has no header row, so pandas uses the first data row
# (6, 148, 72, 35, 0, 33.6, 0.627, 50, 1) as the column names.
df = pd.read_csv("pima-indians-diabetes.csv")
df.head()

df.mean(axis=0)
print(df.loc[:, '35'].mean())
df.mean(axis=1)[0:5]

df.median()
print(df.loc[:, '33.6'].median())
df.median(axis=1)[0:5]

df.mode()

df.std()
print(df.loc[:, '35'].std())
df.std(axis=1)[0:5]

df.var()
print(df.skew())
print(df.kurtosis())

norm_data = pd.DataFrame(np.random.normal(size=100000))
norm_data.plot(kind="density", figsize=(10, 10));
# Plot black line at mean
plt.vlines(norm_data.mean(), ymin=0, ymax=0.4, linewidth=5.0, color="black");
# Plot red line at median
plt.vlines(norm_data.median(), ymin=0, ymax=0.4, linewidth=2.0, color="red");
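The column labels in this program ('35', '33.6') are data values, which suggests that the CSV file has no header row and that its first record was consumed as the header. One way to avoid this, sketched below, is to pass header=None together with explicit names. The three records are the first rows of this dataset as shown in the manual, held in memory instead of a file; the column names are an assumption borrowed from the previous program's output.

```python
import io
import pandas as pd

# Stand-in for the real file: the first three records of the dataset.
raw = io.StringIO(
    "6,148,72,35,0,33.6,0.627,50,1\n"
    "1,85,66,29,0,26.6,0.351,31,0\n"
    "8,183,64,0,0,23.3,0.672,32,1\n"
)

# Assumed column names (taken from the previous program's output).
columns = ["preg", "plas", "pres", "skin", "insu", "mass", "pedi", "age", "class"]

# header=None keeps the first record as data; names supplies the labels.
df = pd.read_csv(raw, header=None, names=columns)
print(df.shape)
print(df["skin"].mean())
```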

OUTPUT:

HEAD DATA:

   6  148  72  35    0  33.6  0.627  50  1
0  1   85  66  29    0  26.6  0.351  31  0
1  8  183  64   0    0  23.3  0.672  32  1
2  1   89  66  23   94  28.1  0.167  21  0
3  0  137  40  35  168  43.1  2.288  33  1
4  5  116  74   0    0  25.6  0.201  30  0

MEAN:

20.517601043024772

0 26.550111

1 34.663556

2 35.807444

3 51.043111

4 27.866778

dtype: float64
MODE:

6 148 72 35 0 33.6 0.627 50 1

0 1.0 99 70.0 0.0 0.0 32.0 0.254 22.0 0.0

1 NaN 100 NaN NaN NaN NaN 0.258 NaN NaN

MEDIAN:

32.0

0 26.6

1 8.0

2 23.0

3 35.0

4 5.0

dtype: float64

STANDARD DEVIATION:

15.954059060433842

0 31.119744

1 59.585320

2 37.639873

3 60.541569

4 41.114755

dtype: float64

VARIANCE:
6 11.362809

148 1022.622445

72 375.125415

35 254.532001

0 13290.194335

33.6 62.237755

0.627 0.109890

50 138.116452

1 0.227226

dtype: float64

SKEWNESS:

6 0.903976

148 0.176412

72 -1.841911

35 0.112058

0 2.270630

33.6 -0.427950

0.627 1.921190

50 1.135165

1 0.638949

dtype: float64

KURTOSIS:

6 0.161293

148 0.642992

72 5.168578

35 -0.518325

0 7.205266

33.6 3.282498
0.627 5.593374

50 0.660872

1 -1.595913

dtype: float64

GRAPH:
PROGRAM:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets, metrics
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
%matplotlib inline

diabetes = pd.read_csv("C:\\Users\\KSK\\Documents\\diabetes.csv")
diabetes.head()

# From here on, "diabetes" refers to the dataset bundled with scikit-learn.
diabetes = datasets.load_diabetes()
print(diabetes.DESCR)
diabetes.feature_names

# Now we will split the data into the independent and dependent variables
X = diabetes.data[:, np.newaxis, 3]  # a single feature: column 3 (bp)
Y = diabetes.target

# We will split the data into training and testing data
x_train, x_test, y_train, y_test = train_test_split(X, Y, test_size=0.3)

# Linear Regression
reg = LinearRegression()
reg.fit(x_train, y_train)
y_pred = reg.predict(x_test)
Coef = reg.coef_
print(Coef)

MSE = mean_squared_error(y_test, y_pred)
R2 = r2_score(y_test, y_pred)
print(R2, MSE)

plt.scatter(y_pred, y_test)
plt.title('Predicted data vs Real Data')
plt.xlabel('y_pred')
plt.ylabel('y_test')
plt.show()

plt.scatter(x_test, y_test)
plt.plot(x_test, y_pred, linewidth=2)
plt.title('Linear Regression')
plt.xlabel('x_test')
plt.ylabel('y_test')
plt.show()

# Logistic Regression
# The target is a continuous measure, so each distinct value is treated as its own class.
model = LogisticRegression()
model.fit(x_train, y_train)
y_predict = model.predict(x_test)
model_score = model.score(x_test, y_test)
print(model_score)
print(metrics.confusion_matrix(y_test, y_predict))
OUTPUT:

DIABETES DESCRIPTION:

Diabetes dataset

Ten baseline variables, age, sex, body mass index, average blood
pressure, and six blood serum measurements were obtained for each of n = 442
diabetes patients, as well as the response of interest, a
quantitative measure of disease progression one year after baseline.

Data Set Characteristics:

: Number of Instances: 442

: Number of Attributes: First 10 columns are numeric predictive values

: Target: Column 11 is a quantitative measure of disease progression one year after

baseline

: Attribute Information:

- Age age in years

- Sex

- bmi body mass index

- bp average blood pressure

- s1 tc, total serum cholesterol

- s2 ldl, low-density lipoproteins

- s3 hdl, high-density lipoproteins

- s4 tch, total cholesterol / HDL

- s5 ltg, possibly log of serum triglycerides level

- s6 glu, blood sugar level

COEFFICIENT VALUE:

[731.87600042]

R2 VALUE AND MEAN SQUARED ERROR:

0.16465773342986756 & 4765.090270861111
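Both metrics can be reproduced by hand from their definitions: the mean squared error is the average of the squared residuals, and R2 is one minus the ratio of the residual sum of squares to the total sum of squares. The sketch below uses four made-up values, not the diabetes data.

```python
import numpy as np

# Made-up true values and predictions, for illustration only.
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_hat = np.array([2.5, 0.0, 2.0, 8.0])

# Mean squared error: average of the squared residuals.
residuals = y_true - y_hat
mse = np.mean(residuals ** 2)

# R2 = 1 - SS_res / SS_tot, where SS_tot measures spread around the mean.
ss_res = np.sum(residuals ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2 = 1.0 - ss_res / ss_tot

print(mse)
print(r2)
```

An R2 close to 0, as in the output above, means the single feature explains little of the variation in the target.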

PREDICTED DATA VS REAL DATA:

LINEAR REGRESSION:
MODEL SCORE FOR LOGISTIC REGRESSION:

0.007518796992481203

CONFUSION MATRIX FOR LOGISTIC REGRESSION:

[[130 17]

[ 38 46]]
PROGRAM:

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn import datasets
%matplotlib inline

diabetes = pd.read_csv("C:\\Users\\KSK\\Documents\\FDS LAb\\diabetes.csv")
diabetes.head()

import statsmodels.api as sm
from statsmodels.stats.anova import anova_lm

X = diabetes[["Age", "BMI"]]  # the input variables
y = diabetes["Glucose"]  # the output variable, the one you want to predict
X = sm.add_constant(X)  # add an intercept (beta_0) to the model

# Note the argument order: the response y comes before the predictors X
model2 = sm.OLS(y, X).fit()
predictions = model2.predict(X)  # make the predictions by the model

# Print out the statistics
model2.summary()

OUTPUT:

HEAD DATA:

   Pregnancies  Glucose  BloodPressure  SkinThickness  Insulin   BMI  DiabetesPedigreeFunction  Age  Outcome
0            6      148             72             35        0  33.6                     0.627   50        1
1            1       85             66             29        0  26.6                     0.351   31        0
2            8      183             64              0        0  23.3                     0.672   32        1
3            1       89             66             23       94  28.1                     0.167   21        0
4            0      137             40             35      168  43.1                     2.288   33        1

OLS Regression Results

Dep. Variable: Glucose R-squared: 0.114

Model: OLS Adj. R-squared: 0.112

Method: Least Squares F-statistic: 49.33

Date: Tue, 08 Nov 2022 Prob (F-statistic): 7.05e-21

Time: 22:28:35 Log-Likelihood: -3703.7

No. Observations: 768 AIC: 7413.

Df Residuals: 765 BIC: 7427.

Df Model: 2

Covariance Type: nonrobust

coef std err t P>|t| [0.025 0.975]

const 70.2952 5.402 13.013 0.000 59.691 80.899

Age 0.6955 0.093 7.514 0.000 0.514 0.877

BMI 0.8589 0.138 6.220 0.000 0.588 1.130

Omnibus: 18.855 Durbin-Watson: 1.836

Prob(Omnibus): 0.000 Jarque-Bera (JB): 38.868

Skew: -0.007 Prob(JB): 3.63e-09

Kurtosis: 4.102 Cond. No. 235.

PROGRAM:

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
from scipy.stats import norm
%matplotlib inline

df = pd.read_csv("C:\\Users\\KSK\\Documents\\train.csv")
df.head()

mean = df.loc[:, 'Fare'].mean()
sd = df.loc[:, 'Fare'].std()

# x_axis is not defined in the original listing; a range of four standard
# deviations on either side of the mean is assumed here.
x_axis = np.linspace(mean - 4 * sd, mean + 4 * sd, 200)
plt.plot(x_axis, norm.pdf(x_axis, mean, sd))
plt.show()
OUTPUT:

NORMAL CURVE:
PROGRAM:

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
%matplotlib inline

df = pd.read_csv("C:\\Users\\KSK\\Documents\\train.csv")
df.head()

# distplot is deprecated in recent seaborn versions; histplot with kde=True
# draws the same histogram with a density curve.
sns.histplot(df["Fare"], kde=True, stat="density")
sns.histplot(df["Age"], kde=True, stat="density")

plt.contour(df[["Fare", "Parch"]])
OUTPUT:

DENSITY PLOT:

CONTOUR PLOT:
PROGRAM:

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline

df = pd.read_csv("C:\\Users\\KSK\\Documents\\train.csv")
df.head()

plt.figure(figsize=(8, 8))
sns.scatterplot(x="Age", y="Fare", hue="Sex", data=df)
plt.show()

df.corr(numeric_only=True)  # text columns are skipped

# plotting correlation heatmap
dataplot = sns.heatmap(df.corr(numeric_only=True), cmap="YlGnBu", annot=True)
# displaying heatmap
plt.show()
OUTPUT:

SCATTER PLOT:

HEAT MAP:
PROGRAM:

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline

df = pd.read_csv("C:\\Users\\KSK\\Documents\\train.csv")
df.head()

plt.hist(df["Fare"])
OUTPUT:

HISTOGRAM:

(array([732., 106.,  31.,   2.,  11.,   6.,   0.,   0.,   0.,   3.]),
 array([  0.     ,  51.23292, 102.46584, 153.69876, 204.93168, 256.1646 ,
        307.39752, 358.63044, 409.86336, 461.09628, 512.3292 ]),
 <BarContainer object of 10 artists>)

PROGRAM:

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from mpl_toolkits import mplot3d
%matplotlib inline

df = pd.read_csv("C:\\Users\\KSK\\Documents\\train.csv")
df.head()

fig = plt.figure(figsize=(8, 8))
ax = plt.axes(projection='3d')

# Three-dimensional line
zline = np.linspace(0, 15, 1000)
xline = np.sin(zline)
yline = np.cos(zline)
ax.plot3D(xline, yline, zline, 'gray')

# Three-dimensional scatter plot
zdata = df["Fare"]
xdata = df["Age"]
ydata = df["Parch"]
ax.scatter3D(xdata, ydata, zdata, c=zdata, cmap='Greens');

OUTPUT:

THREE DIMENSIONAL LINES:

THREE DIMENSIONAL SCATTERPLOT:

PROGRAM:

%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.basemap import Basemap

# draw_map is not defined in the original listing; this helper is an assumed
# minimal version that draws shaded relief with a latitude/longitude grid.
def draw_map(m, scale=0.2):
    m.shadedrelief(scale=scale)
    m.drawparallels(np.linspace(-90, 90, 13))
    m.drawmeridians(np.linspace(-180, 180, 13))

# Orthographic projection
plt.figure(figsize=(8, 8))
m = Basemap(projection='ortho', resolution=None, lat_0=50, lon_0=-100)
m.bluemarble(scale=0.5);

# Mapping longitude and latitude
fig = plt.figure(figsize=(8, 8))
m = Basemap(projection='lcc', resolution=None, width=8E6, height=8E6,
            lat_0=45, lon_0=-100,)
m.etopo(scale=0.5, alpha=0.5)
x, y = m(-122.3, 47.6)
plt.plot(x, y, 'ok', markersize=5)
plt.text(x, y, ' Seattle', fontsize=12);

# Cylindrical projection
fig = plt.figure(figsize=(8, 6), edgecolor='w')
m = Basemap(projection='cyl', resolution=None, llcrnrlat=-90, urcrnrlat=90,
            llcrnrlon=-180, urcrnrlon=180, )
draw_map(m)

# Pseudo-cylindrical projection
fig = plt.figure(figsize=(8, 6), edgecolor='w')
m = Basemap(projection='moll', resolution=None, lat_0=0, lon_0=0)
draw_map(m)

# Perspective projection
fig = plt.figure(figsize=(8, 8))
m = Basemap(projection='ortho', resolution=None, lat_0=50, lon_0=0)
draw_map(m);

# Conic projection
fig = plt.figure(figsize=(8, 8))
m = Basemap(projection='lcc', resolution=None, lon_0=0, lat_0=50, lat_1=45,
            lat_2=55, width=1.6E7, height=1.2E7)
draw_map(m)

OUTPUT:

ORTHO PROJECTION:
MAPPING LONGITUDE AND LATITUDE:

CYLINDRICAL PROJECTIONS:
PSEUDO-CYLINDRICAL PROJECTIONS:

PERSPECTIVE PROJECTION:
CONIC PROJECTION:

Final Fds Manual Print
No ratings yet
Final Fds Manual Print
55 pages
Exploring Python Data Packages
No ratings yet
Exploring Python Data Packages
77 pages
FDS Record Last
No ratings yet
FDS Record Last
61 pages
Fds Record
No ratings yet
Fds Record
69 pages
Foundation of Data Science Lab Manual
No ratings yet
Foundation of Data Science Lab Manual
31 pages
CS3361 Data Science Lab Manual
No ratings yet
CS3361 Data Science Lab Manual
43 pages
Python Libraries
No ratings yet
Python Libraries
6 pages
EXP1-siddhant Gupta (23 - SE - 148)
No ratings yet
EXP1-siddhant Gupta (23 - SE - 148)
17 pages
Cs3361-Data Science Lab Manual
No ratings yet
Cs3361-Data Science Lab Manual
44 pages
FDS Lab Meterial CS3361
No ratings yet
FDS Lab Meterial CS3361
30 pages
M3-Introduction To Numpy and Pandas
No ratings yet
M3-Introduction To Numpy and Pandas
55 pages
Ch-2 Python Libraries For ML
No ratings yet
Ch-2 Python Libraries For ML
70 pages
CS3361 - Data Science
No ratings yet
CS3361 - Data Science
56 pages
Numpy Arrays and Data Manipulation Guide
No ratings yet
Numpy Arrays and Data Manipulation Guide
39 pages
Python NumPy for Developers
No ratings yet
Python NumPy for Developers
43 pages
Lab Manual Fds
No ratings yet
Lab Manual Fds
44 pages
Unit 5 PythonPackages (Matplotlib)
No ratings yet
Unit 5 PythonPackages (Matplotlib)
24 pages
Data Science Lab Manual
No ratings yet
Data Science Lab Manual
63 pages
Lab-02 AI
No ratings yet
Lab-02 AI
14 pages
Dse Unit 3
No ratings yet
Dse Unit 3
12 pages
Numpy Handbook
No ratings yet