
Fds Lab Manual

The document outlines the Data Science Laboratory course at the Christian College of Engineering and Technology, detailing various experiments focused on Python programming and data science packages. It includes installation procedures for essential Python libraries such as NumPy, Pandas, and TensorFlow, along with practical examples and outputs for using these libraries. The document serves as a practical guide for students to learn data science concepts through hands-on experience.

Uploaded by

Vanitha Janarthanam


CHRISTIAN COLLEGE OF ENGINEERING AND TECHNOLOGY

ODDANCHATRAM – 624 619

DEPARTMENT OF INFORMATION TECHNOLOGY

CS3361 DATA SCIENCE LABORATORY

This is to certify that this is the bonafide record of work done by

___________________________________in the Computer Laboratory of this institution, as
prescribed by the Anna University Chennai for the year semester B.Tech Practical
Examination, during ___________________.

HEAD OF THE DEPARTMENT STAFF INCHARGE

Submitted for the practical exam held on ____________________

Internal Examiner External Examiner

INDEX

SI.NO. DATE NAME OF THE EXPERIMENT PAGE.NO SIGNATURE

Ex. No: 01
Installation of Features for python
Aim:

Procedure:
Install Python Data Science Packages
Python is a high-level, general-purpose programming language with a rich set of data science
and machine learning packages. As a first step, install Python for Windows, macOS, or Linux.
Python Packages
The power of Python is in the packages that are available through the pip or conda package
managers. This section gives an overview of some of the best packages for machine learning and
data science and shows how to install them.
We will explore the Python packages that are commonly used for data science and machine
learning. You may need to install the packages from the terminal, the Anaconda prompt, the
command prompt, or from a Jupyter Notebook. If you have multiple versions of Python or have
specific dependencies, use an environment manager such as pyenv. For most users, a single
installation is sufficient. The Python package manager pip provides all of the packages (such as
gekko) that we need for this course. If there is an administrative access error, install to the
local user profile with the --user flag:
pip install --user gekko
Gekko
Gekko provides an interface to gradient-based solvers for machine learning and optimization
of mixed-integer, differential algebraic equations, and time series models. Gekko provides
exact first and second derivatives through automatic differentiation and discretization with
simultaneous or sequential methods.
pip install gekko
Keras
Keras provides an interface for artificial neural networks. From version 2.4 until Keras 3,
TensorFlow was its only backend; Keras 3 supports TensorFlow, JAX, and PyTorch backends. The
backend is installed separately, for example with pip install tensorflow.
pip install keras
Matplotlib
The package matplotlib generates plots in Python.
pip install matplotlib
NumPy
NumPy is a numerical computing package for mathematics, science, and engineering. Many
data science packages use NumPy as a dependency.
pip install numpy
OpenCV
OpenCV (Open Source Computer Vision Library) is a library for real-time computer vision,
originally developed with support from Intel Research.
pip install opencv-python
Pandas
Pandas loads, cleans, and manipulates tabular data (data frames). Its many functions make the
preliminary steps of a data analysis problem efficient.
pip install pandas
Plotly
Plotly renders interactive plots with HTML and JavaScript. Plotly Express is included with
Plotly.
pip install plotly
PyTorch
PyTorch is a deep learning framework widely used for computer vision and natural language
processing. It was originally developed by Facebook's AI Research lab (FAIR) and is now
governed by the PyTorch Foundation.
pip install torch
Scikit-Learn
Scikit-Learn (or sklearn) includes a wide variety of classification, regression and clustering
algorithms including neural network, support vector machine, random forest, gradient
boosting, k-means clustering, and other supervised or unsupervised learning methods.
pip install scikit-learn
SciPy
SciPy is a general-purpose package for mathematics, science, and engineering and extends the
base capabilities of NumPy.
pip install scipy
Seaborn
Seaborn is built on matplotlib and produces detailed plots in a few lines of code.
pip install seaborn
Statsmodels
Statsmodels is a package for exploring data, estimating statistical models, and performing
statistical tests. It includes descriptive statistics, statistical tests, plotting functions, and result
statistics.
pip install statsmodels
TensorFlow
TensorFlow is an open source machine learning platform with a particular focus on training and
inference of deep neural networks. It was originally developed by the Google Brain team.
pip install tensorflow
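After installation, one way to confirm which of the packages above can be imported is to look each one up with the standard library's importlib. The sketch below is only a convenience check, not part of the required procedure; it assumes the usual import names (for example cv2 for opencv-python, sklearn for scikit-learn, torch for PyTorch) and only checks that a module can be found, not that it works.

```python
import importlib.util

# pip package name -> module name used in `import` (assumed common mappings)
PACKAGES = {
    "gekko": "gekko",
    "keras": "keras",
    "matplotlib": "matplotlib",
    "numpy": "numpy",
    "opencv-python": "cv2",
    "pandas": "pandas",
    "plotly": "plotly",
    "torch": "torch",
    "scikit-learn": "sklearn",
    "scipy": "scipy",
    "seaborn": "seaborn",
    "statsmodels": "statsmodels",
    "tensorflow": "tensorflow",
}


def check_installed(packages):
    """Return {pip name: True/False} depending on whether the module can be found."""
    status = {}
    for pip_name, module_name in packages.items():
        # find_spec returns None when a top-level module is not importable
        status[pip_name] = importlib.util.find_spec(module_name) is not None
    return status


if __name__ == "__main__":
    for name, ok in check_installed(PACKAGES).items():
        print(f"{name:15s} {'installed' if ok else 'MISSING (pip install ' + name + ')'}")
```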

Result:
Ex. No: 02
Working with NumPy
Aim:

Programs and output:

Creating an array

import numpy
arr = numpy.array([1, 2, 3, 4, 5])
print(arr)

Output:
[1 2 3 4 5]

The same program, importing NumPy under the usual alias np:

import numpy as np
arr = np.array([1, 2, 3, 4, 5])
print(arr)

Output:
[1 2 3 4 5]

Creating an array from a tuple

import numpy as np
arr = np.array((1, 2, 3, 4, 5))
print(arr)

Output:
[1 2 3 4 5]

0-D array with the value 42

import numpy as np
arr = np.array(42)
print(arr)

Output:
42

1-D array containing the values 1, 2, 3, 4, 5

import numpy as np
arr = np.array([1, 2, 3, 4, 5])
print(arr)

Output:
[1 2 3 4 5]

2-D array containing two arrays with the values 1, 2, 3 and 4, 5, 6

import numpy as np
arr = np.array([[1, 2, 3], [4, 5, 6]])
print(arr)

Output:
[[1 2 3]
 [4 5 6]]

Checking how many dimensions the arrays have

import numpy as np
a = np.array(42)
b = np.array([1, 2, 3, 4, 5])
c = np.array([[1, 2, 3], [4, 5, 6]])
d = np.array([[[1, 2, 3], [4, 5, 6]], [[1, 2, 3], [4, 5, 6]]])
print(a.ndim)
print(b.ndim)
print(c.ndim)
print(d.ndim)

Output:
0
1
2
3

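Alongside ndim, the shape attribute gives the size of each dimension, which is often more useful when checking data. A short sketch reusing the four arrays from the previous program:

```python
import numpy as np

a = np.array(42)
b = np.array([1, 2, 3, 4, 5])
c = np.array([[1, 2, 3], [4, 5, 6]])
d = np.array([[[1, 2, 3], [4, 5, 6]], [[1, 2, 3], [4, 5, 6]]])

for name, arr in [("a", a), ("b", b), ("c", c), ("d", d)]:
    # ndim is the number of axes; shape is the length along each axis,
    # so len(arr.shape) always equals arr.ndim
    print(name, arr.ndim, arr.shape)
```

This prints a 0 (), b 1 (5,), c 2 (2, 3) and d 3 (2, 2, 3): the 3-D array holds 2 blocks of 2 rows with 3 columns each.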
NumPy array indexing

import numpy as np
arr = np.array([1, 2, 3, 4])
print(arr[1])

Output:
2

import numpy as np
arr = np.array([1, 2, 3, 4])
print(arr[2] + arr[3])

Output:
7

Accessing the element on the first row, second column

import numpy as np
arr = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])
print('2nd element on 1st row: ', arr[0, 1])

Output:
2nd element on 1st row:  2

Accessing the element on the 2nd row, 5th column

import numpy as np
arr = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])
print('5th element on 2nd row: ', arr[1, 4])

Output:
5th element on 2nd row:  10

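Slicing arrays

We pass a slice instead of an index, like this: [start:end]. The element at start is included and
the element at end is not. We can also define the step, like this: [start:end:step].

Slice elements from index 1 to index 5:

import numpy as np
arr = np.array([1, 2, 3, 4, 5, 6, 7])
print(arr[1:5])

Output:
[2 3 4 5]

Slice elements from index 4 to the end of the array:

import numpy as np
arr = np.array([1, 2, 3, 4, 5, 6, 7])
print(arr[4:])

Output:
[5 6 7]

Slice elements from the beginning to index 4:

import numpy as np
arr = np.array([1, 2, 3, 4, 5, 6, 7])
print(arr[:4])

Output:
[1 2 3 4]

Negative slicing, from index -3 to index -1 (counted from the end):

import numpy as np
arr = np.array([1, 2, 3, 4, 5, 6, 7])
print(arr[-3:-1])

Output:
[5 6]

Every other element from index 1 to index 5:

import numpy as np
arr = np.array([1, 2, 3, 4, 5, 6, 7])
print(arr[1:5:2])

Output:
[2 4]

Every other element of the entire array:

import numpy as np
arr = np.array([1, 2, 3, 4, 5, 6, 7])
print(arr[::2])

Output:
[1 3 5 7]

From the second row, slice elements from index 1 to index 4:

import numpy as np
arr = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])
print(arr[1, 1:4])

Output:
[7 8 9]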

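NumPy array copy vs view

A copy owns its data, so changing the original array does not affect the copy:

import numpy as np
arr = np.array([1, 2, 3, 4, 5])
x = arr.copy()
arr[0] = 42
print(arr)
print(x)

Output:
[42  2  3  4  5]
[1 2 3 4 5]

A view shares the original array's data, so changes to the original are visible through the view:

import numpy as np
arr = np.array([1, 2, 3, 4, 5])
x = arr.view()
arr[0] = 42
print(arr)
print(x)

Output:
[42  2  3  4  5]
[42  2  3  4  5]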

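The difference between a copy and a view can also be checked in code: an array that owns its data has base set to None, while a view's base refers to the array it borrows from. np.shares_memory() answers the same question directly. A short sketch (the slice s is an extra illustration, not part of the exercise):

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])
x = arr.copy()   # new array with its own data
y = arr.view()   # new array object that reuses arr's data
s = arr[1:4]     # basic slicing also returns a view

print(x.base is None)              # True: the copy owns its data
print(y.base is arr)               # True: the view borrows arr's data
print(s.base is arr)               # True
print(np.shares_memory(arr, x))    # False
print(np.shares_memory(arr, s))    # True
```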
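Reshaping arrays

Reshaping a 1-D array with 12 elements into a 2-D array with 4 rows and 3 columns:

import numpy as np
arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])
newarr = arr.reshape(4, 3)
print(newarr)

Output:
[[ 1  2  3]
 [ 4  5  6]
 [ 7  8  9]
 [10 11 12]]

Reshaping the same array into a 3-D array of 2 blocks, each with 3 rows and 2 columns:

import numpy as np
arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])
newarr = arr.reshape(2, 3, 2)
print(newarr)

Output:
[[[ 1  2]
  [ 3  4]
  [ 5  6]]

 [[ 7  8]
  [ 9 10]
  [11 12]]]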

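reshape() also accepts -1 for one dimension, which NumPy then works out from the total number of elements. The total must stay the same, otherwise a ValueError is raised. A short sketch:

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])

# -1 asks NumPy to infer this dimension from the total size (12 / (2 * 3) = 2)
newarr = arr.reshape(2, 3, -1)
print(newarr.shape)        # (2, 3, 2)

# reshape(-1) flattens any array back to 1-D
print(newarr.reshape(-1))  # [ 1  2  3  4  5  6  7  8  9 10 11 12]

# The number of elements must not change, so 12 elements cannot become 5 x 3
try:
    arr.reshape(5, 3)
except ValueError as err:
    print("Cannot reshape:", err)
```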
Result:
Ex. No: 03
Working with Pandas data frames
Aim:
To learn the various uses of Pandas package in Python with examples.
Introduction:

Pandas is a Python library used for working with data sets. It has functions for
analyzing, cleaning, exploring, and manipulating data. The name "Pandas" refers to
both "Panel Data" and "Python Data Analysis". Pandas allows us to analyze large data sets and
draw conclusions based on statistical theory. Pandas can clean messy data sets and make them
readable and relevant. Relevant data is very important in data science.

import pandas
mydataset = {
'cars': ["BMW", "Volvo", "Ford"],
'passings': [3, 7, 2]
}

myvar = pandas.DataFrame(mydataset)
print(myvar)

Output:

    cars  passings
0    BMW         3
1  Volvo         7
2   Ford         2
• A Pandas Series is like a column in a table.
• It is a one-dimensional array holding data of any type.

import pandas as pd
a = [1, 7, 2]
myvar = pd.Series(a)
print(myvar)

Output:
0 1
1 7
2 2
dtype: int64

• If nothing else is specified, the values are labeled with their index number. First value
has index 0, second value has index 1 etc. This label can be used to access a specified
value.
import pandas as pd
a = ['a', 'b', 'c']
myvar = pd.Series(a)
print(myvar[1])
Output:
b

import pandas as pd
a = [30, 31, 32]
myvar = pd.Series(a, index = ["A", "B", "C"])
print(myvar)

Output:

A 30
B 31
C 32
dtype: int64

Create a simple Pandas Series from a dictionary:

import pandas as pd
KeyValuePair = {"Index-1": 420, "Index-2": 380, "Index-3": 390}
myvar = pd.Series(KeyValuePair)
print(myvar)
Output:
Index-1 420
Index-2 380
Index-3 390
dtype: int64
Create a Series using only data from "Index-1" and "Index-3":

import pandas as pd
KeyValuePair = {"Index-1": 420, "Index-2": 380, "Index-3": 390}
myvar = pd.Series(KeyValuePair,index=["Index-1","Index-3"])
print(myvar)

Output:
Index-1 420
Index-3 390
dtype: int64

Create a DataFrame from a dictionary of lists, using the index argument to name the rows:

import pandas as pd
data = {
    "Item": ['A', 'B', 'C'],
    "Cost": [20, 10, 5]}
myvar = pd.DataFrame(data, index=["I1", "I2", "I3"])
print(myvar)

Output:
Item Cost
I1 A 20
I2 B 10
I3 C 5

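The same DataFrame can also be assembled from two Series that share the same index labels; each dictionary key becomes a column, and the rows are aligned on the labels. A minimal sketch:

```python
import pandas as pd

# Two Series that share the index labels I1, I2, I3
item = pd.Series(['A', 'B', 'C'], index=["I1", "I2", "I3"])
cost = pd.Series([20, 10, 5], index=["I1", "I2", "I3"])

# Each key becomes a column name; rows are matched by index label
myvar = pd.DataFrame({"Item": item, "Cost": cost})
print(myvar)
```

If the two Series had different labels, pandas would still align them and fill the unmatched positions with NaN.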
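The loc attribute returns a specified row by its label. Here the default labels 0, 1, 2 are used:

import pandas as pd
data = {
    "Item": ['A', 'B', 'C'],
    "Cost": [20, 10, 5]}
df = pd.DataFrame(data)
for i in range(0, 3):
    print(df.loc[i])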

Output:
Item A
Cost 20
Name: 0, dtype: object
Item B
Cost 10
Name: 1, dtype: object
Item C
Cost 5
Name: 2, dtype: object
Load Files into a DataFrame:

import pandas as pd
df = pd.read_csv('/student.csv')  # path to the CSV file; adjust it to where the file is saved
print(df.to_string())             # to_string() prints every row instead of a shortened view

Output (first 13 rows shown):
id name class mark gender
0 1 John Deo Four 75 female
1 2 Max Ruin Three 85 male
2 3 Arnold Three 55 male
3 4 Krish Star Four 60 female
4 5 John Mike Four 60 female
5 6 Alex John Four 55 male
6 7 My John Rob Fifth 78 male
7 8 Asruid Five 85 male
8 9 Tes Qry Six 78 male
9 10 Big John Four 55 female
10 11 Ronald Six 89 female
11 12 Recky Six 94 female
12 13 Kty Seven 88 female
Viewing the Data
• One of the most used methods for getting a quick overview of the DataFrame is the
head() method.
• The head() method returns the column headers and a specified number of rows, starting from
the top (5 rows if no number is given).
• There is also a tail() method for viewing the last rows of the DataFrame.

import pandas as pd
df = pd.read_csv('/content/student.csv')
print("Head Data")
print(df.head(3))
print("Tail Data")
print(df.tail(3))

Output:

Head Data
id name class mark gender
0 1 John Deo Four 75 female
1 2 Max Ruin Three 85 male
2 3 Arnold Three 55 male
Tail Data
id name class mark gender
32 33 Kenn Rein Six 96 female
33 34 Gain Toe Seven 69 male
34 35 Rows Noump Six 88 female

Data cleaning means fixing bad data in your data set.

Bad data could be:

• Empty cells
• Data in wrong format
• Wrong data
• Duplicates

import pandas as pd
df = pd.read_csv('/content/clean.csv')
print(df)

Output:
Duration Date Pulse Maxpulse Calories
0 60 '2020/12/01' 110 130 409.1
1 60 '2020/12/02' 117 145 479.0
2 60 '2020/12/03' 103 135 340.0
3 45 '2020/12/04' 109 175 282.4
4 45 '2020/12/05' 117 148 406.0
5 60 '2020/12/06' 102 127 300.0
6 60 '2020/12/07' 110 136 374.0
7 450 '2020/12/08' 104 134 253.3
8 30 '2020/12/09' 109 133 195.1
9 60 '2020/12/10' 98 124 269.0
10 60 '2020/12/11' 103 147 329.3
11 60 '2020/12/12' 100 120 250.7
12 60 '2020/12/12' 100 120 250.7
13 60 '2020/12/13' 106 128 345.3
14 60 '2020/12/14' 104 132 379.3
15 60 '2020/12/15' 98 123 275.0
16 60 '2020/12/16' 98 120 215.2
17 60 '2020/12/17' 100 120 300.0
18 45 '2020/12/18' 90 112 NaN
19 60 '2020/12/19' 103 123 323.0
20 45 '2020/12/20' 97 125 243.0
21 60 '2020/12/21' 108 131 364.2
22 45 NaN 100 119 282.0
23 60 '2020/12/23' 130 101 300.0
24 45 '2020/12/24' 105 132 246.0
25 60 '2020/12/25' 102 126 334.5
26 60 2020/12/26 120 250 NaN
27 60 '2020/12/27' 92 118 241.0
28 60 '2020/12/28' 103 132 NaN
29 60 '2020/12/29' 100 132 280.0
30 60 '2020/12/30' 102 129 380.3
31 60 '2020/12/31' 92 115 243.0
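In the output above, rows 11 and 12 are identical, which is the "Duplicates" kind of bad data listed earlier. A minimal sketch of detecting and removing such rows with duplicated() and drop_duplicates(), using a small DataFrame typed in by hand from rows 10 to 12 (dates written without the surrounding quotes):

```python
import pandas as pd

# Rows 10-12 of clean.csv, typed in by hand for this sketch
df = pd.DataFrame({
    "Duration": [60, 60, 60],
    "Date": ["2020/12/11", "2020/12/12", "2020/12/12"],
    "Pulse": [103, 100, 100],
    "Maxpulse": [147, 120, 120],
    "Calories": [329.3, 250.7, 250.7],
}, index=[10, 11, 12])

# duplicated() is True for every row that repeats an earlier row in all columns
print(df.duplicated())

# drop_duplicates() returns a new DataFrame without the repeated rows
df_unique = df.drop_duplicates()
print(df_unique)
```

On the full file, df.drop_duplicates(inplace=True) would remove row 12 in the same way.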
Empty Cells
• Empty cells can potentially give you a wrong result when you analyze data.
Remove Rows
• One way to deal with empty cells is to remove rows that contain empty cells.
• This is usually OK, since data sets can be very big, and removing a few rows will not
have a big impact on the result.

import pandas as pd
df = pd.read_csv('/content/clean.csv')
new_df = df.dropna()
print(new_df.to_string())

• Note: By default, the dropna() method returns a new DataFrame and does not change
the original.
• If you want to change the original DataFrame, use the inplace=True argument:
import pandas as pd
df = pd.read_csv('/content/clean.csv')
df.dropna(inplace = True)
print(df.to_string())

Replace Empty Values

• Another way of dealing with empty cells is to insert a new value instead.
• This way you do not have to delete entire rows just because of some empty cells.
• The fillna() method replaces empty cells with a value. Called as below, it fills the empty
cells in every column.

import pandas as pd
df = pd.read_csv('/content/clean.csv')
df.fillna(130, inplace = True)
print(df.to_string())

Output:

Duration Date Pulse Maxpulse Calories

0 60 '2020/12/01' 110 130 409.1
1 60 '2020/12/02' 117 145 479.0
2 60 '2020/12/03' 103 135 340.0
3 45 '2020/12/04' 109 175 282.4
4 45 '2020/12/05' 117 148 406.0
5 60 '2020/12/06' 102 127 300.0
6 60 '2020/12/07' 110 136 374.0
7 450 '2020/12/08' 104 134 253.3
8 30 '2020/12/09' 109 133 195.1
9 60 '2020/12/10' 98 124 269.0
10 60 '2020/12/11' 103 147 329.3
11 60 '2020/12/12' 100 120 250.7
12 60 '2020/12/12' 100 120 250.7
13 60 '2020/12/13' 106 128 345.3
14 60 '2020/12/14' 104 132 379.3
15 60 '2020/12/15' 98 123 275.0
16 60 '2020/12/16' 98 120 215.2
17 60 '2020/12/17' 100 120 300.0
18 45 '2020/12/18' 90 112 130.0
19 60 '2020/12/19' 103 123 323.0
20 45 '2020/12/20' 97 125 243.0
21 60 '2020/12/21' 108 131 364.2
22 45 130 100 119 282.0
23 60 '2020/12/23' 130 101 300.0
24 45 '2020/12/24' 105 132 246.0
25 60 '2020/12/25' 102 126 334.5
26 60 2020/12/26 120 250 130.0
27 60 '2020/12/27' 92 118 241.0
28 60 '2020/12/28' 103 132 130.0
29 60 '2020/12/29' 100 132 280.0
30 60 '2020/12/30' 102 129 380.3
31 60 '2020/12/31' 92 115 243.0
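Because fillna(130) fills every column, the missing date in row 22 was also replaced with 130, which is not a valid date. Passing a dictionary restricts the replacement to the chosen columns. A minimal sketch, using a small stand-in DataFrame with the values of rows 18, 22 and 28 of clean.csv:

```python
import numpy as np
import pandas as pd

# Rows 18, 22 and 28 of clean.csv, typed in by hand for this sketch
df = pd.DataFrame({
    "Duration": [45, 45, 60],
    "Date": ["2020/12/18", np.nan, "2020/12/28"],
    "Pulse": [90, 100, 103],
    "Maxpulse": [112, 119, 132],
    "Calories": [np.nan, 282.0, np.nan],
}, index=[18, 22, 28])

# Only the Calories column is filled; the missing Date stays NaN
df = df.fillna({"Calories": 130})
print(df)
```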

Replace Using Mean, Median, or Mode

A common way to replace empty cells is to calculate the mean, median, or mode value of the
column.
Pandas uses the mean(), median(), and mode() methods to calculate the respective values for
a specified column:

import pandas as pd
df = pd.read_csv('data.csv')
x = df["Calories"].mean()
# Assign the result back; fillna(inplace=True) on a selected column may not update df in recent pandas versions
df["Calories"] = df["Calories"].fillna(x)
print(df.to_string())

Output:
Duration Date Pulse Maxpulse Calories
0 60 '2020/12/01' 110 130 409.10
1 60 '2020/12/02' 117 145 479.00
2 60 '2020/12/03' 103 135 340.00
3 45 '2020/12/04' 109 175 282.40
4 45 '2020/12/05' 117 148 406.00
5 60 '2020/12/06' 102 127 300.00
6 60 '2020/12/07' 110 136 374.00
7 450 '2020/12/08' 104 134 253.30
8 30 '2020/12/09' 109 133 195.10
9 60 '2020/12/10' 98 124 269.00
10 60 '2020/12/11' 103 147 329.30
11 60 '2020/12/12' 100 120 250.70
12 60 '2020/12/12' 100 120 250.70
13 60 '2020/12/13' 106 128 345.30
14 60 '2020/12/14' 104 132 379.30
15 60 '2020/12/15' 98 123 275.00
16 60 '2020/12/16' 98 120 215.20
17 60 '2020/12/17' 100 120 300.00
18 45 '2020/12/18' 90 112 304.68
19 60 '2020/12/19' 103 123 323.00
20 45 '2020/12/20' 97 125 243.00
21 60 '2020/12/21' 108 131 364.20
22 45 130 100 119 282.00
23 60 '2020/12/23' 130 101 300.00
24 45 '2020/12/24' 105 132 246.00
25 60 '2020/12/25' 102 126 334.50
26 60 20201226 100 120 250.00
27 60 '2020/12/27' 92 118 241.00
28 60 '2020/12/28' 103 132 304.68
29 60 '2020/12/29' 100 132 280.00
30 60 '2020/12/30' 102 129 380.30
31 60 '2020/12/31' 92 115 243.00

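To fill with the median or the mode instead, compute x with one of these lines (mode() returns
a Series, so [0] takes its first value):
x = df["Calories"].median()
x = df["Calories"].mode()[0]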

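Convert Into a Correct Format

Cells with data in the wrong format, such as the Date values above, make analysis harder. After
the Date column is converted to a date type, the data looks like this (the missing date becomes
NaT, "Not a Time"):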

Duration Date Pulse Maxpulse Calories

0 60 2020-12-01 110 130 409.1
1 60 2020-12-02 117 145 479.0
2 60 2020-12-03 103 135 340.0
3 45 2020-12-04 109 175 282.4
4 45 2020-12-05 117 148 406.0
5 60 2020-12-06 102 127 300.0
6 60 2020-12-07 110 136 374.0
7 450 2020-12-08 104 134 253.3
8 30 2020-12-09 109 133 195.1
9 60 2020-12-10 98 124 269.0
10 60 2020-12-11 103 147 329.3
11 60 2020-12-12 100 120 250.7
12 60 2020-12-12 100 120 250.7
13 60 2020-12-13 106 128 345.3
14 60 2020-12-14 104 132 379.3
15 60 2020-12-15 98 123 275.0
16 60 2020-12-16 98 120 215.2
17 60 2020-12-17 100 120 300.0
18 45 2020-12-18 90 112 NaN
19 60 2020-12-19 103 123 323.0
20 45 2020-12-20 97 125 243.0
21 60 2020-12-21 108 131 364.2
22 45 NaT 100 119 282.0
23 60 2020-12-23 130 101 300.0
24 45 2020-12-24 105 132 246.0
25 60 2020-12-25 102 126 334.5
26 60 2020-12-26 100 120 250.0
27 60 2020-12-27 92 118 241.0
28 60 2020-12-28 103 132 NaN
29 60 2020-12-29 100 132 280.0
30 60 2020-12-30 102 129 380.3
31 60 2020-12-31 92 115 243.0
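The conversion can be done with pandas' to_datetime() function. A minimal sketch, assuming the dates are plain strings in year/month/day form (the quotes stored in the CSV file above are left out here), with a few rows typed in by hand:

```python
import pandas as pd

# Rows 21-23 of the data above, typed in by hand; None marks the missing date
df = pd.DataFrame({
    "Duration": [60, 45, 60],
    "Date": ["2020/12/21", None, "2020/12/23"],
    "Calories": [364.2, 282.0, 300.0],
})

# Convert the strings to datetime values; the missing entry becomes NaT
df["Date"] = pd.to_datetime(df["Date"], format="%Y/%m/%d")
print(df)

# Rows whose date could not be recovered can then be removed
df = df.dropna(subset=["Date"])
print(len(df))  # 2
```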

Plotting
Pandas uses the plot() method to create diagrams.
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv('/content/1.csv')
df.plot()
plt.show()

Scatter Plot
Specify that you want a scatter plot with the kind argument:
kind = 'scatter'
A scatter plot needs an x- and a y-axis.
In the example below we will use "Duration" for the x-axis and "Calories" for the y-axis.
Include the x and y arguments like this:
x = 'Duration', y = 'Calories'

import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv('/content/1.csv')
df.plot(kind = 'scatter', x = 'Duration', y = 'Calories')
plt.show()

Result:
Ex. No: 04 Reading data from text files and exploring various commands

Aim

Procedure
Python provides built-in functions for creating, writing, and reading files. Python can handle
two types of files: normal text files and binary files (written in binary language, 0s and 1s).
Text files: In this type of file, each line of text is terminated with a special character called
EOL (End of Line), which is the newline character ('\n') in Python by default.
Binary files: In this type of file, there is no line terminator, and the data is stored after
converting it into machine-understandable binary language.
There are six commonly used access modes in Python.
Read Only ('r'): Open a text file for reading. The file must exist.
Read and Write ('r+'): Open the file for reading and writing. The file must exist.
Write Only ('w'): Open the file for writing. The file is created if it does not exist and emptied if it does.
Write and Read ('w+'): Open the file for reading and writing. The file is created or emptied as with 'w'.
Append Only ('a'): Open the file for writing at the end. The file is created if it does not exist; existing data is kept.
Append and Read ('a+'): Open the file for reading and appending. The file is created if it does not exist.
There are three ways to read data from a text file.
read(): Returns the data read as a string. If n is given, reads at most n characters; otherwise
reads the entire file.
File_object.read([n])
readline(): Reads one line of the file and returns it as a string. If n is given, reads at most
n characters, but never more than one line, even if n exceeds the length of the line.
File_object.readline([n])
readlines(): Reads all the lines and returns them as a list, with each line as a string element.
File_object.readlines()
Program
# Program to show various ways to read and
# write data in a file.
file1 = open("myfile.txt", "w")
L = ["This is CSE Department \n", "Testing Line 2 \n", "Testing Line 3 \n"]
# \n is placed to indicate EOL (End of Line)
file1.write("Hello \n")
file1.writelines(L)
file1.close()  # close the file before reopening it in a different access mode
file1 = open("myfile.txt", "r+")
print("Output of Read function is ")
print(file1.read())
print()

# seek(0) moves the file position back to the beginning of the file
file1.seek(0)
print("Output of Readline function is ")
print(file1.readline())
print()
file1.seek(0)

# To show the difference between read and readline
print("Output of Read(9) function is ")
print(file1.read(9))
print()
file1.seek(0)
print("Output of Readline(9) function is ")
print(file1.readline(9))

file1.seek(0)
# readlines function
print("Output of Readlines function is ")
print(file1.readlines())
print()
file1.close()
Output
Output of Read function is
Hello
This is CSE Department
Testing Line 2
Testing Line 3


Output of Readline function is
Hello


Output of Read(9) function is
Hello
Th

Output of Readline(9) function is
Hello

Output of Readlines function is
['Hello \n', 'This is CSE Department \n', 'Testing Line 2 \n', 'Testing Line 3 \n']

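A more common style is to open files with a with statement, which closes the file automatically, even if an error occurs. The sketch below also shows the difference between 'w' (overwrite) and 'a' (append), and reads the file by iterating over it line by line. It writes to a temporary folder (an assumption for this sketch) so that no existing myfile.txt is overwritten.

```python
import os
import tempfile

# Work in a temporary folder so no existing file is overwritten
path = os.path.join(tempfile.mkdtemp(), "myfile.txt")

# 'w' creates the file (or empties it if it already exists)
with open(path, "w") as f:
    f.write("Hello \n")
    f.writelines(["This is CSE Department \n", "Testing Line 2 \n"])

# 'a' adds to the end of the file instead of overwriting it
with open(path, "a") as f:
    f.write("Testing Line 3 \n")

# Iterating over the file object reads one line at a time
with open(path, "r") as f:
    lines = [line.rstrip("\n") for line in f]

print(lines)
```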
Result:
Ex.No 5. a. Use the diabetes data set from UCI and Pima Indians Diabetes data set for
performing the following: Univariate analysis: Frequency, Mean, Median, Mode,
Variance, Standard Deviation, Skewness and Kurtosis.

Aim:

Procedure
• Download a dataset such as the Pima Indians Diabetes dataset, save it on any drive, and load
it for processing.
• Frequency: the value_counts() method counts how often each value occurs in a column.
• The mean() method calculates the mean (average) of the values.
• The median() method calculates the median (middle value) of the given data set.
• The mode() method returns the mode, the value that appears most often.
• The var() method calculates the variance for each column.
• The std() method calculates the standard deviation, a number that describes how spread out
the values are.
• The skew() method calculates the skew for each column. Skewness refers to a
distortion or asymmetry that deviates from the symmetrical bell curve, or normal
distribution, in a set of data.
Kurtosis:
It is also a statistical term and an important characteristic of frequency distribution. It
determines whether a distribution is heavy-tailed in respect of the normal distribution. It
provides information about the shape of a frequency distribution.

Program:
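import pandas as pd
from scipy.stats import kurtosis

df = pd.read_csv(r'd:\diabetes.csv')  # adjust the path to where the file is saved
print(df)
df1 = df[['Age', 'Glucose']]          # keep only the two columns to analyse
print(df1)
print("Frequency of each Age value:")
print(df1['Age'].value_counts())
print("Mean:")
print(df1.mean())
print("Median:")
print(df1.median())
print("Mode:")
print(df1.mode())
print("Variance:")
print(df1.var())
print("Standard deviation:")
print(df1.std())
print("Skewness:")
print(df1.skew())
print("Kurtosis:")
print(kurtosis(df1, axis=0, bias=True))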

Dataset download link

https://github.com/npradaschnor/Pima-Indians-Diabetes-Dataset/blob/master/Pima%20Indians%20Diabetes%20Dataset.ipynb

Result:
Ex.No: 5 b. Linear Regression and Logistic Regression with the Diabetes Dataset Using
Python Machine Learning
Aim

Procedure
• Load the sklearn libraries.
• Load the diabetes dataset.
• Split the dataset into training and testing sets.
• Create the Linear Regression and Logistic Regression models.
• Make predictions using the testing set.
• Find the coefficients and the mean squared error.

Program
import numpy as np
from sklearn import datasets, linear_model
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.linear_model import LogisticRegression

# To calculate accuracy measures and the confusion matrix
from sklearn import metrics

diabetes_X, diabetes_y = datasets.load_diabetes(return_X_y=True)
# Use only one feature (column 2, body mass index)
diabetes_X = diabetes_X[:, np.newaxis, 2]

# Split the data into training/testing sets (last 20 samples for testing)
diabetes_X_train = diabetes_X[:-20]
diabetes_X_test = diabetes_X[-20:]
# Split the targets into training/testing sets
diabetes_y_train = diabetes_y[:-20]
diabetes_y_test = diabetes_y[-20:]

# Create linear regression object
regr = linear_model.LinearRegression()
# Train the model using the training sets
regr.fit(diabetes_X_train, diabetes_y_train)
# Make predictions using the testing set
diabetes_y_pred = regr.predict(diabetes_X_test)

# The coefficients
print('Coefficients: \n', regr.coef_)
# The mean squared error
print('Mean squared error: %.2f'
      % mean_squared_error(diabetes_y_test, diabetes_y_pred))
# The coefficient of determination: 1 is perfect prediction
print('Coefficient of determination: %.2f'
      % r2_score(diabetes_y_test, diabetes_y_pred))

# Logistic regression is a classifier, but the diabetes target is a continuous
# disease-progression score. As an assumption for this exercise, turn it into
# two classes: 1 if the score is above the training median, else 0.
threshold = np.median(diabetes_y_train)
y_train_cls = (diabetes_y_train > threshold).astype(int)
y_test_cls = (diabetes_y_test > threshold).astype(int)

# Create and train the logistic regression object
Logistic_model = LogisticRegression()
Logistic_model.fit(diabetes_X_train, y_train_cls)

# Predict on the testing set and evaluate
y_predict = Logistic_model.predict(diabetes_X_test)
print('Accuracy: %.2f' % metrics.accuracy_score(y_test_cls, y_predict))
print('Confusion matrix:\n', metrics.confusion_matrix(y_test_cls, y_predict))

Output
Coefficients: [938.23786125]
Mean squared error: 2548.07
Coefficient of determination: 0.47

Result:
Ex. No: 5 c. Use the diabetes data set from UCI and Pima Indians Diabetes data set for
performing the following: Multiple Regression
Aim

Procedure
• The pandas module allows us to read CSV files and returns a DataFrame object.
• Then make a list of the independent values and call this variable X.
• Put the dependent values in a variable called y.
• From the sklearn module, use the LinearRegression() method to create a linear
regression object.
• This object has a method called fit() that takes the independent and dependent values
as parameters and fills the regression object with data that describes the relationship.
• The fitted regression object is now ready to predict a person's Age from their
Glucose and BloodPressure values.
Program
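import pandas as pd
from sklearn import linear_model

df = pd.read_csv(r'd:\diabetes.csv')  # adjust the path to where the file is saved
print(df)
X = df[['Glucose', 'BloodPressure']]   # independent variables
y = df['Age']                          # dependent variable
regr = linear_model.LinearRegression()
regr.fit(X, y)
# Predict the age for Glucose = 150 and BloodPressure = 13; a DataFrame with the
# same column names avoids a feature-name warning from scikit-learn
new_values = pd.DataFrame([[150, 13]], columns=['Glucose', 'BloodPressure'])
predictedage = regr.predict(new_values)
print(predictedage)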

Output
[28.77214401]

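A fitted LinearRegression model stores one coefficient per independent variable in coef_ and the constant term in intercept_, and a prediction is intercept_ + coef_[0]*x1 + coef_[1]*x2. The sketch below checks this on made-up data generated from y = 5 + 2*x1 + 3*x2, so the fitted values should come out very close to 5, 2 and 3. The column names x1 and x2 are placeholders for this sketch.

```python
import numpy as np
import pandas as pd
from sklearn import linear_model

# Made-up data that follows y = 5 + 2*x1 + 3*x2 exactly
X = pd.DataFrame({"x1": [1, 2, 3, 4, 5, 6], "x2": [2, 1, 4, 3, 6, 5]})
y = 5 + 2 * X["x1"] + 3 * X["x2"]

regr = linear_model.LinearRegression()
regr.fit(X, y)
print("Intercept:", regr.intercept_)
print("Coefficients:", regr.coef_)

# Predict for a new point, and recompute the same value by hand
new_point = pd.DataFrame({"x1": [10], "x2": [20]})
manual = regr.intercept_ + np.dot(regr.coef_, [10, 20])
print(regr.predict(new_point), manual)   # both about 5 + 20 + 60 = 85
```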
Result:
Ex.No. 5 d. Also compare the results of the above analysis for the two data sets.
Aim

Procedure
Step 1: Prepare the two datasets to be compared.
Step 2: Create the two DataFrames by reading each CSV file.
Step 3: Compare the values between the two pandas DataFrames.
• In this step, you'll need to import the NumPy package.
• The first CSV file, car1.csv, contains the columns Model, City, Year, and amount.
• The second CSV file, car2.csv, contains the prices to compare in a column named amount1.
Program
import pandas as pd
import numpy as np

df1 = pd.read_csv(r'd:\car1.csv')   # read_csv already returns a DataFrame
df2 = pd.read_csv(r'd:\car2.csv')
# Copy the second file's prices next to the first file's prices (rows matched by position)
df1['amount1'] = df2['amount1']
# True where the two prices are equal
df1['prices_match'] = np.where(df1['amount'] == df1['amount1'], True, False)
# Difference between the two prices (0 when they match)
df1['price_diff'] = df1['amount'] - df1['amount1']
print(df1)
Output
    Model     City  Year   amount  amount1  prices_match  price_diff
0  Maruti  Chennai  2022   600000   600000          True           0
1  Hyndai  Chennai  2022   700000   700000          True           0
2    Ford  Chennai  2022   800000   850000         False      -50000
3     Kia  Chennai  2022   900000   900000          True           0
4     XL6  Chennai  2022  1000000  1000000          True           0
5    Tata  Chennai  2022  1100000  1150000         False      -50000
6    Audi  Chennai  2022  1200000  1200000          True           0
7  Ertiga  Chennai  2022  1300000  1300000          True           0

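When the two files have exactly the same rows and columns, pandas' DataFrame.compare() (available since pandas 1.1) lists only the cells that differ, side by side as "self" (the first DataFrame) and "other" (the second). A minimal sketch with a few hypothetical prices, where both DataFrames use the column name amount:

```python
import pandas as pd

# Hypothetical prices for the same three models in two files
df1 = pd.DataFrame({"Model": ["Maruti", "Ford", "Tata"], "amount": [600000, 800000, 1100000]})
df2 = pd.DataFrame({"Model": ["Maruti", "Ford", "Tata"], "amount": [600000, 850000, 1150000]})

# compare() needs identically labelled DataFrames (same index and columns);
# it keeps only the rows and columns that contain differences
diff = df1.compare(df2)
print(diff)
```

Here only rows 1 and 2 of the amount column appear, since the Model column and row 0 are identical in both DataFrames.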
Result:
Ex. No: 06 Apply and explore various plotting functions

Aim:

Program:
Histogram Plotting

import matplotlib.pyplot as plt
import numpy as np
# 250 random values from a normal distribution with mean 170 and standard deviation 10
x = np.random.normal(170, 10, 250)
plt.hist(x)
plt.show()

Line Plotting
import matplotlib.pyplot as plt
import numpy as np
# Draws a straight line from the point (0, 0) to the point (6, 250)
xpoints = np.array([0, 6])
ypoints = np.array([0, 250])
plt.plot(xpoints, ypoints)
plt.show()

Scatter Plotting
import matplotlib.pyplot as plt
import numpy as np
x = np.array([5, 7, 8, 7, 2, 17, 2, 9, 4, 11, 12, 9, 6])
y = np.array([99, 86, 87, 88, 111, 86, 103, 87, 94, 78, 77, 85, 86])
# Draws one dot for each (x[i], y[i]) pair
plt.scatter(x, y)
plt.show()

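Plots are easier to read with a title and axis labels, and they can be saved to an image file instead of (or as well as) being shown on screen. A minimal sketch using the scatter data above; the labels and the file name scatter.png are placeholders, and the non-interactive Agg backend is selected so the script also runs on a machine without a display.

```python
import matplotlib
matplotlib.use("Agg")            # non-interactive backend; works without a display
import matplotlib.pyplot as plt
import numpy as np

x = np.array([5, 7, 8, 7, 2, 17, 2, 9, 4, 11, 12, 9, 6])
y = np.array([99, 86, 87, 88, 111, 86, 103, 87, 94, 78, 77, 85, 86])

fig, ax = plt.subplots()
ax.scatter(x, y)
ax.set_title("Scatter plot")
ax.set_xlabel("x values")
ax.set_ylabel("y values")
fig.savefig("scatter.png")       # writes the figure to an image file in the current folder
plt.close(fig)                   # free the figure's memory
```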
Result:

85 pages
Installing ANSYS 17.0 on Mac Guide
100% (1)
Installing ANSYS 17.0 on Mac Guide
1 page
Virtual Lab Mapping For B. Tech in Information Technology: Subject Code Subject Name List of Experiment V-Lab
No ratings yet
Virtual Lab Mapping For B. Tech in Information Technology: Subject Code Subject Name List of Experiment V-Lab
11 pages
Chapter 3 Discrete Mathematics and Combinatorics
No ratings yet
Chapter 3 Discrete Mathematics and Combinatorics
42 pages
Poornesh Resume
No ratings yet
Poornesh Resume
2 pages
Smart Access Control System Overview
No ratings yet
Smart Access Control System Overview
8 pages
Bresadkjfje
No ratings yet
Bresadkjfje
22 pages
Cb3402 Unit 1 Notes
No ratings yet
Cb3402 Unit 1 Notes
43 pages
Structural Engineering Thesis
No ratings yet
Structural Engineering Thesis
96 pages
Volume 3 of 3 - SS PTSIREP P2
No ratings yet
Volume 3 of 3 - SS PTSIREP P2
314 pages
ILM-DS User Manual
No ratings yet
ILM-DS User Manual
40 pages
Android Captcha Debug Log
No ratings yet
Android Captcha Debug Log
5 pages
Csat (Pyq) : Previous Year Questions
No ratings yet
Csat (Pyq) : Previous Year Questions
8 pages
A1 Essay Plan Template
No ratings yet
A1 Essay Plan Template
6 pages
CISA Cloud Security Technical Reference
100% (1)
CISA Cloud Security Technical Reference
46 pages
Bernard M.E. Moret Addison-Wesley-Longman, 1998: The Theory of Computation
No ratings yet
Bernard M.E. Moret Addison-Wesley-Longman, 1998: The Theory of Computation
33 pages
SMC-00 Datasheet v1.7
No ratings yet
SMC-00 Datasheet v1.7
2 pages
Chapter 4 - IoT
No ratings yet
Chapter 4 - IoT
57 pages
Understanding Time Complexity Analysis
No ratings yet
Understanding Time Complexity Analysis
8 pages
Why Engines Overrun
No ratings yet
Why Engines Overrun
15 pages
Tidua 44 A
No ratings yet
Tidua 44 A
88 pages
205 (Number) : in Mathematics
No ratings yet
205 (Number) : in Mathematics
7 pages
Morphological Operations in Computer Vision
No ratings yet
Morphological Operations in Computer Vision
24 pages

Fds Lab Manual

Uploaded by

Fds Lab Manual

Uploaded by

Programs and output:

0-D array with
Program: import numpy as np
Output: 42

1-D array
Program: import numpy as np
Output: [1 2 3 4 5]

2-D array
Program: import numpy as np
Output: [[1 2 3]

Check how many
Program: import numpy as np
Output: 0
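The manual shows only the first line of each array-creation program. Here is a minimal sketch of what those programs do. The values are assumptions: the 2-D array is taken to have the rows [1 2 3] and [4 5 6], because the manual's output stops after the first row. The ndim attribute reports the number of dimensions, which is what the "Check how many" row prints (0 for the 0-D array).

```python
import numpy as np

# 0-D array: a single scalar value wrapped in an ndarray
a = np.array(42)
# 1-D array: a flat sequence of values
b = np.array([1, 2, 3, 4, 5])
# 2-D array: a list of equal-length rows (second row is a hypothetical value)
c = np.array([[1, 2, 3], [4, 5, 6]])

print(a)   # 42
print(b)   # [1 2 3 4 5]
print(c)   # [[1 2 3]
           #  [4 5 6]]

# ndim reports how many dimensions each array has
print(a.ndim, b.ndim, c.ndim)   # 0 1 2
```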

NumPy Array
Program: import numpy as np
Output: 2

Access the
Program: import numpy as np
Output: 2nd element on 1st row: 2

Access the
Program: import numpy as np
Output: 5th element on 2nd row:
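Here is a sketch of NumPy indexing consistent with the outputs above. The arrays are hypothetical: a 1-D array [1, 2, 3, 4] and a 2 x 5 array holding 1 to 10. The manual's output for the 5th element on the 2nd row is cut off, so the 10 printed here follows only from the assumed array. Indices start at 0, and a 2-D element is addressed as [row, column].

```python
import numpy as np

# Hypothetical 1-D array; indexing starts at 0, so arr[1] is the 2nd element
arr = np.array([1, 2, 3, 4])
print(arr[1])   # 2

# Hypothetical 2-D array with 2 rows and 5 columns
arr2d = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])
# For 2-D arrays, give the row index and the column index separated by a comma
print(f"2nd element on 1st row: {arr2d[0, 1]}")   # 2nd element on 1st row: 2
print(f"5th element on 2nd row: {arr2d[1, 4]}")   # 5th element on 2nd row: 10
# Negative indices count from the end of an axis
print(arr2d[1, -1])   # 10
```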

Slicing arrays
Program: import numpy as np
Output: [2 3 4 5]
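Here is a minimal slicing sketch. It assumes the source array is [1, 2, 3, 4, 5, 6, 7], one array for which arr[1:5] gives the [2 3 4 5] shown above. A slice arr[start:end:step] includes start and excludes end.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6, 7])   # hypothetical input array
# arr[start:end] includes index start and excludes index end
print(arr[1:5])    # [2 3 4 5]
# Omitting start or end slices from the beginning or up to the end
print(arr[4:])     # [5 6 7]
print(arr[:4])     # [1 2 3 4]
# A step value picks every n-th element
print(arr[::2])    # [1 3 5 7]
```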

NumPy Array
Program: import numpy as np
Output: [42 2 3 4 5]

Program: import numpy as np
Output: [42 2 3 4 5]
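The two rows above look like the usual copy-versus-view pair. That reading is an assumption, since the programs themselves are not shown. Here is a sketch, assuming a starting array [1, 2, 3, 4, 5]:

- A copy owns its own data, so later changes to the original do not reach it.
- A view shares memory with the original, so changes show up in both.

NumPy pads printed elements to equal width, so the modified array actually prints as [42  2  3  4  5].

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])   # hypothetical starting array
x = arr.copy()   # independent data: later changes to arr do not reach x
y = arr.view()   # shares arr's memory: changes to arr show up in y

arr[0] = 42
print(arr)   # [42  2  3  4  5]
print(x)     # [1 2 3 4 5]
print(y)     # [42  2  3  4  5]

# base is None for an array that owns its data, and points at the owner for a view
print(x.base is None, y.base is arr)   # True True
```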

Reshaping arrays
Program: import numpy as np
Output: [[ 1 2 3]
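Here is a reshaping sketch. It assumes a 12-element array reshaped into 4 rows of 3 columns, which is consistent with the first output row [[ 1 2 3]; the remaining rows are not in the manual. A reshape must keep the total element count unchanged. Passing -1 for one dimension lets NumPy infer that dimension.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])   # hypothetical input
# 12 elements can be arranged as 4 rows of 3 columns
newarr = arr.reshape(4, 3)
print(newarr)
# [[ 1  2  3]
#  [ 4  5  6]
#  [ 7  8  9]
#  [10 11 12]]

# -1 lets NumPy infer that dimension from the element count (12 / 2 = 6)
print(arr.reshape(2, -1).shape)   # (2, 6)
```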

Create a simple Pandas Series from a dictionary:
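Here is a minimal sketch of building a Series from a dictionary; the calorie values are hypothetical. The dictionary keys become the Series index labels, so each value can be looked up by key.

```python
import pandas as pd

# Hypothetical data: calories burned per day
calories = {"day1": 420, "day2": 380, "day3": 390}
# The dictionary keys become the Series index labels
myvar = pd.Series(calories)
print(myvar)
print(myvar["day2"])   # 380
```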

Create a DataFrame from two Series:
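One way to combine two Series into a DataFrame is to pass a dictionary of Series to pd.DataFrame; pandas aligns the Series on their shared index. The column names and values below are hypothetical.

```python
import pandas as pd

# Two hypothetical Series sharing the default index 0, 1, 2
calories = pd.Series([420, 380, 390])
duration = pd.Series([50, 40, 45])

# Each dictionary key becomes a column label; rows are aligned by index
df = pd.DataFrame({"calories": calories, "duration": duration})
print(df)
```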

Data cleaning means fixing bad data in your data set.

Bad data could be:

Replace Empty Values

Duration Date Pulse Maxpulse Calories
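The manual's data file is not included here, so this sketch uses a small hypothetical DataFrame with the same columns. It shows the two usual ways to handle empty cells:

- Drop the affected rows with dropna().
- Fill the empty cells with a chosen value using fillna(). Passing a dictionary limits the fill to named columns.

```python
import numpy as np
import pandas as pd

# Small hypothetical stand-in for the manual's data set
df = pd.DataFrame({
    "Duration": [60, 60, 60, 45],
    "Date": ["2020/12/01", "2020/12/02", None, "2020/12/04"],
    "Pulse": [110, 117, 103, 109],
    "Maxpulse": [130, 145, 135, 175],
    "Calories": [409.1, np.nan, 340.0, 282.4],
})

# Option 1: drop every row that contains an empty cell (returns a new DataFrame)
cleaned = df.dropna()
print(cleaned)

# Option 2: keep all rows and fill empty Calories cells with a fixed value
filled = df.fillna({"Calories": 130})
print(filled)
```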

Replace Using Mean, Median, or Mode
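Here is a sketch of filling empty cells with a column statistic, using a hypothetical Calories column. The statistics behave as follows:

- mean() and median() skip NaN values when they are computed.
- mode() returns a Series, because a column can have several most-frequent values, so the first one is taken.

```python
import numpy as np
import pandas as pd

# Hypothetical Calories column with two empty cells
df = pd.DataFrame({"Calories": [409.1, np.nan, 340.0, 282.4, np.nan, 340.0]})

mean_value = df["Calories"].mean()        # NaN values are skipped
median_value = df["Calories"].median()
mode_value = df["Calories"].mode()[0]     # first of the most frequent values

# Each strategy is applied to a separate copy so the results can be compared
by_mean = df["Calories"].fillna(mean_value)
by_median = df["Calories"].fillna(median_value)
by_mode = df["Calories"].fillna(mode_value)
print(mean_value, median_value, mode_value)
```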

Convert Into a Correct Format

Duration Date Pulse Maxpulse Calories
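Here is a sketch of converting a text column to proper dates with pd.to_datetime, assuming dates are written as YYYY/MM/DD; the values are hypothetical. With errors="coerce", any entry that does not match the format becomes NaT (the datetime equivalent of NaN). Those rows can then be dropped.

```python
import pandas as pd

# Hypothetical Date column: one entry is not a valid date, one is missing
df = pd.DataFrame({
    "Duration": [60, 60, 45, 45],
    "Date": ["2020/12/01", "2020/12/02", "not a date", None],
})

# Parse strings into datetime64 values; unparseable or missing entries become NaT
df["Date"] = pd.to_datetime(df["Date"], format="%Y/%m/%d", errors="coerce")

# Remove rows whose Date is still empty after conversion
df = df.dropna(subset=["Date"])
print(df)
print(df.dtypes)
```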

# seek(n) takes the file handle to the nth

# To show difference between read and readline

Output of Readline function is

Output of Read(9) function is

Output of Readline(9) function is

Output of Readlines function is
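Here is a self-contained sketch of the read/readline comparison. It first writes a small sample file; the file name and its lines are hypothetical. seek(0) rewinds to the start of the file before each read.

The key difference between read(9) and readline(9) is where they stop:

- read(9) returns up to 9 characters and crosses line breaks.
- readline(9) stops at the first newline even when that comes before 9 characters.

```python
import os
import tempfile

# Create a small hypothetical sample file so the example is self-contained
lines = ["Hello\n", "This is Delhi\n", "This is Paris\n", "This is London\n"]
path = os.path.join(tempfile.mkdtemp(), "myfile.txt")
with open(path, "w") as f:
    f.writelines(lines)

with open(path, "r") as f:
    whole = f.read()            # the entire file as one string
    f.seek(0)                   # rewind to the start before the next read
    first_line = f.readline()   # one line, including its trailing '\n'
    f.seek(0)
    first9 = f.read(9)          # at most 9 characters, crossing line breaks
    f.seek(0)
    line9 = f.readline(9)       # at most 9 characters, stopping at '\n'
    f.seek(0)
    all_lines = f.readlines()   # a list with every line of the file

print("Output of Read function is")
print(whole)
print("Output of Readline function is")
print(first_line)
print("Output of Read(9) function is")
print(first9)
print("Output of Readline(9) function is")
print(line9)
print("Output of Readlines function is")
print(all_lines)
```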

Dataset download link

# To calculate accuracy measures and confusion matrix

# Split the data into training/testing sets

# Create linear regression object

# Train the model using the training sets

# Make predictions using the testing set
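The comments above follow the usual scikit-learn linear-regression workflow. The manual's dataset link is not given, so this sketch assumes the diabetes dataset bundled with scikit-learn and keeps a single feature (column 2). It holds out the last 20 samples for testing and reports the mean squared error and R^2 on that test set.

```python
import numpy as np
from sklearn import datasets, linear_model
from sklearn.metrics import mean_squared_error, r2_score

# Bundled diabetes dataset (stand-in for the manual's dataset); keep one feature
X, y = datasets.load_diabetes(return_X_y=True)
X = X[:, np.newaxis, 2]

# Split the data into training/testing sets (last 20 samples held out)
X_train, X_test = X[:-20], X[-20:]
y_train, y_test = y[:-20], y[-20:]

# Create linear regression object
regr = linear_model.LinearRegression()

# Train the model using the training sets
regr.fit(X_train, y_train)

# Make predictions using the testing set
y_pred = regr.predict(X_test)

print("Coefficient:", regr.coef_)
print("Mean squared error: %.2f" % mean_squared_error(y_test, y_pred))
print("Coefficient of determination (R^2): %.2f" % r2_score(y_test, y_pred))
```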

# Create logistic regression object
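Here is a sketch of logistic regression with the accuracy and confusion-matrix measures mentioned above. It assumes the iris dataset bundled with scikit-learn in place of the manual's dataset and a 70/30 stratified split with a fixed random_state. Accuracy equals the diagonal of the confusion matrix divided by the number of test samples. max_iter is raised so the default solver has room to converge.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

# Iris (bundled with scikit-learn) stands in for the manual's dataset
X, y = load_iris(return_X_y=True)

# Split the data into training/testing sets (30% test, stratified by class,
# fixed random_state so the split is repeatable)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# Create logistic regression object; max_iter raised to help convergence
clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

# Accuracy: fraction of test samples whose predicted class is correct
acc = accuracy_score(y_test, y_pred)
# Confusion matrix: row i, column j counts samples of true class i predicted as class j
cm = confusion_matrix(y_test, y_pred)

print("Accuracy:", acc)
print("Confusion matrix:")
print(cm)
print(classification_report(y_test, y_pred))
```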

import matplotlib.pyplot as plt
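The plotting program is cut off after its import line. Here is a minimal matplotlib sketch of a common lab plot: a scatter of hypothetical data points with a least-squares line from np.polyfit. It selects the non-interactive "Agg" backend and saves to an in-memory buffer so it runs without a display. When working interactively, plt.show() would be used instead.

```python
import io

import matplotlib
matplotlib.use("Agg")   # non-interactive backend: render without opening a window
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical data points for illustration
x = np.array([1, 2, 3, 4, 5])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

fig, ax = plt.subplots()
ax.scatter(x, y, color="black", label="data")
# Overlay a least-squares straight line (degree-1 polynomial fit)
slope, intercept = np.polyfit(x, y, 1)
ax.plot(x, slope * x + intercept, color="blue", linewidth=2, label="fit")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.legend()

# Save the figure as PNG bytes in memory, then release it
buf = io.BytesIO()
fig.savefig(buf, format="png")
plt.close(fig)
```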