21AD321 - PRINCIPLES OF DATA SCIENCE
LABORATORY PRACTICAL RECORD
DEPARTMENT OF ARTIFICIAL INTELLIGENCE
AND DATA SCIENCE
SRI SHAKTHI
INSTITUTE OF ENGINEERING AND TECHNOLOGY
An Autonomous Institution, Affiliated to Anna University
Accredited by NAAC with “A” Grade
COIMBATORE – 62
DECEMBER 2024
SRI SHAKTHI
INSTITUTE OF ENGINEERING AND TECHNOLOGY
COIMBATORE - 62.
DEPARTMENT OF ARTIFICIAL INTELLIGENCE AND DATA SCIENCE
CERTIFICATE
Certified that this is a bonafide record of practical work done
by Mr. /Ms. bearing Register Number of
Third Semester Bachelor of Technology in Artificial Intelligence and Data
Science in the 21AD321 Principles of Data Science Laboratory during the
Academic Year 2024-2025 under our supervision.
Place: Coimbatore
Date:
Staff In-Charge Head of the Department
Submitted for the End Semester Practical Examination held on ………………
Internal Examiner External Examiner
EXP. NO   DATE   LIST OF EXPERIMENTS                                            PAGE NO   MARKS   SIGN
01               NUMPY SIMPLE IMPLEMENTATION
02               PANDAS-INDEXING AND SELECTION
03               VARIABILITY-RANGE
04               IMPLEMENTATION OF NORMAL CURVE
05               FINDING MEAN AND MEDIAN USING PYTHON
06               THE CORRELATION BETWEEN STUDY HOURS AND EXAM SCORES USING PYTHON
07               BASIC OPERATIONS ON A NUMPY ARRAY: MULTIPLICATION AND DIVISION
08               CREATING A DATAFRAME USING PANDAS
09               CREATING A DATAFRAME USING LISTS
10               Z-SCORES CALCULATION
11               AVERAGE NUMPY
12               PYTHON PROGRAM TO FIND MEAN IN AN ARRAY
13               PYTHON PROGRAM TO FIND MEDIAN IN AN ARRAY
14               CUMULATIVE FREQUENCY
15               FREQUENCY OF EACH UNIQUE ELEMENT IN A LIST USING PYTHON
EX:NO:01
NUMPY SIMPLE IMPLEMENTATION
DATE:
AIM:
Develop a python program to create a one-dimensional array from user input
and perform various comparison and arithmetic operations on it.
ALGORITHM:
STEP 1: Import the numpy module as np.
STEP 2: Initialize an empty list called arr.
STEP 3: Take an integer input from the user and store it in a variable n.
STEP 4: Loop from 0 to n-1, and for each iteration:
STEP 4.1: Take an integer input from the user and store it in a variable ele.
STEP 4.2: Append ele to the list arr.
STEP 5: Convert the list arr to a numpy array and store it in a variable arr1.
STEP 6: Print the result of applying the __le__ method on arr1 with 8 as the argument.
STEP 7: Print the result of applying the __lt__ method on arr1 with 10 as the argument.
STEP 8: Print the result of applying the __gt__ method on arr1 with 6 as the argument.
STEP 9: Print the result of applying the __ge__ method on arr1 with 4 as the argument.
STEP 10: Print the result of applying the __eq__ method on arr1 with 4 as the argument.
STEP 11: Print the result of applying the __ne__ method on arr1 with 4 as the argument.
STEP 12: Print the result of applying the np.negative function on arr1.
PROGRAM:
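import numpy as np  # numerical python
arr = []
n = int(input("Enter no of elements - "))
for i in range(0, n):
    ele = int(input("Enter the value - "))
    arr.append(ele)
arr1 = np.array(arr)
print(arr1.__le__(8))    # element-wise arr1 <= 8
print(arr1.__lt__(10))   # element-wise arr1 < 10
print(arr1.__gt__(6))    # element-wise arr1 > 6
print(arr1.__ge__(4))    # element-wise arr1 >= 4
print(arr1.__eq__(4))    # element-wise arr1 == 4
print(arr1.__ne__(4))    # element-wise arr1 != 4
print(np.negative(arr1)) # element-wise negation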
OUTPUT:
RESULT:
Thus, a Python program to create a one-dimensional array from user
input and perform various comparison and arithmetic operations on it is executed
successfully and the output is verified.
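As a hedged aside, the comparison methods used in this experiment are what the ordinary comparison operators call on a NumPy array, so the same checks can usually be written with `<=`, `>` and so on. The sketch below uses a fixed sample array (an assumption, in place of the user input) to show the equivalence.

```python
import numpy as np

# Sample values chosen for illustration; the experiment reads them with input().
arr1 = np.array([2, 4, 6, 8, 10])

le_operator = arr1 <= 8        # element-wise "less than or equal to 8"
le_method = arr1.__le__(8)     # the same comparison through the method form

print(le_operator)
print(np.array_equal(le_operator, le_method))
print(-arr1)                   # unary minus gives the same values as np.negative(arr1)
```

The operator form is generally easier to read; the method form mainly shows what the operator does behind the scenes.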
EX:NO:02
DATE: PANDAS-INDEXING AND SELECTION
AIM:
Develop a python program for indexing and selection using pandas.
ALGORITHM:
STEP 1: Start the program.
STEP 2: Import pandas as pd.
STEP 3: Create a Series and perform indexing and selection operations on it using pandas.
STEP 4: Display the result.
STEP 5: Stop the program.
PROGRAM:
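import pandas as pd
data = pd.Series([0.25, 0.50, 0.75, 1.0], index=['a', 'b', 'c', 'd'])
print(data, '\n')
print(data['b'], '\n')          # label-based lookup (data[1] relies on a positional fallback deprecated in recent pandas)
print(data.loc['b':'c'], '\n')  # label-based slice, end label included
print(data.iloc[1], '\n')       # position-based lookup
print(data.iloc[1:3], '\n')     # position-based slice, end position excluded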
OUTPUT:
RESULT:
Thus, the python program for indexing and selection using pandas is
executed successfully and the output is verified.
EX NO: 03
DATE: VARIABILITY-RANGE
AIM:
To calculate and display the maximum, minimum, and range of the entered
values.
ALGORITHM:
STEP 1: Initialize an empty list lst.
STEP 2: Prompt the user to enter the number of elements (n).
STEP 3: Loop for i from 1 to n:
STEP 3.1: Prompt the user to enter a value (ele).
STEP 3.2: Append ele to the list lst.
STEP 4: Calculate the minimum value using min(lst) and store it in minimum.
STEP 5: Calculate the maximum value using max(lst) and store it in maximum.
STEP 6: Calculate the range by subtracting the minimum from the
maximum and store it in ran.
STEP 7: Print "Maximum is {maximum}".
STEP 8: Print "Minimum is {minimum}".
STEP 9: Print "Range is {ran}".
PROGRAM:
lst = []
n = int(input("Enter no of elements: "))
for i in range(n):
    ele = int(input("Enter value: "))
    lst.append(ele)
minimum = min(lst)
maximum = max(lst)
ran = maximum - minimum
print(f"Maximum is {maximum}")
print(f"Minimum is {minimum}")
print(f"Range is {ran}")
OUTPUT:
RESULT:
The program calculates and displays the maximum, minimum, and range
successfully and the output was verified.
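The same calculation can be wrapped in a small function so that it can be checked without typing values at the prompt. This is only a sketch: the function name `value_range` and the sample list are illustrative assumptions, not part of the recorded program.

```python
def value_range(values):
    """Return (minimum, maximum, range) for a non-empty sequence of numbers."""
    if not values:
        # min() and max() fail on an empty sequence, so report it clearly.
        raise ValueError("at least one value is required")
    minimum = min(values)
    maximum = max(values)
    return minimum, maximum, maximum - minimum


sample = [12, 7, 25, 3, 18]  # illustrative data, not taken from the record
minimum, maximum, ran = value_range(sample)
print(f"Maximum is {maximum}")
print(f"Minimum is {minimum}")
print(f"Range is {ran}")
```

Raising an error for an empty list is one possible design choice; the interactive version would hit the same problem if the user entered 0 for the number of elements.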
EX:NO:04
IMPLEMENTATION OF NORMAL CURVE
DATE:
AIM:
To develop a python program to visualize a normal curve.
ALGORITHM:
STEP 1: Import necessary libraries: `numpy`, `matplotlib.pyplot`,
`scipy.stats.norm`, and `statistics`.
STEP 2: Define the x-axis range using `np.arange()` from -20 to 20 with a step
of 0.01.
STEP 3: Calculate the mean of the x-axis values using `statistics.mean()`.
STEP 4: Calculate the standard deviation of the x-axis values using
`statistics.stdev()`.
STEP 5: Generate y-axis values using `norm.pdf()` with the x-axis, mean, and
standard deviation as parameters.
STEP 6: Plot the x-axis against the y-axis using `plt.plot()`.
STEP 7: Set the title of the plot to "First Curve" using
`plt.title()`.
STEP 8: Display the plot using `plt.show()`.
PROGRAM:
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm
import statistics
x_axis = np.arange(-20,20,0.01)
mean = statistics.mean(x_axis)
sd = statistics.stdev(x_axis)
plt.plot(x_axis,norm.pdf(x_axis,mean,sd))
plt.title("First Curve")
plt.show()
OUTPUT:
RESULT:
Thus, a Python program to visualize a normal curve has been executed
and the output has been verified.
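For reference, the curve drawn by `norm.pdf()` follows the normal density formula f(x) = 1 / (σ·√(2π)) · exp(−(x − μ)² / (2σ²)). The sketch below evaluates that formula by hand and compares it with `statistics.NormalDist` from the standard library; the helper name `normal_pdf` is an assumption made for illustration.

```python
import math
from statistics import NormalDist


def normal_pdf(x, mean, sd):
    """Evaluate the normal probability density at x for the given mean and sd."""
    coefficient = 1.0 / (sd * math.sqrt(2.0 * math.pi))
    exponent = -((x - mean) ** 2) / (2.0 * sd ** 2)
    return coefficient * math.exp(exponent)


# Peak of the standard normal curve (mean 0, standard deviation 1).
peak = normal_pdf(0.0, 0.0, 1.0)
reference = NormalDist(mu=0.0, sigma=1.0).pdf(0.0)
print(peak, reference)

# The curve is symmetric about the mean.
print(math.isclose(normal_pdf(-1.5, 0.0, 1.0), normal_pdf(1.5, 0.0, 1.0)))
```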
EX.NO:05 FINDING MEAN AND MEDIAN USING PYTHON
DATE:
AIM:
To calculate the mean and median of an array of numbers using NumPy.
ALGORITHM:
STEP 1: Create an array with the values [55, 65, 75, 85, 95, 105, 115].
STEP 2: Use the NumPy function to sum all elements in the array and
divide by the total number of elements.
STEP 3: Use the NumPy function to sort the array and find the middle
value.
STEP 4: If the array has an even number of elements, average the two
middle values to find the median.
STEP 5: Save the calculated mean and median in variables.
STEP 6: Print the mean and median values to the console.
STEP 7: Terminate the program after displaying the results.
PROGRAM:
import numpy as np
arr = np.array([55, 65, 75, 85, 95, 105, 115])
mean_value = np.mean(arr)
median_value = np.median(arr)
print(f"The mean of the array is: {mean_value}")
print(f"The median of the array is: {median_value}")
OUTPUT:
The mean of the array is: 85.0
The median of the array is: 85.0
RESULT:
The program successfully calculates the mean and median of the array [55, 65,
75, 85, 95, 105, 115] using NumPy.
The mean is 85.0 and the median is also 85.0 .
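Step 4 of the algorithm covers the even-length case, which the seven-element array never exercises. As a hedged cross-check, the sketch below uses the standard-library `statistics` module on the original array and on a six-element variant (the shortened list is an assumption added for illustration).

```python
import statistics

odd_values = [55, 65, 75, 85, 95, 105, 115]   # array from the experiment
even_values = [55, 65, 75, 85, 95, 105]       # illustrative even-length variant

mean_odd = statistics.mean(odd_values)
median_odd = statistics.median(odd_values)    # middle element of the sorted data
median_even = statistics.median(even_values)  # average of the two middle elements

print(f"Mean (odd length): {mean_odd}")
print(f"Median (odd length): {median_odd}")
print(f"Median (even length): {median_even}")
```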
EX.NO:06
DATE: THE CORRELATION BETWEEN STUDY HOURS AND
EXAM SCORES USING PYTHON
AIM:
To calculate the correlation between study hours and exam scores using the given
datasets.
ALGORITHM:
STEP 1: Create an array representing study hours: [1,2,3,4,5].
STEP 2: Create an array representing exam scores: [50,60,70,80,90].
STEP 3: Import the NumPy library to utilize its functions for calculations.
STEP 4: Use NumPy’s corrcoef function to compute the correlation coefficient
between the two arrays.
STEP 5: Retrieve the correlation value from the output of the correlation
coefficient calculation.
STEP 6: Print the correlation coefficient to the console for interpretation.
STEP 7: Assess the correlation value to determine the strength (positive, negative,
or none) and direction of the relationship between study hours and exam scores.
STEP 8: Terminate the program after displaying and interpreting the results.
PROGRAM:
import numpy as np
study_hours = np.array([1, 2, 3, 4, 5])
exam_scores = np.array([50, 60, 70, 80, 90])
correlation = np.corrcoef(study_hours, exam_scores)[0, 1]
print(f"The correlation between study hours and exam scores is: {correlation}")
OUTPUT:
The correlation between study hours and exam scores is: 1.0
RESULT:
The program calculates the correlation coefficient between study hours and
exam scores as approximately 1.0, indicating a perfect positive correlation.
This means that as study hours increase, exam scores also tend to increase consistently.
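The value returned by `np.corrcoef` is the Pearson correlation coefficient, r = Σ(x − x̄)(y − ȳ) / √(Σ(x − x̄)² · Σ(y − ȳ)²). The sketch below computes it directly from that formula using only the standard library; the function name `pearson_r` is an assumption made for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance_sum = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    squares_x = sum((x - mean_x) ** 2 for x in xs)
    squares_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance_sum / math.sqrt(squares_x * squares_y)


study_hours = [1, 2, 3, 4, 5]
exam_scores = [50, 60, 70, 80, 90]
correlation = pearson_r(study_hours, exam_scores)
print(f"Correlation computed by hand: {correlation}")
```

Because the scores rise by exactly 10 for each extra hour, the relationship is perfectly linear, which is why the coefficient reaches its maximum value. Note that the function divides by zero if either sequence is constant; the sketch does not guard against that case.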
EX.NO:07
DATE: BASIC OPERATIONS ON A NUMPY ARRAY:
MULTIPLICATION AND DIVISION
AIM:
To create a NumPy array and perform basic operations such as multiplication and
division.
ALGORITHM:
STEP 1:Import the NumPy library to access its functionalities.
STEP 2:Define a NumPy array with a set of numerical values.
STEP 3:Define a second NumPy array of the same shape to perform operations.
STEP 4:Use the multiplication operator to multiply the two arrays element-wise.
STEP 5:Use the division operator to divide the elements of the first array
by the corresponding elements of the second array.
STEP 6:Ensure that division by zero is handled if applicable.
STEP 7:Print the results of the multiplication and division operations.
STEP 8:Terminate the program after displaying the results.
PROGRAM:
import numpy as np
array1 = np.array([10, 20, 30, 40, 50])
array2 = np.array([2, 4, 6, 8, 10])
multiplication_result = array1 * array2
division_result = array1 / array2
print("Multiplication Result:", multiplication_result)
print("Division Result:", division_result)
OUTPUT:
Multiplication Result: [ 20 80 180 320 500]
Division Result: [5. 5. 5. 5. 5.]
RESULT:
The program successfully creates two NumPy arrays and performs basic operations.
The multiplication of the arrays results in [20, 80, 180, 320, 500], while the
division yields [5., 5., 5., 5., 5.].
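Step 6 of the algorithm mentions handling division by zero, which the sample arrays never trigger. One possible way to handle it, sketched here under the assumption that undefined results should be marked as NaN, is to pass `out` and `where` to `np.divide`. The arrays below are illustrative and are not the ones used in the experiment.

```python
import numpy as np

numerator = np.array([10.0, 20.0, 30.0])
denominator = np.array([2.0, 0.0, 6.0])   # contains a zero on purpose

# Start from NaN so positions with a zero denominator stay marked as undefined.
safe_result = np.full_like(numerator, np.nan)

# Divide only where the denominator is non-zero; other positions keep NaN.
np.divide(numerator, denominator, out=safe_result, where=denominator != 0)

print("Safe Division Result:", safe_result)
```

Using NaN as the placeholder is a design choice; replacing it with 0 or raising an error may suit other situations better.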
EX.NO:08
DATE: CREATING A DATAFRAME USING PANDAS
AIM :
To create a DataFrame from a dictionary and insert a new column of data.
ALGORITHM:
Step 1: Import the Pandas library.
Step 2: Prepare the data in dictionary format.
Step 3: Create a DataFrame from the dictionary using pd.DataFrame().
Step 4: Insert a new column into the DataFrame.
Step 5: Display the DataFrame.
PROGRAM:
import pandas as pd
data = {'Product': ['Laptop', 'Mobile', 'Tablet'], 'Price': [1000, 500, 300], 'Stock': [50, 150, 100]}
df = pd.DataFrame(data)
df['Discount'] = [10, 5, 7]
print(df)
OUTPUT:
  Product  Price  Stock  Discount
0  Laptop   1000     50        10
1  Mobile    500    150         5
2  Tablet    300    100         7
RESULT:
The DataFrame was successfully created from a dictionary, and a new column
named "Discount" was inserted. The DataFrame was displayed with the updated data,
showing products, their prices, stock, and discount values.
EX.NO:09
DATE: CREATING A DATAFRAME USING LISTS
AIM :
To create a DataFrame from lists and add a new row.
ALGORITHM:
STEP 1: Import the Pandas library.
STEP 2: Prepare the data as lists.
STEP 3: Create a DataFrame by passing the lists to pd.DataFrame()
STEP 4: Insert a new row using loc[].
STEP 5: Display the updated DataFrame.
PROGRAM:
import pandas as pd
names = ['John', 'Emma', 'Sophia']
ages = [28, 22, 32]
cities = ['New York', 'London', 'Sydney']
df = pd.DataFrame({'Name': names, 'Age': ages, 'City': cities})
df.loc[3] = ['Michael', 26, 'Toronto']
print(df)
OUTPUT:
      Name  Age      City
0     John   28  New York
1     Emma   22    London
2   Sophia   32    Sydney
3  Michael   26   Toronto
RESULT:
The DataFrame was created using lists, and a new row with data for
"Michael" was successfully added using the loc[] indexer. The updated
DataFrame, showing names, ages, and cities, was displayed as expected.
EX.NO:10
DATE: Z-SCORES CALCULATION
AIM :
To standardize data points and compare them across different datasets by converting
them to a common scale.
ALGORITHM:
STEP 1: Calculate the mean of the dataset.
STEP 2: Compute the standard deviation.
STEP 3: Apply the Z-score formula:
Z = (X - μ) / σ
where X is the data point, μ is the mean, and σ is the standard deviation.
PROGRAM :
import numpy as np
data = np.array([10, 12, 23, 23, 16, 23, 21, 16])
mean = np.mean(data)
std_dev = np.std(data)
z_scores = (data - mean) / std_dev
print(f"Data: {data}")
print(f"Mean: {mean}")
print(f"Standard Deviation: {std_dev}")
print(f"Z-Scores: {z_scores}")
OUTPUT :
Data: [10 12 23 23 16 23 21 16]
Mean: 18.0
Standard Deviation: 4.90
Z-Scores: [-1.63 -1.22 1.02 1.02 -0.41 1.02 0.61 -0.41]
(Values shown rounded to two decimal places.)
RESULT :
The program calculates and displays the mean, standard deviation, and Z-scores for the dataset.
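Note that `np.std` divides by N by default (the population standard deviation). If the data are treated as a sample, the divisor is N − 1 and the Z-scores shrink slightly. The sketch below, which uses the standard-library `statistics` module rather than NumPy, compares the two conventions on the same dataset.

```python
import statistics

data = [10, 12, 23, 23, 16, 23, 21, 16]
mean = statistics.mean(data)

population_sd = statistics.pstdev(data)  # divides by N, like np.std(data)
sample_sd = statistics.stdev(data)       # divides by N - 1, like np.std(data, ddof=1)

z_population = [(x - mean) / population_sd for x in data]
z_sample = [(x - mean) / sample_sd for x in data]

print(f"Population SD: {population_sd:.2f}, Sample SD: {sample_sd:.2f}")
print("Z-scores (population):", [round(z, 2) for z in z_population])
print("Z-scores (sample):", [round(z, 2) for z in z_sample])
```

Whichever convention is chosen, the Z-scores sum to zero because each value is measured from the mean.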
EX.NO:11
DATE: AVERAGE NUMPY
AIM :
To compute the average (mean) of a list of numbers using NumPy.
ALGORITHM :
STEP 1:Import NumPy: You need to have the NumPy library available in your Python
environment.
STEP 2:Create a List or Array of Numbers: Prepare the data for which you want to calculate
the average.
STEP 3:Use NumPy's Mean Function: Apply the np.mean() function to compute the average.
STEP 4:Output the Result: Display or use the computed average.
PROGRAM :
import numpy as np
data = [10, 20, 30, 40, 50]
average = np.mean(data)
print(f"The average is: {average}")
OUTPUT:
The average is: 30.0
RESULT :
The given program is executed and output is verified.
EX.NO:12
DATE: PYTHON PROGRAM TO FIND MEAN IN AN ARRAY
AIM :
Write a python program to find mean in an array.
ALGORITHM :
STEP 1: Get Input from User
STEP 1.1: Prompt the user to enter a series of numbers separated by spaces
STEP 1.2: Read the input string from the user
STEP 2: Convert Input to List of Integers
STEP 2.1: Split the input string by spaces to create a list of substrings
STEP 2.2: Convert each substring to an integer
STEP 2.3: Form a list of integers from the converted substrings
STEP 3: Calculate the Mean
STEP 3.1: Check if the list of integers is not empty
STEP 3.2: If the list is not empty
STEP 3.2.1: Compute the sum of the integers in the list
STEP 3.2.2: Divide the sum by the number of integers to obtain the mean
STEP 3.3: If the list is empty, set the mean to 0
STEP 4: Display the Mean
STEP 4.1: Print the calculated mean to the user
STEP 5: End.
PROGRAM :
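def calculate_mean(array):
    # Return 0 for an empty list to avoid dividing by zero
    return sum(array) / len(array) if array else 0

user_input = input("Enter numbers separated by spaces: ")
array = list(map(int, user_input.split()))
mean = calculate_mean(array)
print(f"Mean: {mean}")
OUTPUT :
Enter numbers separated by spaces: 10 20 30 40
Mean: 25.0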
RESULT :
The given program is executed and output is verified.
EX.NO:13
DATE: PYTHON PROGRAM TO FIND MEDIAN IN AN ARRAY
AIM :
Write a python program to find median in an array.
ALGORITHM :
STEP 1: Get Input from User
STEP 1.1: Prompt the user to enter a series of numbers separated by spaces
STEP 1.2: Read the input string from the user
STEP 2: Convert Input to List of Integers
STEP 2.1: Split the input string by spaces to create a list of substrings
STEP 2.2: Convert each substring to an integer
STEP 2.3: Form a list of integers from the converted substrings
STEP 3: Calculate the Median
STEP 3.1: Check if the list of integers is not empty
STEP 3.2: If the list is not empty
STEP 3.2.1: Sort the list of integers
STEP 3.2.2: Find the median: - If the list has an odd number of elements, the median is the
middle element. - If the list has an even number of elements, the median is the average of the
two middle elements.
STEP 3.3: If the list is empty, set the median to 0
STEP 4: Display the Median
STEP 4.1: Print the calculated median to the user
STEP 5: End
PROGRAM :
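def calculate_median(array):
    if not array:
        return 0
    array.sort()
    n = len(array)
    mid = n // 2
    if n % 2 == 1:
        return array[mid]                        # odd length: middle element
    return (array[mid - 1] + array[mid]) / 2     # even length: average of the two middle elements

user_input = input("Enter numbers separated by spaces: ")
array = list(map(int, user_input.split()))
print(f"Median: {calculate_median(array)}")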
OUTPUT :
10 20 30 40 50
Median: 30
RESULT :
The given program is executed and output is verified.
EX.NO:14
DATE: CUMULATIVE FREQUENCY
AIM :
To compute the cumulative frequency of a dataset using pandas, which
simplifies the process and provides more powerful data handling capabilities.
ALGORITHM :
STEP 1:Import Libraries: Use pandas to handle and process the data.
STEP 2:Create a DataFrame: Convert the dataset into a DataFrame for easier
manipulation.
STEP 3:Calculate Frequencies: Count the occurrences of each unique value using
pandas functions.
STEP 4:Sort Data: Sort the values in ascending order.
STEP 5:Compute Cumulative Frequency: Use the cumulative sum function to compute
cumulative frequencies.
STEP 6:Display Results: Print or display the results in a readable format.
PROGRAM :
import pandas as pd
data = [5, 1, 9, 2, 3, 5, 6, 2, 9, 5, 1, 3, 2, 7, 9]
df = pd.Series(data).value_counts().sort_index().cumsum().reset_index()
df.columns = ['Value', 'Cumulative Frequency']
print(df)
OUTPUT :
Value Cumulative Frequency
0 1 2
1 2 5
2 3 7
3 5 10
4 6 11
5 7 12
6 9 15
RESULT :
Thus, the above program is executed and output is verified.
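The pandas pipeline above performs three steps in one chain: count, sort by value, then take a running total. As a hedged cross-check, the same steps can be spelled out with the standard library's `Counter` and `itertools.accumulate`; the variable names below are illustrative assumptions.

```python
from collections import Counter
from itertools import accumulate

data = [5, 1, 9, 2, 3, 5, 6, 2, 9, 5, 1, 3, 2, 7, 9]

counts = Counter(data)                  # frequency of each unique value
values = sorted(counts)                 # unique values in ascending order
cumulative = list(accumulate(counts[v] for v in values))  # running total of frequencies

print("Value  Cumulative Frequency")
for value, total in zip(values, cumulative):
    print(f"{value:>5}  {total:>20}")
```

The last cumulative frequency always equals the number of observations, which gives a quick sanity check on the table.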
EX.NO:15
DATE: FREQUENCY OF EACH UNIQUE ELEMENT IN A LIST
USING PYTHON
AIM:
To determine the frequency of each unique element in a list using Python.
ALGORITHM:
STEP 1:Initialize a Data Structure: Use a dictionary to store each unique element and its
corresponding frequency.
STEP 2:Iterate Through the List: Traverse the list and update the frequency count of each
element in the dictionary.
STEP 3:Output the Results: Print or return the dictionary containing the elements and their
frequencies.
PROGRAM:
from collections import Counter
elements = ['apple', 'banana', 'apple', 'orange', 'banana', 'banana']
frequency = Counter(elements)
print("Element Frequencies:")
for element, count in frequency.items():
    print(f"{element}: {count}")
OUTPUT :
Element Frequencies:
apple: 2
banana: 3
orange: 1
RESULT:
Thus, the above program is executed and output is verified.
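The algorithm describes counting with a plain dictionary, while the program relies on `collections.Counter`, which is a dictionary subclass that does the counting internally. A minimal sketch of the manual version described in the algorithm is given below; the function name `count_frequencies` is an assumption made for illustration.

```python
def count_frequencies(items):
    """Count how many times each element occurs, using a plain dictionary."""
    frequency = {}
    for item in items:
        # get() returns 0 the first time an element is seen
        frequency[item] = frequency.get(item, 0) + 1
    return frequency


elements = ['apple', 'banana', 'apple', 'orange', 'banana', 'banana']
frequency = count_frequencies(elements)

print("Element Frequencies:")
for element, count in frequency.items():
    print(f"{element}: {count}")
```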