DS Lab Programs

Uploaded by

rk8783

INTRODUCTION TO DATA SCIENCE - UCS23G03J

Practical Record Work

Done and submitted by

Name:

Registration Id.:

in partial fulfillment of the requirements for the award of the degree of

B.Sc. Computer Science

DEPARTMENT OF COMPUTER SCIENCE

FACULTY OF SCIENCE AND HUMANITIES

SRM INSTITUTE OF SCIENCE AND TECHNOLOGY


KATTANKULATHUR – 603 203

JULY - NOV 2025


SRM INSTITUTE OF SCIENCE AND TECHNOLOGY

KATTANKULATHUR– 603 203

BONAFIDE CERTIFICATE

Certified that this record book is a bonafide practical work done by

Name:

Registration Id.:

in the Computer Science laboratory during academic year 2025-2026


and submitted for the practical examination of the V semester course,
UCS23G03J - INTRODUCTION TO DATA SCIENCE, held on _________ in
partial fulfillment of the requirements for the award of the degree of B.Sc.
in Computer Science

Course Tutor HoD

Examiners 1.

2.
CONTENTS

Sl. No.   Date of Completion   Title of the Exercise                                                Page No.   Signature of the Tutor
1         02/07/2025           Importing dataset and analyzing dataset using Python
2         09/07/2025           Prediction of salary using Linear Regression
3         16/07/2025           Creating and loading a DataFrame
4         23/07/2025           Plotting a Graph
5         06/08/2025           Exploring plot() function
6         13/08/2025           Employee management system
7         21/08/2025           Student management system using conditional and control statements
8         28/08/2025           Creating and manipulating a DataFrame
9         04/09/2025           Importing built-in dataset using Python Libraries
10        19/09/2025           Exploring statistical functions
11        26/09/2025           Data cleaning and Data preprocessing



Ex. No: 1   Importing dataset and analysing dataset using Python

02/07/2025

Aim:

To analyse students' scores in Math, Science, and English, calculate their average
scores, identify the top-performing student, and visualize the average scores using a bar chart.

Procedures:

Step 1: Import the necessary libraries: matplotlib.pyplot and pandas

Step 2: Load the dataset from CSV file

Step 3: Display the original dataset

Step 4: Calculate the average score for each student

Step 5: Display the dataset including the new average score column

Step 6: Find the top student based on the highest average score

Step 7: Plot a bar chart of average scores

Program:

1. CSV File - Sample Dataset ([Link])

Name,Math,Science,English
Alice,78,85,90
Bob,65,70,60
Charlie,95,92,88
David,50,45,55
Eve,88,76,90

2. Python Program:

import pandas as pd
import matplotlib.pyplot as plt

# Load dataset
data = pd.read_csv('[Link]')

# Display the original dataset
print(" Dataset:")
print(data)

# Calculate average score for each student (row-wise mean of the three subjects)
data['Average'] = data[['Math', 'Science', 'English']].mean(axis=1)

# Display data with average
print("\n Dataset with Average Score:")
print(data)

# Find top student: the row whose average is highest
top_student = data.loc[data['Average'].idxmax()]
print(f"\n Top Student: {top_student['Name']} with average score {top_student['Average']}")

# Plot bar chart of average scores
plt.figure(figsize=(8, 5))
plt.bar(data['Name'], data['Average'], color='skyblue')
plt.title('Average Scores of Students')
plt.xlabel('Student')
plt.ylabel('Average Score')
plt.grid(True)
plt.show()
Result:
Thus the program is executed successfully.
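
The averaging and top-student steps can also be traced without pandas. The following is a minimal sketch, assuming the five sample rows listed in this exercise are held in memory as text instead of being read from the CSV file. The names SAMPLE_CSV, averages, and top_name are illustrative and not part of the original program.

```python
import csv
import io

# Assumption: the sample rows from this exercise, kept in memory instead of a file.
SAMPLE_CSV = """Name,Math,Science,English
Alice,78,85,90
Bob,65,70,60
Charlie,95,92,88
David,50,45,55
Eve,88,76,90
"""

# Average of the three subject scores for each student.
averages = {}
for row in csv.DictReader(io.StringIO(SAMPLE_CSV)):
    scores = [int(row[subject]) for subject in ("Math", "Science", "English")]
    averages[row["Name"]] = sum(scores) / len(scores)

# The top student is the name whose average is largest.
top_name = max(averages, key=averages.get)
print(top_name, round(averages[top_name], 2))  # Charlie 91.67
```

This mirrors what mean(axis=1) and idxmax() do in the pandas version: one average per row, then the row with the largest average.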
Ex. No: 2   Prediction of Salary Using Linear Regression

09/07/2025

Aim:

To predict the salary of an individual based on their years of experience using a linear
regression model.

Procedure:

1. Import the necessary libraries: pandas, and LinearRegression from sklearn.linear_model.
2. Create the dataset as a dictionary with two lists: years of experience (exp) and the
corresponding salaries (salary).
3. Convert the dictionary into a pandas DataFrame for easy data handling.
4. Prepare the data for the model: use exp as the feature (x) and salary as the target (y).
5. Create and train the Linear Regression model.
6. Use the trained model to predict the salary for an individual with 15 years of
experience.
7. Print the predicted salary for 15 years of experience.

Program:

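import pandas as pd
from sklearn.linear_model import LinearRegression

data = {'exp': [1, 2, 3, 4, 5], 'salary': [50000, 90000, 75000, 80000, 90000]}
df = pd.DataFrame(data)
x = df[['exp']]       # feature matrix (2-D)
y = df[['salary']]    # target kept 2-D, so predict() returns one row per input
model = LinearRegression()
model.fit(x, y)
# Predict with a DataFrame that uses the same column name as the training data
print("Predicted Salary for 15 years experience", model.predict(pd.DataFrame({'exp': [15]}))[0])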
Output:

Predicted Salary for 15 years experience [161000.]

Result:

The program prints the predicted salary based on the linear relationship learned from
the dataset.
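
The predicted value can be checked by hand. The sketch below is not part of the original program; it computes the ordinary least-squares line for a single feature directly, using slope = Sxy / Sxx and intercept = mean(y) - slope * mean(x), with the same five data points. The variable names are illustrative.

```python
# Same data as the exercise.
exp = [1, 2, 3, 4, 5]
salary = [50000, 90000, 75000, 80000, 90000]

mean_x = sum(exp) / len(exp)
mean_y = sum(salary) / len(salary)

# Sxy: co-variation of x and y; Sxx: variation of x.
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(exp, salary))
s_xx = sum((x - mean_x) ** 2 for x in exp)

slope = s_xy / s_xx
intercept = mean_y - slope * mean_x
predicted = intercept + slope * 15

print(slope, intercept, predicted)  # 7000.0 56000.0 161000.0
```

The fitted line is salary = 56000 + 7000 * exp, which gives 161000 for 15 years of experience and agrees with the output above.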
Ex. No: 3   Creating and Loading a DataFrame
16/07/2025

Aim:

To create a pandas DataFrame from a Python dictionary containing student information
(Name, Age, Gender, and CGPA) and display the DataFrame.

Procedure:

1. Import the pandas library.
2. Create a dictionary named data with keys as column names (Name, Age, Gender,
CGPA) and values as lists containing the corresponding data.
3. Use pd.DataFrame(data) to convert the dictionary into a pandas DataFrame.
4. Print the DataFrame to display the tabular data.

Program:

# Working with DataFrames
import pandas as pd

data = {'Name': ['Sammy', 'Tim', 'Ram', 'Jai', 'Sai'],
        'Age': [19, 17, 15, 20, 18],
        'Gender': ['F', 'M', 'M', 'M', 'F'],
        # float('nan') marks the missing CGPA and keeps the column numeric
        'CGPA': [8.5, float('nan'), 9, 7.5, 8.5]}

df = pd.DataFrame(data)

# Printing the DataFrame
print(df)
Result:

Thus the program is executed which creates and displays the DataFrame.
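
How a missing CGPA is written matters for later analysis. The following is a minimal sketch, assuming pandas is installed, that contrasts the text 'NaN' with a real missing value. The Series names as_text and as_nan are illustrative.

```python
import pandas as pd

as_text = pd.Series([8.5, 'NaN', 9, 7.5, 8.5])          # 'NaN' here is an ordinary string
as_nan = pd.Series([8.5, float('nan'), 9, 7.5, 8.5])    # a real missing value

# The string turns the column into generic objects and is not seen as missing.
print(as_text.dtype, as_text.isna().sum())  # object 0

# The real NaN keeps the column numeric and is detected by isna().
print(as_nan.dtype, as_nan.isna().sum())    # float64 1

# Numeric functions skip real missing values by default.
print(as_nan.mean())                        # 8.375
```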
Ex. No: 4   Plotting a Graph

23/07/2025

Aim:

To visualize the number of steps walked each day over a week using a line plot.

Procedure:

1. Import the Matplotlib library (matplotlib.pyplot).
2. Create a list days containing the days of the week, and a list steps_walked containing
the corresponding number of steps walked each day.
3. Use plt.plot() to create a line plot.
4. Add a title and axis labels.
5. Use plt.show() to render and display the plot.

Program:

import matplotlib.pyplot as plt

days = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
steps_walked = [8934, 14902, 3409, 25672, 12300, 2023, 6890]

# "x-r": x markers, solid line, red colour
plt.plot(days, steps_walked, "x-r")
plt.title("Step count for the week")
plt.xlabel("Days of the week")
plt.ylabel("Steps walked")
plt.show()
Result:

Thus the program is executed successfully and a line plot is displayed showing the
number of steps walked each day of the week.
Ex. No: 5   Exploring plot() Function

06/08/2025

Aim

To visualize and compare the step count of a person over the days of the week for this
week and last week using a line graph in Python with matplotlib.

Procedure

1. Import the matplotlib.pyplot module.

2. Define the list of days in the week (Mon to Sun).

3. Store the number of steps walked for this week in a list.

4. Store the number of steps walked for last week in another list.

5. Plot the step counts of this week using a green line with circle markers ("o-g").

6. Plot the step counts of last week using a magenta dashed line with triangle markers
("v--m").

7. Add a title, labels for the X-axis (Days of the week) and Y-axis (Steps walked).

8. Enable grid lines for better readability.

9. Display the line graph using plt.show().

Program:

import matplotlib.pyplot as plt

days = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
steps_walked = [8934, 14902, 3409, 25672, 12300, 2023, 6890]
steps_last_week = [9788, 8710, 5308, 17630, 21309, 4002, 5223]

# This week: circle markers, solid line, green
plt.plot(days, steps_walked, "o-g")
# Last week: triangle markers, dashed line, magenta
plt.plot(days, steps_last_week, "v--m")
plt.title("Step count | This week and last week")
plt.xlabel("Days of the week")
plt.ylabel("Steps walked")
plt.grid(True)
plt.show()

Result

A line graph is successfully plotted showing the step counts for this week and last
week, which helps to compare the walking activity trends over the days of the week.
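
The format strings used with plot() are shorthand for three separate style settings. The sketch below, which is not part of the original program, draws the same line twice: once with the shorthand "o-g" and once with explicit keyword arguments. It assumes a non-interactive backend (Agg) so that it runs without opening a window, and it uses only the first three days of data for brevity.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, so no window is opened
import matplotlib.pyplot as plt

days = ["Mon", "Tue", "Wed"]
steps = [8934, 14902, 3409]

# Shorthand format string: marker 'o', solid line '-', colour 'g' (green)
(short_line,) = plt.plot(days, steps, "o-g")

# The same style written out with keyword arguments
(long_line,) = plt.plot(days, steps, marker="o", linestyle="-", color="g")

# Both lines carry identical style properties.
print(short_line.get_marker(), short_line.get_linestyle(), short_line.get_color())
print(long_line.get_marker(), long_line.get_linestyle(), long_line.get_color())
```

The keyword form is longer but easier to read when a plot has many lines, while the shorthand is convenient for quick exploration.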
Ex. No: 6   Employee Management System

13/08/2025

Aim

To create and display employee records in Python using different data types such as
List, Dictionary, Tuple, Set, and String.

Procedure

1. Create an empty list employees to store multiple employee records.

2. Define each employee as a dictionary containing:

o id (String)

o name (String)

o role as a tuple (Position, Department)

o skills as a set (to store unique skills, automatically removing duplicates).

3. Add each employee dictionary to the employees list.

4. Traverse through the employees list using a for loop.

5. For each employee, display:

o Name

o ID

o Role (Position and Department accessed from the tuple)

o Skills (joined into a single string after sorting the set).

Program:

# Step 1: Define employee records using all data types

# List of employees (List)
employees = []

# Create employee 1
emp1 = {
    "id": "E001",                                           # String
    "name": "Alice",                                        # String
    "role": ("HR Manager", "HR"),                           # Tuple: (Position, Department)
    "skills": {"recruiting", "training", "communication"}   # Set
}

# Create employee 2
emp2 = {
    "id": "E002",
    "name": "Bob",
    "role": ("IT Specialist", "IT"),
    "skills": {"python", "networking", "security", "python"}  # Set will remove the duplicate
}

# Add to employee list
employees.append(emp1)
employees.append(emp2)

# Step 2: Display data
print("=== Employee Records ===")
for emp in employees:
    print(f"\nName: {emp['name']}")
    print(f"ID: {emp['id']}")
    print(f"Role: {emp['role'][0]} in {emp['role'][1]} department")  # Accessing tuple
    print(f"Skills: {', '.join(sorted(emp['skills']))}")             # Accessing set

Result
The program successfully displays employee records using all data types.
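
Two behaviours the program relies on are worth checking in isolation: a set drops duplicate entries, and a tuple can be unpacked by position. The following short sketch uses the second employee's values; the variable names are illustrative.

```python
# A set keeps only one copy of each value.
skills = {"python", "networking", "security", "python"}
print(len(skills))     # 3

# Sets are unordered, so sorting gives a stable order for display.
print(sorted(skills))  # ['networking', 'python', 'security']

# A tuple can be unpacked instead of indexed with [0] and [1].
position, department = ("IT Specialist", "IT")
print(f"{position} in {department} department")  # IT Specialist in IT department
```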
Ex. No: 7   Student Management System using conditional and control statements

21/08/2025

Aim

To develop a menu-driven Student Management System in Python to add,
display, and search student records.

Procedure

1. Create an empty list students to store all student records.

2. Use a while True loop to repeatedly show a menu until the user chooses Exit.

3. Provide options:

o Add Student → Input roll number, name, and marks, store them in a
dictionary, and append it to students.

o Display All Students → Traverse the students list and print details
(also display Pass/Fail based on marks).

o Search Student by Roll No → Ask for a roll number, search the list,
and display the record if found.

o Exit → Break the loop and stop the program.

4. Use conditions (if-elif-else) to perform the selected operation.

Program:

students = []  # list to store student records

while True:
    print("\n===== Student Management System =====")
    print("1. Add Student")
    print("2. Display All Students")
    print("3. Search Student by Roll No")
    print("4. Exit")
    choice = int(input("Enter your choice from 1 to 4:"))

    # 1. Add Student
    if choice == 1:
        roll = input("Enter Roll Number: ")
        name = input("Enter Name: ")
        marks = int(input("Enter Marks: "))
        student = {"roll": roll, "name": name, "marks": marks}
        students.append(student)
        print("✅ Student added successfully!")

    # 2. Display All Students
    elif choice == 2:
        if not students:
            print("No students available.")
        else:
            print("\n---- Student Records ----")
            for s in students:
                status = "Pass" if s["marks"] >= 40 else "Fail"
                print(f"Roll: {s['roll']}, Name: {s['name']}, Marks: {s['marks']} ({status})")

    # 3. Search Student by Roll No
    elif choice == 3:
        roll = input("Enter Roll Number to Search: ")
        found = False
        for s in students:
            if s["roll"] == roll:
                status = "Pass" if s["marks"] >= 40 else "Fail"
                print(f"Roll: {s['roll']}, Name: {s['name']}, Marks: {s['marks']} ({status})")
                found = True
                break
        if not found:
            print("❌ Student not found!")

    # 4. Exit
    elif choice == 4:
        print("Exiting... Thank you!")
        break

    else:
        print("Invalid choice! Try again.")



Result
The program successfully maintains student records with features to add new students,
display all records, and search by roll number.
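
Because the menu program reads everything through input(), its logic is awkward to check automatically. One possible arrangement, sketched below, moves the add, search, and Pass/Fail rules into small functions that can be called directly. The function names (status_of, add_student, find_student), the PASS_MARK constant, and the two sample students are hypothetical and not part of the original program.

```python
PASS_MARK = 40  # same threshold the menu program uses for Pass/Fail


def status_of(marks):
    """Return 'Pass' or 'Fail' for the given marks."""
    return "Pass" if marks >= PASS_MARK else "Fail"


def add_student(students, roll, name, marks):
    """Append one student record (a dictionary) to the list."""
    students.append({"roll": roll, "name": name, "marks": marks})


def find_student(students, roll):
    """Return the first record with a matching roll number, or None."""
    for s in students:
        if s["roll"] == roll:
            return s
    return None


# Hypothetical sample data for illustration only.
students = []
add_student(students, "101", "Asha", 72)
add_student(students, "102", "Ravi", 35)

match = find_student(students, "102")
print(match["name"], status_of(match["marks"]))  # Ravi Fail
print(find_student(students, "999"))             # None
```

The while loop would then only handle the menu and the input() calls, and delegate the actual work to these functions.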
Ex. No: 8   Creating and manipulating a DataFrame
28/08/2025

Aim
To perform various DataFrame manipulations using the Pandas library in
Python, such as column selection, row filtering, adding/updating data, sorting,
grouping, merging, dropping, and reshaping.
Procedure
1. Import the pandas library.
2. Create a dataset using a dictionary and convert it into a Pandas DataFrame.
3. Perform the following manipulations step by step:
o Select Columns (single & multiple).
o Filter Rows based on conditions.
o Add a new column (Bonus).
o Update values in the DataFrame.
o Sort records by Salary.
o Group and aggregate data (average salary by department).
o Merge two DataFrames on a common key.
o Drop columns/rows.
o Reshape using a Pivot Table.
4. Print the results after each operation to observe the changes.
Program :

import pandas as pd

# 1. Creating Dataset
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 35, 40],
    'Salary': [50000, 60000, 75000, 80000],
    'Department': ['IT', 'HR', 'IT', 'Finance']
}
df = pd.DataFrame(data)
print("Original DataFrame:\n", df, "\n")

# 2. Selecting Columns
print("Select single column (Name):\n", df['Name'], "\n")
print("Select multiple columns (Name, Age):\n", df[['Name', 'Age']], "\n")

# 3. Filtering Rows
print("Employees older than 30:\n", df[df['Age'] > 30], "\n")
print("Employees with Salary > 60000:\n", df[df['Salary'] > 60000], "\n")

# 4. Adding a New Column
df['Bonus'] = df['Salary'] * 0.10
print("After adding Bonus column:\n", df, "\n")

# 5. Updating Values
df.loc[df['Name'] == 'Alice', 'Age'] = 26
print("After updating Alice's age:\n", df, "\n")

# 6. Sorting
print("Sorted by Salary (descending):\n", df.sort_values(by='Salary', ascending=False), "\n")

# 7. Grouping & Aggregation
print("Average Salary by Department:\n", df.groupby('Department')['Salary'].mean(), "\n")

# 8. Merging DataFrames
df_a = pd.DataFrame({'ID': [1, 2, 3], 'Name': ['Alice', 'Bob', 'Charlie']})
df_b = pd.DataFrame({'ID': [1, 2, 3], 'Age': [26, 30, 35]})
merged = pd.merge(df_a, df_b, on='ID')
print("Merged DataFrame:\n", merged, "\n")

# 9. Dropping Columns & Rows (drop() returns a copy; df itself is unchanged)
df_dropped_col = df.drop(columns=['Bonus'])
print("After dropping Bonus column:\n", df_dropped_col, "\n")
df_dropped_row = df.drop(0)  # drop first row (Alice)
print("After dropping first row:\n", df_dropped_row, "\n")

# 10. Reshaping (Pivot Table)
pivot_df = df.pivot_table(values='Salary', index='Department', aggfunc='sum')
print("Pivot Table (Total Salary by Department):\n", pivot_df, "\n")


OUTPUT

Original DataFrame:
      Name  Age  Salary Department
0    Alice   25   50000         IT
1      Bob   30   60000         HR
2  Charlie   35   75000         IT
3    David   40   80000    Finance

Select single column (Name):
0      Alice
1        Bob
2    Charlie
3      David
Name: Name, dtype: object

Select multiple columns (Name, Age):
      Name  Age
0    Alice   25
1      Bob   30
2  Charlie   35
3    David   40

Employees older than 30:
      Name  Age  Salary Department
2  Charlie   35   75000         IT
3    David   40   80000    Finance

Employees with Salary > 60000:
      Name  Age  Salary Department
2  Charlie   35   75000         IT
3    David   40   80000    Finance

After adding Bonus column:
      Name  Age  Salary Department   Bonus
0    Alice   25   50000         IT  5000.0
1      Bob   30   60000         HR  6000.0
2  Charlie   35   75000         IT  7500.0
3    David   40   80000    Finance  8000.0

After updating Alice's age:
      Name  Age  Salary Department   Bonus
0    Alice   26   50000         IT  5000.0
1      Bob   30   60000         HR  6000.0
2  Charlie   35   75000         IT  7500.0
3    David   40   80000    Finance  8000.0

Sorted by Salary (descending):
      Name  Age  Salary Department   Bonus
3    David   40   80000    Finance  8000.0
2  Charlie   35   75000         IT  7500.0
1      Bob   30   60000         HR  6000.0
0    Alice   26   50000         IT  5000.0

Average Salary by Department:
Department
Finance    80000.0
HR         60000.0
IT         62500.0
Name: Salary, dtype: float64

Merged DataFrame:
   ID     Name  Age
0   1    Alice   26
1   2      Bob   30
2   3  Charlie   35

After dropping Bonus column:
      Name  Age  Salary Department
0    Alice   26   50000         IT
1      Bob   30   60000         HR
2  Charlie   35   75000         IT
3    David   40   80000    Finance

After dropping first row:
      Name  Age  Salary Department   Bonus
1      Bob   30   60000         HR  6000.0
2  Charlie   35   75000         IT  7500.0
3    David   40   80000    Finance  8000.0

Result:

The program successfully demonstrates DataFrame manipulation operations in
Pandas.
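
Grouping and aggregation can be traced by hand to see what the department-wise average and total stand for. The following is a minimal sketch in plain Python, using the same four employees; the names rows, by_department, averages, and totals are illustrative and not part of the original program.

```python
from collections import defaultdict

# (Name, Department, Salary) for the four employees in this exercise.
rows = [
    ("Alice", "IT", 50000),
    ("Bob", "HR", 60000),
    ("Charlie", "IT", 75000),
    ("David", "Finance", 80000),
]

# Step 1 (group): collect the salaries that share a department.
by_department = defaultdict(list)
for _name, department, salary in rows:
    by_department[department].append(salary)

# Step 2 (aggregate): reduce each group to a single number.
averages = {dept: sum(vals) / len(vals) for dept, vals in by_department.items()}
totals = {dept: sum(vals) for dept, vals in by_department.items()}

print(averages)  # {'IT': 62500.0, 'HR': 60000.0, 'Finance': 80000.0}
print(totals)    # {'IT': 125000, 'HR': 60000, 'Finance': 80000}
```

The averages correspond to the groupby(...).mean() step, and the totals correspond to the pivot table with aggfunc='sum'.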
Ex. No: 9   Import a Built-in Dataset

04/09/2025

Aim

To load and explore the California Housing dataset using Scikit-learn and Pandas
for initial data analysis.

Procedure

1. Import the fetch_california_housing function from sklearn.datasets.


2. Load the dataset using fetch_california_housing().
3. Print the dataset description (DESCR) to understand the features.
4. Convert the dataset into a Pandas DataFrame for easier manipulation.
5. Display the first 5 rows using .head() to preview the data.

Program:

from sklearn.datasets import fetch_california_housing
import pandas as pd

housing = fetch_california_housing()

# Print dataset description
print(housing.DESCR)

# Create a DataFrame for easier data manipulation
housing_df = pd.DataFrame(data=housing.data, columns=housing.feature_names)

# Display the first few rows
print(housing_df.head())
Result:

Thus the program is executed successfully.
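
fetch_california_housing() downloads the data the first time it is called, so it needs network access on a fresh machine. As an offline alternative, the sketch below swaps in the Iris dataset, which ships with scikit-learn. It is a different dataset from the one in this exercise and is shown only to illustrate the same load-and-preview steps. The as_frame=True option is assumed to be available (scikit-learn 0.23 or later).

```python
from sklearn.datasets import load_iris

# as_frame=True returns pandas objects instead of NumPy arrays.
iris = load_iris(as_frame=True)

# .frame holds the four feature columns plus the 'target' column.
iris_df = iris.frame

print(iris_df.shape)   # (150, 5)
print(iris_df.head())  # first 5 rows
```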


Ex. No: 10   Exploring Statistical Functions

19/09/2025

Aim

To calculate and display mean, median, mode, and standard deviation of a
dataset using Python’s statistics module.

Procedure

1. Import the statistics module.

2. Define a sample data list.

3. Use built-in functions from the module:

o statistics.mean(data) → calculates the average.

o statistics.median(data) → finds the middle value.

o statistics.mode(data) → finds the most frequently occurring value.

o statistics.stdev(data) → computes the sample standard deviation.

4. Print the dataset and all calculated statistical values.

Program:

import statistics

# Sample data list
data = [1, 2, 2, 3, 4, 5, 5, 5, 6]

# Calculate statistical measures
mean_value = statistics.mean(data)
median_value = statistics.median(data)
mode_value = statistics.mode(data)      # Returns the single most common value
std_deviation = statistics.stdev(data)  # Sample standard deviation

# Print the results
print("Data: ", data)
print("Mean: ", mean_value)
print("Median: ", median_value)
print("Mode: ", mode_value)
print("Standard Deviation: ", std_deviation)

Result

The program successfully calculates descriptive statistics for the given dataset.
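
To see what each function computes, the same four measures can be worked out from their definitions. The following is a minimal sketch on the same data list; the shortcut used for the median assumes an odd number of values, which holds here (9 values). The variable names are illustrative.

```python
import math
from collections import Counter

data = [1, 2, 2, 3, 4, 5, 5, 5, 6]
n = len(data)

# Mean: sum of the values divided by their count (33 / 9).
mean_value = sum(data) / n

# Median: middle element of the sorted list (valid because n is odd).
median_value = sorted(data)[n // 2]

# Mode: the value with the highest frequency (5 occurs three times).
mode_value = Counter(data).most_common(1)[0][0]

# Sample standard deviation: divide the squared deviations by n - 1, then take the root.
variance = sum((x - mean_value) ** 2 for x in data) / (n - 1)
std_deviation = math.sqrt(variance)

print(median_value, mode_value)  # 4 5
print(round(mean_value, 4), round(std_deviation, 4))  # 3.6667 1.7321
```

The squared deviations add up to 24, so the sample variance is 24 / 8 = 3 and the standard deviation is the square root of 3.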
Ex. No: 11   Data Cleaning and Preprocessing

26/09/2025

Aim

To perform data preprocessing on a dataset using pandas and scikit-learn,
applying techniques such as handling missing values, encoding categorical data, and
normalizing numerical features.

Procedure

1. Import necessary libraries: pandas, numpy, LabelEncoder, MinMaxScaler.

2. Create a DataFrame with sample employee data.

3. Handle missing values:

o Replace missing numerical values (Age, Salary) with their mean.

4. Encode categorical data:

o Apply Label Encoding for Gender (F=0, M=1).

o Apply One-Hot Encoding for Department.

5. Normalize numerical columns:

o Scale Age and Salary between 0 and 1 using MinMaxScaler.

6. Display both the original dataset and the preprocessed dataset.

Program:

import pandas as pd
import numpy as np
from sklearn.preprocessing import LabelEncoder, MinMaxScaler

# Sample dataset
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Age': [25, np.nan, 30, 22, 28],
    'Gender': ['F', 'M', 'M', 'M', 'F'],
    'Salary': [50000, 60000, 65000, np.nan, 58000],
    'Department': ['HR', 'IT', 'IT', 'Marketing', 'HR']
}

# Create DataFrame
df = pd.DataFrame(data)
print("Original Data:")
print(df)

# Step 1: Handle missing values
# Fill missing numeric values with the column mean
# (assign the result back; inplace=True on a selected column is unreliable in recent pandas)
df['Age'] = df['Age'].fillna(df['Age'].mean())
df['Salary'] = df['Salary'].fillna(df['Salary'].mean())

# Step 2: Encode categorical data
label_encoder = LabelEncoder()
df['Gender'] = label_encoder.fit_transform(df['Gender'])  # F=0, M=1

# For Department, use one-hot encoding
df = pd.get_dummies(df, columns=['Department'])

# Step 3: Normalize numerical columns to the range 0 to 1
scaler = MinMaxScaler()
df[['Age', 'Salary']] = scaler.fit_transform(df[['Age', 'Salary']])

print("\nPreprocessed Data:")
print(df)
Result:

Thus the program is executed successfully.
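
The three preprocessing steps can be followed by hand on small columns. The sketch below is plain Python, not the scikit-learn implementation. It fills the missing Age with the mean of the known values, rescales with the min-max formula (x - min) / (max - min), and assigns label codes in sorted class order, which is how LabelEncoder arrives at F=0, M=1. The variable names are illustrative.

```python
import math

ages = [25, math.nan, 30, 22, 28]
genders = ["F", "M", "M", "M", "F"]

# Mean imputation: average of the known values replaces the missing one.
known = [a for a in ages if not math.isnan(a)]
mean_age = sum(known) / len(known)
filled = [mean_age if math.isnan(a) else a for a in ages]
print(mean_age)  # 26.25

# Min-max scaling: the smallest value maps to 0 and the largest to 1.
lo, hi = min(filled), max(filled)
scaled = [(a - lo) / (hi - lo) for a in filled]
print(scaled)    # [0.375, 0.53125, 1.0, 0.0, 0.75]

# Label encoding: classes are sorted, and each value is replaced by its position.
classes = sorted(set(genders))
codes = [classes.index(g) for g in genders]
print(classes, codes)  # ['F', 'M'] [0, 1, 1, 1, 0]
```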
