INTRODUCTION TO DATA SCIENCE - UCS23G03J
Practical Record Work
Done and submitted by
Name:
Registration Id.:
in partial fulfillment of the requirements for the award of the degree of
B.Sc. Computer Science
DEPARTMENT OF COMPUTER SCIENCE
FACULTY OF SCIENCE AND HUMANITIES
SRM INSTITUTE OF SCIENCE AND TECHNOLOGY
KATTANKULATHUR – 603 203
JULY - NOV 2025
SRM INSTITUTE OF SCIENCE AND TECHNOLOGY
KATTANKULATHUR– 603 203
BONAFIDE CERTIFICATE
Certified that this record book is a bonafide practical work done by
Name:
Registration Id.:
in the Computer Science laboratory during academic year 2025-2026
and submitted for the practical examination of V semester course,
UCS23G03J-INTRODUCTION TO DATA SCIENCE held on _________ in
partial fulfillment of the requirements for the award of the degree of B.Sc.
in Computer Science
Course Tutor HoD
Examiners 1.
2.
CONTENTS
Sl. No.  Date of Completion  Title of the Exercise                                                  Page No.  Signature of the Tutor
1        02/07/2025          Importing dataset and analyzing dataset using Python
2        09/07/2025          Prediction of salary using Linear Regression
3        16/07/2025          Creating and loading a DataFrame
4        23/07/2025          Plotting a Graph
5        06/08/2025          Exploring plot() function
6        13/08/2025          Employee management system
7        21/08/2025          Student management system using conditional and control statements
8        28/08/2025          Creating and manipulating a DataFrame
9        04/09/2025          Importing built-in dataset using Python Libraries
10       19/09/2025          Exploring statistical functions
11       26/09/2025          Data cleaning and Data preprocessing
Ex. No.: 1 Importing dataset and analysing dataset using Python
02/07/2025
Aim:
To analyse students' scores in Math, Science, and English, calculate each student's average
score, identify the top-performing student, and visualize the average scores using a bar chart.
Procedure:
Step 1: Import the necessary libraries: matplotlib.pyplot and pandas.
Step 2: Load the dataset from CSV file
Step 3: Display the original dataset
Step 4: Calculate the average score for each student
Step 5: Display the dataset including the new average score column
Step 6: Find the top student based on the highest average score
Step 7: Plot a bar chart of average scores
Program:
1. CSV File - Sample Dataset (students.csv; the file name is assumed here for illustration)
Name,Math,Science,English
Alice,78,85,90
Bob,65,70,60
Charlie,95,92,88
David,50,45,55
Eve,88,76,90
2. Python Program:
import pandas as pd
import matplotlib.pyplot as plt
# Load dataset (file name assumed; replace with the actual CSV path)
data = pd.read_csv('students.csv')
# Display the dataset
print("Dataset:")
print(data)
# Calculate average score for each student (row-wise mean)
data['Average'] = data[['Math', 'Science', 'English']].mean(axis=1)
# Display data with average
print("\nDataset with Average Score:")
print(data)
# Find top student: idxmax() returns the row label of the highest average
top_student = data.loc[data['Average'].idxmax()]
print(f"\nTop Student: {top_student['Name']} with average score {top_student['Average']}")
# Plot bar chart of average scores
plt.figure(figsize=(8, 5))
plt.bar(data['Name'], data['Average'], color='skyblue')
plt.title('Average Scores of Students')
plt.xlabel('Student')
plt.ylabel('Average Score')
plt.grid(True)
plt.show()
Result:
Thus the program is executed successfully.
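Note: The averaging and top-student steps of this exercise can be cross-checked without pandas. The following is a minimal sketch using only the Python standard library. It reads the same sample rows from an in-memory string rather than a file, and the names SAMPLE_CSV, averages, and top_name are illustrative and not part of the original program.

```python
import csv
import io
from statistics import mean

# Same sample rows as the CSV file in this exercise, held in memory for illustration
SAMPLE_CSV = """Name,Math,Science,English
Alice,78,85,90
Bob,65,70,60
Charlie,95,92,88
David,50,45,55
Eve,88,76,90
"""

subjects = ["Math", "Science", "English"]
averages = {}
for row in csv.DictReader(io.StringIO(SAMPLE_CSV)):
    # Convert each subject score to int before averaging
    averages[row["Name"]] = mean(int(row[s]) for s in subjects)

# The student with the highest average score
top_name = max(averages, key=averages.get)
print(top_name, round(averages[top_name], 2))
```

If the pandas program and this sketch disagree on the top student, the CSV contents or the column names are the first things to check.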
Ex. No.: 2 Prediction of Salary Using Linear Regression
09/07/2025
Aim:
To predict the salary of an individual based on their years of experience using a linear
regression model.
Procedure:
1. Import the necessary libraries: pandas, and LinearRegression from sklearn.linear_model.
2. Create the dataset as a dictionary with two lists: years of experience (exp) and the
corresponding salaries (salary).
3. Convert the dictionary into a pandas DataFrame for easy data handling.
4. Prepare the data for the model: use exp as the feature (x) and salary as the target (y).
5. Create and train the Linear Regression model.
6. Make a prediction: use the trained model to predict the salary for an individual with 15
years of experience.
7. Output the prediction: print the predicted salary for 15 years of experience.
Program:
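import pandas as pd
from sklearn.linear_model import LinearRegression
# Years of experience and the corresponding salaries
data = {'exp': [1, 2, 3, 4, 5], 'salary': [50000, 90000, 75000, 80000, 90000]}
df = pd.DataFrame(data)
x = df[['exp']]     # feature matrix (2-D)
y = df[['salary']]  # target kept 2-D, so predict() returns one row per sample
model = LinearRegression()
model.fit(x, y)
# Predict for 15 years of experience; a DataFrame keeps the feature name used in fit()
new_exp = pd.DataFrame({'exp': [15]})
print("Predicted Salary for 15 years experience", model.predict(new_exp)[0])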
Output:
Predicted Salary for 15 years experience [161000.]
Result:
The program prints the predicted salary based on the linear relationship learned from
the dataset.
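Note: The prediction can be verified by hand, because a one-feature linear regression reduces to the ordinary least-squares formulas for slope and intercept. The sketch below uses only plain Python; the variable names s_xy and s_xx are illustrative and not part of the original program.

```python
# Ordinary least squares for one feature, computed by hand
exp = [1, 2, 3, 4, 5]
salary = [50000, 90000, 75000, 80000, 90000]

n = len(exp)
mean_x = sum(exp) / n
mean_y = sum(salary) / n

# slope = sum((x - mean_x) * (y - mean_y)) / sum((x - mean_x) ** 2)
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(exp, salary))
s_xx = sum((x - mean_x) ** 2 for x in exp)
slope = s_xy / s_xx

# The fitted line passes through the point (mean_x, mean_y)
intercept = mean_y - slope * mean_x

predicted = intercept + slope * 15
print(slope, intercept, predicted)
```

The hand computation agrees with the value printed in the Output section. Since 15 years lies well outside the 1 to 5 year range of the training data, the figure is an extrapolation and should be read with caution.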
Ex. No.: 3 Creating and Loading a DataFrame
16/07/2025
Aim:
To create a pandas DataFrame from a Python dictionary containing student information
(Name, Age, Gender, and CGPA) and display the DataFrame.
Procedure:
1. Import the pandas library.
2. Create a dictionary named data with keys as column names (Name, Age, Gender,
CGPA) and values as lists containing the corresponding data.
3. Create a DataFrame: use pd.DataFrame(data) to convert the dictionary into a pandas
DataFrame.
4. Print the DataFrame to display the tabular data.
Program:
# Working with DataFrames
import pandas as pd
data = {'Name': ['Sammy', 'Tim', 'Ram', 'Jai', 'Sai'],
        'Age': [19, 17, 15, 20, 18],
        'Gender': ['F', 'M', 'M', 'M', 'F'],
        # float('nan') marks a missing CGPA; the text 'NaN' would make the column non-numeric
        'CGPA': [8.5, float('nan'), 9, 7.5, 8.5]}
df = pd.DataFrame(data)
# Printing the DataFrame
print(df)
Result:
Thus the program is executed which creates and displays the DataFrame.
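Note: How the missing CGPA is written affects the column type. The sketch below is an illustration only, with the hypothetical names text_df and nan_df; it contrasts the text 'NaN' with a real missing value created by float('nan').

```python
import pandas as pd

# CGPA with the missing value written as text: the column becomes object dtype
text_df = pd.DataFrame({'CGPA': [8.5, 'NaN', 9, 7.5, 8.5]})

# CGPA with a real missing value: the column stays numeric (float64)
nan_df = pd.DataFrame({'CGPA': [8.5, float('nan'), 9, 7.5, 8.5]})

print(text_df['CGPA'].dtype, nan_df['CGPA'].dtype)
print(nan_df['CGPA'].isna().sum())  # number of missing values detected
print(nan_df['CGPA'].mean())        # mean() skips NaN by default
```

With a numeric column, later steps such as averaging or filling missing values work directly.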
Ex. No.: 4 Plotting a Graph
23/07/2025
Aim:
To visualize the number of steps walked each day over a week using a line plot.
Procedure:
1. Import the matplotlib.pyplot module.
2. Create a list days containing the days of the week and a list steps_walked containing the
corresponding number of steps walked each day.
3. Use plt.plot() to create a line plot.
4. Add a title and axis labels.
5. Call plt.show() to render and display the plot.
Program:
import matplotlib.pyplot as plt
days = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
steps_walked = [8934, 14902, 3409, 25672, 12300, 2023, 6890]
# "x-r": x markers, solid line, red colour
plt.plot(days, steps_walked, "x-r")
plt.title("Step count for the week")
plt.xlabel("Days of the week")
plt.ylabel("Steps walked")
plt.show()
Result:
Thus the program is executed successfully and a line plot is displayed showing the
number of steps walked each day of the week.
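Note: The third argument of plot() is a format string that combines a marker, a line style, and a colour. The sketch below shows one way to inspect how "x-r" is decoded. It assumes the non-interactive Agg backend so that no window is opened, and it uses a shortened three-day list for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt

days = ["Mon", "Tue", "Wed"]
steps = [8934, 14902, 3409]

fig, ax = plt.subplots()
# plot() returns a list of Line2D objects; unpack the single line
(line,) = ax.plot(days, steps, "x-r")

# Inspect the marker, line style, and colour decoded from the format string
print(line.get_marker(), line.get_linestyle(), line.get_color())
plt.close(fig)
```

The same inspection works for any other format string, for example "o-g" or "v--m".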
Ex. No.: 5 Exploring plot() Function
06/08/2025
Aim
To visualize and compare the step count of a person over the days of the week for this
week and last week using a line graph in Python with matplotlib.
Procedure
1. Import the matplotlib.pyplot module.
2. Define the list of days in the week (Mon to Sun).
3. Store the number of steps walked for this week in a list.
4. Store the number of steps walked for last week in another list.
5. Plot the step counts of this week using a green line with circle markers ("o-g").
6. Plot the step counts of last week using a magenta dashed line with triangle markers
("v--m").
7. Add a title, labels for the X-axis (Days of the week) and Y-axis (Steps walked).
8. Enable grid lines for better readability.
9. Display the line graph using plt.show().
Program:
import matplotlib.pyplot as plt
days = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
steps_walked = [8934, 14902, 3409, 25672, 12300, 2023, 6890]
steps_last_week = [9788, 8710, 5308, 17630, 21309, 4002, 5223]
# This week: circle markers, solid green line
plt.plot(days, steps_walked, "o-g")
# Last week: triangle markers, dashed magenta line
plt.plot(days, steps_last_week, "v--m")
plt.title("Step count | This week and last week")
plt.xlabel("Days of the week")
plt.ylabel("Steps walked")
plt.grid(True)
plt.show()
Result
A line graph is successfully plotted showing the step counts for this week and last
week, which helps to compare the walking activity trends over the days of the week.
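Note: The visual comparison can be backed by a few numbers. The sketch below uses the same two lists and plain Python; the names total_this, total_last, and better_days are illustrative and not part of the original program.

```python
days = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
steps_walked = [8934, 14902, 3409, 25672, 12300, 2023, 6890]
steps_last_week = [9788, 8710, 5308, 17630, 21309, 4002, 5223]

# Weekly totals
total_this = sum(steps_walked)
total_last = sum(steps_last_week)

# Days on which this week's count exceeded last week's
better_days = [
    day
    for day, now, before in zip(days, steps_walked, steps_last_week)
    if now > before
]

print("Total this week:", total_this)
print("Total last week:", total_last)
print("Days with more steps this week:", better_days)
```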
Ex. No.: 6 Employee Management System
13/08/2025
Aim
To create and display employee records in Python using different data types such as
List, Dictionary, Tuple, Set, and String.
Procedure
1. Create an empty list employees to store multiple employee records.
2. Define each employee as a dictionary containing:
   - id (String)
   - name (String)
   - role as a tuple (Position, Department)
   - skills as a set (to store unique skills, automatically removing duplicates).
3. Add each employee dictionary to the employees list.
4. Traverse through the employees list using a for loop.
5. For each employee, display:
   - Name
   - ID
   - Role (Position and Department accessed from the tuple)
   - Skills (joined into a single string after sorting the set).
Program:
# Step 1: Define employee records using all data types
# List of employees (List)
employees = []
# Create employee 1
emp1 = {
    "id": "E001",  # String
    "name": "Alice",  # String
    "role": ("HR Manager", "HR"),  # Tuple: (Position, Department)
    "skills": {"recruiting", "training", "communication"}  # Set
}
# Create employee 2
emp2 = {
    "id": "E002",
    "name": "Bob",
    "role": ("IT Specialist", "IT"),
    "skills": {"python", "networking", "security", "python"}  # Set will remove the duplicate
}
# Add to employee list
employees.append(emp1)
employees.append(emp2)
# Step 2: Display data
print("=== Employee Records ===")
for emp in employees:
    print(f"\nName: {emp['name']}")
    print(f"ID: {emp['id']}")
    print(f"Role: {emp['role'][0]} in {emp['role'][1]} department")  # Accessing tuple
    print(f"Skills: {', '.join(sorted(emp['skills']))}")  # Accessing set
Result
The program successfully displays employee records built from the list, dictionary, tuple,
set, and string data types.
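Note: Two behaviours of the program are worth checking in isolation: a set stores a repeated skill only once, and a tuple can be unpacked into named variables instead of being indexed. The sketch below reuses the second employee record; position, department, skill_count, and skills_text are illustrative names.

```python
emp = {
    "id": "E002",
    "name": "Bob",
    "role": ("IT Specialist", "IT"),
    "skills": {"python", "networking", "security", "python"},
}

# Tuple unpacking gives each field a readable name
position, department = emp["role"]

# The duplicate "python" is stored only once
skill_count = len(emp["skills"])

# Sets are unordered, so sort before joining for a stable display
skills_text = ", ".join(sorted(emp["skills"]))

print(position, department, skill_count, skills_text)
```

Sorting before joining matters because the iteration order of a set is not guaranteed to match the order in which items were written.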
Ex. No.: 7 Student Management System using conditional and control statements
21/08/2025
Aim
To develop a menu-driven Student Management System in Python to add,
display, and search student records.
Procedure
1. Create an empty list students to store all student records.
2. Use a while True loop to repeatedly show a menu until the user chooses Exit.
3. Provide options:
   - Add Student: input roll number, name, and marks, store them in a
     dictionary, and append it to students.
   - Display All Students: traverse the students list and print the details
     (also display Pass/Fail based on marks).
   - Search Student by Roll No: ask for a roll number, search the list,
     and display the record if found.
   - Exit: break the loop and stop the program.
4. Use conditions (if-elif-else) to perform the selected operation.
Program:
students = []  # list to store student records
while True:
    print("\n===== Student Management System =====")
    print("1. Add Student")
    print("2. Display All Students")
    print("3. Search Student by Roll No")
    print("4. Exit")
    choice = int(input("Enter your choice from 1 to 4: "))
    # 1. Add Student
    if choice == 1:
        roll = input("Enter Roll Number: ")
        name = input("Enter Name: ")
        marks = int(input("Enter Marks: "))
        student = {"roll": roll, "name": name, "marks": marks}
        students.append(student)
        print("✅ Student added successfully!")
    # 2. Display All Students
    elif choice == 2:
        if not students:
            print("No students available.")
        else:
            print("\n---- Student Records ----")
            for s in students:
                status = "Pass" if s["marks"] >= 40 else "Fail"
                print(f"Roll: {s['roll']}, Name: {s['name']}, Marks: {s['marks']} ({status})")
    # 3. Search Student by Roll No
    elif choice == 3:
        roll = input("Enter Roll Number to Search: ")
        found = False
        for s in students:
            if s["roll"] == roll:
                status = "Pass" if s["marks"] >= 40 else "Fail"
                print(f"Roll: {s['roll']}, Name: {s['name']}, Marks: {s['marks']} ({status})")
                found = True
                break
        if not found:
            print("❌ Student not found!")
    # 4. Exit
    elif choice == 4:
        print("Exiting... Thank you!")
        break
    else:
        print("Invalid choice! Try again.")
Result
The program successfully maintains student records with features to add new students,
display all records, and search by roll number.
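Note: The menu program converts the typed choice with int(), so a non-numeric entry stops it with a ValueError, and the Pass/Fail rule is written out twice. The sketch below shows one possible way to factor these pieces into small functions that can be tested without keyboard input. The names PASS_MARK, pass_status, find_student, and read_choice are hypothetical helpers, not part of the original program.

```python
PASS_MARK = 40  # same threshold as the menu program


def pass_status(marks):
    """Return "Pass" or "Fail" for the given marks."""
    return "Pass" if marks >= PASS_MARK else "Fail"


def find_student(students, roll):
    """Return the first record with a matching roll number, or None."""
    for s in students:
        if s["roll"] == roll:
            return s
    return None


def read_choice(text):
    """Return the menu choice as an int from 1 to 4, or None if invalid."""
    try:
        choice = int(text)
    except ValueError:
        return None
    return choice if 1 <= choice <= 4 else None


students = [
    {"roll": "101", "name": "Asha", "marks": 72},  # made-up sample records
    {"roll": "102", "name": "Ravi", "marks": 35},
]
record = find_student(students, "102")
print(record["name"], pass_status(record["marks"]))
print(read_choice("abc"), read_choice("3"))
```

In the menu loop, read_choice(input(...)) could replace int(input(...)), with a None result handled by the existing "Invalid choice" branch.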
Ex. No.: 8 Creating and Manipulating a DataFrame
28/08/2025
Aim
To perform various DataFrame manipulations using the Pandas library in
Python, such as column selection, row filtering, adding/updating data, sorting,
grouping, merging, dropping, and reshaping.
Procedure
1. Import the pandas library.
2. Create a dataset using a dictionary and convert it into a Pandas DataFrame.
3. Perform the following manipulations step by step:
   - Select columns (single and multiple).
   - Filter rows based on conditions.
   - Add a new column (Bonus).
   - Update values in the DataFrame.
   - Sort records by Salary.
   - Group and aggregate data (average salary by department).
   - Merge two DataFrames on a common key.
   - Drop columns/rows.
   - Reshape using a pivot table.
4. Print the results after each operation to observe the changes.
Program :
import pandas as pd
# 1. Creating Dataset
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 35, 40],
    'Salary': [50000, 60000, 75000, 80000],
    'Department': ['IT', 'HR', 'IT', 'Finance']
}
df = pd.DataFrame(data)
print("Original DataFrame:\n", df, "\n")
# 2. Selecting Columns
print("Select single column (Name):\n", df['Name'], "\n")
print("Select multiple columns (Name, Age):\n", df[['Name', 'Age']], "\n")
# 3. Filtering Rows
print("Employees older than 30:\n", df[df['Age'] > 30], "\n")
print("Employees with Salary > 60000:\n", df[df['Salary'] > 60000], "\n")
# 4. Adding a New Column
df['Bonus'] = df['Salary'] * 0.10
print("After adding Bonus column:\n", df, "\n")
# 5. Updating Values
df.loc[df['Name'] == 'Alice', 'Age'] = 26
print("After updating Alice's age:\n", df, "\n")
# 6. Sorting
print("Sorted by Salary (descending):\n",
      df.sort_values(by='Salary', ascending=False), "\n")
# 7. Grouping & Aggregation
print("Average Salary by Department:\n",
      df.groupby('Department')['Salary'].mean(), "\n")
# 8. Merging DataFrames
df_a = pd.DataFrame({'ID': [1, 2, 3], 'Name': ['Alice', 'Bob', 'Charlie']})
df_b = pd.DataFrame({'ID': [1, 2, 3], 'Age': [26, 30, 35]})
merged = pd.merge(df_a, df_b, on='ID')
print("Merged DataFrame:\n", merged, "\n")
# 9. Dropping Columns & Rows (drop() returns a copy; df itself is unchanged)
df_dropped_col = df.drop(columns=['Bonus'])
print("After dropping Bonus column:\n", df_dropped_col, "\n")
df_dropped_row = df.drop(0)  # drop first row (Alice)
print("After dropping first row:\n", df_dropped_row, "\n")
# 10. Reshaping (Pivot Table)
pivot_df = df.pivot_table(values='Salary', index='Department', aggfunc='sum')
print("Pivot Table (Total Salary by Department):\n", pivot_df, "\n")
OUTPUT
Original DataFrame:
Name Age Salary Department
0 Alice 25 50000 IT
1 Bob 30 60000 HR
2 Charlie 35 75000 IT
3 David 40 80000 Finance
Select single column (Name):
0 Alice
1 Bob
2 Charlie
3 David
Name: Name, dtype: object
Select multiple columns (Name, Age):
Name Age
0 Alice 25
1 Bob 30
2 Charlie 35
3 David 40
Employees older than 30:
Name Age Salary Department
2 Charlie 35 75000 IT
3 David 40 80000 Finance
Employees with Salary > 60000:
Name Age Salary Department
2 Charlie 35 75000 IT
3 David 40 80000 Finance
After adding Bonus column:
Name Age Salary Department Bonus
0 Alice 25 50000 IT 5000.0
1 Bob 30 60000 HR 6000.0
2 Charlie 35 75000 IT 7500.0
3 David 40 80000 Finance 8000.0
After updating Alice's age:
Name Age Salary Department Bonus
0 Alice 26 50000 IT 5000.0
1 Bob 30 60000 HR 6000.0
2 Charlie 35 75000 IT 7500.0
3 David 40 80000 Finance 8000.0
Sorted by Salary (descending):
Name Age Salary Department Bonus
3 David 40 80000 Finance 8000.0
2 Charlie 35 75000 IT 7500.0
1 Bob 30 60000 HR 6000.0
0 Alice 26 50000 IT 5000.0
Average Salary by Department:
Department
Finance 80000.0
HR 60000.0
IT 62500.0
Name: Salary, dtype: float64
Merged DataFrame:
ID Name Age
0 1 Alice 26
1 2 Bob 30
2 3 Charlie 35
After dropping Bonus column:
Name Age Salary Department
0 Alice 26 50000 IT
1 Bob 30 60000 HR
2 Charlie 35 75000 IT
3 David 40 80000 Finance
After dropping first row:
Name Age Salary Department Bonus
1 Bob 30 60000 HR 6000.0
2 Charlie 35 75000 IT 7500.0
3 David 40 80000 Finance 8000.0
Pivot Table (Total Salary by Department):
Salary
Department
Finance 80000
HR 60000
IT 125000
Result:
The program successfully demonstrates DataFrame manipulation operations in
Pandas.
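Note: The merge in this exercise uses keys that match exactly, so every row survives. The sketch below illustrates how the how parameter of pd.merge changes the result when the keys differ. The ID 4 and the age 41 in df_b are made-up values added only for this illustration.

```python
import pandas as pd

df_a = pd.DataFrame({'ID': [1, 2, 3], 'Name': ['Alice', 'Bob', 'Charlie']})
# ID 4 and age 41 are made-up values, so that the keys no longer match exactly
df_b = pd.DataFrame({'ID': [1, 2, 4], 'Age': [26, 30, 41]})

# Default how='inner': keep only IDs present in both frames
inner = pd.merge(df_a, df_b, on='ID')

# how='left': keep every row of df_a; missing ages become NaN
left = pd.merge(df_a, df_b, on='ID', how='left')

# how='outer': keep IDs from either frame
outer = pd.merge(df_a, df_b, on='ID', how='outer')

print(inner, left, outer, sep="\n\n")
```

Comparing the row counts of the three results is a quick way to spot keys that exist in only one of the two DataFrames.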
Ex. No.: 9 Import a Built-in Dataset
04/09/2025
Aim
To load and explore the California Housing dataset using Scikit-learn and Pandas
for initial data analysis.
Procedure
1. Import the fetch_california_housing function from sklearn.datasets.
2. Load the dataset using fetch_california_housing().
3. Print the dataset description (DESCR) to understand the features.
4. Convert the dataset into a Pandas DataFrame for easier manipulation.
5. Display the first 5 rows using .head() to preview the data.
Program:
from sklearn.datasets import fetch_california_housing
import pandas as pd
housing = fetch_california_housing()
# Print Dataset Description
print(housing.DESCR)
# Create a DataFrame for easier data manipulation
housing_df = pd.DataFrame(data=housing.data, columns=housing.feature_names)
# Display the first few rows
print(housing_df.head())
Result:
Thus the program is executed successfully.
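Note: fetch_california_housing() downloads the data the first time it is called, so it needs an internet connection on a fresh machine. For offline practice, the same load-and-preview steps can be tried on a dataset that ships with scikit-learn. The sketch below swaps in the Iris dataset purely as an illustration; it is not the dataset used in this exercise.

```python
from sklearn.datasets import load_iris

# as_frame=True asks scikit-learn to return pandas objects
iris = load_iris(as_frame=True)

# .frame holds the feature columns together with the target column
iris_df = iris.frame

print(iris_df.shape)
print(iris_df.head())
```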
Ex. No.: 10 Exploring Statistical Functions
19/09/2025
Aim
To calculate and display mean, median, mode, and standard deviation of a
dataset using Python’s statistics module.
Procedure
1. Import the statistics module.
2. Define a sample data list.
3. Use built-in functions from the module:
   - statistics.mean(data): calculates the average.
   - statistics.median(data): finds the middle value.
   - statistics.mode(data): finds the most frequently occurring value.
   - statistics.stdev(data): computes the sample standard deviation.
4. Print the dataset and all calculated statistical values.
Program:
import statistics
# Sample data list
data = [1, 2, 2, 3, 4, 5, 5, 5, 6]
# Calculate statistical measures
mean_value = statistics.mean(data)
median_value = statistics.median(data)
mode_value = statistics.mode(data)  # Returns the single most common value
std_deviation = statistics.stdev(data)  # Sample standard deviation
# Print the results
print("Data: ", data)
print("Mean: ", mean_value)
print("Median: ", median_value)
print("Mode: ", mode_value)
print("Standard Deviation: ", std_deviation)
Result
The program successfully calculates descriptive statistics for the given dataset.
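Note: statistics.stdev() computes the sample standard deviation, which divides by n - 1, while statistics.pstdev() computes the population version, which divides by n. The sketch below works both out by hand for the same data list; the names squared_dev, sample_sd, and population_sd are illustrative.

```python
import math
import statistics

data = [1, 2, 2, 3, 4, 5, 5, 5, 6]
n = len(data)
mean_value = sum(data) / n

# Sum of squared deviations from the mean
squared_dev = sum((x - mean_value) ** 2 for x in data)

# Sample standard deviation divides by n - 1
sample_sd = math.sqrt(squared_dev / (n - 1))

# Population standard deviation divides by n
population_sd = math.sqrt(squared_dev / n)

print(sample_sd, statistics.stdev(data))
print(population_sd, statistics.pstdev(data))
```

For this data the squared deviations sum to 24, so the sample variance is 24 / 8 = 3 and the sample standard deviation is the square root of 3.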
Ex. No.: 11 Data Cleaning and Preprocessing
26/09/2025
Aim
To perform data preprocessing on a dataset using pandas and scikit-learn
techniques such as handling missing values, encoding categorical data, and
normalizing numerical features.
Procedure
1. Import necessary libraries: pandas, numpy, LabelEncoder, MinMaxScaler.
2. Create a DataFrame with sample employee data.
3. Handle missing values:
   - Replace missing numerical values (Age, Salary) with their mean.
4. Encode categorical data:
   - Apply Label Encoding for Gender (F=0, M=1).
   - Apply One-Hot Encoding for Department.
5. Normalize numerical columns:
   - Scale Age and Salary between 0 and 1 using MinMaxScaler.
6. Display both the original dataset and the preprocessed dataset.
Program:
import pandas as pd
import numpy as np
from sklearn.preprocessing import LabelEncoder, MinMaxScaler
# Sample dataset
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Age': [25, np.nan, 30, 22, 28],
    'Gender': ['F', 'M', 'M', 'M', 'F'],
    'Salary': [50000, 60000, 65000, np.nan, 58000],
    'Department': ['HR', 'IT', 'IT', 'Marketing', 'HR']
}
# Create DataFrame
df = pd.DataFrame(data)
print("Original Data:")
print(df)
# Step 1: Handle missing values
# Fill missing numeric values with the column mean
# (assign the result back; fillna(inplace=True) on a selected column is deprecated)
df['Age'] = df['Age'].fillna(df['Age'].mean())
df['Salary'] = df['Salary'].fillna(df['Salary'].mean())
# Step 2: Encode categorical data
label_encoder = LabelEncoder()
df['Gender'] = label_encoder.fit_transform(df['Gender'])  # F=0, M=1
# For Department, use one-hot encoding
df = pd.get_dummies(df, columns=['Department'])
# Step 3: Normalize numerical columns to the range 0 to 1
scaler = MinMaxScaler()
df[['Age', 'Salary']] = scaler.fit_transform(df[['Age', 'Salary']])
print("\nPreprocessed Data:")
print(df)
Result:
Thus the program is executed successfully.
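Note: The three preprocessing steps can be traced by hand on the Age and Gender columns. The sketch below uses plain Python only; fill_with_mean and min_max_scale are hypothetical helper names, and the scaling helper assumes the column has at least two distinct values so that the range is not zero.

```python
import math


def fill_with_mean(values):
    """Replace NaN entries with the mean of the values that are present."""
    present = [v for v in values if not math.isnan(v)]
    mean_value = sum(present) / len(present)
    return [mean_value if math.isnan(v) else v for v in values]


def min_max_scale(values):
    """Scale values to [0, 1] using (v - min) / (max - min).

    Assumes max > min; a constant column would divide by zero.
    """
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


# Age column from the sample dataset, with one missing value
ages = fill_with_mean([25, math.nan, 30, 22, 28])
scaled_ages = min_max_scale(ages)

# Label encoding numbers the distinct labels in sorted order
genders = ['F', 'M', 'M', 'M', 'F']
codes = {label: i for i, label in enumerate(sorted(set(genders)))}

print(ages)
print(scaled_ages)
print(codes)
```

The missing age is filled with 26.25, the mean of the four known ages, and the youngest and oldest ages map to 0 and 1 after scaling.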