Lab Manual of Data Analytics and Visualization
(BCDS-551)
Semester-V
Year-III
Prepared by
INDEX
List of Equipment
Evaluation Scheme
Safety Precautions
Galgotias College of Engineering and Technology
1, Knowledge Park II, Greater Noida – 201 310 (UP) INDIA
Institute Mission
To cultivate a student-centric ecosystem that fosters experiential learning, ethical problem solving, and sustainability.
To provide a conducive environment for the professional growth of faculty and staff through research and global collaboration, contributing towards the overall growth of the nation.
To nurture a culture of active citizenship through excellence in education, entrepreneurship, and innovation, producing socially responsible and competent technocrats.
Department Vision
Department Mission
PEO1: Graduates will excel in their professional careers by applying advanced data
science skills in diverse industries, research institutions, and entrepreneurial ventures.
Program Outcomes (POs)
PO1 Engineering Knowledge: Apply knowledge of mathematics, natural science, computing, engineering fundamentals and an engineering specialization as specified in WK1 to WK4 respectively to develop solutions to complex engineering problems.
PO2 Problem Analysis: Identify, formulate, review research literature and analyze
complex engineering problems reaching substantiated conclusions with consideration
for sustainable development. (WK1 to WK4)
PO3 Design/development of solutions: Design creative solutions for complex
engineering problems and design/develop systems/components/processes to meet
identified needs with consideration for the public health and safety, whole-life cost,
net zero carbon, culture, society and environment as required. (WK5)
PO4 Conduct Investigations of Complex Problems: Conduct investigations of complex
engineering problems using research-based knowledge including design of
experiments, modelling, analysis & interpretation of data to provide valid
conclusions. (WK8).
PO5 Engineering Tool Usage: Create, select and apply appropriate techniques, resources
and modern engineering & IT tools, including prediction and modelling recognizing
their limitations to solve complex engineering problems. (WK2 and WK6)
PO6 The Engineer and The World: Analyze and evaluate societal and environmental
aspects while solving complex engineering problems for its impact on sustainability
with reference to economy, health, safety, legal framework, culture and environment.
(WK1, WK5, and WK7).
PO7 Ethics: Apply ethical principles and commit to professional ethics, human values,
diversity and inclusion; adhere to national & international laws. (WK9)
PO8 Individual and Collaborative Team Work: Function effectively as an individual,
and as a member or leader in diverse teams, and in multidisciplinary settings.
PO9 Communications: Communicate effectively and inclusively within the engineering
community and society at large, such as being able to comprehend and write effective
reports and design documentation, make effective presentations considering cultural,
language, and learning differences.
PO10 Project Management and Finance: Apply knowledge and understanding of engineering management principles and economic decision-making, and apply these to one's own work, as a member and leader in a team, to manage projects in multidisciplinary environments.
PO11 Life-long Learning: Recognize the need for, and have the preparation and ability for
i) independent and life-long learning ii) adaptability to new and emerging
technologies and iii) critical thinking in the broadest context of technological change.
(WK8)
COURSE OUTCOMES
Mapping of Course Outcomes with Program Outcomes (POs) and Program Specific Outcomes (PSOs):

Course Outcome   PO1   PO2    PO3    PO4    PO5    PO6   PO7   PO8   PO9   PO10   PO11   PSO1   PSO2
BCDS551           3     2      3      1      3      -     -     -     -     -      3      3      2
BCDS551           2     3      2      1      2      -     -     -     -     -      3      3      3
BCDS551           3     3      3      2      2      -     -     -     -     -      3      3      3
Average           3    2.35   2.35   1.68   2.69    -     -     -     -     -      3      3     2.33
CO/Rubric-Experiment Mapping

Rubrics                 Marks
Pre-Lab Writing Work    10
Record                  5
Viva Voce               5
Total                   20
List of Equipment

Table: Hardware and Software Requirements

S.No.   Category                Requirement
1       Hardware Requirements   Computers (at least Intel i5 processor, 8GB RAM, 500GB HDD/SSD), monitors, keyboards; projector/smartboard (for instructor demonstrations); network router/switch (for internet and collaborative coding)
2       Software Requirements   Operating System: Windows; Programming Language: Python; Environment: Google Colab
List of Experiments
As per AKTU Syllabus
Exp. No.   Name of Experiment                                                                          COs
1          Get input from the user and perform numerical operations: MAX, MIN, AVG, SUM,              CO1
           SQRT, ROUND
2          Perform data import/export operations for .CSV, .XLS, and .TXT using data frames           CO2
3          Input a matrix from the user and perform matrix addition, subtraction,                     CO1
           multiplication, inverse, transpose, and division using vectors
4          Perform statistical operations: Mean, Median, Mode, and Standard Deviation                 CO2
5          Perform data pre-processing operations: i) Handling missing data                           CO1
           ii) Min-Max normalization
6          Perform dimensionality reduction using PCA on a Houses dataset                             CO2
9          Collect data via web scraping, APIs, or data connectors from instructor-specified          CO3
           sources
10         Perform association analysis (e.g., Apriori) on a dataset and evaluate its accuracy        CO4
11         Build a recommendation system on a dataset and evaluate its accuracy                       CO4
12         Build a time-series model on a dataset and evaluate its accuracy                           CO3
Evaluation Scheme:
The marks are based on the performance of the students in the following activities:
Total Subject Marks = 100
Components                 Marks Distribution   Marks Distribution (%)
Record 30 30%
Quiz / Viva 20 20%
End Semester Examination 50 50%
Safety Precautions and General Laboratory Instructions:
1. Students are advised to arrive at the laboratory at least 5 minutes before the starting time; those who arrive more than 5 minutes late will not be allowed into the lab.
2. Plan your task properly well before the commencement of the session; come prepared to the lab with the synopsis / program / experiment details.
3. Students should enter the laboratory with:
   a. Laboratory observation notes with all the details (problem statement, aim, algorithm, procedure, program, expected output, etc.) filled in for the lab session.
   b. Laboratory record updated up to the last session's experiments, along with any other materials needed in the lab.
   c. Proper dress code and identity card.
4. Sign the laboratory login register, write the TIME-IN, and occupy the computer system allotted to you by the faculty.
5. Execute your task in the laboratory, record the results / output in the lab observation notebook, and get it certified by the concerned faculty.
6. All students should be polite and cooperative with the laboratory staff, and must maintain discipline and decency in the laboratory.
7. Computer labs are equipped with sophisticated, high-end branded systems, which should be utilized properly.
8. Students and faculty must keep their mobile phones in SWITCHED OFF mode during lab sessions. Misuse of the equipment or misbehaviour with the staff will attract severe punishment.
9. Students must take the permission of the faculty in case of any urgency to go out; anybody found loitering outside the lab / class without permission during working hours will be dealt with seriously and punished appropriately.
10. Students should LOG OFF / SHUT DOWN the computer system before leaving the lab after completing the task (experiment) in all respects, and must ensure that the system / seat is left in proper order.
Do's                                          Don'ts
Bring your lab record with you.               Overcrowd near the system.
Always come to the lab on time.               Make any mark on the instruction manual.
Maintain silence and be attentive.            Drag the stool/chair inside the lab.
Keep individual records of observations.      Leave the lab before the session is over.
Always turn off the computer properly         Take any material outside the lab with you.
before leaving the lab.
1. To get input from the user and perform numerical operations (MAX, MIN, AVG, SUM, SQRT, ROUND) in R.
2. To perform data import/export (.CSV, .XLS, .TXT) operations using data frames in R.
3. To get an input matrix from the user and perform matrix addition, subtraction, multiplication, inverse, transpose, and division operations using the vector concept in R.
4. To perform statistical operations (Mean, Median, Mode, and Standard Deviation) using R.
5. To perform data pre-processing operations: i) Handling missing data ii) Min-Max normalization.
6. To perform dimensionality reduction using PCA on the Houses dataset.
7. To perform Simple Linear Regression with R.
8. To perform K-Means clustering and visualize the result for the Iris dataset.
9. To learn how to collect data via web scraping, APIs, and data connectors from suitable sources as specified by the instructor.
10. To perform association analysis on a given dataset and evaluate its accuracy.
11. To build a recommendation system on a given dataset and evaluate its accuracy.
12. To build a time-series model on a given dataset and evaluate its accuracy.
13. To build cartographic visualizations for multiple datasets involving various countries of the world, states and districts in India, etc.
14. To perform text mining on a set of documents and visualize the most important words in a visualization such as a word cloud.
Note: The instructor may add/delete/modify/tune experiments wherever he/she feels it justified. It is also suggested that open-source tools (R, Python, etc.) be preferred for conducting the lab.
Experiment No 1.
Aim: - To get input from the user and perform numerical operations (MAX, MIN, AVG, SUM, SQRT, ROUND) in Python.
Description: -
This experiment demonstrates how to accept numerical input from a user in Python and perform basic mathematical operations such as finding the maximum, minimum, average (mean), and sum of the input values. Additionally, it calculates the square root of each number using the math module and rounds both the original numbers and their square roots to two decimal places. The input is taken as a comma-separated string, converted into a list of floating-point numbers, and processed using built-in Python functions such as max(), min(), sum(), and round(). The "Numerical Operations Flowchart" in figure 1.1 illustrates the steps to input numbers and perform the MAX, MIN, AVG, SUM, SQRT, and ROUND operations, guiding the user through decision points to choose the desired operation and display the result.
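Note: Python's round() uses "banker's rounding" (ties go to the even digit), which is worth knowing before interpreting the program's output. A quick illustration:

print(round(3.14159))     # 3    - nearest whole number
print(round(3.14159, 2))  # 3.14 - two decimal places
print(round(2.5))         # 2    - a tie rounds to the even digit
print(round(3.5))         # 4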
Apparatus Used:
Python Programming Language – for writing and executing the code.
Python IDE or Code Editor – such as Google Colab or Jupyter Notebook.
Standard Input/Output Devices – keyboard for input, monitor for output display.
Python Libraries – math module (for square root calculation); built-in functions (max(), min(), sum(), round()).
Algorithm: -
1. Start
2. Prompt the user to enter a list of numbers separated by commas
3. Read the input from the user as a string
4. Split the input string using commas to extract individual number strings
5. Convert each extracted string into a float and store them in a list
6. Perform the following operations on the list of numbers:
a. Find the maximum number using max()
b. Find the minimum number using min()
c. Calculate the sum using sum()
d. Compute the average by dividing the sum by the total number of elements
e. Compute the square root of each number using math.sqrt()
f. Round each number and its square root using round()
7. Display all the computed results
8. End
Flowchart: -
Program Code: -
import math

# Read a comma-separated list of numbers from the user
user_input = input("Enter numbers separated by commas: ")
numbers = [float(num) for num in user_input.split(',')]

maximum = max(numbers)
minimum = min(numbers)
average = sum(numbers) / len(numbers)
total_sum = sum(numbers)

# Square root is undefined for negative inputs, so guard with NaN
square_roots = [math.sqrt(num) if num >= 0 else float('nan') for num in numbers]

# Round the original numbers and their square roots to 2 decimal places
rounded_values = [round(num, 2) for num in numbers]
rounded_roots = [round(root, 2) for root in square_roots]

print("\n--- Results ---")
print(f"Numbers Entered: {numbers}")
print(f"Maximum: {maximum}")
print(f"Minimum: {minimum}")
print(f"Average: {average}")
print(f"Sum: {total_sum}")
print(f"Square Roots: {square_roots}")
print(f"Rounded Values (2 decimal places): {rounded_values}")
print(f"Rounded Square Roots (2 decimal places): {rounded_roots}")
Experiment No 2.
Aim: - To perform data import/export (.CSV, .XLS, .TXT) operations using data frames in Python.
Description: -
This experiment is designed to demonstrate how to perform data import and export operations using data frames in Python, a crucial skill in data analytics. The primary goal is to read data from different file formats such as .CSV (Comma-Separated Values), .XLS (Excel), and .TXT (Text) into a structured format using the pandas library. Pandas provides convenient functions such as read_csv(), read_excel(), and read_table() to load data into a DataFrame, which allows for efficient manipulation and analysis. A flowchart is a diagram that represents a process or workflow using symbols, such as rectangles, diamonds, and arrows, to show steps and decision points; figure 2.1 illustrates the sequence of operations for this experiment in a clear and structured manner.
Apparatus Used:
Python Programming Language, a Python IDE or code editor (such as Google Colab or Jupyter Notebook), and the pandas library (with openpyxl for .xlsx files).
Algorithm: -
Step 1: Start
Step 2: Import the required library pandas
Step 3: Import the dataset from a .CSV file using pandas.read_csv()
Step 4: Import the dataset from a .XLS or .XLSX file using pandas.read_excel()
Step 5: Import the dataset from a .TXT file using pandas.read_table() or read_csv() with delimiter
Step 6: Display the contents of the imported data frames
Step 7: Perform any data verification or basic operations (optional)
Step 8: Export the DataFrame to a .CSV file using to_csv()
Step 9: Export the DataFrame to an .XLSX file using to_excel()
Step 10: Export the DataFrame to a .TXT file using to_csv() with sep='\t'
Step 11: End
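Note: for the optional verification in Step 7, a minimal sketch of the usual checks (assuming a 'data.csv' file, as in the program below):

import pandas as pd

df = pd.read_csv('data.csv')  # any of the imported frames works here
print(df.head())              # first five rows
df.info()                     # column names, dtypes, non-null counts (prints directly)
print(df.describe())          # summary statistics for numeric columns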
Flowchart: -
Program Code:-
import pandas as pd

# Import from .CSV
csv_df = pd.read_csv('data.csv')
print("CSV Data:\n", csv_df)

# Import from .XLSX (requires the openpyxl package)
excel_df = pd.read_excel('data.xlsx')
print("\nExcel Data:\n", excel_df)

# Import from .TXT (tab-delimited here; adjust sep to match the file)
txt_df = pd.read_csv('data.txt', sep='\t')
print("\nText Data:\n", txt_df)

# Export each frame back out (sample output filenames)
csv_df.to_csv('output.csv', index=False)
excel_df.to_excel('output.xlsx', index=False)
txt_df.to_csv('output.txt', sep='\t', index=False)
Experiment 3.
Aim: - To get an input matrix from the user and perform matrix addition, subtraction, multiplication, inverse, transpose, and division operations using the vector concept in Python.
Description: -
This experiment demonstrates how to input matrices from the user and perform fundamental matrix operations using the vector and array concepts in Python. The operations include matrix addition, subtraction, multiplication, transpose, inverse, and division (element-wise or matrix-wise, where applicable). The experiment uses Python's NumPy library, which provides efficient support for creating and manipulating multi-dimensional arrays (vectors and matrices). The "Matrix Operations Flowchart" in figure 3.1 visually represents the sequence of steps for performing matrix addition, subtraction, multiplication, inverse, transpose, and division.
Apparatus Used:
Python Programming Language – for writing and executing the code.
Python IDE or Code Editor – such as Google Colab or Jupyter Notebook.
Standard Input/Output Devices – keyboard for input, monitor for output display.
Python Libraries – NumPy (arrays for vectors and matrices; numpy.linalg for the inverse).
Algorithm: -
Step 1: Start
Step 2: Input two matrices, Matrix1 and Matrix2
Step 3: Matrix Addition: If dimensions match, add corresponding elements of Matrix1 and
Matrix2
Step 4: Matrix Subtraction: If dimensions match, subtract corresponding elements of
Matrix1 and Matrix2
Step 5: Matrix Multiplication: If columns of Matrix1 = rows of Matrix2, compute dot
product
Step 6: Matrix Inverse: If matrix is square and non-singular, compute inverse
Step 7: Matrix Transpose: Swap rows and columns of Matrix1 or Matrix2
Step 8: Matrix Division: If Matrix2 is invertible, multiply Matrix1 with the inverse of
Matrix2
Step 9: End
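Note: Step 8's "division" deserves care: in NumPy the / operator divides element-wise, whereas matrix division in the linear-algebra sense is multiplication by an inverse. A minimal sketch of the difference:

import numpy as np

A = np.array([[4.0, 2.0], [2.0, 2.0]])
B = np.array([[2.0, 1.0], [1.0, 1.0]])
print(A / B)                 # element-wise division
print(A @ np.linalg.inv(B))  # matrix division: A x B^-1 (B must be invertible)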
Flowchart: -
Program Code:-
import numpy as np

# Read a matrix of the stated size from the user, row by row
def input_matrix(name):
    rows = int(input(f"Enter number of rows for {name}: "))
    cols = int(input(f"Enter number of columns for {name}: "))
    print(f"Enter {name} row by row, values separated by spaces:")
    m = np.array([list(map(float, input().split())) for _ in range(rows)])
    assert m.shape == (rows, cols), "Matrix dimensions do not match the input"
    return m

A = input_matrix("Matrix1")
B = input_matrix("Matrix2")

print("Addition:\n", A + B)
print("Subtraction:\n", A - B)
print("Multiplication (dot product):\n", A @ B)
print("Transpose of Matrix1:\n", A.T)
print("Inverse of Matrix1:\n", np.linalg.inv(A))
# Division in the matrix sense: Matrix1 multiplied by the inverse of Matrix2
print("Division (Matrix1 x Matrix2^-1):\n", A @ np.linalg.inv(B))
Experiment 4.
Aim: - To perform statistical operations (Mean, Median, Mode, and Standard Deviation) in Python.
Description: -
To perform statistical operations such as Mean, Median, Mode, and Standard
Deviation in Python, we can use the built-in statistics module. First, the
data is input as a list or array of numbers. The Mean is calculated by
summing all values and dividing by the total number of values. The Median is
the middle value when the data is sorted, while the Mode is the most
frequently occurring value. Lastly, the Standard Deviation is calculated to
measure how spread out the data is from the mean. These operations help
summarize and analyze datasets effectively. The "Statistical Operations Flowchart" in figure 4.1 outlines the step-by-step process for performing common statistical calculations such as mean, median, mode, and standard deviation, guiding the user from inputting the dataset to calculating and displaying the results for each operation.
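Note: the statistics module distinguishes the sample standard deviation (stdev(), n - 1 in the denominator) from the population standard deviation (pstdev(), n in the denominator); the program below uses the sample version. A quick sketch:

import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]
print(statistics.stdev(data))   # sample standard deviation, approx. 2.138
print(statistics.pstdev(data))  # population standard deviation = 2.0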
Apparatus Used:
Python Programming Language, a Python IDE or code editor (such as Google Colab or Jupyter Notebook), and the built-in statistics module.
Flowchart:-
Program Code-
import statistics

# Get user input
data = input("Enter numbers separated by spaces: ")

# Convert the input string to a list of floats
numbers = list(map(float, data.split()))

# Perform statistical operations
mean = statistics.mean(numbers)
median = statistics.median(numbers)
try:
    mode = statistics.mode(numbers)
except statistics.StatisticsError:
    mode = "No unique mode"
stdev = statistics.stdev(numbers)

# Display results
print("\n--- Statistical Results ---")
print(f"Numbers: {numbers}")
print(f"Mean: {mean}")
print(f"Median: {median}")
print(f"Mode: {mode}")
print(f"Standard Deviation: {stdev}")
Experiment 5.
Aim: - To perform data pre-processing operations: i) Handling missing data ii) Min-Max normalization.
Description: -
The program below builds a small dataset containing missing values, handles them by dropping rows or filling with the column mean, and then rescales the numeric columns to the [0, 1] range using Min-Max normalization.
Flowchart:
Program Code: -
import pandas as pd
import numpy as np
from sklearn.preprocessing import MinMaxScaler
# Sample dataset with missing values
data = {
'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
'Age': [25, np.nan, 30, 22, 28],
'Salary': [50000, 54000, np.nan, 42000, 60000]
}
df = pd.DataFrame(data)
print("Original Data:\n", df)
# Method 1: Drop rows with missing values
df_dropped = df.dropna()
print("\nData after dropping rows with missing values:\n", df_dropped)
# Method 2: Fill missing values with mean (only for numeric columns)
df_filled = df.copy()
# Assign the filled columns back; chained inplace fillna on a copy is deprecated in recent pandas
df_filled['Age'] = df_filled['Age'].fillna(df['Age'].mean())
df_filled['Salary'] = df_filled['Salary'].fillna(df['Salary'].mean())
print("\nData after filling missing values with mean:\n", df_filled)
# Apply Min-Max Normalization only to numeric columns
scaler = MinMaxScaler()
numeric_cols = ['Age', 'Salary']
df_normalized = df_filled.copy()
df_normalized[numeric_cols] = scaler.fit_transform(df_filled[numeric_cols])
print("\nData after Min-Max Normalization:\n", df_normalized)
Experiment 6.
Aim: - To perform dimensionality reduction using PCA on a Houses dataset.
Algorithm: -
Step 1: Start
Step 2: Load the Houses dataset and standardize the features.
Step 3: Compute the covariance matrix of the standardized data.
Step 4: Compute the eigenvalues and eigenvectors of the covariance matrix.
Step 5: Sort the eigenvectors by decreasing eigenvalue and select the top k.
Step 6: Project the original data onto the top k eigenvectors to get the reduced dataset.
Step 7: End
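Note: the projection step above can be reproduced directly with NumPy. A minimal sketch (illustrative data; the program below instead uses scikit-learn's PCA, which wraps the same computation):

import numpy as np

X = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0]])
Xc = (X - X.mean(axis=0)) / X.std(axis=0)          # standardize the features
cov = np.cov(Xc, rowvar=False)                     # covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)             # eigh returns ascending eigenvalues
top_k = eigvecs[:, np.argsort(eigvals)[::-1][:1]]  # keep the top k = 1 eigenvector
X_reduced = Xc @ top_k                             # project onto the top component
print(X_reduced)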
Flowchart: -
Program Code: -
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
import matplotlib.pyplot as plt
# Sample Houses dataset (or replace with your CSV file)
data = {
'Price': [250000, 185000, 340000, 275000, 200000],
'Area': [2000, 1500, 2500, 2100, 1600],
'Bedrooms': [4, 3, 4, 3, 2],
'Bathrooms': [3, 2, 3, 2, 1],
'Age': [10, 15, 5, 12, 20]
}
df = pd.DataFrame(data)
print("Original Data:\n", df)
scaler = StandardScaler()
scaled_data = scaler.fit_transform(df)
pca = PCA(n_components=2)
principal_components = pca.fit_transform(scaled_data)
pca_df = pd.DataFrame(data=principal_components, columns=['PC1', 'PC2'])
print("\nPCA Result (2 components):\n", pca_df)
print("\nExplained Variance Ratio:", pca.explained_variance_ratio_)
print("Total Variance Captured:", sum(pca.explained_variance_ratio_))
plt.figure(figsize=(6, 4))
plt.scatter(pca_df['PC1'], pca_df['PC2'], color='green')
plt.xlabel('Principal Component 1')
plt.ylabel('Principal Component 2')
plt.title('PCA on Houses Dataset')
plt.grid(True)
plt.show()
Experiment No 7.
Aim: - To perform Simple Linear Regression with Python.
Flowchart: -
Program Code: -
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
X = np.array([500, 750, 1000, 1250, 1500]).reshape(-1, 1)  # Area in sq.ft
y = np.array([100000, 150000, 200000, 250000, 300000])     # Price in $
model = LinearRegression()
model.fit(X, y)
y_pred = model.predict(X)
print("Intercept (b0):", model.intercept_)
print("Slope (b1):", model.coef_[0])
print("R² Score:", r2_score(y, y_pred))
plt.scatter(X, y, color='blue', label='Actual Data')
plt.plot(X, y_pred, color='red', label='Regression Line')
plt.xlabel('Area (sq.ft)')
plt.ylabel('Price ($)')
plt.title('Simple Linear Regression')
plt.legend()
plt.grid(True)
plt.show()
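Note: the fitted coefficients can be verified against the closed-form least-squares formulas b1 = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)² and b0 = ȳ - b1·x̄. A short sketch on the same data:

import numpy as np

x = np.array([500, 750, 1000, 1250, 1500], dtype=float)
y = np.array([100000, 150000, 200000, 250000, 300000], dtype=float)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
print("Slope:", b1, "Intercept:", b0)  # matches model.coef_[0] and model.intercept_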
Experiment No 8.
Aim: - To perform K-Means clustering and visualize the clusters for the Iris dataset.
Description: -
K-Means clustering is an unsupervised machine learning algorithm that groups data into K distinct clusters based on feature similarity. The algorithm assigns each data point to the cluster whose centroid is closest, where the centroid is the mean of the points in that cluster. For the Iris dataset, K-Means clustering can be used to group the iris flowers into distinct species based on features such as petal length, petal width, sepal length, and sepal width.
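Note: the assignment/update cycle described above can be illustrated in a few lines. A minimal sketch of one K-Means iteration on toy one-dimensional data (illustrative only; the program below uses scikit-learn's KMeans):

import numpy as np

points = np.array([1.0, 1.5, 8.0, 9.0, 9.5])
centroids = np.array([1.0, 9.0])  # initial centroids, K = 2
# Assignment step: each point joins the cluster of its nearest centroid
labels = np.argmin(np.abs(points[:, None] - centroids[None, :]), axis=1)
# Update step: each centroid moves to the mean of its assigned points
centroids = np.array([points[labels == k].mean() for k in range(2)])
print(labels, centroids)  # [0 0 1 1 1] [1.25 8.8333...]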
Apparatus Used:
Python Programming Language, a Python IDE or code editor (such as Google Colab or Jupyter Notebook), and the scikit-learn, pandas, and Matplotlib libraries.
Algorithm: -
Step 1: Start
Step 2: Load the Iris dataset.
Step 3: Fit K-Means with K = 3 clusters on the four features.
Step 4: Reduce the features to two principal components with PCA for visualization.
Step 5: Plot the points coloured by their assigned cluster.
Step 6: End
Flowchart: -
Program Code: -
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
iris = load_iris()
X = iris.data
y = iris.target
feature_names = iris.feature_names
df = pd.DataFrame(X, columns=feature_names)
kmeans = KMeans(n_clusters=3, random_state=42, n_init=10)
kmeans.fit(X)
labels = kmeans.labels_
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X)
plt.figure(figsize=(8, 6))
plt.scatter(X_pca[:, 0], X_pca[:, 1], c=labels, cmap='viridis', marker='o', edgecolor='k')
plt.title("K-Means Clustering on Iris Dataset")
plt.xlabel("Principal Component 1")
plt.ylabel("Principal Component 2")
plt.grid(True)
plt.colorbar(label="Cluster")
plt.show()
Experiment No 9.
Aim: - To build a fraud detection system using machine learning and evaluate its performance.
Description: -
A fraud detection system using machine learning involves leveraging algorithms to identify potentially
fraudulent activities by analyzing patterns and anomalies in data. By training a model on historical
transaction data, such as credit card transactions, insurance claims, or online payments, the system can
classify new transactions as either legitimate or fraudulent. Common machine learning algorithms like
Random Forest, Logistic Regression, and XGBoost are used for classification tasks, as they can
effectively handle complex datasets and provide high accuracy. The system works by extracting key
features from the data, such as transaction amount, time, and location, and using these features to build
predictive models. Once trained, the model can be deployed to make real-time predictions on new
transactions, alerting authorities or systems to potential fraud.
Apparatus Used:
Python Programming Language, a Python IDE or code editor (such as Google Colab or Jupyter Notebook), and the pandas, NumPy, scikit-learn, Matplotlib, and seaborn libraries.
Algorithm:
Start
Collect Data:
Gather historical data related to transactions, such as transaction amount, user details,
time, and location.
Preprocess Data:
Handle Missing Data: Fill or remove missing values in the dataset.
Encode Categorical Variables: Convert categorical variables into numeric values (e.g.,
using one-hot encoding).
Feature Selection: Identify relevant features that are most likely to indicate fraud (e.g.,
transaction amount, frequency).
Split Data: Divide the dataset into training and testing sets (e.g., 80% training, 20%
testing).
Choose Machine Learning Model:
Select an appropriate model for classification (e.g., Random Forest, Logistic Regression,
XGBoost).
Train the Model:
Use the training dataset to train the machine learning model.
Adjust hyperparameters if needed to improve model performance.
Evaluate the Model:
Test the model on the testing dataset.
Evaluate performance using metrics like accuracy, precision, recall, F1 score, and AUC-
ROC curve.
Make Predictions:
Use the trained model to classify new transactions as fraudulent or non-fraudulent.
Deploy the Model:
Integrate the model into the real-time transaction system to detect fraud on incoming
transactions.
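Note: for the "Encode Categorical Variables" step above, one-hot encoding can be done with pandas. A minimal sketch (the program below instead maps the two-category 'Sex' column to 0/1, which is equivalent for binary features):

import pandas as pd

df = pd.DataFrame({'Sex': ['male', 'female', 'female', 'male']})
print(pd.get_dummies(df, columns=['Sex']))  # one indicator column per category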
Flowchart: -
Program Code: -
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix
import matplotlib.pyplot as plt
import seaborn as sns

# Load the dataset (you can replace this with your actual dataset)
url = 'https://raw.githubusercontent.com/datasciencedojo/datasets/master/titanic.csv'
data = pd.read_csv(url)

# Basic preprocessing: encode the categorical 'Sex' column and fill missing 'Age' values
data['Sex'] = data['Sex'].map({'male': 0, 'female': 1})
data['Age'] = data['Age'].fillna(data['Age'].median())

# Feature selection
features = ['Pclass', 'Sex', 'Age', 'SibSp', 'Parch', 'Fare']
X = data[features]  # Feature variables
y = data['Survived']  # Target variable (1: survived, 0: not survived)

# Split into training and testing sets (80% / 20%)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train the classifier
model = RandomForestClassifier(random_state=42)
model.fit(X_train, y_train)

# Make predictions
y_pred = model.predict(X_test)
print(classification_report(y_test, y_pred))

# Confusion Matrix
cm = confusion_matrix(y_test, y_pred)
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues',
            xticklabels=['Not Survived', 'Survived'],
            yticklabels=['Not Survived', 'Survived'])
plt.xlabel('Predicted')
plt.ylabel('Actual')
plt.show()
Experiment No 10.
Aim: - To perform deep learning image classification using a Convolutional Neural Network (CNN).
Description: -
Deep learning image classification is the process of using deep learning models, particularly Convolutional Neural Networks (CNNs), to automatically categorize images into predefined classes. This technique enables machines to identify objects, people, or scenes within an image by learning from large amounts of labeled image data. Deep learning models can learn complex patterns in images, including spatial hierarchies of features such as edges, textures, shapes, and objects, which makes them highly effective for tasks like image recognition, object detection, and scene segmentation.
Apparatus Used:
Python Programming Language, a Python IDE or code editor (such as Google Colab or Jupyter Notebook), and the TensorFlow/Keras and Matplotlib libraries.
Algorithm:
Step 1: Load and prepare the image dataset.
Step 2: Preprocess the images by resizing them to a consistent size and normalizing their pixel values.
Step 3: Define a Convolutional Neural Network (CNN) with multiple convolutional and pooling
layers, followed by fully connected layers.
Step 4: Compile the model using an optimizer (e.g., Adam) and a loss function suited for
classification (e.g., categorical cross-entropy).
Step 5: Train the model using the training dataset and validate its performance on the validation
dataset.
Step 6: After training, evaluate the model on a separate test set to check its generalization
performance.
Step 7: Use the trained model to predict the class labels of new images.
Step 8: Deploy the model for real-time predictions in a production environment.
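Note: for Step 2, resizing and normalization can be done with TensorFlow's image utilities. A minimal sketch (shapes are illustrative):

import tensorflow as tf

images = tf.random.uniform((4, 48, 48, 3), maxval=255.0)  # stand-in batch of four RGB images
resized = tf.image.resize(images, (32, 32))               # resize to a consistent size
normalized = resized / 255.0                              # scale pixel values into [0, 1]
print(normalized.shape)                                   # (4, 32, 32, 3)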
Flowchart: -
Program Code: -
import tensorflow as tf
from tensorflow.keras import layers, models
import matplotlib.pyplot as plt
from tensorflow.keras.datasets import cifar10
# Load and normalize the CIFAR-10 dataset
(x_train, y_train), (x_test, y_test) = cifar10.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

# Define a simple CNN (illustrative architecture for 32x32 RGB images)
model = models.Sequential([
    layers.Conv2D(32, (3, 3), activation='relu', input_shape=(32, 32, 3)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(64, activation='relu'),
    layers.Dense(10, activation='softmax')
])

# Compile and train; labels are integers, so sparse categorical cross-entropy is used
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
history = model.fit(x_train, y_train, epochs=5,
                    validation_data=(x_test, y_test))

plt.figure(figsize=(12, 4))
# Accuracy plot
plt.subplot(1, 2, 1)
plt.plot(history.history['accuracy'], label='Training accuracy')
plt.plot(history.history['val_accuracy'], label='Validation accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend()
# Loss plot
plt.subplot(1, 2, 2)
plt.plot(history.history['loss'], label='Training loss')
plt.plot(history.history['val_loss'], label='Validation loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()
plt.show()
# Make a prediction on a single test image (batch of one)
img = x_test[:1]
predictions = model.predict(img)
predicted_label = tf.argmax(predictions, axis=1).numpy()[0]
print("Predicted class:", predicted_label)