
ML FINAL Lab Manual

The document is a lab manual for machine learning, detailing steps to set up Python and essential libraries, visualize datasets, preprocess data, and implement various machine learning algorithms including k-NN, linear regression, decision trees, and K-means clustering. It includes code snippets for handling missing data, encoding categorical variables, and evaluating model performance. The manual also covers loading and exploring datasets from CSV and Excel files using pandas.

Uploaded by manvithajhebbar04


Machine Learning Lab Manual

#1 Install and set up Python and the essential libraries listed below, and display their versions:
#numpy
#pandas
#scikit-learn
#matplotlib
#seaborn

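!pip install numpy
!pip install pandas
!pip install -U scikit-learn
!pip install matplotlib
!pip install seaborn

import numpy
import pandas
import sklearn
import matplotlib
import seaborn

# Display the installed version of each library
print("numpy:", numpy.__version__)
print("pandas:", pandas.__version__)
print("scikit-learn:", sklearn.__version__)
print("matplotlib:", matplotlib.__version__)
print("seaborn:", seaborn.__version__)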

#2 Introduce scikit-learn as a machine learning library

!pip install -U scikit-learn
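Every scikit-learn model follows the same estimator pattern: create an object with its hyperparameters, call `fit` to learn from data, then call `predict` (models) or `transform` (preprocessors). A minimal sketch of that pattern on the bundled Iris data; `LogisticRegression` and `max_iter=200` are illustrative choices that do not appear elsewhere in this manual.

```python
import sklearn
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

print("scikit-learn version:", sklearn.__version__)

# Built-in toy dataset: X is a (150, 4) feature matrix, y holds 150 class labels
X, y = load_iris(return_X_y=True)

# A transformer: fit() learns the column means/std devs, transform() applies them
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# An estimator: fit() learns the model parameters, predict() returns labels
model = LogisticRegression(max_iter=200)
model.fit(X_scaled, y)
predictions = model.predict(X_scaled[:5])

print("First five predictions:", predictions)
print("Training accuracy:", model.score(X_scaled, y))
```

The same `fit`/`predict` calls are used by the k-NN, linear regression, decision tree and K-means programs that follow.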

#3 Write a program to visualize the dataset to gain insights using Matplotlib or seaborn by plotting scatter plots and bar charts

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

def visualize_dataset(file_path):
    # Load the dataset into a pandas DataFrame
    df = pd.read_csv(file_path)

    # Plot pairwise scatter plots of the numeric columns (pairplot);
    # suptitle puts the title on the whole grid, not just the last subplot
    grid = sns.pairplot(df)
    grid.fig.suptitle("Pairplot of the Dataset", y=1.02)
    plt.show()

    # Plot a bar chart (count plot) for the first categorical (text) column, if any
    categorical_cols = df.select_dtypes(include="object").columns
    if len(categorical_cols) > 0:
        col = categorical_cols[0]
        sns.countplot(x=col, data=df)
        plt.title("Bar Chart of Categorical Column")
        plt.xlabel(col)
        plt.ylabel("Count")
        plt.show()
    else:
        print("No categorical column found to plot bar chart.")

# Example usage
file_path = "C:\\Users\\NDC43\\Downloads\\Iris.csv"
visualize_dataset(file_path)
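The task also allows plain Matplotlib. A minimal sketch, assuming the bundled `load_iris` data instead of the CSV file above, that draws one scatter plot and one bar chart without seaborn:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris

iris = load_iris()
X, y = iris.data, iris.target

fig, (ax_scatter, ax_bar) = plt.subplots(1, 2, figsize=(11, 4))

# Scatter plot: petal length vs. petal width, one colour per species
for label, name in enumerate(iris.target_names):
    mask = y == label
    ax_scatter.scatter(X[mask, 2], X[mask, 3], label=name)
ax_scatter.set_xlabel(iris.feature_names[2])
ax_scatter.set_ylabel(iris.feature_names[3])
ax_scatter.set_title("Petal length vs. petal width")
ax_scatter.legend()

# Bar chart: number of samples in each class
counts = np.bincount(y)
ax_bar.bar(list(iris.target_names), counts)
ax_bar.set_ylabel("Count")
ax_bar.set_title("Samples per species")

plt.tight_layout()
plt.show()
```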

#4 Write a program to handle missing data, encode categorical variables, and perform feature scaling
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler

# Load Iris dataset

iris = load_iris()
iris_df = pd.DataFrame(data=iris.data, columns=iris.feature_names)
iris_df['target'] = iris.target

def preprocess_dataset(df):
    # Work on a copy so the original DataFrame is left unchanged
    df = df.copy()
    feature_cols = df.columns[:-1]  # every column except 'target'

    # Handle missing data (the Iris dataset has no missing values, so we
    # simulate some by blanking every 10th value of the first feature)
    df.iloc[::10, 0] = float('NaN')

    # Impute missing values with the column mean
    imputer = SimpleImputer(strategy='mean')
    df[feature_cols] = imputer.fit_transform(df[feature_cols])

    # Encode categorical variables (if applicable)
    # The Iris features are all numeric, so this step is skipped here

    # Perform feature scaling (excluding the 'target' column)
    scaler = StandardScaler()
    df[feature_cols] = scaler.fit_transform(df[feature_cols])

    return df

# Preprocess Iris dataset

preprocessed_df = preprocess_dataset(iris_df)

# Display preprocessed dataset

print("Preprocessed dataset:")
print(preprocessed_df.head())
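Because Iris has no categorical features, the encoding step above is skipped. A sketch of what it could look like on a small hypothetical DataFrame (the `color`, `size` and `price` columns are invented for illustration): an explicit integer mapping for an ordinal column and one-hot encoding with `pd.get_dummies` for a nominal column.

```python
import pandas as pd

# Hypothetical data with one nominal and one ordinal categorical column
df = pd.DataFrame({
    "color": ["red", "green", "blue", "green"],
    "size": ["small", "large", "medium", "small"],
    "price": [10.0, 12.5, 9.0, 11.0],
})

# Ordinal column: map categories to integers that preserve their order
size_order = {"small": 0, "medium": 1, "large": 2}
df["size"] = df["size"].map(size_order)

# Nominal column: one-hot encode into separate 0/1 indicator columns
encoded = pd.get_dummies(df, columns=["color"], dtype=int)

print(encoded)
```

One-hot encoding avoids implying an order between colours, whereas the integer mapping is only appropriate when the categories really are ordered.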
#5 Write a program to implement a k-nearest neighbours (k-NN) classifier using scikit-learn, train the classifier on the dataset and evaluate its performance
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, classification_report

# Load Iris dataset

iris = load_iris()
X = iris.data # Features (independent variables)
y = iris.target # Target labels (dependent variable)

# Split the dataset into training and testing sets

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
random_state=42)

# Initialize the k-NN classifier

k = 3 # Number of neighbors
knn_classifier = KNeighborsClassifier(n_neighbors=k)

# Train the classifier

knn_classifier.fit(X_train, y_train)

# Make predictions on the testing set

y_pred = knn_classifier.predict(X_test)

# Evaluate the classifier's performance

accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)

# Display classification report

print("\nClassification Report:")
print(classification_report(y_test, y_pred, target_names=iris.target_names))
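`KNeighborsClassifier` keeps the training set and labels each query point by a majority vote among its `k` closest training points. A from-scratch sketch of that idea with Euclidean distance; the `knn_predict` function is hypothetical, and scikit-learn's version adds tree-based neighbour search, distance weighting and other options not reproduced here.

```python
import numpy as np
from collections import Counter
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

def knn_predict(X_train, y_train, X_query, k=3):
    """Predict a label for each row of X_query by majority vote of its k nearest neighbours."""
    predictions = []
    for x in X_query:
        # Euclidean distance from x to every training point
        distances = np.linalg.norm(X_train - x, axis=1)
        # Indices of the k smallest distances
        nearest = np.argsort(distances)[:k]
        # Most common label among those neighbours
        label = Counter(y_train[nearest]).most_common(1)[0][0]
        predictions.append(label)
    return np.array(predictions)

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

y_pred = knn_predict(X_train, y_train, X_test, k=3)
print("Accuracy:", np.mean(y_pred == y_test))
```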

#6 Write a Python program to:
#   1. Load the Iris dataset
#   2. Convert it to a DataFrame
#   3. Display the full dataset and first 5 rows
#   4. Show dataset info
#   5. Check for missing values

import pandas as pd
from sklearn.datasets import load_iris

# 1-2. Load the Iris dataset and convert it to a DataFrame
iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)

# 3. Display the full dataset and the first 5 rows
print(df)
print(df.head())

# 4. Show dataset info (info() prints directly and returns None, so it is not wrapped in print)
df.info()

# 5. Check for missing values in each column
print(df.isnull().sum())

output:

     sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)
0                  5.1               3.5                1.4               0.2
1                  4.9               3.0                1.4               0.2
2                  4.7               3.2                1.3               0.2
3                  4.6               3.1                1.5               0.2
4                  5.0               3.6                1.4               0.2
..                 ...               ...                ...               ...
145                6.7               3.0                5.2               2.3
146                6.3               2.5                5.0               1.9
147                6.5               3.0                5.2               2.0
148                6.2               3.4                5.4               2.3
149                5.9               3.0                5.1               1.8

[150 rows x 4 columns]

     sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)
0                  5.1               3.5                1.4               0.2
1                  4.9               3.0                1.4               0.2
2                  4.7               3.2                1.3               0.2
3                  4.6               3.1                1.5               0.2
4                  5.0               3.6                1.4               0.2
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 150 entries, 0 to 149
Data columns (total 4 columns):
 #   Column             Non-Null Count  Dtype
---  ------             --------------  -----
 0   sepal length (cm)  150 non-null    float64
 1   sepal width (cm)   150 non-null    float64
 2   petal length (cm)  150 non-null    float64
 3   petal width (cm)   150 non-null    float64
dtypes: float64(4)
memory usage: 4.8 KB
sepal length (cm)    0
sepal width (cm)     0
petal length (cm)    0
petal width (cm)     0
dtype: int64

#7 Write a program to load and explore the dataset of .CSV and Excel files using pandas
import pandas as pd

csv_file_path = "C:\\Users\\NDC-LAB1-30\\Desktop\\data.csv"
excel_file_path = "C:\\Users\\NDC-LAB1-30\\Desktop\\data2.xlsx"

# Load the CSV file
data_csv = pd.read_csv(csv_file_path)
print("CSV file data:")
print(data_csv)

# Load the Excel file (reading .xlsx files requires the openpyxl package)
data_excel = pd.read_excel(excel_file_path)
print("\nExcel file data:")
print(data_excel)

# Summary statistics of the numeric columns
print("\nCSV Data Description:")
print(data_csv.describe())

print("\nExcel Data Description:")
print(data_excel.describe())

# Data type of each column
print("\nDatatypes in CSV file:")
print(data_csv.dtypes)

print("\nDatatypes in Excel file:")
print(data_excel.dtypes)
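If the lab machine's files are not available, the same exploration calls can be tried on a small in-memory CSV. The column names and values below are made up; `io.StringIO` simply lets `pd.read_csv` read from a string instead of a file path.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for data.csv (Ravi's score is missing)
csv_text = """name,age,score
Asha,21,88.5
Ravi,22,
Meera,20,92.0
Kiran,23,75.5
"""

data = pd.read_csv(io.StringIO(csv_text))

print(data.head())          # first rows
print(data.shape)           # (rows, columns)
print(data.dtypes)          # column types inferred by pandas
print(data.describe())      # summary statistics of numeric columns
print(data.isnull().sum())  # missing values per column
```

The empty field is read as `NaN`, so `isnull().sum()` reports one missing value in `score`, and `describe()` computes that column's statistics over the remaining three rows.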

#8 Write a program to implement a linear regression model for regression tasks and train the model on a dataset with a continuous target variable

import numpy as np
import pandas as pd
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Load California Housing dataset

california = fetch_california_housing()
X = california.data # Features (independent variables)
y = california.target # Target (dependent variable)

# Convert the data to a pandas DataFrame for easier manipulation

california_df = pd.DataFrame(data=X, columns=california.feature_names)
california_df['target'] = y

# Split the dataset into training and testing sets (80% training, 20% testing)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
random_state=42)

# Initialize Linear Regression model

linear_regression = LinearRegression()

# Train the model

linear_regression.fit(X_train, y_train)

# Make predictions on the testing set

y_pred = linear_regression.predict(X_test)

# Evaluate the model's performance

mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
# Print performance metrics
print("Mean Squared Error:", mse)
print("R-squared Score:", r2)
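`LinearRegression` fits ordinary least squares: it picks coefficients w and an intercept b that minimise the sum of squared residuals, sum over i of (y_i - (w . x_i + b))^2. A minimal NumPy sketch of the same fit, assuming a small synthetic dataset instead of California Housing, solves the problem with `np.linalg.lstsq` after appending a column of ones for the intercept, then computes MSE and R^2 by hand.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (assumed for illustration): y = 3*x1 - 2*x2 + 5 plus a little noise
X = rng.normal(size=(200, 2))
y = 3 * X[:, 0] - 2 * X[:, 1] + 5 + rng.normal(scale=0.1, size=200)

# Append a column of ones so the intercept is learned as an extra coefficient
X_design = np.column_stack([X, np.ones(len(X))])

# Solve min ||X_design @ theta - y||^2
theta, *_ = np.linalg.lstsq(X_design, y, rcond=None)
coef, intercept = theta[:-1], theta[-1]

# Same metrics as mean_squared_error and r2_score
y_pred = X_design @ theta
mse = np.mean((y - y_pred) ** 2)
r2 = 1 - np.sum((y - y_pred) ** 2) / np.sum((y - y.mean()) ** 2)

print("Coefficients:", coef)
print("Intercept:", intercept)
print("MSE:", mse, "R^2:", r2)
```

The recovered coefficients should land close to the 3, -2 and 5 used to generate the data.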

#9 Write a program to implement a decision tree classifier using scikit-learn, visualize the decision tree and understand its splits.

import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, plot_tree
from sklearn.metrics import accuracy_score, classification_report
import matplotlib.pyplot as plt

# Load Iris dataset

iris = load_iris()
X = iris.data # Features
y = iris.target # Target labels

# Split the dataset into training and testing sets

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
random_state=42)

# Initialize Decision Tree classifier

decision_tree = DecisionTreeClassifier(random_state=42)

# Train the classifier

decision_tree.fit(X_train, y_train)

# Make predictions on the testing set

y_pred = decision_tree.predict(X_test)

# Evaluate the classifier's performance

accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)

# Display classification report

print("\nClassification Report:")
print(classification_report(y_test, y_pred, target_names=iris.target_names))

# Visualize the trained decision tree

plt.figure(figsize=(12, 8))
plot_tree(decision_tree, feature_names=iris.feature_names,
class_names=iris.target_names, filled=True)
plt.title("Decision Tree for Iris Dataset")
plt.show()
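The plotted tree can be hard to read when it grows deep. To inspect the splits as text, scikit-learn provides `sklearn.tree.export_text`; the sketch below retrains the same classifier and prints its rules (one `feature <= threshold` test per line), followed by the impurity-based `feature_importances_` of the tree.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)

decision_tree = DecisionTreeClassifier(random_state=42)
decision_tree.fit(X_train, y_train)

# Text view of every split: each line is a threshold test on one feature,
# and leaf lines show the predicted class index
rules = export_text(decision_tree, feature_names=list(iris.feature_names))
print(rules)

# Normalised total impurity reduction contributed by each feature
for name, importance in zip(iris.feature_names, decision_tree.feature_importances_):
    print(f"{name}: {importance:.3f}")
```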
#10 Write a program to implement K-means clustering and visualize the clusters.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans

# Generate sample data

X, y = make_blobs(n_samples=500, centers=4, cluster_std=0.8, random_state=42)

# Create a K-Means clusterer with 4 clusters

# n_init is set explicitly; scikit-learn 1.2-1.3 warn when it is left at its default
kmeans = KMeans(n_clusters=4, n_init=10, random_state=42)

# Fit the data

kmeans.fit(X)

# Get cluster labels

labels = kmeans.labels_

# Plot the data with cluster labels

plt.figure(figsize=(8, 6))
plt.scatter(X[:, 0], X[:, 1], c=labels, cmap='viridis')

# Plot the centroids

plt.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1], s=100,
c='red', label='Centroids')

# Add title and labels

plt.title('K-Means Clustering')
plt.xlabel('X')
plt.ylabel('Y')

# Add legend
plt.legend()

# Show the plot

plt.show()
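K-means alternates two steps until the centroids stop moving: assign each point to its nearest centroid, then move each centroid to the mean of its assigned points (Lloyd's algorithm). A from-scratch NumPy sketch with a hypothetical `kmeans_lloyd` helper and simple random initialisation; scikit-learn's `KMeans` uses k-means++ initialisation by default and can run several restarts via `n_init`, which this sketch does not reproduce.

```python
import numpy as np
from sklearn.datasets import make_blobs

def kmeans_lloyd(X, k, n_iter=100, seed=0):
    """Cluster X into k groups with Lloyd's algorithm; returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    # Initialise centroids with k distinct random data points
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: index of the nearest centroid for every point
        distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Update step: each centroid becomes the mean of its points
        # (an empty cluster keeps its previous centroid)
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

X, _ = make_blobs(n_samples=500, centers=4, cluster_std=0.8, random_state=42)
labels, centroids = kmeans_lloyd(X, k=4)
print("Centroids:\n", centroids)
```

Because the result depends on the starting centroids, random initialisation can occasionally settle in a poorer local optimum; k-means++ and multiple restarts are the usual remedies.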
