ML LAB Manual

The document serves as a laboratory manual for machine learning, defining machine learning and its types, including supervised and unsupervised learning. It details popular Python libraries used in machine learning, such as NumPy, SciPy, and TensorFlow, and provides exercises for implementing various algorithms like Naïve Bayes, K-Nearest Neighbors, and decision trees. Additionally, it emphasizes the importance of exploratory data analysis and model evaluation techniques.

Uploaded by

srujan3k

Machine Learning Laboratory Manual

Definition of Machine Learning:

Machine Learning is the science (and art) of programming computers so they can
learn from data.

Machine Learning is the field of study that gives computers the ability to learn
without being explicitly programmed.
—Arthur Samuel, 1959

A computer program is said to learn from experience E with respect to some task T
and some performance measure P, if its performance on T, as measured by
P, improves with experience E.
—Tom Mitchell, 1997
Types of Machine Learning:
Supervised learning
• In supervised learning, the training set being fed to the algorithm includes the
desired solutions, called labels
• The most important supervised learning algorithms:
1. K-Nearest Neighbors
2. Linear Regression
3. Logistic Regression
4. Support Vector Machines (SVMs)
5. Decision Trees and Random Forests
6. Neural networks
Unsupervised learning
• In unsupervised learning, the training data is unlabeled.
• The most important unsupervised learning algorithms:
1. Clustering
• K-Means
• DBSCAN
• Hierarchical Cluster Analysis (HCA)
2. Anomaly detection and novelty detection
• One-class SVM
• Isolation Forest
3. Visualization and dimensionality reduction
• Principal Component Analysis (PCA)
• Kernel PCA
• Locally Linear Embedding (LLE)
• t-Distributed Stochastic Neighbor Embedding (t-SNE)
4. Association rule learning
• Apriori
• Eclat
Language for Machine Learning Laboratory:

Python is one of the most popular open-source programming languages and is widely adopted by
the machine learning community. It was designed by Guido van Rossum and first released in
1991. The reference implementation of Python, CPython, is managed by the Python Software
Foundation, a nonprofit organization.
Python has strong libraries for numerical computing (NumPy), scientific algorithms and
mathematical tools (SciPy), and plotting (matplotlib). Python libraries are collections of
modules that contain useful code and functions, eliminating the need to write them from
scratch. There are tens of thousands of Python libraries that help machine learning
developers, as well as professionals working in data science, data visualization, and more.

Popularly used Python Libraries for Machine Learning:

1. NumPy
NumPy is a popular Python library for multi-dimensional array and matrix processing. It
supports a wide range of mathematical operations, including linear algebra and Fourier
transforms, which makes it well suited to machine learning and artificial intelligence (AI)
projects. Because its array operations are vectorized and run in compiled code, they are
typically much faster than equivalent pure-Python loops.
2. SciPy
SciPy is a Python library used for scientific and technical computing. It is built on top of
NumPy and contains modules for optimization, linear algebra, integration, and statistics.
3. Matplotlib
Matplotlib is a Python library focused on data visualization and primarily used for creating
beautiful graphs, plots, histograms, and bar charts. It is compatible with plotting data from
SciPy, NumPy, and Pandas.
4. Pandas
Pandas is another Python library built on top of NumPy. It is used to load, clean, and
prepare data sets for machine learning. It relies on two data structures: the
one-dimensional Series and the two-dimensional DataFrame. Pandas is used in a variety of
fields, including finance, engineering, and statistics.
5. Seaborn
Seaborn is an open-source Python library based on Matplotlib that works directly with
Pandas data structures. It is often used in ML projects to plot training data and explore
relationships between features. Its default styles produce polished statistical graphics
with little code, making it an effective choice for data analysis.
6. Scikit-learn
Scikit-learn is a very popular machine learning library that is built on NumPy and SciPy. It
supports most of the classic supervised and unsupervised learning algorithms, and it can also
be used for data mining, modelling, and analysis. Scikit-learn’s simple design offers a user-
friendly library for those new to machine learning.
7. TensorFlow
TensorFlow is an open-source Python library that specializes in differentiable
programming, meaning it can automatically compute a function's derivatives within a
high-level language. Both machine learning and deep learning models can be developed and
evaluated with TensorFlow's flexible architecture and framework. TensorFlow can be used to
build, visualize, and deploy machine learning models on both desktop and mobile platforms.
8. Theano
Theano is a Python library that focuses on numerical computation and is specifically made
for machine learning. It is able to optimize and evaluate mathematical models and matrix
calculations that use multi-dimensional arrays to create ML models. Theano is almost
exclusively used by machine learning and deep learning developers or programmers.
9. Keras
Keras is a Python library that is designed specifically for developing neural networks for ML
models. It can run on top of Theano and TensorFlow to train neural networks. Keras is
flexible, portable, user-friendly, and easily integrated with multiple functions.
10. PyTorch
PyTorch is a popular open-source Python machine learning library based on Torch and
developed by Facebook. Torch is an open-source machine learning library implemented in C
with a Lua wrapper. PyTorch can be extended with your favorite Python packages (e.g.,
Cython, NumPy, SciPy).
PyTorch has two predominant, high-level features:
1. Tensor computation coupled with strong GPU acceleration
2. Deep neural networks constructed on a tape-based autograd system
PyTorch has a vast selection of tools and libraries that support computer vision, natural
language processing (NLP), and a host of other machine learning applications. PyTorch allows
developers to conduct computations on tensors with GPU acceleration and aids in creating
computational graphs. Considered one of the best deep learning and machine learning
frameworks, it faces stiff competition from TensorFlow.
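To make the NumPy and Pandas descriptions above concrete, the following is a minimal sketch of the array, Series, and DataFrame objects they mention. All values and column names ('age', 'income') are made up for illustration.

```python
import numpy as np
import pandas as pd

# NumPy: a 2x2 matrix and a vector (illustrative values)
A = np.array([[2.0, 0.0],
              [0.0, 3.0]])
v = np.array([1.0, 2.0])

product = A @ v              # matrix-vector product
inverse = np.linalg.inv(A)   # one of NumPy's linear algebra routines

# Pandas: a one-dimensional Series and a two-dimensional DataFrame
ages = pd.Series([25, 32, 47], name='age')
frame = pd.DataFrame({'age': [25, 32, 47],
                      'income': [30000, 52000, 81000]})
mean_income = frame['income'].mean()

print(product)
print(inverse)
print(ages.shape, frame.shape)
print(mean_income)
```

The same DataFrame type is what pd.read_csv returns in the exercises that follow.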
LIST OF EXERCISES:
1. Introduction to Python machine learning libraries.
The following web links point to the documentation for each machine learning library. Work
with these libraries to get acquainted with each one, and refer to the documentation as
needed when solving the problems.
Library Name    Web link
NumPy           https://numpy.org/doc/stable/user/absolute_beginners.html
SciPy           https://docs.scipy.org/doc/scipy/tutorial/index.html
Matplotlib      https://matplotlib.org/stable/tutorials/index.html
Pandas          https://pandas.pydata.org/docs/getting_started/index.html
Seaborn         https://seaborn.pydata.org/tutorial.html
Scikit-learn    https://scikit-learn.org/1.1/user_guide.html
TensorFlow      https://www.tensorflow.org/guide
Theano          https://www.projectpro.io/data-science-in-python-tutorial/theano-deep-learning-tutorial-
Keras           https://keras.io/getting_started/
PyTorch         https://pytorch.org/tutorials/beginner/basics/intro.html

2. Use Naïve Bayes classifier to solve the credit card fraud detection problem.
Download the dataset for credit card fraud detection from the following link:
https://www.kaggle.com/datasets/mlg-ulb/creditcardfraud
Save the dataset and note down the path. Follow the steps below.
I. Problem Understanding
Credit card fraud detection is a classification problem where the goal is to distinguish
between fraudulent and legitimate transactions. The Naïve Bayes classifier is suitable
for this task as it works well with probabilistic models and can handle large datasets
efficiently.
II. Prepare the Environment
Ensure you have the necessary libraries installed. You can install them using pip if
you haven't already:
pip install numpy pandas scikit-learn
III. Load and Explore the Data
Start by loading your dataset and performing initial exploratory data analysis (EDA).
import pandas as pd
# Load the dataset (replace 'path_to_file' with the actual file path)
data = pd.read_csv('path_to_file.csv')
# Display basic information about the dataset
data.info()  # info() prints its summary directly and returns None
print(data.head())
IV. Preprocess the Data
Preprocessing is crucial for preparing your data for the model:
• Handle Missing Values: Check for and handle any missing values.
• Feature Selection: Identify which features are relevant. For Naïve Bayes, feature
scaling is generally not required.
• Encode Categorical Variables: Convert categorical variables into numerical
format if any.
• Split the Data: Divide the dataset into features (X) and target (y), then into
training and testing sets.
from sklearn.model_selection import train_test_split
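# Assume 'target' is the column with fraud labels
# (in the Kaggle creditcard.csv file this column is named 'Class')
X = data.drop('target', axis=1)
y = data['target']
# Split the dataset into training and test sets; stratify=y keeps the
# fraud/legitimate ratio the same in both splits
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42, stratify=y)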
V. Train the Naïve Bayes Classifier
You can use different variants of Naïve Bayes depending on your data characteristics.
For continuous features, Gaussian Naïve Bayes is commonly used.
from sklearn.naive_bayes import GaussianNB
# Initialize the classifier
nb_classifier = GaussianNB()
# Fit the model to the training data
nb_classifier.fit(X_train, y_train)
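To show what Gaussian Naïve Bayes does internally, here is a simplified sketch in plain Python. It estimates a prior, a mean, and a variance per feature for each class, then picks the class with the highest log posterior. The function names, the toy data, and the absolute `smoothing` term are assumptions made for illustration; scikit-learn's GaussianNB handles variance smoothing differently.

```python
import math
from collections import defaultdict

def fit_gaussian_nb(X, y, smoothing=1e-9):
    """Estimate per-class prior, feature means and feature variances."""
    rows_by_class = defaultdict(list)
    for row, label in zip(X, y):
        rows_by_class[label].append(row)
    model = {}
    for label, rows in rows_by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # smoothing (hypothetical parameter) avoids division by zero variance
        variances = [sum((v - m) ** 2 for v in col) / n + smoothing
                     for col, m in zip(columns, means)]
        model[label] = (n / len(X), means, variances)
    return model

def predict_gaussian_nb(model, row):
    """Return the class with the highest log posterior for one sample."""
    best_label, best_score = None, -math.inf
    for label, (prior, means, variances) in model.items():
        score = math.log(prior)
        for value, mean, var in zip(row, means, variances):
            # log of the Gaussian density N(value | mean, var)
            score += -0.5 * math.log(2 * math.pi * var) - (value - mean) ** 2 / (2 * var)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Toy data: two features, class 0 = small values, class 1 = large values
X_toy = [[1.0, 20.0], [1.2, 22.0], [0.8, 19.0],
         [9.0, 500.0], [10.0, 480.0], [11.0, 520.0]]
y_toy = [0, 0, 0, 1, 1, 1]

toy_model = fit_gaussian_nb(X_toy, y_toy)
pred_small = predict_gaussian_nb(toy_model, [1.1, 21.0])
pred_large = predict_gaussian_nb(toy_model, [9.5, 495.0])
print(pred_small, pred_large)
```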
VI. Make Predictions
After training the model, use it to make predictions on the test set.
# Make predictions
y_pred = nb_classifier.predict(X_test)
VII. Evaluate the Model

Evaluate the model’s performance using various metrics. For fraud detection,
precision, recall, and F1-score are particularly important due to the imbalanced nature
of fraud datasets.
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score
# Print classification report
print(classification_report(y_test, y_pred))
# Print confusion matrix
print(confusion_matrix(y_test, y_pred))
# Print accuracy score
print(f'Accuracy: {accuracy_score(y_test, y_pred)}')
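The following sketch illustrates why accuracy alone is misleading on imbalanced fraud data. It uses made-up counts (10 frauds in 1000 transactions) and a hypothetical helper, binary_metrics, that computes the metrics from the confusion-matrix entries.

```python
def binary_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall and F1 for labels in {0, 1}."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {'accuracy': accuracy, 'precision': precision, 'recall': recall, 'f1': f1}

# 1000 transactions, the first 10 are fraudulent (label 1)
y_true = [1] * 10 + [0] * 990

# A "model" that always predicts legitimate
always_legit = [0] * 1000
# A model that catches 8 of the 10 frauds but raises 20 false alarms
detector = [1] * 8 + [0] * 2 + [1] * 20 + [0] * 970

naive = binary_metrics(y_true, always_legit)
improved = binary_metrics(y_true, detector)
print(naive)
print(improved)
```

The always-legitimate model reaches 99% accuracy while detecting no fraud at all, whereas the second model has lower accuracy but a recall of 0.8.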
3. Implement K-Nearest Neighbor algorithm to solve classification problem.
Follow the steps discussed in problem 2, and use the following in the model-training step:
from sklearn.neighbors import KNeighborsClassifier
# Initialize the classifier with k=5 (default value)
knn_classifier = KNeighborsClassifier(n_neighbors=5)
# Fit the model to the training data
knn_classifier.fit(X_train, y_train)
Develop the KNN algorithm in Python.
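As a starting point for developing KNN yourself, here is a minimal sketch in plain Python. It assumes Euclidean distance and simple majority voting; the function name knn_predict and the toy points are illustrative only.

```python
import math
from collections import Counter

def knn_predict(X_train, y_train, query, k=3):
    """Classify one query point by majority vote among its k nearest neighbours."""
    # Distance from the query to every training point, paired with its label
    distances = [(math.dist(x, query), label) for x, label in zip(X_train, y_train)]
    distances.sort(key=lambda pair: pair[0])
    # Vote among the k closest labels
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]

# Toy data: class 'A' clusters near (1, 1), class 'B' near (5, 5)
X_points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0),
            (5.0, 5.0), (5.5, 4.5), (4.5, 5.5)]
y_labels = ['A', 'A', 'A', 'B', 'B', 'B']

label_near_a = knn_predict(X_points, y_labels, (1.5, 1.5), k=3)
label_near_b = knn_predict(X_points, y_labels, (4.5, 5.0), k=3)
print(label_near_a, label_near_b)
```

Because KNN relies on distances, features on very different scales should usually be standardized before use.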
4. Implement CART algorithm for decision tree learning. Use an appropriate data set
for building the decision tree and apply this knowledge to classify a new sample.
Explore the problem of overfitting in decision tree and develop solution using
pruning technique.
Follow the steps discussed in problem 2, and use the following in the model-training step.
Train the Decision Tree Classifier
Initialize and train the decision tree classifier. The DecisionTreeClassifier in scikit-learn
is the implementation of the CART algorithm.

from sklearn.tree import DecisionTreeClassifier

# Initialize the classifier
dt_classifier = DecisionTreeClassifier(random_state=42)
# Fit the model to the training data
dt_classifier.fit(X_train, y_train)
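CART grows the tree by choosing, at each node, the split that most reduces impurity; scikit-learn's DecisionTreeClassifier uses Gini impurity by default. The sketch below is a simplified illustration for a single numeric feature, with hypothetical helper names (gini, best_threshold) and toy data; it is not scikit-learn's implementation.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_threshold(values, labels):
    """Try midpoints between sorted distinct values; return (threshold, weighted Gini)."""
    distinct = sorted(set(values))
    best = (None, gini(labels))  # no split: impurity of the parent node
    for low, high in zip(distinct, distinct[1:]):
        threshold = (low + high) / 2
        left = [l for v, l in zip(values, labels) if v <= threshold]
        right = [l for v, l in zip(values, labels) if v > threshold]
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if weighted < best[1]:
            best = (threshold, weighted)
    return best

feature_values = [1, 2, 3, 8, 9, 10]
class_labels = [0, 0, 0, 1, 1, 1]

parent_impurity = gini(class_labels)
split = best_threshold(feature_values, class_labels)
print(parent_impurity, split)
```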

Add the following 8th step to address overfitting with pruning.

Overfitting Problem: Decision trees tend to overfit the training data by creating complex
trees that capture noise. This results in poor generalization to new data.
Solution: Pruning Techniques
Pruning helps to reduce the complexity of the decision tree and mitigate overfitting.
scikit-learn provides several ways to control the complexity of the tree:
1. Limit Tree Depth:
• Restrict the maximum depth of the tree.
2. Minimum Samples Per Leaf:
• Set the minimum number of samples required to be at a leaf node.
3. Minimum Samples Split:
• Set the minimum number of samples required to split an internal node.
4. Maximum Features:
• Limit the number of features considered when looking for the best split.
Here’s how to apply these parameters:

# Initialize the classifier with pruning parameters
pruned_dt_classifier = DecisionTreeClassifier(
    max_depth=5,            # Maximum depth of the tree
    min_samples_split=10,   # Minimum number of samples required to split an internal node
    min_samples_leaf=5,     # Minimum number of samples required to be at a leaf node
    max_features='sqrt',    # Use sqrt(n_features) features for each split
    random_state=42
)
# Fit the model to the training data
pruned_dt_classifier.fit(X_train, y_train)
In the 9th step, evaluate the pruned model to assess the performance of the pruned decision
tree classifier.

# Make predictions with the pruned model

y_pred_pruned = pruned_dt_classifier.predict(X_test)
# Print classification report
print(classification_report(y_test, y_pred_pruned))
# Print confusion matrix
print(confusion_matrix(y_test, y_pred_pruned))
# Print accuracy score
print(f'Accuracy: {accuracy_score(y_test, y_pred_pruned)}')
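The parameters above limit tree growth in advance (pre-pruning). scikit-learn also supports post-pruning through minimal cost-complexity pruning, controlled by the ccp_alpha parameter. The sketch below is one possible way to try it; the breast cancer dataset and the choice of the middle alpha are assumptions for illustration, and in practice alpha would normally be selected by cross-validation.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Example dataset bundled with scikit-learn (an assumption; use your own data)
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y)

# Fully grown tree for comparison
full_tree = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)

# Effective alphas at which successive subtrees would be pruned away
path = full_tree.cost_complexity_pruning_path(X_train, y_train)
ccp_alphas = path.ccp_alphas

# Pick a middle alpha purely for illustration
alpha = ccp_alphas[len(ccp_alphas) // 2]
pruned_tree = DecisionTreeClassifier(random_state=42, ccp_alpha=alpha).fit(X_train, y_train)

print(f'Nodes: full={full_tree.tree_.node_count}, pruned={pruned_tree.tree_.node_count}')
print(f'Test accuracy: full={full_tree.score(X_test, y_test):.3f}, '
      f'pruned={pruned_tree.score(X_test, y_test):.3f}')
```

Larger values of ccp_alpha prune more aggressively and give smaller trees.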

5. Perform Exploratory Data Analysis on the given dataset. Implement CART
algorithm for decision tree learning. Use an appropriate data set for building the
decision tree and apply this knowledge to classify a new sample.
Follow the steps below. Use the dataset specified by the faculty.
1. Understand the Problem
Clarify the objective of your analysis. In this case, you're using a CART-based decision
tree classifier to understand patterns in the data and gain insights into how different
features affect the target variable.
2. Prepare the Environment
Ensure you have the necessary libraries installed:
pip install numpy pandas scikit-learn matplotlib seaborn
3. Load and Explore the Dataset
Start by loading the dataset and performing initial exploratory data analysis (EDA).

import pandas as pd
# Load the dataset
data = pd.read_csv('path_to_file.csv')
# Display basic information about the dataset
data.info()  # info() prints its summary directly and returns None
# Show the first few rows of the dataset
print(data.head())
# Summary statistics
print(data.describe())
4. Visualize Data Distributions
Explore the distribution of features and the target variable using visualizations.

import matplotlib.pyplot as plt
import seaborn as sns
# Plot the distribution of the target variable
sns.countplot(x='target', data=data)
plt.title('Distribution of Target Variable')
plt.show()
# Plot the distribution of numerical features
numerical_features = data.select_dtypes(include=['int64', 'float64'])
for feature in numerical_features.columns:
    plt.figure()
    sns.histplot(data[feature], kde=True)
    plt.title(f'Distribution of {feature}')
    plt.show()
# Pairplot to explore relationships between features and target
sns.pairplot(data, hue='target')
plt.show()
5. Preprocess the Data
Prepare the data for modeling:
Handle Missing Values: Impute or drop missing values as necessary.
Encode Categorical Variables: Convert categorical variables into numerical format if
required.
Feature Scaling (if necessary): Though not strictly necessary for decision trees,
standardizing or normalizing can help in visualizations.

from sklearn.model_selection import train_test_split

# Handle missing values (example: fill numerical features with their median)
data = data.fillna(data.median(numeric_only=True))
# Split the dataset into features and target
X = data.drop('target', axis=1)
y = data['target']
# Convert categorical features (example: one-hot encoding); the target is left unchanged
X = pd.get_dummies(X, drop_first=True)
# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
6. Train the Decision Tree Classifier
Train the CART-based decision tree classifier on the training data.

from sklearn.tree import DecisionTreeClassifier

# Initialize the classifier
dt_classifier = DecisionTreeClassifier(random_state=42)
# Fit the model to the training data
dt_classifier.fit(X_train, y_train)
7. Visualize the Decision Tree
Visualize the decision tree to understand the decision rules and how features are split.

from sklearn.tree import plot_tree

# Plot the decision tree
plt.figure(figsize=(20,10))
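# class_names must be strings, in the same order as dt_classifier.classes_
plot_tree(dt_classifier, feature_names=list(X.columns),
          class_names=[str(c) for c in dt_classifier.classes_],
          filled=True, rounded=True)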
plt.title('Decision Tree Visualization')
plt.show()
8. Analyze Feature Importance
Determine which features are most influential in making predictions.

# Get feature importances

importances = dt_classifier.feature_importances_
features = X.columns
# Create a DataFrame for feature importances
feature_importance_df = pd.DataFrame({'Feature': features, 'Importance': importances})
feature_importance_df = feature_importance_df.sort_values(by='Importance', ascending=False)
# Plot feature importances
plt.figure(figsize=(10,6))
sns.barplot(x='Importance', y='Feature', data=feature_importance_df)
plt.title('Feature Importance')
plt.show()

9. Evaluate the Model

Assess the performance of the decision tree classifier on the test set.

from sklearn.metrics import classification_report, confusion_matrix, accuracy_score

# Make predictions
y_pred = dt_classifier.predict(X_test)
# Print classification report
print(classification_report(y_test, y_pred))
# Print confusion matrix
print(confusion_matrix(y_test, y_pred))
# Print accuracy score
print(f'Accuracy: {accuracy_score(y_test, y_pred)}')

10. Address Overfitting (Optional)

If the model is overfitting (i.e., performing well on training data but poorly on test data),
consider pruning the decision tree.

# Initialize the classifier with pruning parameters
pruned_dt_classifier = DecisionTreeClassifier(
    max_depth=5,            # Limit the depth of the tree
    min_samples_split=10,   # Minimum samples required to split a node
    min_samples_leaf=5,     # Minimum samples required to be at a leaf node
    random_state=42
)
# Fit the pruned model
pruned_dt_classifier.fit(X_train, y_train)
# Make predictions with the pruned model
y_pred_pruned = pruned_dt_classifier.predict(X_test)

# Evaluate the pruned model

print(classification_report(y_test, y_pred_pruned))
print(confusion_matrix(y_test, y_pred_pruned))
print(f'Accuracy: {accuracy_score(y_test, y_pred_pruned)}')
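The exercise also asks you to classify a new sample, which the steps above do not show. A minimal sketch follows, using a made-up toy DataFrame; the column names 'hours_studied' and 'classes_attended' are hypothetical. The key point is that the new sample must have the same feature columns, in the same order, as the training data.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Toy training data (illustrative values only)
train = pd.DataFrame({
    'hours_studied':    [1, 2, 3, 8, 9, 10],
    'classes_attended': [2, 3, 1, 9, 8, 10],
    'target':           [0, 0, 0, 1, 1, 1],
})
X = train.drop('target', axis=1)
y = train['target']

clf = DecisionTreeClassifier(random_state=42).fit(X, y)

# A new, unseen sample with the same feature columns as X
new_sample = pd.DataFrame([{'hours_studied': 9.5, 'classes_attended': 9}])
new_sample = new_sample[X.columns]  # enforce the training column order

predicted_class = clf.predict(new_sample)[0]
class_probabilities = clf.predict_proba(new_sample)[0]
print(predicted_class, class_probabilities)
```

If the training features were one-hot encoded or imputed, apply the same preprocessing to the new sample before calling predict.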

6. Train an SVM Classifier with Linear Kernel. Use an appropriate data set for
building the SVM Classifier and apply this knowledge to classify a new sample.
Methodology to Build an SVM Classifier with Linear Kernel
1. Understand the Problem
Support Vector Machine (SVM) classifiers are powerful for classification tasks. A linear
kernel SVM is used when you expect that the data can be separated by a linear decision
boundary.
2. Prepare the Environment
Make sure you have the necessary libraries installed:
pip install numpy pandas scikit-learn
3. Load and Explore the Data
Start by loading the dataset and performing exploratory data analysis (EDA).

import pandas as pd
# Load the dataset (replace 'path_to_file' with your file path)
data = pd.read_csv('path_to_file.csv')
# Display basic information about the dataset
data.info()  # info() prints its summary directly and returns None
# Show the first few rows
print(data.head())
# Summary statistics
print(data.describe())
4. Preprocess the Data
Prepare the dataset for training the SVM classifier:
Handle Missing Values: Impute or remove missing values.
Encode Categorical Variables: Convert categorical variables to numerical format.
Feature Scaling: SVMs require feature scaling for optimal performance. Standardize
features so that they have a mean of 0 and a standard deviation of 1.
Split the Data: Divide the dataset into features (X) and target labels (y), then split into
training and testing sets.

from sklearn.model_selection import train_test_split

from sklearn.preprocessing import StandardScaler
# Assume 'target' is the column with labels
X = data.drop('target', axis=1)
y = data['target']
# Split the dataset into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
# Feature scaling
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
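To see what the scaling step computes, the sketch below reproduces standardization with NumPy on made-up numbers. The mean and standard deviation are estimated from the training set only and then reused for the test set, which is why the code above calls fit_transform on X_train but only transform on X_test.

```python
import numpy as np

# Two features on very different scales (illustrative values)
train_demo = np.array([[1.0, 100.0],
                       [2.0, 200.0],
                       [3.0, 300.0]])
test_demo = np.array([[4.0, 400.0]])

# Statistics come from the training data only
mean = train_demo.mean(axis=0)
std = train_demo.std(axis=0)  # population standard deviation (ddof=0), as StandardScaler uses

train_scaled = (train_demo - mean) / std
test_scaled = (test_demo - mean) / std  # reuse the training statistics

print(train_scaled.mean(axis=0), train_scaled.std(axis=0))
print(test_scaled)
```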
5. Train the SVM Classifier with Linear Kernel
Initialize and train the SVM classifier with a linear kernel.

from sklearn.svm import SVC

# Initialize the SVM classifier with a linear kernel
svm_classifier = SVC(kernel='linear', random_state=42)
# Fit the model to the training data
svm_classifier.fit(X_train, y_train)

6. Make Predictions
Use the trained model to make predictions on the test set.

# Make predictions
y_pred = svm_classifier.predict(X_test)

7. Evaluate the Model

Assess the performance of the SVM classifier using appropriate metrics.

from sklearn.metrics import classification_report, confusion_matrix, accuracy_score

# Print classification report
print(classification_report(y_test, y_pred))
# Print confusion matrix
print(confusion_matrix(y_test, y_pred))
# Print accuracy score
print(f'Accuracy: {accuracy_score(y_test, y_pred)}')
8. Tune Hyperparameters (Optional)
Although the linear kernel does not have many parameters, you can still tune the
regularization parameter C to optimize model performance.

from sklearn.model_selection import GridSearchCV

# Define the parameter grid
param_grid = {'C': [0.1, 1, 10, 100, 1000]}
# Initialize GridSearchCV
grid_search = GridSearchCV(SVC(kernel='linear', random_state=42), param_grid, cv=5,
scoring='accuracy')
# Fit GridSearchCV
grid_search.fit(X_train, y_train)
# Print the best parameters and score
print(f'Best parameters: {grid_search.best_params_}')
print(f'Best score: {grid_search.best_score_}')
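One possible refinement, sketched below, is to put the scaler and the SVM into a single scikit-learn Pipeline so that each cross-validation fold is scaled using only its own training portion. The same fitted pipeline can then classify a new sample directly. The synthetic dataset from make_classification and the smaller C grid are assumptions chosen to keep the example fast.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic two-class data (an assumption; replace with your dataset)
X, y = make_classification(n_samples=300, n_features=4, n_informative=3,
                           n_redundant=0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# Scaling happens inside the pipeline, so it is refit on every CV fold
pipeline = make_pipeline(StandardScaler(), SVC(kernel='linear', random_state=42))

# make_pipeline names the SVC step 'svc', hence the 'svc__C' key
param_grid = {'svc__C': [0.1, 1, 10]}
grid_search = GridSearchCV(pipeline, param_grid, cv=5, scoring='accuracy')
grid_search.fit(X_train, y_train)

# Classify a new sample: raw (unscaled) features go in, the pipeline scales them
new_sample = X_test[:1]
predicted_label = grid_search.predict(new_sample)[0]

print(f'Best parameters: {grid_search.best_params_}')
print(f'Predicted label for the new sample: {predicted_label}')
```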

7. Build linear regression and multiple regression models to predict the price of the
house (Boston House Prices Dataset).
Download the Boston House Prices dataset from the following link:
https://www.kaggle.com/datasets/vikrishnan/boston-house-prices/data
Save the dataset to use while building regression models to predict the price.
Methodology for Building Linear and Multiple Regression Models
1. Understand the Problem
The goal is to predict house prices based on features using linear and multiple regression
models. Linear regression predicts a continuous target variable as a function of one or
more input features.
2. Prepare the Environment
Ensure you have the necessary libraries installed:
pip install numpy pandas scikit-learn matplotlib seaborn
3. Load and Explore the Data
Load the Boston House Prices dataset and perform exploratory data analysis (EDA).

import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
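# Note: load_boston was removed in scikit-learn 1.2, so read the downloaded file instead.
# Assumption: the downloaded file is whitespace-separated with no header row and holds
# the 14 standard Boston columns; adjust sep/header/names if your copy differs.
column_names = ['CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE', 'DIS', 'RAD',
                'TAX', 'PTRATIO', 'B', 'LSTAT', 'PRICE']  # last column (MEDV) is the price
data = pd.read_csv('path_to_file.csv', sep=r'\s+', header=None, names=column_names)
feature_names = column_names[:-1]
# Display basic information about the dataset
data.info()
# Show the first few rows
print(data.head())
# Summary statistics
print(data.describe())

4. Visualize Data Relationships

Explore the relationships between features and the target variable.

# Plot the relationship between features and target
sns.pairplot(data, x_vars=feature_names, y_vars=['PRICE'], height=2.5)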
plt.show()
# Correlation matrix
plt.figure(figsize=(12, 8))
sns.heatmap(data.corr(), annot=True, cmap='coolwarm', fmt='.2f')
plt.title('Correlation Matrix')
plt.show()

5. Preprocess the Data

Prepare the dataset for modelling:
Handle Missing Values: Impute or remove missing values if necessary.
Feature Selection: Decide which features to include in the model.
Split the Data: Divide the dataset into features (X) and target (y), then split into training
and testing sets.

from sklearn.model_selection import train_test_split

# Define features and target
X = data.drop('PRICE', axis=1)
y = data['PRICE']
# Split the dataset into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

6. Build and Train Linear Regression Model

Simple linear regression predicts the target variable from a single feature.

from sklearn.linear_model import LinearRegression

# Initialize the model
linear_regressor = LinearRegression()
# Train the model on the training data
linear_regressor.fit(X_train[['RM']], y_train) # 'RM' is an example feature
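For a single feature, the least-squares line has a closed-form solution: the slope is the covariance of x and y divided by the variance of x, and the intercept follows from the means. The sketch below computes it in plain Python on made-up numbers; the function name and data are illustrative, and LinearRegression should give the same line on the same data.

```python
def fit_simple_linear_regression(x, y):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-deviations and sum of squared deviations of x
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Toy data lying exactly on the line y = 2x + 1
rooms = [1.0, 2.0, 3.0, 4.0]
prices = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_simple_linear_regression(rooms, prices)
predicted_price = slope * 5.0 + intercept
print(slope, intercept, predicted_price)
```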

7. Build and Train Multiple Regression Model

Multiple regression uses multiple features to predict the target variable.

# Initialize the model

multiple_regressor = LinearRegression()
# Train the model on the training data
multiple_regressor.fit(X_train, y_train)

8. Make Predictions
Use the trained models to make predictions on the test set.

# Predictions with Linear Regression (using 'RM' feature as an example)

y_pred_linear = linear_regressor.predict(X_test[['RM']])
# Predictions with Multiple Regression
y_pred_multiple = multiple_regressor.predict(X_test)

9. Evaluate the Models

Assess the performance of both models using appropriate metrics such as Mean Absolute
Error (MAE), Mean Squared Error (MSE), and R-squared score.

from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Evaluate Linear Regression
mae_linear = mean_absolute_error(y_test, y_pred_linear)
mse_linear = mean_squared_error(y_test, y_pred_linear)
r2_linear = r2_score(y_test, y_pred_linear)
print(f'Linear Regression - MAE: {mae_linear}, MSE: {mse_linear}, R2: {r2_linear}')
# Evaluate Multiple Regression
mae_multiple = mean_absolute_error(y_test, y_pred_multiple)
mse_multiple = mean_squared_error(y_test, y_pred_multiple)
r2_multiple = r2_score(y_test, y_pred_multiple)
print(f'Multiple Regression - MAE: {mae_multiple}, MSE: {mse_multiple}, R2: {r2_multiple}')
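To clarify what these three metrics measure, the sketch below computes them by hand on three made-up values. MAE averages the absolute errors, MSE averages the squared errors, and R-squared compares the model's squared error with that of always predicting the mean.

```python
actual = [3.0, 5.0, 7.0]
predicted = [2.0, 5.0, 9.0]
n = len(actual)

errors = [p - a for a, p in zip(actual, predicted)]

mae_demo = sum(abs(e) for e in errors) / n   # mean absolute error
mse_demo = sum(e ** 2 for e in errors) / n   # mean squared error

mean_actual = sum(actual) / n
ss_res = sum(e ** 2 for e in errors)                   # residual sum of squares
ss_tot = sum((a - mean_actual) ** 2 for a in actual)   # total sum of squares
r2_demo = 1 - ss_res / ss_tot

print(mae_demo, mse_demo, r2_demo)
```

An R-squared of 1 means a perfect fit, 0 means no better than predicting the mean, and negative values mean worse than that baseline.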

10. Visualize Results

Visualize the predictions versus actual values to understand model performance.

# Plot actual vs predicted values for Multiple Regression

plt.figure(figsize=(10, 6))
plt.scatter(y_test, y_pred_multiple, alpha=0.5, color='blue', label='Predicted')
plt.plot([min(y_test), max(y_test)], [min(y_test), max(y_test)], color='red', linestyle='--', label='Ideal Line')
plt.xlabel('Actual Prices')
plt.ylabel('Predicted Prices')
plt.title('Actual vs Predicted Prices (Multiple Regression)')
plt.legend()
plt.show()

8. Build a polynomial regression model for predicting the salary of the employees.
Download the Employee Salary Dataset from the following link:
https://www.kaggle.com/datasets/rkiattisak/salaly-prediction-for-beginer
Save the dataset to use while building a polynomial regression model to predict the
employee salary.
Methodology to Build a Polynomial Regression Model
1. Understand the Problem
The goal is to predict employee salaries based on one or more features using polynomial
regression. Polynomial regression allows you to model more complex relationships than
linear regression by fitting polynomial functions.
2. Prepare the Environment
Make sure you have the necessary libraries installed:
pip install numpy pandas scikit-learn matplotlib
3. Load and Explore the Data
Load your dataset and perform initial exploratory data analysis (EDA) to understand the
structure and relationships within the data.

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from sklearn.model_selection import train_test_split
# Load the dataset (replace 'path_to_file' with your actual file path)
data = pd.read_csv('path_to_file.csv')
# Display basic information about the dataset
data.info()  # info() prints its summary directly and returns None
# Show the first few rows
print(data.head())
# Summary statistics
print(data.describe())
4. Preprocess the Data
Prepare the data for polynomial regression:
Handle Missing Values: Impute or remove missing values.
Feature Selection: Choose the feature(s) to use for predicting salary.
Feature Scaling (if necessary): Polynomial features can benefit from scaling.

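# Example feature selection: assume 'Experience' is the feature and 'Salary' is the target
# (use the column names that actually appear in your file)
# Drop rows where either column is missing, since LinearRegression cannot handle NaN
data = data.dropna(subset=['Experience', 'Salary'])
X = data[['Experience']]
y = data['Salary']
# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)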
5. Create Polynomial Features
Transform the feature(s) into polynomial features using PolynomialFeatures from scikit-
learn.

from sklearn.preprocessing import PolynomialFeatures

# Initialize PolynomialFeatures with degree 2 (quadratic) as an example
poly_features = PolynomialFeatures(degree=2)
# Transform the features to polynomial features
X_train_poly = poly_features.fit_transform(X_train)
X_test_poly = poly_features.transform(X_test)
6. Train the Polynomial Regression Model

Initialize and train a linear regression model on the transformed polynomial features.

from sklearn.linear_model import LinearRegression

# Initialize the linear regression model
poly_regressor = LinearRegression()
# Train the model on the polynomial features
poly_regressor.fit(X_train_poly, y_train)
7. Make Predictions
Use the trained polynomial regression model to make predictions on the test set.
# Make predictions
y_pred = poly_regressor.predict(X_test_poly)
8. Evaluate the Model
Assess the performance of the polynomial regression model using metrics like Mean
Absolute Error (MAE), Mean Squared Error (MSE), and R-squared score.
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
# Evaluate the model
mae = mean_absolute_error(y_test, y_pred)
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f'MAE: {mae}')
print(f'MSE: {mse}')
print(f'R2: {r2}')

9. Visualize the Results

Visualize the polynomial regression fit to understand how well the model captures the
relationship between the feature(s) and the target variable.
# Create a grid of values for plotting
X_grid = np.arange(min(X['Experience']), max(X['Experience']), 0.1)
X_grid = X_grid.reshape((len(X_grid), 1))
X_grid_poly = poly_features.transform(X_grid)
# Predict salaries for the grid values
y_grid_pred = poly_regressor.predict(X_grid_poly)
# Plot the results
plt.scatter(X, y, color='red', label='Actual data')
plt.plot(X_grid, y_grid_pred, color='blue', label='Polynomial regression fit')
plt.title('Polynomial Regression')
plt.xlabel('Experience')
plt.ylabel('Salary')
plt.legend()
plt.show()
9. Build a neural network that will read the image of a digit and correctly identify the
number.
Download the image dataset for digits (0-9) from the following link:
https://www.kaggle.com/datasets/karnikakapoor/digits/data
Save the dataset to use while building a neural network model to correctly identify the
number.
Note: You can also use the standard MNIST dataset for digit classification, which is
readily available in TensorFlow.
Methodology to Build a Neural Network Model for Digit Classification
1. Understand the Problem
The goal is to build a neural network model that can classify images of digits (0-9). This
is a multi-class classification problem where each image corresponds to one of ten
classes.
2. Prepare the Environment
Ensure you have the necessary libraries installed:
pip install numpy pandas tensorflow matplotlib scikit-learn
3. Load and Explore the Dataset
MNIST is a standard dataset for digit classification, readily available in TensorFlow's
datasets module.

import tensorflow as tf
from tensorflow.keras.datasets import mnist
import matplotlib.pyplot as plt
# Load the MNIST dataset
(X_train, y_train), (X_test, y_test) = mnist.load_data()
# Display the shape of the dataset
print(f'Training data shape: {X_train.shape}')
print(f'Test data shape: {X_test.shape}')
# Display a sample image
plt.imshow(X_train[0], cmap='gray')
plt.title(f'Label: {y_train[0]}')
plt.show()

4. Preprocess the Data

Prepare the data for the neural network model:

Normalize the Data: Scale pixel values to the range [0, 1].
Flatten Images: Convert 2D images into 1D vectors.
One-hot Encode Labels: Convert class labels into one-hot encoded vectors.

# Normalize pixel values to [0, 1]

X_train = X_train / 255.0
X_test = X_test / 255.0
# Flatten images (28x28) to vectors (784)
X_train = X_train.reshape(-1, 28 * 28)
X_test = X_test.reshape(-1, 28 * 28)
# One-hot encode labels
y_train = tf.keras.utils.to_categorical(y_train, 10)
y_test = tf.keras.utils.to_categorical(y_test, 10)
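The three preprocessing steps can be checked on a tiny example. This sketch uses two made-up 2x2 "images" and three classes instead of MNIST, and performs the one-hot encoding with plain NumPy rather than to_categorical.

```python
import numpy as np

# Two made-up 2x2 "images" with 8-bit pixel values, plus their labels.
images = np.array([[[0, 255], [128, 64]],
                   [[255, 255], [0, 0]]], dtype=np.uint8)
labels = np.array([2, 0])
num_classes = 3

# Normalize to [0, 1] and flatten each image into one row.
flat = (images / 255.0).reshape(len(images), -1)

# One-hot encode: row i of the identity matrix is the code for class i.
one_hot = np.eye(num_classes)[labels]

print(flat.shape)
print(one_hot)
```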
5. Build the Neural Network Model

Define a neural network architecture. A simple fully connected neural network is suitable
for this task.

from tensorflow.keras.models import Sequential

from tensorflow.keras.layers import Dense
# Initialize the model
model = Sequential()
# Add layers
model.add(Dense(128, activation='relu', input_shape=(28 * 28,)))
model.add(Dense(64, activation='relu'))
model.add(Dense(10, activation='softmax'))
# Compile the model
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
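As a sanity check on the architecture, the number of trainable parameters can be worked out by hand: each Dense layer has one weight per input-output pair plus one bias per unit. The sketch below does this arithmetic for the layer sizes used above; model.summary() should report the same total.

```python
# Layer sizes of the network above: 784 inputs -> 128 -> 64 -> 10 outputs.
layer_sizes = [28 * 28, 128, 64, 10]

total = 0
for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
    params = n_in * n_out + n_out  # weight matrix plus one bias per unit
    print(f'{n_in} -> {n_out}: {params} parameters')
    total += params

print(f'Total trainable parameters: {total}')
```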
6. Train the Model
Train the model using the training data.
history = model.fit(X_train, y_train, epochs=10, batch_size=32, validation_split=0.2)
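The fit arguments determine how many weight updates happen per epoch. A short calculation, assuming the standard MNIST training set of 60,000 images:

```python
import math

n_samples = 60000          # size of the MNIST training set
validation_split = 0.2
batch_size = 32

# Keras holds out the last fraction of the samples for validation.
n_val = int(n_samples * validation_split)
n_train = n_samples - n_val

# One step processes one batch; the final batch may be smaller.
steps_per_epoch = math.ceil(n_train / batch_size)

print(f'Training samples: {n_train}, validation samples: {n_val}')
print(f'Steps per epoch: {steps_per_epoch}')
```

Because the validation samples are taken from the end of the arrays before any shuffling, data that is sorted by class should be shuffled before calling fit.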
7. Evaluate the Model
Assess the performance of the model using the test data.

# Evaluate the model

test_loss, test_accuracy = model.evaluate(X_test, y_test)
print(f'Test Loss: {test_loss}')
print(f'Test Accuracy: {test_accuracy}')

8. Visualize Training History

Plot the training and validation accuracy and loss to understand the model's learning
progress.

# Plot training & validation accuracy values

plt.plot(history.history['accuracy'])
plt.plot(history.history['val_accuracy'])
plt.title('Model accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend(['Train', 'Validation'], loc='upper left')
plt.show()
# Plot training & validation loss values
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('Model loss')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend(['Train', 'Validation'], loc='upper left')
plt.show()
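When reading these curves, a useful number is the epoch at which validation loss is lowest. The sketch below finds it in a dictionary shaped like history.history; the loss values are made up for illustration and do not come from a real training run.

```python
# Made-up loss values shaped like history.history (illustration only).
history_dict = {
    'loss':     [0.60, 0.35, 0.25, 0.18, 0.12, 0.08],
    'val_loss': [0.55, 0.38, 0.30, 0.29, 0.31, 0.35],
}

val_loss = history_dict['val_loss']
best_epoch = min(range(len(val_loss)), key=val_loss.__getitem__)

# After best_epoch, training loss keeps falling while validation loss rises,
# which is the usual sign of overfitting.
print(f'Lowest validation loss at epoch {best_epoch + 1}: {val_loss[best_epoch]}')
```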

9. Make Predictions
Use the model to make predictions on new data.

# Predict on the test set

predictions = model.predict(X_test)
# Display predictions for a sample
import numpy as np
sample_index = 0
sample_image = X_test[sample_index].reshape(28, 28)
predicted_label = np.argmax(predictions[sample_index])
true_label = np.argmax(y_test[sample_index])
plt.imshow(sample_image, cmap='gray')
plt.title(f'True label: {true_label}, Predicted label: {predicted_label}')
plt.show()
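The argmax step applies to the whole prediction matrix at once, not just one sample. A minimal sketch on made-up softmax outputs (three samples, four classes) shows how predicted labels, confidences and accuracy follow from the probabilities:

```python
import numpy as np

# Made-up softmax outputs for three samples over four classes.
probs = np.array([[0.10, 0.70, 0.15, 0.05],
                  [0.25, 0.25, 0.40, 0.10],
                  [0.90, 0.05, 0.03, 0.02]])
true_classes = np.array([1, 0, 0])

predicted_classes = np.argmax(probs, axis=1)   # most probable class per row
confidence = probs.max(axis=1)                 # probability of that class
accuracy = np.mean(predicted_classes == true_classes)

print(predicted_classes, confidence, accuracy)
```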

10. Solve a classification problem by constructing a feed-forward neural network using
the backpropagation algorithm. (Wheat Seed Data)
Download the wheat seed dataset from the following link:
https://www.kaggle.com/datasets/jmcaro/wheat-seedsuci
Save the dataset for use while building the feed-forward neural network model.
Methodology to Build a Feed-Forward Neural Network with Backpropagation
1. Understand the Problem
The task is to build a feed-forward neural network that classifies wheat seeds into
different types based on their features. This is a multi-class classification problem.
2. Prepare the Environment
Ensure you have the necessary libraries installed:
pip install numpy pandas scikit-learn tensorflow matplotlib
3. Load and Explore the Data
Load the Wheat Seed dataset and perform exploratory data analysis (EDA).

import pandas as pd
# Load the dataset (replace 'path_to_file' with your actual file path)
data = pd.read_csv('path_to_file.csv')
# Display basic information about the dataset
print(data.info())
# Show the first few rows
print(data.head())
# Summary statistics
print(data.describe())
4. Preprocess the Data
Prepare the data for training the neural network:
Handle Missing Values: Impute or remove missing values if necessary.
Encode Categorical Variables: Convert categorical labels into numerical format using
one-hot encoding.
Feature Scaling: Normalize or standardize feature values.
Split the Data: Divide the dataset into features (X) and target labels (y), then split into
training and testing sets.

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
# Assume 'Type' is the target column
X = data.drop('Type', axis=1)
y = data['Type']
# One-hot encode the class labels (as floats, so Keras receives numeric targets)
y = pd.get_dummies(y, dtype=float)
# Split the dataset into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
# Standardize the features; fit the scaler on the training set only
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
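The scaler is fitted on the training set only, and the same statistics are then reused for the test set. The sketch below reproduces that computation with NumPy on made-up feature values, to make the step explicit:

```python
import numpy as np

# Made-up feature matrices (rows = seeds, columns = features).
train = np.array([[10.0, 1.0], [12.0, 3.0], [14.0, 5.0]])
test = np.array([[12.0, 2.0]])

# Statistics come from the training split only.
mean = train.mean(axis=0)
std = train.std(axis=0)   # population std (ddof=0), as StandardScaler uses

train_scaled = (train - mean) / std
test_scaled = (test - mean) / std

print(train_scaled)
print(test_scaled)
```

Computing the mean and standard deviation on the full dataset would leak information from the test set into training.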
5. Build the Neural Network Model
Define and compile the feed-forward neural network using TensorFlow/Keras.

import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
# Initialize the model
model = Sequential()
# Add layers to the model
model.add(Dense(64, input_dim=X_train.shape[1], activation='relu')) # First hidden layer (input_dim sets the input size)
model.add(Dense(32, activation='relu')) # Second hidden layer
model.add(Dense(y_train.shape[1], activation='softmax')) # Output layer, one unit per class
# Compile the model
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
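The softmax output and the categorical cross-entropy loss chosen above can be written in a few lines of NumPy. This is a simplified sketch on made-up logits; the function names are illustrative and are not the Keras implementation.

```python
import numpy as np

def softmax(z):
    # Subtract the row maximum for numerical stability.
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def categorical_crossentropy(y_onehot, probs):
    # Mean negative log-probability assigned to the true class.
    return -np.mean(np.sum(y_onehot * np.log(probs), axis=1))

# Made-up raw outputs (logits) for two samples and three classes.
logits = np.array([[2.0, 1.0, 0.1],
                   [0.0, 0.0, 0.0]])
y_onehot = np.array([[1.0, 0.0, 0.0],
                     [0.0, 0.0, 1.0]])

probs = softmax(logits)
loss = categorical_crossentropy(y_onehot, probs)

print(probs)
print(loss)
```

Each row of probs sums to 1, and the loss shrinks as the probability assigned to the true class approaches 1.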
6. Train the Model

Fit the neural network model on the training data.

history = model.fit(X_train, y_train, epochs=50, batch_size=32, validation_split=0.2)
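Keras performs backpropagation internally during fit. To make the algorithm itself visible, here is a minimal NumPy sketch of one hidden layer trained with full-batch gradient descent. Everything in it is an assumption for illustration: the data is synthetic (three clusters with seven features standing in for the scaled wheat features), and the layer size, learning rate and epoch count are arbitrary choices, not the settings used above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the scaled wheat features: 3 classes, 7 features.
n_per_class, n_features, n_classes, n_hidden = 30, 7, 3, 16
centers = rng.normal(scale=3.0, size=(n_classes, n_features))
X_demo = np.vstack([c + rng.normal(size=(n_per_class, n_features)) for c in centers])
X_demo = (X_demo - X_demo.mean(axis=0)) / X_demo.std(axis=0)   # standardize
labels = np.repeat(np.arange(n_classes), n_per_class)
Y_demo = np.eye(n_classes)[labels]                             # one-hot targets

# Small random weights, zero biases.
W1 = rng.normal(scale=0.1, size=(n_features, n_hidden))
b1 = np.zeros(n_hidden)
W2 = rng.normal(scale=0.1, size=(n_hidden, n_classes))
b2 = np.zeros(n_classes)

def forward(X):
    z1 = X @ W1 + b1
    a1 = np.maximum(z1, 0.0)                         # ReLU
    z2 = a1 @ W2 + b2
    e = np.exp(z2 - z2.max(axis=1, keepdims=True))
    return z1, a1, e / e.sum(axis=1, keepdims=True)  # softmax probabilities

learning_rate = 0.1
losses = []
for epoch in range(500):
    # Forward pass and cross-entropy loss
    z1, a1, P = forward(X_demo)
    losses.append(-np.mean(np.sum(Y_demo * np.log(P + 1e-12), axis=1)))

    # Backward pass: chain rule from the output back to the input.
    dz2 = (P - Y_demo) / len(X_demo)   # gradient of softmax + cross-entropy
    dW2 = a1.T @ dz2
    db2 = dz2.sum(axis=0)
    dz1 = (dz2 @ W2.T) * (z1 > 0)      # ReLU passes gradient where z1 > 0
    dW1 = X_demo.T @ dz1
    db1 = dz1.sum(axis=0)

    # Gradient descent update
    W1 -= learning_rate * dW1
    b1 -= learning_rate * db1
    W2 -= learning_rate * dW2
    b2 -= learning_rate * db2

train_accuracy = np.mean(forward(X_demo)[2].argmax(axis=1) == labels)
print(f'Loss: {losses[0]:.3f} -> {losses[-1]:.3f}, training accuracy: {train_accuracy:.2f}')
```

The Keras model differs in several ways (two hidden layers, the Adam optimizer, mini-batches), but the forward pass, gradient computation and weight update follow the same pattern.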
7. Evaluate the Model
Assess the performance of the model on the test data.

# Evaluate the model

test_loss, test_accuracy = model.evaluate(X_test, y_test)
print(f'Test Loss: {test_loss}')
print(f'Test Accuracy: {test_accuracy}')
8. Visualize Training History
Plot the training and validation accuracy and loss to understand the model's learning
process.

import matplotlib.pyplot as plt

# Plot training & validation accuracy values
plt.plot(history.history['accuracy'])
plt.plot(history.history['val_accuracy'])
plt.title('Model Accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend(['Train', 'Validation'], loc='upper left')
plt.show()
# Plot training & validation loss values
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('Model Loss')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend(['Train', 'Validation'], loc='upper left')
plt.show()
9. Make Predictions
Use the model to make predictions on the test set and evaluate them per class.

from sklearn.metrics import classification_report, confusion_matrix
# Predict class probabilities on the test set
y_pred = model.predict(X_test)
# Convert probabilities and one-hot targets back to class labels.
# Reusing y_test's column names keeps both label sets on the same scale;
# a bare idxmax on y_pred would return positions 0, 1, 2 instead of the class names.
y_pred_labels = pd.DataFrame(y_pred, columns=y_test.columns).idxmax(axis=1)
y_test_labels = y_test.idxmax(axis=1)
# Print classification report
print(classification_report(y_test_labels, y_pred_labels))
# Print confusion matrix
print(confusion_matrix(y_test_labels, y_pred_labels))
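To read the report and the matrix, it helps to build them once by hand. The sketch below uses made-up true and predicted labels for three classes, assumed here to be coded 1, 2 and 3, and follows the scikit-learn convention of true classes in rows and predicted classes in columns.

```python
import numpy as np

# Made-up true and predicted class labels for three wheat types (1, 2, 3).
y_true_demo = np.array([1, 1, 2, 2, 3, 3, 3])
y_pred_demo = np.array([1, 2, 2, 2, 3, 1, 3])
classes = np.array([1, 2, 3])

# Rows are true classes, columns are predicted classes.
cm = np.zeros((len(classes), len(classes)), dtype=int)
for t, p in zip(y_true_demo, y_pred_demo):
    cm[np.searchsorted(classes, t), np.searchsorted(classes, p)] += 1

precision = np.diag(cm) / cm.sum(axis=0)   # correct / predicted as that class
recall = np.diag(cm) / cm.sum(axis=1)      # correct / actually that class

print(cm)
print(precision)
print(recall)
```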

############33* THE END *#####################
