Roll No: 160621733015 Experiment No: 02
Name: Dhavanam Sindhu Date: 12/08/2024
6. SOURCE CODE:
# Loading the iris dataset
from sklearn import datasets
import pandas as pd

iris = datasets.load_iris()

# Display the dataset as a DataFrame
df = pd.DataFrame(data=iris.data, columns=iris.feature_names)
x = df.head()
y = iris.target_names
print(x)
# Dataset pre-processing
import numpy as np
from sklearn.model_selection import train_test_split

X = [[1, 2], [3, 4], [5, 6], [7, 8]]
y = [0, 1, 0, 1]
X = np.array(X)
y = np.array(y)
print(X.shape)
print(y.shape)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)
The decision tree is built as follows:
1. Select the best attribute to split the data based on an attribute selection measure (e.g., information gain, Gini index); a sketch of this selection step follows the list.
2. Split the dataset into subsets where the selected attribute has distinct values.
3. Repeat the process recursively for each child node until a stopping condition is met (e.g., maximum depth reached, or a single class remains in the node).
4. Assign the majority class (for classification) or the mean value (for regression) to the leaf node.
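A minimal sketch of the attribute-selection step, assuming pandas-style data and the Gini index; gini() and best_attribute() are hypothetical helper names, not part of the experiment's code:

import numpy as np

def gini(labels):
    # Gini impurity of a collection of class labels
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def best_attribute(X, y, attributes):
    # Pick the attribute whose split gives the lowest weighted Gini impurity
    best, best_score = None, float('inf')
    for a in attributes:
        score = sum((X[a] == v).mean() * gini(y[X[a] == v])
                    for v in X[a].unique())
        if score < best_score:
            best, best_score = a, score
    return best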
Code execution process
STEPS:
1. Open Colab:
Go to Google Colab.
You can either sign in with your Google account or use the "New Notebook" button if you're already signed in.
2. Log in to Colab:
If you're not logged in automatically, click the "Sign in" button at the top right
corner of the page.
Enter your Google account credentials to access your Colab environment.
3. Import Required Libraries:
a. Import Libraries:
Import the necessary libraries for data manipulation, model creation, and visualization (a matching import block is sketched below):
pandas for handling datasets.
numpy for numerical operations.
matplotlib.pyplot for plotting.
LabelEncoder from sklearn.preprocessing to encode categorical variables.
DecisionTreeClassifier from sklearn.tree for building the decision tree model.
plot_tree to visualize the decision tree.
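The import statements corresponding to this list:

# Libraries used throughout the experiment
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import LabelEncoder
from sklearn.tree import DecisionTreeClassifier, plot_tree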
4. Load and Visualize the Dataset:
1. Read the Dataset:
Load the dataset using pd.read_csv() by providing the file path (e.g., '/content/play_tennis.csv').
Display the dataset to verify that it was loaded correctly; evaluating df prints the dataset.
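A short sketch of this step; the file name play_tennis.csv is an assumption based on the features used later (outlook, temp, humidity, wind, play):

import pandas as pd

# File name is illustrative; adjust to the uploaded dataset
df = pd.read_csv('/content/play_tennis.csv')
df  # display the DataFrame to verify the load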
5. Preprocess the Data
1. Label Encoding:
For each categorical column in the dataset (outlook, temp, humidity, wind,
play), use the LabelEncoder() to convert the categorical values into
numerical values.
Update the DataFrame df by replacing categorical values with their
encoded numerical equivalents for these features.
Steps:
For each feature, call le.fit_transform(column_name) to encode the
values.
Replace the original feature values with the encoded ones.
2. Verify the Preprocessed Data:
Print the updated DataFrame (df) to ensure all categorical features have been successfully converted into numerical form (a sketch of this step follows).
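A minimal encoding sketch, assuming the column names listed above:

from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()
# Replace each categorical column with its encoded values
for col in ['outlook', 'temp', 'humidity', 'wind', 'play']:
    df[col] = le.fit_transform(df[col])

print(df)  # verify that every feature is now numeric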
6. Split Dataset into Features and Target:
Define Independent (Features) and Dependent (Target) Variables:
Define independent_variable by dropping the target column (play) using df.drop().
Define dependent_var by selecting the play column (target variable) from the DataFrame df (both definitions are sketched below).
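In code, this step might look like:

# Features: every column except the target 'play'
independent_variable = df.drop('play', axis=1)
# Target: the 'play' column
dependent_var = df['play']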
7. Build and Train the Decision Tree Classifier:
1. Initialize the Decision Tree Classifier:
Create a model object by initializing DecisionTreeClassifier() from sklearn.tree.
2. Fit the Model:
Train the decision tree model by calling the fit() method, passing the
independent_variable (features) and dependent_var (target) as arguments.
3. Evaluate the Model:
Check the performance of the trained model on the same dataset using the score()
method.
The model.score() method will return the accuracy score of the model on the training data (see the sketch below).
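A sketch of steps 1-3, reusing the variables defined above:

from sklearn.tree import DecisionTreeClassifier

model = DecisionTreeClassifier()
model.fit(independent_variable, dependent_var)

# Accuracy on the training data itself
print(model.score(independent_variable, dependent_var))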
8. Create Input Data for Prediction:
1. Prepare Input Data:
Define the input data for testing or prediction by creating a DataFrame with the required number of features (replace the features list with the actual feature names), as sketched below.
For example, create a DataFrame input_data containing a single row with the
values [1, 2, 0, 1, 0].
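One way to build such an input row; the five values follow the manual's example, and the column list must match the trained feature set:

# Hypothetical encoded instance; columns must match the training features
input_data = pd.DataFrame([[1, 2, 0, 1, 0]],
                          columns=independent_variable.columns)
print(model.predict(input_data))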
9. Visualize the Decision Tree:
1. Plot the Decision Tree:
Create a visual representation of the decision tree using plot_tree() from sklearn.tree.
Specify parameters such as:
filled=True to color the nodes based on class labels.
feature_names to label the nodes with the feature names from independent_variable.columns.
class_names to label the output classes using le.classes_.
2. Show the Decision Tree Plot:
Use plt.show() to display the generated decision tree plot (a combined sketch follows).
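Putting the visualization together, under the same variable names as above:

import matplotlib.pyplot as plt
from sklearn.tree import plot_tree

plt.figure(figsize=(10, 6))
plot_tree(model,
          filled=True,                                       # color nodes by class
          feature_names=list(independent_variable.columns),  # label split nodes
          class_names=[str(c) for c in le.classes_])         # label output classes
plt.show()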
10. Posterior Probability:
a. Compute Likelihood for Each Feature:
For each feature in the test set, calculate the likelihood P(feature | class), i.e., the probability of observing a specific feature value given the class label.
For example, calculate the likelihood for Outlook = 'Sunny' when PlayTennis = Yes.
Formula:
P(feature value | class value) = Number of occurrences of the feature value with that class / Total number of occurrences of the class
A counting sketch of this formula follows.
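A minimal counting sketch; it assumes the raw, un-encoded data is re-read (file name assumed as before), and likelihood() is a hypothetical helper name:

import pandas as pd

# Raw, un-encoded copy of the data for counting (file name is an assumption)
df_raw = pd.read_csv('/content/play_tennis.csv')

def likelihood(df_raw, feature, value, target, target_value):
    # P(feature = value | target = target_value), estimated by counting
    subset = df_raw[df_raw[target] == target_value]
    return (subset[feature] == value).sum() / len(subset)

# e.g., P(outlook = 'Sunny' | play = 'Yes'):
print(likelihood(df_raw, 'outlook', 'Sunny', 'play', 'Yes'))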
11. Bayes Theorem and Posterior Probability Calculation
a. Calculate Posterior Probability for Each Test Instance:
For each test instance, use the prior probabilities and the likelihoods to
compute the posterior probability using Bayes’ Theorem.
Bayes' Theorem:
P(class | features) ∝ P(class) × P(features | class)
For PlayTennis = Yes:
P(Yes | features) ∝ P(Yes) × P(Outlook | Yes) × P(Temperature | Yes) × P(Humidity | Yes) × P(Wind | Yes)
For PlayTennis = No:
P(No | features) ∝ P(No) × P(Outlook | No) × P(Temperature | No) × P(Humidity | No) × P(Wind | No)
A sketch applying this rule follows.
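A sketch that applies the theorem using the likelihood() helper above; posterior() is likewise a hypothetical name:

def posterior(df_raw, instance, target, target_value):
    # Un-normalized posterior P(class | features) via Bayes' theorem
    p = (df_raw[target] == target_value).mean()  # prior P(class)
    for feature, value in instance.items():
        p *= likelihood(df_raw, feature, value, target, target_value)
    return p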
12. Classification Decision:
a. Make Prediction for Each Test Instance:
Compare the computed posterior probabilities for both classes (Yes and No).
Assign the class with the higher posterior probability as the predicted class.
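The decision rule, continued from the sketches above; the test instance shown is hypothetical:

instance = {'outlook': 'Sunny', 'temp': 'Hot', 'humidity': 'High', 'wind': 'Weak'}
p_yes = posterior(df_raw, instance, 'play', 'Yes')
p_no = posterior(df_raw, instance, 'play', 'No')
# Predict the class with the larger posterior
print('Yes' if p_yes > p_no else 'No')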
13. Evaluate Model Performance:
a. Evaluate the Model:
Compute accuracy by comparing the predicted values with the actual values from
the test set.
Use a confusion matrix to understand how well the model classifies each class.
Accuracy Formula:
Accuracy = Number of correct predictions / Total number of predictions
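With scikit-learn's metrics, assuming y_test holds the true labels and y_pred the predictions from the manual classifier:

from sklearn.metrics import accuracy_score, confusion_matrix

# y_test and y_pred are assumed to come from the earlier train/test split
print('Accuracy:', accuracy_score(y_test, y_pred))
print(confusion_matrix(y_test, y_pred))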
b. Comparison with Built-in Naive Bayes:
For validation, train a GaussianNB model from scikit-learn and compare the
predictions, accuracy, and confusion matrix with the manual implementation.
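A sketch of the comparison, assuming the encoded data was split into X_train, X_test, y_train, y_test:

from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score, confusion_matrix

nb = GaussianNB()
nb.fit(X_train, y_train)        # train on the encoded features
y_pred_nb = nb.predict(X_test)
print('GaussianNB accuracy:', accuracy_score(y_test, y_pred_nb))
print(confusion_matrix(y_test, y_pred_nb))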
6. Preprocess Images with OpenCV (Advanced)
To use OpenCV for custom preprocessing, we can convert images to grayscale, resize them, or apply filters, as sketched below:
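An illustrative pipeline for a single image; the file name and target size are assumptions:

import cv2

img = cv2.imread('image.jpg')                   # file name is illustrative
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)    # convert to grayscale
resized = cv2.resize(gray, (128, 128))          # resize to a fixed shape
blurred = cv2.GaussianBlur(resized, (5, 5), 0)  # apply a smoothing filter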
7. Save Preprocessed Data
Now we can save the preprocessed data so it can be reused later:
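For example, with NumPy, continuing from the sketch above (the file name is illustrative):

import numpy as np

np.save('preprocessed.npy', blurred)   # persist the preprocessed array
# Reload later with: data = np.load('preprocessed.npy')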