10/15/24, 8:57 PM Decision Tree - Jupyter Notebook
Decision Tree Classifier
Using a Loan Approval Dataset
Step 1: Import Libraries
In [9]: # Import necessary libraries
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, classification_report
Step 2: Create a Simple Dataset
In [13]: # Create the dataset
data = {
'Age': [25, 45, 35, 50, 23, 40, 30, 28, 55, 33],
'Income': ['Low', 'High', 'Medium', 'High', 'Low', 'Medium', 'High', 'Low', 'High', 'Low'],
'Credit_Score': [600, 700, 650, 720, 580, 660, 680, 590, 740, 620],
'Owns_House': ['No', 'Yes', 'No', 'Yes', 'No', 'Yes', 'Yes', 'No', 'Yes', 'No'],
'Approved': ['No', 'Yes', 'Yes', 'Yes', 'No', 'Yes', 'Yes', 'No', 'Yes', 'No']
}
# Convert the dictionary to a pandas DataFrame
df = pd.DataFrame(data)
# Display the dataset
df
Out[13]: Age Income Credit_Score Owns_House Approved
0 25 Low 600 No No
1 45 High 700 Yes Yes
2 35 Medium 650 No Yes
3 50 High 720 Yes Yes
4 23 Low 580 No No
5 40 Medium 660 Yes Yes
6 30 High 680 Yes Yes
7 28 Low 590 No No
8 55 High 740 Yes Yes
9 33 Low 620 No No
localhost:8888/notebooks/Downloads/Decision [Link] 1/4
Step 3: Encode Categorical Features
Convert the categorical variables "Income" and "Owns_House", along with the "Approved" target,
into numerical values for the model. We'll use LabelEncoder for this.
In [17]: # Initialize LabelEncoder
le = LabelEncoder()
# Convert categorical columns to numerical ones
df['Income'] = le.fit_transform(df['Income']) # High=0, Low=1, Medium=2 (alphabetical order)
df['Owns_House'] = le.fit_transform(df['Owns_House']) # No=0, Yes=1
df['Approved'] = le.fit_transform(df['Approved']) # No=0, Yes=1
# Display the dataset after encoding
df
Out[17]: Age Income Credit_Score Owns_House Approved
0 25 1 600 0 0
1 45 0 700 1 1
2 35 2 650 0 1
3 50 0 720 1 1
4 23 1 580 0 0
5 40 2 660 1 1
6 30 0 680 1 1
7 28 1 590 0 0
8 55 0 740 1 1
9 33 1 620 0 0
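LabelEncoder assigns codes in sorted (alphabetical) order, which is why Income comes out as High=0, Low=1, Medium=2 in the table above rather than following Low < Medium < High. If the natural ordering matters, one option is an explicit mapping. The sketch below is only an illustration; the `income_order` dictionary is a hypothetical name and not part of the original notebook.

```python
import pandas as pd

# Same Income values as the dataset above
income = pd.Series(['Low', 'High', 'Medium', 'High', 'Low',
                    'Medium', 'High', 'Low', 'High', 'Low'])

# Hypothetical explicit mapping that preserves Low < Medium < High
income_order = {'Low': 0, 'Medium': 1, 'High': 2}
income_encoded = income.map(income_order)

print(income_encoded.tolist())  # [0, 2, 1, 2, 0, 1, 2, 0, 2, 0]
```

A decision tree can still split on the alphabetical codes, but an ordered encoding lets a single threshold separate "Low" from everything above it.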
Step 4: Split the Dataset into Features and Target
In [19]: # Define features (X) and target (y)
X = df.drop('Approved', axis=1)
y = df['Approved']
Step 5: Split the Data into Training and Test Sets
In [21]: # Split the dataset into training and testing sets (80% train, 20% test)
# (any arguments after test_size are cut off in the source)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
# Print the shapes of training and testing sets
print(f"Training set size: {X_train.shape}")
print(f"Testing set size: {X_test.shape}")
Training set size: (8, 4)
Testing set size: (2, 4)
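With only two test rows, a plain random split can easily put a single class in the test set. As a sketch of one way to reduce that risk, `train_test_split` accepts a `stratify` argument that keeps the class proportions roughly equal in both parts. The two-column feature frame and `random_state=0` below are arbitrary choices for illustration, not values from the notebook.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Two numeric features and the encoded target from the dataset above (No=0, Yes=1)
X = pd.DataFrame({
    'Age': [25, 45, 35, 50, 23, 40, 30, 28, 55, 33],
    'Credit_Score': [600, 700, 650, 720, 580, 660, 680, 590, 740, 620],
})
y = pd.Series([0, 1, 1, 1, 0, 1, 1, 0, 1, 0], name='Approved')

# stratify=y asks for both classes to be represented in train and test
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0  # random_state is arbitrary
)

print(f"Test labels: {sorted(y_test.tolist())}")
print(f"Training set size: {X_train.shape}")
```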
Step 6: Train the Decision Tree Classifier
In [23]: # Initialize the Decision Tree Classifier
clf = DecisionTreeClassifier(random_state=42)
# Train the model using the training data
clf.fit(X_train, y_train)
Out[23]: DecisionTreeClassifier(random_state=42)
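To see what a fitted tree has actually learned, scikit-learn's `export_text` prints the split rules as plain text. The sketch below refits a tree on two numeric columns of the dataset so that it runs on its own; it uses all ten rows rather than the training split, so the rules may differ from those of the `clf` model trained above.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

# Two numeric features and the encoded target from the dataset above
X = pd.DataFrame({
    'Age': [25, 45, 35, 50, 23, 40, 30, 28, 55, 33],
    'Credit_Score': [600, 700, 650, 720, 580, 660, 680, 590, 740, 620],
})
y = [0, 1, 1, 1, 0, 1, 1, 0, 1, 0]

# Fit on all rows (illustration only) and print the learned rules
tree = DecisionTreeClassifier(random_state=42).fit(X, y)
rules = export_text(tree, feature_names=list(X.columns))
print(rules)
```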
Step 7: Make Predictions
In [31]: # Make predictions on the testing set
y_pred = clf.predict(X_test)
# Print the actual and predicted values side-by-side for comparison
print("Actual vs Predicted values:")
for actual, predicted in zip(y_test, y_pred):
    print(f"Actual: {actual}, Predicted: {predicted}")
Actual vs Predicted values:
Actual: 1, Predicted: 1
Actual: 1, Predicted: 1
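The predictions are printed as encoded integers. A fitted LabelEncoder can turn them back into the original labels with `inverse_transform`. The sketch below uses a fresh encoder fitted only on the "Approved" values so that it is self-contained; in the notebook a single `le` object is reused for every column, so it only remembers the last column it was fitted on.

```python
from sklearn.preprocessing import LabelEncoder

# Encoder fitted on the Approved column values from the dataset above
approved_encoder = LabelEncoder()
approved_encoder.fit(['No', 'Yes', 'Yes', 'Yes', 'No',
                      'Yes', 'Yes', 'No', 'Yes', 'No'])

y_pred = [1, 1]  # the two predictions printed above
labels = approved_encoder.inverse_transform(y_pred)

print(labels.tolist())  # ['Yes', 'Yes']
```

Keeping one encoder per column (here the hypothetical `approved_encoder`) makes it possible to decode any column later, not just the last one encoded.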
Step 8: Evaluate the Model
In [27]: # Evaluate the model's performance
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy * 100:.2f}%")
# Detailed classification report
print("\nClassification Report:")
print(classification_report(y_test, y_pred))
Accuracy: 100.00%

Classification Report:
              precision    recall  f1-score   support

           1       1.00      1.00      1.00         2

    accuracy                           1.00         2
   macro avg       1.00      1.00      1.00         2
weighted avg       1.00      1.00      1.00         2
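The report above is based on two test rows that both belong to class 1, so the 100% accuracy says little about how the model handles rejected applications. One possible way to get a steadier estimate on such a small dataset is stratified cross-validation. The sketch below is an illustration under assumptions: `n_splits=4` is chosen because the smaller class has four rows, and `random_state=0` is arbitrary.

```python
import pandas as pd
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Encoded dataset as shown in Out[17]
X = pd.DataFrame({
    'Age': [25, 45, 35, 50, 23, 40, 30, 28, 55, 33],
    'Income': [1, 0, 2, 0, 1, 2, 0, 1, 0, 1],
    'Credit_Score': [600, 700, 650, 720, 580, 660, 680, 590, 740, 620],
    'Owns_House': [0, 1, 0, 1, 0, 1, 1, 0, 1, 0],
})
y = pd.Series([0, 1, 1, 1, 0, 1, 1, 0, 1, 0], name='Approved')

# Every fold keeps at least one row of each class in its test part
cv = StratifiedKFold(n_splits=4, shuffle=True, random_state=0)
scores = cross_val_score(DecisionTreeClassifier(random_state=42), X, y, cv=cv)

print(f"Fold accuracies: {scores}")
print(f"Mean accuracy: {scores.mean() * 100:.2f}%")
```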