Experiment-8
1. State the ID3 algorithm for decision trees
2. Construct a decision tree for the given dataset on paper using ID3
3. Implement the ID3 algorithm in Python for the same dataset
4. Visualise the decision tree
5. Test/validate the tree for any query
1. ID3 Algorithm (Conceptual Summary)
ID3 (Iterative Dichotomiser 3) builds a decision tree by selecting, at each node, the attribute with the highest Information Gain.
Steps:
1. Calculate Entropy of the dataset.
2. For each attribute, calculate Information Gain:
Information Gain = Entropy(Parent) − Σ ( |Subset| / |Total| × Entropy(Subset) )
3. Choose the attribute with the highest Information Gain.
4. Repeat recursively for each subset until:
   - All samples belong to one class, or
   - No attributes are left
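The entropy term above is computed directly from class counts. A minimal sketch in Python (the helper name entropy is ours, not part of any library):

import math

def entropy(counts):
    # Entropy of a label distribution given as a list of class counts
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

# Example: the full PlayTennis dataset has 9 "Yes" and 5 "No" labels
print(round(entropy([9, 5]), 3))  # 0.94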
2. Manually Solving (on Paper)
You can solve the decision tree on paper using the following steps:
   - Start with the entropy of the full dataset (target = "Play Tennis")
   - For each attribute (Outlook, Temperature, Humidity, Wind), compute the information gain
   - Choose the attribute with the maximum gain as the root
   - Repeat for each branch (subset) until every subset is classified
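As a cross-check for the paper solution, the same gains can be computed with pandas. This sketch reuses the entropy helper above and assumes df holds the raw, unencoded PlayTennis table (loaded as in the code below, before label encoding):

def information_gain(df, attribute, target='Play Tennis'):
    # Gain = Entropy(parent) - weighted sum of subset entropies
    parent = entropy(df[target].value_counts().tolist())
    weighted = sum(
        (len(sub) / len(df)) * entropy(sub[target].value_counts().tolist())
        for _, sub in df.groupby(attribute)
    )
    return parent - weighted

for col in ['Outlook', 'Temperature', 'Humidity', 'Wind']:
    print(col, round(information_gain(df, col), 3))
# Outlook has the highest gain (~0.247, vs Humidity ~0.151, Wind ~0.048,
# Temperature ~0.029), so Outlook becomes the root.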
3. Python Implementation
import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.tree import DecisionTreeClassifier, plot_tree
from sklearn.metrics import classification_report
import matplotlib.pyplot as plt
# Load the dataset
df = pd.read_csv("/content/PlayTennis.csv")
print("Dataset:\n", df)
# Encode categorical features
label_encoders = {}
for col in df.columns:
    if df[col].dtype == 'object':
        le = LabelEncoder()
        df[col] = le.fit_transform(df[col])
        label_encoders[col] = le
# Features and target
X = df.drop('Play Tennis', axis=1)
y = df['Play Tennis']
Dataset:
Outlook Temperature Humidity Wind Play Tennis
0 Sunny Hot High Weak No
1 Sunny Hot High Strong No
2 Overcast Hot High Weak Yes
3 Rain Mild High Weak Yes
4 Rain Cool Normal Weak Yes
5 Rain Cool Normal Strong No
6 Overcast Cool Normal Strong Yes
7 Sunny Mild High Weak No
8 Sunny Cool Normal Weak Yes
9 Rain Mild Normal Weak Yes
10 Sunny Mild Normal Strong Yes
11 Overcast Mild High Strong Yes
12 Overcast Hot Normal Weak Yes
13 Rain Mild High Strong No
# Train the Decision Tree using ID3 (entropy)
clf = DecisionTreeClassifier(criterion="entropy", random_state=0)
clf.fit(X, y)
# Visualize the decision tree
plt.figure(figsize=(12, 8))
plot_tree(clf, feature_names=X.columns, class_names=label_encoders['Play Tennis'].classes_)
plt.title("Decision Tree using ID3 (Entropy)")
plt.show()
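Besides the plot, sklearn can also print the same tree as plain text, which makes it easy to compare against the paper solution:

from sklearn.tree import export_text
print(export_text(clf, feature_names=list(X.columns)))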
# Predict on the training data (or a query below)
y_pred = clf.predict(X)
print("\nClassification Report:\n", classification_report(y, y_pred, target_names=label_e
Classification Report:
precision recall f1-score support
No 1.00 1.00 1.00 5
Yes 1.00 1.00 1.00 9
accuracy 1.00 14
macro avg 1.00 1.00 1.00 14
weighted avg 1.00 1.00 1.00 14
query = {
'Outlook': 'Sunny',
'Temperature': 'Cool',
'Humidity': 'High',
'Wind': 'Strong'
}
# Encode the input, keeping the training column order and names
query_encoded = [label_encoders[col].transform([query[col]])[0] for col in X.columns]
query_df = pd.DataFrame([query_encoded], columns=X.columns)
prediction = clf.predict(query_df)
predicted_label = label_encoders['Play Tennis'].inverse_transform(prediction)
print("Prediction for query:", predicted_label[0])
Prediction for query: No
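The report above is computed on the training data, so the perfect scores only show that the tree memorised the 14 rows. One optional way to validate further is k-fold cross-validation; with such a small dataset the fold scores will be noisy:

from sklearn.model_selection import cross_val_score
scores = cross_val_score(clf, X, y, cv=5)  # 5 stratified folds over the 14 encoded rows
print("CV accuracy per fold:", scores)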