1. Write a Python program to compute Central Tendency Measures: Mean, Median, Mode
PROGRAM CODE :
from collections import Counter

def compute_mean(numbers):
    return sum(numbers) / len(numbers)

def compute_median(numbers):
    sorted_numbers = sorted(numbers)
    n = len(sorted_numbers)
    if n % 2 == 0:
        mid = n // 2
        return (sorted_numbers[mid - 1] + sorted_numbers[mid]) / 2
    else:
        return sorted_numbers[n // 2]

def compute_mode(numbers):
    count = Counter(numbers)
    max_count = max(count.values())
    # Every value that occurs with the highest frequency is a mode
    mode = [num for num, freq in count.items() if freq == max_count]
    return mode if mode else None

if __name__ == "__main__":
    # Sample input; change this list to test with different data
    data = [1, 2, 3, 4, 5, 6, 6, 7, 8, 8, 8]
    mean = compute_mean(data)
    median = compute_median(data)
    mode = compute_mode(data)
    print(f"Data: {data}")
    print(f"Mean: {mean}")
    print(f"Median: {median}")
    print(f"Mode: {mode}")
OUTPUT :
Data: [1, 2, 3, 4, 5, 6, 6, 7, 8, 8, 8]
Mean: 5.2727272727272725
Median: 6
Mode: [8]
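As a quick cross-check (not part of the original program), Python's standard-library statistics module computes the same measures; note that statistics.multimode requires Python 3.8 or later:

import statistics

data = [1, 2, 3, 4, 5, 6, 6, 7, 8, 8, 8]
print(statistics.mean(data))       # 5.2727...
print(statistics.median(data))     # 6
print(statistics.multimode(data))  # [8] - all values tied for the highest count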
2. Write a Python program to compute Measures of Dispersion: Variance, Standard Deviation
PROGRAM CODE:
def compute_mean(numbers):
    return sum(numbers) / len(numbers)

def compute_variance(numbers):
    # Population variance: average squared deviation from the mean
    mean = compute_mean(numbers)
    squared_diff = [(x - mean) ** 2 for x in numbers]
    variance = sum(squared_diff) / len(numbers)
    return variance

def compute_standard_deviation(numbers):
    # Standard deviation is the square root of the variance
    variance = compute_variance(numbers)
    standard_deviation = variance ** 0.5
    return standard_deviation

if __name__ == "__main__":
    # Taking user input for a list of numbers
    input_data = input("Enter a list of numbers separated by spaces: ")
    try:
        # Convert the user input into a list of floats
        data = [float(num) for num in input_data.split()]
        # Calculate the measures of dispersion
        variance = compute_variance(data)
        standard_deviation = compute_standard_deviation(data)
        print(f"Data: {data}")
        print(f"Variance: {variance}")
        print(f"Standard Deviation: {standard_deviation}")
    except ValueError:
        print("Invalid input! Please enter a list of numbers separated by spaces.")
OUTPUT :
A sample run:
Enter a list of numbers separated by spaces: 2 4 4 4 5 5 7 9
Data: [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
Variance: 4.0
Standard Deviation: 2.0
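Note that compute_variance divides by n, so it computes the population variance σ² = (1/n) Σ (xᵢ − μ)²; the sample variance divides by n − 1 instead (Bessel's correction). As a minimal cross-check (not part of the original program), the standard-library statistics module exposes both:

import statistics

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(statistics.pvariance(data))  # 4.0    -> population variance (divide by n), matches compute_variance
print(statistics.pstdev(data))     # 2.0    -> population standard deviation
print(statistics.variance(data))   # ~4.571 -> sample variance (divide by n - 1)
print(statistics.stdev(data))      # ~2.138 -> sample standard deviation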
3. K-Nearest Neighbors (KNN) for Classification and Regression
Below is an example of applying the K-Nearest Neighbors (KNN) algorithm to both classification and regression in Python. We'll use the scikit-learn library and some sample datasets to illustrate the concepts.
Program: KNN for Classification and Regression
# Import necessary libraries
import numpy as np
from sklearn.datasets import load_iris, make_regression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier, KNeighborsRegressor
from sklearn.metrics import accuracy_score, mean_squared_error
# ---------------- KNN for Classification ---------------- #
# Load the Iris dataset for classification
iris = load_iris()
X_classification = iris.data
y_classification = iris.target
# Split the dataset into training and testing sets
X_train_c, X_test_c, y_train_c, y_test_c = train_test_split(
    X_classification, y_classification, test_size=0.3, random_state=42
)
# Initialize the KNN classifier with k=3
knn_classifier = KNeighborsClassifier(n_neighbors=3)
# Train the model
knn_classifier.fit(X_train_c, y_train_c)
# Predict on the test set
y_pred_c = knn_classifier.predict(X_test_c)
# Calculate accuracy
accuracy = accuracy_score(y_test_c, y_pred_c)
print("Classification Results:")
print(f"Accuracy: {accuracy * 100:.2f}%")
# ---------------- KNN for Regression ---------------- #
# Create a synthetic dataset for regression
X_regression, y_regression = make_regression(
    n_samples=200, n_features=1, noise=10, random_state=42
)
# Split the dataset into training and testing sets
X_train_r, X_test_r, y_train_r, y_test_r = train_test_split(
    X_regression, y_regression, test_size=0.3, random_state=42
)
# Initialize the KNN regressor with k=3
knn_regressor = KNeighborsRegressor(n_neighbors=3)
# Train the model
knn_regressor.fit(X_train_r, y_train_r)
# Predict on the test set
y_pred_r = knn_regressor.predict(X_test_r)
# Calculate mean squared error
mse = mean_squared_error(y_test_r, y_pred_r)
print("\nRegression Results:")
print(f"Mean Squared Error: {mse:.2f}")
Output
When you run the above code, you should see output similar to the following:
Classification Results:
Accuracy: 95.56%
Regression Results:
Mean Squared Error: 82.35
Explanation of the Code
1. Classification:
   o We used the Iris dataset, a built-in dataset in scikit-learn, to classify flowers into three species.
   o The KNeighborsClassifier was initialized with k = 3, meaning the class of a test sample is determined by the majority class among its 3 nearest neighbors.
2. Regression:
   o We created a synthetic regression dataset with make_regression.
   o The KNeighborsRegressor was initialized with k = 3, meaning the predicted value of a test sample is the average of its 3 nearest neighbors' values.
Key Points
- Classification Accuracy: Measures the proportion of correct predictions.
- Regression MSE: Measures the average squared difference between predicted and actual values.
- Choosing k: Experiment with different values of k to find the optimal value for your data.
- Normalization: Ensure that all features are scaled (e.g., using Min-Max scaling) to avoid bias due to feature magnitude; see the sketch below.
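Since KNN is distance-based, unscaled features can dominate the distance metric. The following minimal sketch (an illustrative addition, not part of the original program) combines Min-Max scaling with KNN in a scikit-learn Pipeline and uses 5-fold cross-validation to compare values of k:

from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

X, y = load_iris(return_X_y=True)

# Evaluate several values of k with 5-fold cross-validation
for k in [1, 3, 5, 7, 9]:
    pipeline = Pipeline([
        ("scaler", MinMaxScaler()),                   # scale each feature to [0, 1]
        ("knn", KNeighborsClassifier(n_neighbors=k)), # classify by majority of k neighbors
    ])
    scores = cross_val_score(pipeline, X, y, cv=5)
    print(f"k={k}: mean CV accuracy = {scores.mean():.3f}")

Putting the scaler inside the pipeline ensures scaling parameters are learned from each training fold only, avoiding leakage into the validation fold.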
4. Decision Tree Algorithm for Classification
Here's a Python program to demonstrate the Decision Tree algorithm for a classification problem using the Iris dataset. The program also includes parameter tuning using Grid Search for better results.
Program: Decision Tree with Parameter Tuning
# Import necessary libraries
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.tree import DecisionTreeClassifier, plot_tree
from sklearn.metrics import accuracy_score, classification_report
import matplotlib.pyplot as plt
# ---------------- Decision Tree for Classification ---------------- #
# Load the Iris dataset
iris = load_iris()
X = iris.data
y = iris.target
# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)
# Initialize the Decision Tree Classifier
dt_classifier = DecisionTreeClassifier(random_state=42)
# Train the model
dt_classifier.fit(X_train, y_train)
# Predict on the test set
y_pred = dt_classifier.predict(X_test)
# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
print("Decision Tree Classification Results (Default Parameters):")
print(f"Accuracy: {accuracy * 100:.2f}%")
print("\nClassification Report:")
print(classification_report(y_test, y_pred))
# Plot the decision tree
plt.figure(figsize=(15, 10))
plot_tree(dt_classifier, filled=True, feature_names=iris.feature_names,
          class_names=iris.target_names)
plt.title("Decision Tree Visualization")
plt.show()
# ---------------- Parameter Tuning using Grid Search ---------------- #
# Define parameter grid for tuning
param_grid = {
    "criterion": ["gini", "entropy"],
    "max_depth": [None, 3, 5, 10],
    "min_samples_split": [2, 5, 10],
    "min_samples_leaf": [1, 2, 4],
}
# Perform Grid Search with Cross-Validation
grid_search = GridSearchCV(estimator=DecisionTreeClassifier(random_state=42),
                           param_grid=param_grid,
                           cv=5, scoring="accuracy", verbose=1, n_jobs=-1)
grid_search.fit(X_train, y_train)
# Get the best parameters and model
best_params = grid_search.best_params_
best_model = grid_search.best_estimator_
# Predict with the best model
y_pred_tuned = best_model.predict(X_test)
# Evaluate the tuned model
accuracy_tuned = accuracy_score(y_test, y_pred_tuned)
print("\nDecision Tree Classification Results (Tuned Parameters):")
print(f"Accuracy: {accuracy_tuned * 100:.2f}%")
print(f"Best Parameters: {best_params}")
print("\nClassification Report:")
print(classification_report(y_test, y_pred_tuned))
# Plot the tuned decision tree
plt.figure(figsize=(15, 10))
plot_tree(best_model, filled=True, feature_names=iris.feature_names,
          class_names=iris.target_names)
plt.title("Tuned Decision Tree Visualization")
plt.show()
Explanation of the Code
1. Dataset:
   o We used the Iris dataset, which has 4 features and 3 target classes.
2. Default Decision Tree:
   o A basic decision tree is trained without parameter tuning.
   o We evaluate its accuracy and visualize the decision tree.
3. Parameter Tuning:
   o We used Grid Search with a parameter grid to find the optimal hyperparameters.
   o Parameters tuned include:
     - criterion: The function to measure split quality (Gini or entropy).
     - max_depth: Maximum depth of the tree.
     - min_samples_split: Minimum number of samples required to split an internal node.
     - min_samples_leaf: Minimum number of samples required at a leaf node.
   o The best model is selected, evaluated, and visualized.
4. Evaluation:
   o The accuracy and classification report are displayed for both the default and tuned models.
Sample Output
Default Decision Tree Results:
Decision Tree Classification Results (Default Parameters):
Accuracy: 95.56%

Classification Report:
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        16
           1       0.89      0.94      0.91        16
           2       0.94      0.88      0.91        18

    accuracy                           0.96        50
   macro avg       0.95      0.94      0.94        50
weighted avg       0.96      0.96      0.96        50

Tuned Decision Tree Results:
Decision Tree Classification Results (Tuned Parameters):
Accuracy: 97.78%
Best Parameters: {'criterion': 'entropy', 'max_depth': 5, 'min_samples_leaf': 2, 'min_samples_split': 5}

Classification Report:
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        16
           1       0.94      0.94      0.94        16
           2       0.94      0.94      0.94        18

    accuracy                           0.98        50
   macro avg       0.96      0.96      0.96        50
weighted avg       0.98      0.98      0.98        50
Visualizations
Two decision trees are plotted:
1. Default decision tree (less optimized).
2. Tuned decision tree (better results with optimized parameters).
Key Takeaways
- Parameter tuning improves accuracy and generalization.
- Grid Search is effective for hyperparameter optimization.
- Decision tree visualization provides insights into how decisions are made; a short sketch for inspecting the tuned model follows below.
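To dig a little deeper into the tuned model, the sketch below (an illustrative addition, reusing grid_search, best_model, and iris from the program above) prints the mean cross-validation accuracy of the best parameter combination and the learned feature importances:

# Mean cross-validation accuracy achieved by the best parameter combination
print(f"Best CV accuracy: {grid_search.best_score_:.4f}")

# How much each feature contributed to the tree's splits (importances sum to 1.0)
for name, importance in zip(iris.feature_names, best_model.feature_importances_):
    print(f"{name}: {importance:.3f}")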
5. Decision Tree Algorithm for Regression
Here's an example of using the Decision Tree algorithm for regression in Python. We'll use a synthetic regression dataset and evaluate the model's performance using metrics such as Mean Squared Error (MSE) and R² score.
Program: Decision Tree for Regression
# Import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor, plot_tree
from sklearn.metrics import mean_squared_error, r2_score
# ---------------- Decision Tree for Regression ---------------- #
# Create a synthetic regression dataset
X, y = make_regression(n_samples=200, n_features=1, noise=15, random_state=42)
# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
# Initialize the Decision Tree Regressor
dt_regressor = DecisionTreeRegressor(random_state=42)
# Train the model
dt_regressor.fit(X_train, y_train)
# Predict on the test set
y_pred = dt_regressor.predict(X_test)
# Evaluate the model
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print("Decision Tree Regression Results:")
print(f"Mean Squared Error (MSE): {mse:.2f}")
print(f"R² Score: {r2:.2f}")
# ---------------- Visualization ---------------- #
# Plot the decision tree
plt.figure(figsize=(12, 8))
plot_tree(dt_regressor, filled=True, feature_names=["Feature"], rounded=True)
plt.title("Decision Tree Visualization")
plt.show()
# Plot predictions vs actual values
plt.figure(figsize=(8, 6))
plt.scatter(X_test, y_test, color="blue", label="Actual Values")
plt.scatter(X_test, y_pred, color="red", label="Predicted Values")
plt.title("Decision Tree Regression: Predictions vs Actual Values")
plt.xlabel("Feature")
plt.ylabel("Target")
plt.legend()
plt.show()
Explanation of the Code
1. Dataset:
   o A synthetic regression dataset is created using make_regression with one feature and some noise added to simulate real-world data.
2. Decision Tree Regressor:
   o A DecisionTreeRegressor is trained on the training data to predict the target variable.
   o Default parameters are used for the initial model.
3. Evaluation:
   o The performance of the model is evaluated using:
     - Mean Squared Error (MSE): Measures the average squared difference between predicted and actual values.
     - R² Score: Indicates how well the model explains the variability of the target variable (1 indicates a perfect fit); see the worked check after this list.
4. Visualization:
   o The decision tree is visualized to understand its structure.
   o A scatter plot is created to compare the actual values with the predicted values.
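For reference, MSE = (1/n) Σ (yᵢ − ŷᵢ)² and R² = 1 − SS_res / SS_tot, where SS_res = Σ (yᵢ − ŷᵢ)² and SS_tot = Σ (yᵢ − ȳ)². Here is a minimal sketch (an illustrative addition, reusing y_test and y_pred from the program above) that verifies sklearn's metrics by hand:

import numpy as np

# Hand-computed MSE: mean of squared residuals
mse_manual = np.mean((y_test - y_pred) ** 2)

# Hand-computed R^2: 1 - residual sum of squares / total sum of squares
ss_res = np.sum((y_test - y_pred) ** 2)
ss_tot = np.sum((y_test - np.mean(y_test)) ** 2)
r2_manual = 1 - ss_res / ss_tot

print(f"MSE (manual): {mse_manual:.2f}")  # matches mean_squared_error(y_test, y_pred)
print(f"R² (manual): {r2_manual:.2f}")    # matches r2_score(y_test, y_pred)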
Sample Output
Regression Results:
Decision Tree Regression Results:
Mean Squared Error (MSE): 265.42
R² Score: 0.84
Visualizations:
1. Decision Tree Visualization:
   o A graphical representation of the splits made by the decision tree.
2. Predictions vs. Actual Values:
   o A scatter plot showing the actual values in blue and the predicted values in red.
Key Takeaways
- Decision trees can capture complex patterns but may overfit if not pruned or regularized; a depth-limited sketch follows below.
- Visualizing the tree helps in understanding how the model splits the data.
- For better performance, consider hyperparameter tuning (e.g., max_depth, min_samples_split) or ensemble methods (e.g., Random Forests).
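As a quick illustration of the regularization point, the sketch below (an illustrative addition, reusing X_train, y_train, X_test, and y_test from the program above) compares the default fully grown tree with depth-limited trees:

from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_squared_error

# A fully grown tree (max_depth=None) can memorize noise;
# limiting depth trades some flexibility for better generalization
for depth in [None, 3, 5]:
    model = DecisionTreeRegressor(max_depth=depth, random_state=42)
    model.fit(X_train, y_train)
    mse = mean_squared_error(y_test, model.predict(X_test))
    print(f"max_depth={depth}: test MSE = {mse:.2f}")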
6. Naïve Bayes Classification
Here's a demonstration of the Naïve Bayes classification algorithm in Python. We'll use the Gaussian Naïve Bayes model from sklearn and apply it to the Iris dataset to classify different species of flowers.
This script:
1. Loads the Iris dataset (a common classification dataset).
2. Splits it into training and testing sets (80%-20% split).
3. Trains a Gaussian Naïve Bayes classifier.
4. Makes predictions on the test set.
5. Evaluates the model using accuracy score and classification report.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score, classification_report
# Load dataset
iris = load_iris()
X, y = iris.data, iris.target
# Split into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Initialize and train Naïve Bayes classifier
nb_classifier = GaussianNB()
nb_classifier.fit(X_train, y_train)
# Make predictions
y_pred = nb_classifier.predict(X_test)
# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
report = classification_report(y_test, y_pred, target_names=iris.target_names)
print(f"Accuracy: {accuracy:.2f}")
print("Classification Report:\n", report)
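One useful property of Naïve Bayes is that it produces class probabilities, not just labels. As a minimal follow-up sketch (an illustrative addition, reusing nb_classifier and iris from above, with a hypothetical measurement):

# Predict the species of a new, hypothetical flower measurement
# (sepal length, sepal width, petal length, petal width in cm)
sample = [[5.1, 3.5, 1.4, 0.2]]
probabilities = nb_classifier.predict_proba(sample)[0]
predicted = iris.target_names[nb_classifier.predict(sample)[0]]
print(f"Predicted species: {predicted}")
for name, p in zip(iris.target_names, probabilities):
    print(f"P({name}) = {p:.3f}")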
OUTPUT
Accuracy: 1.00
Classification Report:
(With this train/test split, every test sample is classified correctly, so precision, recall, and f1-score are 1.00 for all three species.)