1. The probability that it is Friday and that a student is absent is 3%. Since there are 5 school days in a week, the probability that it is Friday is 20%. What is the probability that a student is absent given that today is Friday? Apply Bayes' rule in Python to get the result.
Program:
pAF = 0.03  # P(Absent and Friday) = 3%
pF = 0.2    # P(Friday) = 20%
print("The probability that it is Friday and that a student is absent:", pAF)
print("The probability that it is Friday:", pF)
# Bayes' rule: P(Absent | Friday) = P(Absent and Friday) / P(Friday)
pResult = pAF / pF
print("The probability that the student is absent given that it is Friday:", pResult)
Output:
The probability that it is Friday and that a student is absent: 0.03
The probability that it is Friday: 0.2
The probability that the student is absent given that it is Friday: 0.15
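As a cross-check (not part of the original program), Python's fractions module can do the same computation in exact rational arithmetic, which avoids floating-point rounding entirely:
from fractions import Fraction
# P(Absent and Friday) = 3% and P(Friday) = 20%, as exact fractions
pAF = Fraction(3, 100)
pF = Fraction(1, 5)
print(pAF / pF)  # prints 3/20, i.e. 0.15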
2. Extract data from a database using Python.
Program:
import mysql.connector
# Connect to the database
conn = mysql.connector.connect(
    host="localhost",
    user="root",
    password="1234",
    database="mydatabase"
)
# Create a buffered cursor
cursor = conn.cursor(buffered=True)
# Replace 'users' with your actual table name
query = "SELECT * FROM users;"
# 1. Fetch all rows
cursor.execute(query)
all_rows = cursor.fetchall()
print("All rows:")
for row in all_rows:
    print(row)
# 2. Fetch a single row
cursor.execute(query)
single_row = cursor.fetchone()
print("\nSingle row:")
print(single_row)
# 3. Fetch many rows (limit to 2)
cursor.execute(query)
limited_rows = cursor.fetchmany(2)
print("\nFetch many (2 rows):")
for row in limited_rows:
    print(row)
# Close cursor and connection
cursor.close()
conn.close()
Output:
All rows:
(1, 'alice', '[email protected]', datetime.datetime(2025, 6, 3, 13, 49, 18))
(2, 'bob', '[email protected]', datetime.datetime(2025, 6, 3, 13, 49, 18))
(3, 'bob', '[email protected]', datetime.datetime(2025, 6, 3, 13, 49, 19))
(4, 'alice', '[email protected]', datetime.datetime(2025, 6, 3, 13, 49, 19))
(5, 'bob', '[email protected]', datetime.datetime(2025, 6, 3, 13, 49, 19))
(6, 'bob', '[email protected]', datetime.datetime(2025, 6, 3, 13, 49, 19))
(7, 'bob', '[email protected]', datetime.datetime(2025, 6, 3, 13, 49, 19))
(8, 'bob', '[email protected]', datetime.datetime(2025, 6, 3, 13, 49, 19))
(9, 'alice', '[email protected]', datetime.datetime(2025, 6, 3, 13, 49, 31))
(10, 'bob', '[email protected]', datetime.datetime(2025, 6, 3, 13, 49, 31))
Single row:
(1, 'alice', '[email protected]', datetime.datetime(2025, 6, 3, 13, 49, 18))
Fetch many (2 rows):
(1, 'alice', '[email protected]', datetime.datetime(2025, 6, 3, 13, 49, 18))
(2, 'bob', '[email protected]', datetime.datetime(2025, 6, 3, 13, 49, 18))
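The program above assumes a MySQL server on localhost with a database named mydatabase containing a populated users table. A minimal setup sketch follows; the schema is an assumption inferred from the shape of the sample output (an auto-increment id, a name, an email, and a creation timestamp), and the inserted values are hypothetical:
import mysql.connector
conn = mysql.connector.connect(host="localhost", user="root",
                               password="1234", database="mydatabase")
cursor = conn.cursor()
# Hypothetical schema matching the rows shown in the sample output
cursor.execute("""
CREATE TABLE IF NOT EXISTS users (
    id INT AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(50),
    email VARCHAR(100),
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
)
""")
# Hypothetical sample row; parameterized to avoid SQL injection
cursor.execute("INSERT INTO users (name, email) VALUES (%s, %s)",
               ("alice", "alice@example.com"))
conn.commit()
cursor.close()
conn.close()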
3. Implement K-Nearest Neighbours Classification using Python.
Program:
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_iris
# Load the iris dataset
irisData = load_iris()
X = irisData.data
y = irisData.target
# Split the data into training and test sets (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Initialize the KNN classifier with 7 neighbors
knn = KNeighborsClassifier(n_neighbors=7)
# Fit the model on the training data
knn.fit(X_train, y_train)
# Predict the labels for the test set
predictions = knn.predict(X_test)
# Print the predictions
print(predictions)
Output:
[1 0 2 1 1 0 1 2 1 1 2 0 0 0 0 1 2 1 1 2 0 2 0 2 2 2 2 2 0 0]
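To check how well the classifier performs and whether 7 is a reasonable neighbour count, a supplementary sketch (not part of the original program) using 5-fold cross-validation over a few values of k:
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
for k in (3, 5, 7, 9):
    # Mean accuracy across 5 folds for each candidate neighbour count
    scores = cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5)
    print("k = {}: mean CV accuracy = {:.3f}".format(k, scores.mean()))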
4. Given the following data, which specify classifications for nine combinations of VAR1 and VAR2, predict a classification for a case where VAR1 = 0.906 and VAR2 = 0.606, using the result of k-means clustering with 3 means (i.e., 3 centroids).
VAR1 VAR2 CLASS
1.713 1.586 0
0.180 1.786 1
0.353 1.240 1
0.940 1.566 0
1.486 0.759 1
1.266 1.106 0
1.540 0.419 1
0.459 1.799 1
0.773 0.186 1
Program:
from sklearn.cluster import KMeans
import numpy as np
# Define the dataset
X = np.array([
[1.713, 1.586],
[0.180, 1.786],
[0.353, 1.240],
[0.940, 1.566],
[1.486, 0.759],
[1.266, 1.106],
[1.540, 0.419],
[0.459, 1.799],
[0.773, 0.186]
])
# Optional: true labels, if needed for comparison (not used in clustering)
y = np.array([0, 1, 1, 0, 1, 0, 1, 1, 1])
# Create and fit the KMeans model (n_init pinned for consistent results across scikit-learn versions)
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
kmeans.fit(X)
# Predict the cluster for a new point
new_point = np.array([[0.906, 0.606]])
predicted_cluster = kmeans.predict(new_point)
# Output result
print(f"The point {new_point[0]} belongs to cluster: {predicted_cluster[0]}")
Output:
The point [0.906 0.606] belongs to cluster: 0
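Because k-means is unsupervised, the cluster index it returns (0, 1 or 2) is arbitrary and is not itself the CLASS value the question asks for. To turn the cluster index into a class prediction, each cluster can be mapped to the majority CLASS label among the training points assigned to it. A minimal sketch, appended to the program above (this mapping rule is an assumption, not part of the original program):
from collections import Counter
import numpy as np

cluster_labels = kmeans.labels_  # cluster index assigned to each training point
# Map each cluster to the most common CLASS label among its members
cluster_to_class = {c: Counter(y[cluster_labels == c]).most_common(1)[0][0]
                    for c in np.unique(cluster_labels)}
print("Predicted CLASS:", cluster_to_class[predicted_cluster[0]])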
5. The following training examples map descriptions of individuals onto high, medium and low credit-worthiness. Find the unconditional probability of 'golf' and the conditional probability of 'single' given 'medRisk' in the dataset.
Program:
# Unconditional probability of golf
totalRecords = 18
numberGolfRecreation = 4
probGolf = numberGolfRecreation / totalRecords
print("Unconditional probability of golf: {:.2f}".format(probGolf))
# Conditional probability of single given medRisk
numberMedRisk = 3
numberMedRiskSingle = 2
conditionalProbability = numberMedRiskSingle / numberMedRisk
print("Conditional probability of single given medRisk: {:.2f}".format(conditionalProbability))
Output:
Unconditional probability of golf: 0.22
Conditional probability of single given medRisk: 0.67
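Both results follow directly from the definitions: P(golf) = numberGolfRecreation / totalRecords = 4/18 ≈ 0.22, and P(single | medRisk) = numberMedRiskSingle / numberMedRisk = 2/3 ≈ 0.67.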
6. Implement linear regression using Python.
Program:
import matplotlib.pyplot as plt
import numpy as np
# Function to estimate coefficients
def estimate_coef(x, y):
    # number of observations/points
    n = np.size(x)
    # mean of x and y vectors
    m_x, m_y = np.mean(x), np.mean(y)
    # calculating cross-deviation and deviation about x
    SS_xy = np.sum(y * x) - n * m_y * m_x
    SS_xx = np.sum(x * x) - n * m_x * m_x
    # calculating regression coefficients
    b1 = SS_xy / SS_xx
    b0 = m_y - b1 * m_x
    return (b0, b1)
# Function to plot regression line
def plot_regression_line(x, y, b):
    # plotting the actual points as a scatter plot
    plt.scatter(x, y, color="m", marker="o", s=30)
    # predicted response vector
    y_pred = b[0] + b[1] * x
    # plotting the regression line
    plt.plot(x, y_pred, color="g")
    # putting labels
    plt.xlabel('x')
    plt.ylabel('y')
    # show plot
    plt.title('Linear Regression')
    plt.grid(True)
    plt.show()
# Sample data
x = np.array([0, 1, 2, 3, 4, 5])
y = np.array([2, 3, 5, 7, 11, 13])
# Estimating coefficients
b = estimate_coef(x, y)
print("Estimated coefficients: b0 = {:.2f}, b1 = {:.2f}".format(b[0], b[1]))
# Plotting regression line
plot_regression_line(x, y, b)
Output:
Estimated coefficients: b0 = 1.05, b1 = 2.31
(A scatter plot of the six sample points with the fitted regression line is displayed.)
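As a cross-check (not part of the original program), NumPy's polyfit fits the same least-squares line and should recover the same coefficients:
import numpy as np
x = np.array([0, 1, 2, 3, 4, 5])
y = np.array([2, 3, 5, 7, 11, 13])
b1, b0 = np.polyfit(x, y, 1)  # degree-1 fit returns [slope, intercept]
print("b0 = {:.2f}, b1 = {:.2f}".format(b0, b1))  # b0 = 1.05, b1 = 2.31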
7. Implement a Naïve Bayes classifier to classify English text.
Program:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score, confusion_matrix, precision_score, recall_score
# Load dataset
msg = pd.read_csv('new_document.csv', names=['message', 'label'])
print("Total Instances of Dataset:", msg.shape[0])
# Convert labels to numeric
msg['labelnum'] = msg.label.map({'pos': 1, 'neg': 0})
# Split data
X = msg.message
y = msg.labelnum
xtrain, xtest, ytrain, ytest = train_test_split(X, y)
# Vectorization
count_v = CountVectorizer()
train_dm = count_v.fit_transform(xtrain)
test_dm = count_v.transform(xtest)
# Train the classifier
clf = MultinomialNB()
clf.fit(train_dm, ytrain)
# Predict
pred = clf.predict(test_dm)
# Evaluation
print("Accuracy Metrics:")
print("Accuracy:", accuracy_score(ytest, pred))
print("Recall:", recall_score(ytest, pred))
print("Precision:", precision_score(ytest, pred))
print("Confusion Matrix:", confusion_matrix(ytest, pred))
Output:
Total Instances of Dataset: 18
Accuracy Metrics:
Accuracy: 0.6
Recall: 0.6666666666666666
Precision: 0.6666666666666666
Confusion Matrix:
[[1 1]
[1 2]]
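Once trained, the same vectorizer and classifier can label new sentences. A supplementary sketch continuing from the program above (the example sentences are hypothetical):
# 1 = pos, 0 = neg, per the label mapping above
sample = count_v.transform(["the food was great", "this was a terrible experience"])
print(clf.predict(sample))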
8. Implement a finite-words classification system using the back-propagation algorithm.
Program:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, precision_score, recall_score
# Load dataset
msg = pd.read_csv('document.csv', names=['message', 'label'])
print("Total Instances of Dataset:", msg.shape[0])
# Convert labels to numerical format
msg['labelnum'] = msg.label.map({'pos': 1, 'neg': 0})
# Split into features and target
X = msg.message
y = msg.labelnum
# Train-test split
Xtrain, Xtest, ytrain, ytest = train_test_split(X, y)
# Vectorize the text
count_v = CountVectorizer()
Xtrain_dm = count_v.fit_transform(Xtrain)
Xtest_dm = count_v.transform(Xtest)
# Convert to DataFrame (optional, for feature names)
df = pd.DataFrame(Xtrain_dm.toarray(), columns=count_v.get_feature_names_out())
# Define and train the model
clf = MLPClassifier(solver='lbfgs', alpha=1e-5, hidden_layer_sizes=(5, 2), random_state=1)
clf.fit(Xtrain_dm, ytrain)
# Predict
pred = clf.predict(Xtest_dm)
# Evaluation
print('Accuracy Metrics:')
print('Accuracy:', accuracy_score(ytest, pred))
print('Recall:', recall_score(ytest, pred))
print('Precision:', precision_score(ytest, pred))
print('Confusion Matrix:\n', confusion_matrix(ytest, pred))
Output:
Total Instances of Dataset: 18
Accuracy Metrics:
Accuracy: 0.8
Recall: 1.0
Precision: 0.75
Confusion Matrix:
[[1 1]
[0 3]]
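As a supplementary sketch (continuing from the program above), the trained network's structure can be inspected: n_layers_ counts the input, hidden and output layers, and coefs_ holds one weight matrix per connection.
print("Number of layers:", clf.n_layers_)  # 4: input + two hidden layers (5, 2) + output
print("Weight matrix shapes:", [w.shape for w in clf.coefs_])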