Slip 3 Q1. Write an R program to reverse a number and also calculate the sum of digits of that
number.
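n = as.integer(readline(prompt = "Enter a number: "))
rev = 0
s = 0
while (n > 0) {
  r = n %% 10          # last digit of n
  rev = rev * 10 + r   # append the digit to the reversed number
  s = s + r            # add the digit itself to the running sum
  n = n %/% 10         # drop the last digit
}
print(paste("Reverse number is :", rev))
print(paste("Sum of the digits is :", s))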
Q2. Consider the following observations/data. Apply simple linear regression and find
the estimated coefficients b0 and b1. (Use the numpy package.)
x = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 11, 13]
y = [1, 3, 2, 5, 7, 8, 8, 9, 10, 12, 16, 18]
import numpy as np
import matplotlib.pyplot as plt

def estimate_coef(x, y):
    # number of observations/points
    n = np.size(x)
    # mean of x and y vector
    m_x = np.mean(x)
    m_y = np.mean(y)
    # calculating cross-deviation and deviation about x
    SS_xy = np.sum(y*x) - n*m_y*m_x
    SS_xx = np.sum(x*x) - n*m_x*m_x
    # calculating regression coefficients
    b_1 = SS_xy / SS_xx
    b_0 = m_y - b_1*m_x
    return (b_0, b_1)

def plot_regression_line(x, y, b):
    # plotting the actual points as scatter plot
    plt.scatter(x, y, color = "m",
                marker = "o", s = 30)
    # predicted response vector
    y_pred = b[0] + b[1]*x
    # plotting the regression line
    plt.plot(x, y_pred, color = "g")
    # putting labels
    plt.xlabel('x')
    plt.ylabel('y')
    # function to show plot
    plt.show()

def main():
    # observations / data
    x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 11, 13])
    y = np.array([1, 3, 2, 5, 7, 8, 8, 9, 10, 12, 16, 18])
    # estimating coefficients
    b = estimate_coef(x, y)
    print("Estimated coefficients:\nb_0 = {}\nb_1 = {}".format(b[0], b[1]))
    # plotting regression line
    plot_regression_line(x, y, b)

if __name__ == "__main__":
    main()
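As a sanity check on the closed-form estimates, the same line can be fitted with NumPy's built-in least-squares polynomial fit. The sketch below recomputes b_0 and b_1 with the formulas from estimate_coef and compares them with np.polyfit at degree 1; it is only a cross-check, not part of the required answer.

```python
import numpy as np

# Same observations as in the question above.
x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 11, 13])
y = np.array([1, 3, 2, 5, 7, 8, 8, 9, 10, 12, 16, 18])

# Closed-form estimates, as in estimate_coef().
n = np.size(x)
m_x, m_y = np.mean(x), np.mean(y)
b_1 = (np.sum(y * x) - n * m_y * m_x) / (np.sum(x * x) - n * m_x * m_x)
b_0 = m_y - b_1 * m_x

# np.polyfit with deg=1 returns [slope, intercept] of the least-squares line.
slope, intercept = np.polyfit(x, y, deg=1)

print("closed form :", b_0, b_1)
print("np.polyfit  :", intercept, slope)
print("agree:", np.allclose([b_0, b_1], [intercept, slope]))
```

If the two pairs of numbers disagree, the usual culprit is a sign or ordering slip in SS_xy or SS_xx.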
Slip 5 Q1. Write an R program to concatenate two given factors.
f1 <- factor(sample(LETTERS, size=6, replace=TRUE))
f2 <- factor(sample(LETTERS, size=6, replace=TRUE))
print("Original factors:")
print(f1)
print(f2)
f = factor(c(levels(f1)[f1], levels(f2)[f2]))
print("After concatenate factor becomes:")
print(f)
Q2. Write a Python program to build a Decision Tree Classifier using the scikit-learn package for
the diabetes data set (download the database from
https://www.kaggle.com/uciml/pima-indians-diabetes-database)
import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn import metrics
pima = pd.read_csv("../input/diabetes.csv")
pima.head()
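The answer above stops after loading the file, so the classifier itself is never built. A minimal sketch of the remaining steps follows, assuming the label column is named Outcome as in the Kaggle CSV. The helper train_tree and the small demo frame are hypothetical and exist only so the sketch runs without the download.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn import metrics


def train_tree(df, target="Outcome", test_size=0.3, seed=1):
    """Fit a decision tree on every column except `target`; return (model, accuracy)."""
    X = df.drop(columns=[target])
    y = df[target]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=seed
    )
    clf = DecisionTreeClassifier(random_state=seed)
    clf.fit(X_train, y_train)
    y_pred = clf.predict(X_test)
    return clf, metrics.accuracy_score(y_test, y_pred)


# Hypothetical stand-in rows so the sketch runs without the Kaggle file;
# with the real data, pass the `pima` frame loaded above instead.
demo = pd.DataFrame({
    "Glucose": [85, 90, 95, 100, 105, 150, 160, 170, 180, 190],
    "BMI":     [22.0, 24.5, 23.1, 26.0, 25.2, 33.4, 35.0, 31.8, 36.2, 34.1],
    "Outcome": [0, 0, 0, 0, 0, 1, 1, 1, 1, 1],
})
model, acc = train_tree(demo)
print("Accuracy on the held-out rows:", acc)
```

With the real file the call would be train_tree(pima). The 70/30 split and the fixed seed are illustrative choices, not requirements of the question.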
Slip 6 Q1. Write an R program to create a data frame using two given vectors and display the duplicate
elements.
a = c(10,20,10,10,40,50,20,30)
b = c(10,30,10,20,0,50,30,30)
print("Original data frame:")
ab = data.frame(a,b)
print(ab)
print("Duplicate elements of the said data frame:")
print(duplicated(ab))
Q2. Write a Python program to implement the hierarchical agglomerative clustering algorithm.
(Download the Customer.csv dataset from github.com)
import pandas as pd
import matplotlib.pyplot as mtp
dataset = pd.read_csv('Mall_Customers.csv')
# columns 3 and 4: annual income and spending score
x = dataset.iloc[:, [3, 4]].values
import scipy.cluster.hierarchy as shc
dendro = shc.dendrogram(shc.linkage(x, method="ward"))
mtp.title("Dendrogram Plot")
mtp.ylabel("Euclidean Distances")
mtp.xlabel("Customers")
mtp.show()
from sklearn.cluster import AgglomerativeClustering
# ward linkage always uses Euclidean distance, so no affinity/metric argument is needed
hc = AgglomerativeClustering(n_clusters=5, linkage='ward')
y_pred = hc.fit_predict(x)
mtp.scatter(x[y_pred == 0, 0], x[y_pred == 0, 1], s = 100, c = 'blue', label = 'Cluster 1')
mtp.scatter(x[y_pred == 1, 0], x[y_pred == 1, 1], s = 100, c = 'green', label = 'Cluster 2')
mtp.scatter(x[y_pred== 2, 0], x[y_pred == 2, 1], s = 100, c = 'red', label = 'Cluster 3')
mtp.scatter(x[y_pred == 3, 0], x[y_pred == 3, 1], s = 100, c = 'cyan', label = 'Cluster 4')
mtp.scatter(x[y_pred == 4, 0], x[y_pred == 4, 1], s = 100, c = 'magenta', label = 'Cluster 5')
mtp.title('Clusters of customers')
mtp.xlabel('Annual Income (k$)')
mtp.ylabel('Spending Score (1-100)')
mtp.legend()
mtp.show()
Slip 7 Q1. Write an R program to create a sequence of numbers from 20 to 50 and find the mean of
numbers from 20 to 60 and the sum of numbers from 51 to 91.
print("Sequence of numbers from 20 to 50:")
print(seq(20,50))
print("Mean of numbers from 20 to 60:")
print(mean(20:60))
print("Sum of numbers from 51 to 91:")
print(sum(51:91))
Q2. Consider the following observations/data. Apply simple linear regression and find the
estimated coefficients b0 and b1. Also analyse the performance of the model.
(Use the sklearn package.)
x = np.array([1,2,3,4,5,6,7,8])
y = np.array([7,14,15,18,19,21,26,23])
import numpy as np
import matplotlib.pyplot as plt
x = np.array([1,2,3,4,5,6,7,8])
y = np.array([7,14,15,18,19,21,26,23])
n = np.size(x)
x_mean = np.mean(x)
y_mean = np.mean(y)
print(x_mean, y_mean)
# cross-deviation and deviation about x
Sxy = np.sum(x*y)- n*x_mean*y_mean
Sxx = np.sum(x*x)-n*x_mean*x_mean
# slope and intercept of the least-squares line
b1 = Sxy/Sxx
b0 = y_mean-b1*x_mean
print('slope b1 is', b1)
print('intercept b0 is', b0)
plt.scatter(x,y)
plt.xlabel('Independent variable X')
plt.ylabel('Dependent variable y')
plt.show()
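The question also asks for an analysis of model performance, which the code above does not reach. One common way to do this is to compute the coefficient of determination (R^2) and the root mean squared error of the fitted line. The sketch below does so with NumPy only. As far as I know, sklearn.metrics.r2_score reports the same R^2 quantity, but that call is not used here.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5, 6, 7, 8])
y = np.array([7, 14, 15, 18, 19, 21, 26, 23])

# Least-squares slope and intercept (same formulas as above).
n = np.size(x)
b1 = (np.sum(x * y) - n * np.mean(x) * np.mean(y)) / (np.sum(x * x) - n * np.mean(x) ** 2)
b0 = np.mean(y) - b1 * np.mean(x)

y_pred = b0 + b1 * x
residuals = y - y_pred

ss_res = np.sum(residuals ** 2)            # unexplained variation
ss_tot = np.sum((y - np.mean(y)) ** 2)     # total variation
r2 = 1 - ss_res / ss_tot                   # coefficient of determination
rmse = np.sqrt(np.mean(residuals ** 2))    # root mean squared error

print("R^2 :", r2)
print("RMSE:", rmse)
```

An R^2 close to 1 means the line explains most of the variation in y. RMSE is in the same units as y.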
Slip 8 Q1. Write an R program to get the first 10 Fibonacci numbers.
Fibonacci <- numeric(10)
Fibonacci[1] <- Fibonacci[2] <- 1
for (i in 3:10) Fibonacci[i] <- Fibonacci[i - 2] + Fibonacci[i - 1]
print("First 10 Fibonacci numbers:")
print(Fibonacci)
Q2. Write a Python program to implement the k-means algorithm to build a prediction model. (Use the
Credit Card Dataset CC GENERAL.csv; download from kaggle.com)
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
dataset = pd.read_csv('../input/CC GENERAL.csv')
X = dataset.iloc[:, 1:].values
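The answer above only loads the data, so the clustering step itself is missing. As a rough illustration of what k-means does, here is a plain NumPy sketch of Lloyd's algorithm (assign each row to its nearest centroid, then move each centroid to the mean of its rows). The functions kmeans and predict and the toy data are hypothetical; this is not scikit-learn's KMeans, and the result depends on the random initialisation.

```python
import numpy as np


def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm: returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    # Start from k distinct rows chosen at random.
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Assignment step: index of the nearest centroid for every row.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each centroid to the mean of its rows
        # (an empty cluster keeps its previous centroid).
        new_centroids = centroids.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:
                new_centroids[j] = members.mean(axis=0)
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, labels


def predict(centroids, points):
    """Assign new points to the nearest learned centroid."""
    dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)


# Hypothetical toy data: two blobs in 2-D.
rng = np.random.default_rng(42)
X_demo = np.vstack([rng.normal(0, 0.5, (20, 2)), rng.normal(5, 0.5, (20, 2))])
centroids, labels = kmeans(X_demo, k=2)
print(centroids)
print(predict(centroids, np.array([[0.2, -0.1], [4.8, 5.3]])))
```

For the credit card data, X would first need missing values filled and features scaled, since k-means relies on Euclidean distance.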
Slip 9 Q1. Write an R program to create a data frame which contains details of 5 employees and display
a summary of the data.
Employees = data.frame(Name=c("Anastasia S","Dima R","Katherine S","JAMES A","LAURA MARTIN"),
Gender=c("M","M","F","F","M"),
Age=c(23,22,25,26,32),
Designation=c("Clerk","Manager","Executive","CEO","ASSISTANT"),
SSN=c("123-34-2346","123-44-779","556-24-433","123-98-987","679-77-576"))
print("Summary of the data:")
print(summary(Employees))
Q2. Write a Python program to build an SVM model for the Cancer dataset. The dataset is
available in the scikit-learn library. Check the accuracy of the model with precision and
recall.
#Import scikit-learn dataset library
from sklearn import datasets
#Load dataset
cancer = datasets.load_breast_cancer()
# print the names of the 30 features
print("Features: ", cancer.feature_names)
# print the label type of cancer('malignant' 'benign')
print("Labels: ", cancer.target_names)
# print data (feature) shape
print(cancer.data.shape)
# print the cancer data features (top 5 records)
print(cancer.data[0:5])
# print the cancer labels (0:malignant, 1:benign)
print(cancer.target)
# Import train_test_split function
from sklearn.model_selection import train_test_split
# Split dataset into training set and test set
X_train, X_test, y_train, y_test = train_test_split(cancer.data, cancer.target,
test_size=0.3,random_state=109) # 70% training and 30% test
#Import svm model
from sklearn import svm
#Create a svm Classifier
clf = svm.SVC(kernel='linear') # Linear Kernel
#Train the model using the training sets
clf.fit(X_train, y_train)
#Predict the response for test dataset
y_pred = clf.predict(X_test)
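#Import scikit-learn metrics module for accuracy, precision and recall
from sklearn import metrics
# Model Accuracy: how often is the classifier correct?
print("Accuracy:",metrics.accuracy_score(y_test, y_pred))
# Model Precision: what fraction of predicted positives are truly positive?
print("Precision:",metrics.precision_score(y_test, y_pred))
# Model Recall: what fraction of actual positives does the model find?
print("Recall:",metrics.recall_score(y_test, y_pred))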
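Precision and recall are simple ratios of prediction counts, which is easier to see when they are computed by hand. The sketch below uses a hypothetical helper, precision_recall, on made-up labels: precision is TP / (TP + FP) and recall is TP / (TP + FN) for the chosen positive class.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Precision and recall for one class, computed from raw counts."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical labels, only to exercise the function.
y_true_demo = [1, 1, 1, 0, 0, 1, 0, 1]
y_pred_demo = [1, 0, 1, 0, 1, 1, 0, 1]
print(precision_recall(y_true_demo, y_pred_demo))
```

In the cancer dataset label 1 is benign, so with positive=1 these figures describe how well benign cases are identified. Pass positive=0 to look at malignant cases instead.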
Slip 10 Q1. Write an R program to find the maximum and the minimum value of a given
vector. [10 Marks]
nums = c(10, 20, 30, 40, 50, 60)
print('Original vector:')
print(nums)
print(paste("Maximum value of the said vector:",max(nums)))
print(paste("Minimum value of the said vector:",min(nums)))
Q2. Write a Python program to read the dataset ("Iris.csv"), downloaded from
(https://archive.ics.uci.edu/ml/datasets/iris), and apply the Apriori algorithm.
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from apyori import apriori

store_data = pd.read_csv('iris.csv', header=None)
store_data.head()

# one record (list of strings) per row of the dataset
n_rows, n_cols = store_data.shape
records = []
for i in range(0, n_rows):
    records.append([str(store_data.values[i, j]) for j in range(0, n_cols)])

association_rules = apriori(records, min_support=0.0045, min_confidence=0.2, min_lift=3, min_length=2)
association_results = list(association_rules)

print(len(association_results))
print(association_results[0])

for item in association_results:
    pair = item[0]
    items = [x for x in pair]
    print("Rule:" + items[0] + "->" + items[1])
    print("Support:" + str(item[1]))
    print("Confidence:" + str(item[2][0][2]))
    print("Lift:" + str(item[2][0][3]))
    print("========================================")
Slip 11 Q1. Write an R program to find all elements of a given list that are not in another given list.
l1 = list("x", "y", "z")
l2 = list("X", "Y", "Z", "x", "y", "z")
l1 = list("x", "y", "z")
l2 = list("X", "Y", "Z", "x", "y", "z")
print("Original lists:")
print(l1)
print(l2)
print("All elements of l2 that are not in l1:")
setdiff(l2, l1)
Q2. Write a Python program to implement the hierarchical clustering algorithm. (Download the
Wholesale customers data dataset from github.com.)
import numpy as nm
import matplotlib.pyplot as mtp
import pandas as pd
dataset = pd.read_csv('Wholesale customers data.csv')
dataset
x = dataset.iloc[:, [3, 4]].values
print(x)
import scipy.cluster.hierarchy as shc
dendro = shc.dendrogram(shc.linkage(x, method="ward"))
mtp.title("Dendrogram Plot")
mtp.ylabel("Euclidean Distances")
mtp.xlabel("Customers")
mtp.show()
from sklearn.cluster import AgglomerativeClustering
# ward linkage always uses Euclidean distance, so no affinity/metric argument is needed
hc = AgglomerativeClustering(n_clusters=5, linkage='ward')
y_pred= hc.fit_predict(x)
mtp.scatter(x[y_pred == 0, 0], x[y_pred == 0, 1], s = 100, c = 'blue', label = 'Cluster 1')
mtp.scatter(x[y_pred == 1, 0], x[y_pred == 1, 1], s = 100, c = 'green', label = 'Cluster 2')
mtp.scatter(x[y_pred== 2, 0], x[y_pred == 2, 1], s = 100, c = 'red', label = 'Cluster 3')
mtp.scatter(x[y_pred == 3, 0], x[y_pred == 3, 1], s = 100, c = 'cyan', label = 'Cluster 4')
mtp.scatter(x[y_pred == 4, 0], x[y_pred == 4, 1], s = 100, c = 'magenta', label = 'Cluster 5')
mtp.title('Clusters of customers')
mtp.xlabel('Milk')
mtp.ylabel('Grocery')
mtp.legend()
mtp.show()
Slip 12 Q1. Write an R program to create a data frame which contains details of 5 employees and
display the details.
Employee contains (empno, empname, gender, age, designation)
Employees = data.frame(Name=c("Anastasia S","Dima R","Katherine S","JAMES A","LAURA MARTIN"),
Gender=c("M","M","F","F","M"),
Age=c(23,22,25,26,32),
Designation=c("Clerk","Manager","Executive","CEO","ASSISTANT"),
SSN=c("123-34-2346","123-44-779","556-24-433","123-98-987","679-77-576"))
print("Details of the employees:")
print(Employees)
Q2. Write a Python program to implement a multiple linear regression model for a car dataset.
The dataset can be downloaded from:
https://www.w3schools.com/python/python_ml_multiple_regression.asp
import pandas
from sklearn import linear_model
# raw string so the backslash in the Windows path is not read as an escape sequence
df = pandas.read_csv(r"d:dmdataset\carsm.csv")
X = df[['Weight', 'Volume']]
y = df['CO2']
regr = linear_model.LinearRegression()
regr.fit(X, y)
#predict the CO2 emission of a car where the weight is 2300kg, and the volume is 1300cm3:
predictedCO2 = regr.predict([[2300, 1300]])
print(predictedCO2)
Slip 13 Q2
# Notebook: Data Mining Assignment-3 SET-B-1.ipynb (SET-B)
# Import required libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
# Read the downloaded dataset
store_data=pd.read_csv('StudentsPerformance.csv',header=None)
# To display the shape of the dataset (using the shape attribute)
print(store_data.shape)
# To display the top rows of the dataset with their columns (using the head method)
print(store_data.head())
# To display a number of rows randomly (using the sample method)
print(store_data.sample(10))
# To display the number of columns and the names of the columns (using the columns attribute)
print(len(store_data.columns))
print(store_data.columns)
Slip 14 Q1. Write a script in R to create a list of employees (name) and perform the following:
a. Display names of employees in the list.
b. Add an employee at the end of the list.
c. Remove the third element of the list.
list_data <- list("Ram Sharma","Sham Varma","Raj Jadhav", "Ved Sharma")
# display the list
print(list_data)
# create a new employee
new_Emp <- "Kavya Anjali"
# add the new employee at the end
list_data <- append(list_data, new_Emp)
print(list_data)
# remove the third employee
list_data[3] <- NULL
print(list_data)
Q2. Write a Python program to apply the Apriori algorithm on the Groceries dataset. The dataset
can be downloaded from
(https://github.com/amankharwal/Websitedata/blob/master/Groceries_dataset.csv).
Also display support and confidence for each rule.
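No answer is recorded for this question. As a starting point, here is a small pure-Python sketch of the Apriori idea: grow frequent itemsets level by level, then derive rules with their support and confidence. The function apriori_rules, the thresholds, and the toy baskets are all hypothetical. With the real file, each basket would be the list of items bought together, built from the CSV with pandas.

```python
from itertools import combinations


def apriori_rules(transactions, min_support=0.4, min_confidence=0.6):
    """Level-wise Apriori; returns (antecedent, consequent, support, confidence) tuples."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    # Level 1: frequent single items.
    items = sorted({i for t in transactions for i in t})
    current = [frozenset([i]) for i in items if support(frozenset([i])) >= min_support]
    frequent = {}
    while current:
        for s in current:
            frequent[s] = support(s)
        # Join step: merge frequent k-itemsets into (k+1)-item candidates.
        size = len(current[0]) + 1
        candidates = {a | b for a in current for b in current if len(a | b) == size}
        # Prune step: every k-subset must be frequent and the candidate must meet min_support.
        current = [
            c for c in candidates
            if all(frozenset(sub) in frequent for sub in combinations(c, size - 1))
            and support(c) >= min_support
        ]

    rules = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, r)):
                conf = sup / frequent[antecedent]
                if conf >= min_confidence:
                    rules.append((set(antecedent), set(itemset - antecedent), sup, conf))
    return rules


# Hypothetical baskets, only to exercise the function.
baskets = [
    ["milk", "bread", "butter"],
    ["milk", "bread"],
    ["bread", "butter"],
    ["milk", "butter"],
    ["milk", "bread", "butter"],
]
for a, c, sup, conf in apriori_rules(baskets):
    print(f"Rule: {sorted(a)} -> {sorted(c)}  Support: {sup:.2f}  Confidence: {conf:.2f}")
```

Support is the fraction of baskets containing the whole itemset. Confidence is that support divided by the support of the antecedent alone.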
Slip 15 Q1. Write an R program to add, multiply and divide two vectors of integer type. (Vector
length should be minimum 4)
x = c(10, 20, 30,40)
y = c(20, 10, 50,40)
print("Original Vectors:")
print(x)
print(y)
print("After Adding Vectors:")
a=x+y
print(a)
print("After Multiplying Vectors:")
b=x*y
print(b)
print("After dividing Vectors:")
c=x/y
print(c)
Q2. Write a Python program to build a Decision Tree Classifier for shows.csv using pandas and
predict the class label for a show starring a 40 years old American comedian, with 10
years of experience, and a comedy ranking of 7. Create a csv file as shown in
https://www.w3schools.com/python/python_ml_decision_tree.asp
import pandas
from sklearn import tree
import pydotplus
from sklearn.tree import DecisionTreeClassifier
import matplotlib.pyplot as plt
import matplotlib.image as pltimg
df = pandas.read_csv("shows.csv")
print(df)
Slip 16 Q2. Write a Python program to build a Decision Tree Classifier using the scikit-learn package for
the diabetes data set (download the database from
https://www.kaggle.com/uciml/pima-indians-diabetes-database)
import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn import metrics
pima = pd.read_csv("../input/diabetes.csv")
pima.head()
Slip 17 Q1. Write an R program to get the first 20 Fibonacci numbers.
Fibonacci <- numeric(20)
Fibonacci[1] <- Fibonacci[2] <- 1
for (i in 3:20) Fibonacci[i] <- Fibonacci[i - 2] + Fibonacci[i - 1]
print("First 20 Fibonacci numbers:")
print(Fibonacci)
Q2. Write a Python program to implement a multiple linear regression model for the stock market
data frame as follows:
Stock_Market = {'Year': [2017,2017,2017,2017,2017,2017,2017,2017,2017,2017,2017,2017,
2016,2016,2016,2016,2016,2016,2016,2016,2016,2016,2016,2016],
'Month': [12,11,10,9,8,7,6,5,4,3,2,1,12,11,10,9,8,7,6,5,4,3,2,1],
'Interest_Rate': [2.75,2.5,2.5,2.5,2.5,2.5,2.5,2.25,2.25,2.25,2,2,
2,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75],
'Unemployment_Rate': [5.3,5.3,5.3,5.3,5.4,5.6,5.5,5.5,5.5,5.6,5.7,5.9,
6,5.9,5.8,6.1,6.2,6.1,6.1,6.1,5.9,6.2,6.2,6.1],
'Stock_Index_Price': [1464,1394,1357,1293,1256,1254,1234,1195,1159,1167,1130,1075,
1047,965,943,958,971,949,884,866,876,822,704,719] }
And draw a graph of stock market price versus interest rate.
import pandas as pd
data = {'year': [2017,2017,2017,2017,2017,2017,2017,2017,2017,2017,2017,2017,
                 2016,2016,2016,2016,2016,2016,2016,2016,2016,2016,2016,2016],
        'month': [12,11,10,9,8,7,6,5,4,3,2,1,12,11,10,9,8,7,6,5,4,3,2,1],
        'interest_rate': [2.75,2.5,2.5,2.5,2.5,2.5,2.5,2.25,2.25,2.25,2,2,
                          2,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75],
        'unemployment_rate': [5.3,5.3,5.3,5.3,5.4,5.6,5.5,5.5,5.5,5.6,5.7,5.9,
                              6,5.9,5.8,6.1,6.2,6.1,6.1,6.1,5.9,6.2,6.2,6.1],
        'index_price': [1464,1394,1357,1293,1256,1254,1234,1195,1159,1167,1130,1075,
                        1047,965,943,958,971,949,884,866,876,822,704,719]}
df = pd.DataFrame(data)
print(df)
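The answer above builds the data frame but never fits the regression. One way to finish it, sketched here with NumPy's least-squares solver rather than scikit-learn, is to regress the index price on interest rate and unemployment rate. Choosing those two columns as predictors is an assumption, since the question does not name them.

```python
import numpy as np

interest_rate = np.array([2.75, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.25, 2.25, 2.25, 2, 2,
                          2, 1.75, 1.75, 1.75, 1.75, 1.75, 1.75, 1.75, 1.75, 1.75, 1.75, 1.75])
unemployment_rate = np.array([5.3, 5.3, 5.3, 5.3, 5.4, 5.6, 5.5, 5.5, 5.5, 5.6, 5.7, 5.9,
                              6, 5.9, 5.8, 6.1, 6.2, 6.1, 6.1, 6.1, 5.9, 6.2, 6.2, 6.1])
index_price = np.array([1464, 1394, 1357, 1293, 1256, 1254, 1234, 1195, 1159, 1167, 1130, 1075,
                        1047, 965, 943, 958, 971, 949, 884, 866, 876, 822, 704, 719], dtype=float)

# Design matrix: a column of ones (intercept) plus the two predictors.
A = np.column_stack([np.ones_like(interest_rate), interest_rate, unemployment_rate])
coef, *_ = np.linalg.lstsq(A, index_price, rcond=None)
intercept, b_interest, b_unemployment = coef

print("Intercept:", intercept)
print("Coefficients:", b_interest, b_unemployment)

# Predict the index price for a hypothetical month (values chosen only for illustration).
new_interest, new_unemployment = 2.75, 5.3
print("Predicted price:",
      intercept + b_interest * new_interest + b_unemployment * new_unemployment)
```

For the requested graph, a scatter plot of df['interest_rate'] against df['index_price'] with matplotlib's plt.scatter would show the relationship between the two.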
Slip 18 Q1. Write an R program to find the maximum and the minimum value of a given vector.
nums = c(10, 20, 30, 40, 50, 60)
print('Original vector:')
print(nums)
print(paste("Maximum value of the said vector:",max(nums)))
print(paste("Minimum value of the said vector:",min(nums)))
Q2. Consider the following observations/data. Apply simple linear regression and find the
estimated coefficients b0 and b1. Also analyse the performance of the model.
(Use the sklearn package.)
x = np.array([1,2,3,4,5,6,7,8])
y = np.array([7,14,15,18,19,21,26,23])
import numpy as np
import matplotlib.pyplot as plt
x = np.array([1,2,3,4,5,6,7,8])
y = np.array([7,14,15,18,19,21,26,23])
n = np.size(x)
x_mean = np.mean(x)
y_mean = np.mean(y)
print(x_mean, y_mean)
# cross-deviation and deviation about x
Sxy = np.sum(x*y)- n*x_mean*y_mean
Sxx = np.sum(x*x)-n*x_mean*x_mean
# slope and intercept of the least-squares line
b1 = Sxy/Sxx
b0 = y_mean-b1*x_mean
print('slope b1 is', b1)
print('intercept b0 is', b0)
plt.scatter(x,y)
plt.xlabel('Independent variable X')
plt.ylabel('Dependent variable y')
plt.show()