UGRD-CYBS6101-2333T: Midterm Exam
Started on Friday, 21 June 2024, 6:12 PM
State Finished
Completed on Friday, 21 June 2024, 6:42 PM
Time taken 29 mins 43 secs
Marks 39.00/50.00
Grade 78.00 out of 100.00
Question 1
Correct
Mark 1.00 out of 1.00
What is an example of a latent variable?
Select one:
a. A hidden or unobserved variable that affects the observed variables
b. The features of a model
c. The weights of a model
d. The output of a model
Question 2
Correct
Mark 1.00 out of 1.00
How is the Hebb rule used in the training of a neural network?
Select one:
a. It is used to determine the input to the neural network
b. It is used to calculate the output of the neural network
c. It is used to determine the structure of the neural network
d. It is used to adjust the weights of the neural network based on the input and output
Question 3
Incorrect
Mark 0.00 out of 1.00
How does the least squares method handle outliers in the data set?
Select one:
a. It ignores them
b. It gives them less weight
c. It gives them more weight
d. It removes them
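For reference, the effect this question asks about can be checked numerically. Because least squares minimizes the sum of squared residuals, a point far from the trend contributes disproportionately to the objective. The sketch below is illustrative only: the helper `fit_line` and the sample data are assumptions, not part of the exam.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Closed-form solution for simple linear regression.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
clean_ys = [0.0, 1.0, 2.0, 3.0, 4.0]     # lies exactly on y = x
outlier_ys = [0.0, 1.0, 2.0, 3.0, 14.0]  # last point moved far off the line

clean_slope, clean_intercept = fit_line(xs, clean_ys)
outlier_slope, outlier_intercept = fit_line(xs, outlier_ys)

print("clean fit:  ", clean_slope, clean_intercept)
print("outlier fit:", outlier_slope, outlier_intercept)
```

In this toy data, a single displaced point is enough to pull the fitted slope well away from the slope of the other four points.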
Question 4
Correct
Mark 1.00 out of 1.00
What is an example of a regression task in supervised learning?
Select one:
a. Predicting the price of a house based on its characteristics
b. Grouping customers into different segments based on their spending habits
c. Determining whether an email is spam or not
d. Predicting the stock price for the next day based on historical data
Question 5
Incorrect
Mark 0.00 out of 1.00
Can the least squares method be used for nonlinear data sets?
Select one:
a. Yes
b. It depends on the data set
c. It depends on the method used to transform the data set
d. No
Question 6
Correct
Mark 1.00 out of 1.00
What is the main goal of the k-means algorithm?
Select one:
a. To partition a dataset into a specified number of clusters
b. To classify data into predefined categories
c. To predict the value of a continuous target variable
d. To discover patterns or relationships within a dataset
Question 7
Incorrect
Mark 0.00 out of 1.00
What is an example of a batch learning algorithm used for dimensionality reduction tasks?
Select one:
a. Multidimensional scaling
b. t-SNE
c. Principal component analysis
d. All of the above
Question 8
Correct
Mark 1.00 out of 1.00
Can the Naive Bayes classifier handle missing or incomplete data?
Select one:
a. It can handle missing data but not incomplete data
b. It can handle incomplete data but not missing data
c. Yes, it can handle missing or incomplete data
d. No, it cannot handle missing or incomplete data
Question 9
Correct
Mark 1.00 out of 1.00
Can the least squares method be used for multiple linear regression?
Select one:
a. It depends on the data set
b. No
c. Yes
d. It depends on the method used to transform the data set
Question 10
Correct
Mark 1.00 out of 1.00
How is KNIME different from other data analysis tools?
Select one:
a. It has a user-friendly interface
b. It allows users to build custom data pipelines
c. It is free
d. It is open source
Question 11
Correct
Mark 1.00 out of 1.00
What is the process of transforming data into a consistent format called?
Select one:
a. Normalizing
b. Sampling
c. Filtering
d. Cleaning
Question 12
Correct
Mark 1.00 out of 1.00
Which of the following file types can be imported into KNIME?
Select one:
a. All of the above
b. CSV
c. XML
d. Excel
Question 13
Correct
Mark 1.00 out of 1.00
The ______________ linkage criterion is a popular choice for hierarchical clustering, which merges the two clusters based on the distance between their centroids.
Select one:
a. Average
b. Centroid
c. Single
d. Complete
Question 14
Correct
Mark 1.00 out of 1.00
What is the main goal of the EM algorithm?
Select one:
a. To maximize the prediction accuracy of the model
b. To maximize the likelihood of a model given the data
c. To minimize the error between the predicted and actual values of the data
d. To minimize the cost or loss function of a model
Question 15
Correct
Mark 1.00 out of 1.00
What is the Naive Bayes classifier used for?
Select one:
a. All of the above
b. To classify data into different categories based on certain features
c. To predict the value of a continuous variable
d. To predict the probability of an event occurring
Question 16
Incorrect
Mark 0.00 out of 1.00
The KL distance is often used in machine learning to evaluate the performance of a classification model. In this context, a low KL distance indicates that the model's predicted class probabilities are:
Select one:
a. Very similar to the true class probabilities
b. Somewhat different from the true class probabilities
c. Somewhat similar to the true class probabilities
d. Very different from the true class probabilities
Question 17
Correct
Mark 1.00 out of 1.00
Which of the following is NOT a common application of the k-means algorithm?
Select one:
a. Customer segmentation
b. Anomaly detection
c. Image compression
d. Regression analysis
Question 18
Correct
Mark 1.00 out of 1.00
Is the least squares method a deterministic or a probabilistic method?
Select one:
a. Both deterministic and probabilistic
b. Probabilistic
c. Neither deterministic nor probabilistic
d. Deterministic
Question 19
Correct
Mark 1.00 out of 1.00
Which of the following is NOT a node type in KNIME?
Select one:
a. Output node
b. Source node
c. Sink node
d. Processor node
Question 20
Incorrect
Mark 0.00 out of 1.00
What is a Bayesian network used for?
Select one:
a. To model and predict the behavior of systems
b. To optimize the use of resources
c. All of the above
d. To perform machine learning tasks
Question 21
Correct
Mark 1.00 out of 1.00
What are some disadvantages of batch learning algorithms?
Select one:
a. They require a small amount of data
b. They are prone to overfitting
c. They are slow to adapt to changes in the data
d. They require a large amount of resources
Question 22
Correct
Mark 1.00 out of 1.00
What is supervised learning used for?
Select one:
a. Regression tasks
b. Both classification and regression tasks
c. Classification tasks
d. Unsupervised learning tasks
Question 23
Correct
Mark 1.00 out of 1.00
What is the Hebb rule?
Select one:
a. A rule used to determine the structure of a neural network
b. A rule used to adjust the weights in a neural network
c. A rule used to determine the input to a neural network
d. A rule used to calculate the output of a neural network
Question 24
Correct
Mark 1.00 out of 1.00
What is a parent node in a Bayesian network?
Select one:
a. A node that is a direct descendant of another node in the network
b. A node that has no parents or children in the network
c. A node that is a direct ancestor of another node in the network
d. None of the above
Question 25
Correct
Mark 1.00 out of 1.00
The ______________ linkage criterion is a popular choice for hierarchical clustering, which merges the two clusters based on the mean distance between their points.
Select one:
a. Complete
b. Centroid
c. Single
d. Average
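The four linkage criteria named in the options differ only in how they turn point-to-point distances into a cluster-to-cluster distance. The following is a minimal sketch under assumed names (`single_linkage`, `complete_linkage`, `average_linkage`, `centroid_linkage`) and made-up 2-D points; it uses Euclidean distance via `math.dist`, which requires Python 3.8 or later.

```python
import math


def pairwise(a, b):
    """All Euclidean distances between a point in a and a point in b."""
    return [math.dist(p, q) for p in a for q in b]


def single_linkage(a, b):
    return min(pairwise(a, b))      # closest pair of points


def complete_linkage(a, b):
    return max(pairwise(a, b))      # farthest pair of points


def average_linkage(a, b):
    d = pairwise(a, b)
    return sum(d) / len(d)          # mean over all cross-cluster pairs


def centroid(points):
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))


def centroid_linkage(a, b):
    return math.dist(centroid(a), centroid(b))  # distance between cluster means


cluster_a = [(0.0, 0.0), (0.0, 2.0)]
cluster_b = [(3.0, 0.0), (3.0, 2.0)]

for name, fn in [("single", single_linkage), ("complete", complete_linkage),
                 ("average", average_linkage), ("centroid", centroid_linkage)]:
    print(name, fn(cluster_a, cluster_b))
```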
Question 26
Incorrect
Mark 0.00 out of 1.00
What is the learning rule for a perceptron called?
Select one:
a. The Backpropagation Algorithm
b. The Perceptron Learning Algorithm
c. The Delta Rule
d. The Hebbian Rule
Question 27
Correct
Mark 1.00 out of 1.00
The KL distance between two discrete probability distributions P and Q is defined as:
Select one:
a. The sum of the differences between the probabilities of each event in P and Q
b. The sum of the logarithm of the ratio of the probabilities of each event in P and Q
c. The sum of the products of the probabilities of each event in P and Q
d. The sum of the ratio of the probabilities of each event in P and Q
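For reference, the standard definition weights each log-ratio by the probability under P: D_KL(P || Q) = sum over i of p_i * log(p_i / q_i). The sketch below implements that formula for discrete distributions given as lists; the function name `kl_divergence` and the example distributions are assumptions chosen for illustration.

```python
import math


def kl_divergence(p, q):
    """D_KL(P || Q) = sum_i p_i * log(p_i / q_i), measured in nats."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i == 0.0:
            continue          # terms with p_i = 0 contribute 0 by convention
        if q_i == 0.0:
            return math.inf   # P puts mass where Q has none
        total += p_i * math.log(p_i / q_i)
    return total


p = [0.5, 0.5]
q = [0.9, 0.1]

print(kl_divergence(p, p))  # identical distributions
print(kl_divergence(p, q))
print(kl_divergence(q, p))  # not symmetric in general
```

Note that the two directions generally give different values, which is why the KL "distance" is not a true metric.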
Question 28
Incorrect
Mark 0.00 out of 1.00
What is a directed acyclic graph (DAG)?
Select one:
a. A graph in which the edges have a direction and there are no cycles
b. A graph in which the edges do not have a direction and there are no cycles
c. A graph in which the edges have a direction and there are cycles
d. A graph in which the edges do not have a direction and there are cycles
Question 29
Correct
Mark 1.00 out of 1.00
What is the process of evaluating the performance of a trained perceptron on unseen data called?
Select one:
a. Training
b. Testing
c. Pruning
d. Validation
Question 30
Correct
Mark 1.00 out of 1.00
What is the process of applying machine learning algorithms to data called?
Select one:
a. Data mining
b. Data modeling
c. Data visualization
d. Data analysis
Question 31
Correct
Mark 1.00 out of 1.00
What is the EM algorithm used to estimate in the "E" step?
Select one:
a. The model parameters
b. The likelihood of the model
c. The latent variables
d. The prediction accuracy of the model
Question 32
Correct
Mark 1.00 out of 1.00
Which of the following is NOT a feature of KNIME?
Select one:
a. Machine learning
b. Data storage
c. Data transformation
d. Flow-based programming
Question 33
Correct
Mark 1.00 out of 1.00
What is an example of a classification task in supervised learning?
Select one:
a. Predicting the stock price for the next day based on historical data
b. Grouping customers into different segments based on their spending habits
c. Determining whether an email is spam or not
d. Predicting the price of a house based on its characteristics
Question 34
Correct
Mark 1.00 out of 1.00
What is the Kullback-Leibler (KL) distance used for?
Select one:
a. To measure the uncertainty of a probability distribution
b. To measure the similarity between two probability distributions
c. To measure the dissimilarity between two probability distributions
d. To measure the predictability of a probability distribution
Question 35
Correct
Mark 1.00 out of 1.00
Which of the following is NOT a disadvantage of the k-means algorithm?
Select one:
a. It can handle categorical variables
b. It can be computationally expensive for large datasets
c. It is sensitive to the initial placement of centroids
d. It may produce suboptimal results if the clusters are not spherical
Question 36
Correct
Mark 1.00 out of 1.00
In hierarchical clustering, the distance between clusters is typically measured using the ______________ criterion.
Select one:
a. Linkage criterion
b. Cosine similarity
c. Euclidean distance
d. Manhattan distance
Question 37
Correct
Mark 1.00 out of 1.00
The KL distance is often used in machine learning and artificial intelligence to compare two probability distributions, such as a model's predicted distribution and the true distribution. In this context, the KL distance can be used as a:
Select one:
a. Loss function
b. Kernel function
c. Cost function
d. Activation function
Question 38
Incorrect
Mark 0.00 out of 1.00
How can the problem of producing suboptimal results if the clusters are not spherical be addressed in the k-means algorithm?
Select one:
a. By using a different clustering algorithm
b. By using a hierarchical clustering approach
c. By normalizing the data prior to clustering
d. By using the k-means++ initialization method
Question 39
Correct
Mark 1.00 out of 1.00
How can users access the KNIME Marketplace?
Select one:
a. From the KNIME interface
b. From the KNIME forum
c. From the KNIME website
d. All of the above
Question 40
Incorrect
Mark 0.00 out of 1.00
The KL distance is often used in natural language processing to compare the distribution of words in a document with the distribution of words in a reference corpus. In this context, a low KL distance indicates that the document is:
Select one:
a. Very different from the reference corpus
b. Somewhat similar to the reference corpus
c. Somewhat different from the reference corpus
d. Very similar to the reference corpus
Question 41
Correct
Mark 1.00 out of 1.00
How is the final set of clusters determined in the k-means algorithm?
Select one:
a. By selecting the set of clusters that maximize the sum of squared errors
b. By selecting the set of clusters that minimize the within-cluster variance
c. By selecting the set of clusters that maximize the within-cluster variance
d. By selecting the set of clusters that minimize the sum of squared errors
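The quantity k-means tries to reduce, the within-cluster sum of squared errors, can be traced in a small example. This is a minimal sketch of Lloyd's iteration on one-dimensional data, assuming a fixed iteration count instead of a convergence check; the names `kmeans_1d` and `within_cluster_sse`, the data, and the starting centroids are all illustrative.

```python
def kmeans_1d(points, centroids, iterations=10):
    """Alternate between assigning points and recomputing centroids."""
    centroids = list(centroids)
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for x in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: (x - centroids[i]) ** 2)
            clusters[nearest].append(x)
        # Update step: each centroid moves to the mean of its cluster.
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters


def within_cluster_sse(centroids, clusters):
    """Sum of squared distances from each point to its cluster centroid."""
    return sum((x - m) ** 2 for m, c in zip(centroids, clusters) for x in c)


points = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
final_centroids, final_clusters = kmeans_1d(points, centroids=[1.0, 2.0])

print(final_centroids, final_clusters)
print(within_cluster_sse(final_centroids, final_clusters))
```

Each iteration cannot increase this sum, but the result is only a local minimum, so a different starting point can lead to a different final clustering.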
Question 42
Correct
Mark 1.00 out of 1.00
How can the sensitivity to the initial placement of centroids be addressed in the k-means algorithm?
Select one:
a. By normalizing the data prior to clustering
b. By using a different clustering algorithm
c. By using the k-means++ initialization method
d. By using a hierarchical clustering approach
Question 43
Correct
Mark 1.00 out of 1.00
What is the process of selecting a subset of data for analysis called?
Select one:
a. Cleaning
b. Filtering
c. Normalizing
d. Sampling
Question 44
Correct
Mark 1.00 out of 1.00
What is the minimum required Java version to run KNIME?
Select one:
a. Java 8
b. Java 10
c. Java 7
d. Java 9
Question 45
Incorrect
Mark 0.00 out of 1.00
The KL distance can be used to measure the information lost when approximating one distribution with another. In this context, the distribution being approximated is known as the:
Select one:
a. Approximation distribution
b. Target distribution
c. Base distribution
d. Reference distribution
Question 46
Correct
Mark 1.00 out of 1.00
What is the "M" step in the EM algorithm?
Select one:
a. The step where the likelihood of the model is maximized
b. The step where the model parameters are updated
c. The step where the prediction accuracy of the model is calculated
d. The step where the expectation of the latent variables is calculated
Question 47
Correct
Mark 1.00 out of 1.00
What is the process of identifying and removing duplicate data called?
Select one:
a. De-duplication
b. Filtering
c. Cleaning
d. Sampling
Question 48
Correct
Mark 1.00 out of 1.00
What is the process of adjusting the weights of a perceptron based on the error calculated during validation called?
Select one:
a. Training
b. Pruning
c. Validation
d. Testing
Question 49
Incorrect
Mark 0.00 out of 1.00
How is the Hebb rule different from the delta rule?
Select one:
a. The Hebb rule uses the input and output to update the weights, while the delta rule uses the error between the output and target
b. The Hebb rule uses the error between the output and target to update the weights, while the delta rule uses the input and output
c. The Hebb rule uses the input and output to update the weights, while the delta rule uses the error between the output and target
d. The Hebb rule uses the output and target to update the weights, while the delta rule uses the input and output
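The two update rules compared in this question can be written side by side. The sketch below uses the textbook forms, Hebb: dw_i = lr * x_i * y, and delta: dw_i = lr * (t - y) * x_i, for a single unit; the function names, learning rate, and sample values are assumptions for illustration.

```python
def hebb_update(weights, inputs, output, lr=0.1):
    """Hebb rule: strengthen weights where input and output are co-active."""
    return [w + lr * x * output for w, x in zip(weights, inputs)]


def delta_update(weights, inputs, output, target, lr=0.1):
    """Delta rule: move weights in proportion to the error (target - output)."""
    error = target - output
    return [w + lr * error * x for w, x in zip(weights, inputs)]


weights = [0.0, 0.0]
inputs = [1.0, -1.0]

# The unit already produces the desired output, so the error is zero.
hebb_weights = hebb_update(weights, inputs, output=1.0)
delta_weights = delta_update(weights, inputs, output=1.0, target=1.0)

print("Hebb: ", hebb_weights)
print("Delta:", delta_weights)
```

When the output already matches the target, the delta rule leaves the weights unchanged, while the Hebb rule still updates them because it never looks at a target.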
Question 50
Correct
Mark 1.00 out of 1.00
What is the assumption made by the Naive Bayes classifier?
Select one:
a. That the features in the data are independent of each other
b. That the features in the data are dependent on each other
c. That the features in the data are uniformly distributed
d. That the features in the data are normally distributed