Machine learning MCQ - Set 20
1. Which of the following clustering algorithms requires the number of clusters to be pre-specified?
a) Hierarchical clustering
b) k-means clustering
c) DBSCAN
d) Markov clustering algorithm
Answer: (b) k-means clustering
We need to choose the number of clusters beforehand in the k-means clustering algorithm.
Hierarchical clustering starts with each data point as an individual cluster and repeatedly merges similar clusters, so no cluster count is needed up front.
DBSCAN (Density-Based Spatial Clustering of Applications with Noise) works based on the density of data points in a region.
The Markov Clustering Algorithm works based on the simulation of flow in graphs: using the weights of edges between vertices, it groups similar points.
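To see why k must be given up front, here is a minimal sketch of 1-D k-means in plain Python (the dataset and function name are illustrative, not from the original post): the number of clusters k is an input to the algorithm itself, because it fixes how many centroids are initialized and updated.

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Minimal 1-D k-means sketch: k (the number of clusters)
    must be supplied before the algorithm can even start."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # k initial centroids, chosen at random
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid's cluster.
        clusters = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k), key=lambda i: abs(p - centroids[i]))
            clusters[idx].append(p)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

points = [1.0, 1.2, 0.8, 9.8, 10.1, 10.3]
centroids, clusters = kmeans(points, k=2)
```

On this toy data the two centroids converge to the means of the two obvious groups, but only because we told the algorithm k=2 in advance.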
2. Identify the best method for finding the optimal number of clusters in the k-means algorithm.
a) Euclidean method
b) Manhattan method
c) Elbow method
d) Silhouette method
Answer: (c) Elbow method
The elbow method measures the compactness of the clustering using the total within-cluster sum of squares (WSS), which we want to be small.
Calculate the WSS for different values of k, and choose the k at which the WSS first stops decreasing sharply. In the plot of WSS versus k, this point is visible as an elbow.
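The WSS computation behind the elbow plot can be sketched as follows. This is an illustrative example with a made-up toy dataset: the clusterings for each k are hand-picked here, whereas a real elbow analysis would run k-means for each candidate k.

```python
def wss(clusters):
    """Within-cluster sum of squares: for each cluster, sum the squared
    distances from its points to the cluster mean, then total over clusters."""
    total = 0.0
    for c in clusters:
        mean = sum(c) / len(c)
        total += sum((p - mean) ** 2 for p in c)
    return total

data = [1.0, 1.2, 0.8, 9.8, 10.1, 10.3]
# Hand-picked clusterings for k = 1, 2, 3 on this toy set.
wss_k1 = wss([data])
wss_k2 = wss([[1.0, 1.2, 0.8], [9.8, 10.1, 10.3]])
wss_k3 = wss([[1.0, 1.2, 0.8], [9.8, 10.1], [10.3]])
# WSS drops sharply from k=1 to k=2, then only slightly from k=2 to k=3:
# the "elbow" at k=2 suggests two clusters.
```

The large drop followed by a near-flat tail is exactly the elbow shape the method looks for.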
3. We are dealing with samples (x, y) where x is a single value. We would like to test two alternative regression models:
1) y = ax + e
2) y = ax + bx² + e
Which of these regression models fits the training data better?
a) Model 1
b) Model 2
c) Both fit equally well
d) Not enough data
Answer: (b) model 2
Since model 2 has more parameters, it is likely to provide a better fit to the training data. In fact, model 1 is a special case of model 2 (set b = 0), so model 2's training error can never be higher than model 1's.
Additional information:
Increasing the number of parameters may lead to the overfitting problem; an overfit model is too complex for the data being analyzed. If we have too many features in the formula, the learned hypothesis will try very hard to find a decision boundary that fits the training data set well, but it fails to generalize and make accurate predictions on new, previously unseen examples.
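The claim can be checked numerically. Below is a sketch with made-up data (not from the original question) that fits both models by least squares in plain Python: model 1 through the closed-form y = ax solution, model 2 through its 2x2 normal equations.

```python
def fit_model1(xs, ys):
    """Least squares for model 1, y = a*x (no intercept): a = sum(x*y)/sum(x^2)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

def fit_model2(xs, ys):
    """Least squares for model 2, y = a*x + b*x^2, via the 2x2 normal equations."""
    s2 = sum(x ** 2 for x in xs)
    s3 = sum(x ** 3 for x in xs)
    s4 = sum(x ** 4 for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sx2y = sum(x ** 2 * y for x, y in zip(xs, ys))
    det = s2 * s4 - s3 * s3
    a = (sxy * s4 - s3 * sx2y) / det
    b = (s2 * sx2y - s3 * sxy) / det
    return a, b

def sse(xs, ys, predict):
    """Sum of squared training errors for a prediction function."""
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys))

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.6, 6.1, 10.4, 16.2]  # made-up data, roughly quadratic in x
a1 = fit_model1(xs, ys)
a2, b2 = fit_model2(xs, ys)
err1 = sse(xs, ys, lambda x: a1 * x)
err2 = sse(xs, ys, lambda x: a2 * x + b2 * x ** 2)
# Model 2 nests model 1 (b = 0), so err2 can never exceed err1 on training data.
```

On this data model 2's training error is far smaller, illustrating the answer, though, as the explanation warns, a lower training error does not imply better generalization.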
4. If we would like to produce learned rules that are easily interpreted by humans, which of the following machine learning methods would we use?
a) Logistic regression
b) Nearest neighbor
c) Decision tree learning
d) Support Vector Machine
Answer: (c) Decision tree learning
Decision trees are intuitive and follow the same pattern of thinking that humans use when making decisions: each path from the root to a leaf reads as a plain if-then rule.
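To illustrate that interpretability, here is a hand-written decision tree for a hypothetical "did the candidate pass the interview?" task (the features and thresholds are invented for illustration, not learned from data): each branch corresponds directly to a rule a human could state aloud.

```python
def predict_pass(experience_years, solved_coding_task, communication):
    """A hand-written decision tree for a hypothetical interview
    outcome -- every path reads as a plain-English rule."""
    # Rule 1: if the candidate failed the coding task, predict Fail.
    if not solved_coding_task:
        return "Fail"
    # Rule 2: coding task solved and 2+ years of experience -> Pass.
    if experience_years >= 2:
        return "Pass"
    # Rule 3: junior candidates additionally need a 'good' communication rating.
    return "Pass" if communication == "good" else "Fail"
```

A learned tree from a library would have the same nested if-then structure, which is why decision trees are the standard answer for human-interpretable models.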
5. Following are the target values predicted by a decision tree on a training dataset used to find whether a person has passed an interview or not:
[T, T, T, F, F, T, T, T]
What is the entropy H(pass)?
a) –(2/8 log₂(2/8) + 6/8 log₂(6/8))
b) –(2/8 log₂(2/8) + 4/8 log₂(4/8))
c) –(2/6 log₂(2/6) + 6/2 log₂(6/2))
d) 2/8 log₂(2/8) + 6/8 log₂(6/8)
Answer: (a) –(2/8 log₂(2/8) + 6/8 log₂(6/8))
The entropy H(X) of a random variable X can be calculated as follows:
H(X) = – Σ P(X = xᵢ) log₂ P(X = xᵢ), summed over i = 1, …, n
Here, n is the total number of possible values of X. For the given problem, n is 2 because we have two possible values, 'T' and 'F', with P(T) = 6/8 and P(F) = 2/8.
H(X) is the expected number of bits needed to encode a randomly drawn value of X (under the most efficient code).
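The calculation can be sketched in a few lines of Python (the helper function name is mine, not from the post); it evaluates answer (a) for the given labels.

```python
from math import log2

def entropy(labels):
    """Shannon entropy H(X) = -sum_i p_i * log2(p_i),
    where p_i are the relative frequencies of the labels."""
    n = len(labels)
    probs = [labels.count(v) / n for v in set(labels)]
    return -sum(p * log2(p) for p in probs)

h = entropy(['T', 'T', 'T', 'F', 'F', 'T', 'T', 'T'])
# -(2/8)*log2(2/8) - (6/8)*log2(6/8) = 0.8113 bits (approximately)
```

With 6 of 8 values 'T' and 2 of 8 'F', the entropy comes out to about 0.811 bits, less than the 1 bit of a 50/50 split, as expected for an unbalanced distribution.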
**********************
Related links:
How to calculate the entropy of a target variable
Which machine learning method produces rules that are easily interpreted by humans
Which regression model best fits the training data
Why a regression model with more parameters fits the training data better
How to find the optimal number of clusters in k-means
Why we need to specify the number of clusters beforehand in k-means clustering