Friday, May 8, 2020

Machine Learning Multiple Choice Questions and Answers 08

Top 5 machine learning quiz questions with answers and explanations: interview and exam questions for data scientists.



Machine learning MCQ - Set 08



1. Which among the following prevents overfitting when we perform bagging?

a) The use of sampling with replacement as the sampling technique
b) The use of weak classifiers
c) The use of classification algorithms which are not prone to overfitting
d) The practice of validation performed on every classifier trained


Answer: (b) the use of weak classifiers
Over-training (which leads to overfitting) is generally not a problem with weak classifiers. For example, a decision stump, i.e., a decision tree with only one node (the root node), has no real scope for overfitting. Combining the outputs of such weak classifiers therefore helps the bagged ensemble avoid overfitting.
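To make this concrete, here is a minimal sketch of bagging with decision stumps, assuming scikit-learn is available; the synthetic dataset and parameter values are illustrative choices, not part of the original question.

```python
# Bagging with weak classifiers (decision stumps) - illustrative sketch.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# max_depth=1 gives a decision stump: a tree with a single split,
# too simple to overfit on its own.
stump = DecisionTreeClassifier(max_depth=1)
bagger = BaggingClassifier(stump, n_estimators=100, random_state=0)
bagger.fit(X_train, y_train)

print("Single stump accuracy :", stump.fit(X_train, y_train).score(X_test, y_test))
print("Bagged stumps accuracy:", bagger.score(X_test, y_test))
```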

2. Averaging the output of multiple decision trees helps ________.
a) Increase bias
b) Decrease bias
c) Increase variance
d) Decrease variance


Answer: (d) decrease variance

Averaging the predictions of multiple classifiers substantially reduces the variance of the combined model while leaving its bias essentially unchanged.

Averaging is not specific to decision trees; it can work with many different learning algorithms. But it works particularly well with decision trees.

Why averaging?

If two trees pick different features for the very first split at the top of the tree, the subtrees grown below that split are often completely different, so decision trees tend to have high variance. We can reduce this variance by averaging the answers of a collection of decision trees, as the sketch below illustrates.
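Here is a minimal sketch of this averaging idea, assuming scikit-learn and NumPy; the dataset, the number of trees, and the bootstrap-sampling scheme are illustrative assumptions.

```python
# Averaging the outputs of many decision trees - illustrative sketch.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

rng = np.random.default_rng(1)
probs = []
for _ in range(25):
    # Each tree sees a different bootstrap sample, so individual trees
    # differ a lot (high variance), but their average is far more stable.
    idx = rng.integers(0, len(X_train), size=len(X_train))
    tree = DecisionTreeClassifier().fit(X_train[idx], y_train[idx])
    probs.append(tree.predict_proba(X_test)[:, 1])

avg_pred = (np.mean(probs, axis=0) >= 0.5).astype(int)
print("Averaged-trees accuracy:", (avg_pred == y_test).mean())
```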

3. If N is the number of instances in the training dataset, the nearest-neighbor classifier has a classification run time of


a) O(1)
b) O(N)
c) O(log N)
d) O(N²)


Answer: (b) O(N)
To classify a query point, the nearest-neighbor classifier must compute the distance to each of the N training instances. Hence the classification run time is O(N).
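A minimal brute-force sketch of this, using NumPy; the toy training data and the helper name nn_classify are made up for illustration.

```python
# Brute-force nearest-neighbor classification: one distance per
# training instance, hence O(N) per query.
import numpy as np

def nn_classify(X_train, y_train, x):
    # Computing the distance to all N training points dominates
    # the run time.
    dists = np.linalg.norm(X_train - x, axis=1)
    return y_train[np.argmin(dists)]

X_train = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
y_train = np.array([0, 0, 1])
print(nn_classify(X_train, y_train, np.array([4.0, 4.5])))  # -> 1
```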

4. Which among the following is/are some of the assumptions made by the k-means algorithm (assuming Euclidean distance measure)?


a) Clusters are spherical in shape
b) Clusters are of similar sizes
c) Data points in one cluster are well separated from data points of other clusters
d) There is no wide variation in density among the data points


Answer: (a) and (b) clusters are spherical in shape and of similar sizes
Under the Euclidean distance measure, each point is assigned to its nearest centroid, so the region around a centroid that forms a cluster is modeled as a spherical blob. The same criterion also favors clusters of similar spatial extent: minimizing squared Euclidean distances to the centroids works poorly when cluster sizes differ widely.
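The following sketch shows what happens when the spherical assumption is violated, assuming scikit-learn; make_moons produces two crescent-shaped (non-spherical) clusters that k-means splits incorrectly.

```python
# k-means on non-spherical clusters - illustrative sketch.
from sklearn.cluster import KMeans
from sklearn.datasets import make_moons
from sklearn.metrics import adjusted_rand_score

X, y = make_moons(n_samples=500, noise=0.05, random_state=0)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# A low score here reflects the violated spherical-cluster assumption.
print("Adjusted Rand index:", adjusted_rand_score(y, labels))
```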

5. Which of the following is more appropriate for feature selection?

a) Ridge
b) Lasso
c) both (a) and (b)
d) neither (a) nor (b)


Answer: (b) lasso
For feature selection, we prefer lasso: solving the lasso optimization problem drives some of the coefficients exactly to zero (depending, of course, on the data). With ridge regression, the magnitudes of the coefficients are reduced, but they do not go all the way down to zero.

Ridge and Lasso

Ridge and lasso are regularization techniques: simple ways to reduce model complexity and prevent the overfitting that can result from plain linear regression.
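A minimal sketch contrasting the two, assuming scikit-learn; the alpha values and the synthetic regression dataset are illustrative assumptions.

```python
# Ridge vs. lasso coefficients - illustrative sketch.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge

# Only 5 of the 20 features are actually informative.
X, y = make_regression(n_samples=200, n_features=20, n_informative=5,
                       noise=5.0, random_state=0)

ridge = Ridge(alpha=1.0).fit(X, y)
lasso = Lasso(alpha=1.0).fit(X, y)

# Ridge shrinks coefficients toward zero; lasso sets many of them
# exactly to zero, effectively performing feature selection.
print("Ridge zero coefficients:", np.sum(ridge.coef_ == 0))
print("Lasso zero coefficients:", np.sum(lasso.coef_ == 0))
```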


