Machine learning MCQ - Set 19
1. Which of the following cross-validation versions may not be suitable for very large datasets with hundreds of thousands of samples?
a) k-fold cross-validation
b) Leave-one-out cross-validation
c) Holdout method
d) All of the above
Answer: (b) Leave-one-out cross-validation
Leave-one-out cross-validation (LOOCV) is not suitable for very large datasets because it requires one model to be trained and evaluated for every sample in the dataset.
Cross-validation
Cross-validation is a technique to evaluate a machine learning model, and it is the basis for a whole class of model evaluation methods. The goal of cross-validation is to test the model's ability to predict new data that was not used in estimating it. It works by splitting the dataset into a number of subsets, keeping one subset aside, training the model on the rest, and testing the model on the held-out subset.
Leave-one-out cross-validation
Leave-one-out cross-validation is k-fold cross-validation taken to its logical extreme, with k equal to N, the number of data points in the set. That means that, N separate times, the function approximator is trained on all the data except for one point, and a prediction is made for that point. As before, the average error is computed and used to evaluate the model. The evaluation given by leave-one-out cross-validation is therefore very expensive to compute.
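The cost described above can be sketched in a few lines of plain Python. The `loo_cv` function below is a minimal illustration, not a real model: the "model" is just the mean of the training points, so that the one-fit-per-sample structure of LOOCV is easy to see.

```python
# A minimal pure-Python sketch of leave-one-out cross-validation. The "model"
# is a stub (the mean of the training points) so the example stays
# self-contained; the point is that the loop runs once per sample.

def loo_cv(data):
    """Return the average error over N leave-one-out rounds (N = len(data))."""
    errors = []
    for i in range(len(data)):
        held_out = data[i]                   # the single test point
        train_set = data[:i] + data[i + 1:]  # all remaining points
        prediction = sum(train_set) / len(train_set)  # stub "model": the mean
        errors.append(abs(prediction - held_out))
    return sum(errors) / len(errors)

data = [1.0, 2.0, 3.0, 4.0, 5.0]
# Five data points => five separate "model fits", one per held-out sample.
print(loo_cv(data))  # -> 1.5
```

With hundreds of thousands of samples, this loop would mean hundreds of thousands of full training runs, which is exactly why option (b) is the answer.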
2. Assume that A and B are two events. If P(A, B) increases while P(A) decreases, then which of the following must be true?
a) P(A|B) decreases
b) P(B|A) increases
c) P(B) decreases
d) P(A|B) increases
Answer: (b) P(B|A) increases
The traditional approach for defining conditional probability is through joint probability. This can be expressed as follows:
P(B|A) = P(A, B) / P(A)
Rearranging, P(A, B) = P(B|A) × P(A). If P(A) decreases, the only way P(A, B) can still increase is for P(B|A) to increase.
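The relationship can be checked numerically. The probabilities below are hypothetical values chosen only to match the scenario in the question (joint probability up, marginal down):

```python
# Numeric illustration of P(B|A) = P(A, B) / P(A) with hypothetical values:
# the joint probability rises while P(A) falls, so P(B|A) must rise.

def cond_prob(p_ab, p_a):
    """Conditional probability P(B|A) from the joint P(A, B) and marginal P(A)."""
    return p_ab / p_a

before = cond_prob(0.10, 0.50)  # P(A,B) = 0.10, P(A) = 0.50
after = cond_prob(0.12, 0.40)   # P(A,B) increased, P(A) decreased
assert after > before           # P(B|A) necessarily increased
print(before, after)
```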
3. Which of the following cross-validation versions is a quicker method suitable for very large datasets with hundreds of thousands of samples?
a) k-fold cross-validation
b) Leave-one-out cross-validation
c) Holdout method
d) All of the above
Answer: (c) Holdout method
The holdout cross-validation method is suitable for very large datasets because it is the simplest and quickest-to-compute version of cross-validation.
What is cross-validation?
Refer to the answer for question 1 on this page.
Holdout method
In this method, the dataset is divided into two sets, namely the training set and the test set, with the basic property that the training set is bigger than the test set. The model is then trained on the training dataset and evaluated using the test dataset.
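A minimal sketch of the holdout split, assuming a generic dataset of items: shuffle once, hold out a small fraction as the test set, and keep the larger remainder for training. Only one split and one training run are needed, which is why this is the cheapest method.

```python
# Sketch of the holdout method: a single shuffle-and-split, with the training
# set bigger than the test set. The seed and test fraction are arbitrary
# example choices.
import random

def holdout_split(data, test_fraction=0.2, seed=42):
    """Shuffle and split so the training set is larger than the test set."""
    shuffled = data[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)

data = list(range(100))
train, test = holdout_split(data)
print(len(train), len(test))  # -> 80 20
```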
4. Which of the following is a disadvantage of the k-fold cross-validation method?
a) The variance of the resulting estimate is reduced as k is increased
b) It usually does not take longer to compute
c) Reduced bias
d) The training algorithm has to be rerun from scratch k times
Answer: (d) The training algorithm has to be rerun from scratch k times
In k-fold cross-validation, the dataset is divided into k subsets. As in the holdout method, these subsets are divided into training and test sets as follows:
a) One of the subsets is chosen as the test set, and the other subsets put together form the training set.
b) Train a model on the training set and test it using the test set.
c) Keep the score to calculate the average error.
d) Repeat (a) to (c) with each individual subset as the test set.
Here, as the training set changes in every cycle, the training algorithm has to be rerun from scratch k times. Hence, it takes k times as much computation to make an evaluation.
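The steps (a) to (d) above can be sketched in plain Python. As in the earlier sketch, the "model" is a stub mean-predictor so the k-retrainings structure stands out; the `fits` counter makes the k-times cost explicit.

```python
# Sketch of k-fold cross-validation: for each of the k folds, the (stub)
# model is retrained from scratch on the other k-1 folds -- k training runs
# in total.

def k_fold_cv(data, k):
    """Average absolute error of a stub mean-predictor over k folds."""
    fold_size = len(data) // k
    errors = []
    fits = 0
    for i in range(k):
        test = data[i * fold_size:(i + 1) * fold_size]   # fold i is the test set
        train = data[:i * fold_size] + data[(i + 1) * fold_size:]
        model = sum(train) / len(train)  # "training" from scratch each cycle
        fits += 1
        errors.append(sum(abs(model - x) for x in test) / len(test))
    return sum(errors) / len(errors), fits

avg_err, n_fits = k_fold_cv([float(i) for i in range(10)], k=5)
print(n_fits)  # -> 5 training runs for k = 5
```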
5. Consider that you are analyzing a large collection of fraudulent credit card transactions to discover if there are sub-types of these transactions. Which of the following learning methods best describes the given learning problem?
a) Reinforcement learning
b) Supervised learning
c) Unsupervised learning
d) Semi-supervised learning
Answer: (c) Unsupervised learning
Unsupervised learning is a type of machine learning used to draw inferences from datasets consisting of input data without labeled responses. It can be thought of as a self-learning process in which the algorithm can find previously unknown patterns in datasets that do not have any labels. k-means clustering and the Apriori algorithm are examples of unsupervised learning techniques. Anomaly detection and clustering are some of the applications of unsupervised learning.
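A toy version of the sub-type discovery in the question can be shown with a tiny one-dimensional k-means run. The transaction amounts below are hypothetical, invented for illustration; note that no labels are used anywhere, which is what makes the problem unsupervised.

```python
# Toy unsupervised sub-type discovery: a minimal 1-D k-means run on
# hypothetical fraudulent-transaction amounts. The algorithm groups the
# amounts into clusters on its own, with no labels.

def kmeans_1d(values, centers, iterations=10):
    """Tiny 1-D k-means: assign each value to its nearest center, then update."""
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

# Hypothetical amounts: small "card-testing" charges vs. large cash-out charges.
amounts = [2.0, 3.0, 2.5, 900.0, 950.0, 880.0]
centers, clusters = kmeans_1d(amounts, centers=[0.0, 1000.0])
print(centers)  # -> [2.5, 910.0]: two discovered sub-types of transactions
```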
**********************
Related links:
What are the applications of unsupervised learning
What type of learning is the credit card fraud detection
What are the disadvantages of k-fold cross-validation
Why the leave-one-out cross-validation (loocv) is not best suited for very large databases
Explain cross-validation
List the different cross validation methods
Which cross-validation method does not take long to complete? Which is the fastest cross-validation method?
Discuss the steps of k-fold cross-validation
Why k-fold cross-validation takes more time than holdout method