Data Warehousing and Data Mining - MCQ Questions and Answers SET 03
1. Which of the following practices can help in handling overfitting problem?
a) Use of faster processor
b) Increasing the number of training examples
c) Reducing the number of training instances
d) Increasing the model complexity
Answer: (b) Increasing the number of training examples
If a model does not generalize well from the training data to unseen data, we call this overfitting. An overfit model has a very low training error but a high test error. Increasing the number of training examples reduces the variance of the model, which typically lowers the test error and so reduces overfitting.
2. Which of the following statements is INCORRECT about the SVM and kernels?
a) Kernels map the original dataset into a higher dimensional space and then find a hyper-plane in the mapped space
b) Kernels map the original dataset into a higher dimensional space and then find a hyper-plane in the original space
c) Using kernels allows us to obtain non-linear decision boundaries for a classification problem
d) The kernel trick allows us to perform computations in the original space and enhances the speed of SVM learning
Answer: (b) Kernels map the original dataset into a higher dimensional space and then find a hyper-plane in the original space
An SVM with a kernel implicitly transforms the original feature space into a higher-dimensional space defined by the chosen kernel function. It then finds the support vectors that maximize the separation (margin) between the two classes in that higher-dimensional space, not in the original space. Seen from the original space, the resulting hyper-plane corresponds to a non-linear decision boundary.
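As a small illustration of the kernel trick mentioned in option (d), the sketch below compares a kernel evaluated in the original 2-D space with an ordinary dot product taken after an explicit mapping into 3-D. The degree-2 polynomial kernel, the feature map `phi`, and the sample vectors are illustrative assumptions and are not part of the original question.

```python
import math


def dot(u, v):
    """Plain dot product of two equal-length vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))


def phi(x):
    """Explicit feature map for the degree-2 polynomial kernel on 2-D inputs.

    Maps (x1, x2) to (x1^2, sqrt(2)*x1*x2, x2^2), a point in 3-D space.
    """
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)


def poly2_kernel(x, y):
    """Kernel k(x, y) = (x . y)^2, computed entirely in the original space."""
    return dot(x, y) ** 2


# Hypothetical sample points, chosen only for illustration.
x = (1.0, 2.0)
y = (3.0, 4.0)

explicit = dot(phi(x), phi(y))   # dot product in the mapped 3-D space
implicit = poly2_kernel(x, y)    # same quantity without ever building phi

print(explicit, implicit)        # the two values agree up to rounding
```

The kernel value is obtained from the original 2-D vectors alone, which is why the mapped space never has to be constructed explicitly.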
3. Dimensionality reduction reduces the data set size by removing ____________.
a) Relevant attributes
b) Irrelevant attributes
c) Support vector attributes
d) Mining attributes
Answer: (b) Irrelevant attributes
We remove attributes (features) that are irrelevant or redundant in order to reduce the dimension of the feature set.
Dimensionality reduction
Dimensionality reduction, or dimension reduction, is the transformation of data from a high-dimensional space into a low-dimensional space so that the low-dimensional representation retains some meaningful properties of the original data. [Wikipedia]
Dimensionality reduction is commonly divided into two approaches, feature selection and feature extraction. In feature selection, a smaller subset of the original features is chosen using filter, wrapper, or embedded methods. Feature extraction derives a smaller set of new features from the original ones, for example through principal component analysis.
4. What is the Hamming distance between the binary vectors a = 0101010001 and b = 0100011001?
a) 2
b) 3
c) 5
d) 10
Answer: (a) 2
For binary data, the Hamming distance is the number of bit positions in which two binary vectors differ. Here a and b differ only in the 4th and 7th positions (counting from the left), so the distance is 2.
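The count can be checked with a few lines of Python. This is a minimal sketch that assumes the vectors are given as equal-length strings of 0s and 1s; the function name `hamming_distance` is chosen here for illustration.

```python
def hamming_distance(a: str, b: str) -> int:
    """Count the positions at which two equal-length bit strings differ."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x != y for x, y in zip(a, b))


a = "0101010001"
b = "0100011001"
print(hamming_distance(a, b))  # 2
```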
5. What is the Jaccard similarity between the binary vectors a = 0111010101 and b = 0100011111?
a) 0.5
b) 1.5
c) 2.5
d) 3
Answer: (a) 0.5
For binary data, the Jaccard similarity is a measure of similarity between two binary vectors that ignores positions where both vectors are 0. It can be calculated using the following equation:
Jsim = C11 / (C01 + C10 + C11)
Here, C11 is the count of positions where both vectors are 1, and C10 and C01 are the counts of positions where the two vectors differ.
For the given question,
C11 = the number of bit positions where both vectors are 1 = 4
C10 = the number of bit positions where the first vector (vector a) is 1 and the second vector (vector b) is 0 = 2
C01 = the number of bit positions where the first vector (vector a) is 0 and the second vector (vector b) is 1 = 2
Jsim(a, b) = 4 / (2 + 2 + 4) = 4/8 = ½ = 0.5
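The same calculation can be sketched in Python, again assuming the vectors are equal-length strings of 0s and 1s. The function name `jaccard_similarity` and the choice to raise an error when neither vector contains a 1 (where the ratio is undefined) are assumptions made for this example.

```python
def jaccard_similarity(a: str, b: str) -> float:
    """Jaccard similarity C11 / (C01 + C10 + C11) for two bit strings."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    pairs = list(zip(a, b))
    c11 = sum(x == "1" and y == "1" for x, y in pairs)  # both are 1
    c10 = sum(x == "1" and y == "0" for x, y in pairs)  # only a is 1
    c01 = sum(x == "0" and y == "1" for x, y in pairs)  # only b is 1
    total = c11 + c10 + c01
    if total == 0:
        # Assumption: treat the all-zero case as undefined rather than pick a value.
        raise ValueError("Jaccard similarity is undefined when both vectors are all zeros")
    return c11 / total


a = "0111010101"
b = "0100011111"
print(jaccard_similarity(a, b))  # 0.5
```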