Data warehousing and Data mining solved quiz questions and answers, multiple choice questions MCQ in data mining, questions and answers explained in data mining concepts, data warehouse exam questions, data mining mcq

Data Warehousing and Data Mining - MCQ Questions and Answers SET 03

1. Which of the following practices can help in handling overfitting problem?

a) Use of faster processor

b) Increasing the number of training examples

c) Reducing the number of training instances

d) Increasing the model complexity

Answer: (b) Increasing the number of training examples

Increasing the number of training examples usually lowers the test error, because the variance of the model decreases, and this reduces overfitting.

If a model does not generalize well from the training data to unseen data, we call this overfitting. An overfit model has a very low training error but a high test error.
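The effect of training-set size can be illustrated with a small simulation. The sketch below is only an illustration under assumed settings that are not part of the quiz: a hypothetical true relationship y = 2x + 1 with Gaussian noise, a straight-line model fitted by least squares, and errors averaged over repeated random draws. The function names (`fit_line`, `mse`, `sample`, `average_errors`) are made up for this example.

```python
import random


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mse(model, xs, ys):
    """Mean squared error of a fitted line on the given points."""
    slope, intercept = model
    return sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def sample(rng, n, noise=1.0):
    """Draw n noisy points from the assumed relationship y = 2x + 1."""
    xs = [rng.uniform(0, 1) for _ in range(n)]
    ys = [2 * x + 1 + rng.gauss(0, noise) for x in xs]
    return xs, ys


def average_errors(n_train, trials=200, seed=0):
    """Average training and test error over repeated random training sets."""
    rng = random.Random(seed)
    train_total = 0.0
    test_total = 0.0
    for _ in range(trials):
        train_x, train_y = sample(rng, n_train)
        test_x, test_y = sample(rng, 200)  # fresh, unseen data
        model = fit_line(train_x, train_y)
        train_total += mse(model, train_x, train_y)
        test_total += mse(model, test_x, test_y)
    return train_total / trials, test_total / trials


for n in (3, 300):
    train_err, test_err = average_errors(n)
    print(f"n_train={n}: train MSE={train_err:.3f}, test MSE={test_err:.3f}")
```

With only 3 training points the line chases the noise, so the training error is low while the test error is much higher. With 300 points the two errors are close to each other, which is the reduced-variance behaviour described in the answer.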

2. Which of the following statements is INCORRECT about the SVM and kernels?

a) Kernels map the original dataset into a higher dimensional space and then find a hyper-plane in the mapped space

b) Kernels map the original dataset into a higher dimensional space and then find a hyper-plane in the original space

c) Using kernels allows us to obtain non-linear decision boundaries for a classification problem

d) The kernel trick allows us to perform computations in the original space and enhances speed of SVM learning.

Answer: (b) Kernels map the original dataset into a higher dimensional space and then find a hyper-plane in the original space

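An SVM with a kernel implicitly maps the original feature space into a higher-dimensional space defined by the chosen kernel function. It then finds the hyper-plane that maximizes the separation (margin) between the two classes in that higher-dimensional space, not in the original space, which is why statement (b) is incorrect. The support vectors are the training points that determine this margin.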
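The kernel trick mentioned in option (d) can be checked numerically. The sketch below assumes a degree-2 polynomial kernel K(x, y) = (x · y)² on 2-dimensional inputs, chosen here only for illustration. It shows that the kernel value computed in the original space equals the dot product of the explicitly mapped 3-dimensional vectors. The function names are hypothetical.

```python
import math


def dot(u, v):
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def poly2_kernel(x, y):
    """Degree-2 polynomial kernel, computed in the original 2-D space."""
    return dot(x, y) ** 2


def feature_map(x):
    """Explicit map into 3-D space whose dot product equals poly2_kernel."""
    return (x[0] ** 2, math.sqrt(2) * x[0] * x[1], x[1] ** 2)


x = (1.0, 2.0)
y = (3.0, 4.0)

via_kernel = poly2_kernel(x, y)                     # no mapping needed
via_mapping = dot(feature_map(x), feature_map(y))   # explicit higher-dimensional mapping

print(via_kernel, via_mapping)
```

Both routes give the same inner product (up to floating-point rounding), but the kernel route never builds the higher-dimensional vectors. That saving is what makes kernelized SVM training practical for mappings with very many dimensions.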

3. Dimensionality reduction reduces the data set size by removing ____________.

a) Relevant attributes

b) Irrelevant attributes

c) Support vector attributes

d) Mining attributes

Answer: (b) Irrelevant attributes

We remove attributes (features) that are irrelevant or redundant in order to reduce the dimension of the feature set.

Dimensionality reduction

Dimensionality reduction, or dimension reduction, is the transformation of data from a high-dimensional space into a low-dimensional space so that the low-dimensional representation retains some meaningful properties of the original data. [Wikipedia]

Dimensionality reduction has two main components: feature selection and feature extraction. Feature selection chooses a smaller subset of the original features using filter, wrapper, or embedded methods. Feature extraction derives a smaller set of new features from the original ones, for example through principal component analysis.
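As a rough illustration of filter-style feature selection, the sketch below drops attributes whose variance does not exceed a threshold, on the assumption that a (nearly) constant attribute carries no information for mining. This is only one simple heuristic among many, and the function name, column names, and data are invented for the example.

```python
from statistics import pvariance


def drop_low_variance(rows, names, threshold=0.0):
    """Keep only the columns whose population variance exceeds the threshold."""
    columns = list(zip(*rows))  # transpose rows into columns
    keep = [i for i, col in enumerate(columns) if pvariance(col) > threshold]
    reduced = [[row[i] for i in keep] for row in rows]
    return [names[i] for i in keep], reduced


# Hypothetical data set: the middle attribute is constant, so it is irrelevant.
names = ["age", "constant_flag", "score"]
rows = [
    [25, 1, 0.2],
    [32, 1, 0.7],
    [47, 1, 0.4],
]

kept_names, reduced_rows = drop_low_variance(rows, names)
print(kept_names)
print(reduced_rows)
```

The reduced data set has two attributes instead of three, and no information useful for distinguishing the rows has been lost.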

4. What is the Hamming distance between the binary vectors a = 0101010001 and b = 0100011001?

a) 2

b) 3

c) 5

d) 10

Answer: (a) 2

For binary data, the Hamming distance is the number of bit positions at which two binary vectors differ. Here, a and b differ only at positions 4 and 7 (counting from the left), so the distance is 2.
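The count can be verified with a few lines of code. This is a minimal sketch that treats the vectors as strings of '0' and '1' characters; the function name `hamming_distance` is chosen for this example.

```python
def hamming_distance(a: str, b: str) -> int:
    """Number of positions at which two equal-length bit strings differ."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(bit_a != bit_b for bit_a, bit_b in zip(a, b))


print(hamming_distance("0101010001", "0100011001"))  # prints 2
```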

5. What is the Jaccard similarity between the binary vectors a = 0111010101 and b = 0100011111?

a) 0.5

b) 1.5

c) 2.5

d) 3

Answer: (a) 0.5

For binary data, the Jaccard similarity is a measure of similarity between two binary vectors.

The Jaccard similarity between two binary vectors can be calculated using the following equation:

J_sim = C11 / (C01 + C10 + C11)

Here, C11 is the number of positions where both vectors have a 1,

C10 is the number of positions where the first vector has a 1 and the second has a 0, and C01 is the number of positions where the first vector has a 0 and the second has a 1. Positions where both vectors have a 0 are ignored.

For the given question,

C11 = the number of bit positions that have matching 1's = 4

C10 = the number of bit positions where the first vector (vector a) is 1 and the second vector (vector b) is 0 = 2

C01 = the number of bit positions where the first vector (vector a) is 0 and the second vector (vector b) is 1 = 2

Therefore, J_sim = 4 / (2 + 2 + 4) = 4 / 8 = 0.5
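The same counts can be reproduced in code. The sketch below treats the vectors as strings of '0' and '1' characters; the function name is chosen for this example, and raising an error when neither vector contains a 1 is an assumption, since the similarity is undefined in that case.

```python
def jaccard_similarity(a: str, b: str) -> float:
    """Jaccard similarity C11 / (C01 + C10 + C11) for equal-length bit strings."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    pairs = list(zip(a, b))
    c11 = pairs.count(("1", "1"))  # both vectors have a 1
    c10 = pairs.count(("1", "0"))  # first is 1, second is 0
    c01 = pairs.count(("0", "1"))  # first is 0, second is 1
    denominator = c01 + c10 + c11
    if denominator == 0:
        # Assumption: treat two all-zero vectors as an error rather than pick a value.
        raise ValueError("similarity is undefined when neither vector contains a 1")
    return c11 / denominator


print(jaccard_similarity("0111010101", "0100011111"))  # prints 0.5
```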

Monday, October 12, 2020


