Monday, October 12, 2020

Data warehousing and mining quiz questions and answers set 02




1. In non-parametric models

a) There are no parameters

b) The parameters are fixed in advance

c) A type of probability distribution is assumed, then its parameters are inferred

d) The parameters are flexible

Answer: (d) The parameters are flexible

Non-parametric models differ from parametric models in that the model structure is not specified a priori but is instead determined from data. The term non-parametric is not meant to imply that such models completely lack parameters but that the number and nature of the parameters are flexible and not fixed in advance.

In non-parametric models, neither a fixed set of parameters nor a particular probability distribution is assumed in advance; the parameters that do exist are flexible.
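As a minimal illustration (using synthetic data, not from the source), the contrast can be sketched as a parametric Gaussian fit with exactly two parameters versus a kernel density estimate whose effective complexity grows with the number of training points:

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=5.0, scale=2.0, size=1000)

# Parametric: assume a Gaussian, then infer its two fixed parameters.
mu, sigma = data.mean(), data.std()

# Non-parametric: a kernel density estimate keeps one kernel centre per
# training point, so the "parameters" grow with the data instead of
# being fixed in advance.
def kde(x, samples, bandwidth=0.5):
    # Gaussian-kernel density estimate at point x
    z = (x - samples) / bandwidth
    return np.exp(-0.5 * z**2).sum() / (len(samples) * bandwidth * np.sqrt(2 * np.pi))

# Both estimates near the true mean should be close to the Gaussian
# density value 1/(2*sqrt(2*pi)) ~ 0.199.
parametric = np.exp(-0.5 * ((5.0 - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
nonparametric = kde(5.0, data)
print(parametric, nonparametric)
```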

 

2. The goal of clustering analysis is to:

a) Maximize the inter-cluster similarity

b) Maximize the intra-cluster similarity

c) Maximize the number of clusters

d) Minimize the intra-cluster similarity

Answer: (b) Maximize the intra-cluster similarity

One of the goals of a clustering algorithm is to maximize the intra-cluster similarity.

A clustering with small intra-cluster distances (high intra-cluster similarity) and large inter-cluster distances (low inter-cluster similarity) is considered a good clustering.

Clustering analysis is a technique for grouping similar observations into a number of clusters based on the values of multiple variables for each observation. It is a form of unsupervised classification.

Inter-cluster distance – the distance between two objects from two different clusters.

Intra-cluster distance – the distance between two objects from the same cluster.
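These two distances can be computed directly. The sketch below uses two hypothetical, well-separated toy clusters (the points are illustrative, not from the source):

```python
import numpy as np

# Two toy clusters, chosen so they are clearly separated.
cluster_a = np.array([[0.0, 0.0], [0.2, 0.1], [0.1, 0.3]])
cluster_b = np.array([[5.0, 5.0], [5.2, 4.9], [4.8, 5.1]])

def mean_pairwise_distance(x, y):
    # Mean Euclidean distance between every point in x and every point in y.
    diffs = x[:, None, :] - y[None, :, :]
    return np.linalg.norm(diffs, axis=-1).mean()

intra_a = mean_pairwise_distance(cluster_a, cluster_a)   # within one cluster
inter_ab = mean_pairwise_distance(cluster_a, cluster_b)  # across clusters

# A good clustering: small intra-cluster, large inter-cluster distance.
print(intra_a, inter_ab)
```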

 

3. In decision tree algorithms, attribute selection measures are used to

a) Reduce the dimensionality

b) Select the splitting criteria which best separate the data

c) Reduce the error rate

d) Rank attributes

Answer: (b) Select the splitting criteria which best separate the data

Attribute selection measures in decision tree algorithms are mainly used to select the splitting criterion that best separates the given data partition.  

During the induction phase of the decision tree, the attribute selection measure chooses the attribute that best separates the remaining samples of the node's partition into individual classes.

The data set is partitioned into subsets according to a splitting criterion. This procedure is repeated recursively for each subset until each subset contains only members belonging to the same class or is sufficiently small.

Information gain, gain ratio, and the Gini index are popular attribute selection measures.
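Two of these measures are short formulas over class proportions. A minimal sketch (toy labels chosen for illustration): a split that perfectly separates the classes yields the maximum information gain of 1 bit, while a split that leaves each child as mixed as the parent yields a gain of 0.

```python
import numpy as np

def entropy(labels):
    # Shannon entropy of a class-label array, in bits.
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -(p * np.log2(p)).sum()

def gini(labels):
    # Gini index: 1 minus the sum of squared class proportions.
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - (p ** 2).sum()

def information_gain(parent, splits):
    # Entropy of the parent minus the weighted entropy of the child partitions.
    n = len(parent)
    weighted = sum(len(s) / n * entropy(s) for s in splits)
    return entropy(parent) - weighted

labels = np.array([0, 0, 0, 0, 1, 1, 1, 1])
# A perfect split separates the classes completely: gain = 1 bit.
perfect = information_gain(labels, [labels[:4], labels[4:]])
# A useless split leaves each child as mixed as the parent: gain = 0.
useless = information_gain(labels, [labels[::2], labels[1::2]])
print(perfect, useless)
```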

 

4. Pruning a decision tree always 

a) Increases the error rate

b) Reduces the size of the tree

c) Provides the partitions with lower entropy

d) Reduces classification accuracy

Answer: (b) Reduces the size of the tree

Pruning simplifies and optimizes a decision tree by removing sections of the tree that are non-critical or redundant for classifying instances. It significantly reduces the size of the decision tree.

Decision trees are among the machine learning algorithms most susceptible to overfitting (the undesired fitting of noise in the training data). Pruning reduces the likelihood of overfitting.
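One simple form of pruning can be sketched in a few lines: collapse any split whose subtrees all predict the same class, since that split is redundant. The nested-dict tree below is a hypothetical toy example, not a full pruning algorithm such as cost-complexity or reduced-error pruning.

```python
# A tiny decision tree as nested dicts: internal nodes hold a split
# description plus 'left'/'right'; leaves hold only a 'class'.
tree = {
    "split": "x < 3",
    "left": {"class": "A"},
    "right": {
        "split": "y < 7",          # redundant: both children predict "B"
        "left": {"class": "B"},
        "right": {"class": "B"},
    },
}

def prune(node):
    # Bottom-up pass: collapse any split whose subtrees agree on one class.
    if "class" in node:
        return node
    left, right = prune(node["left"]), prune(node["right"])
    if "class" in left and "class" in right and left["class"] == right["class"]:
        return {"class": left["class"]}    # replace redundant split with a leaf
    return {"split": node["split"], "left": left, "right": right}

def size(node):
    # Total number of nodes in the tree.
    return 1 if "class" in node else 1 + size(node["left"]) + size(node["right"])

pruned = prune(tree)
print(size(tree), size(pruned))   # 5 nodes before pruning, 3 after
```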

 

5. Which of the following classifiers falls into the category of lazy learners?

a) Decision trees

b) Bayesian classifiers

c) k-NN classifiers

d) Rule-based classifiers

Answer: (c) k-NN classifiers

The k-nearest neighbor (k-NN) classifier is a lazy learner because it does not learn a discriminative function from the training data but instead "memorizes" the training dataset.

Lazy learning (e.g., instance-based learning): the learner simply stores the training data (with at most minor processing) and waits until it is given a test tuple. Only then is classification performed, based on the most similar data in the stored training set.

Lazy learning is also referred to as "just-in-time learning".

The other category of classifiers is "eager learners".
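The lazy-learning idea can be sketched in a minimal k-NN classifier: `fit` merely stores the data, and all the work happens at prediction time. The class name, toy points, and labels below are illustrative assumptions, not from the source.

```python
import numpy as np
from collections import Counter

class KNNClassifier:
    """Minimal k-NN sketch: "training" only stores the data (lazy learning);
    all computation is deferred until prediction time."""
    def __init__(self, k=3):
        self.k = k

    def fit(self, X, y):
        # No model is induced here -- the training set is simply memorized.
        self.X = np.asarray(X, dtype=float)
        self.y = list(y)
        return self

    def predict(self, x):
        # Classification happens "just in time": find the k nearest stored
        # tuples and take a majority vote among their labels.
        dists = np.linalg.norm(self.X - np.asarray(x, dtype=float), axis=1)
        nearest = np.argsort(dists)[: self.k]
        votes = Counter(self.y[i] for i in nearest)
        return votes.most_common(1)[0][0]

X = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
y = ["low", "low", "low", "high", "high", "high"]
clf = KNNClassifier(k=3).fit(X, y)
print(clf.predict([0.5, 0.5]))   # -> "low"
print(clf.predict([5.5, 5.5]))   # -> "high"
```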

 

************************

Related links:

 

 

What is lazy learning in data mining?

Which of the data noise problem is reduced through pruning in decision trees?

What is the role of attribute selection measures in data mining?

What are the popular attribute selection measures?

Why are non-parametric models said to be flexible?

Which machine learning algorithm is most susceptible to overfitting?

Define inter-cluster and intra-cluster distance 

Machine learning algorithms MCQ with answers

Machine learning question banks and answers

 
