Monday, October 12, 2020

Data warehousing and mining quiz questions and answers set 01

Data Warehousing and Data Mining - MCQ Questions and Answers SET 01


1. In a data mining task where it is not clear what types of patterns could be interesting, the data mining system should:

a) Perform all possible data mining tasks

b) Handle different granularities of data and patterns

c) Perform both descriptive and predictive tasks

d) Allow interaction with the user to guide the mining process

Answer: (d) Allow interaction with the user to guide the mining process  

Users have a good sense of which “direction” of mining may lead to interesting patterns and the “form” of the patterns or rules they want to find. They may also have a sense of “conditions” for the rules, which would eliminate the discovery of certain rules that they know would not be of interest. Thus, a good heuristic is to have the users specify such intuition or expectations as constraints to confine the search space. This strategy is known as constraint-based mining.
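The idea of confining the search space with user constraints can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy `transactions`, the helper name `frequent_itemsets`, and the "must contain milk" constraint are all hypothetical and do not come from the quiz.

```python
from itertools import combinations

# Hypothetical toy transactions (illustrative only).
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"beer", "bread"},
    {"butter", "milk"},
    {"beer", "chips"},
]

def frequent_itemsets(transactions, min_support, constraint=lambda s: True, max_size=2):
    """Return {itemset: support count} for itemsets that satisfy `constraint`.

    The constraint is checked before counting, so itemsets the user has
    ruled out are never counted against the data.
    """
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, max_size + 1):
        for candidate in combinations(items, size):
            if not constraint(candidate):
                continue  # pruned by the user's constraint
            count = sum(1 for t in transactions if set(candidate) <= t)
            if count >= min_support:
                result[candidate] = count
    return result

# User constraint: only itemsets that involve "milk" are of interest.
milk_only = frequent_itemsets(
    transactions, min_support=2, constraint=lambda s: "milk" in s
)
print(milk_only)
```

Checking the constraint before the support count is the point of the sketch: the user's expectation shrinks the set of candidates that have to be examined at all.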

 

2. To detect fraudulent usage of credit cards, the following data mining task should be used:

a) Feature selection

b) Prediction

c) Outlier analysis

d) All of the above

Answer: (c) Outlier analysis

Fraudulent usage of credit cards can be detected using outlier analysis or outlier detection.

Outlier

An outlier is a data element that deviates markedly from the other observations in a data set and does not fit the pattern of the overall distribution. Outliers are sometimes called abnormalities, anomalies, or deviants, and they can occur by chance in any given distribution.

Outlier analysis

Outlier analysis is the analysis used to find unusual patterns in a data set. Many outlier detection algorithms have been proposed, and they fall into broad categories such as statistical approaches, distance-based approaches, fuzzy approaches, and kernel-based approaches.
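As a small illustration of the statistical category, the sketch below flags transaction amounts whose modified z-score (based on the median and the median absolute deviation) exceeds a cutoff. The amounts, the function name `mad_outliers`, and the commonly used cutoff of 3.5 are assumptions chosen for the example. Real fraud detection systems use far richer features.

```python
from statistics import median

def mad_outliers(values, threshold=3.5):
    """Return the values whose modified z-score exceeds `threshold`.

    modified z = 0.6745 * |x - median| / MAD, where MAD is the median
    absolute deviation. The median is used instead of the mean so that a
    single extreme value does not distort the baseline.
    """
    center = median(values)
    mad = median([abs(v - center) for v in values])
    if mad == 0:
        return []  # no spread to measure deviations against
    return [v for v in values if 0.6745 * abs(v - center) / mad > threshold]

# Hypothetical card transaction amounts (illustrative only).
amounts = [23.5, 41.0, 18.2, 35.9, 27.4, 30.1, 22.8, 1250.0, 29.6, 33.3]
flagged = mad_outliers(amounts)
print(flagged)
```

A flagged amount is only a candidate for review. As noted above, outliers can also occur by chance.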

 

 

3. In high dimensional spaces, the distance between data points becomes meaningless because:

a) It becomes difficult to distinguish between the nearest and farthest neighbors

b) The nearest neighbor becomes unreachable

c) The data becomes sparse

d) There are many uncorrelated features

Answer: (a) It becomes difficult to distinguish between the nearest and farthest neighbors

Curse of dimensionality

The curse of dimensionality refers to the phenomenon that, in high-dimensional spaces, the distances from a query point to its nearest and farthest neighbors become almost equal. Nearest-neighbor calculations therefore cannot discriminate between candidate points.

Here, high-dimensional means hundreds to thousands of dimensions for dense vectors (sparse vectors are a separate topic). At that scale, the pairwise distances between points tend to concentrate around a single value.
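The effect can be observed with a short experiment. The sketch below is a minimal illustration, assuming points drawn uniformly at random from the unit hypercube and Euclidean distance. The function name `relative_contrast`, the sample size, and the seed are choices made for this example. It needs Python 3.8 or later for `math.dist`.

```python
import math
import random

def relative_contrast(dim, n_points=200, seed=0):
    """Return (farthest - nearest) / nearest for one random query point.

    Points are sampled uniformly from the unit hypercube [0, 1]^dim.
    A small value means nearest and farthest neighbors are hard to tell apart.
    """
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    distances = [
        math.dist(query, [rng.random() for _ in range(dim)])
        for _ in range(n_points)
    ]
    nearest, farthest = min(distances), max(distances)
    return (farthest - nearest) / nearest

for dim in (2, 10, 100, 1000):
    print(f"dim={dim:5d}  relative contrast={relative_contrast(dim):.3f}")
```

The relative contrast shrinks as the number of dimensions grows, which is the sense in which the nearest and farthest neighbors become difficult to distinguish.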

 

 

4. The difference between supervised learning and unsupervised learning is given by:

a) Unlike unsupervised learning, supervised learning needs labeled data

b) Unlike unsupervised learning, supervised learning can form new classes

c) Unlike unsupervised learning, supervised learning can be used to detect outliers

d) Unlike supervised learning, unsupervised learning can predict the output class from among the known classes

Answer: (a) Unlike unsupervised learning, supervised learning needs labeled data

Supervised learning: Supervised learning is the machine learning task of learning a function that maps an input to an output based on example input-output pairs. In data mining texts it is often treated as a synonym for classification, although it also covers tasks such as regression. The supervision in the learning comes from the labeled examples in the training data set.

Unsupervised learning: In data mining texts, unsupervised learning is often treated as a synonym for clustering. The learning process is unsupervised because the input examples are not class labeled. Typically, we may use clustering to discover classes within the data. The goal of unsupervised learning is to model the hidden patterns in the given input data in order to learn about the data.
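The contrast can be made concrete with a toy sketch. It assumes one-dimensional data, a 1-nearest-neighbor rule as the supervised learner, and a two-cluster k-means as the unsupervised learner. The data and the function names `predict_1nn` and `two_means` are hypothetical.

```python
# Labeled examples for supervised learning: (feature value, class label).
labeled = [(1.0, "low"), (1.5, "low"), (2.0, "low"), (8.0, "high"), (9.0, "high")]

def predict_1nn(x, examples):
    """Supervised: return the label of the closest labeled example."""
    _, label = min(examples, key=lambda ex: abs(ex[0] - x))
    return label

def two_means(values, iterations=10):
    """Unsupervised: split unlabeled 1-D values into two groups (k-means, k=2).

    Assumes at least two distinct values. Starting the centers at the
    minimum and maximum is a simple choice made for this sketch.
    """
    c0, c1 = min(values), max(values)
    for _ in range(iterations):
        g0 = [v for v in values if abs(v - c0) <= abs(v - c1)]
        g1 = [v for v in values if abs(v - c0) > abs(v - c1)]
        c0, c1 = sum(g0) / len(g0), sum(g1) / len(g1)
    return g0, g1

# The same feature values with the labels thrown away.
unlabeled = [x for x, _ in labeled]

prediction = predict_1nn(7.2, labeled)  # picks one of the known classes
groups = two_means(unlabeled)           # discovers groups, but has no names for them
print(prediction, groups)
```

The supervised function can only answer with a label it has seen in the training data, while the clustering function returns groups that a person still has to interpret.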

 

 

5. Which of the following is used to find inherent regularities in data?

a) Clustering

b) Frequent pattern analysis

c) Regression analysis

d) Outlier analysis

Answer: (b) Frequent pattern analysis

Frequent pattern: a pattern (a set of items, subsequences, substructures, etc.) that occurs frequently in a data set. It is an intrinsic and important property of datasets.

Basket data analysis, cross-marketing, catalog design, sales campaign analysis, Web log (click stream) analysis, and DNA sequence analysis are some of the applications of frequent pattern analysis.
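For the click stream case, a frequent pattern can be as simple as a page-to-page transition that appears in many sessions. The sketch below counts such transitions. The sessions, the function name `frequent_transitions`, and the minimum support of 2 are assumptions for illustration, and each transition is counted at most once per session.

```python
from collections import Counter

# Hypothetical click stream sessions (illustrative only).
sessions = [
    ["home", "search", "product", "cart"],
    ["home", "product", "cart"],
    ["search", "product", "cart", "checkout"],
    ["home", "search", "help"],
]

def frequent_transitions(sessions, min_support):
    """Return {(page, next_page): number of sessions containing that step}.

    Only transitions that occur in at least `min_support` sessions are kept.
    """
    counts = Counter()
    for session in sessions:
        # Consecutive page pairs, de-duplicated within the session.
        pairs = set(zip(session, session[1:]))
        counts.update(pairs)
    return {pair: n for pair, n in counts.items() if n >= min_support}

frequent = frequent_transitions(sessions, min_support=2)
print(frequent)
```

Longer subsequences or item sets follow the same idea: count how often a candidate pattern occurs and keep the ones that meet a support threshold.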

 
