Monday, October 12, 2020

Data warehousing and mining quiz questions and answers set 05


Data Warehousing and Data Mining - MCQ Questions and Answers SET 05


1. Which of the following best describes the sample of data used to provide an unbiased evaluation of a model fit on the training dataset while tuning model hyper-parameters?

a) Training dataset

b) Test dataset

c) Validation dataset

d) Holdout dataset

Answer: (c) Validation dataset

The validation dataset is the sample of data used to provide an unbiased evaluation of a model fit on the training dataset while tuning model hyper-parameters.

It is usually used for parameter selection, that is, for tuning the hyper-parameters of the model, and to avoid overfitting to the training data. For example, in a neural network, it is used to choose the number of hidden units.

The validation dataset is different from the test dataset, which is held back and used only for the final evaluation of the chosen model.

The validation set is also known as the development (dev) set.
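The role of the three datasets can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `split_dataset` and `select_hyperparameter`, the 60/20/20 split, and the toy validation scores are all hypothetical and not taken from any particular library.

```python
import random


def split_dataset(records, val_fraction=0.2, test_fraction=0.2, seed=0):
    """Shuffle the records and split them into train, validation and test lists."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    n_val = int(len(shuffled) * val_fraction)
    test = shuffled[:n_test]
    validation = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, validation, test


def select_hyperparameter(candidates, score_on_validation):
    """Return the candidate value with the highest validation score."""
    return max(candidates, key=score_on_validation)


train, validation, test = split_dataset(range(100))

# Hypothetical validation scores for different numbers of hidden units.
# In practice each score would come from training on `train` and
# evaluating on `validation`; `test` is touched only once, at the very end.
toy_scores = {4: 0.81, 8: 0.88, 16: 0.86}
best_hidden_units = select_hyperparameter(toy_scores, toy_scores.get)
```

The test list is never consulted while choosing `best_hidden_units`, which is what keeps the final evaluation on it unbiased.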

 

2. In which of the following are data stored, retrieved and updated?

a) OLAP

b) MOLAP

c) HTTP

d) OLTP

 

Answer: (d) OLTP

Online Transaction Processing (OLTP) is a type of data processing in information systems that typically facilitates transaction-oriented applications. Supermarket inventory systems, ticket booking systems, and financial transaction systems are some examples of OLTP.

OLAP (Online Analytical Processing), by contrast, is used primarily for analytical queries in data warehouse environments.
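The store, retrieve and update pattern of an OLTP system can be illustrated with Python's built-in `sqlite3` module. This is a minimal sketch, assuming a hypothetical `inventory` table for a supermarket; the table and column names are made up for the example.

```python
import sqlite3

# An in-memory database stands in for the operational (OLTP) system.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE inventory (item TEXT PRIMARY KEY, quantity INTEGER)")

# Store: each `with conn` block is one short transaction that is
# committed on success and rolled back if an exception is raised.
with conn:
    conn.execute("INSERT INTO inventory VALUES (?, ?)", ("milk", 10))
    conn.execute("INSERT INTO inventory VALUES (?, ?)", ("bread", 5))

# Update: a sale reduces the stock of a single item.
with conn:
    conn.execute(
        "UPDATE inventory SET quantity = quantity - 1 WHERE item = ?", ("milk",)
    )

# Retrieve: look up the current state of a single row.
milk_quantity = conn.execute(
    "SELECT quantity FROM inventory WHERE item = ?", ("milk",)
).fetchone()[0]

conn.close()
```

Each operation touches only one or two rows and completes quickly, which is typical of transaction-oriented workloads.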

 

3. A data warehouse deals with which type of data that is never found in the operational environment?

a) Normalized

b) Informal

c) Summarized

d) Denormalized

Answer: (c) Summarized

A data warehouse handles summarized (aggregated) data derived from OLTP systems.

A data warehouse is a relational database that is designed for query and analysis rather than for transaction processing. It usually contains historical data derived from transaction data.

Data warehouses are large databases that are specifically designed for OLAP and business analytics workloads.

As defined by Ralph Kimball, a data warehouse is “a copy of transaction data specifically structured for query and analysis.”
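As a rough illustration of how summarized data is derived from transaction data, the sketch below rolls individual sales up into monthly totals per product. The rows, the `(date, product, amount)` layout, and the function name `summarize_by_month` are hypothetical and chosen only for the example.

```python
from collections import defaultdict

# Hypothetical transaction-level rows as an OLTP system might record them:
# (date, product, amount)
transactions = [
    ("2020-09-28", "milk", 2.50),
    ("2020-09-30", "bread", 1.75),
    ("2020-10-01", "milk", 2.50),
    ("2020-10-05", "milk", 5.00),
]


def summarize_by_month(rows):
    """Aggregate transaction amounts into totals per (month, product)."""
    totals = defaultdict(float)
    for date, product, amount in rows:
        month = date[:7]  # "YYYY-MM" prefix of an ISO date string
        totals[(month, product)] += amount
    return dict(totals)


monthly_sales = summarize_by_month(transactions)
```

The operational system keeps the four individual rows, while the summary holds one total per month and product, which is the form that analytical queries typically work with.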

 

4. Classification is a data mining task that maps the data into _________ .

a) predefined group

b) real valued prediction variable

c) time series

d) clusters

Answer: (a) predefined group

Classification is a data mining function that assigns items in a collection to target categories or classes that are predefined. The goal of classification is to accurately predict the target class for each case in the data. For example, a classification model could be used to identify loan applicants as low, medium, or high credit risks.

k-nearest neighbor (k-NN), naïve Bayes and support vector machine (SVM) are a few of the classification algorithms.
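To make the idea of mapping data into predefined groups concrete, here is a small k-nearest neighbor sketch for the loan applicant example. The applicant features (income in thousands, debt ratio in percent), the data points, and the function name `knn_classify` are assumptions made for illustration; the code needs Python 3.8 or later for `math.dist`.

```python
import math
from collections import Counter


def knn_classify(training_points, query, k=3):
    """Predict the majority label among the k training points nearest to query."""
    by_distance = sorted(
        training_points, key=lambda item: math.dist(item[0], query)
    )
    nearest_labels = [label for _, label in by_distance[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical applicants: (income in thousands, debt ratio in percent) -> risk
applicants = [
    ((20, 60), "high"),
    ((25, 55), "high"),
    ((50, 35), "medium"),
    ((55, 30), "medium"),
    ((90, 10), "low"),
    ((95, 15), "low"),
]

predicted = knn_classify(applicants, (88, 12))
```

The set of possible outputs ("low", "medium", "high") is fixed in advance by the labeled training data, which is what distinguishes classification from clustering.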

 

5. Which of the following clustering techniques starts with as many clusters as there are records or observations, with each cluster initially containing only one observation?

a) Agglomerative clustering

b) Fuzzy clustering

c) Divisive clustering

d) Model-based clustering

Answer: (a) Agglomerative clustering

This is a "bottom-up" approach: each observation starts in its own cluster, and pairs of clusters are merged as one moves up the hierarchy.

Agglomerative clustering starts with single-object clusters (singletons) and proceeds by progressively merging the most similar clusters until a stopping criterion (which could be a predefined number of groups k) is reached. In some cases, the procedure ends only when all the clusters have been merged into a single one, which is appropriate when one aims to investigate the overall granularity of the data structure.
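The bottom-up procedure can be sketched as follows for one-dimensional data. This is a simplified illustration, not an efficient implementation: the choice of single linkage (distance between the closest members of two clusters) as the similarity measure, the sample points, and the function name `agglomerative_clustering` are assumptions made for the example.

```python
def agglomerative_clustering(points, k):
    """Merge the two closest clusters repeatedly until only k clusters remain."""
    clusters = [[p] for p in points]  # every observation starts as a singleton

    def single_link(a, b):
        # Single linkage: distance between the closest pair of members.
        return min(abs(x - y) for x in a for y in b)

    while len(clusters) > k:
        # Find the pair of clusters (i < j) with the smallest linkage distance.
        i, j = min(
            (
                (i, j)
                for i in range(len(clusters))
                for j in range(i + 1, len(clusters))
            ),
            key=lambda pair: single_link(clusters[pair[0]], clusters[pair[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


groups = agglomerative_clustering([1.0, 1.5, 5.0, 5.2, 9.0], k=3)
```

Here the stopping criterion is the predefined number of groups k; calling the function with `k=1` continues merging until a single cluster contains every observation.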


 
