Information Retrieval MCQs - SET 05
1. Which of the following is a disadvantage of the click relevance feedback method?
a) Easy availability
b) Less noisy
c) Very noisy
d) Very expensive to obtain
Answer: (c)
Very noisy
Click relevance feedback (implicit feedback) is considered noisy for one or more of the following reasons:
A user clicks on a result after reading only the snippet presented, so the click reflects the snippet rather than the document itself.
A user may click on the top k results simply because they trust that the top results will satisfy their information need.
A user may get side-tracked and click a result simply because it looks interesting, even though it is not relevant to the current information need.
A user may skip possibly relevant results simply because their sources are not as authoritative as the ones the user already knows.
2. Suppose that the number of documents in a corpus is M, the average length of a document (in words) is N, the size of the vocabulary (number of unique keywords in the corpus) is V, the average length of a query is Q, and the average number of documents in which a query word appears is D. What is the time complexity of query processing with an inverted index in vector space information retrieval?
a) V * M
b) V * D
c) Q * D
d) Q * N
Answer: (c)
Q * D
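An inverted index gives fast access to the documents that contain a given term. With an inverted index, for each of the Q query keywords we only need to look at the documents that contain that keyword, which is D documents on average. Hence, the time complexity is O(Q * D).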
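The Q * D bound can be seen by counting postings visited. The following is a minimal sketch, not a full vector space implementation: the function names build_inverted_index and score_query, the toy documents, and the use of raw term frequency as the weight are all assumptions made for illustration.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each term to a postings dict {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in enumerate(docs):
        for term in text.lower().split():
            index[term][doc_id] = index[term].get(doc_id, 0) + 1
    return index

def score_query(index, query):
    """Walk one postings list per query term and accumulate scores.

    Returns (scores, visited), where visited counts the postings touched.
    Raw term frequency stands in for a real term weight (an assumption).
    """
    scores = defaultdict(int)
    visited = 0
    for term in query.lower().split():                 # Q query terms
        for doc_id, tf in index.get(term, {}).items():  # about D postings each
            scores[doc_id] += tf
            visited += 1
    return dict(scores), visited

docs = [
    "inverted index speeds up retrieval",
    "vector space retrieval uses term weights",
    "stemming changes recall and precision",
    "an index maps each term to its documents",
]
index = build_inverted_index(docs)
scores, visited = score_query(index, "index retrieval")
print(scores)   # documents 0, 1 and 3 receive a score; document 2 is never touched
print(visited)  # 4 postings: Q = 2 query terms, D = 2 documents per term
```

Documents that contain none of the query terms are never examined, which is why neither M nor V appears in the bound.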
3. Which of the following is true about stemming?
a) It increases the recall and reduces the precision
b) It increases the precision and reduces the recall
c) Recall and precision are equal if you use stemming
d) None of the above
Answer: (a)
It increases the recall and reduces the precision
Stemming is a rule-based process of reducing inflected words to their root word or stem. It provides a way of matching morphological variants of search terms. If we apply stemming to a word in a user query, it may group words with different meanings together. For example, the words ‘clip’ (a metal holder) and ‘clipping’ (a small piece trimmed from something) are both stemmed to ‘clip’. Hence, the query matches more documents than expected: documents that contain ‘clip’ and documents that contain ‘clipping’ are both included in the result. This increases the recall and reduces the precision.
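The effect on the two measures can be checked on a tiny example. This is a sketch under stated assumptions: toy_stem is a hypothetical suffix stripper written for this illustration (it is not the Porter algorithm), and the three documents and the relevance judgments are made up.

```python
def toy_stem(word):
    """Toy suffix stripper for illustration only; not a real stemmer."""
    for suffix in ("ing", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            word = word[: -len(suffix)]
            break
    # Undo consonant doubling, e.g. "clipp" -> "clip".
    if len(word) > 2 and word[-1] == word[-2] and word[-1] not in "aeiou":
        word = word[:-1]
    return word

def search(query, docs, stem):
    """Return ids of documents containing the (optionally stemmed) query word."""
    norm = toy_stem if stem else (lambda w: w)
    q = norm(query)
    return {i for i, text in enumerate(docs) if q in {norm(w) for w in text.split()}}

def precision_recall(retrieved, relevant):
    hits = len(retrieved & relevant)
    return hits / len(retrieved), hits / len(relevant)

docs = [
    "use a paper clip to hold the pages",       # relevant: the metal holder
    "a newspaper clipping about the election",  # not relevant: a trimmed piece
    "metal clips for documents",                # relevant: the metal holder
]
relevant = {0, 2}  # assumed judgments for the query "clip" (the holder)

exact = search("clip", docs, stem=False)
stemmed = search("clip", docs, stem=True)
print(exact, precision_recall(exact, relevant))      # only document 0 is found
print(stemmed, precision_recall(stemmed, relevant))  # all three documents are found
```

Without stemming the query misses ‘clips’, so recall is 1/2 with precision 1. With stemming recall rises to 1, but the ‘clipping’ document is also retrieved and precision drops to 2/3.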
4. The TF-IDF weight of a term t will be ______ when t occurs many times within a small number of documents.
a) Lowest
b) Highest
c) Cannot determine
d) Lower
Answer: (b)
Highest
If the word is rare in the corpus but frequent within a document, both factors of the weight are large.
The TF-IDF weight of a term t is calculated by multiplying the Term Frequency and the Inverse Document Frequency as follows:
tf(t,d) * idf(t,D) = (frequency of term t in document d) * log((total number of documents in the corpus) / (number of documents containing the term t))
If the term t appears in only a small number of documents, the ratio inside the logarithm is large, so the idf value is high. Since t also occurs many times within those documents, tf is high as well. Hence, the tf-idf weight is highest.
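The formula can be tried on a small corpus. This is a minimal sketch that assumes raw counts for tf and the natural logarithm for idf; the function name tf_idf and the four tokenized documents are made up for illustration.

```python
import math

def tf_idf(term, doc, corpus):
    """tf(t, d) * idf(t, D) with raw-count tf and natural-log idf (assumed variant)."""
    tf = doc.count(term)
    df = sum(1 for d in corpus if term in d)  # documents containing the term
    if df == 0:
        return 0.0
    return tf * math.log(len(corpus) / df)

# Each document is a list of tokens.
corpus = [
    ["the", "stemmer", "stemmer", "stemmer", "retrieval"],
    ["the", "index", "the", "query"],
    ["the", "recall", "measure"],
    ["the", "precision", "the"],
]

rare = tf_idf("stemmer", corpus[0], corpus)  # tf = 3, df = 1: 3 * log(4 / 1)
common = tf_idf("the", corpus[0], corpus)    # tf = 1, df = 4: 1 * log(4 / 4)
print(rare, common)
```

The term that occurs three times in a single document gets the largest weight in this corpus, while the term that occurs in every document gets a weight of 0.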
5. The TF-IDF weight of a term t will
be ______ when t occurs in virtually all documents.
a) Lowest
b) Highest
c) Cannot determine
d) Lower
Answer: (a)
Lowest
If the word is very common and appears in nearly all documents, the ratio inside the logarithm approaches 1, so the idf value, and with it the tf-idf score, approaches 0.
Please refer to the answer to question 4.
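The limiting case can be written out with the formula from question 4. The symbols |D| (total number of documents in the corpus) and df(t) (number of documents containing t) are shorthand introduced here for illustration:

```latex
\mathrm{idf}(t, D) = \log\frac{|D|}{\mathrm{df}(t)},
\qquad
\mathrm{df}(t) \to |D|
\;\Longrightarrow\;
\mathrm{idf}(t, D) \to \log 1 = 0
\;\Longrightarrow\;
\mathrm{tf}(t, d)\cdot\mathrm{idf}(t, D) \to 0 .
```

Because idf cannot be negative when df(t) is at most |D|, 0 is the lowest value the weight can take, whatever the term frequency.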