Information Retrieval MCQs - SET 03
1. A data structure that maps terms
back to the parts of a document in which they appear is called
a) Lexicon
b) Dictionary
c) Inverted index
d) All of the above
Answer: (c)
Inverted index
An inverted index
(also referred to as a postings file or inverted file) is a database index
storing a mapping from content, such as words or numbers, to its locations in
a table, or in a document or a set of documents (named in contrast to a
forward index, which maps from documents to content). The purpose of an
inverted index is to allow fast full-text searches, at a cost of increased
processing when a document is added to the database.
[Source: Wikipedia]
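As a minimal sketch of the idea, assuming each document is a plain string keyed by an integer ID and that terms are lowercased, whitespace-separated words. The function name `build_inverted_index` and the sample documents are made up for illustration:

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to the sorted list of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        # Assumption: naive tokenization by lowercasing and splitting on whitespace.
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


# Hypothetical toy collection.
docs = {
    1: "information retrieval systems",
    2: "retrieval of documents",
    3: "inverted index for retrieval",
}

index = build_inverted_index(docs)
print(index["retrieval"])  # [1, 2, 3]
print(index["index"])      # [3]
```

A lookup is now a single dictionary access on the term, while adding a document requires updating the entry of every term it contains. This is the trade-off described above.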
2. How can the information retrieval
problem be defined formally?
a) a triple
b) a quadruple
c) a couple
d) None of the above
Answer: (b)
a quadruple (4-tuple)
An IR model can be
defined as a 4-tuple [D, Q, F, R(q_i, d_j)], where D refers to the collection of
documents, Q refers to the collection of queries, F refers to the framework for
modeling documents and queries, and R refers to the ranking function that
assigns a rank (a real-valued score) to a query q_i and a document d_j.
3. The count of occurrences of a word
in a document is referred to as
a) document frequency
b) term frequency
c) collection frequency
d) change frequency
Answer: (b)
term frequency
How many times a
term occurs in a document is called the term frequency (TF). It is the count
of occurrence of a term t in a document d.
For example, in
the paragraph above, the term frequency of “occurrence” is
1 and that of “document” is 2.
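The count can be reproduced with a short sketch. It assumes a simple tokenizer that lowercases the text and keeps runs of letters only, with no stemming, so “occurs” and “occurrence” are different terms. The helper name `term_frequencies` is illustrative:

```python
import re
from collections import Counter


def term_frequencies(text):
    """Count how many times each term occurs in a single document."""
    # Assumption: a term is any run of ASCII letters, compared case-insensitively.
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(tokens)


paragraph = (
    "How many times a term occurs in a document is called the term "
    "frequency (TF). It is the count of occurrence of a term t in a "
    "document d."
)

tf = term_frequencies(paragraph)
print(tf["occurrence"])  # 1
print(tf["document"])    # 2
```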
4. Suppose the frequency of the most
frequent word in a corpus of Tamil documents is 10000. What would be the
estimated frequency of the second most frequent word in the given corpus as per Zipf’s
law?
a) 10000
b) 2500
c) 5000
d) Cannot be determined
Answer: (c)
5000
Frequency of the second most frequent word = frequency of
the most frequent word / 2
= 10000/2 = 5000
As per Zipf’s
law, the frequency of a word is inversely proportional to its rank.
In simple terms, a word of rank r occurs about 1/r times as often as the most
frequent word. That is, the rank 2 word occurs 1/2 as often as the most frequent
word, the rank 3 word occurs 1/3 as often, and so on.
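The same estimate can be written as a small sketch, assuming the idealized form of Zipf's law used above (frequency at rank r equals the top frequency divided by r). The function name `zipf_estimate` is made up for illustration:

```python
def zipf_estimate(top_frequency, rank):
    """Estimate the frequency of the word at a given rank (1 = most frequent)."""
    if rank < 1:
        raise ValueError("rank must be 1 or greater")
    # Idealized Zipf's law: frequency is inversely proportional to rank.
    return top_frequency / rank


second = zipf_estimate(10000, 2)
print(second)  # 5000.0

# Estimated frequencies for the first few ranks.
for rank in (1, 2, 3, 4):
    print(rank, zipf_estimate(10000, rank))
```

Real corpora follow this relation only approximately, so the values are estimates rather than exact counts.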
5. The proportion of non-relevant
items that have been retrieved in a given search is called
a) Precision
b) Recall
c) Generality
d) Fallout
Answer: (d)
Fallout
Fallout ratio
refers to the proportion of the non-relevant documents in the collection that are
retrieved. It is used to measure how well the IR system filters out non-relevant
documents.
For a good
information retrieval system, the fallout ratio should be low.
If N is the total
number of documents in the collection, y is the number of non-relevant
documents retrieved, and x is the number of relevant documents in the collection,
then N - x is the number of non-relevant documents and the fallout
ratio F can be calculated as follows:
F = y / (N - x)
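A minimal sketch of the formula, using the same quantities N, x, and y. The function name `fallout` and the sample numbers are hypothetical, chosen only to illustrate the arithmetic:

```python
def fallout(total_docs, relevant_docs, nonrelevant_retrieved):
    """Fallout F = y / (N - x): share of non-relevant documents that were retrieved."""
    nonrelevant_total = total_docs - relevant_docs  # N - x
    if nonrelevant_total <= 0:
        raise ValueError("the collection must contain at least one non-relevant document")
    return nonrelevant_retrieved / nonrelevant_total


# Hypothetical search: N = 1000 documents, x = 100 relevant,
# y = 45 non-relevant documents retrieved.
f = fallout(1000, 100, 45)
print(f)  # 45 / 900 = 0.05
```

In this example only 5% of the non-relevant documents reached the result list. Lower values mean better filtering.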