Multiple Choice Questions in NLP
1. Assume that the words ‘study’, ‘computer’, and ‘abroad’ are the only informative words for classifying whether a mail is spam or not. Which of the following represents the add-one smoothed estimate of P(study|spam)? Use the following table to answer the question:
‘study’ | ‘computer’ | ‘abroad’ | Class
1       | 0          | 0        | Not spam
0       | 1          | 1        | Not spam
1       | 0          | 0        | Not spam
1       | 1          | 0        | Not spam
0       | 0          | 0        | Spam
0       | 0          | 0        | Spam
0       | 0          | 1        | Spam
0       | 1          | 0        | Spam
0       | 0          | 0        | Spam
0       | 0          | 0        | Spam
a) 0/6
b) 0/8
c) 1/6
d) 1/8
Answer: (d) 1/8
Add-1 smoothing (Laplace smoothing) can be calculated using the following equation:

P(w | c) = (count(w, c) + 1) / (N + V)

For this question, count(study, spam) = 0, N = 6 (the number of ‘Spam’ instances), and V = 2 (the number of values the binary feature can take: present or absent). Therefore, P(study|spam) = (0 + 1) / (6 + 2) = 1/8.
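As a quick check, here is a minimal Python sketch (my own, not part of the original post) that reproduces this calculation; the tuple encoding of the table and the helper name p_add1 are assumptions made for illustration.

# Rows of the table from question 1: (study, computer, abroad, class)
rows = [
    (1, 0, 0, "not spam"), (0, 1, 1, "not spam"),
    (1, 0, 0, "not spam"), (1, 1, 0, "not spam"),
    (0, 0, 0, "spam"),     (0, 0, 0, "spam"),
    (0, 0, 1, "spam"),     (0, 1, 0, "spam"),
    (0, 0, 0, "spam"),     (0, 0, 0, "spam"),
]

def p_add1(feature_index, cls):
    """Add-1 smoothed estimate of P(feature = 1 | cls)."""
    in_class = [r for r in rows if r[3] == cls]
    count = sum(r[feature_index] for r in in_class)  # occurrences of the word in cls
    n = len(in_class)  # N: number of instances of cls (6 for spam)
    v = 2              # V: a binary feature can take 2 possible values
    return (count + 1) / (n + v)

print(p_add1(0, "spam"))  # (0 + 1) / (6 + 2) = 0.125 = 1/8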
2. What is the probability P(‘computer in abroad’ | spam) as per the data in the table given in question 1?
a) 1/6
b) 2/6
c) 1/36
d) 1/18
Answer: (c) 1/36
To estimate the probability of a set of features given a class, we use the conditional independence assumption of the Naïve Bayes classifier:

P(A1, A2, …, An | Cj) = P(A1 | Cj) × P(A2 | Cj) × … × P(An | Cj)

In our problem:
P(computer|spam) = count(computer = 1 under spam) / total spam instances = 1/6
P(abroad|spam) = count(abroad = 1 under spam) / total spam instances = 1/6
P(‘computer in abroad’|spam) = P(computer|spam) × P(abroad|spam) = 1/6 × 1/6 = 1/36

The word ‘in’ is ignored because, as per the question, only ‘study’, ‘computer’, and ‘abroad’ are informative for classification.
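For readers who want to verify this numerically, here is a short Python sketch (again my own illustration, reusing the same assumed tuple encoding of the table as in question 1):

# Same table as question 1: (study, computer, abroad, class)
rows = [
    (1, 0, 0, "not spam"), (0, 1, 1, "not spam"),
    (1, 0, 0, "not spam"), (1, 1, 0, "not spam"),
    (0, 0, 0, "spam"),     (0, 0, 0, "spam"),
    (0, 0, 1, "spam"),     (0, 1, 0, "spam"),
    (0, 0, 0, "spam"),     (0, 0, 0, "spam"),
]

def p_mle(feature_index, cls):
    """Unsmoothed MLE of P(feature = 1 | cls)."""
    in_class = [r for r in rows if r[3] == cls]
    return sum(r[feature_index] for r in in_class) / len(in_class)

# Naive Bayes conditional independence: multiply the per-word likelihoods.
p = p_mle(1, "spam") * p_mle(2, "spam")  # P(computer|spam) * P(abroad|spam)
print(p)  # (1/6) * (1/6) = 1/36 ≈ 0.0278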
3. What is the unsmoothed maximum likelihood estimate of P(Spam) for the data given in question 1?
a) 1
b) 6/10
c) 4/6
d) 3/5
Answer: (b) 6/10 and (d) 3/5 (the two fractions are equal, so both options are correct)
Unsmoothed MLE of P(Spam):

P(Spam) = (number of times the ‘Spam’ class appears in the dataset) / (total number of instances) = 6/10

‘Unsmoothed’ denotes the use of raw counts without adjustment; as a consequence, a word that never occurs with a class gets a zero count and hence a zero probability.
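A small sanity check in Python (my own sketch; the labels list simply re-encodes the Class column of the table):

from collections import Counter

# Class column from the table in question 1.
labels = ["not spam"] * 4 + ["spam"] * 6

counts = Counter(labels)
p_spam = counts["spam"] / len(labels)  # class count / total number of instances
print(p_spam)  # 6/10 = 0.6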
4. Which of the following increases the weight of rarely occurring terms in the document set?
a) Term frequency
b) Word frequency
c) Inverse document frequency
d) Bi-gram frequency
Answer: (c) Inverse document frequency (IDF)
Inverse Document Frequency (IDF)

IDF is a statistical weight used for measuring the importance of a term in a collection of text documents. It measures how much information the word provides, i.e., whether it is common or rare across all documents. IDF is calculated as follows:

idf_t = log(N / df_t)

In this equation, N is the total number of documents in the collection and df_t is the number of documents in which the term t appears.

The IDF of a rare term is high, whereas the IDF of a frequent term is likely to be low. In the TF-IDF calculation, the idea is to scale down the tf weight of a term by a factor that grows with its collection frequency.
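To make the formula concrete, here is a small Python sketch over a hypothetical three-document collection (the documents and the idf helper are my own illustration; the natural logarithm is used here, although some definitions use base 10):

import math

# A hypothetical toy collection, only to illustrate idf_t = log(N / df_t).
docs = [
    "the cat sat on the mat",
    "the dog barked",
    "quantum entanglement is rare",
]

def idf(term):
    n = len(docs)                                  # N: total number of documents
    df = sum(term in doc.split() for doc in docs)  # df_t: documents containing term
    return math.log(n / df)                        # assumes the term occurs at least once

print(idf("the"))      # log(3/2) ≈ 0.41 -> frequent term, low IDF
print(idf("quantum"))  # log(3/1) ≈ 1.10 -> rare term, high IDF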
[For more, you can refer to the Wikipedia article on tf-idf.]
5. The act of converting a text document into a set of individual words is referred to as ______.
a) Tokenization
b) Stemming
c) Lemmatization
d) All of the above
Answer: (a) Tokenization
Tokenization is the process of breaking up a given text into units called tokens. Tokens can be individual words, phrases, or even whole sentences.

Given a character sequence and a defined document unit, tokenization is the task of chopping it up into pieces, called tokens, perhaps at the same time throwing away certain characters, such as punctuation. For example, tokenizing the sentence

“How are you?”

produces the tokens ‘How’, ‘are’, ‘you’.
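As an illustration, here is a minimal regex-based tokenizer in Python (my own simplified sketch; production toolkits such as NLTK or spaCy provide far more robust tokenizers):

import re

def tokenize(text):
    """Split a text into word tokens, discarding punctuation."""
    return re.findall(r"[A-Za-z]+", text)

print(tokenize("How are you?"))  # ['How', 'are', 'you']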