Multiple Choice Questions in NLP - SET 19
1. Which of the following is not a problem when using Maximum Likelihood Estimation to obtain parameters in a language model?
a) Unreliable estimates where there is little training data
b) Out-of-vocabulary terms
c) Overfitting
d) Smoothing
Answer: (d) Smoothing. Options (a) to (c) are genuine problems when we use MLE in a language model: relative frequency estimation assigns all probability mass to events seen in the training corpus, so unseen events receive zero probability. Smoothing is not a problem but a remedy for these problems: it adjusts the maximum likelihood estimates to reserve some probability mass for unseen events, producing more accurate probabilities.
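To see the zero-probability problem concretely, here is a minimal sketch of MLE (relative frequency) unigram estimation on a toy corpus; the corpus and the word "dog" are illustrative assumptions, not from the question.

```python
from collections import Counter

# Toy corpus; MLE assigns P(w) = count(w) / N.
tokens = "the cat sat on the mat".split()
counts = Counter(tokens)
N = len(tokens)  # total number of tokens

def mle_prob(word):
    # Relative frequency estimate: all probability mass goes to seen words.
    # Counter returns 0 for unseen words, so they get probability 0.
    return counts[word] / N

print(mle_prob("the"))  # 2/6
print(mle_prob("dog"))  # 0.0 -- an out-of-vocabulary word gets zero probability
```

Because "dog" never occurs in training, MLE gives it probability 0, which makes any sentence containing it impossible under the model; this is exactly the problem smoothing addresses.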
2. Which of the following is the main advantage of neural transition-based dependency parsers over non-neural transition-based dependency parsers?
a) It chooses transitions using more words in the stack and buffer
b) It generates a larger class of dependency parses
c) It relies on dense feature representations
d) It models a grammar whereas traditional parsers do not
Answer: (c) It relies on dense feature representations. The main advantage of neural dependency parsers is that they use a dense feature representation instead of a sparse one. Neural and traditional transition-based parsers do not differ in what input information they can use or in what kinds of parses they can output (both can output any parse); they differ in how they represent their features. [Stanford question] Dependency parsing is the process of analyzing the grammatical structure of a sentence based on the dependencies between the words in the sentence.
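The sparse-versus-dense distinction can be sketched as follows; the tiny vocabulary and the hand-picked embedding values are purely illustrative assumptions (in a real neural parser the embeddings are learned during training).

```python
# Sparse (traditional): one indicator feature per vocabulary word, almost all zeros.
vocab = ["the", "cat", "sat", "on", "mat"]

def one_hot(word):
    # Vector length grows with vocabulary size; exactly one nonzero entry.
    return [1.0 if w == word else 0.0 for w in vocab]

# Dense (neural): a short real-valued vector per word, fixed dimensionality.
# These values are made up for illustration; real parsers learn them.
embedding = {
    "the": [0.20, -0.10, 0.50],
    "cat": [0.70, 0.30, -0.20],
    "sat": [-0.40, 0.60, 0.10],
    "on":  [0.05, -0.50, 0.30],
    "mat": [0.60, 0.20, -0.10],
}

print(one_hot("cat"))    # grows with |V|, mostly zeros
print(embedding["cat"])  # small fixed size regardless of |V|
```

The dense vectors let similar words share statistical strength and keep the feature dimensionality small even for large vocabularies, which is what question 2's answer refers to.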
3. Which of the following equations is used to find the unigram probabilities using Add-1 smoothing?
a) Count (wi)/N
b) Count (wi)/(N+1)
c) (Count (wi)+1)/(N+1)
d) (Count (wi)+1)/(N+V)
Answer: (d) (Count(wi)+1)/(N+V). Smoothing adjusts the maximum likelihood estimates to handle unseen words. In Add-1 (Laplace) smoothing, we add 1 to the count of every word before normalizing into probabilities. Since we add 1 to each of the V unigram counts, the denominator must grow by V to keep the probabilities summing to 1. Hence V, the vocabulary size, is added to N in the denominator.
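A minimal sketch of the Add-1 formula on a toy corpus (the corpus and the probe word "dog" are illustrative assumptions):

```python
from collections import Counter

tokens = "the cat sat on the mat".split()
counts = Counter(tokens)
N = len(tokens)   # total tokens = 6
V = len(counts)   # vocabulary size (unique words) = 5

def laplace_prob(word):
    # Add-1 (Laplace) smoothing: (count(w) + 1) / (N + V)
    return (counts[word] + 1) / (N + V)

print(laplace_prob("the"))  # (2+1)/(6+5) = 3/11
print(laplace_prob("dog"))  # (0+1)/(6+5) = 1/11 -- nonzero even though unseen
```

Unlike the unsmoothed MLE estimate, an unseen word now receives a small nonzero probability, and adding V to the denominator keeps the probabilities over the vocabulary summing to 1.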