Monday, June 14, 2021

Calculate the TF-IDF weight for terms of a given document

How to calculate the TF-IDF (term frequency-inverse document frequency) of the terms of a given document within a collection of documents, and how TF-IDF indicates the importance of a term in a document.

Question:

Given a document X containing terms t1, t2 and t3 with frequencies (inside brackets) as follows:

t1(3), t2(2), t3(1)

Let us assume that the collection contains 10,000 documents and that the document frequencies of these terms are as follows:

t1(50), t2(1300), t3(250)

Then, find the TF-IDF weight of terms t1, t2, and t3 in the document X.

 

Solution:

TF-IDF (Term Frequency-Inverse Document Frequency) is a measure of how relevant a term is to a given document within a collection of documents.

TFt,d measures how often a term t occurs in a document d, relative to the length of d. It can be calculated as follows:

TFt,d = (number of times t occurs in d)/(number of words in d)

For example, if the document D1 contains the term ‘quick’ 10 times, and it has 54 words in it, then the TF’quick’, D1 = 10/54 = 0.19.

DFt refers to the number of documents in which the term t occurs.

For example, if 120 documents contain the word ‘quick’, then the DF’quick’ = 120.

IDFt is an inverse measure of how informative the term t is, that is, how common or rare it is across the entire document set. The closer IDFt is to 0, the more common the term. It can be calculated as follows:

IDFt = log(N/DFt)

Here, N is the number of documents in the given collection, and DFt is the document frequency of term t. The worked values in this solution correspond to the natural logarithm.
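As a quick check of the IDF step, here is a minimal Python sketch. It assumes the natural logarithm and uses the collection size and document frequencies from the question. The function name idf is illustrative only.

```python
import math


def idf(num_docs: int, doc_freq: int) -> float:
    """Inverse document frequency, log(N / DF), using the natural logarithm."""
    return math.log(num_docs / doc_freq)


N = 10_000  # number of documents in the collection (from the question)
doc_freqs = {"t1": 50, "t2": 1300, "t3": 250}  # document frequencies (from the question)

for term, df in doc_freqs.items():
    print(f"IDF for {term} = {idf(N, df):.2f}")
```

The rarer a term is in the collection, the larger its IDF: t1 (50 documents) scores highest and t2 (1300 documents) lowest.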

The TF-IDF weight of a term is the product of its TF weight and its IDF weight.

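Assuming X contains no terms other than t1, t2 and t3, the number of words in X is 3 + 2 + 1 = 6. The logarithm used is the natural logarithm.

TF-IDF for term t1:

TFt1 = (number of times t1 occurs in X)/(number of words in X) = 3/6 = 0.5

IDFt1 = log(No. of docs in the collection/No. of docs in which t1 appears) = log(10000/50) = 5.30

TF-IDF for t1 = 0.5 × 5.30 = 2.65

 

TF-IDF for term t2:

TFt2 = 2/6 = 0.33

IDFt2 = log(10000/1300) = 2.04

TF-IDF for t2 = (2/6) × 2.04 = 0.68

 

TF-IDF for term t3:

TFt3 = 1/6 = 0.17

IDFt3 = log(10000/250) = 3.69

TF-IDF for t3 = (1/6) × 3.69 ≈ 0.61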
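The whole calculation can also be sketched in a few lines of Python. This is only an illustration under stated assumptions: X contains no terms other than the three listed, so its length is 3 + 2 + 1 = 6 words; TF is the term count divided by that length, as in the TF definition given earlier; and the logarithm is natural. The function name tf_idf and the dictionary layout are hypothetical.

```python
import math


def tf_idf(term_counts: dict, doc_freqs: dict, num_docs: int) -> dict:
    """Return the TF-IDF weight of every term in one document.

    TF  = term count / total number of words in the document
    IDF = ln(number of documents / document frequency of the term)
    """
    # Assumption: the document consists only of the terms in term_counts.
    doc_length = sum(term_counts.values())
    weights = {}
    for term, count in term_counts.items():
        tf = count / doc_length
        idf = math.log(num_docs / doc_freqs[term])
        weights[term] = tf * idf
    return weights


weights = tf_idf(
    term_counts={"t1": 3, "t2": 2, "t3": 1},      # frequencies in document X
    doc_freqs={"t1": 50, "t2": 1300, "t3": 250},  # document frequencies
    num_docs=10_000,                              # size of the collection
)
for term, weight in weights.items():
    print(f"TF-IDF for {term} = {weight:.2f}")
```

With these inputs, t1 receives the largest weight because it is both the most frequent term in X and the rarest in the collection.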

 

******************

 


