Monday, June 14, 2021

Calculate the TF-IDF weight for terms of a given document

Find the TF-IDF of terms of a given document and a collection of documents, how to calculate tf-idf, the use of tf-idf in finding the importance of a term in a document, term frequency-inverse document frequency

Question:

Given a document X containing terms t1, t2 and t3 with the following frequencies (in brackets):

t1(3), t2(2), t3(1)

Assume that the collection contains 10,000 documents and that the document frequencies of these terms are as follows:

t1(50), t2(1300), t3(250)

Then, find the TF-IDF weight of terms t1, t2, and t3 in the document X.

 

Solution:

TF-IDF (Term Frequency-Inverse Document Frequency) is a measure of how relevant a term is to a given document within a collection of documents.

TFt,d measures how often a term t occurs in a document d, relative to the length of d. It can be calculated as follows:

TFt,d = (number of times t occurs in d)/(number of words in d)

For example, if the document D1 contains the term ‘quick’ 10 times and has 54 words in it, then TF‘quick’,D1 = 10/54 ≈ 0.19.
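The ‘quick’ example above can be reproduced with a few lines of Python. This is only a minimal sketch: the function name `term_frequency` and the placeholder token `other` are made up for illustration, and the document is assumed to be already split into words.

```python
from collections import Counter


def term_frequency(term, tokens):
    """Return the share of `tokens` that are equal to `term`."""
    if not tokens:
        return 0.0  # an empty document contains no terms at all
    return Counter(tokens)[term] / len(tokens)


# Hypothetical 54-word document in which 'quick' occurs 10 times.
doc_d1 = ["quick"] * 10 + ["other"] * 44

print(round(term_frequency("quick", doc_d1), 2))  # 0.19
```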

DFt is the number of documents in which the term t appears.

For example, if 120 documents contain the word ‘quick’, then DF‘quick’ = 120.

IDFt is an inverse measure of how informative the term t is, that is, how common or rare the term is across the entire document collection. The closer it is to 0, the more common the term. It can be calculated as follows:

IDFt = log(N/DFt)

Here, N is the number of documents in the given collection, and DFt is the document frequency of term t. The IDF values in this solution are computed with the natural logarithm.
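As a quick check, the IDF of the three terms from the question can be computed as follows. This is a sketch, not a reference implementation: the function name `inverse_document_frequency` is made up, and the natural logarithm is an assumption chosen because it matches the IDF values in this solution.

```python
import math


def inverse_document_frequency(n_docs, doc_freq):
    """Return log(N / DF) using the natural logarithm (assumed base)."""
    if doc_freq <= 0:
        raise ValueError("doc_freq must be positive")
    return math.log(n_docs / doc_freq)


N_DOCS = 10_000
doc_freqs = {"t1": 50, "t2": 1300, "t3": 250}  # values from the question

for term, df in doc_freqs.items():
    print(f"{term}: {inverse_document_frequency(N_DOCS, df):.2f}")
# t1: 5.30
# t2: 2.04
# t3: 3.69
```

A term that occurs in every document gets an IDF of log(1) = 0, which is why values close to 0 indicate common terms.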

The TF-IDF weight of a term is the product of its TF weight and its IDF weight.

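Document X contains 3 + 2 + 1 = 6 term occurrences in total, so the number of words in X is 6.

TF-IDF for term t1:

TFt1 = (number of times t1 occurs in X)/(number of words in X) = 3/6 = 0.5

IDFt1 = log(No. of docs in the collection/No. of docs in which t1 appears) = log(10000/50) = 5.30

TF-IDF for t1 = 0.5 × 5.30 = 2.65

 

TF-IDF for term t2:

TFt2 = 2/6

IDFt2 = log(10000/1300) = 2.04

TF-IDF for t2 = (2/6) × 2.04 = 0.68

 

TF-IDF for term t3:

TFt3 = 1/6

IDFt3 = log(10000/250) = 3.69

TF-IDF for t3 = (1/6) × 3.69 ≈ 0.61

Term t1 receives the highest weight: it is both the most frequent term in X and the rarest term in the collection.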
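The whole exercise can also be checked with a short Python sketch. It rests on two assumptions: the length of X is taken to be the total number of term occurrences (3 + 2 + 1 = 6), following the TF definition given earlier, and the logarithm is the natural logarithm. The function name `tf_idf_weights` is made up for illustration.

```python
import math

N_DOCS = 10_000
term_counts = {"t1": 3, "t2": 2, "t3": 1}       # frequencies in document X
doc_freqs = {"t1": 50, "t2": 1300, "t3": 250}   # document frequencies


def tf_idf_weights(term_counts, doc_freqs, n_docs):
    """Return {term: TF * IDF} for one document."""
    # Assumption: document length = sum of all term occurrences.
    doc_length = sum(term_counts.values())
    weights = {}
    for term, count in term_counts.items():
        tf = count / doc_length
        idf = math.log(n_docs / doc_freqs[term])  # natural log (assumed)
        weights[term] = tf * idf
    return weights


weights = tf_idf_weights(term_counts, doc_freqs, N_DOCS)

# Print the terms from highest to lowest weight.
ranked = sorted(weights, key=weights.get, reverse=True)
for term in ranked:
    print(f"{term}: {weights[term]:.2f}")
```

Whatever constant is used to normalize TF, the ranking of the terms stays the same (t1, then t2, then t3), because every weight is divided by the same document length.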

 

******************

 


