Saturday, February 20, 2016

Database Indexing - Solved Exercise set 2

Indexing and Tuning - solved exercises - Indexing examples - Tuning examples - How to choose an appropriate index?


Question:

Consider a relation Customer of a large super market chain with the relational schema Customer (CID, CName, City, PointsCollected). Here CID is the customer identification number that stores unique values.
The following query is the most frequent one;
SELECT * FROM Customer WHERE CID = v1;
Assume that the relation Customer does not have any indexes. We need to create necessary indexes. Which of the following indexes is best suited for executing the above query efficiently? Why?
(a) A sparse primary index on CID           (b) A dense primary index on CID

Solution:

Definition: Primary index can be sparse (only few values of indexing column of data file is stored sparsely in index table along with pointer in sorted order) or dense (all the values of indexing column of the data file is stored in index table along with pointer in sorted order).

The query given above is called point queries. Point query is searching for records that satisfies a condition of the form “attribute_name = value”.

To explain further, let us use the given query with a value as follows;

SELECT * FROM Customer WHERE CID = 123;

This query, according to the information provided in the question, will give exactly one record.

Among sparse index and dense index, the dense index is most suitable index for CID attribute. Let us discuss this with the above query.

The following will happen,

If we use sparse index – the system will look for the matching CID value in the index file. [As index file is always sorted, we can skip further scan once we reach the required record]. If we find the matching record,
1. we can read the pointer (address) stored adjacent and
2. locate the actual record (or the block where the record is stored) in the data file directly using the address.
If the matching value is not found in the index file (because it is sparse),
1. we have to find the largest CID value that is smaller than 123,
2. use the pointer of that largest CID value, and
3. locate the record (or the block where the record is stored) of largest CID in the data file directly using the address, and
4. perform linear scan until you reach the required value/record.

If we use dense index - the system will look for the matching CID value in the index file. [As index file is always sorted, we can skip further scan once we reach the required record]. If we find the matching record,
1. we can read the pointer (address) stored adjacent and
2. locate the actual record (or the block where the record is stored) in the data file directly using the address.

The difference between these two is, dense index stores all the values in index file so linear scan in index file is enough to find any record, whereas sparse index stores few records than dense index in the index file so linear scan is required both in index file and in data file (in many cases).

Hence, dense primary index is the best suited for executing the above query.










c

No comments:

Post a Comment

Featured Content

Multiple choice questions in Natural Language Processing Home

MCQ in Natural Language Processing, Quiz questions with answers in NLP, Top interview questions in NLP with answers Multiple Choice Que...

All time most popular contents