How an SQL Query is executed by a Database Management System?
Introduction:
In this post, I would like to discuss the Query Execution process in simple terms. The detailed diagram of Query Execution process is given in Figure 1. I have mentioned two main components in the figure, namely Component A (Primary Storage) and Component B (Secondary Storage). First of all, let us have an abstracted view on these components on their roles in the process of Query Execution.A - Primary Storage:
Primary
storage (it is your RAM) also called as Primary Memory, Main Memory, Memory,
etc. All of us know that the Operating System brings all the files needed to be
processed into main memory first. A DBMS too actually maintains database in
number of different files. A DBMS have many modules for managing data and
executing queries. Among them two modules are given in the Figure. They are
Query Processor and Storage Manager. These two in turn have many sub-modules in
them (they are not discussed here directly).
After receiving
the query, the Query Processor module does the following;
- It verifies the syntax of the query,
- It checks for the table names and columns specified in the query against in Data Dictionary,
- It generates various possible cost based plans for the given query,
- It chooses one among the generated plans which would be best and,
- It instructs the Buffer manager to bring the required data into the buffer for execution.
Upon receiving
the instruction from the programs that executes query in DBMS, the Buffer
Manager (which
is part of Storage Manager) does the following;
- It checks for the availability of the requested block in buffer,
- If the requested block is not in the buffer, then allocates the space in the buffer,
- It then locates the data block in the disk, and transfers to the buffer, and
- It forwards the address of the block in main memory to the program which initiated the request.
As a whole,
we can define the major roles of these two components as follows;
Query
Processor – identifies the cost effective plan to execute a query.
Buffer
Manager – tries to reduce the number disk I/O between disk and main memory.
B - Secondary Storage:
It is your
hard disk where all the files are stored. Basically, hard disk storage units
are divided into tracks and sectors physically. Operating systems treats the basic
data units as Block, which is collection of one or more contiguous sectors. And,
when transfers files from disk to main memory, operating systems handle them in
units of blocks. The data transfer rate of hard disk drive is several orders of
magnitude slower than that of the main memory. This is where buffers and Buffer
managers are helpful in handling the data very efficiently when they are
needed. Disk subsystem is the system which consists of disk controller and set
of disks. It handles disks for disk I/O.
With this introduction, let us discuss the process with an example query in the following section.
Let us now
discuss the control flow for executing the given query,
SELECT * FROM
Employee WHERE Emp_ID = ‘E101’;
Step 1: Query Input
The query is
given to the Query processor module through any one of the various SQL
interfaces (SQL tool, Application programs with embedded SQL, etc.)
Step 2: At Query Processor
Query
Processor verifies the syntax of the query.
Then it
checks for the correctness of the table name Employee and the attribute Emp_ID in
the Data Dictionary. Also, it checks for the user authentication. That is, if
the user is privileged to access the requested data.
It generates
various possible equivalent queries (relational algebra equivalents) using the
meta information (for example, size of the table, keys of the table, integrity
constraints, indexes on various fields, and much more). For our query, we could
write the following relational algebra expression;
σEmp_ID
= ‘E101’(Employee)
As mentioned
above, using the indexes we could generate many other equivalent queries.
Finally, it
chooses one expression as execution plan for the given query. Let us assume the
above relational algebra expression is chosen to execute the given query.
Now the
instruction is given to the Buffer Manager to fetch the data needed by the
query.
Step 3: At Buffer Manager
It checks for
the data needed in the buffer itself. If the requested data not available with
buffer, then it initiates a request to disk subsystem to fetch the requested
data.
As discussed
above, the requested data alone cannot be transferred, instead the whole block
where the data is available will be transferred. In our example, the block with
4 records are transferred by the disk subsystem to main memory, which will be
linear scanned to locate the actual record.
Selection query
is processed as discussed. If the query involves UPDATE operation, then the
other steps must be included. For example, let us consider the following query;
UPDATE
Employee SET Department = ‘Marketing’ WHERE Emp_ID = ‘E101’;
For this
query, fetching the required data from disk to main memory is done as discussed
above. But, to ensure the changes made to the data to be reflected in the
database permanently, we need to follow the other steps too as discussed below.
Step 4&5: Logging
If the query actually
changes the data (i.e, if the operation is UPDATE), logging the changes made to
the data is an important step. This ensures the safety of data from abnormal
loss of data.
If an entry
of changes is written into the Log cache (which is actually one of several
caches handled in Buffer), it also written into the Log File which is available
in the disk.
Step 6: Actual Updating of Data in disk
Once the data
is successfully written into the log file, it is time to make the actual
changes and write the updated record into disk. The modified data which is
available in Buffer and not yet written to disk is called Dirty pages. When a
successful log write is completed, then these dirty pages are written to disk
and treated as clean pages.
As discussed
above, one of the major goals of a database system is to minimize the number of
disk accesses. Hence, it is decided by the Buffer Manager to retain the data in
the buffer according to several algorithms.
No comments:
Post a Comment