Some Keywords in Parallel Database Management Systems
Coarse Grain Parallel Machine
A parallel machine architecture in which parallelism is achieved by
connecting a small number of powerful processors. Such a system is typically
used to execute many queries or transactions in parallel (Interquery
Parallelism).
Degree of Parallelism
The number of resources (say, processors) used to carry out a task in
parallel without degrading the performance of the parallel system.
Fine Grain Parallel Machine
A parallel machine architecture in which parallelism is achieved by
connecting a large number of simple processors. Such a system is used to
parallelize a single query and execute it in parallel (Intraquery Parallelism).
Typically, scientific queries or other complex queries are executed this way.
Hash Partitioning
In a parallel database, hash partitioning is one of the data partitioning
techniques (the others are Round-robin and Range partitioning) used to
distribute tuples (records) among the chosen disks. A hash function is defined
on the partitioning attribute or set of attributes, and each tuple is stored
on the disk selected by its hash value. The hash function should map every
tuple to one of the n disks, where n is finite, ideally spreading the tuples
evenly.
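The idea can be sketched in a few lines of Python. This is only an
illustration: the names hash_partition, rows, key and n_disks are hypothetical,
and zlib.crc32 is used as a stand-in hash function simply because it returns
the same value on every run.

```python
import zlib

def hash_partition(rows, key, n_disks):
    """Place each row on one of n_disks based on a hash of row[key]."""
    disks = [[] for _ in range(n_disks)]
    for row in rows:
        # Hash the partitioning attribute, then map the hash to a disk index.
        h = zlib.crc32(str(row[key]).encode("utf-8"))
        disks[h % n_disks].append(row)
    return disks

# Ten example records partitioned on "id" across three disks.
rows = [{"id": i, "name": f"emp{i}"} for i in range(10)]
disks = hash_partition(rows, "id", 3)
for i, d in enumerate(disks):
    print(f"disk {i}: {[r['id'] for r in d]}")
```

Because the disk is computed from the attribute value alone, a lookup on that
attribute only needs to visit one disk.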
Interference
Newly generated requests must compete with ongoing processes for shared
resources such as the system bus, disks, or locks held by those processes.
This competition can slow down processing overall, and is called
interference.
Linear Speed-up
If a system N times larger is used to process the requests previously
handled by a smaller system and the resulting speed-up is N, the speed-up is
called Linear Speed-up.
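As a small worked example, speed-up can be computed as the elapsed time on the
smaller system divided by the elapsed time on the larger one. The function
names and the timings below are made up for illustration.

```python
def speedup(time_small_system, time_large_system):
    """Ratio of elapsed times for the same task on the two systems."""
    return time_small_system / time_large_system

def is_linear(time_small_system, time_large_system, n, tol=1e-9):
    """True when the speed-up equals n, the factor by which the system grew."""
    return abs(speedup(time_small_system, time_large_system) - n) <= tol

# Hypothetical timings: 120 s on 1 processor, then on 4 processors.
print(speedup(120, 30), is_linear(120, 30, 4))   # 4.0 True  (linear)
print(speedup(120, 40), is_linear(120, 40, 4))   # 3.0 False (less than N)
```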
Pipelined parallelism
A form of parallelism in which the output of one processor is consumed as
the input of another processor as soon as the first record of the result is
generated, without waiting for the complete result.
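The record-at-a-time hand-off can be illustrated with Python generators. This
sketch runs on a single thread, so it shows only the data flow, not true
parallel execution; the stage names scan and select_even are hypothetical.

```python
def scan(values, log):
    """Producer stage: emits records one at a time."""
    for v in values:
        log.append(("scan", v))
        yield v

def select_even(records, log):
    """Consumer stage: starts working as soon as the first record arrives."""
    for r in records:
        if r % 2 == 0:
            log.append(("select", r))
            yield r

log = []
result = list(select_even(scan(range(4), log), log))
print(result)   # [0, 2]
print(log[:2])  # [('scan', 0), ('select', 0)]
```

The log shows that record 0 is selected before record 1 has even been scanned,
which is the defining property of a pipeline.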
Range Partitioning
In a parallel database, range partitioning is one of the data partitioning
techniques (the others are Hash and Round-robin partitioning) used to
distribute tuples (records) among the chosen disks using a set of ranges. It
works by choosing a range vector whose elements are values of the
partitioning attribute; each tuple is stored on the disk whose range contains
its attribute value. The range vector is chosen based on the number of disks
available, which is finite.
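A minimal sketch of this scheme in Python follows. It assumes that a range
vector with k elements defines k + 1 disks and that a value equal to a boundary
goes to the higher disk; both conventions, and the names range_partition and
salary, are choices made for this example.

```python
from bisect import bisect_right

def range_partition(rows, key, range_vector):
    """Place each row on the disk whose range contains row[key].

    range_vector must be sorted; k boundaries define k + 1 disks.
    """
    disks = [[] for _ in range(len(range_vector) + 1)]
    for row in rows:
        # bisect_right counts the boundaries <= value, which is the disk index.
        disks[bisect_right(range_vector, row[key])].append(row)
    return disks

rows = [{"salary": s} for s in (5, 10, 15, 20, 25)]
disks = range_partition(rows, "salary", [10, 20])
print([[r["salary"] for r in d] for d in disks])  # [[5], [10, 15], [20, 25]]
```

Since neighbouring values end up on the same disk, a range query such as
"salary between 10 and 19" touches only one disk here.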
Response Time
The amount of time taken by a single machine (or by the parallel machines, in
the case of a parallel database) to finish a job, measured from the moment the
job is submitted for execution. This is one of the two performance measures of
a database management system (the other is Throughput).
Round-robin Partitioning
In a parallel database, round-robin partitioning is one of the data
partitioning techniques (the others are Hash and Range partitioning) used to
distribute tuples (records) among the chosen disks. It distributes the records
of a table across n disks in round-robin fashion: the first record goes to the
first disk, the second record to the second disk, and so on, wrapping around to
the first disk after the n-th. Here n, the number of disks, is finite.
Scale-Up
The execution of a larger task in the same amount of time previously taken
by a smaller task, achieved by using more resources in parallel (increasing
the degree of parallelism). Scale-up means enhanced throughput.
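One common way to quantify this is the ratio of the time taken by the small
task on the small system to the time taken by the larger task on the larger
system; a ratio of 1 means the extra resources fully absorbed the extra work.
The function name and timings below are illustrative only.

```python
def scaleup(time_small_task_small_system, time_large_task_large_system):
    """1.0 means the larger task finished in the same time as the smaller one."""
    return time_small_task_small_system / time_large_task_large_system

# Hypothetical timings in seconds.
print(scaleup(100, 100))  # 1.0 -> elapsed time unchanged
print(scaleup(100, 125))  # 0.8 -> the larger task took longer
```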
Shared Disk Architecture
One of the parallel architectures used in parallel database management
systems. Many processors share a common disk or set of disks, while every
processor has its own private memory.
Shared Memory Architecture
One of the parallel architectures used in parallel database management
systems. Many processors share a common memory (RAM) and disk setup, which
makes communication between processors efficient.
Shared Nothing Architecture
Another parallel architecture used in parallel database systems. Every
processor has its own memory and disk, and the processors are connected
through a high-speed network. This architecture is widely used in Distributed
Database Systems.
Skew
Skew is the effect that degrades the performance of a parallel system
because the workload is unevenly distributed.
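The effect can be made concrete with a rough calculation. The two measures
below are illustrative choices for this example, not standard definitions, and
they assume that processing time is proportional to partition size, so the
largest partition determines when the whole job finishes.

```python
def skew_factor(partition_sizes):
    """Largest partition relative to the average; 1.0 means perfectly even."""
    mean = sum(partition_sizes) / len(partition_sizes)
    return max(partition_sizes) / mean

def best_case_speedup(partition_sizes):
    """Upper bound on speed-up when the largest partition finishes last."""
    return sum(partition_sizes) / max(partition_sizes)

even = [100, 100, 100, 100]
skewed = [250, 50, 50, 50]
print(skew_factor(even), best_case_speedup(even))      # 1.0 4.0
print(skew_factor(skewed), best_case_speedup(skewed))  # 2.5 1.6
```

With the skewed distribution, four processors cannot do better than a speed-up
of 1.6 under these assumptions, because one of them holds 250 of the 400
records.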
Speed-Up
The execution of a given task faster than before by using more resources in
parallel (increasing the degree of parallelism). Speed-up means improved
response time.
Start-up Cost
In parallel database systems, the start-up cost is the cost associated with
initiating a task (query or transaction). If a task is to be done in parallel,
it first has to be broken into (ideally equal) pieces, each of which can be
run on a different processor.
Sub-linear Speed-up
A parallel system cannot always show linear speed-up, because of Skew,
Interference, and the Start-up and Assembling costs involved in parallelizing
the work. As a result, the speed-up is somewhat less than N (see Linear
Speed-up). Such a speed-up is called Sub-linear Speed-up.
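A toy cost model shows how such overheads push the speed-up below N. The model
is an assumption made for this example: the work divides evenly across n
processors, and each processor adds a fixed start-up cost that is paid
sequentially. Skew, interference and assembling costs are ignored.

```python
def parallel_time(work, n, startup_per_processor):
    """Toy model: sequential start-up cost plus evenly divided work."""
    return startup_per_processor * n + work / n

def model_speedup(work, n, startup_per_processor):
    """Speed-up relative to running all the work on one processor, no overhead."""
    return work / parallel_time(work, n, startup_per_processor)

print(model_speedup(100, 4, 0))            # 4.0 (no overhead: linear)
print(round(model_speedup(100, 4, 1), 2))  # 3.45 (with start-up cost: below N)
```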
Throughput
The number of jobs (queries or transactions) that can be completed in a
given amount of time by a machine (a parallel machine, in the case of a
parallel database). This is one of the two performance measures of a database
management system (the other is Response Time).
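The two measures can be compared on a small, made-up job log. Each job is
recorded as a (submit time, finish time) pair in seconds; the variable names
are hypothetical.

```python
# Hypothetical job log: (submit_time, finish_time) in seconds.
jobs = [(0.0, 4.0), (1.0, 5.0), (2.0, 10.0)]

# Response time is measured per job, from submission to completion.
response_times = [finish - submit for submit, finish in jobs]

# Throughput is measured for the system: jobs completed per unit of time.
elapsed = max(f for _, f in jobs) - min(s for s, _ in jobs)
throughput = len(jobs) / elapsed

print(response_times)  # [4.0, 4.0, 8.0]
print(throughput)      # 0.3 jobs per second
```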