Parallel Database Architecture
Today almost everyone wants to store the information they collect. Even
small organizations gather data and maintain very large databases. Although these
databases consume space, they are useful in many ways; for example, they
support decision making through a decision support system. Handling such
voluminous data with a conventional centralized system is difficult: even
simple queries can take a long time. One solution is to manage these
databases with a Parallel Database System, where a table or database is
distributed among multiple processors, as evenly as possible, so that queries
run in parallel. A system that shares resources in this way to handle massive
data and improve overall performance is called a Parallel Database System.
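The idea of distributing a table and running a query in parallel can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `sales` table, the round-robin split, and the function names are hypothetical, and threads stand in for the separate processors of a real parallel database.

```python
from concurrent.futures import ThreadPoolExecutor


def partition_round_robin(rows, n_workers):
    """Split rows into n_workers partitions of nearly equal size."""
    return [rows[i::n_workers] for i in range(n_workers)]


def local_sum(partition, column):
    """Work done by one worker: aggregate only its own partition."""
    return sum(row[column] for row in partition)


def parallel_sum(rows, column, n_workers=4):
    """Answer SELECT SUM(column) by combining per-partition partial sums."""
    partitions = partition_round_robin(rows, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(lambda p: local_sum(p, column), partitions))
    return sum(partials)


# Hypothetical table with 1,000 rows (assumption, not from the article).
sales = [{"id": i, "amount": i % 10} for i in range(1000)]
print(parallel_sum(sales, "amount"))  # 4500
```

Each worker scans only its own quarter of the table, and the final step merges four small partial results instead of scanning all rows in one place.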
We need a suitable architecture to support this. That is, we need
architectures that handle data through data distribution and parallel query
execution, and thereby deliver good throughput for queries or transactions.
Figures 1, 2 and 3 show the different architectures that have been proposed and
successfully implemented for Parallel Database Systems. In the figures, P
represents processors, M represents memory, and D represents disks or disk setups.
1. Shared Memory Architecture
Figure 1 - Shared Memory Architecture
In the Shared Memory architecture, a single memory is shared among many
processors, as shown in Figure 1. Several processors are connected to the main
memory and the disk setup through an interconnection network. The
interconnection network is usually a high-speed network (for example a bus, mesh,
or hypercube) that makes it easy to share (transport) data among the various
components (processor, memory, and disk).
Advantages:
- Simple implementation
- Processors communicate efficiently through a single memory address space.
- As a result, communication overhead is low.
Disadvantages:
- A high degree of parallelism (a large number of concurrent operations on different processors) cannot be achieved, because all processors share the same interconnection network to reach memory. This creates a bottleneck (interference) in the interconnection network, especially when it is a bus.
- Adding a processor can slow down the existing processors.
- Cache coherency must be maintained. That is, if a processor reads data used or modified by other processors, we must ensure that it sees the latest version.
- The degree of parallelism is limited: running more parallel processes may degrade performance.
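The trade-off listed above can be illustrated with a small Python sketch, assuming threads play the role of processors and a dictionary plays the role of the single shared memory. The names `run_workers`, `shared_memory`, and `memory_lock` are hypothetical. Every worker sees the same data with no message passing, but the lock that keeps the data consistent also forces the workers to take turns, which is a loose analogy for contention on a shared bus.

```python
import threading


def run_workers(n_workers, updates_each):
    """Let n_workers threads update one counter in a shared address space."""
    shared_memory = {"counter": 0}   # one memory visible to every worker
    memory_lock = threading.Lock()   # keeps concurrent updates consistent

    def worker():
        for _ in range(updates_each):
            # Only one worker at a time may touch the shared location.
            with memory_lock:
                shared_memory["counter"] += 1

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared_memory["counter"]


print(run_workers(4, 10_000))  # 40000
```

Adding more workers never loses an update, but every extra worker also waits on the same lock, so the gain from each additional worker shrinks.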
2. Shared Disk Architecture
Figure 2 - Shared Disk Architecture
In the Shared Disk architecture, a single disk or disk setup is shared among
all the available processors, and each processor has its own private memory,
as shown in Figure 2.
Advantages:
- The failure of any one processor does not stop the entire system (fault tolerance).
- The interconnection to memory is not a bottleneck (as it was in the Shared Memory architecture).
- Supports a larger number of processors than the Shared Memory architecture.
Disadvantages:
- The interconnection to the disk is a bottleneck, because all processors share a common disk setup.
- Inter-processor communication is slow. Because every processor has its own memory, communication between processors requires reading data from another processor's memory, which needs additional software support.
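A toy model may make these two disadvantages more concrete. The sketch below is an assumption-laden illustration, not how any real shared-disk product works: `SharedDisk`, `Node`, and `make_cluster` are hypothetical names, private memory is modeled as a per-node cache, and the "additional software support" is reduced to a simple invalidation message sent to peer nodes.

```python
class SharedDisk:
    """The single disk setup that every node reads from and writes to."""

    def __init__(self):
        self.blocks = {}
        self.reads = 0  # every cache miss from every node lands here

    def read(self, key):
        self.reads += 1
        return self.blocks[key]

    def write(self, key, value):
        self.blocks[key] = value


class Node:
    """A processor with private memory (a cache) in front of the shared disk."""

    def __init__(self, disk):
        self.disk = disk
        self.cache = {}
        self.peers = []

    def read(self, key):
        if key not in self.cache:  # miss: go to the shared disk
            self.cache[key] = self.disk.read(key)
        return self.cache[key]

    def write(self, key, value):
        self.disk.write(key, value)
        self.cache[key] = value
        for peer in self.peers:  # software-level message to the other nodes
            peer.cache.pop(key, None)


def make_cluster(n_nodes):
    """Build n_nodes nodes that all share one disk and know their peers."""
    disk = SharedDisk()
    nodes = [Node(disk) for _ in range(n_nodes)]
    for node in nodes:
        node.peers = [other for other in nodes if other is not node]
    return disk, nodes


disk, (a, b) = make_cluster(2)
a.write("row1", "v1")
print(b.read("row1"))  # v1, fetched from the shared disk
a.write("row1", "v2")  # tells b to drop its cached copy
print(b.read("row1"))  # v2, fetched from the shared disk again
print(disk.reads)      # 2
```

Both of node `b`'s reads go through the one shared disk, and without the invalidation step `b` would keep serving the stale value `v1` from its private memory.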
Example Real-World Shared Disk Implementation
- DEC clusters (VMScluster) running Rdb
3. Shared Nothing Architecture
Figure 3 - Shared Nothing Architecture
In the Shared Nothing architecture, every processor has its own memory and
disk setup. The setup can be viewed as a set of individual computers
connected through a high-speed interconnection network, using, for example,
regular network protocols and switches to share data between computers. (This
architecture is also used in Distributed Database Systems.) In a Shared Nothing
parallel database system, we insist on similar nodes, that is, homogeneous
systems. (In a distributed database system we may use heterogeneous nodes.)
Advantages:
- The number of processors is scalable. That is, the design makes it easy to add more computers.
- Unlike in the other two architectures, only the data requests that cannot be answered by the local processor need to be forwarded through the interconnection network.
Disadvantages:
- Non-local disk accesses are costly. If a server receives a request for data it does not hold, the request must be routed to the server where the data is available, which adds complexity.
- Transporting data among computers incurs communication cost.
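The routing behaviour described in these points can be sketched as follows. This is a minimal illustration under assumptions: the `Node` and `Cluster` classes are hypothetical, keys are integers placed on nodes with a simple modulo rule, and a counter stands in for the network traffic caused by forwarded requests.

```python
class Node:
    """One computer with its own private memory and disk (modeled as a dict)."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.local_rows = {}


class Cluster:
    """A set of shared-nothing nodes; each key lives on exactly one node."""

    def __init__(self, n_nodes):
        self.nodes = [Node(i) for i in range(n_nodes)]
        self.forwarded = 0  # requests that had to cross the network

    def owner_of(self, key):
        # Assumed placement rule: integer key modulo the number of nodes.
        return self.nodes[key % len(self.nodes)]

    def put(self, key, value):
        self.owner_of(key).local_rows[key] = value

    def get(self, key, receiving_node_id):
        owner = self.owner_of(key)
        if owner.node_id != receiving_node_id:
            self.forwarded += 1  # non-local access: route to the owning node
        return owner.local_rows.get(key)


cluster = Cluster(3)
for key in range(9):
    cluster.put(key, f"row-{key}")

print(cluster.get(4, receiving_node_id=1))  # row-4, answered locally
print(cluster.get(5, receiving_node_id=1))  # row-5, forwarded to node 2
print(cluster.forwarded)                    # 1
```

Requests that arrive at the node owning the data never touch the network, which is why a good data placement matters so much in this architecture.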
Example Real-World Shared Nothing Implementations
- Teradata
- Tandem
- Oracle nCUBE