TOPICS (Click to Navigate)

Pages

Friday, March 24, 2017

Primary horizontal fragmentation in distributed database

Primary Horizontal fragmentation in distributed database, example exercise for primary horizontal fragmentation, correctness of primary horizontal fragmentation, Simple predicates, min-term predicates



1. Horizontal Fragmentation:


A relation (table) is partitioned into multiple subsets horizontally using simple conditions.

Let us take a relation of schema Account(Acno, Balance, Branch_Name, Type). If the permitted values for Branch_Name attribute are 'New Delhi', 'Chennai', and 'Mumbai', then the following SQL query would fragment the bunch of tuples (records) satisfying a simple condition.

SELECT * FROM account WHERE branch_name = 'Chennai';

This query would get you all the records pertaining to the 'Chennai' branch, without any changes in the schema of the table. We could get three such bunch of records if we change the branch_name value in the WHERE clause of the above query, one for 'Chennai', one for 'New Delhi', and one for 'Mumbai'.

This way of horizontally slicing the whole table into multiple subsets without altering the table structure is called Horizontal Fragmentation. The concept is usually used to keep tuples (records) at the places where they are used the most, to minimize data transfer between far locations. 

Horizontal Fragmentation has two variants as follows;
  1. Primary Horizontal Fragmentation (PHF)
  2. Derived Horizontal Fragmentation (DHF)

1.1 Primary Horizontal Fragmentation (PHF)



Primary Horizontal Fragmentation is about fragmenting a single table horizontally (row wise) using a set of simple predicates (conditions).

What is simple predicate?

Given a table R with set of attributes [A1, A2, …, An], a simple predicate Pi can be expressed as follows;
Pi : Aj θ Value
Where θ can be any of the symbols in the set {=, <, >, ≤, ≥, ≠}, value can be any value stored in the table for the attributed Ai. For example, consider the following table Account given in Figure 1;
Acno
Balance
Branch_Name
A101
5000
Mumbai
A103
10000
New Delhi
A104
2000
Chennai
A102
12000
Chennai
A110
6000
Mumbai
A115
6000
Mumbai
A120
2500
New Delhi
Figure 1: Account table

For the above table, we could define any simple predicates like, Branch_name = ‘Chennai’, Branch_name= ‘Mumbai’, Balance < 10000 etc using the above expression “Aj θ Value”.



What is set of simple predicates?

Set of simple predicates is set of all conditions collectively required to fragment a relation into subsets. For a table R, set of simple predicate can be defined as;
P = { P1, P2, …, Pn}
Example 1
As an example, for the above table Account, if simple conditions are, Balance < 10000, Balance ≥ 10000, then,
Set of simple predicates P1 = {Balance < 10000, Balance ≥ 10000}

Example 2
As another example, if simple conditions are, Branch_name = ‘Chennai’, Branch_name= ‘Mumbai’, Balance < 10000, Balance ≥ 10000, then,
Set of simple predicates P2 = { Branch_name = ‘Chennai’, Branch_name= ‘Mumbai’, Balance < 10000, Balance ≥ 10000}

What is Min-term Predicate?

When we fragment any relation horizontally, we use single condition, or set of simple predicates to filter the data.Given a relation R and set of simple predicates, we can fragment a relation horizontally as follows (relational algebra expression);
Fragment, Ri = σFi(R), 1 ≤ i ≤ n
where Fi is the set of simple predicates represented in conjunctive normal form, otherwise called as Min-term predicate which can be written as follows;
Min-term predicate, Mi=P1 Λ P2 Λ P3 Λ … Λ Pn
Here, P1 means both P1 or ¬(P1), P2 means both P2 or ¬(P2), and so on. Using the conjunctive form of various simple predicates in different combination, we can derive many such min-term predicates.

For the example 1 stated previously, we can derive set of min-term predicates using the rules stated above as follows;
We will get 2n min-term predicates, where n is the number of simple predicates in the given predicate set. For P1, we have 2 simple predicates. Hence, we will get 4 (22) possible combinations of min-term predicates as follows;

m1 = {Balance < 10000 Λ Balance ≥ 10000}
m2 = {Balance < 10000 Λ ¬(Balance ≥ 10000)}
m3 = {¬(Balance < 10000) Λ Balance ≥ 10000}
m4 = {¬(Balance < 10000) Λ ¬(Balance ≥ 10000)}

Our next step is to choose the min-term predicates which can satisfy certain conditions to fragment a table, and eliminate the others which are not useful. For example, the above set of min-term predicates can be applied each as a formula Fi stated in the above rule for fragment Ri as follows;
Account1 = σBalance< 10000 Λ Balance ≥ 10000(Account)
which can be written in equivalent SQL query as,
Account1 <-- SELECT * FROM account WHERE balance < 10000 AND balance ≥ 10000;

Account2 = σBalance< 10000 Λ ¬(Balance ≥ 10000)(Account)
which can be written in equivalent SQL query as,
Account2 <-- SELECT * FROM account WHERE balance < 10000 AND NOT balance ≥ 10000;
where NOT balance ≥ 10000 is equivalent to balance < 10000.

Account3 = σ¬(Balance< 10000) Λ Balance ≥ 10000(Account)
which can be written in equivalent SQL query as,
Account3 <-- SELECT * FROM account WHERE NOT balance < 10000 AND balance ≥ 10000;
where NOT balance < 10000 is equivalent to balance ≥ 10000.

Account4 = σ¬(Balance< 10000) Λ ¬(Balance ≥ 10000)(Account)
which can be written in equivalent SQL query as,
Account4 <-- SELECT * FROM account WHERE NOT balance < 10000 AND NOT balance ≥ 10000;
where NOT balance < 10000 is equivalent to balance ≥ 10000 and NOT balance ≥ 10000 is equivalent to balance < 10000. This is exactly same as the query for fragment Account1.

From these examples, it is very clear that the first query for fragment Account1 (min-term predicate m1) is invalid as any record in a table cannot have two values for any attribute in one record. That is, the condition (Balance < 10000 Λ Balance ≥ 10000) requires that the value for balance must both be less than 10000 and greater and equal to 10000, which is not possible. Hence the condition violates and can be eliminated. For fragment Account2 (min-term predicate m2), the condition is (balance<10000 and balance<10000) which ultimately means balance<10000 which is correct. Likewise, fragment Account3 is valid and Account4 must be eliminated. Finally, we use the min-term predicates m2 and m3 to fragment the Account relation. The fragments can be derived as follows for Account;

SELECT * FROM account WHERE balance < 10000;


Account2
Acno
Balance
Branch_Name
A101
5000
Mumbai
A104
2000
Chennai
A120
2500
New Delhi
A110
6000
Mumbai
A115
6000
Mumbai


SELECT * FROM account WHERE balance ≥ 10000;


Account3
Acno
Balance
Branch_Name
A103
10000
New Delhi
A102
12000
Chennai



Correctness of Fragmentation

We have chosen set of min-term predicates which would be used to horizontally fragment a relation (table) into pieces. Now, our next step is to validate the chosen fragments for their correctness. We need to verify did we miss anything? We use the following rules to ensure that we have not changed semantic information about the table which we fragment.
1. Completeness – If a relation R is fragmented into set of fragments, then a tuple (record) of R must be found in any one or more of the fragments. This rule ensures that we have not lost any records during fragmentation.
2. Reconstruction – After fragmenting a table, we must be able to reconstruct it back to its original form without any data loss through some relational operation. This rule ensures that we can construct a base table back from its fragments without losing any information. That is, we can write any queries involving the join of fragments to get the original relation back.
3. Disjointness – If a relation R is fragmented into a set of sub-tables R1, R2, …, Rn, a record belongs to R1 is not found in any other sub-tables. This ensures that R1 ≠ R2.

For example, consider the Account table in Figure 1 and its fragments Account2, and Account3 created using the min-term predicates we derived.
From the tables Account2, and Account3 it is clear that the fragmentation is Complete. That is, we have not missed any records. Just all are included into one of the sub-tables.
When we use an operation, say Union between Account2, and Account3 we will be able to get the original relation Account.

(SELECT * FROM account2) Union (SELECT * FROM account3);

The above query will get us Account back without loss of any information. Hence, the fragments created can be reconstructed.
Finally, if we write a query as follows, we will get a Null set as output. It ensures that the Disjointness property is satisfied.

(SELECT * FROM account2) Intersect (SELECT * FROM account3);

We get a null set as result for this query because, there is no record common in both relations Account2 and Account3.



For the example 2, recall the set of simple predicates which was as follows;

Set of simple predicates P2 = { Branch_name = ‘Chennai’, Branch_name= ‘Mumbai’, Balance < 10000, Balance ≥ 10000}

We can derive the following min-term predicates;

m1 = { Branch_name = ‘Chennai’ Λ Branch_name= ‘Mumbai’ Λ Balance < 10000 Λ Balance ≥ 10000}
m2 = { Branch_name = ‘Chennai’ Λ Branch_name= ‘Mumbai’ Λ Balance < 10000 Λ ¬(Balance ≥ 10000)}
m3 = { Branch_name = ‘Chennai’ Λ Branch_name= ‘Mumbai’ Λ ¬(Balance < 10000) Λ Balance ≥ 10000}
m4 = { Branch_name = ‘Chennai’ Λ ¬(Branch_name= ‘Mumbai’) Λ Balance < 10000 Λ Balance ≥ 10000}
mn = { ¬(Branch_name = ‘Chennai’) Λ ¬(Branch_name= ‘Mumbai’) Λ ¬(Balance < 10000) Λ ¬(Balance ≥ 10000)}

As in the previous example, out of 16 (24) min-term predicates, the set of min-term predicates which are not valid should be eliminated. At last, we would have the following set of valid min-term predicates.
m1 = { Branch_name = ‘Chennai’ Λ ¬(Branch_name= ‘Mumbai’) Λ ¬(Balance < 10000) Λ Balance ≥ 10000}
m2 = { Branch_name = ‘Chennai’ Λ ¬(Branch_name= ‘Mumbai’) Λ Balance < 10000 Λ ¬(Balance ≥ 10000)}
m3 = { ¬(Branch_name = ‘Chennai’) Λ Branch_name= ‘Mumbai’ Λ ¬(Balance < 10000) Λ Balance ≥ 10000}
m4 = { ¬(Branch_name = ‘Chennai’) Λ Branch_name= ‘Mumbai’ Λ Balance < 10000 Λ ¬(Balance ≥ 10000)}


m5 = { ¬(Branch_name = ‘Chennai’) Λ ¬(Branch_name= ‘Mumbai’) Λ ¬(Balance < 10000) Λ Balance ≥ 10000}

m6 = { ¬(Branch_name = ‘Chennai’) Λ ¬(Branch_name= ‘Mumbai’) Λ Balance < 10000 Λ ¬(Balance ≥ 10000)}
 
The horizontal fragments using the above set of min-term predicates can be generated as follows;

Fragment 1: SELECT * FROM account WHERE branch_name = ‘Chennai’ AND balance ≥ 10000;
Fragment 2: SELECT * FROM account WHERE branch_name = ‘Chennai’ AND balance < 10000;
Fragment 3: SELECT * FROM account WHERE branch_name = ‘Mumbai’ AND balance ≥ 10000;
Fragment 4: SELECT * FROM account WHERE branch_name = ‘Mumbai’ AND balance < 10000;



The horizontal fragments using the above set of min-term predicates can be generated as follows;

Fragment 1: SELECT * FROM account WHERE branch_name = ‘Chennai’ AND balance ≥ 10000;
Account1
Acno
Balance
Branch_Name
A102
12000
Chennai

Fragment 2: SELECT * FROM account WHERE branch_name = ‘Chennai’ AND balance < 10000;
Account2
Acno
Balance
Branch_Name
A102
2000
Chennai

Fragment 3: SELECT * FROM account WHERE branch_name = ‘Mumbai’ AND balance ≥ 10000;

Account3
Acno
Balance
Branch_Name




Fragment 4: SELECT * FROM account WHERE branch_name = ‘Mumbai’ AND balance < 10000;
Account4
Acno
Balance
Branch_Name
A101
5000
Mumbai
A110
6000
Mumbai
A115
6000
Mumbai



In the ACCOUNT table we have the third branch ‘New Delhi’, which was not specified in the set of simple predicates. Hence, in the fragmentation process we must not leave the tuple with the value ‘New Delhi’. That is the reason we have included the min-term predicates m5 and m6 which can be derived as follows;

Fragment 5: SELECT * FROM account WHERE branch_name <> ‘Mumbai’ AND branch_name <> ‘Chennai’ AND balance ≥ 10000;
Account5
Acno
Balance
Branch_Name
A103
10000
New Delhi

Fragment 6: SELECT * FROM account WHERE branch_name <> ‘Mumbai’ AND branch_name <> ‘Chennai’ AND balance < 10000;
Account6
Acno
Balance
Branch_Name
A120
2500
New Delhi

Correctness of fragmentation:

Completeness: The tuple of the table Account is distributed into different fragments. No records were omitted. Otherwise, by performing the union operation between all the Account table fragments Account1, Account2, Account3, and Account4, we will be able to get Account back without any information loss. Hence, the above fragmentation is Complete.

Reconstruction: As said before, by performing Union operation between all the fragments, we will be able to get the original table back. Hence, the fragmentation is correct and the reconstruction property is satisfied.

Disjointness: When we perform Intersect operation between all the above fragments, we will get null set as result, as we do not have any records in common for all the fragments. Hence, disjointness property is satisfied.

*************

Go to Distributed Database page

Go to Fragmentation page

Go to Derived Horizontal Fragmentation page











5 comments:

  1. sir please make a vedio about "derived horizontal fragmentation" if already make please provide me a link

    ReplyDelete
  2. Uploaded. Here is the link.
    http://www.exploredatabase.com/2013/10/derived-horizontal-fragmentation-in-distributed-database.html

    Youtube: https://youtu.be/3LUcLYOTWlU

    ReplyDelete