Primary horizontal fragmentation in distributed database

1. Horizontal Fragmentation:

A relation (table) is partitioned into multiple subsets horizontally using simple conditions.

Let us take a relation with the schema Account(Acno, Balance, Branch_Name, Type). If the permitted values for the Branch_Name attribute are 'New Delhi', 'Chennai', and 'Mumbai', then the following SQL query selects the group of tuples (records) that satisfy one simple condition and therefore form one fragment.

SELECT * FROM account WHERE branch_name = 'Chennai';

This query returns all the records pertaining to the 'Chennai' branch, without any change in the schema of the table. By changing the branch_name value in the WHERE clause of the above query, we get three such groups of records: one for 'Chennai', one for 'New Delhi', and one for 'Mumbai'.

This way of horizontally slicing the whole table into multiple subsets without altering the table structure is called Horizontal Fragmentation. The concept is usually used to keep tuples (records) at the sites where they are used the most, which minimizes data transfer between distant locations.
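As a rough illustration, the per-branch slicing can be tried out with Python's built-in sqlite3 module. The sketch below assumes an in-memory table loaded with the seven sample rows of the Account table from Figure 1 (three columns only, as in the figure); the helper name fragment_by_branch is made up for this example.

```python
import sqlite3

# Sample rows copied from Figure 1 (Acno, Balance, Branch_Name).
ROWS = [
    ("A101", 5000, "Mumbai"),
    ("A103", 10000, "New Delhi"),
    ("A104", 2000, "Chennai"),
    ("A102", 12000, "Chennai"),
    ("A110", 6000, "Mumbai"),
    ("A115", 6000, "Mumbai"),
    ("A120", 2500, "New Delhi"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (acno TEXT PRIMARY KEY, balance INTEGER, branch_name TEXT)"
)
conn.executemany("INSERT INTO account VALUES (?, ?, ?)", ROWS)


def fragment_by_branch(branch):
    """Hypothetical helper: return one horizontal fragment (schema unchanged)."""
    cur = conn.execute(
        "SELECT acno, balance, branch_name FROM account "
        "WHERE branch_name = ? ORDER BY acno",
        (branch,),
    )
    return cur.fetchall()


fragments = {b: fragment_by_branch(b) for b in ("Chennai", "Mumbai", "New Delhi")}
for branch, rows in fragments.items():
    print(branch, rows)
```

Each fragment keeps all three columns, and the three fragments together hold all seven rows.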

Horizontal Fragmentation has two variants:

Primary Horizontal Fragmentation (PHF)
Derived Horizontal Fragmentation (DHF)

1.1 Primary Horizontal Fragmentation (PHF)

Primary Horizontal Fragmentation is about fragmenting a single table horizontally (row wise) using a set of simple predicates (conditions).

What is a simple predicate?

Given a table R with the set of attributes [A₁, A₂, …, A_n], a simple predicate P_i can be expressed as follows:

P_i : A_j θ Value

where θ can be any of the symbols in the set {=, <, >, ≤, ≥, ≠}, and Value is a value chosen from the domain of the attribute A_j. For example, consider the table Account given in Figure 1:

Acno	Balance	Branch_Name
A101	5000	Mumbai
A103	10000	New Delhi
A104	2000	Chennai
A102	12000	Chennai
A110	6000	Mumbai
A115	6000	Mumbai
A120	2500	New Delhi

Figure 1: Account table

For the above table, we could define simple predicates such as Branch_name = 'Chennai', Branch_name = 'Mumbai', and Balance < 10000, each following the form “A_j θ Value”.
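One possible way to make the form “A_j θ Value” concrete is to store a predicate as an (attribute, operator, value) triple and evaluate it against a row. The sketch below is only an illustration; the class name SimplePredicate, the THETA table, and the ASCII operator spellings (<=, >=, != for ≤, ≥, ≠) are assumptions of this example.

```python
import operator
from typing import Any, NamedTuple

# Map each comparison symbol θ to a Python comparison function.
THETA = {
    "=": operator.eq,
    "<": operator.lt,
    ">": operator.gt,
    "<=": operator.le,
    ">=": operator.ge,
    "!=": operator.ne,
}


class SimplePredicate(NamedTuple):
    """Hypothetical container for one predicate of the form A_j θ Value."""

    attribute: str
    theta: str
    value: Any

    def holds(self, row: dict) -> bool:
        # Compare the row's value for the attribute against the constant.
        return THETA[self.theta](row[self.attribute], self.value)


p1 = SimplePredicate("Branch_Name", "=", "Chennai")
p2 = SimplePredicate("Balance", "<", 10000)

row = {"Acno": "A104", "Balance": 2000, "Branch_Name": "Chennai"}
print(p1.holds(row), p2.holds(row))  # True True
```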

What is a set of simple predicates?

A set of simple predicates is the set of all conditions that are collectively required to fragment a relation into subsets. For a table R, the set of simple predicates can be defined as:

P = { P₁, P₂, …, P_n}

Example 1
As an example, for the above table Account, if simple conditions are, Balance < 10000, Balance ≥ 10000, then,

Set of simple predicates P1 = {Balance < 10000, Balance ≥ 10000}

Example 2
As another example, if simple conditions are, Branch_name = ‘Chennai’, Branch_name= ‘Mumbai’, Balance < 10000, Balance ≥ 10000, then,

Set of simple predicates P2 = { Branch_name = ‘Chennai’, Branch_name= ‘Mumbai’, Balance < 10000, Balance ≥ 10000}

What is Min-term Predicate?

When we fragment a relation horizontally, we filter the data using a single condition or a combination of simple predicates. Given a relation R and a set of simple predicates, we can fragment the relation horizontally as follows (relational algebra expression):

Fragment, R_i = σ_Fi(R), 1 ≤ i ≤ n

where F_i is a selection formula built as a conjunction of the simple predicates, each taken either as it is or in its negated form. Such a conjunction is called a min-term predicate and can be written as follows:

Min-term predicate, M_i = P₁* Λ P₂* Λ P₃* Λ … Λ P_n*

Here, P₁* stands for either P₁ or ¬(P₁), P₂* stands for either P₂ or ¬(P₂), and so on. By combining the plain and negated forms of the simple predicates in every possible way, we can derive many such min-term predicates.

For Example 1 above, we can derive the set of min-term predicates using this rule.

We get 2ⁿ min-term predicates, where n is the number of simple predicates in the given predicate set. P1 has 2 simple predicates. Hence, we get 4 (2²) possible min-term predicates as follows:

m₁ = {Balance < 10000 Λ Balance ≥ 10000}

m₂= {Balance < 10000 Λ ¬(Balance ≥ 10000)}

m₃= {¬(Balance < 10000) Λ Balance ≥ 10000}

m₄ = {¬(Balance < 10000) Λ ¬(Balance ≥ 10000)}
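The enumeration above can be sketched in a few lines of Python. This is only an illustration that treats each simple predicate as a text label; the function name minterms is an assumption of this example.

```python
from itertools import product


def minterms(simple_predicates):
    """Yield every conjunction in which each predicate is kept or negated."""
    for signs in product((True, False), repeat=len(simple_predicates)):
        yield tuple(
            p if keep else f"¬({p})"
            for p, keep in zip(simple_predicates, signs)
        )


P1 = ["Balance < 10000", "Balance ≥ 10000"]
all_minterms = list(minterms(P1))

for i, m in enumerate(all_minterms, start=1):
    print(f"m{i} = " + " Λ ".join(m))
```

With two simple predicates the loop prints four conjunctions, in the same order as m₁ to m₄ above.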

Our next step is to keep the min-term predicates that can actually be satisfied and to eliminate the others, which are not useful for fragmenting the table. To see which is which, each of the above min-term predicates can be applied as the formula F_i in the rule for fragment R_i stated earlier, as follows:

Account₁ = σ_{Balance < 10000 Λ Balance ≥ 10000}(Account)

which can be written in equivalent SQL query as,

Account₁ <-- SELECT * FROM account WHERE balance < 10000 AND balance >= 10000;

Account₂ = σ_{Balance < 10000 Λ ¬(Balance ≥ 10000)}(Account)

which can be written in equivalent SQL query as,

Account₂ <-- SELECT * FROM account WHERE balance < 10000 AND NOT (balance >= 10000);

where NOT (balance >= 10000) is equivalent to balance < 10000.

Account₃ = σ_{¬(Balance < 10000) Λ Balance ≥ 10000}(Account)

which can be written in equivalent SQL query as,

Account₃ <-- SELECT * FROM account WHERE NOT (balance < 10000) AND balance >= 10000;

where NOT (balance < 10000) is equivalent to balance >= 10000.

Account₄ = σ_{¬(Balance < 10000) Λ ¬(Balance ≥ 10000)}(Account)

which can be written in equivalent SQL query as,

Account₄ <-- SELECT * FROM account WHERE NOT (balance < 10000) AND NOT (balance >= 10000);

where NOT (balance < 10000) is equivalent to balance >= 10000 and NOT (balance >= 10000) is equivalent to balance < 10000. The condition is therefore exactly the same as the one in the query for fragment Account₁.

From these examples, it is clear that the query for fragment Account₁ (min-term predicate m₁) can never return a record, because a single record cannot hold two different values for the same attribute. That is, the condition (Balance < 10000 Λ Balance ≥ 10000) requires the balance to be both less than 10000 and greater than or equal to 10000, which is not possible. Hence the condition is contradictory and m₁ can be eliminated. For fragment Account₂ (min-term predicate m₂), the condition is (balance < 10000 AND balance < 10000), which reduces to balance < 10000 and is valid. Likewise, fragment Account₃ is valid, and Account₄ must be eliminated for the same reason as Account₁. Finally, we use the min-term predicates m₂ and m₃ to fragment the Account relation. The fragments can be derived as follows:

SELECT * FROM account WHERE balance < 10000;

Account₂

Acno	Balance	Branch_Name
A101	5000	Mumbai
A104	2000	Chennai
A120	2500	New Delhi
A110	6000	Mumbai
A115	6000	Mumbai

SELECT * FROM account WHERE balance >= 10000;

Account₃

Acno	Balance	Branch_Name
A103	10000	New Delhi
A102	12000	Chennai

Correctness of Fragmentation

We have chosen a set of min-term predicates to horizontally fragment a relation (table) into pieces. Our next step is to validate the resulting fragments for correctness, that is, to verify that we did not miss anything. We use the following rules to ensure that the fragmentation has not changed the semantic information of the table.

1. Completeness – If a relation R is fragmented into a set of fragments, then every tuple (record) of R must be found in at least one of the fragments. This rule ensures that we have not lost any records during fragmentation.

2. Reconstruction – After fragmenting a table, we must be able to reconstruct it in its original form, without any data loss, through some relational operation. For horizontal fragments that operation is the union of the fragments (a join is used for vertical fragments).

3. Disjointness – If a relation R is fragmented into a set of sub-tables R₁, R₂, …, R_n, a record that belongs to one sub-table R_i is not found in any other sub-table R_j (i ≠ j). That is, R_i ∩ R_j = ∅ for every pair of distinct fragments.

For example, consider the Account table in Figure 1 and its fragments Account₂, and Account₃ created using the min-term predicates we derived.

From the tables Account₂ and Account₃ it is clear that the fragmentation is complete. That is, we have not missed any records; every record of Account is included in one of the sub-tables.

When we apply the Union operation to Account₂ and Account₃, we get the original relation Account back.

(SELECT * FROM account2) UNION (SELECT * FROM account3);

The above query will get us Account back without loss of any information. Hence, the fragments created can be reconstructed.

Finally, the following query returns an empty set as output, which shows that the Disjointness property is satisfied.

(SELECT * FROM account2) INTERSECT (SELECT * FROM account3);

The result is empty because no record is common to both relations Account₂ and Account₃.
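These checks can be run end to end with Python's sqlite3 module. The sketch below assumes the fragments are materialized as tables named account2 and account3 (the names used in the queries above) in an in-memory database holding the Figure 1 rows. The parentheses around each SELECT are dropped because SQLite does not accept them in compound queries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE account (acno TEXT, balance INTEGER, branch_name TEXT);
    INSERT INTO account VALUES
      ('A101', 5000, 'Mumbai'), ('A103', 10000, 'New Delhi'),
      ('A104', 2000, 'Chennai'), ('A102', 12000, 'Chennai'),
      ('A110', 6000, 'Mumbai'), ('A115', 6000, 'Mumbai'),
      ('A120', 2500, 'New Delhi');
    -- Fragments produced by the valid min-term predicates of Example 1.
    CREATE TABLE account2 AS SELECT * FROM account WHERE balance < 10000;
    CREATE TABLE account3 AS SELECT * FROM account WHERE balance >= 10000;
    """
)


def rows(sql):
    """Run a query and return its result as a set of tuples."""
    return set(conn.execute(sql).fetchall())


original = rows("SELECT * FROM account")
rebuilt = rows("SELECT * FROM account2 UNION SELECT * FROM account3")
overlap = rows("SELECT * FROM account2 INTERSECT SELECT * FROM account3")

complete_and_reconstructible = rebuilt == original
disjoint = len(overlap) == 0
print(complete_and_reconstructible, disjoint)  # True True
```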

For Example 2, recall the set of simple predicates:

Set of simple predicates P2 = { Branch_name = ‘Chennai’, Branch_name= ‘Mumbai’, Balance < 10000, Balance ≥ 10000}

We can derive the following min-term predicates;

m₁ = { Branch_name = ‘Chennai’ Λ Branch_name= ‘Mumbai’ Λ Balance < 10000 Λ Balance ≥ 10000}

m₂ = { Branch_name = ‘Chennai’ Λ Branch_name= ‘Mumbai’ Λ Balance < 10000 Λ ¬(Balance ≥ 10000)}

m₃ = { Branch_name = ‘Chennai’ Λ Branch_name= ‘Mumbai’ Λ ¬(Balance < 10000) Λ Balance ≥ 10000}

m₄ = { Branch_name = ‘Chennai’ Λ ¬(Branch_name= ‘Mumbai’) Λ Balance < 10000 Λ Balance ≥ 10000}

…

m₁₆ = { ¬(Branch_name = ‘Chennai’) Λ ¬(Branch_name= ‘Mumbai’) Λ ¬(Balance < 10000) Λ ¬(Balance ≥ 10000)}

As in the previous example, out of the 16 (2⁴) min-term predicates, the ones that are contradictory should be eliminated. We are finally left with the following six valid min-term predicates (renumbered here as m₁ to m₆).

m₁ = { Branch_name = ‘Chennai’ Λ ¬(Branch_name= ‘Mumbai’) Λ ¬(Balance < 10000) Λ Balance ≥ 10000}

m₂ = { Branch_name = ‘Chennai’ Λ ¬(Branch_name= ‘Mumbai’) Λ Balance < 10000 Λ ¬(Balance ≥ 10000)}

m₃ = { ¬(Branch_name = ‘Chennai’) Λ Branch_name= ‘Mumbai’ Λ ¬(Balance < 10000) Λ Balance ≥ 10000}

m₄ = { ¬(Branch_name = ‘Chennai’) Λ Branch_name= ‘Mumbai’ Λ Balance < 10000 Λ ¬(Balance ≥ 10000)}

m₅ = { ¬(Branch_name = ‘Chennai’) Λ ¬(Branch_name= ‘Mumbai’) Λ ¬(Balance < 10000) Λ Balance ≥ 10000}

m₆ = { ¬(Branch_name = ‘Chennai’) Λ ¬(Branch_name= ‘Mumbai’) Λ Balance < 10000 Λ ¬(Balance ≥ 10000)}
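The count of six can be confirmed by brute force. The sketch below enumerates all 16 sign patterns for Example 2 and keeps those that some (branch, balance) pair can satisfy. It assumes the three permitted branch names stated at the start of the article and probes integer balances on both sides of 10000; the names BRANCHES, BALANCES, satisfiable, and render are illustrative only.

```python
from itertools import product

# Simple predicates of Example 2 as (label, test) pairs over (branch, balance).
P2 = [
    ("branch_name = 'Chennai'", lambda br, bal: br == "Chennai"),
    ("branch_name = 'Mumbai'", lambda br, bal: br == "Mumbai"),
    ("balance < 10000", lambda br, bal: bal < 10000),
    ("balance >= 10000", lambda br, bal: bal >= 10000),
]

# Assumed domains: the permitted branches, and one balance on each side
# of the constant 10000.
BRANCHES = ("Chennai", "Mumbai", "New Delhi")
BALANCES = (9999, 10000)


def satisfiable(signs):
    """True if some (branch, balance) pair satisfies the min-term predicate."""
    return any(
        all(test(br, bal) == keep for (_, test), keep in zip(P2, signs))
        for br, bal in product(BRANCHES, BALANCES)
    )


def render(signs):
    """Write a sign pattern as a SQL-style conjunction."""
    return " AND ".join(
        label if keep else f"NOT ({label})"
        for (label, _), keep in zip(P2, signs)
    )


candidates = list(product((True, False), repeat=len(P2)))
valid = [s for s in candidates if satisfiable(s)]

print(len(candidates), len(valid))  # 16 6
for s in valid:
    print(render(s))
```

Every pattern that keeps both branch predicates, or that keeps or negates both balance predicates, is rejected, which leaves 3 × 2 = 6 patterns.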

The horizontal fragments using the above set of min-term predicates can be generated as follows;

Fragment 1: SELECT * FROM account WHERE branch_name = 'Chennai' AND balance >= 10000;

Account₁

Acno	Balance	Branch_Name
A102	12000	Chennai

Fragment 2: SELECT * FROM account WHERE branch_name = 'Chennai' AND balance < 10000;

Account₂

Acno	Balance	Branch_Name
A104	2000	Chennai

Fragment 3: SELECT * FROM account WHERE branch_name = 'Mumbai' AND balance >= 10000;

Account₃

Acno	Balance	Branch_Name
(No rows: no Mumbai account in Figure 1 has a balance of 10000 or more.)

Fragment 4: SELECT * FROM account WHERE branch_name = 'Mumbai' AND balance < 10000;

Account₄

Acno	Balance	Branch_Name
A101	5000	Mumbai
A110	6000	Mumbai
A115	6000	Mumbai

In the Account table we have a third branch, 'New Delhi', which was not specified in the set of simple predicates. The fragmentation process must not leave out the tuples with the value 'New Delhi'. That is the reason we have included the min-term predicates m₅ and m₆, whose fragments can be derived as follows:

Fragment 5: SELECT * FROM account WHERE branch_name <> 'Mumbai' AND branch_name <> 'Chennai' AND balance >= 10000;

Account₅

Acno	Balance	Branch_Name
A103	10000	New Delhi

Fragment 6: SELECT * FROM account WHERE branch_name <> 'Mumbai' AND branch_name <> 'Chennai' AND balance < 10000;

Account₆

Acno	Balance	Branch_Name
A120	2500	New Delhi
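For readers who want to reproduce the six fragments, the queries can be run against an in-memory SQLite database. The sketch below assumes the Figure 1 rows and selects only the account numbers to keep the output short; the CONDITIONS list simply restates the WHERE clauses of Fragments 1 to 6.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE account (acno TEXT, balance INTEGER, branch_name TEXT);
    INSERT INTO account VALUES
      ('A101', 5000, 'Mumbai'), ('A103', 10000, 'New Delhi'),
      ('A104', 2000, 'Chennai'), ('A102', 12000, 'Chennai'),
      ('A110', 6000, 'Mumbai'), ('A115', 6000, 'Mumbai'),
      ('A120', 2500, 'New Delhi');
    """
)

# WHERE clauses of Fragments 1 to 6, in the article's order.
CONDITIONS = [
    "branch_name = 'Chennai' AND balance >= 10000",
    "branch_name = 'Chennai' AND balance < 10000",
    "branch_name = 'Mumbai' AND balance >= 10000",
    "branch_name = 'Mumbai' AND balance < 10000",
    "branch_name <> 'Mumbai' AND branch_name <> 'Chennai' AND balance >= 10000",
    "branch_name <> 'Mumbai' AND branch_name <> 'Chennai' AND balance < 10000",
]

fragments = []
for cond in CONDITIONS:
    cur = conn.execute(f"SELECT acno FROM account WHERE {cond} ORDER BY acno")
    fragments.append([acno for (acno,) in cur])

for i, accounts in enumerate(fragments, start=1):
    print(f"Account{i}: {accounts}")
```

Fragment 3 comes back empty for this data, and the other five fragments together hold all seven accounts.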

Correctness of fragmentation:

Completeness: Every tuple of the table Account is placed in one of the fragments; no records were omitted. Equivalently, by performing the union operation over all the Account table fragments Account₁, Account₂, Account₃, Account₄, Account₅, and Account₆, we get Account back without any information loss. Hence, the above fragmentation is complete.

Reconstruction: As said before, by performing the Union operation over all six fragments, we get the original table back. Hence, the reconstruction property is satisfied.

Disjointness: When we perform the Intersect operation between any two of the above fragments, we get an empty set as the result, because no record appears in more than one fragment. Hence, the disjointness property is satisfied.
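The three rules can be written as small generic checks that work for any number of fragments. The sketch below uses plain Python sets over the Figure 1 rows; the function names is_complete, is_reconstructible, and is_disjoint are assumptions of this example, and disjointness is tested pair by pair.

```python
from itertools import combinations

# Rows from Figure 1 (Acno, Balance, Branch_Name).
ACCOUNT = {
    ("A101", 5000, "Mumbai"), ("A103", 10000, "New Delhi"),
    ("A104", 2000, "Chennai"), ("A102", 12000, "Chennai"),
    ("A110", 6000, "Mumbai"), ("A115", 6000, "Mumbai"),
    ("A120", 2500, "New Delhi"),
}


def select(branch_ok, balance_ok):
    """Horizontal fragment: rows whose branch and balance pass both tests."""
    return {r for r in ACCOUNT if branch_ok(r[2]) and balance_ok(r[1])}


def low(bal):
    return bal < 10000


def high(bal):
    return bal >= 10000


def other(br):
    return br not in ("Chennai", "Mumbai")


FRAGMENTS = [
    select(lambda br: br == "Chennai", high),  # Account1
    select(lambda br: br == "Chennai", low),   # Account2
    select(lambda br: br == "Mumbai", high),   # Account3 (empty)
    select(lambda br: br == "Mumbai", low),    # Account4
    select(other, high),                       # Account5
    select(other, low),                        # Account6
]


def is_complete(relation, fragments):
    """Every row of the relation appears in at least one fragment."""
    return all(any(row in f for f in fragments) for row in relation)


def is_reconstructible(relation, fragments):
    """The union of all fragments gives back exactly the relation."""
    return set().union(*fragments) == relation


def is_disjoint(fragments):
    """No row appears in two different fragments (pairwise check)."""
    return all(a.isdisjoint(b) for a, b in combinations(fragments, 2))


print(is_complete(ACCOUNT, FRAGMENTS),
      is_reconstructible(ACCOUNT, FRAGMENTS),
      is_disjoint(FRAGMENTS))               # True True True
print(is_complete(ACCOUNT, FRAGMENTS[:4]))  # False
```

The last line shows why Account₅ and Account₆ matter: with only the first four fragments, the 'New Delhi' rows are lost and completeness fails.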



Friday, March 24, 2017