Primary Horizontal fragmentation in distributed database, example exercise for primary horizontal fragmentation, correctness of primary horizontal fragmentation, Simple predicates, min-term predicates
1. Horizontal Fragmentation:
A relation (table) is partitioned into multiple subsets horizontally using simple conditions.
Let us take a relation of schema Account(Acno, Balance, Branch_Name, Type). If the permitted values for Branch_Name
attribute are 'New Delhi', 'Chennai', and 'Mumbai', then the following
SQL query would fragment the bunch of tuples (records) satisfying a
simple condition.
SELECT * FROM account WHERE branch_name = 'Chennai';
This
query would get you all the records pertaining to the 'Chennai' branch,
without any changes in the schema of the table. We could get three such
bunch of records if we change the branch_name value in the WHERE clause
of the above query, one for 'Chennai', one for 'New Delhi', and one for
'Mumbai'.
This
way of horizontally slicing the whole table into multiple subsets
without altering the table structure is called Horizontal Fragmentation.
The concept is usually used to keep tuples (records) at the places
where they are used the most, to minimize data transfer between far
locations.
Horizontal Fragmentation has two variants as follows;
- Primary Horizontal Fragmentation (PHF)
- Derived Horizontal Fragmentation (DHF)
1.1 Primary Horizontal Fragmentation (PHF)
Primary Horizontal Fragmentation is about fragmenting a single table
horizontally (row wise) using a set of simple predicates (conditions).
What is simple predicate?
Given a table R with set of attributes [A1, A2, …, An], a simple predicate Pi can be expressed as follows;
Pi : Aj
θ Value
Where θ can be any of the symbols in the set {=, <, >, ≤, ≥, ≠},
value can be any value stored in the table for the attributed Ai.
For example, consider the following table Account
given in Figure 1;
Acno
|
Balance
|
Branch_Name
|
A101
|
5000
|
Mumbai
|
A103
|
10000
|
New Delhi
|
A104
|
2000
|
Chennai
|
A102
|
12000
|
Chennai
|
A110
|
6000
|
Mumbai
|
A115
|
6000
|
Mumbai
|
A120
|
2500
|
New Delhi
|
Figure 1:
Account table
For the above table, we could define any simple predicates like, Branch_name = ‘Chennai’, Branch_name= ‘Mumbai’, Balance < 10000 etc using the above expression “Aj θ Value”.
What is set of simple predicates?
Set of simple predicates is set of all conditions collectively required to fragment a relation into subsets. For a table R, set of simple predicate can be defined as;
P = { P1,
P2, …, Pn}
Example 1
As an example, for the above table Account, if simple conditions are, Balance < 10000, Balance ≥ 10000, then,
As an example, for the above table Account, if simple conditions are, Balance < 10000, Balance ≥ 10000, then,
Set of simple
predicates P1 = {Balance < 10000, Balance ≥ 10000}
Example 2
As another example, if simple conditions are, Branch_name = ‘Chennai’, Branch_name= ‘Mumbai’, Balance < 10000, Balance ≥ 10000, then,
Set of simple predicates
P2 = { Branch_name = ‘Chennai’, Branch_name= ‘Mumbai’, Balance < 10000,
Balance ≥ 10000}
What is Min-term Predicate?
When we fragment any relation horizontally, we use single condition, or set of simple predicates to filter the data.Given a relation R and set of simple predicates, we can fragment a relation horizontally as follows (relational algebra expression);
Fragment, Ri
= σFi(R), 1 ≤ i ≤ n
where Fi is the set of simple predicates represented in
conjunctive normal form, otherwise called as Min-term predicate which can be
written as follows;
Min-term
predicate, Mi=P1 Λ P2 Λ P3 Λ … Λ Pn
Here, P1 means both P1 or ¬(P1), P2
means both P2 or ¬(P2), and so on. Using the conjunctive
form of various simple predicates in different combination, we can derive many
such min-term predicates.
For the example 1 stated previously, we can derive set of min-term predicates
using the rules stated above as follows;
We will get 2n min-term predicates, where n is the number of simple predicates in
the given predicate set. For P1, we have 2 simple predicates. Hence, we will
get 4 (22) possible combinations of min-term predicates as follows;
m1
= {Balance < 10000 Λ Balance ≥ 10000}
m2
= {Balance < 10000 Λ ¬(Balance ≥ 10000)}
m3
= {¬(Balance < 10000) Λ Balance ≥ 10000}
m4
= {¬(Balance < 10000) Λ ¬(Balance ≥ 10000)}
Our next step is to choose the min-term predicates which can satisfy
certain conditions to fragment a table, and eliminate the others which are not
useful. For example, the above set of min-term predicates can be applied each
as a formula Fi stated in the above rule for fragment Ri as follows;
Account1
= σBalance< 10000 Λ Balance ≥
10000(Account)
which can be written in equivalent SQL query as,
Account1
<-- SELECT * FROM account WHERE balance < 10000
AND balance ≥ 10000;
Account2 = σBalance< 10000 Λ ¬(Balance ≥ 10000)(Account)
which can be written in equivalent SQL query as,
Account2
<-- SELECT * FROM account WHERE balance < 10000
AND NOT balance ≥ 10000;
where NOT balance ≥ 10000 is equivalent to balance < 10000.
Account3 = σ¬(Balance< 10000) Λ Balance ≥ 10000(Account)
which can be written in equivalent SQL query as,
Account3
<-- SELECT * FROM account WHERE NOT balance <
10000 AND balance ≥ 10000;
where NOT balance < 10000 is equivalent to balance ≥ 10000.
Account4 = σ¬(Balance< 10000) Λ ¬(Balance ≥ 10000)(Account)
which can be written in equivalent SQL query as,
Account4
<-- SELECT * FROM account WHERE NOT balance <
10000 AND NOT balance ≥ 10000;
where NOT balance < 10000 is equivalent to balance ≥ 10000 and NOT balance ≥ 10000 is equivalent to balance
< 10000. This is exactly same as the query for fragment Account1.
From these examples, it is very clear that the first query for
fragment Account1 (min-term
predicate m1) is invalid as any record in a table cannot have two
values for any attribute in one record. That is, the condition (Balance
< 10000 Λ Balance ≥ 10000) requires that the value for balance must both
be less than 10000 and greater and equal to 10000, which is not possible. Hence
the condition violates and can be eliminated. For fragment Account2 (min-term predicate m2), the
condition is (balance<10000 and balance<10000) which ultimately means balance<10000
which is correct. Likewise, fragment Account3
is valid and Account4 must
be eliminated. Finally, we use the min-term predicates m2 and m3 to fragment
the Account relation. The fragments
can be derived as follows for Account;
SELECT * FROM account WHERE balance < 10000;
Account2
Acno
|
Balance
|
Branch_Name
|
A101
|
5000
|
Mumbai
|
A104
|
2000
|
Chennai
|
A120
|
2500
|
New Delhi
|
A110
|
6000
|
Mumbai
|
A115
|
6000
|
Mumbai
|
SELECT * FROM account WHERE balance ≥ 10000;
Account3
Acno
|
Balance
|
Branch_Name
|
A103
|
10000
|
New Delhi
|
A102
|
12000
|
Chennai
|
Correctness of Fragmentation
We have chosen set of min-term predicates which would be used to
horizontally fragment a relation (table) into pieces. Now, our next step is to
validate the chosen fragments for their correctness. We need to verify did we
miss anything? We use the following rules to ensure that we have not changed
semantic information about the table which we fragment.
1. Completeness – If a relation R is fragmented into set of fragments,
then a tuple (record) of R must be found in any one or more of the fragments. This
rule ensures that we have not lost any records during fragmentation.
2. Reconstruction – After fragmenting a table, we must be able to reconstruct
it back to its original form without any data loss through some relational
operation. This rule ensures that we can construct a base table back from its
fragments without losing any information. That is, we can write any queries
involving the join of fragments to get the original relation back.
3. Disjointness – If a relation R is fragmented into a set of sub-tables
R1, R2, …, Rn, a record belongs to R1
is not found in any other sub-tables. This ensures that R1 ≠ R2.
For example, consider the Account
table in Figure 1 and its fragments Account2,
and Account3 created using
the min-term predicates we derived.
From the tables Account2,
and Account3 it is clear
that the fragmentation is Complete. That is, we have not missed any records. Just
all are included into one of the sub-tables.
When we use an operation, say Union between Account2, and Account3
we will be able to get the original relation Account.
(SELECT * FROM
account2) Union (SELECT * FROM account3);
The above
query will get us Account back
without loss of any information. Hence, the fragments created can be reconstructed.
Finally, if
we write a query as follows, we will get a Null set as output. It ensures that
the Disjointness property is satisfied.
(SELECT * FROM
account2) Intersect (SELECT * FROM account3);
We get a
null set as result for this query because, there is no record common in both
relations Account2 and Account3.
For the
example 2, recall the set of simple predicates which was as follows;
Set of simple
predicates P2 = { Branch_name = ‘Chennai’, Branch_name= ‘Mumbai’, Balance <
10000, Balance ≥ 10000}
We can
derive the following min-term predicates;
m1 = { Branch_name
= ‘Chennai’ Λ Branch_name= ‘Mumbai’ Λ
Balance < 10000 Λ Balance ≥ 10000}
m2 = { Branch_name
= ‘Chennai’ Λ Branch_name= ‘Mumbai’ Λ
Balance < 10000 Λ ¬(Balance ≥ 10000)}
m3 = { Branch_name
= ‘Chennai’ Λ Branch_name= ‘Mumbai’ Λ
¬(Balance < 10000) Λ Balance ≥ 10000}
m4 = { Branch_name
= ‘Chennai’ Λ ¬(Branch_name= ‘Mumbai’) Λ
Balance < 10000 Λ Balance ≥ 10000}
…
…
…
mn = { ¬(Branch_name
= ‘Chennai’) Λ ¬(Branch_name= ‘Mumbai’) Λ
¬(Balance < 10000) Λ ¬(Balance ≥ 10000)}
As in the previous example, out of 16 (24) min-term predicates, the set
of min-term predicates which are not valid should be eliminated. At last, we
would have the following set of valid min-term predicates.
m1 = { Branch_name
= ‘Chennai’ Λ ¬(Branch_name= ‘Mumbai’) Λ
¬(Balance < 10000) Λ Balance ≥ 10000}
m2 = { Branch_name
= ‘Chennai’ Λ ¬(Branch_name= ‘Mumbai’) Λ
Balance < 10000 Λ ¬(Balance ≥ 10000)}
m3 = { ¬(Branch_name
= ‘Chennai’) Λ Branch_name= ‘Mumbai’ Λ
¬(Balance < 10000) Λ Balance ≥ 10000}
m4 = { ¬(Branch_name
= ‘Chennai’) Λ Branch_name= ‘Mumbai’ Λ
Balance < 10000 Λ ¬(Balance ≥ 10000)}
m5 = { ¬(Branch_name
= ‘Chennai’) Λ ¬(Branch_name= ‘Mumbai’) Λ
¬(Balance < 10000) Λ Balance ≥ 10000}
m6 = { ¬(Branch_name
= ‘Chennai’) Λ ¬(Branch_name= ‘Mumbai’) Λ
Balance < 10000 Λ ¬(Balance ≥ 10000)}
The horizontal fragments using the above set of min-term predicates can
be generated as follows;
Fragment 1: SELECT * FROM
account WHERE branch_name = ‘Chennai’ AND balance ≥ 10000;
Fragment 2: SELECT * FROM
account WHERE branch_name = ‘Chennai’ AND balance < 10000;
Fragment 3: SELECT * FROM
account WHERE branch_name = ‘Mumbai’ AND balance ≥ 10000;
Fragment 4: SELECT * FROM
account WHERE branch_name = ‘Mumbai’ AND balance < 10000;
The horizontal fragments using the above set of min-term predicates can
be generated as follows;
Fragment 1: SELECT * FROM
account WHERE branch_name = ‘Chennai’ AND balance ≥ 10000;
Account1
Acno
|
Balance
|
Branch_Name
|
A102
|
12000
|
Chennai
|
Fragment 2: SELECT * FROM
account WHERE branch_name = ‘Chennai’ AND balance < 10000;
Account2
Acno
|
Balance
|
Branch_Name
|
A102
|
2000
|
Chennai
|
Fragment 3: SELECT * FROM
account WHERE branch_name = ‘Mumbai’ AND balance ≥ 10000;
Account3
Acno
|
Balance
|
Branch_Name
|
Fragment 4: SELECT * FROM
account WHERE branch_name = ‘Mumbai’ AND balance < 10000;
Account4
Acno
|
Balance
|
Branch_Name
|
A101
|
5000
|
Mumbai
|
A110
|
6000
|
Mumbai
|
A115
|
6000
|
Mumbai
|
In the ACCOUNT table we have the third branch ‘New Delhi’, which was
not specified in the set of simple predicates. Hence, in the fragmentation
process we must not leave the tuple with the value ‘New Delhi’. That is the
reason we have included the min-term predicates m5 and m6
which can be derived as follows;
Fragment 5: SELECT * FROM
account WHERE branch_name <> ‘Mumbai’ AND branch_name <> ‘Chennai’ AND
balance ≥ 10000;
Account5
Acno
|
Balance
|
Branch_Name
|
A103
|
10000
|
New Delhi
|
Fragment 6: SELECT * FROM
account WHERE branch_name <> ‘Mumbai’ AND branch_name <> ‘Chennai’ AND
balance < 10000;
Account6
Acno
|
Balance
|
Branch_Name
|
A120
|
2500
|
New Delhi
|
Correctness of fragmentation:
Completeness: The tuple of the table Account is distributed into different fragments. No records were
omitted. Otherwise, by performing the union operation between all the Account table fragments Account1, Account2, Account3, and Account4, we will be able to
get Account back without any
information loss. Hence, the above fragmentation is Complete.
Reconstruction: As said before, by performing Union operation between
all the fragments, we will be able to get the original table back. Hence, the fragmentation
is correct and the reconstruction property is satisfied.
Disjointness: When we perform Intersect operation between all the
above fragments, we will get null set as result, as we do not have any records
in common for all the fragments. Hence, disjointness property is satisfied.
Go to Distributed Database page
Go to Fragmentation page
Go to Derived Horizontal Fragmentation page
*************
Go to Distributed Database page
Go to Fragmentation page
Go to Derived Horizontal Fragmentation page
excellent
ReplyDeleteVery good nd hlpful
ReplyDeletesir please make a vedio about "derived horizontal fragmentation" if already make please provide me a link
ReplyDeleteUploaded. Here is the link.
ReplyDeletehttp://www.exploredatabase.com/2013/10/derived-horizontal-fragmentation-in-distributed-database.html
Youtube: https://youtu.be/3LUcLYOTWlU
thank you it is helpful
ReplyDelete