↑ INDEX NEXT - First Normal Form (1NF) →
A database which is designed using a model (Example – ER model) is perfect to some extent. This is true as for as the design is concerned. That is, it depends on the chosen attributes. But, there might be some hidden problems hidden in the design. One has to identify and remove those problems. Then only we could say that the design is good. The actual problems are due to the data stored or the dependency of the attribute over the other, etc. If we eliminate those problems, then we could say that the final design is perfect. Let us analyze the possible problems with the following table (designed using ER model) which is populated with sample data.
This table looks error free according to its design, but has many hidden problems related to data stored. To identify those problems let us discuss about Modification Anomalies.
What is Normalization? / Why do we need to normalize a table? / What are modification anomalies?
Why do we need to Normalize a table?
Introduction
A database which is designed using a model (Example – ER model) is perfect to some extent. This is true as for as the design is concerned. That is, it depends on the chosen attributes. But, there might be some hidden problems hidden in the design. One has to identify and remove those problems. Then only we could say that the design is good. The actual problems are due to the data stored or the dependency of the attribute over the other, etc. If we eliminate those problems, then we could say that the final design is perfect. Let us analyze the possible problems with the following table (designed using ER model) which is populated with sample data.
The following table (STUDENT)
shows information about the students of an Engineering college. The student
information like Registration Number (RegNo), Name(SName), Gender(Gen), Program
Joined(PR), Course Number(CNo), Course Name(CName), Professor Name(PN), and Professor
Office Address(POA) are stored in the table. Also assume the following;
1. Every course is offered by only one professor.
2. Student can register many courses (say maximum
6)
3. POA gives the office address of the professor
which is also unique for every professor.
4. All the students’ information who joined for any
program will be included in this table.
For every table, it is good to
have a key to uniquely identify some information. The key for the table STUDENT
is (RegNo, CNo), a composite primary key. Because, no single attribute can
identify the records uniquely.
RegNo
|
SName
|
Gen
|
PR
|
CNo
|
CName
|
PN
|
POA
|
R1
|
Sundar
|
M
|
BTech
|
C101
|
Database
|
Kumar
|
CA101
|
R2
|
Ram
|
M
|
MS
|
C101
|
Database
|
Kumar
|
CA101
|
R3
|
Malini
|
F
|
MS
|
C101
|
Database
|
Kumar
|
CA101
|
R1
|
Sundar
|
M
|
BTech
|
C105
|
Data Structures
|
Praveen
|
CA107
|
R4
|
Kathik
|
M
|
MCA
|
C105
|
Data Structures
|
Praveen
|
CA107
|
R5
|
John
|
M
|
BSc
|
C104
|
Multimedia
|
Kesavan
|
CA103
|
Table 1 - STUDENT
This table looks error free according to its design, but has many hidden problems related to data stored. To identify those problems let us discuss about Modification Anomalies.
Modification Anomalies:
It is a set of problems caused
due to the data manipulation in any table. Let us discuss one by one using the
above sample tables STUDENT.
1. Insertion Anomaly
Insertion anomaly is the one
which is caused during insertion of records into a table. In our sample table
STUDENT, to insert information about any student he must have registered
atleast one course, and the course must be offered by one professor. That is,
if a student information is inserted without course information, according to
integrity constraint violation (part of the Primary key is NULL – Cno is null),
the record will not be accepted. The same is applicable for course data
insertion without professor. This problem is called insertion anomaly.
2. Deletion Anomaly
Deletion of value of some
attributes in a record leads to loss of some other important information too.
In our table STUDENT, for some reasons, if the subject ‘Multimedia’ is
canceled, then we lose the information about the student ‘John’ as well as the
information about professor ‘Kesavan’. Also, withdrawal of a course registered
by student ‘John’ will lead to loss of information. Such a inconvenience caused
due to the deletion is deletion anomaly.
3. Updation Anomaly
An important statement regarding
database and data is “Data redundancy leads to inconsistency”. It says if you
have data repeated in one or more tables may take you to an inconsistent state
while updating such information. In our table, student ‘Sundar’ has registered
two courses. It leads to duplication of RegNo, SName, Gen, and PR values. Course
‘C101’ is registered by 3 students. If for any reason, the change of course
name ‘Database’ to ‘Database Systems’ must be changed in all the 3 records.
Failing which leads to inconsistent state. That is if one of the record is not
changed the values ‘Database’ then we have two values for ‘C101’. If Sundar’s
data is changed, it must be changed in all the records wherever ‘R1’ available.
Our table STUDENT has all the
above said anomalies. To free the table from these anomalies we need to
normalize the table using Normalization techniques. Various normal forms which
are very basic and essential are discussed in the post Normal Forms.
↑ INDEX NEXT - First Normal Form (1NF) →
↑ INDEX NEXT - First Normal Form (1NF) →