Sunday, February 2, 2014

Normalization

INDEX                                                                                   NEXT - First Normal Form (1NF)

What is Normalization? / Why do we need to normalize a table? / What are modification anomalies?

Why do we need to Normalize a table?


Introduction

A database which is designed using a model (Example – ER model) is perfect to some extent. This is true as for as the design is concerned. That is, it depends on the chosen attributes. But, there might be some hidden problems hidden in the design. One has to identify and remove those problems. Then only we could say that the design is good. The actual problems are due to the data stored or the dependency of the attribute over the other, etc. If we eliminate those problems, then we could say that the final design is perfect. Let us analyze the possible problems with the following table (designed using ER model) which is populated with sample data.

The following table (STUDENT) shows information about the students of an Engineering college. The student information like Registration Number (RegNo), Name(SName), Gender(Gen), Program Joined(PR), Course Number(CNo), Course Name(CName), Professor Name(PN), and Professor Office Address(POA) are stored in the table. Also assume the following;
1. Every course is offered by only one professor.
2. Student can register many courses (say maximum 6)
3. POA gives the office address of the professor which is also unique for every professor.
4. All the students’ information who joined for any program will be included in this table.

For every table, it is good to have a key to uniquely identify some information. The key for the table STUDENT is (RegNo, CNo), a composite primary key. Because, no single attribute can identify the records uniquely.

RegNo
SName
Gen
PR
CNo
CName
PN
POA
R1
Sundar
M
BTech
C101
Database
Kumar
CA101
R2
Ram
M
MS
C101
Database
Kumar
CA101
R3
Malini
F
MS
C101
Database
Kumar
CA101
R1
Sundar
M
BTech
C105
Data Structures
Praveen
CA107
R4
Kathik
M
MCA
C105
Data Structures
Praveen
CA107
R5
John
M
BSc
C104
Multimedia
Kesavan
CA103
Table 1 - STUDENT

This table looks error free according to its design, but has many hidden problems related to data stored. To identify those problems let us discuss about Modification Anomalies.

Modification Anomalies:


It is a set of problems caused due to the data manipulation in any table. Let us discuss one by one using the above sample tables STUDENT.

1. Insertion Anomaly

Insertion anomaly is the one which is caused during insertion of records into a table. In our sample table STUDENT, to insert information about any student he must have registered atleast one course, and the course must be offered by one professor. That is, if a student information is inserted without course information, according to integrity constraint violation (part of the Primary key is NULL – Cno is null), the record will not be accepted. The same is applicable for course data insertion without professor. This problem is called insertion anomaly.

2. Deletion Anomaly

Deletion of value of some attributes in a record leads to loss of some other important information too. In our table STUDENT, for some reasons, if the subject ‘Multimedia’ is canceled, then we lose the information about the student ‘John’ as well as the information about professor ‘Kesavan’. Also, withdrawal of a course registered by student ‘John’ will lead to loss of information. Such a inconvenience caused due to the deletion is deletion anomaly.

3. Updation Anomaly

An important statement regarding database and data is “Data redundancy leads to inconsistency”. It says if you have data repeated in one or more tables may take you to an inconsistent state while updating such information. In our table, student ‘Sundar’ has registered two courses. It leads to duplication of RegNo, SName, Gen, and PR values. Course ‘C101’ is registered by 3 students. If for any reason, the change of course name ‘Database’ to ‘Database Systems’ must be changed in all the 3 records. Failing which leads to inconsistent state. That is if one of the record is not changed the values ‘Database’ then we have two values for ‘C101’. If Sundar’s data is changed, it must be changed in all the records wherever ‘R1’ available.

Our table STUDENT has all the above said anomalies. To free the table from these anomalies we need to normalize the table using Normalization techniques. Various normal forms which are very basic and essential are discussed in the post Normal Forms.

INDEX                                                                                   NEXT - First Normal Form (1NF)

Featured Content

Multiple choice questions in Natural Language Processing Home

MCQ in Natural Language Processing, Quiz questions with answers in NLP, Top interview questions in NLP with answers Multiple Choice Que...

All time most popular contents