Showing posts with label Normal Forms. Show all posts
Showing posts with label Normal Forms. Show all posts

Monday, February 17, 2025

First Normal Form (1NF)

First normal form, how to normalize an unnormalized table into 1NF table, problem with non-atomic attributes, convert an UNF relation into 1NF relation


Prev - Normalization                                                                     Next - Functional Dependency (FD) 

First Normal Form (1NF) with example

Normal Forms

The idea is to organize the attributes in any tables to have reduced redundancy and dependency. In the process of Normalization we have the normal forms, First Normal Form (1NF), Second Normal Form (2NF), Third Normal Form (3NF), Boyce-Codd Normal Form (BCNF). Here 1NF, 2NF, and 3NF were defined by Edgar F.Codd and BCNF was defined by Raymond F.Boyce along with Codd. There are other normal forms like 4NF, 5NF, DKNF, and 6NF. Here, we are about to discuss first four normal forms.

In the normalization process of a table, we need to check whether a table satisfies the conditions of every normal form. If yes, we could say the given table is in that normal form. That is, if a table satisfies the conditions of 1NF, then we say that the table is in 1NF, and so on.

First Normal Form (1NF)


In Relational Database Management System (RDBMS), 1NF is one of the properties a relation (table) must satisfy. 1NF requires the following properties to be satisfied.

  •     Any attribute (column) of a table must have been designed to accept only atomic values.
  • Any value stored in any column must have single value (no repeating groups).


Let us take the following relation (table) STUDENT1 to explain the property 1NF. In STUDENT1, PR is Program Registered, and CName_Regd is Name of the course registered by the student.


RegNo
SName
Gen
PR
CName_Regd
R1
Sundar
M
BTech
Database, Data Structures
R2
Ram
M
MS
Database
R3
Kathik
M
MCA
Data Structures, Multimedia
R4
John
M
BSc
Multimedia
Table 1 – STUDENT1


Rule 1 says that the columns can have atomic domains only. That is, the permitted/accepted values for the column should be indivisible. Here indivisible means that if we try to divide a value stored in the column in any possible way, it should not give a meaningful string. Let us take the table STUDENT1. In this table, the CName_Regd attribute is accepts more than one course if registered by single student. And the value is inserted using comma as the separator for every individual course. Student “Sundar”, registered for two courses, namely, “Database”, and “Data Structures”. And they are stored as a single value (string) “Database, Data Structures”. If we divide the value at the comma, we will get two meaningful names “Database”, and “Data Structures”. That is, both are valid course names. Hence, CName_Regd is divisible.

The domain of Gen (Gender) is indivisible, i.e., atomic. We cannot divide the value stored in the column further. RegNo values cannot be divided further; PR values cannot be divided further, and so on.

Rule 2 insists that the column must not have multiple values. In our example, CName_Regd column accepts set of courses (i.e., group of one or more values) registered by a student for every record. And, if more than one course is registered, the course names are stored as single value separated by a comma (or any separators as decided by the designer). Like said above, the value is divisible.
Our table STUDENT1 does not satisfy both the conditions. Hence, it is not in 1NF. (If any one of the rules were not satisfied, then also the table not in 1NF). To further normalize the table into 2NF, 3NF, and so on, we need to convert this design into 1NF.

Solution to convert into 1NF:


To convert non-1NF design into 1NF, “unnest” (i.e., flatten the relation) the given relation.

In STUDENT1, CName_Regd is the only attribute which violates 1NF. So, it can be unnested. That is, for every individual course of any student, repeat the content of all the other column values for that student as given in the table STUDENT_1NF below.


RegNo
SName
Gen
PR
CName_Regd
R1
Sundar
M
BTech
Database
R1
Sundar
M
BTech
Data Structures
R2
Ram
M
MS
Database
R3
Kathik
M
MCA
Data Structures
R3
Kathik
M
MCA
Multimedia
R4
John
M
BSc
Multimedia
Table 2 – STUDENT_1NF


Note that, the course “Data Structures” has two substrings. If we try to divide “Data Structures” further like “Data” and “Structures”, this division, even though give meaningful strings “Data” and “Structures”, these are not courses offered.
Hence, this kind of division does not mean ‘divisible attribute’.


What would happen if any attribute is divisible?

Let us table Table1 to explain this. For queries which require list of courses for a given register number is straight forward. That is, we get the list of course names just by using the register number in the WHERE clause of the query. But, when the requirement is, “Find the list of student register numbers whoever registered for the course Database”, it is a complex scanning of the attribute for the result. Because, it involves parsing of every value present in the CName_Regd column and it is not simple.

When we normalize the table, both the requirements can be handled easily without any string analysis.

For other simple definition on First Normal Form (1NF), click here.

Prev - Normalization                                                                     Next - Functional Dependency (FD)


*********

Related Links




First normal form 1NF in DBMS with example



 

Second Norml Form (2NF) with Simple Example


Second Norml Form (2NF) with Simple Example



Second Normal Form (2NF)


In Second Normal Form (2NF), the normalization process highly depends on the following;

  • Keys
  • Functional Dependency, and
  • Decomposition.
For a table to be in 2NF, it has to satisfy the following set of properties;

Property 1: The table should satisfy all the properties of previous normal form, i.e., 1NF. In other words, the table should be in 1NF..
Property 2: There should not be any partial key dependencies. This property is not applicable for relations (tables) which have single simple attribute as Primary Key. Because, single attribute primary key means that all the other attributes can be determined by the Prime attribute. Hence, property 2 is applicable for relations those have more than one attribute combination as Primary Key (i.e., Composite Key). Because, here is the possibility of some of the attributes of the table depend on any one or all of the attributes of the composite primary key.

Let us consider the table 1 STUDENT given below for our discussions.

[Assumptions on the following table STUDENT: student can register many courses, and every course is taught by exactly one instructor]               
RegNo
SName
Gen
PR
Phone
Courses
CInst
R1
Sundar
M
BTech
9898786756
Database
Kumar
R2
Ram
M
MS
9897786776
Database
Kumar
R3
Karthik
M
MCA
8798987867
Data Structures
Steve
R4
John
M
BSc
7898886756
Multimedia
Badrinath
R1
Sundar
M
BTech
9898786756
Data Structures
Steve
R3
Karthik
M
MCA
8798987867
Multimedia
Badrinath
Table 1 - STUDENT

For this table, we can write all the possible Functional Dependencies as follows;
RegNo SName
RegNo Gen
RegNo PR
RegNo Phone
Courses CInst
CInst Courses
RegNo Courses CInst
RegNo Cinst → Courses

Or, the first four FDs can also be written as,

RegNo SName Gen PR Phone                                       
                                                               
It means, wherever you give a register number, (for example, ‘R1’), it will always show one SName value (for ‘R1’, SName is ‘Sundar’), one Gen value (for ‘R1’, Gen is ‘M’), one PR value (‘R1’ is doing ‘BTech’), and one Phone ( 9898786756 is the phone number of ‘R1’).

But, attributes Courses and CInst (Course instructors) cannot be included in this list. Because, RegNo cannot uniquely identify either Courses or CInst. The reason is some of the students registered more than one course (for example ‘R1’ registered ‘Database’, and ‘Data Structures’ courses).

As a whole, all the FDs can be collectively written with the key for this relation as,
                                          RegNo Courses SName Gen PR Phone CInst                                      

For the table STUDENT, according to the above said FD, the key is (RegNo, Courses) or (RegNo, CInst). But, both the keys are showing Partial Dependencies. That is, to determine the attributes SName, Gen, PR, and Phone, the whole key is not required. Likewise, to determine the attributes Courses or CInst, the whole key is not required. Hence, the design clearly shows Partial Key Dependency.

How do we convert an un-normalized table into Second Normal Form (2NF)?

The following steps to be followed;
  1. Find the possible Primary Key for the table in question
  2. If Primary key is composite, then find the individual Functional Dependencies (FDs) of every attribute of composite key.
  3. If all or some of the composite key individually identify other attributes, then Decompose (break into different tables) the table in question into two or more tables.
Steps explained with example;
  1. As we have already discussed, for the table STUDENT given in table 1, the Primary key is the Composite Key. The key is (RegNo, Courses) which is a composite key.
  2. As Primary key is the composite for this relation, we need to find individual FDs for all the composite attributes, i.e., FDs of RegNo and Courses. The attributes Courses and RegNo have  following FDs;  
    • Courses CInst

      RegNo SName
      RegNo Gen
      RegNo PR
      RegNo Phone
       
  3. Thus, attribute Course determines the attribute CInst uniquely. Attribute RegNo determines the attributes SName, Gen, PR, and Phone uniquely. Both RegNo and Courses show partial key dependencies and thus violates Property 2. Hence, we need decomposition.

How do we decompose the un-normalized relation?

We need to identify the set of attributes which can be determined by individual or composite attributes of the Composite Primary Key. In its simplest form, we break the table in question into multiple tables based on the determination of other attributes by the attributes in the primary key. That is, create a new table for every prime attribute of the composite key (partial key of whole key) and its dependent attributes.

In our example, we can break STUDENT into two tables. Because, RegNo determines one set of attributes, and Courses determines other set. Hence, we would get the following conceptual schemas.

STUDENT(RegNo, SName, Gen, PR, Phone) and
COURSES(Courses, CInst)

Let us visualize these tables.

RegNo
SName
Gen
PR
Phone
R1
Sundar
M
BTech
9898786756
R2
Ram
M
MS
9897786776
R3
Karthik
M
MCA
8798987867
R4
John
M
BSc
7898886756
Table 2: STUDENT

Courses
CInst
Database
Kumar
Data Structures
Steve
Multimedia
Badrinath
TABLE 3: COURSES

Look at those tables carefully. Table 2 STUDENT has no repeating groups. Hence it is in 1NF. The only primary key is RegNo which is single simple attribute. It satisfies the condition of 2NF. Hence it is in 2NF. Table 2 COURSES has no repeating groups as well, has one FD where singe attribute Courses is the Primary Key. Hence Courses is also in 2NF.

Though the tables are in 2NF, they show a problem. The problem is both table are not related, which leads to some of the following questions can be answered and some cannot be.

Q1 - we can answer student information if we have RegNo,
Q2 - we can have course instructor if we know the course name.
Q3 - we cannot answer the courses registered by any student which is very vital part of the design.

The solution is to include a Foreign key in any one of the tables to establish a link between these tables or create a new table for establishing links.

On deciding which of the table’s attribute can be taken as Foreign Key in other table, we may use mapping cardinality ratios. In our example, the relationship between STUDENT and COURSES is Many-To-Many. That is, one student can register many courses and a course can have one or more students. So, we need to create a third table to establish a link as follows (where the duplication accepted);

RegNo
Courses
R1
Database
R2
Database
R3
Data Structures
R4
Multimedia
R1
Data Structures
R3
Multimedia
Table 4: STU_COURSES

At the end, we have three tables (STUDENT, COURSES, and STU_COURSES) as part of the normalization process of table 1 STUDENT with the following schemas (Primary keys underlined).

STUDENT(RegNo, SName, Gen, PR, Phone) and
COURSES(Courses, CInst)
STU_COURSES(RegNo, Courses) – only this table shows minimal redundancy

Some of the advantages and disadvantages of the above normalization:
Advantages:
1.       As in table 1 STUDENT, there is no multiple repeated records of students in table 2 STUDENT and no multiple repeated records for courses in table 3 COURSES.
2.       Space is reduced.
3.       No redundancy, so no inconsistency

Disadvantages:
1.       A new table need to be created
2.       To answer Q3 mentioned above, we need to join all the three tables. If Q3 is one of the frequent queries used in the application, then it makes trouble. That is, it needs more system resources.
For other simple definition on Second Normal Form (2NF), click here


Visit here: Steps to decompose non-2NF into 2NF relation.


Related Links



Second normal form 2NF in DBMS explained with example

Featured Content

Multiple choice questions in Natural Language Processing Home

MCQ in Natural Language Processing, Quiz questions with answers in NLP, Top interview questions in NLP with answers Multiple Choice Que...

All time most popular contents