As we all know, normalization is a process of organizing the columns (attributes) and tables (relations) of a relational database to reduce data redundancy and improve data integrity. Normalization is also the process of simplifying the design of a database so that it achieves the optimum structure. It reduces and eliminates redundant data. In normalization, data integrity is assured. It was first proposed by Dr. Edgar F. Codd, as an integral part of a relational model.
And we already know that normalization gives us many advantages and benefits, such as:
- Make the Database Smaller by eleminating redundant data, by doing this the data will be easier to manage and saves us more space of storage.
- Better Performance because the smaller the data, the faster i tis to be processed.
- Delete anomalies that will cause an error in the systems, etc.
However, sometimes a normalized databaes does not provide maximum processing efficiency because there are too many joined tables that only provide data codes (identifier key) not the real data, so the system need to do further lookout in order to find the real data.
There are a few drawbacks in normalization :
- Creating a longer task, because there are more tables to join, the need to join those tables increases and the task become more tedious (longer and slower). The database become harder to realize as well.
- Tables will contain codes rather than real data as the repeated data will be stored as lines of codes rather than the true data. Therefore, there is always a need to go to the lookup table, thus the system will be slower to perform.
- Making querry more difficult, because it consists of an SQL that is constructed dynamically and is usually constructed by desktop friendly query tools, hence it is hard to model the database without knowing what the customers desires.
- Requires Detailed Analysis and Design, Normalizing a database is a complex and difficult task, because analyst have to know the purpose of the database, such as whether it should it be optimized for reading data, writing data or both, also affects how it is normalized. A poorly normalized database may perform badly and store data inefficiently.
Denormalization is a strategy used on a previously-normalized database to increase performance. The idea behind it is to add redundant data where we think it will help us the most.It is a process of trying to improve the read performance of a database, at the expense of losing some write performance, by adding redundant copies of data or by grouping data.
Denormalization refers to a refinement to relational schema such that the degree of
normalization for a modified relation is less than the degree of at least one of the original relations.
It refers to situations where two relations are combined into one new relation, which is still normalized but contains more nulls than original relations.
Things that needs to be considered when doing denormalization :
- Makes implementation more complex, because there will be more redundant data and anomalies that will effect the implementation process.
- Often sacrifices flexibility.
- May speed up retrievals but it slows down updates, data redundancy will slow the data updates because there will be more tables to be updated.
When to use Denormalization :
- Maintaining history
- Improving query performance, Some of the queries may use multiple tables to access data that we frequently need
- Speeding up reporting, We need certain statistics very frequently. Creating them from live data is quite time-consuming and can affect overall system performance.
- Computing commonly-needed values up front, We want to have some values ready-computed so we don’t have to generate them in real time.
We need to make a further analysis whether a table need to be normalized or not, because poorly done normalization will maket the data performed worse. A further analysis containing the pros and cons when normalized or denormalized data is necessary before the decision is made.
It is also important to point out that you don’t need to use denormalization if there are no performance issues in the application. But if you notice the system is slowing down or if you’re aware that this could happen – then you should think about applying this technique.
Published at : Updated