Database managers and other IT professionals should guard against “database redundancy” or “data redundancy” because of the negative effects it can have on a database system or environment. Wherever a piece of data is duplicated, whether in two fields of one database or in two different database environments, the duplication can have consequences for data retrieval.
One of the first reasons to avoid data redundancy is that it can be wasteful: duplicate data takes up storage without adding any information.
It is important to point out that some types of data redundancy are planned, in order to safeguard and back up data. Others arise from poor or inefficient coding, or from a lack of attention to best practices. In many cases, large amounts of redundant data cause the database to quickly grow beyond a reasonable size. With this in mind, many efforts to combat data redundancy aim to save space in a database and, consequently, to decrease costs and maintenance effort. This has to be done with an eye toward practicality, however: engineers can practice something called data deduplication, but only in a way that is itself efficient.
For example, database managers might remove a string from a repeated field, such as a shared customer or company name, and replace it with a short reference to a single place where the string is stored. This can save space in the database, but it can also require more server activity to perform a given query, because the string has to be looked up each time it is needed, so it might not be as efficient as it seems.
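The trade-off described above can be sketched in a few lines of Python using the standard sqlite3 module. This is only an illustration under assumed names: the `company` and `orders` tables, their columns, and the sample values are hypothetical and do not come from any particular system. The company name is stored once, each order holds a small integer reference to it, and reading the name back requires a join.

```python
import sqlite3


def build_orders_db():
    """Create an in-memory database where the company name is stored only once."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE company (
            id   INTEGER PRIMARY KEY,
            name TEXT NOT NULL UNIQUE
        );
        -- Each order keeps a small integer reference instead of the full name.
        CREATE TABLE orders (
            id         INTEGER PRIMARY KEY,
            company_id INTEGER NOT NULL REFERENCES company(id),
            amount     REAL NOT NULL
        );
    """)
    conn.execute(
        "INSERT INTO company (id, name) VALUES (1, 'Acme Industrial Supply Ltd.')"
    )
    conn.executemany(
        "INSERT INTO orders (company_id, amount) VALUES (?, ?)",
        [(1, 120.0), (1, 75.5), (1, 310.0)],
    )
    return conn


def orders_with_company_name(conn):
    """Reading the name back now costs a join, which is the extra query work."""
    return conn.execute(
        """
        SELECT o.id, c.name, o.amount
        FROM orders AS o
        JOIN company AS c ON c.id = o.company_id
        ORDER BY o.id
        """
    ).fetchall()


conn = build_orders_db()
rows = orders_with_company_name(conn)
for order_id, company_name, amount in rows:
    print(order_id, company_name, amount)
```

Whether the space saved outweighs the cost of the join depends on the length of the repeated strings, the number of rows, and how often the name is actually read, so it is worth measuring on real data before committing to the change.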
Another big reason to deduplicate data or avoid data redundancy is the confusion that can result. Redundant data in a database can cause various types of anomalies. One of these is called an update anomaly. Update anomalies happen when the same information is stored in more than one record and an update reaches some of those records but not others, for instance when a record is re-entered with updated information but the update does not make it back to the original record. In such a situation, there might be three different records for a particular company employee, with three different job titles and three different addresses, because the person's information was not updated throughout the entire database, but only on the record last entered.
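A minimal sketch of an update anomaly, again in Python with sqlite3, may make this concrete. The `staff_assignment` table, the employee name, and the job titles are all made up for illustration. The job title is repeated on every row for the same employee, and a promotion is recorded on only the most recently entered row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical flat table: the job title is repeated on every row for the employee.
conn.execute(
    """
    CREATE TABLE staff_assignment (
        id        INTEGER PRIMARY KEY,
        employee  TEXT NOT NULL,
        job_title TEXT NOT NULL,
        project   TEXT NOT NULL
    )
    """
)
conn.executemany(
    "INSERT INTO staff_assignment (employee, job_title, project) VALUES (?, ?, ?)",
    [
        ("J. Rivera", "Analyst", "Billing"),
        ("J. Rivera", "Analyst", "Reporting"),
        ("J. Rivera", "Analyst", "Migration"),
    ],
)

# The promotion is applied only to the record last entered.
conn.execute(
    """
    UPDATE staff_assignment
    SET job_title = 'Senior Analyst'
    WHERE id = (SELECT MAX(id) FROM staff_assignment)
    """
)


def titles_for(conn, employee):
    """Return every distinct job title currently stored for one employee."""
    cursor = conn.execute(
        "SELECT DISTINCT job_title FROM staff_assignment "
        "WHERE employee = ? ORDER BY job_title",
        (employee,),
    )
    return [row[0] for row in cursor]


titles = titles_for(conn, "J. Rivera")
print(titles)  # the database now holds two conflicting titles for one person
```

After the partial update, the table gives two different answers to the question of what this person's job title is, and nothing in the data itself says which one is current.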
Database administrators can avoid data redundancy by design. They can also engage in data normalization practices, which address update anomalies and other kinds of anomalies by structuring tables so that each fact is stored in only one place. Database administrators can also pursue data deduplication efforts that clean up and standardize existing data in other ways. All of this serves the purpose of creating cleaner database tables, making database records more consistent and preventing the headaches and complex problems associated with unplanned data redundancy.
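As a rough sketch of what normalization buys, the following Python example uses sqlite3 with a hypothetical `employee` table and `assignment` table (the names, columns, and sample values are assumptions made for illustration). The job title lives in exactly one row, so a single update is reflected everywhere the employee is referenced.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Facts about the employee are stored once.
    CREATE TABLE employee (
        id        INTEGER PRIMARY KEY,
        name      TEXT NOT NULL,
        job_title TEXT NOT NULL
    );
    -- Assignments refer to the employee by id instead of repeating those facts.
    CREATE TABLE assignment (
        id          INTEGER PRIMARY KEY,
        employee_id INTEGER NOT NULL REFERENCES employee(id),
        project     TEXT NOT NULL
    );
""")
conn.execute(
    "INSERT INTO employee (id, name, job_title) VALUES (1, 'J. Rivera', 'Analyst')"
)
conn.executemany(
    "INSERT INTO assignment (employee_id, project) VALUES (1, ?)",
    [("Billing",), ("Reporting",), ("Migration",)],
)

# One update, in one place.
conn.execute("UPDATE employee SET job_title = 'Senior Analyst' WHERE id = 1")

# Every assignment now reports the same, current title.
report = conn.execute(
    """
    SELECT a.project, e.job_title
    FROM assignment AS a
    JOIN employee AS e ON e.id = a.employee_id
    ORDER BY a.project
    """
).fetchall()
for project, job_title in report:
    print(project, job_title)
```

The design choice here is the same one discussed earlier for repeated strings: reads pay for a join, and in exchange there is no second copy of the title that could be left out of date.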