What Does Data Quality Mean?
Data quality (DQ) is the degree to which a given dataset meets a user's needs. Data quality is an important criterion for ensuring that data-driven decisions are made as accurately as possible.
High-quality data is of sufficient quantity and detail to meet its intended uses. It is consistent with other sources, presented in appropriate ways, and has a high degree of completeness. Other key data quality components include:
- Accuracy: The extent to which data correctly represents the real-world events or entities it describes.
- Credibility: The extent to which data is considered trustworthy and true.
- Timeliness: The extent to which data is up to date and available when the user needs it.
- Consistency: The extent to which the same data item has the same value across different datasets.
- Integrity: The extent to which all data references (for example, keys linking records in different tables) resolve to the correct records.
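Several of these components can be expressed as simple ratios. The Python sketch below is one illustrative way to score completeness, consistency, and integrity over in-memory records; accuracy and credibility usually need an external reference and are not covered. The record layouts, field names, and the second "CRM" source are hypothetical, and a real assessment would run against an organization's own data stores.

```python
from typing import Any

Record = dict[str, Any]

# Hypothetical sample data; field names are illustrative only.
customers: list[Record] = [
    {"id": 1, "email": "ana@example.com", "country": "US"},
    {"id": 2, "email": None, "country": "DE"},
    {"id": 3, "email": "li@example.com", "country": "US"},
    {"id": 4, "email": "sam@example.com", "country": "FR"},
]
# The same attribute as held in a second (hypothetical) system, keyed by customer id.
crm_countries = {1: "US", 2: "DE", 3: "CN"}
orders: list[Record] = [
    {"order_id": 10, "customer_id": 1},
    {"order_id": 11, "customer_id": 2},
    {"order_id": 12, "customer_id": 3},
    {"order_id": 13, "customer_id": 4},
    {"order_id": 14, "customer_id": 99},  # points at a customer that does not exist
]


def completeness(rows: list[Record], field: str) -> float:
    """Completeness: share of rows where `field` is present and not None."""
    if not rows:
        return 1.0
    return sum(1 for r in rows if r.get(field) is not None) / len(rows)


def consistency(
    rows: list[Record], field: str, other: dict[Any, Any], key: str = "id"
) -> float:
    """Consistency: share of records found in both sources whose values agree."""
    shared = [r for r in rows if r[key] in other]
    if not shared:
        return 1.0
    return sum(1 for r in shared if r[field] == other[r[key]]) / len(shared)


def referential_integrity(
    children: list[Record], fk: str, parents: list[Record], pk: str = "id"
) -> float:
    """Integrity: share of child rows whose foreign key matches an existing parent."""
    if not children:
        return 1.0
    parent_keys = {p[pk] for p in parents}
    return sum(1 for c in children if c[fk] in parent_keys) / len(children)


print(f"email completeness:    {completeness(customers, 'email'):.2f}")  # 0.75
print(f"country consistency:   {consistency(customers, 'country', crm_countries):.2f}")  # 0.67
print(f"order->customer links: {referential_integrity(orders, 'customer_id', customers):.2f}")  # 0.80
```

Each score is a fraction between 0 and 1, so the same functions can be rerun on later snapshots of the data and the results compared over time.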
There is no single, universally adopted standard for evaluating and verifying data quality, although international standards such as ISO 8000 address the topic. In practice, most organizations approach data quality improvement on an organizational or project-by-project basis, using policies and frameworks to ensure data is properly collected, handled, and processed at all stages of the information lifecycle.
Data Quality Guidelines
Extracting reliable and useful information from a large quantity of data requires the data to be as complete and error-free as possible. Unreliable data can lead to poor decisions and wasted budget. If poor-quality data is used to make decisions about an online advertising campaign, for example, valuable advertising dollars are likely to be spent on consumers outside the target audience.
The quality of data should be assessed and reassessed continually, in an iterative fashion, to ensure that appropriate levels of quality are sustained in an acceptable and transparent manner. This requires organizations to establish data quality guidelines for data managers, data stewards, and other stakeholders who use the data. These guidelines include:
- Assessing data quality early and often.
- Adopting a framework for evaluating data quality, to ensure that all aspects of data quality are evaluated and verified consistently. Data quality assessments (DQAs) can help managers understand how much confidence they should have in specific datasets.
- Periodically reviewing data quality policies to ensure they support compliance with applicable regulations.
- Hiring a neutral third party to monitor data quality. Look for a partner who has both the expertise and the resources to identify which datasets are high quality and privacy-compliant, and which are inherently flawed and will raise concerns.
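The guidelines above call for assessing quality early and often and for applying the same framework every time. One way to make that concrete, sketched here in Python under illustrative assumptions, is a fixed list of named checks, each with a minimum acceptable pass rate, that is run against every new batch of data. The check names, rules, and thresholds are hypothetical examples, not part of any standard DQA framework.

```python
from dataclasses import dataclass
from typing import Any, Callable

Record = dict[str, Any]


@dataclass
class Check:
    name: str
    rule: Callable[[Record], bool]  # returns True when a record passes
    threshold: float                # minimum acceptable pass rate (hypothetical)


def assess(records: list[Record], checks: list[Check]) -> dict[str, tuple[float, bool]]:
    """Apply every check to every record; report pass rate and verdict per check."""
    report: dict[str, tuple[float, bool]] = {}
    for check in checks:
        passed = sum(1 for r in records if check.rule(r))
        rate = passed / len(records) if records else 1.0
        report[check.name] = (rate, rate >= check.threshold)
    return report


def age_in_range(r: Record) -> bool:
    age = r.get("age")
    return isinstance(age, int) and 0 <= age <= 120


# The same (hypothetical) checks are reused for every batch so results stay comparable.
CHECKS = [
    Check("email present", lambda r: bool(r.get("email")), threshold=0.70),
    Check("age in range", age_in_range, threshold=0.99),
]

batch = [
    {"email": "ana@example.com", "age": 34},
    {"email": "", "age": 29},
    {"email": "li@example.com", "age": 150},
    {"email": "sam@example.com", "age": 41},
]

for name, (rate, ok) in assess(batch, CHECKS).items():
    print(f"{name}: {rate:.0%} {'PASS' if ok else 'FAIL'}")
# email present: 75% PASS
# age in range: 75% FAIL
```

Keeping the checks in one shared list, rather than scattering ad hoc queries across projects, is what makes repeated assessments consistent; the per-check pass rates also give managers a rough measure of how much confidence a dataset deserves.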
Internal data quality policies should include guidelines for data entry, edit checking, validating and auditing data, correcting data errors, and removing the root causes of data contamination. Guidelines should also include policies and procedures for change control, standardizing data formats, and resolving data disputes.
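As an illustration of edit checking and format standardization at the point of data entry, the following Python sketch normalizes a few fields of an incoming record (trimming whitespace, upper-casing a country code, converting dates to ISO 8601) and collects a readable reason for every rule the record fails. The field names, accepted date formats, and rules are hypothetical; a real policy would define its own.

```python
from datetime import datetime
from typing import Optional

# Hypothetical accepted input formats; each is converted to ISO 8601 (YYYY-MM-DD).
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")


def standardize_date(value: str) -> Optional[str]:
    """Return the date in ISO 8601 form, or None if no accepted format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def edit_check(raw: dict[str, str]) -> tuple[dict[str, str], list[str]]:
    """Standardize formats, then apply edit checks (hypothetical rules).

    Returns the cleaned record and a list of error messages; an empty list
    means the record may be accepted.
    """
    errors: list[str] = []
    clean = {field: value.strip() for field, value in raw.items()}

    # Standardize: country codes are stored upper-case.
    clean["country"] = clean.get("country", "").upper()
    if len(clean["country"]) != 2 or not clean["country"].isalpha():
        errors.append("country: expected a two-letter code")

    # Standardize: dates are stored as ISO 8601.
    iso_date = standardize_date(clean.get("signup_date", ""))
    if iso_date is None:
        errors.append("signup_date: unrecognized date format")
    else:
        clean["signup_date"] = iso_date

    # Minimal plausibility check, not full email validation.
    if "@" not in clean.get("email", ""):
        errors.append("email: missing '@'")

    return clean, errors


good, good_errors = edit_check(
    {"email": " ana@example.com ", "country": "us", "signup_date": "05/03/2024"}
)
bad, bad_errors = edit_check(
    {"email": "bob.example.com", "country": "Germany", "signup_date": "March 5"}
)
print(good)        # {'email': 'ana@example.com', 'country': 'US', 'signup_date': '2024-03-05'}
print(bad_errors)  # all three checks fail
```

Recording rejection reasons, rather than silently dropping bad records, also gives data stewards a trail for tracing recurring errors back to their root causes.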