Integrated analytics platforms are quickly gaining credence, and as organizations realize their importance, many are scrambling to implement one. In the process, however, data quality is not getting enough attention. Data quality is the most important factor in determining the relevance and quality of the analytics a platform delivers. In this context, data quality means that the right data is made available to the platform in the right format so that it can deliver meaningful analytics. Yet problems such as system incompatibility, data structure issues and human error are preventing even high-quality integrated analytics platforms from delivering quality analytics.

Unless data quality issues are addressed, the return on investment (ROI) on integrated analytics platforms will not reach the expected levels. Here we examine the data quality problems that affect analytics platforms, using the example of the health care sector, one of the sectors worst hit by poor data quality.

Data Quality Issues That Hamper Analytics Platform Performance

The issues with data quality can be summed up as follows: data recorded and captured in the wrong format, incompatibility of upstream systems with analytics platforms, and inaccurate analysis.

Analytics Platforms Rely on Partial Data

Currently, most analytics platforms accept only insurance claims data, which is not enough to make the analytics comprehensive. Claims data provides information only on insurance claims, such as the date of admission, age and gender of the patient, reason for visiting the hospital, duration of stay in the hospital, details of the treatment provided and the total cost of treatment. Claims data is therefore only a subset of the total health care data that needs to be analyzed, and a subset of the information cannot provide the comprehensive analytics needed to deliver total quality health care.

Incompatibility Between Analytics Platforms and Vendor Data Systems

A few analytics platforms leverage clinical data through vendor-supplied integration messages, such as the Continuity of Care Document (CCD). Though the CCD provides a good way to integrate clinical data with the analytics platforms, a number of design and implementation limitations prevent the combination of analytics and CCD from delivering quality analytics.

Other Data Quality Issues

Data quality may be impacted at a number of points along the data supply chain that leads to the analytics platforms. The points where data is created or through which it passes are data transport, electronic health record (EHR) configuration, normalization, aggregation and reporting mechanisms. Common problems include:

  • Incorrect identifiers of any entity, such as missing or incorrect social security number, date of birth or sex
  • Numeric data such as age written in text form rather than in the structured field
  • Incorrect entry of standard codes, such as incorrect diagnosis or treatment codes
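
The three kinds of problems listed above can be checked for mechanically. The following is a minimal sketch in Python, not a production validator: the field names, the accepted values for sex, the identifier format and the small set of diagnosis codes are all assumptions made for illustration.

```python
import re
from datetime import date

# Hypothetical set of accepted diagnosis codes, for illustration only.
# A real system would load these from its own code-set reference table.
KNOWN_DIAGNOSIS_CODES = {"I10", "I21.9", "E11.9"}

# Assumed identifier format: NNN-NN-NNNN.
SSN_PATTERN = re.compile(r"^\d{3}-\d{2}-\d{4}$")


def find_quality_issues(record: dict) -> list[str]:
    """Return a list of data quality problems found in one record."""
    issues = []

    # 1. Missing or incorrect identifiers.
    ssn = record.get("ssn")
    if not isinstance(ssn, str) or not SSN_PATTERN.match(ssn):
        issues.append("ssn missing or malformed")
    if not isinstance(record.get("date_of_birth"), date):
        issues.append("date_of_birth missing or not a date")
    if record.get("sex") not in {"F", "M", "U"}:
        issues.append("sex missing or not a recognised value")

    # 2. Numeric data written as text instead of a structured number.
    if not isinstance(record.get("age"), int):
        issues.append("age is not a structured number")

    # 3. Standard codes that are not in the known code set.
    if record.get("diagnosis_code") not in KNOWN_DIAGNOSIS_CODES:
        issues.append("diagnosis_code not recognised")

    return issues
```

A record with the age entered as "forty-two" and an unknown diagnosis code would come back with two issues, while a clean record returns an empty list.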

The above problems result from errors of both omission and commission. When data passes through such a long chain and so much human handling, it is prone to errors.

Capturing Quality Data for Analytics Platforms

Before discussing the ways to capture quality data, it is important to identify the points at which data quality issues arise. The most important are capture, structure and data transportation.

Capture is the stage when people enter data into the system, for example, medical investigation report data. It is extremely important that all the relevant data be identified and entered into the system.

Structure is the stage when the correct data needs to be entered in the correct format and field. For example, patient weight is considered numeric data, but if it is entered into a text field, then the quality of the analytics will be impacted.
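
To illustrate the weight example, here is a rough sketch of how a weight that was typed into a text field might be recovered as a number. The accepted input formats, the function name weight_in_kg and the rule that a value without a unit is treated as kilograms are assumptions for this example, not part of any particular system.

```python
import re
from typing import Optional

# Accepts inputs such as "72", "72.5 kg" or "160 lbs" (assumed formats).
_WEIGHT_PATTERN = re.compile(
    r"^\s*(\d+(?:\.\d+)?)\s*(kg|lb|lbs)?\s*$", re.IGNORECASE
)
KG_PER_LB = 0.45359237


def weight_in_kg(raw: str) -> Optional[float]:
    """Convert free-text weight to kilograms, or None if it cannot be parsed."""
    match = _WEIGHT_PATTERN.match(raw)
    if match is None:
        return None  # Flag for manual review instead of guessing.
    value = float(match.group(1))
    # Assumption: a value with no unit is already in kilograms.
    unit = (match.group(2) or "kg").lower()
    return value * KG_PER_LB if unit in ("lb", "lbs") else value
```

Text that cannot be parsed, such as "about seventy", returns None so that it can be flagged rather than silently passed on to the analytics platform.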

Data transportation is the stage when the data is loaded onto the analytics systems for analysis. The main reason data quality suffers at this stage is the absence of a direct connection with a database. When a database is directly connected with the data supply chain, the essential fields are captured in the right structure and format.

Given below are some ways to improve data quality.

Limit Unstructured Data

While it is extremely important to enter data in the right format and in the right field, too many drop-down lists and boxes can be overwhelming for a human user. This is especially true when huge amounts of data need to be entered on a regular basis. To address this, data entry personnel need to be given an easy, intuitive user interface. In the health care field, for example, there should be a provision to attach and upload images of medical investigations such as X-rays, and the system should be able to check whether the image dimensions meet the requirements.
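
As one possible illustration of the image dimension check, the sketch below reads the width and height from the header of a PNG file. It assumes the uploads are PNG files (medical images often use other formats), and the minimum size of 512 by 512 pixels is a made-up threshold.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

# Hypothetical minimum dimensions; a real system would set its own limits.
MIN_WIDTH, MIN_HEIGHT = 512, 512


def png_dimensions(data: bytes) -> tuple[int, int]:
    """Read (width, height) from the IHDR chunk at the start of a PNG file."""
    if len(data) < 24 or data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG image")
    # Width and height are stored as two big-endian 32-bit integers.
    width, height = struct.unpack(">II", data[16:24])
    return width, height


def dimensions_qualify(data: bytes) -> bool:
    """Check the image against the assumed minimum dimensions."""
    width, height = png_dimensions(data)
    return width >= MIN_WIDTH and height >= MIN_HEIGHT
```

Only the first 24 bytes of the file are needed, so the check can run as soon as the upload starts.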

Make the Process Easier for the User

When human users enter data, much of it is likely to be repetitive. For example, when data on cardiology patients is entered, different specific conditions may have different codes. Each time a user starts entering a condition or code, the system should provide suggestions or automatically fill in the corresponding code. It takes only a bit of good programming or a few tweaks to the code to implement a system like this, and it greatly reduces the possibility of human error. Where possible, the system should also validate entries to catch incorrect codes.
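
A simple version of such suggestion and validation logic might look like the sketch below. The lookup table is a hypothetical, three-entry example with illustrative codes, and the function names are invented for this article; a real system would draw on the full code set it actually uses.

```python
# Hypothetical lookup table of condition names to codes (illustrative only).
CARDIOLOGY_CODES = {
    "atrial fibrillation": "I48.91",
    "heart failure": "I50.9",
    "hypertension": "I10",
}


def suggest(prefix: str) -> list[tuple[str, str]]:
    """Return (condition, code) pairs whose condition starts with the prefix."""
    prefix = prefix.strip().lower()
    if not prefix:
        return []
    return sorted(
        (name, code)
        for name, code in CARDIOLOGY_CODES.items()
        if name.startswith(prefix)
    )


def is_valid_code(code: str) -> bool:
    """Validate a typed code against the known code set."""
    return code.strip().upper() in set(CARDIOLOGY_CODES.values())
```

Typing "h" would offer both "heart failure" and "hypertension" with their codes, and a code that is not in the table is rejected before it reaches the record.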

Bridge the Gap Between the Supplier and Analytics Systems

As pointed out earlier, data quality suffers because of design and implementation issues between the supplier systems and the analytics engines. Organizations need to work on creating a common minimum structure for data supplier systems, such as the EHR systems, so that the essential data is supplied to the analytics engines in the right format. Given the large number of supplier systems made by different vendors, achieving a common structure is a challenge, but it is an effort worth making.
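
To make the idea of a common minimum structure concrete, the sketch below checks a record from a supplier system against a small set of required fields and types. The field list is purely an assumption for illustration; what belongs in the minimum structure would have to be agreed between suppliers and analytics vendors.

```python
from datetime import date

# Hypothetical minimum field set that every supplier system must provide.
MINIMUM_STRUCTURE = {
    "patient_id": str,
    "date_of_birth": date,
    "sex": str,
    "weight_kg": float,
    "diagnosis_code": str,
}


def conformance_errors(record: dict) -> list[str]:
    """List the ways a supplier record falls short of the minimum structure."""
    errors = []
    for field, expected in MINIMUM_STRUCTURE.items():
        if field not in record:
            errors.append(f"{field}: missing")
        elif not isinstance(record[field], expected):
            actual = type(record[field]).__name__
            errors.append(f"{field}: expected {expected.__name__}, got {actual}")
    return errors
```

Extra fields are deliberately ignored, so each vendor can still supply more than the minimum without failing the check.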


The most important step toward data quality for analytics engines seems to be a common minimum structure for data supplier systems. Other steps, such as making data entry systems easier for people to use, are important, but human data entry will always be prone to error. A standard for data entry and data transportation, on the other hand, can ensure that the right data reaches the analytics engines in the right format and structure. For that to happen, there needs to be a common standard and protocol for the development of systems and user interfaces.