Mind the Data Integrity Gaps

Move Past Data Integrity Gaps to Data Governance

Amber Pifer | January 30, 2018


To Outsource or Not to Outsource Is the Question

It’s no secret that many organizations consider outsourcing to reduce capital expenditure (CapEx) or to improve efficiency. But while the reward seems high, the decision can’t be made without examining the potential risk. Some risks are blatant, others are hidden, but weighing risk against reward is an exercise that cannot be taken lightly or done in a vacuum.

A common result of outsourcing in any industry is a weak internal understanding of the systems and processes the outsourcing firm puts in place. Though it may not seem evident at first, organizations often reduce operating costs but lose on security and quality when they outsource work. Outsourced work often means internal employees lack the education or information to operate various systems and processes, which negatively impacts future projects and scope changes. Obstacles like these create data integrity gaps, a serious compromise. An organization with a strong data governance program, one built on a clear understanding of systems, processes, data, and policies, has a greater chance of completing projects under budget and ahead of schedule.

Data Integrity Gaps

Let’s talk further about a project in which an organization wants to remove data integrity gaps with data integrity checks: validations that examine the completeness, accuracy, or timeliness of data to ensure it can be trusted. To determine where the control gaps are within an enterprise, which in turn determines the type of control or validation to add, you must be able to step back and see the big picture of what’s going on.
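
To make this concrete, here is a minimal sketch, in Python, of what such checks might look like. The records, field names (member_id, plan_code, received_at), reference values, and SLA window are all hypothetical; in practice they would come from your own systems and business rules.

```python
from datetime import datetime, timedelta

# Hypothetical enrollment records; field names are illustrative only.
now = datetime.now()
records = [
    {"member_id": "M001", "plan_code": "PPO1", "received_at": now - timedelta(days=1)},
    {"member_id": None, "plan_code": "XXX9", "received_at": now - timedelta(days=30)},
]

VALID_PLAN_CODES = {"PPO1", "HMO2", "EPO3"}  # assumed reference set

def check_completeness(record):
    """Completeness: required fields must be present and non-empty."""
    return all(record.get(f) for f in ("member_id", "plan_code", "received_at"))

def check_accuracy(record):
    """Accuracy: values must fall within the agreed reference set."""
    return record.get("plan_code") in VALID_PLAN_CODES

def check_timeliness(record, max_age=timedelta(days=7)):
    """Timeliness: data older than the SLA window is flagged as stale."""
    received = record.get("received_at")
    return received is not None and datetime.now() - received <= max_age

for rec in records:
    checks = {
        "completeness": check_completeness(rec),
        "accuracy": check_accuracy(rec),
        "timeliness": check_timeliness(rec),
    }
    failed = [name for name, ok in checks.items() if not ok]
    status = "passed all checks" if not failed else f"failed: {', '.join(failed)}"
    print(f"Record {rec['member_id']}: {status}")
```

In a real pipeline, validations like these would run at each hand-off between systems, so a failure points to the specific hop where integrity was lost.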

Before diving into the nitty-gritty details, I recommend drawing a detailed process flow that includes every system involved in the process under discussion. For example, consider all the methods of ingestion, and the error-prone process often associated with the intake of healthcare membership data. Enrollment information can come into an organization through a number of different channels, and in a number of different formats. Reformatting the data, loading the data, and the overall process are all plagued with opportunities for corrupt data. The same story holds true for any EDI Gateway environment or any process that moves critical data across an enterprise.

It is important to understand which systems are involved, including what any acronyms used as system names stand for, as well as each system’s function. Identify answers to questions like, “What is the purpose of the system?”, “Who are the system contacts?”, and “Who is the technical owner of the system, the business owner, the data steward, and so on?” After this step, you can start looking at all the other elements that fall within this process and are critical to understanding how to fill those data integrity gaps. These include, but are certainly not limited to, the following (a sketch of how to capture them follows the list):

  • Data Types and Transformations: What type of data is being processed? Does anything change within the data as it moves from system to system? What key identifier or identifiers within the data prove its uniqueness throughout processing?
  • Data Transfer Methods: How does the data get from one system to the next? Does the data change during each transfer? What type of tool or action executes this activity?
  • Data Frequency: How often is the data pulled or pushed between each system? Is this a real-time or batch process? If real-time, does it run throughout the day, or are there specific times when volume increases? If batch, how many runs occur per day? Who else needs access to the data once it arrives?
  • Data Volumes: What are the average daily, weekly, and monthly volumes being processed, at both the file and the record level? What is the maximum volume to plan for? Are there any activities in the near future that would significantly increase or decrease the volume? Could increased volume affect the speed at which data is available?
  • SLAs: What are the service-level agreements (SLAs) for processing data within a certain timeframe, or for ensuring completed data reaches the target system by a certain time?
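
One lightweight way to capture these elements consistently is to record them once per system-to-system hop. The sketch below, with entirely hypothetical field names and values drawn from the enrollment example above, shows one possible Python structure; treat it as a starting template, not a prescribed format.

```python
from dataclasses import dataclass

@dataclass
class ProcessHop:
    """One system-to-system hop in the process flow. All values illustrative."""
    source_system: str
    target_system: str
    data_types: list        # e.g., enrollment files, claims
    key_identifiers: list   # fields proving uniqueness end to end
    transfer_method: str    # e.g., "SFTP push", "ETL job", "message queue"
    frequency: str          # e.g., "batch, 3x daily" or "real-time"
    avg_daily_files: int
    avg_daily_records: int
    sla: str                # when data must land in the target system
    business_owner: str = "TBD"   # fill in as SMEs are identified
    data_steward: str = "TBD"

# Hypothetical hop in a healthcare enrollment intake process.
hop = ProcessHop(
    source_system="EDI Gateway",
    target_system="Membership System",
    data_types=["834 enrollment files"],
    key_identifiers=["member_id", "group_id"],
    transfer_method="SFTP push, then ETL load",
    frequency="batch, 3x daily",
    avg_daily_files=12,
    avg_daily_records=45_000,
    sla="loaded to target by 6:00 AM next business day",
)
print(hop)
```

Collecting one such record per hop effectively serializes your process flow diagram, which makes the documentation step discussed below much easier to maintain.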

Moving Past Data Integrity Gaps to Data Governance

Of course, there are more details you could consider when it comes to your data, but the above are a good start. While discussing the elements above, also note whether checks and balances already exist to make sure data makes it from point A to point B; this can prevent you from creating redundant validations and wasting time. Keep in mind that it is neither necessary, nor sometimes even possible, for one single person to know all of these details. The key is understanding who the subject matter experts (SMEs) are for the process, and getting them all in the same room at the same time. One final thought: as you undergo the exercise above, I highly recommend documenting the findings, as a successful data governance program requires. The results may change in the very near future as business processes, data formats, or systems change, but getting the knowledge down on paper will assist in future endeavors.

And if the opportunity presents itself, consider automating the process identified above. Data is dynamic, not static: it changes frequently over time, and most organizations have neither the bandwidth nor the knowledge to know when data has changed. If an automated system tracks the metadata, that is, the data about the data, it leaves less room for error and more room for accurate business decisions.
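
As a rough illustration, assuming hypothetical table and column names and a simple volume tolerance, an automated check might snapshot a dataset’s metadata on each run and compare it with the previous snapshot:

```python
import hashlib
import json

def snapshot_metadata(table_name, columns, row_count):
    """Capture metadata (the data about the data): a schema fingerprint
    plus volume. Table and column names here are illustrative."""
    schema_hash = hashlib.sha256(json.dumps(sorted(columns)).encode()).hexdigest()
    return {"table": table_name, "schema_hash": schema_hash, "row_count": row_count}

def detect_changes(previous, current, volume_tolerance=0.10):
    """Flag schema drift or a volume swing beyond the assumed tolerance."""
    alerts = []
    if previous["schema_hash"] != current["schema_hash"]:
        alerts.append("schema changed: validations may need review")
    if previous["row_count"]:
        delta = abs(current["row_count"] - previous["row_count"]) / previous["row_count"]
        if delta > volume_tolerance:
            alerts.append(f"volume shifted {delta:.0%}: check upstream feeds")
    return alerts

# A new column appears and volume jumps ~16%; both changes are flagged.
yesterday = snapshot_metadata("membership", ["member_id", "plan_code"], 45_000)
today = snapshot_metadata("membership", ["member_id", "plan_code", "pcp_id"], 52_000)
print(detect_changes(yesterday, today))
```

Even a simple comparison like this surfaces changes no one thought to announce, before they quietly corrupt downstream reports.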

To learn more about managing data integrity and data quality in the era of big data, check out the eBook below.

Get Insights

For a deeper dive into this topic, visit our resource center. Here you will find a broad selection of content that represents the compiled wisdom, experience, and advice of our seasoned data experts and thought leaders.

Download eBook