Managing Healthcare Data Lakes

Have You Considered the Integrity of the Data Swimming Around Your Data Lakes?

Jeffery Brown | May 11, 2017


You Can’t Manage What You Can’t Measure

According to Health Data Management, even though data lakes are becoming the most popular platform for modernizing data environments, many payers still struggle to manage them and to extract value from the growing volume of data stored in big data repositories. What’s needed is lean management, with its focus on reducing waste. What’s occurring instead is empire building: metrics that track arbitrary figures, such as how many terabytes are now stored. Management should ask not about size but, “So what?” We will never achieve the level of insight needed to succeed by shoving more data than necessary into the data lake for some undefined rainy-day need that will most likely never come.

Don’t Repeat Life Lessons You’ve Learned the Hard Way

Many people struggle to keep parts of their lives organized, whether it’s a closet, the “random” drawer of stuff in the kitchen or a kid’s playroom. Keeping things organized is the only way you’ll know whether something is needed, missing or broken. Data lakes are very similar. If not managed properly, they turn into a swampy mess of data with no rhyme or reason. Data lakes are difficult to manage because the data within them is typically unstructured, and the data that is structured often goes through no data quality sanity checks at all. The old rule of thumb for data stored in highly organized data warehouses was to balance and reconcile it for accuracy. Ironically, we now routinely run data lakes that are ungoverned, unmonitored and unchecked for accuracy and consistency. This creates a data dilemma: do we ignore data quality or institute it differently?

When Data Quality Becomes Top of Mind

Healthcare payers typically see the true effects of degraded data quality during reporting or auditing, which is exactly when data integrity comes into question. The question that often can’t be answered is how strategically valuable data is validated for accuracy, consistency and reliability when it is used for reporting and compliance. The answer that often comes back is that data outside the data lake, such as data in a data warehouse, undergoes integrity validation, but data stored in the data lake does not.

So how do we ensure that data stored in data lakes is checked for data integrity?


To ensure data integrity within their data lakes, healthcare payers need a comprehensive data integrity platform that analyzes data before it enters, or when it’s extracted from, the data lake. The platform should include three core capabilities: data quality, balancing and reconciliation, and transaction monitoring.

The platform needs to apply sound data governance to big data. It does this by capturing structured and unstructured data in its native format, regardless of whether it’s stored in a data warehouse or a data lake, and applying business rules to ensure that 100 percent of real-time or batch data is validated for data integrity. Governing the data lake’s metadata is also key to improving the value of a data lake. Used well, metadata can give insightful structure to the entire data lake and make the data available to the people who need it, so they can use and reuse it, create value from it, and audit, cleanse and supervise it responsibly.
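To make the idea of applying business rules to incoming data a little more concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the field names (member_id, claim_amount, service_date) and the three rules are hypothetical and are not taken from any particular platform.

```python
from datetime import date

# Hypothetical business rules: each rule name maps to a check over one record.
RULES = {
    "member_id_present": lambda r: bool(r.get("member_id")),
    "claim_amount_non_negative": lambda r: (
        isinstance(r.get("claim_amount"), (int, float)) and r["claim_amount"] >= 0
    ),
    "service_date_not_in_future": lambda r: (
        isinstance(r.get("service_date"), date) and r["service_date"] <= date.today()
    ),
}


def validate(record):
    """Return the names of the rules this record fails (empty list if none)."""
    return [name for name, check in RULES.items() if not check(record)]


def partition(records):
    """Split records into those passing every rule and those failing any rule."""
    passed, failed = [], []
    for record in records:
        failures = validate(record)
        if failures:
            # Keep the failed rule names so the error can be traced and fixed.
            failed.append((record, failures))
        else:
            passed.append(record)
    return passed, failed
```

In a sketch like this, records that fail would be held back for review rather than loaded, so that errors are caught before they reach the lake.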

In addition, the platform should easily execute complex rules that automatically run in the background to catch data errors at the source, and provide continual monitoring to ensure maximum data integrity from source to destination. With a platform that delivers 100 percent visibility into all reconciliation points with flexible reporting and dashboards, any errors can easily be caught and resolved.
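Balancing and reconciliation between a source and the data lake can be sketched in a few lines of Python. This is a simplified illustration, not a description of any product: the field names claim_id and claim_amount are hypothetical, and a real reconciliation point would likely compare more than record counts, control totals and keys.

```python
from decimal import Decimal


def control_totals(records, amount_field="claim_amount"):
    """Compute a record count and an amount total for one batch."""
    # Decimal avoids floating-point drift when summing monetary amounts.
    total = sum((Decimal(str(r[amount_field])) for r in records), Decimal("0"))
    return {"count": len(records), "total": total}


def reconcile(source_records, lake_records, key="claim_id"):
    """Compare a source batch with what landed in the lake."""
    src = control_totals(source_records)
    dst = control_totals(lake_records)
    src_keys = {r[key] for r in source_records}
    dst_keys = {r[key] for r in lake_records}
    return {
        # Balanced only if counts, totals and the set of keys all agree.
        "balanced": src == dst and src_keys == dst_keys,
        "count_difference": src["count"] - dst["count"],
        "total_difference": src["total"] - dst["total"],
        "missing_in_lake": sorted(src_keys - dst_keys),
        "unexpected_in_lake": sorted(dst_keys - src_keys),
    }
```

The returned dictionary is the kind of result a dashboard or report could surface at each reconciliation point, showing not just that a batch is out of balance but which records are missing.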

The Importance of Quality Data

Storing heaps of data isn’t the end goal of a data lake project. It’s just the beginning. Once the data is deemed accurate, the real fun begins. Payers can then focus on member-centric service and unlock their data to deliver a good member experience, which is critical to keeping customers happy.

To learn more about maintaining data integrity in data lakes, check out the data sheet below.


Download Data Sheet