Chris Reed | December 14, 2017

3 Obstacles Blocking Big Data Quality

Data is more than a valuable business commodity in this digital age; it is critical to the success of organizations in every industry. If data is accurate and trustworthy, it can yield tremendous value and have a transformative impact on the business and the bottom line. But unreliable data can just as easily become a significant liability. When data quality is suspect, the insights produced are dubious. Business decisions may be based on false assumptions, and business users become wary of using those data sources. The result can be wasted data resources, customer satisfaction issues and a lack of compelling analytical insights.

Following the explosion of big data, organizations of all sizes across every industry must grapple with complicated data supply chains. Balancing numerous systems, sources and big data environments is no small challenge. Still, safeguarding data quality throughout the chain is essential: as data is exchanged and transformed, business users need confidence in it if they are to use every data resource at their disposal. However, a few key obstacles block many organizations from successfully assuring data quality.

Big Data Quality Obstacle #1: Proving Validity

If business users won’t use data to run analytics, or leadership doesn’t trust the insights derived from data to make business decisions, then that data is effectively worthless. Proving data quality is a major obstacle for many organizations because they lack any mechanism or metric to demonstrate its integrity. Without proof, organizations hesitate to rely on data, fearing that it may be inaccurate or incomplete, or that it may degrade as it moves from one system to the next. Organizations need a data quality monitoring framework that gives business users both transparency and demonstrable evidence of data accuracy, so that they trust both the source data and the resulting analytical insights. The framework should include data quality scorecards, standardized across the enterprise, that measure the quality of the data.
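To make the scorecard idea concrete, here is a minimal sketch in Python. It is an illustration only, not a description of any particular product: the `Scorecard` fields, the `build_scorecard` function, the two metrics chosen and the sample customer records are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Scorecard:
    """Hypothetical scorecard: one row per data set, same metrics everywhere."""
    dataset: str
    completeness: float  # share of required fields that are populated
    validity: float      # share of records that pass every validity rule


def build_scorecard(name, records, required_fields, validity_rules):
    """Score a list of dict records against required fields and rules."""
    total = len(records)
    if total == 0 or not required_fields:
        return Scorecard(name, 0.0, 0.0)
    populated = sum(
        1
        for rec in records
        for field in required_fields
        if rec.get(field) not in (None, "")
    )
    valid = sum(1 for rec in records if all(rule(rec) for rule in validity_rules))
    return Scorecard(
        dataset=name,
        completeness=populated / (total * len(required_fields)),
        validity=valid / total,
    )


# Made-up sample data for illustration.
customers = [
    {"id": 1, "zip": "80202", "email": "a@example.com"},
    {"id": 2, "zip": "", "email": "b@example.com"},
    {"id": 3, "zip": "1234", "email": None},
    {"id": 4, "zip": "94105", "email": "d@example.com"},
]
rules = [lambda rec: len(rec.get("zip") or "") == 5]  # assumed rule: 5-character zip

card = build_scorecard("customers", customers, ["id", "zip", "email"], rules)
print(card)
```

Because every data set is scored with the same function and the same metric definitions, scores from different systems can be compared side by side, which is the point of standardizing scorecards across the enterprise.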

Big Data Quality Obstacle #2: Scalability

Big data is becoming the norm in many industries, but many organizations’ established solutions for tackling data quality can’t scale to meet the demands of this new reality. They rely on data quality tools that were designed for data warehouse environments and built on query technology. These tools are effective when working with smaller quantities of structured data, but they were not architected to solve data quality issues at scale within big data environments. Environments like data lakes contain vast amounts of raw, unstructured, semi-structured and structured data. A data quality tool that wasn’t designed to handle data profiling, completeness, consistency, reconciliation, timeliness and value conformity checks for big data isn’t up to the task, and trying to retrofit an old solution for big data environments simply won’t cut it. What’s needed is a new solution that can perform these data quality checks while harnessing the power of the big data environment.
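For readers unfamiliar with the check types named above, the following sketch shows what three of them (completeness, value conformity and timeliness) compute. It runs on a single machine over an in-memory list, so it illustrates the logic only; a big data deployment would need to express the same logic on a distributed engine. The function names, the sample orders and the two-day freshness window are assumptions for this example.

```python
import re
from datetime import datetime, timedelta


def completeness(records, field):
    """Fraction of records where `field` is present and non-empty."""
    if not records:
        return 0.0
    filled = sum(1 for rec in records if rec.get(field) not in (None, ""))
    return filled / len(records)


def value_conformity(records, field, pattern):
    """Fraction of populated values that fully match a regular expression."""
    values = [rec[field] for rec in records if rec.get(field) not in (None, "")]
    if not values:
        return 0.0
    regex = re.compile(pattern)
    return sum(1 for v in values if regex.fullmatch(str(v))) / len(values)


def timeliness(records, field, now, max_age):
    """Fraction of records whose timestamp is no older than `max_age`."""
    if not records:
        return 0.0
    fresh = sum(1 for rec in records if now - rec[field] <= max_age)
    return fresh / len(records)


# Made-up sample data for illustration.
now = datetime(2017, 12, 14, 12, 0)
orders = [
    {"order_id": "A1", "zip": "80202", "loaded_at": datetime(2017, 12, 14, 9, 0)},
    {"order_id": "A2", "zip": "8020", "loaded_at": datetime(2017, 12, 13, 9, 0)},
    {"order_id": "A3", "zip": None, "loaded_at": datetime(2017, 12, 10, 9, 0)},
]

zip_completeness = completeness(orders, "zip")
zip_conformity = value_conformity(orders, "zip", r"\d{5}")
load_timeliness = timeliness(orders, "loaded_at", now, timedelta(days=2))
print(zip_completeness, zip_conformity, load_timeliness)
```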

Big Data Quality Obstacle #3: Meaning

Data quality is, of course, fundamentally about the accuracy and integrity of your data assets, but it is also about perception. If data is perceived as unreliable or inaccurate, it won’t be trusted or leveraged. Misperceptions often stem from a lack of understanding, and data quality can be a moving target. For example, a business user needs insights into the geolocation of customers who purchased a specific product. The data set they draw from is missing much of the payment detail, but has accurate zip codes and product data. For their purposes, the data quality is good. But if that same person wanted to run an analysis of credit card versus cash transactions for products by region, the data quality would be unacceptable.

This example illustrates the importance of a data governance framework that can help detect, measure and report quality issues as well as provide an easy-to-reference business glossary that gives business users a proper understanding of the data and gives data quality context, impact and business meaning.
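The geolocation example can be expressed as a small fitness-for-purpose check: the same data set is scored against the fields each analysis actually needs. This is a sketch under stated assumptions; the `fit_for_purpose` function, the 95% threshold and the sample sales records are hypothetical.

```python
def fit_for_purpose(records, required_fields, threshold=0.95):
    """Return (is_fit, per-field completeness) for one intended analysis.

    `threshold` is an assumed cut-off; a real governance policy would set it.
    """
    scores = {}
    for field in required_fields:
        filled = sum(1 for rec in records if rec.get(field) not in (None, ""))
        scores[field] = filled / len(records) if records else 0.0
    return all(score >= threshold for score in scores.values()), scores


# Made-up sample: zip and product are complete, payment detail mostly missing.
sales = [
    {"zip": "80202", "product": "widget", "payment_type": "card"},
    {"zip": "94105", "product": "widget", "payment_type": None},
    {"zip": "10001", "product": "widget", "payment_type": None},
    {"zip": "60601", "product": "widget", "payment_type": None},
]

# Analysis 1: where do buyers of this product live?
geo_ok, geo_scores = fit_for_purpose(sales, ["zip", "product"])

# Analysis 2: card versus cash transactions by region.
pay_ok, pay_scores = fit_for_purpose(sales, ["zip", "product", "payment_type"])

print(geo_ok, pay_ok, pay_scores["payment_type"])
```

The data has not changed between the two calls; only the question asked of it has. Recording which fields each use case depends on is one way a business glossary can give quality scores the context described above.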

Breaking Down Obstacles Blocking Big Data Quality

Businesses need to look for an agile big data solution that can harness the power of the big data architecture. The solution should also be able to traverse the entire data supply chain to give insight into data attrition. It should layer in analytics to change the way quality checks are performed. Applying analytics and machine learning to find anomalies and outliers in the data, in concert with your data quality business rules, substantially improves the efficiency and effectiveness of data quality checks.
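As a rough illustration of running outlier detection alongside business rules, the sketch below combines one explicit rule with a simple statistical test. Note the substitution: instead of a machine learning model it uses a modified z-score based on the median absolute deviation (MAD), with the conventional 0.6745 scaling constant and 3.5 cut-off. The function names, the rule and the sample payments are assumptions for this example.

```python
from statistics import median


def mad_outliers(values, threshold=3.5):
    """Indexes of values whose MAD-based modified z-score exceeds `threshold`."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return set()  # no spread to measure against
    return {
        i for i, v in enumerate(values)
        if 0.6745 * abs(v - med) / mad > threshold
    }


def check(records, rules, numeric_field):
    """Collect rule failures and statistical outliers per record index."""
    findings = {}
    for i, rec in enumerate(records):
        for name, rule in rules.items():
            if not rule(rec):
                findings.setdefault(i, []).append("rule:" + name)
    amounts = [rec[numeric_field] for rec in records]
    for i in sorted(mad_outliers(amounts)):
        findings.setdefault(i, []).append("outlier:" + numeric_field)
    return findings


# Made-up sample data for illustration.
payments = [
    {"amount": 20, "currency": "USD"},
    {"amount": 22, "currency": "USD"},
    {"amount": 19, "currency": "USD"},
    {"amount": 21, "currency": "USD"},
    {"amount": 23, "currency": "USD"},
    {"amount": 20, "currency": "USD"},
    {"amount": 500, "currency": "USD"},  # passes the rule, but is unusual
    {"amount": 21, "currency": "??"},    # typical amount, but breaks the rule
]
rules = {"known_currency": lambda rec: rec["currency"] in {"USD", "EUR"}}

findings = check(payments, rules, "amount")
print(findings)
```

The two techniques catch different problems: the 500 payment satisfies every written rule and is found only by the statistical test, while the unknown currency has an unremarkable amount and is found only by the rule.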

The solution should include a comprehensive library of data quality validation rules so that checks can be performed quickly. It should let users build data quality rules by drag and drop, as a logical, flowchart-like sequence, rather than by custom coding in archaic rules engines that are neither reusable nor easy to understand, edit or maintain.
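A drag-and-drop builder cannot be shown in text, but the underlying idea of a reusable rule library can be. In this sketch, small parameterized rule builders are defined once and combined by name for each data set. Every name here (`not_null`, `matches`, `in_range`, `RULE_LIBRARY`, `run_rules`) and the sample orders are hypothetical.

```python
import re


def not_null(field):
    """Rule: the field is present and non-empty."""
    return lambda rec: rec.get(field) not in (None, "")


def matches(field, pattern):
    """Rule: the field's value fully matches a regular expression."""
    regex = re.compile(pattern)
    return lambda rec: bool(regex.fullmatch(str(rec.get(field, ""))))


def in_range(field, low, high):
    """Rule: the field is numeric and lies within [low, high]."""
    def rule(rec):
        value = rec.get(field)
        return isinstance(value, (int, float)) and low <= value <= high
    return rule


# The shared library: rules are defined once and referenced by name.
RULE_LIBRARY = {
    "zip_present": not_null("zip"),
    "zip_is_5_digits": matches("zip", r"\d{5}"),
    "amount_is_plausible": in_range("amount", 0, 10_000),
}


def run_rules(records, rule_names):
    """Return {rule name: number of failing records} for the chosen rules."""
    return {
        name: sum(1 for rec in records if not RULE_LIBRARY[name](rec))
        for name in rule_names
    }


# Made-up sample data for illustration.
orders = [
    {"zip": "80202", "amount": 25.0},
    {"zip": "", "amount": -3},
    {"zip": "ABCDE", "amount": 12},
]
failures = run_rules(
    orders, ["zip_present", "zip_is_5_digits", "amount_is_plausible"]
)
print(failures)
```

A second data set can reuse the same rules simply by passing a different list of names to `run_rules`, with no new rule code written.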

In addition, the solution should offer sophisticated, easy-to-use data quality validation capabilities, including data profiling, completeness, consistency, timeliness, reconciliation/balancing and value conformity. It should allow users of any skill level to quickly and easily apply powerful data quality checks to data sets. It should also support a repeatable process for instituting data quality routines: standardizing processes, reusing rules, and integrating results into interactive reports and case management workflows so that data quality issues are resolved quickly.
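Reconciliation/balancing is the least self-explanatory check in that list, so here is a minimal sketch of it: compare record counts, keys and a control total between a source system and a target system to see whether data was lost in transit. The `reconcile` function, the field names and the sample transactions are assumptions for this example.

```python
def reconcile(source, target, key, amount_field):
    """Compare two record sets by count, key membership and control total."""
    source_keys = {rec[key] for rec in source}
    target_keys = {rec[key] for rec in target}
    source_total = sum(rec[amount_field] for rec in source)
    target_total = sum(rec[amount_field] for rec in target)
    return {
        "source_count": len(source),
        "target_count": len(target),
        "missing_in_target": sorted(source_keys - target_keys),
        "unexpected_in_target": sorted(target_keys - source_keys),
        "amount_difference": round(target_total - source_total, 2),
    }


# Made-up sample: one transaction was dropped between systems.
source_system = [
    {"txn_id": "T1", "amount": 10.00},
    {"txn_id": "T2", "amount": 25.50},
    {"txn_id": "T3", "amount": 4.25},
]
target_system = [
    {"txn_id": "T1", "amount": 10.00},
    {"txn_id": "T3", "amount": 4.25},
]

report = reconcile(source_system, target_system, "txn_id", "amount")
print(report)
```

A report like this is the kind of result that could feed an interactive report or open a case for follow-up when the counts or totals do not balance.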

