In a previous blog, I introduced the concept of a data supply chain. Not familiar with the terminology? It’s similar to a typical supply chain, but it moves data instead of goods. Data is the raw material that enters an organization. That data is then stored, processed and distributed for analysis – akin to raw material being transformed into a finished product and distributed. Or, in our case, raw data being turned into insights. The last leg of a data supply chain is an easily searchable data portal that allows the business user to discover and order the data needed to solve specific business problems.
To be successful, one fundamental element has to be present: data quality. Data quality is the foundation of an organization’s data supply chain. When an organization has quality data, its analytics programs have a chance at success, leading to an improved customer experience and increased revenue. Without quality data, an organization’s data supply chain becomes a huge liability. In addition to lost profits, poor data quality can damage a company’s reputation or result in fines and additional costs for the organization.
To measure data quality, data must be checked and assessed for issues. To achieve the highest level of data quality, analyze your data against the following four checkpoints:
Completeness: Data must be checked to ensure the values in every data set are complete and there is no missing information. For example, an organization may have a data record that includes a customer’s name, age and Social Security number, but the address is missing. The record isn’t complete, so it may not be possible to contact the person, or key information could mistakenly be sent to a different Bill Smith than the one in the actual record.
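A completeness check can be as simple as flagging required fields that are absent or empty. Here is a minimal sketch; the field names (`name`, `age`, `ssn`, `address`) are illustrative assumptions, not a prescribed schema.

```python
# Minimal completeness check: flag required fields that are absent or empty.
# Field names below are illustrative assumptions, not a prescribed schema.
REQUIRED_FIELDS = ["name", "age", "ssn", "address"]

def missing_fields(record):
    """Return the required fields that are missing or empty in a record."""
    return [f for f in REQUIRED_FIELDS if record.get(f) in (None, "")]

record = {"name": "Bill Smith", "age": 42, "ssn": "123-45-6789"}
print(missing_fields(record))  # ['address'] -- the record is incomplete
```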
Conformance: This characteristic focuses on whether all the values in a data set conform to a specified format. Conformance looks at the format and determines whether a value has the correct number of digits or characters and the proper casing (upper case or lower case). For example, if an organization is looking for Social Security numbers and comes across one containing 11 digits, that value does not conform to the standard nine-digit format and therefore should not be trusted.
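A conformance check like this is typically expressed as a pattern match. The sketch below assumes the common nine-digit SSN format with optional dashes; it is one way to encode the rule, not the only one.

```python
import re

# Nine digits, with optional dashes in the usual positions (format assumption).
SSN_PATTERN = re.compile(r"\d{3}-?\d{2}-?\d{4}")

def conforms(value):
    """True if the value matches the expected SSN format exactly."""
    return SSN_PATTERN.fullmatch(value) is not None

print(conforms("123-45-6789"))  # True
print(conforms("12345678901"))  # False: 11 digits do not conform
```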
Integrity: Data integrity is a measure of how data has changed as it flows from the system of origin to the point of consumption. As data moves from system to system, records can be corrupted or lost, and data fields and formats can change. Not all of these changes are necessarily bad, but it is vitally important to know where, how and why they occurred. For example, a customer may enter their name on a website as ‘Name: Joe Smith.’ This record might appear in a downstream system as ‘First Name: JOE’ and ‘Last Name: SMITH.’ Here, the data field ‘Name’ has been parsed into two fields, ‘First Name’ and ‘Last Name,’ and the text has been converted to upper case. These changes appear benign, but if they go unnoticed, they can result in mismatched fields or incompatible formats. If a billing application expecting a full name receives only the first name, a bill could be sent to ‘JOE,’ which could cause confusion in a large organization that employs dozens of Joes.
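The name-parsing example can be made concrete with a small integrity check: after the downstream transformation, rejoining the parsed fields should still reproduce the original value (ignoring case). This is a sketch of the idea, not any specific tool’s API.

```python
def split_name(record):
    """Mimic the downstream system: split 'name' and convert to upper case."""
    first, _, last = record["name"].partition(" ")
    return {"first_name": first.upper(), "last_name": last.upper()}

def integrity_check(source, target):
    """Rejoining the parsed fields (case-insensitively) should reproduce
    the original full name; if not, information was lost in transit."""
    rejoined = f"{target['first_name']} {target['last_name']}"
    return rejoined.lower() == source["name"].lower()

src = {"name": "Joe Smith"}
dst = split_name(src)             # {'first_name': 'JOE', 'last_name': 'SMITH'}
print(integrity_check(src, dst))  # True: a benign, reversible change
```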
Validity: Organizations must check for validity to ensure the data conforms to the standards set for it and that the data is realistic and makes sense. For example, if an organization comes across a date of birth from the 1800s for a current customer, common sense dictates the value is probably not valid. A certified reference table can be used to confirm the validity of known values such as state abbreviation codes or city names. Note that a valid data value is not necessarily accurate: ‘Springfield, MO’ could be a valid value, but only ‘Springfield, IL’ is the accurate value for the Abraham Lincoln Home National Historic Site.
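Both validity techniques mentioned here (the reference table and the common-sense range check) can be sketched in a few lines. The state list below is deliberately trimmed; a real check would use a certified, complete table.

```python
from datetime import date

# Deliberately trimmed reference table; a real check would use a certified one.
VALID_STATES = {"AL", "CA", "IL", "MO", "NY"}

def valid_state(code):
    return code in VALID_STATES

def plausible_birth_date(dob, today):
    """Reject birth dates in the future or implying an age over 120 years."""
    earliest = date(today.year - 120, today.month, today.day)
    return earliest <= dob <= today

print(valid_state("IL"))                                         # True
print(plausible_birth_date(date(1850, 1, 1), date(2024, 1, 1)))  # False
```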
Understanding these checkpoints is the first step to data quality improvement, but to ensure the data’s ongoing success and avoid interruptions in the data flow, it’s important to test the data for consistency. To test for data completeness, conformance, integrity and validity, there are two options.
The first option is to compare data fields to a pre-defined list of data codes such as state abbreviations or ZIP codes.
The second option is to assess the data by running tests to determine if data values are complying with basic rules. By automating data quality testing, organizations can help prevent gaps in data. The right platform allows users to define validation rules once, resulting in higher data readiness.
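Both options can be combined in a small rule registry: reference-list checks and format checks are defined once, then applied to every record. The field names and rules below are illustrative assumptions, not a particular platform’s syntax.

```python
import re

# Define each validation rule once; fields and formats are illustrative.
RULES = {
    "state": lambda v: v in {"CA", "IL", "MO", "NY"},  # reference code list
    "zip":   lambda v: re.fullmatch(r"\d{5}(-\d{4})?", v) is not None,
    "ssn":   lambda v: re.fullmatch(r"\d{3}-\d{2}-\d{4}", v) is not None,
}

def validate(record):
    """Return the names of the fields that fail their rule."""
    return [field for field, rule in RULES.items()
            if field in record and not rule(record[field])]

print(validate({"state": "IL", "zip": "62701", "ssn": "123-45-6789"}))  # []
print(validate({"state": "ZZ", "zip": "627015"}))  # ['state', 'zip']
```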
Once an organization has implemented data quality testing, they can raise the level of their data quality checks to a more strategic level called Data Quality Intelligence (DQI).
Data quality intelligence is a quantitatively backed, consistent measurement of data quality. To create it, organizations must bridge the divide between business and IT and have the two sides collaborate on the measurements.
The process starts with the IT department preparing data for the business user. The business user then goes through that data and identifies elements that might be a problem. For example, consider a company that is listed three times in three slightly different ways: Company X, Company X LLC and Company X Inc. The business user must decide which of those is correct. Once they do, the IT department can set up data quality rules, leveraging analytics to ensure those fields are kept consistent or linked together.
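One common way to link such variants is to normalize names before matching, for example by stripping trailing legal suffixes. This is a simplified sketch; production record linkage usually adds fuzzy matching on top.

```python
import re

# Trailing legal suffixes to ignore when matching (illustrative list).
LEGAL_SUFFIXES = re.compile(r"\b(llc|inc|corp|ltd)\.?$", re.IGNORECASE)

def normalize_company(name):
    """Lower-case the name and strip punctuation and trailing legal suffixes."""
    cleaned = LEGAL_SUFFIXES.sub("", name.strip().rstrip(".,"))
    return cleaned.strip(" ,").lower()

variants = ["Company X", "Company X LLC", "Company X Inc."]
print({normalize_company(v) for v in variants})  # {'company x'}
```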
The next step is to visualize the results using a data quality intelligence dashboard that links key business terms with data quality scores, so business users can tell which data is reliable enough to use in business analysis. Many enterprises use a data governance dashboard that includes metrics such as a quality score tied to key business terms, which tells the user how the quality of their data ranks against their pre-set measurement rules. This adds a layer of business-level visibility and accountability to data quality, helps measure progress and provides valuable information that supports higher-level business decisions.
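The quality score on such a dashboard can be as simple as the percentage of records that pass every rule tied to a business term. This is one possible scoring scheme, sketched under assumed rules, not a standard formula.

```python
def quality_score(records, rules):
    """Percentage of records that pass every rule for a business term."""
    if not records:
        return 100.0
    passing = sum(all(rule(r) for rule in rules) for r in records)
    return round(100 * passing / len(records), 1)

# Illustrative rules for a hypothetical 'Customer' business term.
rules = [
    lambda r: bool(r.get("name")),             # completeness
    lambda r: r.get("state") in {"IL", "MO"},  # validity
]
records = [
    {"name": "Joe Smith", "state": "IL"},
    {"name": "", "state": "IL"},         # fails completeness
    {"name": "Ann Lee", "state": "ZZ"},  # fails validity
    {"name": "Bo Ray", "state": "MO"},
]
print(quality_score(records, rules))  # 50.0
```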
The aim of data quality intelligence is not only to bring organizations together to collaborate on data definitions and rules, but also to help business users across the organization rethink their approach to data quality. By leveraging both analytics-enabled data quality and data governance, it gives them a more complete picture and a better understanding of the quality of their data.
Data quality intelligence ensures that data isn’t being used when it shouldn’t be and gives organizations a chance to remediate any data quality issues they find. With this kind of insight into their data, organizations can avoid a negative customer experience.
To learn more about wrapping data quality intelligence with analytics and data governance, check out the eBook below.
For a deeper dive into this topic, visit our resource center. Here you will find a broad selection of content that represents the compiled wisdom, experience, and advice of our seasoned data experts and thought leaders.