Three Steps to Start Tracking Data Lineage

Franco Primavesi, September 9, 2020

Article: Understanding Data Lineage from Varying Perspectives

It’s no secret that data volumes are growing rapidly. Add to that a global pandemic that has increased market risk and raised the stakes on positive customer experiences, and suddenly the only way to survive is to count on trustworthy data to get you ahead. With new data sources emerging and big data growing at an unprecedented rate, tracking data from origination through consumption, otherwise referred to as data lineage, is now critical.

Confirming data’s origins, the systems it moves through, its location, functions, transformations and meaning ensures that data users can count on data that is actually trustworthy.

Regardless of data source, when data travels through an enterprise, the probability of it being changed, intentionally or not, is very high. Data doesn’t move in a linear pattern from system A to B to C. Instead, data might start in system D and skip from system D to system F or L.
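One way to picture a non-linear path is as a directed graph of hops between systems. The sketch below is only an illustration: the hop list and the function names (`build_parents`, `upstream`) are assumptions made for this example, not part of any particular lineage tool. It walks backward from a system to find everything that fed it.

```python
from collections import defaultdict

# Hypothetical hops between systems: (source system, target system).
HOPS = [("D", "F"), ("D", "L"), ("F", "A"), ("L", "A"), ("A", "C")]


def build_parents(hops):
    """Map each system to the set of systems that feed it directly."""
    parents = defaultdict(set)
    for source, target in hops:
        parents[target].add(source)
    return parents


def upstream(hops, system):
    """Return every system the data passed through before reaching `system`."""
    parents = build_parents(hops)
    seen = set()
    stack = [system]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


# Every system that contributes, directly or indirectly, to system C.
print(sorted(upstream(HOPS, "C")))
```

Because the walk follows recorded hops rather than an assumed A-to-B-to-C order, it still works when data skips systems or arrives by more than one route.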

Not only is data’s path unpredictable, so are its form and function.

The IT community understands that as data progresses through data systems and ingestion points, it also transforms along the way. When data changes, so does its use within business processes. Documenting those alterations is critical.

Information in constant motion is difficult to manage. Each change makes it harder for users to determine where the data came from, what form it takes and what it means.

Since data’s journey is uncertain, organizations must track and monitor data throughout its lifecycle. That’s where data lineage comes in.

So, what’s the best way to get started with tracking data lineage? These three steps identify how organizations can tap into their data’s journey.

1. Track Data Lineage from Multiple Perspectives

An organization may share several common goals, but each department may have its own goals and priorities. As a result, data users have different needs and specific information they want to uncover from data lineage.

The first type of data user is the data steward. They generally work in the IT department and are the link between IT and business. They are interested in the physical storage, fitness and movement of data.

Data stewards are the data suppliers. They need to ensure data can be used, and used according to the organization’s security policies. Therefore, they need to understand data from a technical perspective: data storage procedures, data combinations and data transformation processes. They can interpret the many hops data takes throughout an enterprise to locate regulated and/or licensed data, understand how that data changed and demonstrate its impact on regulatory policy.
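As a rough sketch of that kind of technical trace, the example below records each hop together with the transformation applied, then follows a regulated source downstream. The dataset names, the `Hop` record and the `downstream_hops` function are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hop:
    source: str
    target: str
    transformation: str  # what changed on the way from source to target


# Hypothetical technical lineage records.
HOPS = [
    Hop("crm.customers", "staging.customers", "copy"),
    Hop("staging.customers", "warehouse.dim_customer", "mask email, dedupe"),
    Hop("warehouse.dim_customer", "reports.churn", "aggregate by region"),
    Hop("erp.orders", "reports.revenue", "sum by month"),
]

# Assumption: the steward has flagged this dataset as regulated.
REGULATED = {"crm.customers"}


def downstream_hops(hops, start):
    """List the hops reachable from `start`, in breadth-first order."""
    result, frontier, visited = [], [start], {start}
    while frontier:
        current = frontier.pop(0)
        for hop in hops:
            if hop.source == current:
                result.append(hop)
                if hop.target not in visited:
                    visited.add(hop.target)
                    frontier.append(hop.target)
    return result


for dataset in REGULATED:
    for hop in downstream_hops(HOPS, dataset):
        print(f"{hop.source} -> {hop.target}: {hop.transformation}")
```

The output lists every place the regulated data lands and what was done to it on the way, which is the raw material a steward would need to show impact on a regulatory policy.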

Data stewards need to ensure that the consumers of data are able to understand data at the business layer. A consumer should not need to dig into the technical layer to understand the data. Stewards therefore need to make sure that the relationships between the technical and business layers are present and accurate in order to create business lineage.
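To make the relationship between the two layers concrete, here is a minimal sketch that collapses technical lineage into business lineage using a mapping from technical assets to business terms. The asset names, the terms and the `business_lineage` function are assumptions chosen for illustration.

```python
# Hypothetical technical lineage: (source asset, target asset).
TECHNICAL_EDGES = [
    ("crm.cust_tbl", "stg.cust_clean"),
    ("stg.cust_clean", "dw.dim_customer"),
    ("dw.dim_customer", "bi.churn_report"),
]

# Hypothetical steward-maintained mapping: technical asset -> business term.
BUSINESS_TERMS = {
    "crm.cust_tbl": "Customer Record",
    "stg.cust_clean": "Customer Record",
    "dw.dim_customer": "Customer Profile",
    "bi.churn_report": "Churn Report",
}


def business_lineage(edges, terms):
    """Collapse technical edges into edges between business terms."""
    result = []
    for source, target in edges:
        src_term = terms.get(source)
        tgt_term = terms.get(target)
        if src_term is None or tgt_term is None:
            continue  # unmapped asset: no business-level edge can be drawn
        if src_term != tgt_term and (src_term, tgt_term) not in result:
            result.append((src_term, tgt_term))
    return result


for src, tgt in business_lineage(TECHNICAL_EDGES, BUSINESS_TERMS):
    print(f"{src} -> {tgt}")
```

Note that an unmapped asset silently drops its edges in this sketch, which is exactly why the mapping between layers has to be present and accurate.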

Business analysts are the consumers of data. They want to easily understand data from a high level or a business perspective. They use data lineage to understand data flows and the operational functions, business rules and policies impacted as data changes.

Business users are interested in various applications and third-party data, data mapping and any data transformation from a business perspective. They need to know if data is available and if it is suitable for their intended use. They want to know where the data came from, what processes it went through, what business knowledge exists about it and what its data quality and governance scores are. This information helps the business user know what can be used and that it’s trusted.

Once the enterprise understands the needs of all data users, it needs to track data lineage from two perspectives – technical and business.

2. Establish a Data Catalog

Following both technical and business data lineage from inception through utilization requires integrated data governance. By including data lineage with data quality within a data governance strategy, organizations build a data catalog and take inventory and valuation of all enterprise data assets.

A data catalog presents the collection (the offer or supply) of data assets and needs to deliver clarity into the details of those assets. This is achieved with the help of business and technical data lineage.

The catalog incorporates both data lineage perspectives, giving data users precise understanding of the flow (where data comes from), context (what it is, what it means) and dependencies (how it is consumed) of data. The catalog also includes data definitions, synonyms, business attributes and governance and quality metrics. This robust approach builds organizational trust among users who are often the ones tasked with extracting insights to meet industry challenges, identify marketplace opportunities, improve the customer experience and translate information into business value.
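A minimal sketch of what a single catalog entry might hold, assuming a simple in-memory structure: the `CatalogEntry` fields and the `search` helper are hypothetical and only mirror the elements named above (flow, context, dependencies, definitions, synonyms and scores).

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    name: str
    definition: str                                 # context: what it is, what it means
    sources: list = field(default_factory=list)     # flow: where the data comes from
    consumers: list = field(default_factory=list)   # dependencies: how it is consumed
    synonyms: list = field(default_factory=list)
    quality_score: float = 0.0                      # assumed scale: 0.0 to 1.0
    governance_score: float = 0.0                   # assumed scale: 0.0 to 1.0


# Hypothetical catalog content.
CATALOG = [
    CatalogEntry(
        name="Customer Profile",
        definition="One deduplicated record per customer.",
        sources=["crm.cust_tbl"],
        consumers=["bi.churn_report"],
        synonyms=["client", "account holder"],
        quality_score=0.92,
        governance_score=0.85,
    ),
]


def search(catalog, term):
    """Find entries whose name or synonyms match `term`, ignoring case."""
    needle = term.lower()
    return [
        entry
        for entry in catalog
        if needle == entry.name.lower()
        or needle in (synonym.lower() for synonym in entry.synonyms)
    ]


for entry in search(CATALOG, "Client"):
    print(entry.name, entry.sources, entry.consumers, entry.quality_score)
```

Keeping synonyms on the entry lets a business user find the asset by the word their own department uses for it.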

Tracking data lineage from a technical and business perspective is key to transforming data into a business asset. Additionally, organizations can utilize a data intelligence platform with automated capabilities for data governance, data quality, data lineage and analytics.

3. Automate with a Data Intelligence Platform

A data intelligence platform automatically harvests data and its technical lineage, then analyzes it to uncover the data descriptors needed to link technical data assets to business data assets. That link is a critical step in automatically generating business lineage. The platform also automatically identifies business context, determines quality and governance scores, measures knowledge impact and incorporates all of this information in an easily accessible, searchable and browsable data catalog.
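How a real platform discovers descriptors is product-specific and not described here. As a deliberately simplified stand-in, the sketch below links technical column names to business terms by normalizing the names and matching them against a glossary. The glossary, the column names and the functions `normalize` and `link_columns` are all assumptions made for illustration.

```python
import re

# Hypothetical business glossary: business term -> known descriptors.
BUSINESS_GLOSSARY = {
    "Customer Email": {"email", "email address", "cust email"},
    "Order Total": {"order total", "order amount"},
}


def normalize(column_name):
    """Lowercase a column name and turn separators into single spaces."""
    return re.sub(r"[_\W]+", " ", column_name).strip().lower()


def link_columns(columns, glossary):
    """Map each technical column to the first business term that matches."""
    links = {}
    for column in columns:
        descriptor = normalize(column)
        for term, descriptors in glossary.items():
            if descriptor in descriptors:
                links[column] = term
                break
    return links


harvested = ["CUST_EMAIL", "order-amount", "created_at"]
print(link_columns(harvested, BUSINESS_GLOSSARY))
```

Columns with no match, such as `created_at` here, stay unlinked, so a sketch like this would still need a steward to review whatever the automated pass leaves behind.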

By utilizing a data intelligence platform to bring information directly into the data catalog, organizations include data’s business meaning, how it’s used and how it impacts the company, supplying additional business knowledge around data. As a result, data users can quickly connect information sets to a wide variety of business outcomes.

The data landscape has changed, but with a comprehensive, intelligence-driven data catalog, organizations deliver a centralized source of all enterprise data knowledge. As a result, technical and business users can quickly track data lineage to immediately derive value, business intelligence and results from enterprise data.
