What is Data Lineage and Why Is It Important?

Nam Tran, May 20, 2020

Webinar: Data Governance - the Key to your Organizational Goals

Enterprise data is in constant motion. As soon as an organization creates or ingests a piece of information, that information begins its journey. As data passes through various extraction and ingestion points, it’s manipulated and transformed to meet different business needs. As it moves across platforms, its format, function and integrity level may change multiple times without any transparency or oversight.

To protect the quality of the data and provide an audit trail throughout its lifecycle, companies must track, monitor and apply governance standards to the data’s movement from beginning to end. Tracking that movement is referred to as data lineage: a record of where data came from, where it is going and how it was altered along the way, which gives the business visibility into its data and the ability to trace it. Data lineage helps users verify the source of the information they’re leveraging for decision-making and trust it enough to draw actionable insights and identify new business opportunities.

However, documenting data lineage requires organizations to look at different lineage perspectives to monitor data quality and encourage data utilization.

Understanding Data Lineage 

Data lineage has different meanings to different users. Which school of thought an individual subscribes to depends on that individual’s role and objectives in the organization.

The first viewpoint is business data lineage. This perspective is held by business users who need to understand how data fits the business and how the data is affected if the business modifies its use. For example, when a business user from the operations team wants to update the format of a piece of data, they need visibility into where that data is being used before submitting a change request to the IT team. They need to quickly identify the reports, business processes and metrics that use the data so they understand the impact before making such a decision.
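
The impact check described above can be sketched in a few lines of Python. This is only an illustration, not how any particular governance tool works: the asset names, the DOWNSTREAM mapping and the impacted_assets function are all hypothetical, and a real lineage store would be populated by scanning systems rather than typed in by hand.

```python
from collections import deque

# Hypothetical lineage edges: each asset maps to the assets that consume it directly.
DOWNSTREAM = {
    "orders.order_date": ["etl.daily_sales_load"],
    "etl.daily_sales_load": ["warehouse.daily_sales"],
    "warehouse.daily_sales": ["report.weekly_revenue", "metric.avg_order_value"],
    "report.weekly_revenue": [],
    "metric.avg_order_value": ["process.quarterly_forecast"],
    "process.quarterly_forecast": [],
}


def impacted_assets(start, edges):
    """Return every asset downstream of `start`, in breadth-first order."""
    seen = {start}          # guards against revisiting assets if the graph has cycles
    order = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for consumer in edges.get(node, []):
            if consumer not in seen:
                seen.add(consumer)
                order.append(consumer)
                queue.append(consumer)
    return order


if __name__ == "__main__":
    # List everything that would be touched by changing the order date format.
    for asset in impacted_assets("orders.order_date", DOWNSTREAM):
        print(asset)
```

In this made-up example, a change to the order date format reaches one ETL job, one warehouse table, a report, a metric and a forecasting process, which is the list a business user would want in hand before raising the change request.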

While business users believe business data lineage is important to data governance and quality, the IT community has a different perspective, which is technical data lineage. Technical data lineage captures data at the physical level, such as schemas, tables and columns, and records how that data moves across systems through ETL jobs, procedures and transformation rules. Tracking technical data lineage helps the IT community quickly narrow down the cause of data quality issues and triage them efficiently.
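
As a rough sketch of what technical lineage might look like in code, the Python below records column-level movements and walks them backwards to find what feeds a suspect column. The LineageEdge structure, the sample columns, the job names and the trace_upstream function are assumptions made for illustration; they are not taken from any specific product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LineageEdge:
    source: str  # fully qualified column: schema.table.column
    target: str  # column that receives the data
    job: str     # ETL job or procedure that moves the data
    rule: str    # transformation rule applied on the way


# Hypothetical column-level lineage, as it might be harvested from ETL metadata.
EDGES = [
    LineageEdge("crm.customers.birth_date", "staging.customers.birth_date",
                "extract_crm", "copy"),
    LineageEdge("staging.customers.birth_date", "warehouse.dim_customer.age",
                "load_dim_customer", "years between birth_date and load date"),
    LineageEdge("erp.orders.amount", "warehouse.fact_orders.amount_usd",
                "load_fact_orders", "convert currency to USD"),
]


def trace_upstream(column, edges):
    """Return the edges that feed `column`, directly or indirectly."""
    path = []
    frontier = [column]
    visited = {column}
    while frontier:
        current = frontier.pop()
        for edge in edges:
            if edge.target == current and edge.source not in visited:
                visited.add(edge.source)
                path.append(edge)
                frontier.append(edge.source)
    return path


if __name__ == "__main__":
    # Triage a data quality issue reported on the customer age column.
    for edge in trace_upstream("warehouse.dim_customer.age", EDGES):
        print(f"{edge.source} -> {edge.target} via {edge.job} ({edge.rule})")
```

With a record like this, someone investigating a bad value in the age column can see which jobs and rules touched it and check each one in turn instead of searching every system.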

Each of these perspectives represents the “truth” of data lineage from its own viewpoint, which is why collaboration between the IT and business communities is critical.

How Different Data Users Can Benefit from Data Lineage

In any business, different data users have diverging goals and priorities. Data lineage helps both IT-focused data analysts and business users company-wide accomplish various tasks.

For example, data analysts who work within an organization’s IT department may use data lineage to understand the multiple steps information takes throughout a data supply chain, including its movement across data lakes and the technical changes applied along the way. This information helps them demonstrate the impact regulatory or internal policies have on the data landscape. By knowing the physical storage and movement of their data, data analysts can quickly identify where sensitive information is located and how that data changed over time.

Business users can leverage data lineage at a strategic level by looking at the context of the data to understand its origins and flow. Data lineage tells business users where the data came from, what processes it went through and how its integrity was ensured, so they can rely on the data to generate trustworthy insights.

If companies want to produce quality, dependable business intelligence, they must understand the origin of their information. To track and understand business and technical data lineage, organizations require a comprehensive, enterprise-wide data governance program. 

Tracing Data Lineage Through Data Governance 

Following data lineage from inception through consumption means companies need an integrated data governance framework that incorporates data quality. Businesses can then utilize data governance to take inventory of all enterprise data assets by building a data catalog.

A data catalog provides transparency into the details of an organization’s data assets, including data definitions, synonyms and key business attributes, so all users can understand and utilize their data as an asset.

More important, a data catalog also documents data lineage, from origination through the data supply chain, giving both business and technical users a clear understanding of the flow, context and dependencies of their data.
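
To make the idea of a catalog entry concrete, here is a minimal Python sketch. It assumes a catalog entry carries a definition, synonyms, business attributes and a list of upstream assets; the CatalogEntry and DataCatalog classes, their methods and the sample entries are hypothetical, and the origins lookup assumes the recorded lineage contains no cycles.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    name: str
    definition: str
    synonyms: list = field(default_factory=list)
    attributes: dict = field(default_factory=dict)
    upstream: list = field(default_factory=list)  # assets this one is derived from


class DataCatalog:
    def __init__(self):
        self._entries = {}

    def register(self, entry):
        self._entries[entry.name] = entry

    def find(self, term):
        """Look up an entry by name or synonym, ignoring case."""
        wanted = term.lower()
        for entry in self._entries.values():
            names = [entry.name] + entry.synonyms
            if wanted in (n.lower() for n in names):
                return entry
        return None

    def origins(self, name):
        """Return the assets with no recorded upstream that `name` derives from.

        Assumes the recorded lineage is acyclic.
        """
        entry = self._entries.get(name)
        if entry is None or not entry.upstream:
            return [name]
        found = []
        for parent in entry.upstream:
            for origin in self.origins(parent):
                if origin not in found:
                    found.append(origin)
        return found


# Hypothetical sample entries.
catalog = DataCatalog()
catalog.register(CatalogEntry(
    name="crm.customers",
    definition="Customer records as entered by the sales team.",
    attributes={"owner": "Sales Operations", "contains_pii": True},
))
catalog.register(CatalogEntry(
    name="warehouse.dim_customer",
    definition="Cleansed customer dimension used for reporting.",
    synonyms=["client dimension"],
    attributes={"owner": "Data Engineering"},
    upstream=["crm.customers"],
))

if __name__ == "__main__":
    entry = catalog.find("Client Dimension")
    print(entry.name, "originates from", catalog.origins(entry.name))
```

The point of the sketch is that a business term, its definition and its lineage live in one place, so a user who searches for a familiar synonym can also see where the underlying data originated.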

By tracking data lineage from varying perspectives, organizations establish data trust and empower users to leverage data as a valuable business asset.

Data lineage is the foundation of transforming data into an enterprise asset, providing both business and technical users with the information they need to take control of their data.

Are you looking for more information about data lineage? Check out the webinar linked on this page for a more detailed discussion.

For additional information about tracing data lineage through data governance and data catalogs, check out this article from Dataversity: https://www.dataversity.net/harvest-data-lineage-to-build-effective-data-governance/.
