Five Best Practices for Tracking Data Lineage

Franco Primavesi, August 16, 2021

Many of us are familiar with the ins and outs of planning a road trip. It takes many hours to prepare an efficient route that ensures you visit certain destinations along the way. You schedule where you’re going and where to eat, confirm you have the right supplies and may even schedule visits to rest stops and gas stations. Then you hit road construction, and suddenly the trusted route you mapped is diverted onto unknown roadways and your relaxing trip turns into a stressful nightmare. Unless, of course, you have the right systems, the latest navigation tools and a well-organized itinerary in place.

Much like a fully mapped out road trip that takes you on different paths, data lineage tracks data’s movement through each system, from extraction and ingestion points through consumption, recording changes along the way.

Lineage provides all the necessary and insightful details about data assets that ensure data stakeholders have trustworthy data to rely on for valuable insights and decision-making.

Here are some data lineage best practices to help you trust where your data is taking you.

The Value of an Audit Trail

An audit trail is a step-by-step record that traces data to its source. Organizations should use technical audit trails to verify and track where sensitive data elements live, their source, how they were altered, their quality levels, who has access and how they’re shared. The trail should also show data procedures and data combinations. All these details are critical to compliance and operations and ensure that internal organizational processes protect consumer and citizen privacy. In addition, by mapping the flow of data within an organization, IT can interpret the many hops data takes throughout an enterprise to locate regulated and/or licensed data, understand how that data shifted and demonstrate its impact on regulatory policy.
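To make the idea concrete, here is a minimal sketch of an audit trail as a list of records that can be walked back to a data element's origin. The `AuditEntry` fields, the `trace_to_source` helper and the system names are all hypothetical, chosen only to mirror the details listed above (where the element lives, its source, how it was altered, who has access); a real lineage tool would capture far more.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AuditEntry:
    """One hop in a data element's history (hypothetical field names)."""
    element: str                   # e.g. "customer.email"
    system: str                    # where the element lives at this hop
    source_system: Optional[str]   # where it came from; None marks the origin
    change: str                    # how the element was altered at this hop
    accessed_by: tuple = ()        # roles or users with access at this hop


def trace_to_source(trail, element, system):
    """Walk the trail backwards from `system` until the origin is reached."""
    by_system = {e.system: e for e in trail if e.element == element}
    path, seen, current = [], set(), system
    while current in by_system and current not in seen:
        seen.add(current)          # guard against accidental cycles
        entry = by_system[current]
        path.append(entry)
        current = entry.source_system
    return path


# Illustrative trail for a single sensitive element.
trail = [
    AuditEntry("customer.email", "crm", None, "captured at signup", ("sales",)),
    AuditEntry("customer.email", "warehouse", "crm", "lowercased and deduplicated", ("data-eng",)),
    AuditEntry("customer.email", "dashboard", "warehouse", "masked for display", ("analysts",)),
]

for hop in trace_to_source(trail, "customer.email", "dashboard"):
    print(f"{hop.system}: {hop.change} (access: {', '.join(hop.accessed_by)})")
```

Keeping each hop as an immutable record is one simple way to preserve the step-by-step history; the trail is only as trustworthy as the process that writes to it.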

By having full transparency on how data is used, organizations mitigate risk, comply with regulations such as GDPR and CCPA and federal privacy laws such as HIPAA and COPPA, and avoid excessive fines.

Gain Confidence in Data Quality

In our data-driven world, data users must leverage reliable data to increase revenue, mitigate risk and innovate. According to Gartner research, “the average financial impact of poor data quality on organizations is $9.7 million per year.” IBM also found that businesses in the U.S. lose $3.1 trillion annually due to poor data quality. Regardless of an organization’s size, poor data quality has profound implications. Data lineage helps organizations track data as it moves through each destination, tracing data alterations and errors back to their source. Users are empowered to quickly uncover and fix these quality issues and understand the impact of these changes. Data quality rules help users see data relationships between assets and ensure data users can leverage reliable data to generate insightful analytics. In addition, by instituting rules to analyze data inaccuracies, users can calculate data quality scores for technical and business assets to validate their level of trustworthiness.
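As a rough illustration of tracing an error back to its source and scoring quality with rules, the sketch below models lineage as a small graph and quality rules as simple checks. The asset names, the `UPSTREAM` mapping and both helper functions are assumptions made for this example, not the behavior of any particular product.

```python
from collections import deque

# Hypothetical lineage graph: each asset maps to the assets it is derived from.
UPSTREAM = {
    "revenue_dashboard": ["sales_mart"],
    "sales_mart": ["orders_clean", "customers_clean"],
    "orders_clean": ["orders_raw"],
    "customers_clean": ["customers_raw"],
    "orders_raw": [],
    "customers_raw": [],
}


def upstream_assets(graph, asset):
    """Return every asset that feeds `asset`, nearest first (breadth-first)."""
    seen, order, queue = {asset}, [], deque([asset])
    while queue:
        for parent in graph.get(queue.popleft(), []):
            if parent not in seen:
                seen.add(parent)
                order.append(parent)
                queue.append(parent)
    return order


def quality_score(rows, rules):
    """Share of (row, rule) checks that pass, from 0.0 to 1.0."""
    checks = [bool(rule(row)) for row in rows for rule in rules]
    return sum(checks) / len(checks) if checks else 1.0


# Illustrative rules and rows for one asset.
rules = [
    lambda row: row["amount"] >= 0,               # amounts must not be negative
    lambda row: row["customer_id"] is not None,   # every order needs a customer
]
rows = [
    {"amount": 10, "customer_id": "c1"},
    {"amount": -5, "customer_id": "c2"},
    {"amount": 7, "customer_id": None},
]

print(upstream_assets(UPSTREAM, "revenue_dashboard"))
print(round(quality_score(rows, rules), 2))
```

When a number on the dashboard looks wrong, the upstream list gives the order in which to inspect assets, and a per-asset score like this one helps narrow down where the bad rows entered.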

Data with a View

Just like a road trip, everyone has a plan of what they want to see and do. Data users also have their own goals and priorities, which means data lineage needs to help meet everyone’s needs. Today, businesses need to view data lineage from multiple perspectives.

IT wants to view data from a technical perspective, including details critical to compliance, operations, data storage procedures, metadata connections and transformation processes.

Business users want to understand what the data means, its role in business processes and how any data modifications impact operational functions, business rules and policies. As a result, business lineage looks at frequency, criticality and usage context, determines ownership and documents knowledge.

Data lineage that combines business, technical and process lineage offers a 3D view of lineage, helping data users better understand how data affects their underlying business and data processes. Organizations gain complete visibility and transparency across systems. Leaders gain insight to enhance operational efficiency and design better processes for effectively and efficiently leveraging data across the enterprise.

With 3D Lineage, data users can drive business process improvement, improve architectural decisions for system management and access, improve the trust of key performance indicators (KPIs) and process performance indicators (PPIs) and gain better insights into the data ownership model.

Data Catalogs Win Big

A data catalog brings together all the organization’s data knowledge, business processes, goals, objectives and metrics into a single environment for teams to use. After all, business users need clarity into data definitions, synonyms, business attributes and usage to develop quality data intelligence.

To help create this collection of data assets, a data catalog must follow technical and business data lineage from inception through utilization.

By including data lineage with data quality within a data catalog, data users gain a single 360-degree view of all data assets, understanding the data flow and their dependencies. This knowledge allows users to understand, trust and get value from their data regardless of their data expertise. Not only does a data catalog incorporating data lineage build organizational trust, but it also breaks down data silos, centralizing data systems for the entire company to use.

Automate Data Lineage

Data lineage requires an automated tool with ingestion capabilities that continually profiles and discovers data patterns, signatures and descriptors. The same automated tool can help build a browsable, curated data catalog of data assets, policies, processes, standards, rules, glossaries and more, giving users a complete and consistent data view.

Why Not Just One Tool?

Instead of cobbling together different functions, a single tool incorporating data governance, data quality, data catalog and automated data lineage can streamline data efforts and deliver valuable, trustworthy data across the enterprise.

The right tool should provide a self-service component helping data users pull their own data and reports, reducing the risk of data misunderstanding while closing the data-trust gap and giving users absolute confidence in data quality. By incorporating data lineage as part of data governance, organizations develop standard data definitions and document information, allowing users to quickly connect data assets to business outcomes and use cases.

Is your organization in need of tracking data lineage? Watch this brief product demo, Accelerating Data Governance Success Using 3D Lineage or view the webinar, Without Data Lineage, it’s Not Really a Data Catalog—Achieving Data Visibility thru Data Lineage.