Understanding Data Lineage from Varying Perspectives
Understanding Data Lineage from a Business and IT Perspective
“…many of the truths we cling to depend greatly on our own point of view,”
Obi-Wan Kenobi, Star Wars: Episode VI – Return of the Jedi
When it comes to data lineage, a single quote from Star Wars sums up much of the confusion about what the function is and what people expect from it. Ask a technical user for lineage and you can expect a complex diagram that represents flows through every store, extraction and transformation point in the enterprise. Ask an enterprise architect, and both the definition and the expectations will differ significantly. Each visual represents the “truths” according to its author’s point of view, but may not represent the views of those concerned with the business use and governance of the data.
Lineage is Pervasive
In our everyday lives, we tend to take GPS trackers for granted, but they accomplish a lot: tracking our routes from source to destination, collecting all types of statistics, optimizing routes and storing historical data. In the workplace, data is the vehicle for many of the critical decisions we make, and the routes are built from a myriad of underlying technology components. To extend the GPS analogy, as business users we want our data to reach its destination without worrying about which satellite carries the signal, or about the additional maps, alternative routes and historical statistics needed at a granular level to deliver it. What we want is the high-level view of how our data is delivered in a trusted and timely fashion. Without a business-oriented way to navigate the data we use, we are left with a series of confusing artifacts (though important to the technical population) that attempt to provide some direction, such as technical data lineage sourced from an ETL tool to track data as it moves from system to system.
What’s Missing is Business Lineage
There are different perspectives on how to view your data: from a technical standpoint (where does it live and how does it move) and from a business standpoint (what do you need to know to make a good decision). While solutions such as Master Data Management (MDM) provide a perspective on data lineage based on a technical point of view, business users often struggle to interpret these illustrations of technical data flows. If the business is to drive data governance and accountability, it needs to understand data flows in terms of how the data is used to perform a business function. This is a very different perspective, and it requires the technical details to be synthesized into language that captures the business impact.
As data flows through an organization, it passes through multiple systems and consumption points that can transform and alter it. Because many systems act as both source and target, understanding the data flow helps us understand how to ensure data integrity.
At the end of the day, the organization’s data affects business decisions, how you manage your operations, how you mitigate risk, how you forecast profitability, and much more. If you don’t understand the impact of the data on these items, you won’t know what to do when the data changes or new priorities emerge. Business data lineage should answer the questions that business producers, contributors and consumers ask when solving a problem, finding the accountable party, or discovering new insights.
To understand the data lineage requirements, we should first understand the different personas of the users.
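As a minimal sketch of this idea, lineage can be modeled as a list of hops between systems. The system names and transformations below are hypothetical and serve only to illustrate how systems that act as both source and target can be identified, and how the transformations applied along a flow can be listed.

```python
from collections import defaultdict

# Hypothetical hops: (source system, target system, transformation applied).
HOPS = [
    ("crm", "staging", "extract"),
    ("staging", "warehouse", "deduplicate"),
    ("warehouse", "finance_mart", "aggregate"),
    ("finance_mart", "revenue_report", "format"),
]


def build_graph(hops):
    """Index the hops by source system so flows can be followed downstream."""
    graph = defaultdict(list)
    for source, target, transform in hops:
        graph[source].append((target, transform))
    return graph


def intermediaries(hops):
    """Return the systems that appear as both a source and a target."""
    sources = {source for source, _, _ in hops}
    targets = {target for _, target, _ in hops}
    return sorted(sources & targets)


def transformations_between(graph, start, end):
    """List the transformations on one path from start to end, or None."""
    stack = [(start, [])]
    seen = set()
    while stack:
        node, applied = stack.pop()
        if node == end:
            return applied
        if node in seen:
            continue
        seen.add(node)
        for target, transform in graph.get(node, []):
            stack.append((target, applied + [transform]))
    return None


if __name__ == "__main__":
    graph = build_graph(HOPS)
    print(intermediaries(HOPS))
    print(transformations_between(graph, "crm", "revenue_report"))
```

Every intermediary system is a point where the data may have been altered, so these are natural places to attach integrity checks.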
Types of Users
In the business world, there are multiple departments with various employees all of whom have different goals. So, it is important to understand what each user is trying to learn from their data lineage.
Data Custodians: Data custodians tend to be technical resources within an organization. Technical users are interested in the physical storage and movement of their data. Their ask of lineage is to know the multiple steps data takes throughout an environment, persistent stores and any other technical variations. These users understand the more granular enterprise data flows and how to navigate the various “hops” that data makes throughout the organization.
Decision Support: The second type of user is Decision Support. These users are typically business analysts who understand some of the technical information, but not all of it. They are interested in application and vendor data, data mapping, and data transformations, mainly from a business perspective.
Decision Makers: High-level users are the Decision Makers. These users understand the high-level flow of the information and the business rules and policies that this lineage affects. They look at data lineage through the lens of business processes: business functions, business rules, and business policies. They want to know which vendors delivered the data, how those vendors processed it, and how data quality was ensured.
Having both technical and business data lineage helps these different users all function in a better and more cohesive way.
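One way to picture the relationship between the two kinds of lineage is as a roll-up: technical hops are grouped by the business function they serve, and hops that stay inside one function are hidden. The sketch below assumes a hand-maintained mapping from physical assets to business functions. All asset and function names are hypothetical.

```python
# Hypothetical technical lineage: each pair is one physical hop.
TECHNICAL_EDGES = [
    ("vendor_feed.csv", "staging.raw_trades"),
    ("staging.raw_trades", "dw.trades_clean"),
    ("dw.trades_clean", "mart.risk_exposure"),
    ("mart.risk_exposure", "dashboard.risk_summary"),
]

# Hypothetical mapping from physical assets to the business function they serve.
BUSINESS_FUNCTION = {
    "vendor_feed.csv": "Vendor Delivery",
    "staging.raw_trades": "Data Processing",
    "dw.trades_clean": "Data Processing",
    "mart.risk_exposure": "Risk Management",
    "dashboard.risk_summary": "Risk Management",
}


def business_lineage(edges, mapping):
    """Collapse technical hops into edges between business functions."""
    rolled_up = []
    for source, target in edges:
        source_fn, target_fn = mapping[source], mapping[target]
        # Hops inside a single business function are technical detail; skip them.
        if source_fn != target_fn and (source_fn, target_fn) not in rolled_up:
            rolled_up.append((source_fn, target_fn))
    return rolled_up


if __name__ == "__main__":
    for source_fn, target_fn in business_lineage(TECHNICAL_EDGES, BUSINESS_FUNCTION):
        print(f"{source_fn} -> {target_fn}")
```

A data custodian would work with `TECHNICAL_EDGES` directly, while a decision maker would see only the two rolled-up edges between business functions.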
The Two Views of Data Lineage
There are also two ways to view data lineage, depending on the user and what they want to accomplish.
Directional View: This view explores the data’s origins and where it moves over time. It describes what happens to data as it goes through diverse processes. This type of view helps provide visibility into the data analytics pipeline and simplifies tracing errors back to their sources. This view is typically used by high-level and intermediary users.
Impact View: Rather than detailing the direction of the lineage, this view shows all of the relationships within it. This allows people to interactively explore these relationships and query the entire glossary from a visual perspective. For example, it can show the impact a security policy has on the different data domains. It is usually used by the low-level user.
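The difference between the two views can be sketched over a small set of typed relationships. This is one possible reading of the views described above, not a description of any particular tool. The relationship types (`feeds`, `governs`, `defines`) and all element names are assumptions made for illustration.

```python
from collections import deque

# Hypothetical relationships: (from, relation type, to).
RELATIONS = [
    ("crm.customer", "feeds", "dw.customer"),
    ("dw.customer", "feeds", "report.churn"),
    ("pii_policy", "governs", "crm.customer"),
    ("pii_policy", "governs", "dw.customer"),
    ("Customer", "defines", "dw.customer"),
]


def directional_view(relations, element):
    """Walk 'feeds' relations upstream to list where an element came from."""
    origins = []
    queue = deque([element])
    while queue:
        current = queue.popleft()
        for source, relation, target in relations:
            if relation == "feeds" and target == current and source not in origins:
                origins.append(source)
                queue.append(source)
    return origins


def impact_view(relations, item):
    """Return every relationship that involves an item, whatever its type."""
    return [r for r in relations if item in (r[0], r[2])]


if __name__ == "__main__":
    print(directional_view(RELATIONS, "report.churn"))
    print(impact_view(RELATIONS, "pii_policy"))
```

The directional view follows only data movement, so it answers "where did this come from?". The impact view ignores direction and relation type, so it answers "what is connected to this policy or term?".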
So how can organizations assemble and synthesize the right data lineage view for the right user?
Organizations can implement an automated data governance tool with interactive data visualization and lineage capabilities. The solution should deliver both a directional view and an impact view, and include extended search capabilities covering data relationships and hierarchies.
The solution should also allow low-level, intermediary, and high-level users to gain valuable insights into data flows, definitions and responsibilities. It must be interactive and should skew toward the business population while still providing technical oversight for critical data elements. Automation that reduces the manual footprint is also valuable when extracting business lineage, for example by using connectors to pull technical metadata and lineage from various systems and platforms.
The Importance of Data Lineage
So far we’ve identified who benefits from data lineage, as well as the various perspectives for different audiences. But why do we need it? Here are just a few reasons:
- Data lineage helps explain the different processes involved in the data flow and their dependencies, which allows the various user groups to better understand the data they consume and make critical decisions that may impact the organization.
- Data lineage helps a data governance program mature because it provides the needed information, at the right level of granularity and context, for the business to understand and direct the program.
Data continues to grow exponentially in both volume and scope, making knowledge about data even more critical. Understanding your data’s quality, correctness, and completeness through data lineage helps all audiences better understand their data and make more informed decisions.