Organizational data is constantly in flux, from the moment it’s created or ingested until it is archived or purged. Whether its source is internal or external, data is continuously being moved, manipulated and transformed within systems and processes. New data sources keep proliferating, driven by the exponential growth of IoT and big data and by the spread of data preparation tools for non-technical users.
The path of data through a supply chain can also be unpredictable. We normally think of data moving from point A to point B, but in complex technical environments a single transaction may also pass through points C to J along the way. At each stop its form or function may change, or its identity may be masked, so data can be transformed many times as it moves through data stores and extraction points.
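To make the idea of "multiple hops" concrete, here is a minimal sketch that records each movement of data as a hop with the transformation applied, then walks backwards from a destination to reconstruct its journey. It assumes each store has at most one upstream hop; the store names and transformations (`crm_db`, `report.churn`, and so on) are hypothetical, and the `Hop` and `trace_path` names are illustrative rather than part of any product.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Hop:
    """One movement of data between stores, with the transformation applied on the way."""
    source: str
    target: str
    transformation: str


def trace_path(hops: list[Hop], dataset: str) -> list[Hop]:
    """Walk backwards from `dataset` to its origin and return the hops in
    origin-to-destination order. Assumes each store has at most one upstream hop."""
    incoming = {hop.target: hop for hop in hops}
    path: list[Hop] = []
    seen: set[str] = set()
    current = dataset
    # Stop at the origin (no incoming hop) or if a cycle would revisit a store.
    while current in incoming and current not in seen:
        seen.add(current)
        hop = incoming[current]
        path.append(hop)
        current = hop.source
    path.reverse()
    return path


# Hypothetical stores and transformations, for illustration only.
hops = [
    Hop("crm_db", "staging.customers", "extract"),
    Hop("staging.customers", "warehouse.dim_customer", "mask email; deduplicate"),
    Hop("warehouse.dim_customer", "report.churn", "aggregate by region"),
]

for hop in trace_path(hops, "report.churn"):
    print(f"{hop.source} -> {hop.target}: {hop.transformation}")
```

Even in this toy case, a figure in `report.churn` has been extracted, masked, deduplicated and aggregated before anyone reads it; without the recorded hops, none of that history would be visible from the report itself.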
Because data passes through so many systems and platforms, organizations must track and monitor it, apply data governance standards to it, and maintain an audit trail across its full lifecycle. Data movement and storage are complex, and when data is altered or misunderstood without a record of what happened, users won’t know where the information came from and will distrust any reports or analytics built on it. For that reason, businesses must understand data lineage from multiple perspectives to build a comprehensive, mature picture of their data governance landscape.
Tracing data lineage is accomplished by:
In any business, there are varied departments with numerous employees and diverging goals and priorities. Because of that, data consumers have different needs and want to learn different things from data lineage. To understand what data lineage should provide, you must first understand the different types of data users.
Data stewards are typically technical employees within an organization’s IT department. These users are interested in the physical storage and movement of their data. When it comes to data lineage, data stewards want to understand each step data takes through an environment, including its movement across data stores and any technical alterations along the way. They understand granular enterprise data flows and how to follow the many “jumps” data makes across an enterprise.
Business analysts understand some of the technical data lineage information, but not all of it. They are interested in data from various applications and third parties, in data mapping, and in data transformations viewed from a business perspective.
Business users are interested in data lineage at a high level. They look at it from a business perspective: they want to know where the data came from, what processes it went through and how its integrity was ensured, as well as which operational functions, business rules and business policies are affected as the data changes and transforms.
Because of the different data users within organizations, it is important to have both technical and business views of data lineage to ensure that every data consumer can understand and apply data assets according to their specific roles.
There are two different ways to view data lineage depending on the user and what they want to achieve.
Business lineage provides visibility into the data analytics pipeline and traces errors back to their sources, so that business users understand their data and have accurate data from which to generate meaningful insights. This perspective investigates where data originates and where it travels over time.
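Tracing an error back to its sources amounts to walking the lineage graph upstream until you reach datasets that have no parents of their own. The sketch below assumes lineage is available as a simple mapping from each dataset to the datasets it is built from; the dataset names and the `root_sources` function are hypothetical.

```python
from __future__ import annotations

from collections import deque


def root_sources(upstream: dict[str, list[str]], dataset: str) -> set[str]:
    """Return the original sources feeding `dataset`: upstream datasets
    that have no parents of their own."""
    roots: set[str] = set()
    seen = {dataset}
    queue = deque([dataset])
    # Breadth-first walk up the lineage graph; `seen` prevents revisiting shared parents.
    while queue:
        node = queue.popleft()
        parents = upstream.get(node, [])
        if not parents and node != dataset:
            roots.add(node)
        for parent in parents:
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return roots


# Hypothetical lineage: each dataset maps to the datasets it is built from.
upstream = {
    "revenue_dashboard": ["sales_mart"],
    "sales_mart": ["orders_clean", "fx_rates"],
    "orders_clean": ["erp_orders"],
}

print(sorted(root_sources(upstream, "revenue_dashboard")))  # ['erp_orders', 'fx_rates']
```

If a revenue figure on the dashboard looks wrong, this narrows the investigation to two original feeds instead of every table in the warehouse.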
Technical lineage describes the details of how a particular piece of data is produced, such as stored procedures, joins with other data sets, and transformation processes. This enables IT staff to explore these operations interactively and search the entire data glossary. Technical lineage also shows the impact regulatory policy has on different data environments by identifying where private or critical data elements are stored and how they have been transformed.
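Identifying where critical data elements end up is a downstream walk over column-level lineage. The following is a minimal sketch, assuming lineage is recorded as hypothetical `(source column, target column, transformation)` edges; it reports every column derived directly or transitively from a flagged column, together with the chain of transformations applied. The column names, transformation labels and `downstream_of` function are all illustrative.

```python
from __future__ import annotations

from collections import deque

# Hypothetical column-level lineage edges: (source column, target column, transformation).
EDGES = [
    ("crm.customers.email", "staging.customers.email", "copy"),
    ("staging.customers.email", "warehouse.dim_customer.email_hash", "sha256"),
    ("crm.customers.birth_date", "warehouse.dim_customer.age_band", "bucket"),
    ("warehouse.dim_customer.age_band", "report.churn.age_band", "copy"),
    ("erp.orders.amount", "report.churn.revenue", "sum"),
]


def downstream_of(
    edges: list[tuple[str, str, str]], critical_columns: list[str]
) -> dict[str, list[str]]:
    """Map every column derived from a critical column to the transformations
    applied along the way (the shortest chain found)."""
    children: dict[str, list[tuple[str, str]]] = {}
    for src, dst, transform in edges:
        children.setdefault(src, []).append((dst, transform))

    reached: dict[str, list[str]] = {}
    # Breadth-first walk from each critical column; each descendant is recorded once.
    queue = deque((col, []) for col in critical_columns)
    while queue:
        col, steps = queue.popleft()
        for dst, transform in children.get(col, []):
            if dst not in reached:
                reached[dst] = steps + [transform]
                queue.append((dst, reached[dst]))
    return reached


flagged = downstream_of(EDGES, ["crm.customers.email", "crm.customers.birth_date"])
for column, steps in sorted(flagged.items()):
    print(f"{column}: {' -> '.join(steps)}")
```

Keeping the transformation chain, not just the affected column, is what lets a reviewer tell that `email_hash` was hashed on its way into the warehouse while `staging.customers.email` still holds a plain copy, and that `report.churn.revenue` is unaffected.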
Regardless of user, data lineage is critical to ensure success with data.
Businesses use the data they consume to make critical business decisions. Data lineage describes the different processes involved in the data flow and their dependencies, establishing trust among business users to make decisions that impact the entire organization.
Data lineage is also key to an organization’s data governance strategy. It gives business users the information they need to understand and take control of their data governance program. Technology plays a large role in the governance process by providing a single point of access for every type of user, with data lineage views that meet the needs of the business, technical and governance teams.
To track and understand business and technical data lineage, organizations should have an integrated enterprise data governance solution that includes data lineage along with a comprehensive data catalog. Ideally, this data governance solution is integrated with data quality and analytics capabilities, delivering a self-service data management solution that can serve the needs of every type of data consumer. An easy-to-use solution with robust lineage can encourage both collaboration and data utilization.
By implementing modern technologies and creating the right processes, businesses can successfully trace and understand data lineage across the data landscape to see the big picture of data and ensure all lines of business run smoothly.