We constantly hear about the explosion of big data and how important data is to every business in every vertical. Yet many business users simply aren’t using their data because they don’t know what they have (do you have an up-to-the-minute inventory of your enterprise data?), they can’t find it (does anyone in your organization know where all of your important data resides?), or they just don’t trust it (we found it, but where did it come from and what does it mean?). If you can’t answer all of these questions definitively, then, surprise, you’re not alone!
In large organizations, data inevitably spans many systems, so IT has the challenging task of integrating data from many different processes and systems. For example, many organizations maintain traditional and legacy systems while expanding their data capabilities with cloud storage, ‘big-data’ Hadoop clusters, and third-party vendor data. Each of these data repositories has its own rules and requirements. The modern organization’s data supply chain is massive, complex, and scattered, and a single change, such as switching a third-party feed, can affect an array of business processes. It’s no wonder organizations struggle to pinpoint the right data when they need it. Yet these same organizations need real-time insights to make informed business decisions and to achieve or sustain a competitive advantage.
Organizations are now looking to metadata to help solve this problem. In short, metadata is simply data about data. It describes attributes of data, including where the data resides and how to find it. Remember finding a book by its identifying information (aka metadata), such as the author, title, subject, or date of publication? The card catalog was the first place you went to search for something in the library. It told you where to find the book and how the library was organized. If a card catalog tells us how books are organized, we can think of a centralized metadata portal as a card catalog for data. Just as books are spread across a library’s floors and sections, an enterprise’s data is scattered across disparate systems in different formats. Metadata is (or at least should be) stored in a central location and used to help organizations standardize how data is located. However, before you can organize metadata by type and understand how it functions, you need to go back to where metadata starts: defining your data.
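To make the card-catalog analogy concrete, here is a minimal sketch of a “card catalog for data”: one card per data set, recording which system holds it, where it sits inside that system, and a subject heading, plus a lookup by attributes. The class names, fields, and sample entries are hypothetical and chosen only for illustration; a real metadata portal would persist cards centrally and harvest many of them automatically rather than keep them in memory.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """One 'card' in the catalog. Field names are illustrative assumptions."""
    name: str
    system: str    # which repository holds the data (warehouse, Hadoop cluster, ...)
    location: str  # where the data set sits inside that system
    subject: str   # business subject area, like a library subject heading
    tags: set[str] = field(default_factory=set)


class DataCatalog:
    """A toy, in-memory 'card catalog for data' (hypothetical, not a product API)."""

    def __init__(self) -> None:
        self._entries: dict[str, CatalogEntry] = {}

    def register(self, entry: CatalogEntry) -> None:
        # Re-registering a name overwrites the old card, keeping one card per data set.
        self._entries[entry.name] = entry

    def find(self, tag: str | None = None, **criteria: str) -> list[CatalogEntry]:
        """Return entries matching every attribute criterion (and the tag, if given)."""
        return [
            e for e in self._entries.values()
            if all(getattr(e, key, None) == value for key, value in criteria.items())
            and (tag is None or tag in e.tags)
        ]


catalog = DataCatalog()
catalog.register(CatalogEntry(name="customer_master", system="crm_db",
                              location="sales.customers", subject="customers", tags={"pii"}))
catalog.register(CatalogEntry(name="web_clicks", system="hadoop",
                              location="/raw/clickstream/", subject="web analytics"))
print([e.location for e in catalog.find(subject="customers")])  # → ['sales.customers']
```

Searching by subject here plays the same role as searching the card catalog by subject: the answer is a location, not the data itself.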
Before metadata can be used to build a glossary that tells organizations exactly where data is located and how it should be used, we must understand the purpose and value of metadata to the business. To fully understand data, you must triangulate it: view it from three different perspectives. Gathering metadata from all three perspectives is the only way to achieve a comprehensive understanding of how and where data lives in the business:
Physical Data Perspective: Organizations have multiple databases, and each one has its own way of specifying exactly where each set of data lives. The metadata in this model should record where each system resides and where specific data sets are located within each system. Typically, this type of metadata can be harvested automatically from the software that manages the data, such as a database’s own system catalog.
Logical Data Perspective: This category should contain metadata about how data travels from point A to point B. Essentially, it is a map that shows the data’s origins, what happens to it, and where it has moved over time. The logical data model shows how data should flow through an organization and gives us a picture of where the data comes from and how it is transformed.
Conceptual Data Perspective: Conceptual metadata should convey the meaning and purpose of a data set from a business standpoint. It should tell users what the data means, what purposes it typically serves, when it was created, whether it is up to date, whether it is confidential, and so on. The conceptual data model requires human input to define the data, and because this metadata changes over time, users must also keep it current. For example, a data scientist might find a new use for ‘old’ data. In that case, the use-case metadata should be updated to reflect the new purpose. To manage this much metadata effectively, users must be able to suggest updates, with a process in place to certify or approve them. Think of Wikipedia: anyone can add their two cents, but editing controls ensure that sensitive or controversial topics are not corrupted with bad information. An effective metadata implementation will have similar controlled crowdsourcing functionality.
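The three perspectives and the controlled crowdsourcing idea can be sketched together in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference design: every class name, field, and sample value (including the steward and data set names) is hypothetical. Each data set gets one record holding its physical, logical, and conceptual metadata, and changes to the human-maintained conceptual part go through a review queue where anyone may suggest a new use but only a designated steward can approve it, mirroring the data-scientist example above.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


# One record per data set, holding all three perspectives (field names are assumptions).
@dataclass
class PhysicalMetadata:
    system: str    # which database or cluster holds the data
    location: str  # where the data set sits inside that system


@dataclass
class LogicalMetadata:
    upstream: list[str]  # data sets this one is derived from
    transformation: str  # what happens to the data along the way


@dataclass
class ConceptualMetadata:
    description: str
    uses: list[str]      # business purposes; maintained by people
    confidential: bool


@dataclass
class DatasetMetadata:
    name: str
    physical: PhysicalMetadata
    logical: LogicalMetadata
    conceptual: ConceptualMetadata


class Status(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass
class Suggestion:
    dataset: str
    new_use: str
    author: str
    status: Status = Status.PENDING


class ReviewQueue:
    """Hypothetical controlled crowdsourcing: anyone may suggest, only stewards approve."""

    def __init__(self, records: dict[str, DatasetMetadata], stewards: set[str]) -> None:
        self.records = records
        self.stewards = stewards
        self.pending: list[Suggestion] = []

    def suggest(self, dataset: str, new_use: str, author: str) -> Suggestion:
        suggestion = Suggestion(dataset, new_use, author)
        self.pending.append(suggestion)
        return suggestion

    def review(self, suggestion: Suggestion, reviewer: str, approve: bool) -> None:
        if reviewer not in self.stewards:
            raise PermissionError(f"{reviewer} is not a data steward")
        self.pending.remove(suggestion)  # raises ValueError if it is no longer pending
        suggestion.status = Status.APPROVED if approve else Status.REJECTED
        if approve:  # only approved changes reach the shared record
            self.records[suggestion.dataset].conceptual.uses.append(suggestion.new_use)


record = DatasetMetadata(
    name="orders_2019",
    physical=PhysicalMetadata(system="legacy_erp", location="archive.orders"),
    logical=LogicalMetadata(upstream=["erp.orders"], transformation="annual snapshot"),
    conceptual=ConceptualMetadata(description="Closed orders for 2019",
                                  uses=["finance reporting"], confidential=False),
)
queue = ReviewQueue({"orders_2019": record}, stewards={"steward_1"})
s = queue.suggest("orders_2019", "churn model training", author="data_scientist_1")
queue.review(s, reviewer="steward_1", approve=True)
print(record.conceptual.uses)  # → ['finance reporting', 'churn model training']
```

Keeping the physical and logical parts separate from the conceptual part reflects the text’s distinction: the first two can largely be harvested by software, while the conceptual part needs human input and therefore the approval step.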
Once an organization views its data from these three perspectives, its metadata model will start to take shape. The next step is to implement an appropriate data governance solution that organizes this metadata in a centralized repository.
To create a comprehensive data glossary that shows business professionals who owns each data asset, organizations need to invest in a data governance platform. The platform should promote fluid communication between data owners and data consumers. It should also offer extensive collaboration capabilities so that users can build expertise about their data.
The platform should deliver an all-inclusive view of an organization’s data landscape. With transparency into every aspect of the organization’s data assets, business users can gain valuable insights not only into the details of those assets, but also into their quality and the risks associated with using them across business applications. As with any corporate asset, you need a current inventory and value assessment before you can even begin tracking and securing data. Comprehensive, current metadata is essential for data protection and security.
Finally, the right platform should have automatic discovery capabilities that capture and monitor changes to metadata. Once changes are discovered, the relationships recorded in the technical metadata become the raw material for building business lineage, which delivers meaningful insights about data in a business context. With the proper data governance strategy, enterprises can create a comprehensive glossary of definitions for the business terms that appear in data artifacts such as reports and applications. But data governance is about more than standardized terminology; at its heart are the people who oversee the data and answer questions about it. This is accomplished by assigning data owners and data stewards who are responsible for organizing and maintaining data definitions, usage rights, and data quality parameters so that business professionals can consume data in a business context.
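Automatic discovery and lineage can be illustrated with a small sketch, assuming two things the text does not specify: that each harvest of technical metadata is reduced to a mapping of table names to column names, and that lineage is stored as edges from a source to the artifacts that consume it. Comparing two harvests surfaces added and removed tables and columns; walking the lineage edges breadth-first then lists every downstream artifact a change could affect. All function names and sample tables are hypothetical.

```python
from __future__ import annotations

from collections import deque


def diff_snapshots(old: dict[str, set[str]], new: dict[str, set[str]]) -> dict[str, set[str]]:
    """Compare two harvests (table -> columns) and report what was added or removed."""
    changes: dict[str, set[str]] = {
        "added_tables": set(new) - set(old),
        "removed_tables": set(old) - set(new),
        "added_columns": set(),
        "removed_columns": set(),
    }
    for table in set(old) & set(new):
        changes["added_columns"] |= {f"{table}.{c}" for c in new[table] - old[table]}
        changes["removed_columns"] |= {f"{table}.{c}" for c in old[table] - new[table]}
    return changes


def downstream_impact(lineage: dict[str, list[str]], changed: set[str]) -> set[str]:
    """Breadth-first walk of lineage edges (source -> consumers) from the changed nodes."""
    seen: set[str] = set()
    queue = deque(changed)
    while queue:
        node = queue.popleft()
        for consumer in lineage.get(node, []):
            if consumer not in seen:  # visit each artifact once, even if the graph has cycles
                seen.add(consumer)
                queue.append(consumer)
    return seen


# Hypothetical harvests taken before and after a change.
old = {"customers": {"id", "name", "region"}, "orders": {"id", "customer_id"}}
new = {"customers": {"id", "name"}, "orders": {"id", "customer_id"}, "returns": {"id"}}
changes = diff_snapshots(old, new)

# Hypothetical lineage: customers feeds a warehouse table, which feeds a report and an app.
lineage = {
    "customers": ["dim_customer"],
    "dim_customer": ["churn_dashboard", "marketing_app"],
}
affected = {c.rsplit(".", 1)[0] for c in changes["removed_columns"]} | changes["removed_tables"]
print(sorted(downstream_impact(lineage, affected)))
# → ['churn_dashboard', 'dim_customer', 'marketing_app']
```

Dropping a single column from the customers table is enough to flag a report and an application two hops away, which is the kind of ripple effect a third-party feed switch can cause.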