Big Data Governance
Learn and Overcome the Challenges of Big Data Governance
Perhaps as a child, you heard that goldfish will keep growing to the size of their environment. But the fact is, the ability of goldfish to keep growing throughout their lifetimes is far more dependent on the quality of their surroundings than the size of their habitat. Without properly filtered and clean water, the growth of the goldfish will be stunted and their lifespans cut short.
There’s an apt analogy here for data. Data has seen explosive growth, and because of its increasing ability to generate critical business insights and improved outcomes through analytics, organizations have rushed to adopt big data environments to amass enormous data stores. But for that data to reach its full potential, it must be properly cared for. Untended data assets can quickly become data liabilities. And that is why data governance, considered a market advantage just a few years ago, is quickly becoming a business imperative.
A Growing Need for Governance
With the proliferation of big data and big data environments, it is no surprise that the conversation around data governance has grown from a murmur to a roar. Increasing regulatory concerns, like the EU’s General Data Protection Regulation (GDPR), effective May 25, 2018, are also contributing to an increased urgency around understanding both the nature of data stores and the quality of those data assets. Organizations no longer have the option of simply stockpiling data with the intent to someday put it to use when resources and budgets allow, even if they have the storage capacity to do so. Data must be understood and managed, or there could be serious compliance consequences down the road.
While definitions around data governance vary, we’ve identified it as the formal orchestration of people, processes, and technology that allows an organization to leverage data as a business asset. It’s an enterprise undertaking, and fundamentally it is about collaboration between IT and myriad business units. It isn’t a collection of projects undertaken by business units or IT to achieve limited goals—at least not if you want it to succeed as part of a long-term strategy. Nor does it exist in specific data environments, such as a data warehouse or a data lake. It is meant to encompass all of these things across your entire data supply chain, and provide a framework for enterprise-wide data ownership, understanding and collaboration.
People and process are the drivers of data governance, but many technological tools can enable it, delivering automation and promoting solid data management: business glossaries and data dictionaries, data lineage and metadata management, and dashboards, workflows, and intelligent interfaces. But how does “big” data governance differ from “traditional” data governance? And how do you deploy these tools in an integrated strategy to maximize both efficiency and impact?
Big Data Governance Considerations
The foundation of data governance remains the same regardless of environment, but big data presents governance challenges rarely seen in traditional relational databases or data warehouses, such as large volumes of unstructured and semi-structured data alongside structured transaction data. As in the goldfish analogy, there may be a tendency in big data environments to collect as much data as there is space to fill. But unless you are governing data from the point of ingestion, there’s no way to separate trash from treasure among your data “assets,” and “garbage in, garbage out” is a truism in data analytics as much as in any other area of computer science. The biggest differentiator between big data environments and traditional storage methods is simply one of scale. Every data challenge is amplified by a factor of ten when it comes to big data: business users can’t access data, they aren’t sure what data is worth managing, and they question both the value and quality of the data that is there.
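To make "governing data from the point of ingestion" concrete, here is a minimal sketch of an ingestion gate that accepts records meeting a few basic rules and quarantines the rest with a reason. The field names, the rule set, and the `ingest` function are hypothetical and chosen purely for illustration; a real program would draw its rules from the organization's own data dictionary.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any

# Hypothetical rule set (an assumption for this sketch):
# field name -> expected Python type.
REQUIRED_FIELDS: dict[str, type] = {
    "customer_id": str,
    "amount": float,
    "source_system": str,
}


@dataclass
class IngestResult:
    """Records that passed the checks, and rejected records with a reason."""
    accepted: list[dict[str, Any]] = field(default_factory=list)
    quarantined: list[tuple[dict[str, Any], str]] = field(default_factory=list)


def check_record(record: dict[str, Any]) -> str | None:
    """Return a rejection reason, or None if the record passes every rule."""
    for name, expected_type in REQUIRED_FIELDS.items():
        if record.get(name) is None:
            return f"missing field: {name}"
        if not isinstance(record[name], expected_type):
            return f"wrong type for {name}"
    return None


def ingest(records: list[dict[str, Any]]) -> IngestResult:
    """Split incoming records into accepted and quarantined at ingestion time."""
    result = IngestResult()
    for record in records:
        reason = check_record(record)
        if reason is None:
            result.accepted.append(record)
        else:
            # Keep the rejected record and the reason so a data steward can review it.
            result.quarantined.append((record, reason))
    return result
```

Quarantining rather than silently dropping bad records is a design choice in this sketch: it keeps the "trash" visible to data stewards, who can decide whether the record or the rule needs fixing.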
Understanding is fundamental to data governance, and particularly with big data, it should begin with data lineage and metadata management as you build your business glossary and data dictionaries. Data is most valuable when it is leveraged as a business asset, so you need to empower business users to put that data to work easily. Data lineage provides the critical information users need about the source of data, and metadata (physical, logical and even conceptual) clarifies how data is used across the enterprise and the value it can bring to analytics.
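As a rough illustration of how lineage and metadata fit together, the sketch below models a tiny catalog in which each dataset carries a physical name, a business glossary term, a description, and its upstream datasets, and then walks the upstream links back to the original sources. The dataset names, the `DatasetEntry` structure, and `trace_sources` are all assumptions made for this example, not features of any particular governance product.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetEntry:
    name: str                        # physical name of the dataset
    business_term: str               # glossary term business users recognize
    description: str                 # plain-language meaning of the data
    upstream: tuple[str, ...] = ()   # datasets this one is derived from


# Hypothetical catalog contents, for illustration only.
CATALOG: dict[str, DatasetEntry] = {
    entry.name: entry
    for entry in [
        DatasetEntry("crm.contacts", "Customer", "Contact records exported from the CRM"),
        DatasetEntry("web.orders_raw", "Order", "Raw order events from the web store"),
        DatasetEntry("lake.orders_clean", "Order", "Deduplicated orders",
                     upstream=("web.orders_raw",)),
        DatasetEntry("mart.customer_revenue", "Customer Revenue", "Revenue per customer",
                     upstream=("lake.orders_clean", "crm.contacts")),
    ]
}


def trace_sources(name: str, catalog: dict[str, DatasetEntry] = CATALOG) -> list[str]:
    """Return the original source datasets (no upstream) that feed `name`."""
    seen: set[str] = set()
    sources: list[str] = []
    stack = [name]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        entry = catalog[current]
        if not entry.upstream:
            sources.append(current)
        stack.extend(entry.upstream)
    return sorted(sources)
```

With this catalog, a business user asking where "Customer Revenue" comes from could call `trace_sources("mart.customer_revenue")` and be pointed at the CRM export and the raw web orders.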
Activating People and Processes with Enabling Technology
As previously mentioned, people and processes are the drivers of data governance, and collaboration is essential to the success of your data governance program. Integrated tools enable process automation and provide the dashboards and workflows that facilitate communication, but governance only functions effectively when both IT and the business take accountability as data owners, stewards, and stakeholders. Big data will continue to provide new challenges, and a convergence of data management technologies will increasingly put business users in the driver’s seat as citizen data scientists tasked with turning big data into business insights. The key is to provide them with a data governance framework that will allow them to quickly identify the best data assets and mine that raw data for competitive gold.
To learn more about the steps needed to implement a data governance framework, check out this eBook.