How to Successfully Implement a Big Data/Data Lake Project

Thierry Roullier, May 18, 2017

Big Data’s Batting Average

We’ve all heard it before: “This was an extremely successful pilot, but we still need a solid business case to help our cause when we pitch this idea to the executive team.”

Gartner says that, as of 2016, only 14% of big data programs have hit the “production” stage, while 70% of big data initiatives have not moved past the “pilot” phase. Chances are most big data budget requests for 2017 were turned down by the CIO/CEO, or put on hold, because they lacked a compelling business use case or direct sponsorship from business teams.

One of the most successful big data use cases in recent years centered on a big data platform driven by a data lake. The idea was to store raw data and open up decentralized data access to business teams, democratizing data so that all levels – from CEO to shop floor – could access the analytics power needed for effective decision making. Naturally, there was tremendous support from business teams who had been deprived of data access for years. Despite successful use cases and general acceptance, IT executives continued to struggle to justify further investment in data lake experiments or to attract sponsorship from C-level executives.

Driver for Change and Innovation

As technology teams continue to be influenced by the hype and disruption of big data, most fail to step back and understand where and how it can deliver maximum business value. Radically disruptive new business processes can’t be implemented without first gathering knowledge and understanding how big data technology can become a catalyst for organizational and cultural change. Change is most effective when it is introduced in smaller, incremental steps that prove value along the way, rather than through a big bang approach that tries to boil the ocean on a multi-year project and is bound to meet organizational resistance.

Looking back at historical projects – ERP, CRM and Data Warehouse (EDW) programs weren’t identified, created, and executed overnight. And neither can big data. Execution and success go hand in hand when the work is done properly. Big data/data lake programs have much to learn from their predecessors. Success can come from capitalizing on existing successful programs – their processes, timelines, technology, etc. – and identifying small improvements that can quickly bring business value and build momentum towards the next milestone.

The objective of such an approach is to identify areas of innovation across existing programs and define small wins that can build innovative thinking across business teams. Small wins give leaders and small groups excellent platforms to learn from, can bring about more creativity, and can have a cultural impact on the organization’s data usage. Such changes are the drivers for quick acceptance of data programs like big data/data lakes.

The Solution

A big data analytics platform with self-service capabilities allows you to draw on the data inside the data lake to make better decisions. Rather than waiting days or weeks for IT or a data scientist to pull the data you need, you can do it yourself and not lose the opportunity at hand. Below are a few considerations to make once you’ve disrupted your organization with a data lake:

  1. Data Quality: We said earlier that the idea of a data lake has been around longer than the word has in the English dictionary. Assuming this arbitrary fact is true, your organization likely already has a data lake. But is it a data lake or a swamp? Have you done anything to ensure the quality of the data before it’s transferred for analysis? If not, you’re not alone. Most organizations have used their data lake as a dumping ground for the past few years on the premise that they’ll eventually need the data. The reality is that they do need the data, but the information has to be clean. Applying data quality before the data is transformed during the ELT process means you’re actually analyzing data you can trust. Novel concept, I know.
  2. Machine Learning: After you’ve cleaned up your data, apply machine learning analytics to improve the quality of your analytical outcomes. That’s where the real value of the data lake comes from. Look at how business problems can be addressed by machine learning. Remember, you no longer need to be an expert to benefit from machine learning.
  3. High Volume Processing: At a previous place of employment I worked in a corporate complex that housed three office buildings with various tech companies. There was a shared cafeteria, and I remember overhearing a conversation during lunch about how many Teradata’s worth of storage one gentleman needed for all the data they were storing. What matters is not how much data you store, but how much data you can process. Storing data just drives up your storage bill. But if you can process that data and draw conclusions from it, now you’re in business. The only way to do that is to make sure the data you draw conclusions from is of good quality and can be processed at scale.
  4. Business User: Who are the frustrated business users in your company and what are they looking for? You need to get them on your side to strengthen your business case. Their ability to extract data from the data lake will inevitably help push the project through.
  5. One Step at a Time: Finally, the obvious one: work step by step, one use case at a time. Look for the low-hanging fruit and prove to management that you can execute.
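
To make the data quality point a little more concrete, here is a minimal Python sketch of a quality gate that runs before the transform step. Everything in it is an assumption made for illustration: the field names (customer_id, amount), the two rules, and the quality_gate function are hypothetical, and a real data lake would apply its own rules, most likely with a distributed engine rather than a simple loop.

```python
from typing import Callable

# A record is a plain dict here; real pipelines would read rows from the lake.
Record = dict

def has_customer_id(record: Record) -> bool:
    """Hypothetical rule: the record carries a non-empty customer_id."""
    return bool(record.get("customer_id"))

def amount_is_non_negative(record: Record) -> bool:
    """Hypothetical rule: amount is a real number and not negative."""
    amount = record.get("amount")
    is_number = isinstance(amount, (int, float)) and not isinstance(amount, bool)
    return is_number and amount >= 0

# Rule names double as the rejection reasons reported back to data owners.
RULES: dict[str, Callable[[Record], bool]] = {
    "missing customer_id": has_customer_id,
    "invalid amount": amount_is_non_negative,
}

def quality_gate(records: list[Record]) -> tuple[list[Record], list[tuple[Record, list[str]]]]:
    """Split records into clean ones and rejected ones with their failure reasons."""
    clean, rejected = [], []
    for record in records:
        failures = [name for name, rule in RULES.items() if not rule(record)]
        if failures:
            rejected.append((record, failures))
        else:
            clean.append(record)
    return clean, rejected

if __name__ == "__main__":
    raw = [
        {"customer_id": "C1", "amount": 10.0},
        {"customer_id": "", "amount": 5},
        {"customer_id": "C3", "amount": -2},
    ]
    clean, rejected = quality_gate(raw)
    print(f"{len(clean)} clean, {len(rejected)} rejected")
    for record, reasons in rejected:
        print(record, "->", ", ".join(reasons))
```

Only the clean records would move on to transformation and analysis. Keeping the rejected ones together with their reasons, rather than silently dropping them, gives business users something they can act on at the source.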

If you follow these steps, you will improve your chances of a successful data lake implementation. History repeats itself, and we can learn from the data warehouse and cloud implementations of the recent past to avoid the mistakes that were made.

To learn more about implementing a successful big data analytics solution, check out the data sheet below.

Download Data Sheet