The Lost Art of Data Mining

Transform marketing, customer acquisition, risk management, retention, and debt recovery

Elbert Hearon, January 26, 2017

Download White Paper

Data Mining Intelligence

Organizations are inundated with big data comprising many attributes about their customers or prospects. These attributes range from hundreds to thousands of structured variables drawn from a myriad of databases covering transactions, billing history, financial activity, spending capacity, demographics, credit behavior, credit scores, customer sentiment, product usage, deceased records, and Office of Foreign Assets Control (OFAC) lists. On top of these structured attributes, organizations are engulfed in unstructured data from call center logs, customer surveys, and even social media.

With organizations receiving gigabytes, terabytes, or petabytes of this data on a daily, weekly, or monthly basis, sifting through it to glean intelligence for optimal business decisions has become an arduous task. These decisions span strategies for target marketing, fraud detection, account acquisitions, cross-sell or upsell, risk management, customer retention, payment behavior analysis, collection prioritization, debt recovery, and many other business functions.

Data mining is one solution that can help organizations optimize business strategies for making more profitable decisions.  Let’s look at some of the critical preliminary steps, as well as the actual process of data mining.

Three Essential Data Mining Prep Steps

  • Data Access: The first step in the data mining process is reading in files, which typically come from disparate sources. For example, data often arrives in many formats, such as fixed-width, delimited, CSV, Microsoft Excel, XML, SQL, Java objects, Hadoop, or various databases. A flexible data access tool is essential for reading in these files from disparate sources.
  • Data Quality: Once files are read in, a key preliminary step is to check critical data fields for completeness, conformity, consistency, reasonableness, and overall accuracy. Set thresholds to define acceptable pass rates. Data fields that fail your thresholds (e.g., missing values, amounts that fall outside normal ranges, or account numbers that do not conform to a required length or type, such as numeric) should be investigated by your data quality team and fixed before proceeding.
  • Balancing and Reconciliation: Before data is stored in repositories for mining, analytics, or reporting, it makes many “hops” through several databases. For example, new-customer data at a media and telecommunications firm may pass through several systems (e.g., applications, order, provisioning, network, switch, rating, billing, and invoicing) before settling into payments, collections, general ledger, or other databases. As these customer records migrate from system to system, balancing and reconciliation steps ensure that totals match, that records do not get lost along the way, and that they move through the systems in a timely manner. Checks should also monitor incoming files to ensure that they arrive as expected and are not duplicated.
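
As a rough illustration of the data quality and reconciliation steps above, the sketch below computes a pass rate per field, compares it with a threshold, and then reconciles a source system against a downstream one. The records, the five-digit account rule, the 0 to 10,000 amount range, and the 95% threshold are all assumptions made up for this example, not recommended values.

```python
# Hypothetical records and rules, for illustration only.
records = [
    {"account": "10001", "amount": 120.50},
    {"account": "10002", "amount": None},      # missing amount
    {"account": "1000X", "amount": 75.00},     # non-numeric account number
    {"account": "10004", "amount": 98000.00},  # outside the assumed normal range
    {"account": "10005", "amount": 42.25},
]

def account_ok(value):
    """Conformity: account numbers are assumed to be exactly 5 digits."""
    return isinstance(value, str) and len(value) == 5 and value.isdigit()

def amount_ok(value):
    """Completeness and reasonableness: present and between 0 and 10,000."""
    return value is not None and 0 <= value <= 10_000

def pass_rate(rows, field, check):
    """Share of rows whose field passes the check."""
    return sum(check(r[field]) for r in rows) / len(rows)

THRESHOLD = 0.95  # assumed minimum acceptable pass rate

report = {
    "account": pass_rate(records, "account", account_ok),
    "amount": pass_rate(records, "amount", amount_ok),
}
failed_fields = [field for field, rate in report.items() if rate < THRESHOLD]

def reconcile(source_rows, target_rows):
    """Compare record counts, control totals, and keys between two systems."""
    source_keys = {r["account"] for r in source_rows}
    target_keys = [r["account"] for r in target_rows]
    source_total = round(sum(r["amount"] or 0 for r in source_rows), 2)
    target_total = round(sum(r["amount"] or 0 for r in target_rows), 2)
    return {
        "count_match": len(source_rows) == len(target_rows),
        "total_match": source_total == target_total,
        "missing_in_target": sorted(source_keys - set(target_keys)),
        "duplicated_in_target": sorted(
            {k for k in target_keys if target_keys.count(k) > 1}
        ),
    }

# The downstream system dropped one record and duplicated another.
downstream = records[:4] + [records[0]]
result = reconcile(records, downstream)
print(report, failed_fields, result)
```

In this made-up case the record counts still match even though one record was lost and another duplicated, which is why the sketch also compares control totals and keys rather than counts alone.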

Data Mining Steps:

Prepare Your Data:

At some point, the disparate datasets need to be prepared for data mining or predictive analytics. For example, it is common for an organization to receive hundreds of thousands of files daily, weekly, or monthly that need to be combined into a few. Several tasks should be considered before or after joining these files: checking for duplicate records or payments, imputing missing values, transforming variables, binning variables, sampling large databases into smaller, more manageable ones, performing variable reduction, and splitting files into training and validation datasets. These and other data preparation functions are necessary to obtain the most value from your wealth of data.
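
A minimal sketch of a few of these preparation tasks follows: removing duplicates, imputing missing values, binning, and splitting into training and validation sets. The rows, the choice of median imputation, the band cut points, and the 70/30 split are assumptions chosen for illustration only.

```python
import random
import statistics

# Hypothetical customer rows; None marks a missing value.
rows = [
    {"id": 1, "balance": 250.0},
    {"id": 2, "balance": None},
    {"id": 2, "balance": None},   # duplicate record
    {"id": 3, "balance": 1200.0},
    {"id": 4, "balance": 4800.0},
    {"id": 5, "balance": 90.0},
    {"id": 6, "balance": 640.0},
]

# 1. Remove duplicate records, keeping the first occurrence of each id.
seen, deduped = set(), []
for row in rows:
    if row["id"] not in seen:
        seen.add(row["id"])
        deduped.append(dict(row))  # copy so the raw rows stay untouched

# 2. Impute missing balances with the median of the observed values.
median_balance = statistics.median(
    r["balance"] for r in deduped if r["balance"] is not None
)
for row in deduped:
    if row["balance"] is None:
        row["balance"] = median_balance

# 3. Bin the continuous balance into coarse bands (cut points are assumed).
def balance_band(balance):
    if balance < 500:
        return "low"
    if balance < 2000:
        return "medium"
    return "high"

for row in deduped:
    row["band"] = balance_band(row["balance"])

# 4. Split into training and validation sets, with a fixed seed for repeatability.
shuffled = deduped[:]
random.Random(42).shuffle(shuffled)
cut = int(len(shuffled) * 0.7)
training, validation = shuffled[:cut], shuffled[cut:]
print(len(training), len(validation))
```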

Know Your Customers:

A critical step in the data mining process is learning more about your customers, good or bad. The intent is to accurately identify your customers in detail, by segment. The description of your customers should be crystal clear to your business strategists so that they can execute target marketing accurately. From selling the right bundle of products or services to minimizing delinquencies, properly segmenting your customers is critical.

For example, the following 20+ key questions are worth answering, by customer segment, when designing your data mining strategies:


  • How do characteristics differ between my profitable and unprofitable customers?
  • What are customers’ demographic, psychographic or lifestyle characteristics?
  • What are customers’ buying habits, product usage, and overall profitability?
  • What is the market size of prospective customers?
  • Do my ideal customers reside in urban, suburban, or rural areas?
  • What are optimal channels of communications to reach my customers or prospects?
  • What are buying proclivities for customers or prospects within unique segments?
  • Which customers should be targeted for cross-sell or upsell promotions?
  • Which customers are likely to defect and warrant retention programs?
  • What is our six-month sales forecast to establish budgets, inventory, and staffing?
  • What are the next best products or services to offer each customer?
  • Overall, what is the 360-degree view of my customer portfolio?

Fraud Detection

  • Which point-of-sale transactions should be flagged as fraudulent?
  • What are customers’ unusual behavior patterns that could indicate fraud?
  • Are there links between multiple customers that are indicative of fraud rings?
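
One simple way to surface unusual transaction amounts, offered here only as a sketch, is a modified z-score built on the median and the median absolute deviation (MAD). The amounts are invented, and the 3.5 cutoff is a commonly used rule of thumb rather than a tuned value; a production fraud system would draw on many more signals than amount alone.

```python
import statistics

def flag_unusual(amounts, threshold=3.5):
    """Return the indexes whose modified z-score exceeds the threshold."""
    median = statistics.median(amounts)
    mad = statistics.median(abs(a - median) for a in amounts)
    if mad == 0:
        return []  # no spread to measure against
    return [
        i for i, a in enumerate(amounts)
        if abs(0.6745 * (a - median) / mad) > threshold
    ]

# Hypothetical point-of-sale amounts for one card.
amounts = [23.10, 18.75, 31.40, 27.00, 22.60, 950.00, 25.30]
flagged = flag_unusual(amounts)
print(flagged)
```

The median-based score is used instead of the mean and standard deviation because a single very large transaction inflates both of those and can hide itself.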

Risk Management

  • What credit lines should be assigned based on customers’ credit profiles?
  • Which customer accounts should we closely monitor to proactively employ actions to minimize delinquencies or write-offs?
  • What business relationships with our organization should be considered for delinquent customers before taking adverse actions?
  • Which customers’ accounts should be renewed for service upon expiration?
  • Which customers should we require a deposit from before providing services?
  • Which customers are entitled to an expansion in their products or services?
  • What proportion of customers is forecast to migrate from 30 days past due on payments to write-off status?
  • What factors are indicative of a good account going into write-off status?
  • What is the correlation between my customers’ payment behavior with other credit grantors and their payment behavior with my organization?
  • How will adverse economic conditions (e.g., increase in unemployment or interest rates) stress my organization’s revenue or profitability?
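
The question about migration from 30 days past due to write-off can be approached with a basic roll-rate calculation. The sketch below uses invented month-over-month status pairs and chains the bucket-to-bucket rates into a naive estimate. It assumes the rates are stable and independent, and it ignores accounts that cure and later return to delinquency, so treat it as a starting point rather than a forecast model.

```python
from collections import Counter

# Hypothetical (last month, this month) delinquency status per account.
transitions = [
    ("30", "current"), ("30", "60"), ("30", "60"), ("30", "current"), ("30", "current"),
    ("60", "90"), ("60", "current"), ("60", "90"), ("60", "90"),
    ("90", "write_off"), ("90", "write_off"), ("90", "current"), ("90", "90"),
]

from_counts = Counter(prev for prev, _ in transitions)
pair_counts = Counter(transitions)

def roll_rate(prev, curr):
    """Share of accounts in bucket `prev` that moved to bucket `curr`."""
    return pair_counts[(prev, curr)] / from_counts[prev]

# Path from 30 days past due through to write-off.
path = [("30", "60"), ("60", "90"), ("90", "write_off")]
rates = {f"{a}->{b}": roll_rate(a, b) for a, b in path}

# Naive chained estimate of the share of 30-day accounts reaching write-off.
share_to_write_off = 1.0
for rate in rates.values():
    share_to_write_off *= rate
share_to_write_off = round(share_to_write_off, 4)
print(rates, share_to_write_off)
```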

Debt Recovery

  • What are forecasted 30, 60, or 90+ day delinquency rates, including write-offs, by segment?
  • Which delinquent accounts should we prioritize for collections to maximize recovered accounts receivable?
  • Which work queues should we assign delinquent accounts to, based on stage of delinquency, dollars past due, and overall customer relationship?
  • Which collection strategies (e.g., dunning letters, automated dialers, or highly vs. less experienced collectors) should we employ to maximize dollars collected?
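
As one possible way to frame collection prioritization, the sketch below ranks accounts by expected recovery (dollars past due multiplied by an estimated probability of payment) and assigns a work queue by stage of delinquency. The accounts, the `p_pay` probabilities (which would normally come from a scoring model), and the queue cutoffs are all hypothetical.

```python
# Hypothetical delinquent accounts; p_pay stands in for a model's estimate.
accounts = [
    {"id": "A1", "past_due": 1200.0, "days_past_due": 35, "p_pay": 0.60},
    {"id": "A2", "past_due": 300.0, "days_past_due": 95, "p_pay": 0.10},
    {"id": "A3", "past_due": 5000.0, "days_past_due": 65, "p_pay": 0.25},
    {"id": "A4", "past_due": 800.0, "days_past_due": 40, "p_pay": 0.85},
]

def work_queue(days_past_due):
    """Assign a queue by stage of delinquency (cutoffs are assumed)."""
    if days_past_due >= 90:
        return "late_stage"
    if days_past_due >= 60:
        return "mid_stage"
    return "early_stage"

for account in accounts:
    account["expected_recovery"] = account["past_due"] * account["p_pay"]
    account["queue"] = work_queue(account["days_past_due"])

# Work the accounts with the highest expected recovery first.
ranked = sorted(accounts, key=lambda a: a["expected_recovery"], reverse=True)
for account in ranked:
    print(account["id"], account["queue"], round(account["expected_recovery"], 2))
```

Ranking on expected recovery rather than balance alone keeps a large but unlikely-to-pay account from crowding out smaller accounts that are more likely to pay.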

Traditional statistical techniques are very useful for describing your data. For example, descriptive measures such as averages, medians, modes, ranges, variances, standard deviations, and percentiles summarize how your data is distributed. In addition, inferential and predictive techniques such as classification or segmentation, regression analysis, forecasting systems, or recommendation systems are well suited to testing hypotheses about your data or making accurate predictions.
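
The descriptive measures mentioned above can be computed with Python's standard `statistics` module. This is a small sketch on invented monthly spend figures; the variance and standard deviation shown are the sample versions.

```python
import statistics

# Hypothetical monthly spend per customer.
spend = [120, 85, 85, 240, 310, 150, 95, 85, 400, 130]

summary = {
    "mean": statistics.mean(spend),
    "median": statistics.median(spend),
    "mode": statistics.mode(spend),
    "range": max(spend) - min(spend),
    "variance": statistics.variance(spend),          # sample variance
    "std_dev": statistics.stdev(spend),              # sample standard deviation
    "quartiles": statistics.quantiles(spend, n=4),   # 25th, 50th, 75th percentiles
}
for name, value in summary.items():
    print(name, value)
```

Here the mean sits well above the median because a few high spenders pull it up, which is the kind of skew these summaries are meant to reveal.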

Furthermore, some of the more advanced machine learning techniques would be essential here, too. Examples include decision tree learning, association rule learning, support vector machines, genetic algorithms, Bayesian networks, deep learning, clustering, or reinforcement learning.
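
To make one of these techniques concrete, the sketch below computes the three standard measures behind association rule learning (support, confidence, and lift) for a single rule. The product baskets are invented, and a full rule-mining algorithm would search over many candidate rules rather than score one by hand.

```python
# Hypothetical sets of products held by each customer.
baskets = [
    {"internet", "tv"},
    {"internet", "tv", "phone"},
    {"internet"},
    {"tv", "phone"},
    {"internet", "tv"},
]

def support(itemset):
    """Share of baskets that contain every item in the itemset."""
    return sum(itemset <= basket for basket in baskets) / len(baskets)

def rule_metrics(antecedent, consequent):
    """Support, confidence, and lift for the rule antecedent -> consequent."""
    both = support(antecedent | consequent)
    confidence = both / support(antecedent)
    lift = confidence / support(consequent)
    return {"support": both, "confidence": confidence, "lift": lift}

metrics = rule_metrics({"internet"}, {"tv"})
print(metrics)
```

A lift below 1, as in this made-up data, suggests that customers with the antecedent are slightly less likely than average to hold the consequent, so the rule would be a weak basis for a cross-sell offer.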

For unstructured data, tools for text mining, sentiment analysis, or content categorization are vital. These tools show your organization the “softer” side of how customers or prospects feel about your products or services, which may not be reflected in the structured data. For example, customers could be posting positive or negative feedback on social media about their experiences with your products or services. This valuable information could be used to fix problems and reduce attrition, or to identify product features that could significantly increase sales.
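
A toy version of sentiment analysis can be sketched with a word list. The lexicon and comments below are invented and deliberately tiny. This approach ignores negation, sarcasm, and context, so it only illustrates the idea and is not a substitute for a proper text mining tool.

```python
import re

# Tiny hypothetical word lists; real lexicons are far larger.
POSITIVE = {"great", "love", "fast", "helpful", "reliable"}
NEGATIVE = {"slow", "broken", "rude", "cancel", "overcharged"}

def sentiment(text):
    """Label text by counting positive and negative words."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

comments = [
    "Love the new app, support was fast and helpful",
    "Internet is slow again and the agent was rude",
    "Moved my appointment to Tuesday",
]
labels = [sentiment(c) for c in comments]
print(labels)
```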

In addition, visualization tools that present dashboards of charts, graphs, tables, or plots are also critical in data mining. These tools help you see trends, patterns, outliers, or correlations in a form that is easier for analysts, management, executives, auditors, and other stakeholders to digest.

Sifting Through the Data

Well-integrated or self-service tools are ideal for accomplishing the aforementioned tasks of ensuring data accuracy, preparing data, and performing data mining. Tools that can easily handle big data are ideal if an organization has that volume of data to sift through.

An ideal solution provides an end-to-end process that acquires data from any data source and allows the user to prepare it for mining. As noted above, the solution must be able to apply advanced analytics to gain as much insight as possible, and it should then enable the user to act on the data as needed. Finally, it should let users automate data processing steps so that work done once can be repeated on a regular schedule.

To learn more about advanced analytics, check out this white paper.

