    November 22, 2022

    The 7 Deadly Sins of Propensity Modeling

    Mistakes to Avoid When Building a Propensity-to-Buy Model

    A propensity-to-buy model uses logistic regression or more advanced, non-linear machine learning techniques to predict how likely a prospect is to buy your products. These predictions are based on characteristics of your customers, such as their industry, geography, and size, along with historical performance data from your organization.

    Propensity-to-buy models help sales leaders balance sales capacity and prioritize accounts with the highest likelihood of making a purchase, all while providing reps with an equal number of high-potential accounts to pursue. 

    While the propensity-to-buy modeling process is simple in theory, there are many opportunities for mistakes that can result in misinformed decisions by sales and sales operations teams.

    As you start building or refining your propensity-to-buy model, here are some common missteps to avoid: 

    #1 Using Business Judgment to Weight Variables

    Some organizations develop complex analyses to understand drivers of high and low propensity-to-buy. Issues arise when analysts or other team members assert variable weights based on their own surface-level insights. For example, you may find that you win more often in some geographies and give those a high weight (e.g., 50) and low-win geographies a low weight (e.g., 0). This is a mistake because it doesn’t appropriately quantify the true impact of geography in a statistical way. 

    Instead, it is better to use statistics (e.g., logistic regression) or machine learning (e.g., XGBoost) to assign appropriate weights to variables that will maximize the overall predictive power of your propensity-to-buy model.
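    To make this concrete, here is a minimal sketch of letting a fitted model, rather than business judgment, assign the weights. The data and feature names below are simulated for illustration only:

```python
# Sketch: learn variable weights from data instead of asserting them.
# All data and feature names here are simulated, not from a real dataset.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 500
X = np.column_stack([
    rng.integers(0, 2, n),   # in a "strong" geography?
    rng.normal(0, 1, n),     # standardized company size
    rng.integers(0, 2, n),   # in an attractive industry?
])
# Simulate wins where geography and industry truly matter and size barely does
logits = 1.2 * X[:, 0] + 0.1 * X[:, 1] + 0.8 * X[:, 2] - 1.0
y = (rng.random(n) < 1 / (1 + np.exp(-logits))).astype(int)

model = LogisticRegression().fit(X, y)
weights = dict(zip(["geography", "size", "industry"], model.coef_[0]))
# The fitted coefficients quantify each variable's impact statistically;
# no one had to guess that geography deserves a weight of "50".
```

    The same idea carries over to tree-based methods like XGBoost, where feature importances play the role of the learned coefficients.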

    #2 Not Back-Testing Your Model to Ensure It’s Predictive

    Once an organization builds and trains a propensity-to-buy model, it may be tempted to deploy the model immediately without proper back-testing. However, it's crucial to back-test your algorithm to make sure it is indeed predictive. Skipping this step may leave you with a model that has little predictive power, or worse, one that predicts the reverse (e.g., high-quality accounts are given low scores and vice versa).

    Instead of deploying your model right away, we believe it best to do two things. First, always keep a small hold-out set of data, and test the algorithm on a random subset of data it has never seen. Second, periodically (e.g., quarterly) check that accounts with high propensity scores are indeed buying at a higher rate than those with lower scores.
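    The first step, a hold-out back-test, can be sketched as follows, using simulated data and AUC as one possible measure of predictive power:

```python
# Sketch: back-test on a hold-out set the model has never seen.
# Data is simulated; AUC is one of several reasonable back-test metrics.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 4))
# Simulated outcomes with genuine signal in the first two features
y = (rng.random(1000) < 1 / (1 + np.exp(-(X[:, 0] + 0.5 * X[:, 1])))).astype(int)

X_train, X_hold, y_train, y_hold = train_test_split(
    X, y, test_size=0.2, random_state=1
)
model = LogisticRegression().fit(X_train, y_train)

auc = roc_auc_score(y_hold, model.predict_proba(X_hold)[:, 1])
# AUC near 0.5 means no signal; materially below 0.5 means the model
# is anti-predictive (high scores going to low-quality accounts).
```

    The second step, the quarterly check, is the same comparison run on live outcomes: conversion rate among high-score accounts should exceed that among low-score accounts.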

    #3 Developing a Global Model That Doesn’t Include Non-Linear Patterns

    In some cases, organizations develop a single, global propensity model that encompasses all territories and customer segments. While this approach sounds comprehensive in theory, it can create false signals. For example, a global model trained mostly on small companies may extrapolate industry attractiveness to large enterprises. In practice, we frequently find that the most attractive industries differ between large and small companies.

    Rather than building a global model, investigate where data relationships are not linear. When you find non-linear correlations, you can either use ML models that can handle non-linearity, or you can stratify data into multiple groups and train multiple simpler models on the stratified data. 
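    The stratification option can be sketched as below. The simulation deliberately flips the industry effect between segments, the situation where a single global model averages two real, opposing effects into a false "no effect":

```python
# Sketch: stratify by segment and fit separate models when the same
# variable behaves differently across segments. Data is simulated.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)

def simulate(n, industry_effect):
    """Simulated segment where industry membership shifts win odds."""
    x_industry = rng.integers(0, 2, n)
    logits = industry_effect * x_industry - 0.5
    y = (rng.random(n) < 1 / (1 + np.exp(-logits))).astype(int)
    return x_industry.reshape(-1, 1), y

# Industry attractiveness flips sign between SMB and enterprise
X_smb, y_smb = simulate(800, industry_effect=1.5)
X_ent, y_ent = simulate(800, industry_effect=-1.5)

m_smb = LogisticRegression().fit(X_smb, y_smb)
m_ent = LogisticRegression().fit(X_ent, y_ent)
# The per-segment coefficients recover the opposing effects that a
# single pooled model would average away.
```

    The alternative, a single non-linear model such as gradient-boosted trees with a segment feature, can capture the same interaction without manual stratification, at the cost of interpretability.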

    #4 Using Low-N Noise as Signal

    Sometimes, an analysis will find that certain variables (e.g., geographies or industries) seem to carry strongly positive or negative signals. In reality, these signals often appear extreme only because there are just a few examples in the dataset. What looks like signal is actually noise and inconsequential on its own.

    Instead of interpreting these results as definitive, you should aggregate signals from small sample sizes to avoid over-fitting. For example, nascent industries with only a handful of historical examples can be combined in an “Other” bucket.
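    A minimal sketch of this bucketing step (the threshold and category names are illustrative):

```python
# Sketch: collapse low-count categories into an "Other" bucket so that
# noise from tiny samples is not mistaken for signal.
from collections import Counter

def bucket_rare(values, min_count=30, other="Other"):
    """Replace categories seen fewer than min_count times with `other`."""
    counts = Counter(values)
    return [v if counts[v] >= min_count else other for v in values]

# Illustrative industry column: two established industries, two nascent ones
industries = ["SaaS"] * 120 + ["Manufacturing"] * 80 + ["Quantum"] * 3 + ["AgTech"] * 2
bucketed = bucket_rare(industries)
# The five low-N records now share one "Other" category instead of
# contributing two unreliable per-industry coefficients.
```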

    #5 Treating Missing Data as Bad Data

    Some organizations fall prey to the myth of data perfectionism. They will only use perfect, complete data and disregard the rest. For example, if a data set doesn't have complete competitive information, they may discard it. In reality, a perceived lack of information can be a signal in itself: missing data points may indeed be bad data, or they might indicate a green-field account, which could be a very positive signal.

    Rather than selectively excluding “incomplete” data or striving for data perfectionism, it is best practice to investigate whether missing data is predictive and, if so, turn the missingness itself into a feature.
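    One simple way to do this is a missingness indicator: keep the record and add a boolean feature recording that the field was empty. A sketch with illustrative field names:

```python
# Sketch: treat missingness itself as a candidate feature rather than
# discarding incomplete records. Field and account names are illustrative.
def add_missing_indicator(records, field):
    """Return copies of records with a boolean '<field>_missing' feature."""
    out = []
    for r in records:
        r = dict(r)  # copy so the original record is untouched
        r[f"{field}_missing"] = r.get(field) is None
        out.append(r)
    return out

accounts = [
    {"name": "Acme", "competitor": "RivalCo"},
    {"name": "Globex", "competitor": None},  # possibly a green-field account
]
enriched = add_missing_indicator(accounts, "competitor")
# The model can now learn whether "no known competitor" predicts wins.
```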

    #6 Failing to Correct Historical Data for Bias

    In most cases, the question is not whether data is biased, but how. For example, reps’ win-loss data may be heavily biased because some reps inflate their apparent win rates by only entering prospects into the pipeline tracking system when they’re nearly certain the deal will close.

    As you build your model, it’s important to understand how CRM hygiene and data collection may bias your data. From there, you can take appropriate steps to reduce bias and improve data quality, such as only using data after CRM hygiene is improved or excluding certain sales reps from your analyses.
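    Both mitigations can be sketched as simple filters on the training data. The cutoff date and win-rate threshold below are hypothetical and would come from your own CRM audit:

```python
# Sketch: exclude records likely to carry pipeline-entry bias.
# The hygiene cutoff date and win-rate threshold are illustrative.
from datetime import date

HYGIENE_CUTOFF = date(2021, 1, 1)  # assumed date CRM hygiene improved

def filter_biased(opps, rep_win_rates, max_win_rate=0.9):
    """Keep opportunities logged after the hygiene cutoff by reps whose
    recorded win rate does not suggest selective pipeline entry."""
    return [
        o for o in opps
        if o["created"] >= HYGIENE_CUTOFF
        and rep_win_rates.get(o["rep"], 0.0) <= max_win_rate
    ]

opps = [
    {"rep": "A", "created": date(2021, 6, 1)},
    {"rep": "B", "created": date(2021, 6, 1)},  # B only logs sure wins
    {"rep": "A", "created": date(2020, 6, 1)},  # pre-hygiene record
]
clean = filter_biased(opps, rep_win_rates={"A": 0.45, "B": 0.98})
```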

    #7 Failing to Look for Future Data Leakage

    The last and most insidious issue occurs when a historical data record contains information that only became available at a future time. In data science, this is called data leakage. For example, your organization may only collect seat counts for accounts with a high likelihood of success. In that case, using seat count as a feature is problematic because it is not known a priori.

    Data leakage is the hardest issue to spot. Catching it generally requires investigating features with suspiciously high predictive power and vetting them with subject-matter experts.
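    A useful first pass is a single-feature predictive-power scan: any feature that alone nearly perfectly separates wins from losses deserves scrutiny. A sketch on simulated data, where the threshold is an illustrative choice:

```python
# Sketch: scan single-feature predictive power and flag suspiciously
# strong features for expert review. Data and threshold are illustrative.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(3)
y = rng.integers(0, 2, 400)  # simulated win/loss outcomes

features = {
    "industry_score": rng.normal(size=400) + 0.3 * y,  # mild, plausible signal
    "seat_count": y * 50 + rng.normal(size=400),       # leaky: logged after wins
}

SUSPICIOUS_AUC = 0.95
flagged = []
for name, x in features.items():
    auc = roc_auc_score(y, x)
    if max(auc, 1 - auc) >= SUSPICIOUS_AUC:  # catch both directions
        flagged.append(name)
# Features in `flagged` should be vetted with subject-matter experts:
# was this value truly known before the sale closed?
```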

    Unlock Powerful Predictive Insights With Coro’s MoneyMap

    Whether you’re new to the world of propensity modeling or looking to advance your analytics strategy, Coro’s MoneyMap can connect you with proprietary data and advanced analytics capabilities that inform revenue-focused decisions.

    Use MoneyMap to measure propensity to buy by product, territory, and customer segment, then provide your commercial team with actionable strategies for engaging high-value accounts. 

    The insights you generate through MoneyMap can be used to inform account prioritization, capacity planning, and other critical sales activities, empowering your front line and furthering your revenue goals. 

    Organizations use insights from MoneyMap to: 

    • Identify cross-sell, upsell, and next-sell opportunities
    • Inform capacity planning 
    • Expand into new verticals or market segments
    • Segment accounts by propensity to buy
    • Prioritize opportunities with the largest upside

    Interested in learning more about MoneyMap? 

    Reach out to our team today to request a personalized demo.


