Numeric variables with long-tail distributions


Updated at October 15th, 2019

By default, Protobi sets "Round by..." to auto, which bins values into equal-width ranges. For distributions with heavy tails, you can set "Round by..." to log instead. Learn more about the "Round by..." dialog in our "Bin numeric values" tutorial.

Protobi automatically bins numeric variables into ranges. Numeric variables come in a few varieties:

  • Constants (e.g. π = 3.141...)
  • Light-tailed distributions
  • Heavy-tailed distributions

Many variables we encounter in market research, such as percentages and preference ratings, have light-tailed or even distributions.

Other variables, such as number of patients, income, book sales, and frequent flier miles, have heavy-tailed distributions. Benoit Mandelbrot coined the terms "mild" versus "wild" randomness to describe the difference.

Example

Below is an example where customers are asked for their purchase budget in dollars. This has a classic heavy-tailed distribution: most respondents report modest values, and a small number report very large ones.

By default, Protobi sets "Round by..." to auto, which chooses linear (equal-width) bin sizes for numeric variables based on the standard deviation. With this binning we can see that many people have budgets of $1,000 to $5,000, and very few have budgets much over $30,000.

The second version sets "Round by..." to log, which chooses logarithmic bin sizes. In the graph below we can see that quite a number of customers are willing to spend under $1,000, and that a substantial number are willing to spend a lot more.

A product strategy might be radically different with this perspective, selling differently to customers with $250 versus $2,500 to spend, rather than lumping them all into an "Under $5,000" category.
