Bin values into numeric ranges

Editing "Round by..." attribute

Updated at September 26th, 2022

Protobi automatically bins numeric variables into ranges when you create a project. You can edit the bins to reflect your analysis goals.

"Round by..." dialogue

Press the context menu icon and select "Round by...".

This dialogue will pop up:

You can group data using the following options:

  • Auto — Auto-rounds based on the standard deviation
  • Log — Uses logarithmic ranges
  • Custom
    • Linear bins — Specify your desired equal bin size, e.g. 25
    • Custom bins — Enter an array of cut-points, e.g. [10,25,50]

Auto bins

The "Auto" setting chooses linear ranges. The bin size is selected automatically to produce a reasonable number of ranges given the standard deviation, with boundaries that snap to round multiples of 5 or 10.

Log bins

Logarithmic ranges choose bin sizes that neatly map to multiples of 10, 25, and 50, resulting in small ranges for small numbers and bigger ranges for bigger numbers.  

Log ranges generally make sense for unbounded numbers and absolute counts, such as income, number of customers, or patient volume.  
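
To make the idea concrete, here is a minimal Python sketch of logarithmic cut-points that follow a 10, 25, 50 pattern in each decade. It is an illustration only, not Protobi's actual algorithm; the function name log_cut_points and the choice to start at 10 are assumptions.

```python
def log_cut_points(max_value):
    """Return cut-points 10, 25, 50, 100, 250, 500, ... (illustrative only).

    Stops at the first cut-point that is >= max_value, so every value
    up to max_value falls at or below some cut-point.
    """
    cuts = []
    scale = 1
    while True:
        for base in (10, 25, 50):
            cut = base * scale
            cuts.append(cut)
            if cut >= max_value:
                return cuts
        scale *= 10


cuts = log_cut_points(1000)
print(cuts)  # [10, 25, 50, 100, 250, 500, 1000]

# Bin widths grow with the values: small ranges for small numbers,
# bigger ranges for bigger numbers.
print([b - a for a, b in zip(cuts, cuts[1:])])  # [15, 25, 50, 150, 250, 500]
```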

Linear bins

Here we set "Round by" to Custom: 10 

which generates ranges of equal width:

Linear ranges include the upper bound and exclude the lower bound, and their labels are rounded to reasonable precision for display. So in the example above, the range "21 to 30" really represents "20 < Q3 <= 30".
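
The upper-inclusive rule can be sketched in a few lines of Python. This is a hedged illustration of the rule described here, not Protobi's implementation; the names linear_bin and linear_label and the step parameter (used to display the exclusive lower bound for whole-number data) are assumptions.

```python
import math


def linear_bin(value, width):
    """Return (lower, upper) such that lower < value <= upper."""
    upper = math.ceil(value / width) * width
    return upper - width, upper


def linear_label(value, width, step=1):
    """Display label for whole-number data.

    The lower bound is exclusive, so it is shown as lower + step.
    """
    lower, upper = linear_bin(value, width)
    return f"{lower + step} to {upper}"


print(linear_bin(21, 10))    # (20, 30), i.e. 20 < value <= 30
print(linear_label(21, 10))  # 21 to 30
print(linear_label(30, 10))  # 21 to 30 (upper bound is included)
print(linear_label(20, 10))  # 11 to 20 (20 belongs to the lower range)
```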

Custom bins

It's possible to choose arbitrary ranges by specifying a list of cut-points, separated by commas. Here we set "Round by" to 10,25,50:

This generates bins that include the upper bounds:

To specify bins that do not include the upper bound, use a closing parenthesis instead of a bracket, e.g. [0,10,25,50)
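
The difference between the two forms can be sketched with Python's standard bisect module. This is only an illustration of inclusive versus exclusive upper bounds; the function name bin_bounds is an assumption, as is the treatment of values below the first or above the last cut-point (shown here as open-ended with None).

```python
from bisect import bisect_left, bisect_right


def bin_bounds(value, cuts, upper_inclusive=True):
    """Return the (lower, upper) cut-points surrounding value.

    upper_inclusive=True  models [10,25,50]: lower <  value <= upper
    upper_inclusive=False models [10,25,50): lower <= value <  upper
    None marks an open-ended side (an assumption for this sketch).
    """
    find = bisect_left if upper_inclusive else bisect_right
    i = find(cuts, value)
    lower = cuts[i - 1] if i > 0 else None
    upper = cuts[i] if i < len(cuts) else None
    return lower, upper


cuts = [10, 25, 50]
print(bin_bounds(25, cuts))                         # (10, 25)
print(bin_bounds(25, cuts, upper_inclusive=False))  # (25, 50)
print(bin_bounds(60, cuts))                         # (50, None)
```

A value that sits exactly on a cut-point, such as 25, is the only case where the two forms disagree.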

Remove decimals from bins

Custom bin ranges might show decimal places. For Q1 we want to see respondents in the following value bins: 0, 1, 2, 3-5, 6-9, and 10+. 

We set the custom bins to [0,1,2,5,9,100]. For a range that spans several values (e.g. 3-5), specify only its upper limit.

This results in some ranges showing decimals:

Add the "epsilon" attribute

One option to remove decimals is to add the "epsilon" attribute to the element's JSON. In mathematics, epsilon represents an arbitrarily small quantity. Setting "epsilon" to 1 removes any decimals from the binned ranges.

The result:
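
As a rough illustration of why this works, the Python sketch below assumes that each label's displayed lower bound is the previous cut-point plus epsilon. That rule, the function name bin_labels, and the default epsilon of 0.01 are assumptions made for this example, not confirmed Protobi behavior.

```python
def bin_labels(cuts, epsilon=0.01):
    """Build display labels for upper-inclusive bins.

    Assumption: the displayed lower bound is previous cut-point + epsilon.
    """
    labels = []
    for i, upper in enumerate(cuts):
        if i == 0:
            labels.append(f"{upper:g}")
            continue
        lower = cuts[i - 1] + epsilon
        if lower == upper:
            labels.append(f"{upper:g}")          # single-value bin
        else:
            labels.append(f"{lower:g} to {upper:g}")
    return labels


cuts = [0, 1, 2, 5, 9, 100]
print(bin_labels(cuts))
# ['0', '0.01 to 1', '1.01 to 2', '2.01 to 5', '5.01 to 9', '9.01 to 100']
print(bin_labels(cuts, epsilon=1))
# ['0', '1', '2', '3 to 5', '6 to 9', '10 to 100']
```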

Specify format

Another option is to specify a format for each bin. This gives you greater control over how the bins are labeled, since a format accepts any string of characters.

From the context menu choose "Format..." and enter the desired format for each data cut-point. After adding the formats, uncheck "Group unformatted values".

This is the result after specifying format:
