Bin values into numeric ranges

Editing "Round by..." attribute

Horizontal bar chart showing logarithmic binning of numeric data. Seven ranges are displayed from single value '1' (1%) up to '100 to 249' (3%), with the largest concentrations in mid-ranges: '10 to 24' at 30% and '25 to 49' at 26%. Total sample size N=100.

Protobi automatically bins numeric variables into ranges when you create a project, and you can edit the bins to suit your analysis goals.

Protobi data view showing question Q3 with automatic binning applied. The question asks 'Thinking about your Condition Y patients, how many of those are currently receiving a GA (gamma antagonist)?' Five ranges are displayed from '1 to 20' (64%) down to '81 to 100' (3%), with Mean of 21.4 and N=100.

"Round by..." dialogue

Press the context menu icon and select "Round by...".

Context menu dropdown showing various data manipulation options. The 'Round by...' option is highlighted with a blue tooltip stating 'Select ranges to bin numeric variables'. Menu shows keyboard shortcuts like 'E B' for Round by, 'E F' for Format, and 'E R' for Recode.

This dialogue will pop up:

Round by dialog box with header showing 'Round by' title and 'E B' keyboard shortcut. The dialog prompts 'Enter a bin size for rounding' with four radio button options: None (selected), Auto, Log, and Custom (with input field showing '25').

You can bin the data using the following options:

  • Auto — Auto-rounds based on the standard deviation
  • Log — Uses logarithmic ranges
  • Custom
    • Linear bins — Specify your desired equal bin size, e.g. 25
    • Custom bins — Enter an array of cut-points, e.g. [10,25,50]

Auto bins

The "Auto" setting creates linear (equal-width) ranges. The bin size is chosen from the standard deviation so that the data falls into a reasonable number of ranges, and the boundaries snap to convenient multiples of 5 or 10.
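The exact formula Protobi uses for "Auto" is not documented here, so the following is only a hypothetical heuristic that illustrates the idea: derive a raw width from the sample standard deviation, then snap it to 5 or to a multiple of 10. The names `auto_bin_size`, `auto_boundaries`, and the `ranges_per_sd` parameter are all assumptions made for illustration.

```python
import math
import statistics

def auto_bin_size(values, ranges_per_sd=2):
    """Hypothetical heuristic, not Protobi's formula: aim for roughly
    `ranges_per_sd` ranges per standard deviation, then snap the width
    to 5 (for small widths) or up to the next multiple of 10."""
    sd = statistics.stdev(values)  # sample standard deviation
    raw_width = sd / ranges_per_sd
    if raw_width <= 5:
        return 5
    return 10 * math.ceil(raw_width / 10)

def auto_boundaries(values, width):
    """Boundaries that start and end on multiples of `width`."""
    low = width * math.floor(min(values) / width)
    high = width * math.ceil(max(values) / width)
    return list(range(low, high + 1, width))

values = [0, 10, 20, 30, 40]
width = auto_bin_size(values)          # sd is about 15.8, so width snaps to 10
print(auto_boundaries(values, width))  # [0, 10, 20, 30, 40]
```

A larger spread yields a wider bin, which keeps the number of ranges roughly constant regardless of the variable's scale.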

Log bins

Log bins use boundaries that follow the sequence 10, 25, 50, 100, 250, and so on, which produces narrow ranges for small numbers and wider ranges for large numbers.

Bar chart showing logarithmic binning result for Q3. Seven ranges are displayed using logarithmic scaling from '1' (1%) through '100 to 249' (3%), with highest concentration in middle ranges '10 to 24' (30%) and '25 to 49' (26%). Mean=21.4, N=100.

Log ranges generally make sense for unbounded numbers and absolute counts, such as income, number of customers, or patient volume.  
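A minimal sketch of log-style binning for whole-number data. The boundaries 10, 25, 50, 100, and 250 match the chart above; the boundaries below 10 (1, 2.5, 5) are an assumption, and Protobi's own low-end ranges may differ (the chart shows "1" as a single-value range). The sketch also assumes, from labels such as "10 to 24", that each range includes its lower bound. The function names are hypothetical.

```python
import bisect
import math

def log_cut_points(max_value):
    """Hypothetical: boundaries at 1, 2.5 and 5 times each power of ten
    (1, 2.5, 5, 10, 25, 50, 100, ...) until one exceeds max_value."""
    cuts = []
    power = 1
    while True:
        for multiplier in (1, 2.5, 5):
            cut = multiplier * power
            cuts.append(cut)
            if cut > max_value:
                return cuts
        power *= 10

def log_label(value, cuts):
    """Label an integer value using ranges [lo, hi) (lower bound included)."""
    i = bisect.bisect_right(cuts, value)  # index of the first cut > value
    if i == 0 or i == len(cuts):
        raise ValueError("value lies outside the generated cut-points")
    first = math.ceil(cuts[i - 1])
    last = math.ceil(cuts[i]) - 1
    return str(first) if first == last else f"{first} to {last}"

cuts = log_cut_points(120)
print(log_label(17, cuts))   # 10 to 24
print(log_label(130, cuts))  # 100 to 249
```

Because each boundary is roughly double or 2.5 times the previous one, a value of 17 and a value of 130 each land in a range proportional to its own size.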

Linear bins

Here we set "Round by" to Custom: 10 

Round by dialog showing Custom option selected with value '10' entered in the text input field. This setting will create linear bins of equal width (10 units each).

which generates ranges of equal width:

Bar chart showing linear binning result with equal-width ranges of 10 units. Eight ranges are displayed from '1 to 10' (44%) through '91 to 100' (3%), demonstrating consistent bin size throughout. Mean=21.4, N=100.

Linear ranges include their upper bound, exclude their lower bound, and are rounded to a reasonable precision for display. So in the example above, the range "21 to 30" actually represents "20 < Q3 <= 30".
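The upper-inclusive rule can be sketched for whole-number data as follows. This is an illustration, not Protobi's code: the function name `linear_label` is hypothetical, and the "lower + 1 to upper" label assumes integer values (Protobi rounds labels to a reasonable precision, which may look different for decimal data).

```python
import math

def linear_label(value, size):
    """Hypothetical sketch of linear bins of width `size` for integer data.
    Each range is (upper - size, upper], i.e. the upper bound is included."""
    upper = math.ceil(value / size) * size
    lower = upper - size
    return f"{lower + 1} to {upper}"

print(linear_label(25, 10))  # 21 to 30
print(linear_label(30, 10))  # 21 to 30 (the upper bound belongs to this range)
print(linear_label(31, 10))  # 31 to 40
```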

Custom bins

It's possible to choose arbitrary ranges by specifying a list of cut-points, separated by commas. Here we set "Round by" to 10,25,50:

Round by dialog with Custom option selected and '10,25,50' entered as comma-separated cut-points. This will create custom bins with boundaries at 10, 25, and 50.

This generates bins that include the upper bounds:

Bar chart showing custom binning result using cut-points 10, 25, and 50. Four ranges are displayed: '≤ 10' (7.0%), '11 to 25' (27.2%), '26 to 50' (48.8%), and '> 50' (17.0%). The question asks about percent of purchases made online. Mean=34.7.
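For upper-inclusive custom cut-points, finding a value's range amounts to locating the first cut-point greater than or equal to the value. The sketch below uses Python's `bisect_left` for that and assumes whole-number data; `custom_label` is a hypothetical name. With cut-points [10, 25, 50] it produces the same four labels as the chart above.

```python
import bisect

def custom_label(value, cuts):
    """Hypothetical sketch: upper-inclusive custom bins for integer data.
    Cut-points [10, 25, 50] give '≤ 10', '11 to 25', '26 to 50', '> 50'."""
    i = bisect.bisect_left(cuts, value)  # index of the first cut >= value
    if i == 0:
        return f"≤ {cuts[0]}"
    if i == len(cuts):
        return f"> {cuts[-1]}"
    return f"{cuts[i - 1] + 1} to {cuts[i]}"

cuts = [10, 25, 50]
print([custom_label(v, cuts) for v in (10, 11, 25, 50, 51)])
# ['≤ 10', '11 to 25', '11 to 25', '26 to 50', '> 50']
```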

To specify bins that exclude the upper bound (and include the lower bound), close the list with a parenthesis instead of a bracket, e.g. [0,10,25,50):

Bar chart showing custom binning with notation [0,10,25,50) where parenthesis indicates upper bound is exclusive. Four ranges displayed: '< 10' (35%), '10 to 24' (30%), '25 to 49' (26%), and '≥ 50' (9%). Question asks about GA treatment for Condition Y patients. Mean=21.4, N=100.
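The only difference between the two notations is which side of a boundary a value equal to a cut-point falls on. A minimal sketch, assuming a trailing ")" switches to lower-inclusive ranges and a "]" or no bracket means upper-inclusive; both `parse_cut_points` and `bin_index` are hypothetical helpers, not Protobi functions, and the sketch returns range indexes rather than reproducing Protobi's label text.

```python
import bisect

def parse_cut_points(spec):
    """Hypothetical parser: '[0,10,25,50)' -> ([0, 10, 25, 50], False).
    A trailing ')' means ranges exclude their upper bound;
    ']' or no bracket means they include it."""
    spec = spec.strip()
    upper_inclusive = not spec.endswith(")")
    numbers = [float(x) for x in spec.strip("[]()").split(",")]
    cuts = [int(n) if n.is_integer() else n for n in numbers]
    return cuts, upper_inclusive

def bin_index(value, cuts, upper_inclusive):
    """Index of the range containing value. Index 0 is the open range below
    the first cut-point; len(cuts) is the open range above the last one."""
    if upper_inclusive:
        return bisect.bisect_left(cuts, value)   # ranges of the form (a, b]
    return bisect.bisect_right(cuts, value)      # ranges of the form [a, b)

cuts, upper = parse_cut_points("[0,10,25,50)")
print(cuts, upper)                  # [0, 10, 25, 50] False
print(bin_index(10, cuts, upper))   # 2, i.e. the range starting at 10
```

With "[0,10,25,50)" a value of exactly 10 starts the "10 to 24" range, whereas with upper-inclusive cut-points it would close the range ending at 10.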

Remove decimals from bins

Custom bin ranges might show decimal places. For Q1 we want to see respondents in the following value bins: 0, 1, 2, 3-5, 6-9, and 10+. 

We set the custom bins to [0,1,2,5,9,100]. For multi-value ranges such as 3-5, specify only the upper limit.

This results in some ranges showing decimals:

Bar chart showing custom bins with unwanted decimal places. Six ranges are displayed with decimals: '≤ 0' (5%), '0.1 to 1' (9%), '1.1 to 2' (11%), '2.1 to 5' (27%), '6 to 9' (9%), and '10 to 100' (39%). Question Q1 asks about Condition X patients receiving GA. N=100.

Add the "epsilon" attribute

One way to remove decimals is to add the "epsilon" attribute to the element's JSON. In mathematics, epsilon represents an arbitrarily small quantity. Setting "epsilon" to 1 removes the decimals from the binned ranges, so that, for example, "2.1 to 5" becomes "3 to 5".

JSON editor dialog showing element properties for Q1. The code shows the 'epsilon' attribute highlighted on line 63 with value set to 1. This epsilon setting controls decimal display in binned ranges. Other visible properties include key, title, color, type, chartOptions, roundby array with values [0,1,2,5,9,100], and displayKey.

The result:

Bar chart showing custom bins after applying epsilon=1 to remove decimals. Six clean ranges without decimals: '≤ 0' (5%), '1' (9%), '2' (11%), '3 to 5' (27%), '6 to 9' (9%), and '10 to 100' (39%). Question Q1 about Condition X patients. N=100.
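A minimal sketch of why epsilon changes the labels. It assumes (inferred from the two charts, not from Protobi documentation) that each upper-inclusive range (a, b] is labeled starting at a + epsilon. The name `epsilon_labels` is hypothetical, and the sketch does not label values above the last cut-point. With epsilon = 1 it reproduces the six labels in the chart above; with a small epsilon such as 0.1 it yields decimals like "2.1 to 5", although Protobi's default labels (for instance "6 to 9") do not all follow one fixed step.

```python
def epsilon_labels(cuts, epsilon):
    """Hypothetical sketch: labels for upper-inclusive custom bins where each
    range (a, b] is displayed as 'a + epsilon to b' (or just 'b' if equal)."""
    def fmt(x):
        # Show whole numbers without a decimal point; trim float noise otherwise.
        return str(int(x)) if float(x).is_integer() else str(round(x, 10))

    labels = [f"≤ {fmt(cuts[0])}"]
    for a, b in zip(cuts, cuts[1:]):
        start = a + epsilon
        labels.append(fmt(b) if start == b else f"{fmt(start)} to {fmt(b)}")
    return labels

print(epsilon_labels([0, 1, 2, 5, 9, 100], 1))
# ['≤ 0', '1', '2', '3 to 5', '6 to 9', '10 to 100']
```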

Specify format

Another option is to specify a format for each bin. This gives you greater control over how the bins are labeled, since a format can be any string of characters.

From the context menu, choose "Format..." and enter the desired label for each cut-point value. In addition to entering the formats, uncheck "Group unformatted values into [other]".

Format dialog showing a table with Value and Format columns. Six data values (0, 1, 2, 5, 9, 100) are mapped to custom format labels ('0', '1', '2', '3-5', '6-9', '10+'). The checkbox 'Group unformatted values into [other]' is unchecked. Additional columns for Mean, Sort as, Sort last, Hide, and Remove are visible.

This is the result after specifying format:

Bar chart showing custom bins with format labels applied. Six cleanly formatted ranges: '0' (5.0%), '1' (9.0%), '2' (12.0%), '3-5' (26.0%), '6-9' (9.0%), and '10+' (39.0%). Question Q1 about Condition X patients receiving GA treatment. N=100.
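The format table can be thought of as a lookup from each bin's cut-point value to a display label. The sketch below assumes, based on the Format dialog above, that each upper-inclusive bin is keyed by its upper cut-point; `formatted_bin`, `CUTS`, and `FORMATS` are hypothetical names. The `group_unformatted` flag mimics the "Group unformatted values into [other]" checkbox: when it is off, an unformatted key is shown as-is.

```python
import bisect

# Hypothetical model of the Format dialog: cut-point value -> display label.
CUTS = [0, 1, 2, 5, 9, 100]
FORMATS = {0: "0", 1: "1", 2: "2", 5: "3-5", 9: "6-9", 100: "10+"}

def formatted_bin(value, cuts=CUTS, formats=FORMATS, group_unformatted=False):
    """Label a value by the upper cut-point of its (upper-inclusive) bin."""
    i = bisect.bisect_left(cuts, value)  # index of the first cut-point >= value
    if i == len(cuts):
        return None  # above the last cut-point: not covered by this sketch
    key = cuts[i]
    if key in formats:
        return formats[key]
    return "[other]" if group_unformatted else str(key)

print([formatted_bin(v) for v in (0, 1, 4, 7, 12)])
# ['0', '1', '3-5', '6-9', '10+']
```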