Use alternative values to calculate summary statistics

"Mean" values in the format dialog

The format dialog has a "Mean" column where you can enter an alternative numeric value for each response category. When a "Mean" value is specified, it replaces the category's original value in the calculation of summary statistics such as the mean, standard deviation, and median. Let's look at two situations where this feature is useful.

Use alternative values to calculate summary statistics for questions with:

  • Qualitative response categories
  • Range response categories

Qualitative response categories

In q1, the response categories are displayed as "Very happy," "Pretty happy," and so on. In the survey data, however, these categories are stored as integer codes (e.g., 1, 2, ...).

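For this type of question, summary statistics such as the mean are not very useful for analysis, because they are calculated from the arbitrary codes in the "Value" column. For example, a code of 9 for a "Don't know/Refused" response raises the mean even though it says nothing about happiness. If you assign numeric values that reflect what each category means, the summary statistics become more meaningful.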

Survey question q1 asking 'Generally, how would you say things are these days in your life' with response options and statistics showing Mean: 2.07, Std Dev: 1.43, Median: 2.00.

We can view q1 values by choosing "Format..." from the context menu:

Edit format dialog showing the Value column with numeric codes 1, 2, 3, and 9 assigned to happiness responses and a 'Group unformatted values into [other]' checkbox.

In the same dialog for q1, we assigned "Mean" values to the response choices that reflect their relative worth. Choosing these values is a subjective judgment.

Edit format dialog with the Mean column highlighted, showing alternative numeric values: 100 for 'Very happy', 50 for 'Pretty happy', -50 for 'Not too happy', and -100 for 'Don't know/Refused'.

Once "Mean" values are specified, the summary statistics are calculated from those values instead of the original codes.
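The override can be sketched in Python. This is a minimal illustration, not the software's implementation: the pairing of q1's codes (1, 2, 3, 9) with the "Mean" values 100, 50, -50, and -100 is an assumption read from the format dialog, the `scored` helper is hypothetical, and the four responses are made-up input rather than the actual q1 data. Keeping the original code when no "Mean" value is entered is also an assumption.

```python
from statistics import mean

# Assumed pairing of q1 codes with the "Mean" values from the format dialog:
# 1 = Very happy, 2 = Pretty happy, 3 = Not too happy, 9 = Don't know/Refused.
Q1_MEAN_VALUES = {1: 100, 2: 50, 3: -50, 9: -100}


def scored(codes, mean_values):
    """Hypothetical helper: swap each code for its "Mean" value.

    Codes with no "Mean" value keep their original value (an assumption).
    """
    return [mean_values.get(code, code) for code in codes]


# Made-up responses for illustration only, not the actual q1 data.
responses = [1, 2, 2, 9]

print(mean(responses))                          # 3.5
print(mean(scored(responses, Q1_MEAN_VALUES)))  # 25
```

With these hypothetical responses, the raw codes average 3.5, largely because of the single code 9, while the alternative values average 25.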

Survey question q1 displaying updated statistics after alternative values were applied, showing Mean: 45.0, Std Dev: 54.8, Median: 50.0.

Range response categories

Alternative values are also useful when a question's response choices are ranges. For the age element, the summary statistics do not seem to reflect the data: a mean age of 1.52 is not meaningful.

Age question showing response categories '18 to 24' (71.8%), '25 to 31' (10.5%), '32 to 38' (11.4%), '39 to 45' (6.4%), '46 to 52' (0.0%) with statistics Mean: 1.52, Std Dev: 0.927, Median: 1.00.

The format for age shows that the range categories are assigned arbitrary integer codes in the "Value" column. To make the summary statistics meaningful, enter a representative number for each range, such as its midpoint, in the "Mean" column.

Edit format dialog for age variable showing Mean column with midpoint values: 21 for '18 to 24', 28 for '25 to 31', 35 for '32 to 38', 42 for '39 to 45', and 49 for '46 to 52'.
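The midpoint values above can be derived directly from the range labels. This is a minimal sketch that assumes every label has the form "<low> to <high>"; `range_midpoint` is a hypothetical helper, not part of the software.

```python
def range_midpoint(label: str) -> float:
    """Hypothetical helper: midpoint of a label such as '18 to 24'."""
    low, high = (float(part) for part in label.split(" to "))
    return (low + high) / 2


age_labels = ["18 to 24", "25 to 31", "32 to 38", "39 to 45", "46 to 52"]
print([range_midpoint(label) for label in age_labels])
# [21.0, 28.0, 35.0, 42.0, 49.0]
```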

The recalculated statistics are shown below:

Age question displaying recalculated statistics after applying midpoint alternative values, showing Mean: 24.7, Std Dev: 6.49, Median: 21.0.
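As a rough check, the age statistics can be recomputed from the percentages shown for each category. This sketch treats the percentages as weights and assumes the original codes are 1 through 5 (consistent with the reported mean of 1.52). The software's exact formulas are not documented here, so `weighted_stats` is only an approximation that uses a population standard deviation and a lower median.

```python
import math

# Percentages shown for each age category in the age figure.
percents = [71.8, 10.5, 11.4, 6.4, 0.0]
codes = [1, 2, 3, 4, 5]           # assumed original "Value" codes
midpoints = [21, 28, 35, 42, 49]  # "Mean" values entered in the format dialog


def weighted_stats(values, weights):
    """Return (mean, population std dev, lower median) using weights as frequencies."""
    total = sum(weights)
    m = sum(v * w for v, w in zip(values, weights)) / total
    var = sum(w * (v - m) ** 2 for v, w in zip(values, weights)) / total
    return m, math.sqrt(var), weighted_median(values, weights, total)


def weighted_median(values, weights, total):
    """First value (in sorted order) at which the cumulative weight reaches half."""
    cumulative = 0.0
    for v, w in sorted(zip(values, weights)):
        cumulative += w
        if cumulative >= total / 2:
            return v
    return None  # only reached if all weights are zero


for name, values in [("codes", codes), ("midpoints", midpoints)]:
    m, sd, med = weighted_stats(values, percents)
    print(f"{name}: mean={m:.2f}, sd={sd:.3f}, median={med}")
```

Because the displayed percentages are rounded, the results can differ from the reported values in the last digit. The means come out at about 1.52 for the codes and 24.7 for the midpoints, and the medians at 1 and 21.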