Programmatically "QC" data to identify outliers

Updated at September 29th, 2020

Your survey has finished fielding, and as you look through the data you find that some responses are out of the ordinary. Could these outliers be low-quality responses rather than genuine answers?

There are several ways to check for outliers or low-quality data in survey responses. Your programmer can include a variable that records how long each participant took to complete the survey. You can add checkpoint questions to your survey that test whether the participant is paying attention. Features within Protobi can also help you identify outliers and flag low-quality respondents.
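As a minimal sketch of the completion-time check, the code below flags respondents who finished much faster than the median. The field name `duration`, the new variable `flag_speeder`, the one-third cutoff, and the sample rows are all assumptions for illustration. In data process, `rows` is already supplied, and your timing variable may have a different name.

```javascript
// Hypothetical sample data; in data process, `rows` is already supplied
var rows = [
    { respid: 1, duration: 600 },
    { respid: 2, duration: 540 },
    { respid: 3, duration: 120 },
    { respid: 4, duration: 720 },
    { respid: 5, duration: 660 }
];

// Median of a list of numbers
function median(values) {
    var sorted = values.slice().sort(function(a, b) { return a - b; });
    var mid = Math.floor(sorted.length / 2);
    return sorted.length % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

var durations = rows.map(function(row) { return row.duration; });

// Assumed rule: flag anyone faster than one third of the median time
var cutoff = median(durations) / 3;

rows.forEach(function(row) {
    row.flag_speeder = row.duration < cutoff ? 1 : 0;
});
```

The cutoff is a judgment call; choose one that fits the length of your survey.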

Below are three checkpoint questions, each with an instruction that tests whether the participant is paying attention.

Toggle the format button on the toolbar to look at the underlying values for each response option. These elements would be more suitable as flags if people who didn't pass the checkpoint had a value of "1" and everyone else a value of "0". 


Create "flag" variables

In data process, we can create new variables that make it clearer which responses are flagged. The new variables will be binary flags with "0" or "1" values.

Code for above example

//Create binary flag_ versions of checkpoint questions
rows.forEach(function(row) {
    //If respondent did not choose "2" for Check1, they are assigned "1" in flag_Check1
    row.flag_Check1 = row.Check1 != 2 ? 1 : 0;
    row.flag_Check2 = row.Check2 != 3 ? 1 : 0;
    row.flag_Check3 = row.Check3 != 1 ? 1 : 0;
});


Count number of flags

After creating a binary flag for each checkpoint variable, group the new elements together. Because each flag is "1" for flagged respondents and "0" for everyone else, summing the flags gives the number of checkpoints each respondent failed. Choose "Transform..." on the parent group and sum the flags to get this count.

Now we can see that 19% of respondents have 1 flag, and 4% of respondents have 2 flags. 
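The same count can also be computed in data process instead of with "Transform...". The sketch below assumes the three flag variables created earlier; the variable name `flag_count` and the sample rows are hypothetical and only there to make the example runnable.

```javascript
// Hypothetical sample data; in data process, `rows` is already supplied
var rows = [
    { flag_Check1: 0, flag_Check2: 0, flag_Check3: 0 },
    { flag_Check1: 1, flag_Check2: 0, flag_Check3: 0 },
    { flag_Check1: 1, flag_Check2: 0, flag_Check3: 1 }
];

var flagKeys = ["flag_Check1", "flag_Check2", "flag_Check3"];

rows.forEach(function(row) {
    // Sum the binary flags; a missing flag counts as 0
    row.flag_count = flagKeys.reduce(function(sum, key) {
        return sum + (row[key] || 0);
    }, 0);
});
```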


Flag straightliners

Straightlining is when respondents give the same, or nearly the same, answer to every item in a battery of questions. It can happen when the respondent wants to complete the survey quickly without putting in effort, or finds the survey too long and complex. Q13 is a battery of 7-point scale questions, so we may want to create a flag that detects respondents who gave a straightline answer to this set of questions.


Below is code that contains a function to check for straightliners. The function assigns a value of "1" (yes) if the standard deviation of a respondent's answers across the battery is 0, meaning every answer is identical. You can use a looser condition instead, such as d3.deviation(vals) <= 2.

Code to check for straightliners

//Method to check for straightliners
function straightline(row, kidKeys) {
    var vals = kidKeys.map(function(kidKey) { return row[kidKey]; });
    //Assign value of "1" if the first item is answered (non-blank, non-zero) and deviation is 0
    var res = vals[0] && d3.deviation(vals) == 0 ? 1 : 0;
    return res;
}

//Groups for which you want to check for straightliners
var checks = {
    "Q13": ["Q13_1", "Q13_2", "Q13_3", "Q13_4", "Q13_5"],
    "Q20": ["Q20_1", "Q20_2", "Q20_3", "Q20_4", "Q20_5", "Q20_6"],
    "Q23": ["Q23_1", "Q23_2", "Q23_3", "Q23_4", "Q23_5", "Q23_6", "Q23_7"],
    "Q28": ["Q28_1", "Q28_2", "Q28_3", "Q28_4", "Q28_5", "Q28_6"]
};

//Create new "flag_" variables for straightline check
rows.forEach(function(row) {
    for (var key in checks) {
        row["flag_" + key] = straightline(row, checks[key]);
    }
});
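As a variant of the straightline function above, the sketch below takes the cutoff as a parameter so you can also catch near-straightliners. The names `nearStraightline`, `deviation`, and `threshold` and the sample rows are hypothetical. The local `deviation` helper stands in for d3.deviation so the example runs on its own; it uses the sample standard deviation (dividing by n - 1), which we understand d3.deviation to compute as well. Unlike the original function, it skips a row only when fewer than two items have numeric answers.

```javascript
// Sample standard deviation; returns undefined for fewer than two numbers
function deviation(vals) {
    var nums = vals.filter(function(v) { return typeof v === "number" && !isNaN(v); });
    if (nums.length < 2) return undefined;
    var mean = nums.reduce(function(a, b) { return a + b; }, 0) / nums.length;
    var sumSq = nums.reduce(function(a, b) { return a + (b - mean) * (b - mean); }, 0);
    return Math.sqrt(sumSq / (nums.length - 1));
}

// Flag a row when the deviation across the battery is at or below `threshold`
function nearStraightline(row, kidKeys, threshold) {
    var vals = kidKeys.map(function(kidKey) { return row[kidKey]; });
    var dev = deviation(vals);
    return dev !== undefined && dev <= threshold ? 1 : 0;
}

// Hypothetical sample rows for the Q13 battery
var q13Keys = ["Q13_1", "Q13_2", "Q13_3", "Q13_4", "Q13_5"];
var sampleRows = [
    { Q13_1: 4, Q13_2: 4, Q13_3: 4, Q13_4: 4, Q13_5: 4 },  // identical answers
    { Q13_1: 4, Q13_2: 4, Q13_3: 5, Q13_4: 4, Q13_5: 4 },  // nearly identical answers
    { Q13_1: 1, Q13_2: 7, Q13_3: 2, Q13_4: 6, Q13_5: 4 }   // varied answers
];

var strictFlags = sampleRows.map(function(row) { return nearStraightline(row, q13Keys, 0); });
var looseFlags = sampleRows.map(function(row) { return nearStraightline(row, q13Keys, 0.5); });
```

A threshold of 0 flags only identical answers. A larger threshold also flags respondents whose answers barely vary, at the cost of more false positives.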


Result

The flag created for Q13 indicates that 2.6% of the respondents were straightliners.



If we drill into the flagged respondents, we can identify the six people by respid and see which straightline value each of them gave for Q13.
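The same lookup can be sketched in code, for example if you want a list of flagged IDs to share with your panel provider. The sample rows and the names `flagged` and `answer` are hypothetical; `respid`, `flag_Q13`, and `Q13_1` follow the variables used in this tutorial.

```javascript
// Hypothetical sample data; in data process, `rows` is already supplied
var rows = [
    { respid: 101, flag_Q13: 0, Q13_1: 3, Q13_2: 5 },
    { respid: 102, flag_Q13: 1, Q13_1: 4, Q13_2: 4 },
    { respid: 103, flag_Q13: 1, Q13_1: 7, Q13_2: 7 }
];

// Collect the ID and the repeated answer for each flagged respondent.
// Every item in a straightlined battery has the same value, so the first item is enough.
var flagged = rows
    .filter(function(row) { return row.flag_Q13 === 1; })
    .map(function(row) { return { respid: row.respid, answer: row.Q13_1 }; });
```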


Use flags as filters

In flag_Q13 we can press "0" to drill into respondents with no flags. Then in Q13 apply "Filters..." to limit the view to only those respondents who weren't flagged in flag_Q13.


Advanced support

You may want to flag respondents in more complex cases than those shown in this tutorial. Contact us at support@protobi.com to discuss your specific needs.
