Automatically infer project organization from column names

Updated on September 8th, 2022

Protobi's Autogroup feature organizes elements into groups based on the names of their data columns, recovering survey organization that may be lost in the data file.


SPSS data files are great for market research data because they contain not only the response data, but also metadata, such as text corresponding to the questions and response options.  

But much of the survey structure gets lost in translation. A grid question (e.g., "Which of the following ...") may generate multiple data columns, but the data file may not record that those columns all belong to one question.

When you first create a project from a data file, Protobi creates an initial view with one element for each column in the data.  These are placed into a group called Fields.  


For example, the initial view may be one long flat list of elements: q1, q2, q3, q4, q5a, q5b, q5c, q5d, q5e, q6, ...


To reorganize these automatically, select the Fields group by pressing its header, press the "Advanced" button, choose "Autogroup" from the menu, and press "Ok".

This will inspect the column names to infer an organizational structure:

  • q5a, q5b, q5c, q5d, q5e imply the existence of a group q5
  • q1, q2, q3, q4, q5, ... imply the existence of a group q
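The inference described in the two bullets can be sketched in a few lines of Python. This is only an illustration of the idea, not Protobi's actual implementation: the function names `parent_of` and `infer_groups` are hypothetical, and the sketch assumes column names follow a simple "letters, number, optional letter suffix" pattern such as q5a.

```python
import re

def parent_of(name):
    """Return the implied parent group of a name, or None (hypothetical helper).

    Assumed convention: a trailing run of letters after a digit is stripped
    first (q5a -> q5); otherwise a trailing run of digits is stripped (q5 -> q).
    """
    m = (re.fullmatch(r"(.*\d)([a-z]+)", name, flags=re.I)
         or re.fullmatch(r"(.*[a-z_])(\d+)", name, flags=re.I))
    return m.group(1) if m else None

def infer_groups(columns):
    """Map each implied group name to its children, in order of first appearance."""
    children = {}
    for name in columns:
        node = name
        # Walk upward: q5a -> q5 -> q, registering each level under its parent.
        while (parent := parent_of(node)) is not None:
            kids = children.setdefault(parent, [])
            if node not in kids:
                kids.append(node)
            node = parent
    return children

columns = ["q1", "q2", "q3", "q4", "q5a", "q5b", "q5c", "q5d", "q5e", "q6"]
for group, kids in infer_groups(columns).items():
    print(group, "->", kids)
```

Here the group q5 is created even though no column named q5 exists, and it is in turn placed under the group q alongside q1 through q6.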


This algorithm generally works well, especially if your firm follows common conventions in numbering survey questions.

But it isn't perfect, so you'll need to review the result for edge cases. The algorithm can fail when question numbering is not fully consistent. For example, if your survey has two separate questions, Q6 and Q6a, Autogroup will assume that Q6a is a child of Q6.
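One way to find candidates for review is to list column names that would be treated as the parent of another column while also being data columns themselves. The sketch below is a hypothetical helper, not a Protobi feature, and it assumes the same naming pattern as above (a number followed by a letter suffix marks a child).

```python
import re

def ambiguous_parents(columns):
    """Return (parent, child) pairs where the implied parent is itself a data column."""
    names = {c.lower() for c in columns}
    pairs = []
    for col in columns:
        # Assumed convention: strip a trailing letter suffix (q6a -> q6).
        m = re.fullmatch(r"(.*\d)[a-z]+", col.lower())
        if m and m.group(1) in names:
            pairs.append((m.group(1), col.lower()))
    return pairs

# Q6 is a real column and would also be inferred as the parent of Q6a.
print(ambiguous_parents(["Q5a", "Q5b", "Q6", "Q6a", "Q7"]))
```

Each pair returned is a place where the grouping may or may not match the survey's intent, so it is worth checking by hand.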
