Protobi's Autogroup feature organizes thousands of data columns into a structured dashboard in seconds. Autogroup works on a single collection of elements and organizes the elements within it.

It reads column names, question text, and response options to infer the original survey structure, drawing on our years of experience setting up survey views from scratch. Protobi recognizes conventions commonly used in Forsta, Qualtrics, TypeForm, YouGov, Dimensions and other leading engines.

When to autogroup

Autogroup is particularly useful when you first create a project from an SPSS or CSV data file. For instance, a 45-minute quantitative survey may easily have four thousand data columns. Organizing these manually would be very time-consuming.

Autogroup can also be useful later when adding waves of a tracking study. If the new wave data file has new data columns, Autogroup can not only organize new questions into new groups but also place these into appropriate places in the existing tree.

How to autogroup

The Autogroup dialog appears automatically when you open a newly created project that has data and has not already been saved. Alternatively, you can select a group element and choose "Autogroup" from the Advanced button in the toolbar.

Autogroup progress is visible as a percentage in the upper right corner of the screen. Once complete, you will see an "Autogroup complete" notification.

What autogroup does

Autogroup infers the survey organization from meta data in your survey data file and creates an organized view that parallels your survey structure, giving you a starting point for your analysis.

If you're starting your Protobi project with a direct API connection to the survey engine, Protobi can read the survey layout directly from the survey engine, and there's no need for Autogroup.

But if you're starting from an SPSS ".sav" file or a Comma Separated Value ".csv" file exported by the survey engine, some important information can get lost or mangled. For instance,

  • how questions are grouped into pages or sections such as "Screener" or "Demographics" with intro text
  • text for grid questions such as "Which of these brands have you heard of?"
  • constraints on responses such as "[Must sum to 100]"
  • skip patterns such as "Ask only if previously purchased"

Initially, Protobi places all questions into a special group, Fields, in the order that they appear in the data file.

SPSS data files contain not only the response data, but also meta data for question text and formatted response values. This meta data often contains useful clues. Autogroup uses these clues to recover this information and provide a good initial view.

Infers hierarchy from column names

Autogroup organizes elements into a hierarchical structure reflecting the column names. The view below shows a project after it was first created.

The view below shows the same project after autogroup:

  • Q2v1 _1 is placed into a new group Q2v1
  • Q2v1 is placed into group Q with Q1 and Q3
  • Q is placed at the top level of the project

Similarly, questions starting with S1, S2, etc. are placed into a group S.

Elements are sorted intelligently so that Q11 is placed numerically after Q2 (rather than alphabetically before). Similarly, it places S4 in group S but not status or Segment which also start with the letter "s".
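The numeric-aware ordering described above can be sketched as a "natural sort". This is only an illustration of the idea, not Protobi's actual implementation, and the function name natural_key is hypothetical.

```python
import re

def natural_key(name):
    """Split a name into text and number chunks so numbers compare numerically."""
    # re.split with a capturing group alternates: text, digits, text, digits, ...
    parts = re.split(r"(\d+)", name)
    # Odd positions are always the captured digit runs.
    return [int(p) if i % 2 else p.lower() for i, p in enumerate(parts)]

columns = ["Q11", "Q2", "Q1", "Q3"]
print(sorted(columns))                   # ['Q1', 'Q11', 'Q2', 'Q3']
print(sorted(columns, key=natural_key))  # ['Q1', 'Q2', 'Q3', 'Q11']
```

A plain string sort compares character by character, which is why Q11 lands before Q2 unless the digits are compared as numbers.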

Your survey may follow other conventions. For instance, instead of naming questions numerically, it may name them mnemonically, such as HISTORY1–HISTORY6. In a case like this, Autogroup would infer a group HISTORY as the parent group of these elements.
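As a rough illustration of how a parent group can be inferred from a column name, the sketch below groups columns by a letter prefix that is directly followed by a digit. It is a deliberately simplified assumption, not the rule set Autogroup actually applies, and the names parent_group and group_columns are hypothetical.

```python
import re
from collections import defaultdict

def parent_group(column):
    """Return the leading letters if they are directly followed by a digit, else None."""
    match = re.match(r"([A-Za-z]+)\d", column)
    return match.group(1) if match else None

def group_columns(columns):
    """Map each inferred parent (or None for ungrouped columns) to its columns."""
    groups = defaultdict(list)
    for column in columns:
        groups[parent_group(column)].append(column)
    return dict(groups)

columns = ["S1", "S4", "status", "Segment", "HISTORY1", "HISTORY6", "Q1", "Q2v1_1"]
for parent, members in group_columns(columns).items():
    print(parent, members)
```

With this rule S1 and S4 fall under S, HISTORY1 and HISTORY6 fall under HISTORY, and status and Segment stay ungrouped because no digit follows their letters.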

Moves major sections to top level

Autogroup generally keeps elements within their original parents. But if it identifies major sections with single-letter names such as S, Q or A, it will move these to the top level. As a special rule, Autogroup places S before Q rather than sorting strictly alphabetically, since these often stand for "Screener" and "Questionnaire".

Extracts common text to parent elements

As Protobi creates groups it extracts common text shared by the child elements into the parent group.
Often survey engines will append text for grid questions to the labels for data columns in that grid.

For instance, S10v1_1 through S10v1_96 all share the common text

“Which of the following have you prescribed/recommended”.

Autogroup extracts this common text into the parent group S10v1 so that children have shorter, more specific labels such as "Product A".
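The idea of pulling shared text up to the parent can be sketched as follows. This is a minimal sketch that assumes the shared text sits at the start of each label, separated by a colon; some engines put it at the end, in which case the same idea applies in reverse. The function name and the sample labels are hypothetical.

```python
def extract_common_text(labels):
    """Return (shared leading text, remaining text for each child label)."""
    tokenized = [label.split() for label in labels]
    shared = []
    # Walk the labels word by word until they stop agreeing.
    for words in zip(*tokenized):
        if len(set(words)) != 1:
            break
        shared.append(words[0])
    parent = " ".join(shared).rstrip(" :-")
    children = [" ".join(words[len(shared):]) for words in tokenized]
    return parent, children

labels = [
    "Which of the following have you prescribed/recommended: Product A",
    "Which of the following have you prescribed/recommended: None of the above",
]
parent, children = extract_common_text(labels)
print(parent)    # Which of the following have you prescribed/recommended
print(children)  # ['Product A', 'None of the above']
```

Comparing whole words rather than characters avoids cutting a label in the middle of a word.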

Identifies grid questions

Autogroup identifies groups of questions that have a common scale as grid questions.

For instance, here Q21 is a group of “Yes/No” data columns which Protobi identifies as a “Select all that apply” grid question with a binary response scale. This group is further displayed as a compact grid, showing just the percent "Yes" for concise display.

Similarly, Q4v1 is a group of numeric questions that Protobi identifies as a numeric response grid. This is further displayed concisely as a compact grid showing the mean value for each child, and also rounding the values into numeric ranges based on the standard deviation.
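The two cases above (a shared "Yes/No" scale and an all-numeric group) can be illustrated with a small classifier. This is a simplified sketch under the assumption that each child column's observed values are available as a list; the function name and return labels are hypothetical and do not reflect Protobi's internals.

```python
def classify_group(children):
    """children maps each column name to the list of values observed in it."""
    scales = [frozenset(values) for values in children.values()]
    if not scales:
        return "not a grid"
    if all(scale <= {"Yes", "No"} for scale in scales):
        return "binary grid"
    if all(isinstance(v, (int, float)) for scale in scales for v in scale):
        return "numeric grid"
    if len(set(scales)) == 1:
        return "common-scale grid"
    return "not a grid"

q21 = {"Q21_1": ["Yes", "No", "No"], "Q21_2": ["No", "Yes", "Yes"]}
q4v1 = {"Q4v1_1": [10, 25.5, 40], "Q4v1_2": [0, 15, 60]}
print(classify_group(q21))   # binary grid
print(classify_group(q4v1))  # numeric grid
```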

Autogroup can also identify higher dimensional grids.

Identifies and refines rating scales

Autogroup can often identify ordinal rating scales with an implied order, such as Likert-type scales. In this example, Q10_1 is part of a grid question with a 1-10 response scale.

Here the survey engine wrote the data file with scale values appended to the end of the response label, e.g. "Not at all likely 1". Autogroup moves the scale points to the front of the label so that they line up neatly in a vertical column on screen.
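Moving a trailing scale point to the front of a label might look like the sketch below. It assumes the scale point is a whole number separated from the text by whitespace; the function name scale_point_first is hypothetical.

```python
import re

def scale_point_first(label):
    """Move a trailing scale point such as '1' from the end of a label to the front."""
    match = re.fullmatch(r"(.*\S)\s+(\d+)", label)
    if not match:
        return label  # no trailing number, leave the label unchanged
    text, point = match.groups()
    return f"{point} {text}"

print(scale_point_first("Not at all likely 1"))  # 1 Not at all likely
print(scale_point_first("Extremely likely 10"))  # 10 Extremely likely
print(scale_point_first("5"))                    # 5
```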

Wrangles “Black sheep”

Surveys are written by human analysts and don't slavishly follow set numbering conventions. In this example, the analyst used a suffix Q21A to include a new question between questions Q21 and Q22.

Autogroup can often identify these as "black sheep" questions, and move these up a level or two. In this example, moving Q21A up a level allows Q21 to be recognized as a binary grid.

Removes question numbers from title text

Some SPSS files include the question number in the titles (e.g. "S1: What is your speciality"). This is commonly seen in Decipher SPSS files. Autogroup removes the question number from the titles.
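A hedged sketch of this clean-up step: given a column key and its title, strip the key when it appears as a leading prefix followed by a colon, period or dash. The function name and the set of separators are assumptions made for illustration.

```python
import re

def strip_question_number(key, title):
    """Remove a leading 'KEY:' (or 'KEY.' / 'KEY -') prefix from a title."""
    pattern = rf"^\s*{re.escape(key)}\s*[:.\-]\s*"
    return re.sub(pattern, "", title, count=1, flags=re.IGNORECASE)

print(strip_question_number("S1", "S1: What is your speciality"))
# What is your speciality
```

Titles that do not start with the key are returned unchanged.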

Squishes multiple responses

Autogroup will condense a set of multiple response columns for an open-end text question into a single multiple-response element (using the Protobi "squeeze" transform).
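Conceptually, condensing several open-end columns into one multiple-response value can be pictured as collecting the non-blank answers for each respondent. The sketch below only illustrates that idea; it is not the Protobi "squeeze" transform itself, and the function and column names are hypothetical.

```python
def squeeze_row(row, columns):
    """Collect the non-blank answers from several columns into one list."""
    answers = []
    for column in columns:
        value = row.get(column)
        if value is not None and str(value).strip():
            answers.append(str(value).strip())
    return answers

row = {"Q9_1": "Aspirin", "Q9_2": "", "Q9_3": " Ibuprofen ", "Q9_4": None}
print(squeeze_row(row, ["Q9_1", "Q9_2", "Q9_3", "Q9_4"]))
# ['Aspirin', 'Ibuprofen']
```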

Identifies other/specify responses

By default, Autogroup identifies “Other (specify)” questions with open-end response text and places these into a special group titled OE. Deselect this option if you would rather keep each other/specify open-end in survey order (e.g. S1_other immediately after S1) instead of moving them all into one group.

Sets big collections to "Tabs" view

Autogroup will try to keep elements visible together. But if it creates a group that has too many elements, or too many elements that don't display as concise compact grids, it will set the view to "Tabs" so that the user doesn't need to scroll too much.

Advanced options

Autogroup has a lot of detailed options you can override. For most projects the default values are good and based on our cumulative experience. But if your survey follows a different convention or uses a different survey engine, these can be selectively changed.

Use suffix if key exists

Take a survey that has questions S1, S1_1, S1_2, S1_3. Since S1 already exists in the data, a different parent key should be used to group S1_1, S1_2, S1_3 together. Specify a suffix to add when the key already exists. For example, with the suffix "#", Protobi will use S1# instead of S1 to group S1_1, S1_2, S1_3 together.
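The rule can be sketched in a couple of lines. This is an illustration only; the function name parent_key and its parameters are hypothetical.

```python
def parent_key(base, existing_keys, suffix="#"):
    """Use the base key for a new group unless a column with that key already exists."""
    return base + suffix if base in existing_keys else base

existing = {"S1", "S1_1", "S1_2", "S1_3", "Q2_1", "Q2_2"}
print(parent_key("S1", existing))  # S1#
print(parent_key("Q2", existing))  # Q2
```

Here S1 is already a data column, so the group gets the key S1#, while Q2 is free to be used as is.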

Tabbed if collection over {X}

This option sets the number of sub-elements, X, at or above which a group becomes a tabbed collection. The default is 10: if a group has 10 or more sub-elements, it will be turned into a tabbed collection. If it has 9 or fewer, it will show in its standard view, which is one tall page that you scroll to see all sub-elements.

Grid text delimiter (Decipher)

Define the delimiter when extracting the parent-level question text from questions in a Decipher SPSS file.

Decipher's question text pattern is often in the following structure: Key: Option level text - Parent level text. The delimiter between option level text and parent level text is the first instance of a dash.

Note: If you select this option, make sure "Grid text regex (Confirmit)" is not selected. Otherwise, Protobi will separate the text at the delimiters specified in both options. If your SPSS file is neither Decipher nor Confirmit, you can select this option and change the delimiter to whatever your file uses to separate option level text and parent level text.
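To make the delimiter behavior concrete, here is a minimal sketch that splits a label at the first dash. It assumes the key prefix ("Key:") has already been removed from the label; the function name and the sample label are hypothetical.

```python
def split_decipher_label(label, delimiter="-"):
    """Split 'Option level text - Parent level text' at the first delimiter."""
    option, found, parent = label.partition(delimiter)
    if not found:
        return label.strip(), None  # no delimiter, nothing to extract
    return option.strip(), parent.strip()

label = "Product A - Which of the following have you prescribed/recommended"
print(split_decipher_label(label))
# ('Product A', 'Which of the following have you prescribed/recommended')
```

Because the split happens at the first dash, option text that itself contains a dash would be cut early. That is one reason you might change the delimiter for your file.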

Grid text regex (Confirmit)

Define the delimiter when extracting the parent-level question text from questions in a Confirmit SPSS file.

Confirmit's question text pattern is often in the following structure: Option level text (Parent level text). The delimiter between option level text and parent level text is the first instance of an opening parenthesis.

Note: If you select this option, make sure "Grid text regex (Decipher)" is not selected. Otherwise, Protobi will separate the text at the delimiters specified in both options. If your SPSS file is neither Decipher nor Confirmit, you can select the "Grid text regex (Decipher)" option and change the delimiter to whatever your file uses to separate option level text and parent level text.
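The Confirmit-style pattern can be illustrated with a regular expression that splits at the first opening parenthesis and expects the label to end with a closing one. The regex below is an assumption written for this example, not the default value of the option, and the function name is hypothetical.

```python
import re

def split_confirmit_label(label):
    """Split 'Option level text (Parent level text)' at the first '('."""
    match = re.fullmatch(r"([^(]*)\((.*)\)\s*", label)
    if not match:
        return label.strip(), None  # label does not follow the pattern
    option, parent = match.groups()
    return option.strip(), parent.strip()

label = "Product A (Which of the following have you prescribed/recommended)"
print(split_confirmit_label(label))
# ('Product A', 'Which of the following have you prescribed/recommended')
```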

Regex for 'other/specify'

This option lets you enter a regex to identify other/specify open-ends. The default looks for elements with "other", "oe" or "Other" in the name. If your data follows a different convention, such as "TEXT" in the column name, edit this field to reflect the convention in your data file.
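As an illustration of how such a pattern selects columns, the sketch below applies a regex to a list of column names. The pattern "other|oe|Other" is only an assumed equivalent of the default described above, and the function and column names are hypothetical.

```python
import re

ASSUMED_DEFAULT = r"other|oe|Other"  # assumption: mirrors the default described in the text

def find_other_specify(columns, pattern=ASSUMED_DEFAULT):
    """Return the columns whose names match the other/specify pattern."""
    regex = re.compile(pattern)
    return [column for column in columns if regex.search(column)]

columns = ["S1", "S1_other", "Q5oe", "Q7_TEXT"]
print(find_other_specify(columns))           # ['S1_other', 'Q5oe']
print(find_other_specify(columns, r"TEXT"))  # ['Q7_TEXT']
```

A loose pattern such as "oe" also matches any column name that happens to contain those letters, so a more specific pattern (for example one anchored to the end of the name) may suit some data files better.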

Regex to tokenize keys

This option lets you specify the regex to tokenize keys. Feel free to reach out to support@protobi.com if you have any questions on how to use this option in your specific project.

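The default regex is (^hid|^MODE|[a-z]+|[A-Z]+|[0-9]+|[.]|[^a-zA-Z0-9.]+)

The default defines a "token" as any of the following, tried in this order:

  • "hid" at the start of the key
  • "MODE" at the start of the key
  • a run of lowercase letters
  • a run of uppercase letters
  • a run of digits
  • a single period
  • a run of any other characters (anything that is not a letter, digit or period)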

One possible case is to define "^h" as a token. In that case, all "hid" elements would get grouped under group h and since that's a single-letter group name it gets pulled to the top level.
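To see what the default regex does to a key, you can try it outside Protobi. The sketch below uses Python's re module purely for experimentation; the pattern is copied from the text above and uses only constructs that behave the same way in Python. The variant with "^h" in place of "^hid" is a hypothetical customization, and the key names are made up.

```python
import re

DEFAULT_TOKEN_REGEX = r"(^hid|^MODE|[a-z]+|[A-Z]+|[0-9]+|[.]|[^a-zA-Z0-9.]+)"
# Hypothetical customization: treat a leading "h" as its own token.
CUSTOM_TOKEN_REGEX = r"(^h|^MODE|[a-z]+|[A-Z]+|[0-9]+|[.]|[^a-zA-Z0-9.]+)"

def tokenize(key, pattern=DEFAULT_TOKEN_REGEX):
    """Split a column key into tokens using the given regex."""
    return re.findall(pattern, key)

print(tokenize("Q2v1_1"))   # ['Q', '2', 'v', '1', '_', '1']
print(tokenize("hid_seg"))  # ['hid', '_', 'seg']
print(tokenize("hid_seg", CUSTOM_TOKEN_REGEX))  # ['h', 'id', '_', 'seg']
```

With the customized pattern the first token of every "hid" key is "h", which is what would lead those elements to be grouped under h.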

After autogroup

Autogroup aims to be a good start for your analysis, but it's just a start. Protobi allows you to edit and refine your groupings at any time.

Advanced support

Our support team is ready to help you with your specific goals. Please contact us at support@protobi.com if you have any questions.