Process data in Protobi


Updated at October 15th, 2019

Even for the best-designed surveys, you may need to do additional data refactoring and cleaning:

  • remove respondents
  • merge in translations
  • combine month and year into a single date
  • combine waves
  • stack patient cases, choice cards, etc.
  • create a new variable

For these you might use SPSS, SAS, R, Excel or something else. You can use those tools outside of the app and upload the revised data to Protobi just fine.

But you can also do serious data processing in Protobi itself, and you might like this a lot better. Protobi is written in Javascript, and your browser runs Javascript. Javascript is a powerful general-purpose language comparable to C, Perl, and Python, and it can do many things (possibly more easily than you could in SPSS or SAS).

One advantage of processing data in Protobi is that all your processing code is in one place, neatly organized, with changes tracked.

Example

Let's say your study asks for a date, but asks it as three columns:

  • F4_1 month of year
  • F4_2 day of month
  • F4_3 year

You might want to combine it into a single date value for analysis.

In the Admin page for your project choose the Pre-Calculate tab.


This editor starts with the simplest possible code, return rows, which just returns an array containing your dataset.

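On this page we can write a program that iterates over this array and creates a new data column F4_date that concatenates these into an ISO date string with format YYYY-MM-DD. Month and day are zero-padded so that, for example, March 5 becomes 03-05:

rows.forEach(function(row) {
    if (row.F4_1 && row.F4_2 && row.F4_3) {
       // zero-pad month and day to two digits, as YYYY-MM-DD requires
       var month = ('0' + row.F4_1).slice(-2);
       var day = ('0' + row.F4_2).slice(-2);
       row.F4_date = [row.F4_3, month, day].join('-');
    }
});
return rows;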
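
The combined string is only useful if the three parts form a real calendar date. As a rough sketch (the helper name isRealDate, the column F4_valid, and the sample values are made up for illustration and are not part of Protobi), you could flag impossible dates such as February 29 in a non-leap year before analysis:

```javascript
// Hypothetical helper: true when year/month/day form a real calendar date.
function isRealDate(year, month, day) {
  var y = +year, m = +month, d = +day;
  // Date.UTC rolls impossible dates over (Feb 29, 2019 becomes Mar 1, 2019),
  // so a date is real only if the parts survive the round trip unchanged.
  var date = new Date(Date.UTC(y, m - 1, d));
  return date.getUTCFullYear() === y &&
         date.getUTCMonth() === m - 1 &&
         date.getUTCDate() === d;
}

// Sample rows standing in for the dataset (values are made up).
var sampleRows = [
  { F4_1: '2', F4_2: '29', F4_3: '2019' },  // 2019 is not a leap year
  { F4_1: '2', F4_2: '29', F4_3: '2020' }   // 2020 is a leap year
];

sampleRows.forEach(function (row) {
  row.F4_valid = isRealDate(row.F4_3, row.F4_1, row.F4_2);
});
```

In the Pre-Calculate editor the same loop would run over rows instead of sampleRows.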


The page should now look like:


In the project you now have a new data column F4_date which is a string value: 


With this new variable, you can set its type to date so that it will be interpreted as a date value, allowing you to display it in more interesting ways: 


How it works

The program above is written in Javascript, a full programming language that lets you do pretty much anything you might do in SPSS, Python or R, and it runs in your browser. Not familiar with Javascript yet? No worries.

The code operates on the data in Protobi and returns either the dataset itself (after being modified) or an entirely new dataset.

The code you write is the body of a function. This function receives a variable rows, which contains your raw data, and it returns your processed data. In the starting code, return rows, the two are exactly the same: the function simply returns your raw data.

How your data is represented

Protobi stores your data on the server as a CSV data file:

country,s0,s1,s2,...
US,Oncology,30,Children's Hospital,...
IN,Ophthalmology,22,Lilavati Hospital,...
DE,Pulmonology,99,Universitätsklinikum Tübingen,...

In the browser, your raw data is parsed and stored in memory as a Javascript array variable called rows.

Each element of the array represents one row of your data. For example

[
  {
    "country": "US",
    "s0": "Oncology",
    "s1": 30,
    "s2": "Children's Hospital",
    ...
  },
  {
    "country": "IN",
    "s0": "Ophthalmology",
    "s1": 22,
    "s2": "Lilavati Hospital",
    ...
  },
  {
    "country": "DE",
    "s0": "Pulmonology",
    "s1": 99,
    "s2": "Universitätsklinikum Tübingen",
    ...
  },
  ...
]
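
To make the mapping from CSV text to an array of row objects concrete, here is a deliberately naive sketch. It is not Protobi's actual parser: the function name parseSimpleCsv is made up, and it ignores quoted fields, embedded commas, and type conversion.

```javascript
// Naive illustration only: splits on commas and does not handle quoted fields.
function parseSimpleCsv(text) {
  var lines = text.trim().split('\n');
  var header = lines[0].split(',');
  return lines.slice(1).map(function (line) {
    var values = line.split(',');
    var row = {};
    // Pair each header name with the value in the same position
    header.forEach(function (name, i) {
      row[name] = values[i];
    });
    return row;
  });
}

var csv = [
  'country,s0,s1',
  'US,Oncology,30',
  'DE,Pulmonology,99'
].join('\n');

var parsed = parseSimpleCsv(csv);
// parsed[0] is { country: 'US', s0: 'Oncology', s1: '30' }
```

Note that every value produced this way is a string, including '30'.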

Processing data

You can read and modify your data programmatically.

Read the data

You can access rows by integer index (starting from zero). For instance, the first row is an object:

rows[0] // {   "country": "US", "s0": "Oncology", ... }

You can access column values within a row by column name. If the column name is simple (starts with a letter and uses only alphanumeric characters or underscores after that), you can just use dot notation. Or you can always use bracket notation as below:

rows[0].s0    // dot notation, "Oncology"
rows[0]["s0"] // bracket notation, "Oncology"
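
As a small illustration of when the bracket form is required, the sketch below uses a made-up column name containing spaces and parentheses; the row itself is sample data, not from a real project.

```javascript
// Sample row; 'Q5 other (specify)' is a hypothetical column name.
var row = { country: 'US', s0: 'Oncology', 'Q5 other (specify)': 'Nurse' };

var viaDot = row.s0;                        // simple name: dot notation works
var viaBracket = row['Q5 other (specify)']; // spaces and parentheses need brackets

// Bracket notation also accepts a column name held in a variable
var columnName = 's0';
var viaVariable = row[columnName];

// Reading a column that does not exist gives undefined rather than an error
var missing = row.s9;
```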

Modify the data

You can modify the data directly. For instance, in the example below we're setting the variable "region" to "North America" for the first row. The variable region need not be defined in advance, you can just create it:

rows[0]["region"] = "North America"

Iterate over rows

You can iterate over rows in several ways. One way is to type each row number explicitly:

rows[0]["s0"]  // "Oncology"
rows[1]["s0"]  // "Ophthalmology"
rows[2]["s0"]  // "Pulmonology"

You can iterate using a for(...) {...} loop:

for (var i=0; i<rows.length; i++) {
   var row = rows[i];
   if (row.s0 === 'Oncology') row.specialist = true;
   if (row.s1==99) row.s1 = null;  // recode `99` to missing
}

Javascript arrays have a method .forEach(fn) whose argument is a function that receives the row and its index as arguments:

rows.forEach(function(row) {
   if (row.s0 === 'Oncology') row.specialist = true;
});
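
The index argument and a simple tally can both be handy when checking a recode. A minimal sketch, assuming made-up sample rows and a hypothetical new column row_number:

```javascript
// Sample rows standing in for the dataset.
var sampleRows = [
  { s0: 'Oncology' },
  { s0: 'Pulmonology' },
  { s0: 'Oncology' }
];

// Use the second argument (the zero-based index) to number the rows from 1
sampleRows.forEach(function (row, index) {
  row.row_number = index + 1;
});

// Count how many rows have each value of s0
var counts = {};
sampleRows.forEach(function (row) {
  counts[row.s0] = (counts[row.s0] || 0) + 1;
});
// counts is { Oncology: 2, Pulmonology: 1 }
```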

Define new variables

You can define new variables as you wish. Let's say that we wish to calculate a sum of questions A4_a, A4_b, and A4_c. You can do it as follows:

rows.forEach(function(row, index) {
   row.A4_sum = (+row.A4_a) + (+row.A4_b) + (+row.A4_c);
});

Filter rows

You can exclude rows from your data by using the Javascript array method filter. For instance, if our dataset includes all respondents who took the screener, but we wish to focus analysis only on people who completed the survey, we might filter for completes as follows:

rows = rows.filter(function(row) {
   return row.complete == true;
})
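
How the completion flag is coded matters here. In Javascript, "1" == true is true but "true" == true is false, so a flag stored as text may not match. The sketch below assumes, purely for illustration, that complete might hold "1", "true", or a real boolean; the helper name isComplete and the sample rows are made up.

```javascript
// Hypothetical helper: treats '1', 'true' (any case), and boolean true as complete.
function isComplete(row) {
  var value = String(row.complete).trim().toLowerCase();
  return value === '1' || value === 'true';
}

// Sample rows standing in for the dataset.
var sampleRows = [
  { id: 'a', complete: '1' },
  { id: 'b', complete: 'true' },
  { id: 'c', complete: '0' },
  { id: 'd', complete: '' },
  { id: 'e', complete: true }
];

var completes = sampleRows.filter(isComplete);

// Keeping a count of dropped rows makes it easy to confirm the filter did what you expected
var removedCount = sampleRows.length - completes.length;
```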

Advanced coding

You have access to all of Javascript in your browser. Here are just a few examples:

  // use Math functions
  row.radius = Math.sqrt(row.area / Math.PI)   


  // Use Regular Expressions to test for string matches
  row.rituximab = /uximab/i.test(row.A4) ? 1 : 0;

Tricky point: Numbers vs strings

In the above example we used the + operator to convert the value row.A4_a from a string to a number.

Protobi loads your data from CSV files, and CSV files represent all data as string values. The Protobi app is tolerant of number types, so 5 and "5" are both interpreted as the number five. Normally you don't need to think about it.

But when processing data files, you may need to be aware of the difference because Javascript adds strings and numbers differently.

Adding two numbers yields a number (e.g., 2 + 2 is 4) but adding two strings yields a string (e.g., "2" + "2" is "22").

We can cast a string to a number by using the + sign. E.g., (+"2") + (+"2") yields the number 4 instead of the string "22".
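
One caveat with the + cast is that a blank string converts to 0 and non-numeric text converts to NaN, which can quietly distort a sum. The following is a sketch of one way to handle that, not a Protobi function: the helper names toNumber and sumColumns are made up, and treating blanks as missing is an assumption you may or may not want.

```javascript
// Hypothetical helper: converts to a number, returning null for blank or non-numeric values.
function toNumber(value) {
  if (value === null || value === undefined) return null;
  if (typeof value === 'string' && value.trim() === '') return null;
  var n = +value;
  return isNaN(n) ? null : n;
}

// Hypothetical helper: sums the named columns, skipping missing values.
// Returns null when none of the columns has a usable value.
function sumColumns(row, names) {
  var total = 0;
  var seen = 0;
  names.forEach(function (name) {
    var n = toNumber(row[name]);
    if (n !== null) {
      total += n;
      seen += 1;
    }
  });
  return seen > 0 ? total : null;
}

// Sample row with one blank answer
var row = { A4_a: '2', A4_b: '', A4_c: '3' };
row.A4_sum = sumColumns(row, ['A4_a', 'A4_b', 'A4_c']);
```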

Pre-calculate vs Data processes

In the above example we wrote the code under Pre-calculate. This code will execute dynamically every time you open the project, and it applies to the primary data file.

For more complex processes it is also possible to create a process under the "Data" tab. An advantage of static data processes is the ability to operate on multiple files, for tasks like merging and stacking. A data process is only executed when you choose to "Edit/run" it. Its result is stored as a data file which can be the primary data file for the project.
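
The core of stacking and merging can be written as ordinary Javascript functions over arrays of rows. The sketch below shows only that logic. How a data process loads its input files is not covered here, and the function names (stackCases, mergeByKey) and column names (respondent_id, p1_age, p2_age) are hypothetical.

```javascript
// Stack: turn one row per respondent into one row per patient case.
function stackCases(rows, caseCount) {
  var stacked = [];
  rows.forEach(function (row) {
    for (var k = 1; k <= caseCount; k++) {
      var age = row['p' + k + '_age'];
      if (age === undefined || age === '') continue; // skip empty case slots
      stacked.push({ respondent_id: row.respondent_id, case_number: k, age: age });
    }
  });
  return stacked;
}

// Merge: copy columns from a lookup table onto each row by a shared key.
function mergeByKey(rows, lookupRows, key) {
  var index = {};
  lookupRows.forEach(function (lookup) {
    index[lookup[key]] = lookup;
  });
  rows.forEach(function (row) {
    var match = index[row[key]];
    if (!match) return; // no matching lookup row: leave the row unchanged
    Object.keys(match).forEach(function (name) {
      if (!(name in row)) row[name] = match[name]; // never overwrite existing columns
    });
  });
  return rows;
}

// Sample data standing in for two files
var respondents = [
  { respondent_id: 'r1', country: 'US', p1_age: '54', p2_age: '61' },
  { respondent_id: 'r2', country: 'DE', p1_age: '47', p2_age: '' }
];
var countryLabels = [
  { country: 'US', region: 'North America' },
  { country: 'DE', region: 'Europe' }
];

var merged = mergeByKey(respondents, countryLabels, 'country');
var cases = stackCases(respondents, 2);
```

With this sample data, cases has three rows (two for r1, one for r2) and each respondent gains a region column.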

Processing in other languages

Do you prefer to work in R or Python? Or do more complex work locally? You can process Protobi data in other languages too. Use the Protobi REST API to download a file from Protobi and/or upload a new data file when you're done.

You can also use the Protobi R library, which wraps the REST API, to load data into or from an R data frame.

Advanced support

Protobi can handle data processing that's a lot more complicated than what's mentioned above. Rather than try to put it all in a tutorial, our support team is ready to help you with your specific goals. We can give you code that does what you need and show you how to modify it from there.

Please contact us at support@protobi.com
