In even the best designed surveys, there is nearly always some data refactoring and cleaning you just have to do:

  • remove respondents
  • merge in translations
  • combine waves
  • stack patient cases, choice cards, etc.
  • create a new variable

For these you might use SPSS, SAS, R, Excel, or another tool to make those changes. You can still use those tools outside Protobi and upload the revised data just fine.

Process data in Protobi

But ... and you might like this a lot ... you can do serious data processing in Protobi as well. Protobi is written in Javascript. Your browser runs Javascript. Javascript is a powerful first-class language comparable to C, Perl, and Python, and it can do many things, possibly more easily than you can in SPSS or SAS.

One advantage of doing processing in Protobi is that all your processing code is in one place, neatly organized, with changes tracked.

Get started

Select Edit project... for your project and choose the Pre-Calculate tab. You'll see a code editor. It starts with the simplest possible code:

return rows;

This is Javascript code that operates on your data.

This code receives a variable rows, which contains your raw data, and returns it as your processed data.

Not familiar with Javascript yet? No worries. It's a complete programming language as powerful as R, SAS, C++, Python and Perl. And it runs in your browser.

Your data table is a Javascript array

Your raw data is represented as an array variable called rows.

Each element of the array represents one row of your data. For example

[
    {
        "country": "US",
        "s0": "Oncology",
        "s1": 30,
        "s2": "Children's Hospital",
        ...
    },
    {
        "country": "IN",
        "s0": "Ophthalmology",
        "s1": 22,
        "s2": "Lilavati Hospital",
        ...
    },
    {
        "country": "DE",
        "s0": "Pulmonology",
        "s1": 99,
        "s2": "Universitätsklinikum Tübingen",
        ...
    },
    ...
]

Access the data

In this notation, you can access your data directly.

You can access rows by integer index (starting from zero). For instance, the first row is an object:

rows[0] // {   "country": "US", "s0": "Oncology", "s1": 30,  ... }

You can access column values within a row by column name. You can always use index notation, putting the column name in quotes within square brackets. If the column name is simple (starts with a letter and uses only alphanumeric characters or underscores after that), you can just use dot notation:

rows[0]["s0"]  //index notation, "Oncology"
rows[0].s0;    //dot notation, "Oncology"
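
Column names that contain spaces, punctuation, or a leading digit only work with index notation. Here is a small sketch; the column names `Q1 - Other` and `2nd_choice` are made up for illustration:

```javascript
// Hypothetical sample rows; the column names are invented for this example.
var rows = [
    { "country": "US", "Q1 - Other": "Nurse practitioner", "2nd_choice": "B" },
    { "country": "DE", "Q1 - Other": "", "2nd_choice": "A" }
];

// Simple names work with either notation.
var country = rows[0].country;          // "US"

// Names with spaces, dashes, or a leading digit need index notation.
var other = rows[0]["Q1 - Other"];      // "Nurse practitioner"
var second = rows[1]["2nd_choice"];     // "A"

// A column that does not exist yields undefined rather than an error.
var missing = rows[0].no_such_column;   // undefined
```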

Modify the data

You can modify the data directly. For instance, the example below sets the variable "region" to "North America" for the first row if its country is "Canada". The variable region need not be defined in advance; you can just create it:

if (rows[0].country == "Canada") rows[0]["region"] = "North America"

Note that in Javascript == compares values, whereas = assigns values.

// CAREFUL! A single = assigns instead of comparing.
// This sets country to "Canada" for the first row,
// and then always sets its region to "North America".
if (rows[0].country = "Canada") rows[0]["region"] = "North America"
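
Because values loaded from a CSV file may arrive as strings, it also helps to know the difference between loose (`==`) and strict (`===`) comparison. A short sketch with a hypothetical row:

```javascript
// Hypothetical row where s1 was loaded from CSV as a string.
var row = { country: "Canada", s1: "30" };

var looseMatch = (row.s1 == 30);     // true: == converts "30" to a number first
var strictMatch = (row.s1 === 30);   // false: === also requires the same type
var castMatch = (+row.s1 === 30);    // true: convert to a number, then compare strictly

// Comparing leaves the row unchanged; only = would overwrite the value.
if (row.country === "Canada") row.region = "North America";
```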

Iterate over rows

You can iterate over rows in several ways. The simplest is just to pedantically type the row number:

rows[0]["s0"]  // "Oncology"
rows[1]["s0"]  // "Ophthalmology"
rows[2]["s0"]  // "Pulmonology"

You can iterate using a for(...) {...} loop:

for (var i=0; i < rows.length; i++) {
    var row = rows[i];
    if (row.s0 === 'Oncology') row.specialist = true;
    if (row.s1==99) row.s1 = null;  // recode `99` to missing
}

Javascript arrays have a method .forEach(fn) whose argument is a function that receives the row and its index as arguments:

rows.forEach(function(row) {
    if (row.s0 === 'Oncology') row.specialist = true;
});
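
Combining iteration with assignment, a lookup table can recode every row in one pass. This is a minimal sketch; the `regions` table and the `region` and `row_number` columns are illustrative assumptions:

```javascript
// Hypothetical sample rows using country codes like those in the example data.
var rows = [
    { country: "US" },
    { country: "IN" },
    { country: "DE" },
    { country: "ZZ" }
];

// Illustrative lookup from country code to region.
var regions = { US: "North America", IN: "Asia", DE: "Europe" };

rows.forEach(function (row, index) {
    // Fall back to "Other" when the country is not in the lookup table.
    row.region = regions[row.country] || "Other";
    // The second argument is the zero-based position of the row.
    row.row_number = index + 1;
});
```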

Define new variables

You can define new variables as you wish. Let's say that we wish to calculate a sum of questions A4_a, A4_b, and A4_c. You can do it as follows:

rows.forEach(function(row, index) {
    row.A4_sum = (+row.A4_a) + (+row.A4_b) + (+row.A4_c);
});

One tricky point

Wait... what's with the extra parentheses and plus signs? Why not just write it like this?

rows.forEach(function(row, index) {
    row.A4_sum = row.A4_a + row.A4_b + row.A4_c;
});

Protobi loads your data from CSV files. Initially all data is represented as strings. Javascript adds strings and numbers differently. Adding two numbers yields a number (e.g. 2 + 2 is 4) but adding two strings yields a string (e.g. "2" + "2" is "22").

But we can cast a string to a number by using the + sign. E.g. (+"2") + (+"2") yields the number 4.
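
The + sign has edge cases worth knowing: an empty string casts to 0, and non-numeric text casts to `NaN`, which then spreads through the whole sum. One way to guard against this is a small helper. In this sketch `toNumber` is a hypothetical name, not a Protobi function:

```javascript
// Hypothetical helper: returns a number, or null for blank or non-numeric input.
function toNumber(value) {
    if (value === null || value === undefined || String(value).trim() === "") return null;
    var n = +value;
    return isNaN(n) ? null : n;
}

var row = { A4_a: "2", A4_b: "", A4_c: "n/a" };

// Plain casting gives NaN here, because +"n/a" is NaN.
var naive = (+row.A4_a) + (+row.A4_b) + (+row.A4_c);

// Sum only the values that are real numbers.
var sum = 0;
["A4_a", "A4_b", "A4_c"].forEach(function (key) {
    var n = toNumber(row[key]);
    if (n !== null) sum += n;
});
row.A4_sum = sum;   // 2
```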

Filter rows

You can exclude rows from your data by using the Javascript array method filter. For instance, if our dataset includes all respondents who took the screener, but we wish to focus analysis only on people who completed the survey, we might filter for completes as follows:

rows = rows.filter(function(row) {
    return row.complete == true;
});
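
Several conditions can be combined in one filter, and comparing the array length before and after shows how many rows were dropped. A sketch with hypothetical columns (`complete` and `speeder` are assumed names, coded as "1" or "0"):

```javascript
// Hypothetical sample rows; values are strings, as they would be from a CSV file.
var rows = [
    { id: 1, complete: "1", speeder: "0" },
    { id: 2, complete: "0", speeder: "0" },
    { id: 3, complete: "1", speeder: "1" },
    { id: 4, complete: "1", speeder: "0" }
];

var before = rows.length;

// Keep completes that were not flagged as speeders.
rows = rows.filter(function (row) {
    return row.complete === "1" && row.speeder !== "1";
});

var removed = before - rows.length;   // 2
```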

It's Javascript

You have access to the entire Javascript language in your browser. Here's just a few examples:

row.radius = Math.sqrt(row.area / Math.PI)   // use Math functions
row.rituximab = /uximab/i.test(row.A4)       // Use Regular Expressions to test for string matches
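
In practice, statements like these sit inside a loop over rows. A short sketch with made-up values; the `uximab` column name is an assumption, and `test` is called on the regular expression with the string as its argument:

```javascript
// Hypothetical sample rows; values are strings, as loaded from CSV.
var rows = [
    { area: "78.5", A4: "Rituximab" },
    { area: "12.6", A4: "CETUXIMAB" },
    { area: "3.14", A4: "Methotrexate" }
];

rows.forEach(function (row) {
    // Cast the string to a number before doing arithmetic.
    row.radius = Math.sqrt(+row.area / Math.PI);
    // Case-insensitive match on the substring "uximab".
    row.uximab = /uximab/i.test(row.A4);
});
```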

Data processes

The code in Pre-Calculate runs every time you open the browser, and is convenient for simple coding or rapid testing. For more complex steps like stacking and merging datasets, create a data process. A data process runs once and saves its results, so it is more efficient for complex code.

In your project admin page, go to the Data tab, press "New process", and give it a name. This defines a new data process.

Access datafiles

Press "Edit/Run" to edit the code. This operates much like Pre-Calculate, except that it receives data from all your data files. Let's say you have three datafiles named "wave1", "wave2" and "segments". Each is available by name:

data["wave1"]     // array of rows for wave1
data["wave2"]     // array of rows for wave2
data["segments"]  // array of rows for segments

Simple example

You can return one of these as your dataset:

var rows = data["wave1"]
return rows;

Stack waves

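You can stack waves of data by concatenating their arrays:

// concat returns a new array with the rows of wave1 followed by those of wave2
var rows = data["wave1"].concat(data["wave2"]);
return rows;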
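
When stacking, it is often useful to tag each row with its source wave, and a lookup keyed on a respondent identifier can merge in columns from another file. This is a minimal sketch, not Protobi's own implementation: the `data` object below stands in for what a data process receives, and `respid`, `wave`, and `segment` are hypothetical column names.

```javascript
// Stand-in for the `data` object; in a data process Protobi supplies it.
var data = {
    wave1: [{ respid: "101", q1: "4" }, { respid: "102", q1: "2" }],
    wave2: [{ respid: "201", q1: "5" }],
    segments: [{ respid: "101", segment: "A" }, { respid: "201", segment: "B" }]
};

// Copy each row and tag it with its wave, leaving the source arrays untouched.
function tag(rows, wave) {
    return rows.map(function (row) {
        return Object.assign({}, row, { wave: wave });
    });
}

// Stack the two tagged waves into one array.
var rows = tag(data["wave1"], 1).concat(tag(data["wave2"], 2));

// Build a lookup from respondent id to segment, then merge it in.
var segmentById = {};
data["segments"].forEach(function (s) {
    segmentById[s.respid] = s.segment;
});
rows.forEach(function (row) {
    // Rows without a match get null rather than undefined.
    row.segment = segmentById.hasOwnProperty(row.respid) ? segmentById[row.respid] : null;
});
```

Inside a data process, the last statement would then be `return rows;`.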

Advanced data processing

Protobi can handle data processing much more complex than the examples above. Rather than try to put it all in a tutorial, our support team is ready to help you with your specific goals, give you code that does what you need, and show you how to modify it from there. Please contact us at support@protobi.com.