There are two types of data merging. One appends rows from different datasets; the other joins datasets to add new variables (columns).
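To make the distinction concrete, here is a plain-JavaScript sketch. It assumes each dataset is an array of row objects keyed by column name. The helper names appendRows and joinColumns and the id column are hypothetical and are not part of Protobi's API.

```javascript
// Illustration only: hypothetical helpers, not Protobi functions.
// Each table is assumed to be an array of row objects keyed by column name.

// Appending: rows from the second table are added below the first.
function appendRows(a, b) {
  return a.concat(b);
}

// Joining (a left join): every row of `a` is kept, and rows of `b`
// with the same id contribute their extra variables (columns).
function joinColumns(a, b, idColumn) {
  const byId = new Map(b.map((row) => [row[idColumn], row]));
  return a.map((row) => Object.assign({}, row, byId.get(row[idColumn])));
}

const wave1 = [{ id: 1, q1: 5 }, { id: 2, q1: 3 }];
const wave2 = [{ id: 3, q1: 4 }];
const extra = [{ id: 1, region: "North" }, { id: 2, region: "South" }];

console.log(appendRows(wave1, wave2).length); // 3
// The first joined row now also carries region: "North"
console.log(joinColumns(wave1, extra, "id")[0]);
```

Appending grows the number of rows; joining keeps the row count and grows the number of columns. This tutorial covers the first case.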

This tutorial demonstrates how to combine multiple waves of data by appending the data in Protobi using a static data process.

In this tutorial we'll show you how to combine:

  • Data tables:
    • from the same project (Simple data process)
    • from different projects (Asynchronous data process)
  • A data table with the result of a data process

Preparation

Combining data in Protobi involves creating a new data process and, if needed, adding data tables in the Data tab of the project settings. If you need a refresher, review the tutorials on those topics first.

You will also need the names of the data tables you want to combine (e.g. "main", "wave1").


For an asynchronous data process, you will also need each project's dataset ID. The dataset ID is part of the project's URL.
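If you collect IDs from copied URLs in a script rather than by eye, a sketch like the following could pull the ID out. It assumes, based only on the IDs used in this tutorial's examples, that a dataset ID is a 24-character hexadecimal string appearing as its own path segment. The helper name datasetIdFromUrl and the example URL are made up; check the pattern against your own project URLs.

```javascript
// Hypothetical helper: assumes dataset IDs are 24 lowercase hex characters,
// like the IDs in this tutorial's examples, and form a whole path segment.
function datasetIdFromUrl(url) {
  const segments = new URL(url).pathname.split("/");
  const id = segments.find((s) => /^[0-9a-f]{24}$/.test(s));
  return id === undefined ? null : id;
}

// Made-up URL for illustration only.
console.log(datasetIdFromUrl("https://example.com/v3/projects/5dc40f3972a9af00046aceb3/settings"));
// 5dc40f3972a9af00046aceb3
```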




Combine data tables

There are two ways to combine multiple data tables in Protobi: a simple data process or an asynchronous data process. Example code for both is included below.

As always for data processes, "Save" and "Run" the process after you are done editing the code. To use the result of the process as the primary data for the project, you will need to set it as "Primary".

Simple vs Asynchronous 

Static data processes come in two forms: "Simple" and "Asynchronous". Simple is for when you want to operate only on data tables (blue icons) in the same project and return a single modified table.

Asynchronous is for when you need more sophisticated processing, such as merging external data, operating on the results of other processes, or using data from other projects.


Simple

If the data tables you want to combine are from the same project use a simple data process.

The code first declares a variable for each data table to be combined (e.g. var W1) and assigns it by referring to the data table's name (e.g. data["wave1"]).

These variables are then passed as an array to the Protobi.stack_rows function.

Remember to include the return rows statement so that the process returns the stacked result.

var W1 = data["wave1"];
var W2 = data["wave2"];
var W3 = data["wave3"];
var W4 = data["wave4"];

// Append the rows of all four waves into a single table
var rows = Protobi.stack_rows([W1, W2, W3, W4]);

return rows;

You can use similar code to append data fielded in different countries for the same survey.
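When appending data from several countries (or waves), it can help to record where each row came from, especially if respondent IDs repeat across tables. The plain-JavaScript sketch below shows one way to add such a label while stacking. It assumes each table is an array of row objects; the helper tagAndStack and the column name "country" are hypothetical, not Protobi features. In a Protobi process you could apply the same tagging to each data[...] table before passing the tables to Protobi.stack_rows, assuming those tables are arrays of row objects.

```javascript
// Hypothetical helper, for illustration only (not part of Protobi).
// Assumes each table is an array of row objects keyed by column name.
function tagAndStack(tablesByLabel, labelColumn) {
  const stacked = [];
  for (const [label, tableRows] of Object.entries(tablesByLabel)) {
    for (const row of tableRows) {
      // Copy each row so the input tables are not modified,
      // then record which table it came from.
      stacked.push(Object.assign({}, row, { [labelColumn]: label }));
    }
  }
  return stacked;
}

const rows = tagAndStack(
  {
    US: [{ id: 1, q1: 5 }, { id: 2, q1: 3 }],
    DE: [{ id: 1, q1: 4 }],
  },
  "country"
);

console.log(rows.length); // 3
console.log(rows[2].country); // DE
```

Respondent id 1 appears in both countries here; the label column keeps those rows distinguishable after stacking.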

Asynchronous

You can combine data tables that are from different projects by using an asynchronous data process. Asynchronous data processes allow you to draw data from multiple independent sources, call a wider array of functions, and take precise control over how the data is returned.

The main difference is that the process code itself must load the input data it needs and must explicitly call the callback function when it is done.
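The callback style is easier to follow with a stripped-down version of the pattern. The sketch below is a simplified re-implementation of what async.mapSeries does, not the async library's own source: it runs one asynchronous task at a time, collects the results in input order, and calls a final callback once with (err, results). The loader fakeGetTable is a hypothetical stand-in for a call such as Protobi.get_table, and the final function plays the role that callback plays in a Protobi process.

```javascript
// Simplified illustration of the mapSeries pattern (not the async library itself).
function mapSeries(items, iteratee, finalCallback) {
  const results = [];
  function next(i) {
    if (i === items.length) return finalCallback(null, results);
    iteratee(items[i], function (err, result) {
      if (err) return finalCallback(err); // stop at the first error
      results[i] = result; // keep results in the same order as items
      next(i + 1);
    });
  }
  next(0);
}

// Hypothetical stand-in for a table loader; it answers asynchronously.
function fakeGetTable(datasetId, table, cb) {
  setTimeout(function () {
    cb(null, [{ source: datasetId + "/" + table }]);
  }, 0);
}

const sources = [
  { datasetId: "A", table: "main" },
  { datasetId: "B", table: "main" },
];

mapSeries(
  sources,
  function (entry, cb) {
    fakeGetTable(entry.datasetId, entry.table, cb);
  },
  function done(err, tables) {
    if (err) return console.error(err);
    console.log(tables.length); // 2
    console.log(tables[1][0].source); // B/main
  }
);
```

Because the final function only runs after every load has finished, any code that needs all the tables (such as stacking them) belongs inside it.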

In the code, "var sources" is an array of objects, one for each data table you want to load. In each object, set "datasetId" to the ID found in the project's URL, set "key" to a name of your choosing that identifies the project, and set "table" to the name of the data table to use from that project. To use more than one table from the same project, add a separate entry for each table.


var sources = [
    {datasetId: "5dc40f3972a9af00046aceb3", key: "project1", table: "main"},
    {datasetId: "5dc40f3972a9af00046aceb3", key: "project1", table: "main2"},
    {datasetId: "5db276c8445a1c000477aa07", key: "project2", table: "main"}
]

async.mapSeries(
    sources,
    // Load each table in turn; cb(err, table) passes the result back to async
    function(entry, cb) {
        Protobi.get_table(entry.datasetId, entry.table, cb)
    },
    // Runs once all tables have loaded, or as soon as one fails
    function(err, tables) {
        if (err) {
            console.log("Error loading tables:", err)
            return callback(err)
        }
        // tables is an array of results in the same order as sources
        var project1Main = tables[0]
        var project1Main2 = tables[1]
        var project2Main = tables[2]

        // Append the rows of all three tables into one table
        var rows = Protobi.stack_rows([project1Main, project1Main2, project2Main])

        $.notify("Process complete", "success")
        console.log("rows", rows)
        return callback(null, rows)
    }
)


Data table + result of a data process

Using an asynchronous data process, you can also combine data tables with the results of other data processes. In the example below, the code combines the "main" data table from one project with the result of a data process from another project, referenced by its name ("process") in the table field.

var sources = [
    {datasetId: "5dc40f3972a9af00046aceb3", key:"project1", table:"main"},
    {datasetId: "5db276c8445a1c000477aa07", key:"project2", table:"process"}
    ]
   
async.mapSeries(
    sources,
    function(entry, cb) {
        Protobi.get_table(entry.datasetId, entry.table, cb)
    },
    function(err, tables) {
        if (err) return callback(err)
        else {
            // tables is an array of results in the same order as sources
            var project1 = tables[0]   // the "main" data table
            var project2 = tables[1]   // the result of "process"

            console.log(tables)

            // Append both tables into one
            var rows = Protobi.stack_rows(tables)

            $.notify("Process complete", "success")
            return callback(null,rows)
        }
    }
)