Join additional data to your project

Updated at September 13th, 2022

There are two types of data merging. One is to combine different datasets by appending more rows, and the other is joining data to merge in new variables.

For example, there are new variables to add to a project, or you need to replace existing data (e.g. bring in coded values of open-ends). In this case, a left join is used to merge datasets. This function keeps data in the left table, and only brings in the data on the right table if there is a matched key.

Data process code

The example code below merges coded open-ends into an existing Protobi project using a left join. 

The left join merges two files only - the "main" dataset (left table) and the new "OE" dataset with the coded open-ends (right table). The keys are used to match the new data with the rows of data in the left table.

var rows= data["main"] //primary data file
var OE = data["OE"]    //auxiliary data file 

// Keys are merge keys. 
// Keeps are all the fields to add to the main data file or replace existing data.

var keys = ["responseid", "respid"]
var keep = ['q19_OE','q27_OE']

rows = Protobi.left_join(
       rows, //left table (project's existing data)
       OE,   //right table (data to be joined into project)
       keys, //join keys 
       keep  //the variables from the right table to join

If you want to include all the columns from the right table 

You don't have to explicitly name every column in the "keep" array if you want to bring in all columns. Instead, of reference keep in the left-join use the code below.

Object.keys(OE[0])  //All variables from the right table

If identical questions that have different keys

Instead of an array, keys and/or keep can also be an object of key pairs. This is useful if the two files do not have identical column names for identical questions, such as capitalization changes or name differences.

The first name in a pair is the name of the column in the old file, and the second is the name of the column in the new file. 


var keys = {"responseid":"ResponseID", "respid":"RespID"}
var keep = {"q19_OE" : "q19_OE_coded", "q27_OE" : "q27_OE_coded"}

If new variables use existing keys

Say there is an open end you want to bring in, "Q8", however the project already contains a column Q8 that you do not want to overwrite. Use a ".forEach" function to assign a new name (e.g. "row.Q8_oe") for the column from the new file. 


//This block of code only runs on the "OE" data file
OE.forEach(function(row) {
    row.Q8_oe = row.Q8

Left join more than two datasets

The code below works for instances where there are multiple new files to merge into a project. Each file will need a separate "Protobi.left.join" clause where the column from the particular file is referenced.

var rows = data["main"]
var Q4 = data["Coded_Q4.csv"]
var Q6 = data["Coded_Q6.csv"]
var Q8 = data["Coded_Q8.csv"]
var Q10= data["Coded_Q10.csv"]

var keys = ["respid"]
var keep = ['Q4','Q6','Q8','Q10']

rows = Protobi.left_join(rows, Q4,  keys, ["Q4"]);
rows = Protobi.left_join(rows, Q6,  keys, ["Q6"]);
rows = Protobi.left_join(rows, Q8,  keys, ["Q8"]);
rows = Protobi.left_join(rows, Q10, keys, ["Q10"]);

window.rows = rows;
return rows;

Reminder: For data processes, "Save" and "Run" the process after you are done editing the code view. To use the result of the process as the primary data for the project, you will need to set it as "Primary".
Data processes are specific to each project, and your code may not look identical to our example.

Was this article helpful?