Join additional data to an existing project

Updated at October 6th, 2020

In Combine data using a simple data process, we reviewed a function that combines datasets by appending the rows of data vertically. This works when the same data column names are used for the same questions (e.g. different waves of the same study).

However, you might need to merge in new data with the same respondents as the existing data. For example, you might have new data fields to add to a project, or you need to replace existing data (e.g. appending specialties to an existing data file, or to bring in coded values of open-ends). This involves appending the new data columns horizontally. 

In these cases, a left join is used to merge the datasets. This function keeps data in the left table, and only brings in the data on the right table if there is a matched key.

Data processing code

Left join two datasets

In the example below is a data process that merges coded open-ends into an existing Protobi project. 

The process code joins two files only - the "main" dataset (left table) and the new "OE" dataset with the coded open-ends (right table). The keys are used to match the new data with the rows of data in the left table.

var rows= data["main"] //primary data file
var OE = data["OE"]    //auxiliary data file 

// Keys are merge keys. 
// Keeps are all the fields to add to the main data file or replace existing data.

var keys = ["responseid", "respid"]
var keep = ['q19_OE','q27_OE']

rows = Protobi.left_join(
       rows, //left table (project's existing data)
       OE,   //right table (data to be joined into project)
       keys, //join keys 
       keep  //the variables from the right table to join

Match identical questions that have non-identical data column names

Instead of an array, keys and/or keep can be an object of key pairs. This is useful if the two files do not have identical column names for identical questions, such as capitalization changes or name differences.

The first name in a pair is the name of the column in the old file, and the second is the name of the column in the new file. 

For example:

var keys = {"responseid":"ResponseID", "respid":"RespID"}
var keep = {"q19_OE" : "q19_OE_coded", "q27_OE" : "q27_OE_coded"}

Left join more than two datasets

The code below works for instances where there are multiple new files you want to merge into a project. For each file you will need a separate "Protobi.left.join" clause where the column from the particular file is referenced.

Rename data columns

Say you have an open end you want to bring in, "Q8", however the project already contains a column Q8 that pertains to a different question. You can use a ".forEach" function that names a new "row.Q8_oe" and assigns it the correct value from the new file. 

var rows = data["main"]
var Q4 = data["Coded_Q4.csv"]
var Q6 = data["Coded_Q6.csv"]
var Q8 = data["Coded_Q8.csv"]
var Q10= data["Coded_Q10.csv"]

Q8.forEach(function(row) {row.Q8_oe = row.Q8})
Q10.forEach(function(row) {row.Q10_oe = row.Q10})

var keys = ["respid"]
var keep = ['Q4','Q6','Q8','Q10']

rows = Protobi.left_join(rows, Q4,  keys, ["Q4"]);
rows = Protobi.left_join(rows, Q6,  keys, ["Q6"]);
rows = Protobi.left_join(rows, Q8,  keys, ["Q8"]);
rows = Protobi.left_join(rows, Q10, keys, ["Q10"]);

window.rows = rows;
return rows;

Reminder: For data processes, "Save" and "Run" the process after you are done editing the code view. To use the result of the process as the primary data for the project, you will need to set it as "Primary".
Data processes are specific to each project, and your code may not look identical to our example.

Was this article helpful?