Sankey flow diagrams

Sankey diagrams can be useful for showing the relationship between discrete variables. Applications include progression of therapy, customer conversion, and almost any crosstab. Here's how to create and customize them in Protobi.

Sankey flow diagram showing the relationship between Q1 (happiness) and Q2 (health) with colored flows connecting response categories. Left side shows happiness levels (Very happy, Pretty happy, Not too happy, Don't know/Refused) and right side shows health ratings (Excellent, Good, Only fair, Poor, Don't know/Refused). Gray and white flowing bands connect the categories showing how responses correlate, with varying widths indicating the proportion of respondents.

The above chart shows the relationship between two discrete questions, Q1 (happiness) and Q2 (health).

Flow diagrams from crosstabs

Below are these two questions shown as marginal distributions:

Two side-by-side bar charts showing marginal distributions. Left chart (q1) shows happiness levels with responses: Very happy 30.5%, Pretty happy 50.9%, Not too happy 15.4%, Don't know/Refused 3.2% (N=2511). Right chart (q2) shows health ratings with responses: Excellent 25.4%, Good 51.7%, Only fair 17.8%, Poor 4.9%, Don't know/Refused 0.3% (N=2511). Both display horizontal blue bars with percentages and frequency counts.

We can create a crosstab by dragging one element onto another (see brief video): 

Crosstab table showing Q2 (health) by Q1 (happiness) with blue-shaded cells. Rows show health ratings (Excellent, Good, Only fair, Poor, Don't know/Refused) and columns show happiness levels (Overall, Very happy, Pretty happy, Not too happy, Don't know/Refused). Each cell contains percentage values with stronger colors indicating higher percentages. Chi Square statistic of 416.9 (p = 0.000) shown at bottom, indicating significant relationship between variables.

Press the circle edit icon and from the context menu choose "Chart type..." and select "Sankey": 

Chart type selection dialog with blue header showing a grid of chart type icons. The Sankey chart type (showing flow diagram icon) is selected with green highlight in the third row. Other visible options include core, paired, tornado, column, bar, line, pie, scatter, cumul, venn, cloud, pmap, tabs, and data. Below the chart types are Chart options (stack, swap, legend, overall, reverse-x, reverse-y) and Legend position options (top, right, bottom). Cancel and Ok buttons at bottom.

The graph should now look like the one at the top of this page. Colors are assigned by default, but you can override them by specifying your own (see "Customize colors" below).

Flow diagrams from groups

You may have a sequence of variables, like first-, second- and third-line therapy. You can create a Sankey flow diagram directly from any collection of discrete variables. The order of columns in the diagram is the order of elements in the group.

Here we created a special group of Q1 (happiness), Q2 (health) and Q3 (wealth):

Three survey questions displayed in a grid layout under parent group Q1-Q3. Top section shows cq1 (happiness) with four response levels: Very happy 30.5%, Pretty happy 50.9%, Not too happy 15.4%, Don't know/Refused 3.2%. Top right shows cq2 (health): Excellent 25.4%, Good 51.7%, Only fair 17.8%, Poor 4.9%, Don't know/Refused 0.3%. Bottom shows cq3 (financial situation): Live comfortably 36.8%, Meet your basic expenses 29.2%, Just meet your basic expenses 22.6%, Don't even have enough 9.2%, [VOL. DO NOT READ] 2.1%. All questions display blue horizontal bars with N=2511.

Setting the chart type to Sankey creates a diagram as follows: 

Three-column Sankey flow diagram with black rectangular boxes and gray/white flowing bands. Left column (cq1) shows happiness levels (Very happy, Pretty happy, Not too happy, Don't know/Refused), middle column (cq2) shows health ratings (Excellent, Good, Only fair, Poor, Don't know/Refused), right column (cq3) shows financial situations (Live comfortably, Meet your basic expenses, Just meet your basic expe, Don't even have enough t, [VOL. DO NOT READ] Do). Flow bands connect categories across all three variables showing multi-way relationships.

Customize colors

If the overall chart has a format specified, colors will be assigned by default. Otherwise, colors will appear all black initially:

JSON editor dialog titled 'Edit element properties' showing configuration for a Sankey diagram. The JSON includes key properties like "key": "sankey", "children": ["cq1", "cq2", "cq3"], "chartType": "SankeyDiagram", and a "colors" object (line 66) mapping numeric values to color codes: "1": "forestgreen", "2": "#9b6", "3": "#fa0", "4": "#39C", "5": "#26A". Also shows "titles" mapping question keys to labels. Cancel and Ok buttons at bottom.

You can set specific colors from the context menu for the chart element. Choose "Edit JSON..." and create a "colors" attribute. This should be an object that maps raw (unformatted) values to HTML colors, like below:

Three-column Sankey flow diagram now displaying with color-coded boxes showing happiness, health, and financial situation relationships. Left column shows happiness levels in varying shades of green and orange. Middle column shows health ratings ranging from dark green (Excellent) through light green (Good) to orange (Only fair) and blue (Poor). Right column shows financial situations in similar color coding. Gray and white flow bands connect the colored boxes, making relationships easier to trace visually.
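
For reference, the relevant part of the element JSON might look like the sketch below. It is reconstructed from the editor screenshot above, so treat it as an illustration rather than a complete element definition: the "titles" attribute and any other properties in the real element are omitted, and your own element's "key" and "children" will differ. Note that the keys under "colors" are the raw response values (1 to 5), not the formatted labels such as "Very happy".

```json
{
  "key": "sankey",
  "children": ["cq1", "cq2", "cq3"],
  "chartType": "SankeyDiagram",
  "colors": {
    "1": "forestgreen",
    "2": "#9b6",
    "3": "#fa0",
    "4": "#39C",
    "5": "#26A"
  }
}
```

Because the mapping is by value, the same color applies to a given value in every column of the diagram, which is why value 1 appears in dark green for happiness, health and financial situation alike.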

Customize positions

The boxes are positioned by default to minimize line crossings. You can customize box positions by dragging them up or down.

The positions you set by dragging are saved; the rest remain free. Because the graph is redrawn on every press, it's often best to position everything if you position anything, although sometimes it's sufficient to move special categories like "Other" down to the bottom and leave the others free to move.

Customize size

To change the size or margin of the graph, select the chart element by pressing on the header. Blue resize handles and red margin handles will appear. You can drag these with the mouse or set them explicitly under "Edit JSON..." by setting the "size" and "margin" attributes under "chartOptions":

Three-column Sankey diagram with visible selection handles, displayed with light blue background. The diagram shows the same happiness-health-wealth relationships with colored boxes, but now includes a dashed border with red corner handles (for margins) and blue handles (for resizing) around the chart area. The header 'Q1-Q3' appears in a light blue badge at top left. The overall chart has more spacing and appears larger than previous versions.