Word clouds

Create word clouds from text verbatims

Updated at November 8th, 2022

Let's say your study asked respondents to provide text answers to an open-ended question:


The answers are too long to read in one line, and there are too many to show in one bar chart. Word clouds can be a simple and whimsical way to quickly convey the gist of the answers.

Create a word cloud

To turn the responses in an element into a word cloud, press the circle edit icon to bring up the context menu. Select "More properties...", then under "Chart type" select "Word cloud". This will create a simple cloud showing all verbatim answers:

By default, the chart represents the frequency of each answer with a font size equal to its percentage frequency. Here, that doesn't work well: each answer is unique and represents a tiny percentage of the sample, so the fonts are too small and the chart is not interesting.

Edit word case

Protobi is case sensitive, so "easy" and "Easy" are counted as two different response values. If you don't want the same word with different casing to be counted as separate responses, select "Edit properties..." and next to "string" select a case to use for all values. 
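To see why casing matters, here is a minimal Python sketch of the counting idea. It is only an illustration and not Protobi's implementation; the sample answers are made up.

```python
from collections import Counter

# Made-up sample answers, for illustration only.
answers = ["easy", "Easy", "EASY", "Portable"]

# Case-sensitive counting treats each casing as a distinct value.
case_sensitive = Counter(answers)

# Normalizing every value to one case first merges them into one value.
case_insensitive = Counter(a.lower() for a in answers)

print(case_sensitive)
print(case_insensitive)
```

With case normalization, the three spellings of "easy" count as one value with a frequency of three, so the word is drawn larger.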

Split phrases into words

Word clouds are more effective when the design clearly represents prevalence of key words. To exhibit the frequencies of individual words, rather than complete responses, select "Edit properties..." and under "split" type:

  • " " (i.e., just a space, without the quotes) or
  • "word" (i.e., just the word 'word' without the quotes)

The first splits sentences at each space, and only at spaces. The second is a little smarter: it splits strings at certain punctuation marks and word boundaries, but avoids splitting at underscores and hyphens.

This will now split strings at each space into shorter strings, and show frequencies of each word:
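The difference between the two split modes can be sketched in Python. This is an approximation under stated assumptions, not Protobi's actual splitting code: the function names are hypothetical, and the regular expression used for the "word" mode is a guess at behavior that keeps hyphens and underscores inside words.

```python
import re

def split_on_spaces(text):
    # Split only at space characters; punctuation stays attached to words.
    return [w for w in text.split(" ") if w]

def split_on_words(text):
    # Assumed approximation of the "word" mode: keep runs of letters,
    # digits, underscores and hyphens together, split at everything else.
    return re.findall(r"[\w-]+", text)

answer = "Easy to use, low-cost and portable."
print(split_on_spaces(answer))
print(split_on_words(answer))
```

In this sketch the space-only split leaves "use," and "portable." with their punctuation attached, so they would be counted separately from "use" and "portable", while the word-style split strips the punctuation and keeps "low-cost" intact.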

Exclude common words

By default, Protobi word clouds exclude the following set of words that occur frequently in English: "of,the,and,to,an,are,is,for,do,a,it,be,i,with,in,that,have,on,so".

These words are specified under the suppress property. This can be a list of words separated by spaces, commas, hyphens, or any regex word boundary. So you could just copy/paste the question text at the end to also exclude any words from the question text.
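As a rough illustration of how such a list behaves, the Python sketch below parses a suppress string and filters words against it. The function names, the sample question, and the sample words are hypothetical, and matching words case-insensitively is an assumption rather than documented Protobi behavior.

```python
import re
from collections import Counter

# The default list quoted above.
DEFAULT_SUPPRESS = "of,the,and,to,an,are,is,for,do,a,it,be,i,with,in,that,have,on,so"

def parse_suppress(spec):
    # Split on any run of non-word characters (spaces, commas, hyphens, ...).
    return {w.lower() for w in re.split(r"\W+", spec) if w}

def count_words(words, suppress):
    # Assumption: comparison ignores case.
    return Counter(w for w in words if w.lower() not in suppress)

# Hypothetical question text pasted at the end of the default list.
question = "What do you like about the device?"
suppress = parse_suppress(DEFAULT_SUPPRESS + " " + question)

words = ["I", "like", "the", "device", "easy", "to", "carry", "easy"]
counts = count_words(words, suppress)
print(counts)
```

Pasting the question text works here because the parser ignores the separators and the question mark, leaving only the words themselves in the list.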

To add more words to this list via the user interface, right-click on a word and press OK when asked to confirm:

To remove words from the list, edit the property exclude in the Additional Options dialog above.

Customize the chart

You can customize many aspects of the chart, including the maximum and minimum font sizes, cloud shape, etc. From the "Chart Type..." dialog, select "Additional options" to bring up a dialog with more options:

Resize the chart

By default, Protobi scales the word cloud to match the element's inner size. To adjust the chart size, select the element so its header is highlighted. Dashed outlines show the outer and inner sizes. Resize the chart's outer size by moving the blue resize handle. Adjust the inner size by dragging the red margin handles.

If desired, Protobi can size words exactly, so that the most frequent word has the font size specified by the option maxFontSize and other words are sized relative to that. In the Additional Options dialog, select the checkbox for scaleToMaxFontSize and specify a maxFontSize value:
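One way to picture this sizing is the Python sketch below. It assumes simple linear scaling with a floor at the minimum font size; the exact formula Protobi uses is not stated here, and the function name and sample counts are hypothetical.

```python
def font_sizes(counts, max_font_size=60, min_font_size=5):
    # Assumption: sizes scale linearly with frequency, the most frequent
    # word is drawn at max_font_size, and no word goes below min_font_size.
    if not counts:
        return {}
    top = max(counts.values())
    return {
        word: max(min_font_size, max_font_size * n / top)
        for word, n in counts.items()
    }

sizes = font_sizes({"easy": 40, "cost": 20, "portable": 2})
print(sizes)
```

Under these assumptions a word that occurs half as often as the top word is drawn at half of maxFontSize, and very rare words are clamped to the minimum size.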

Combine similar words

After values are split into words, you can combine similar words into one code using the Recode feature. 

Select "Recode..." from the context menu to bring up the Recode dialog. Search for similar words, then select and drag them to a new or existing code on the left.

Set word colors

It's possible to set up rules to color words. The simplest is to select Color... from the context menu and choose a color theme. You can add your company's primary and alternate color themes to your project, and these color schemes can define color sequences or specific colors for specific values; this is covered in a separate tutorial.


If you want to assign specific colors to specific words for one element only, you can specify this mapping in the element. Select Edit JSON... from the context menu and create an attribute colors which maps words to colors as shown below:

Words that have a mapping appear in their assigned colors, and all other words appear in grey:
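The lookup behavior described here, a word-to-color mapping with a grey fallback, can be sketched in Python. The words and color codes are hypothetical examples, and the function is an illustration of the idea rather than Protobi code.

```python
# Hypothetical mapping of words to CSS colors, mirroring a "colors" attribute.
colors = {"easy": "#2ca02c", "cost": "#d62728"}

DEFAULT_COLOR = "grey"

def color_for(word):
    # Words without an entry in the mapping fall back to grey.
    return colors.get(word, DEFAULT_COLOR)

for word in ["easy", "cost", "portable"]:
    print(word, color_for(word))
```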


Keep certain phrases together

The split feature in Edit properties is useful, but what if there are words that have to be kept together, like "Staten Island" or "COVID 19"? The challenge is that such logic has to be applied before splitting values into words. Protobi is already smart enough not to split hyphenated words like "Ocasio-Cortez", and we're working to make this feature prettier and accessible via the user interface. For now, you can keep certain phrases together by replacing spaces with hyphens or underscores.

First, select Edit JSON... from the context menu to bring up the JSON editor for the element. Then create an attribute replace, which is an object mapping expressions on the left to alternate values on the right.

In the example below, the attribute is set to replace all instances of "staten island" with "staten_island" before  words are split.  

Expressions on the left are "regular expressions" and can be super expressive. For example, the dot in "staten.island" is a wildcard that matches any one character, such as a space, a hyphen, or any other character.

Expressions on the right are the replacements. When split is set to "word", Protobi doesn't consider underscores or hyphens to be word boundaries, so "staten_island" won't be split. And Protobi word clouds display underscores as spaces, so the underscore is a good character to represent a non-breaking space.
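The replace-then-split order can be sketched in Python as follows. This is an illustration, not Protobi's code: the function names are hypothetical, the "covid.19" rule is an extra example, the case-insensitive matching is an assumption, and the word splitter is the same rough approximation of the "word" mode used for illustration only.

```python
import re

# Hypothetical rules mirroring a "replace" attribute:
# regular expression on the left, replacement on the right.
replace = {"staten.island": "staten_island", "covid.19": "covid_19"}

def apply_replacements(text, rules):
    # Apply every rule before any splitting happens.
    # Assumption: matching ignores case.
    for pattern, replacement in rules.items():
        text = re.sub(pattern, replacement, text, flags=re.IGNORECASE)
    return text

def split_on_words(text):
    # Approximate "word" split: underscores and hyphens do not break words.
    return re.findall(r"[\w-]+", text)

answer = "I moved from Staten Island during COVID-19"
words = split_on_words(apply_replacements(answer, replace))

# Show underscores as spaces when labeling words in the cloud.
labels = [w.replace("_", " ") for w in words]
print(labels)
```

Because the dot matches any single character, the same rule catches "Staten Island", "Staten-Island", and "staten_island" in this sketch.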

Limit the number of words

You can set an absolute limit on the number of words that appear by setting maxWords in the Additional Options dialog.  The default value is 60.  In this case no more than 60 words will appear.  This limit is applied after excluding common words.

You can also set a threshold for the minimum frequency for a word to appear by setting minBasis in the Additional Options dialog. The default is zero, so any word that occurs even once could be included in the word cloud (subject to the maxWords limit). If you set it to 5, then only words that occur 5 or more times will be considered for the word cloud.
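The interaction of the two limits can be sketched in Python. The function name and sample counts are hypothetical, and keeping the most frequent words when the maxWords limit applies is an assumption, since the selection order is not stated here.

```python
from collections import Counter

def select_words(counts, max_words=60, min_basis=0):
    # Keep only words that occur at least min_basis times.
    eligible = Counter({w: n for w, n in counts.items() if n >= min_basis})
    # Assumption: when the limit applies, the most frequent words are kept.
    return eligible.most_common(max_words)

# Made-up word frequencies, counted after excluding common words.
counts = Counter({"easy": 12, "cost": 7, "portable": 5, "heavy": 2, "loud": 1})

print(select_words(counts, max_words=2))
print(select_words(counts, min_basis=5))
```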

Available options

Word clouds are rendered by an engine written by Timothy Chien. The chartOptions block is passed straight to the rendering engine, allowing you to set these options:

  • minFontSize: 5 minimum font size (pixels) 
  • maxFontSize: 60 maximum font size (pixels)
  • scaleToMaxFontSize: true|false whether to scale so that the most frequent word is drawn at maxFontSize
  • limit: 60 maximum number of values to draw
  • fontFamily: font to use
  • fontWeight: font weight to use, e.g. "bold" or 600
  • color: color of the text, can be any CSS color
  • weightFactor: number to multiply size of each word
  • backgroundColor: color of the background
  • drawOutOfBound: true allows words to extend outside the box
  • shape: "ellipse" shape of the "cloud" to draw; values include:
    • "circle" (default),
    • "cardioid",
    • "diamond",
    • "square",
    • "triangle", and
    • "star".
  • ellipticity: degree of "flatness" of the word cloud.
  • shuffle: true (default) randomizes points
  • rotateRatio: probability for a word to rotate (1=always, 0=never)


Word clouds aren't just for text verbatims

Many distributions, not just text, can be drawn as word clouds. Values appear as words, and the font size of each value is proportional to its frequency. For instance, respondent state could be drawn as a word cloud rather than a map:


Video Tutorial

See Word clouds article for more information. 

Transcript

Word clouds are a fun way to show the qualitative gist of a quant distribution. Here, we'll show you how to create a word cloud from verbatim text responses.

Begin by selecting a text open-end. In this example we'll use D2_raw, which asks respondents their thoughts on cardiac monitoring devices.

Let's make the element a little bit wider. We still cannot read all the answers, but we can see enough to scan for some trends. Using the search bar we can discern a few themes like "easy", "cost", and "portable". Coding this many answers can take a while.

Let's just try splitting into words. Select the magenta circle and choose "More properties...". Under split, enter "word" and press OK. D2_raw is now split into individual words.

Select "Chart Type..." from the context menu, then select "Word Cloud". By default Protobi excludes a number of common words; we can exclude others like "I" and "am" under chartOptions. Select "Edit JSON" from the context menu, and add additional words to exclude.

And while we're at it, let's make the word cloud square. Sometimes you may wish to combine synonyms. For instance, we may want to combine "easy", "easier", and "easiest". Select "Recode..." from the context menu. We can now begin searching for similar responses for things like "easy", "cost" and "portable". Hit apply. D2_raw is now a word cloud.

See other tutorials on Protobi chart types.