Word trees with googleVis 0.6.4

It’s been a while since the last googleVis update. The Google Chart Tools are fairly settled now, but some time ago Google added Word Trees:

A word tree depicts multiple parallel sequences of words. It could be used to show which words most often follow or precede a target word (e.g., “Cats are…”) or to show a hierarchy of terms (e.g., a decision tree). Google word trees are able to process large amounts of text quickly. Modern systems should be able to handle novel-sized amounts of text without significant delay.

Ashley Baldry contributed the gvisWordTree function to googleVis 0.6.4, which allows us to create word trees from R.

Examples

Let’s take a look at the Cats data in googleVis:

library(googleVis)
Cats
##                           Phrase Size Sentiment
## 1      cats are better than dogs    1         8
## 2                cats eat kibble    1         5
## 3  cats are better than hamsters    1         6
## 4               cats are awesome    1        10
## 5            cats are people too    1         9
## 6                  cats eat mice    1         7
## 7                   cats meowing    1         3
## 8             cats in the cradle    1         5
## 9                  cats eat mice    1         7
## 10     cats in the cradle lyrics    1         5
## 11               cats eat kibble    1         5
## 12             cats for adoption    1         5
## 13               cats are family    1         8
## 14                 cats eat mice    1         7
## 15  cats are better than kittens    1         5
## 16                 cats are evil    1         0
## 17                cats are weird    1         2
## 18                 cats eat mice    1         7

Default Word Tree

To visualise the phrases in the Cats data and analyse the order of words in those phrases, we can simply call gvisWordTree on the data and specify the name of the column containing the phrases.

# set googleVis plot option to display chart in RMarkdown 
op <- options(gvis.plot.tag='chart')
# create word tree chart
wt1 <- gvisWordTree(Cats, textvar = "Phrase")
plot(wt1)

Hover over the words to see information about their frequency. Click on any word in the chart to make it the root of the word tree.
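To get a feel for the frequencies the chart reports, the first branch of the tree can be approximated in base R. This is only a rough sketch, not how Google's chart computes its layout: the phrases vector retypes the Phrase column from the listing above so that the snippet runs without googleVis, and the names phrases, second_word and counts are made up for this illustration.

```r
# Phrases retyped from the Cats listing above (assumption: identical to Cats$Phrase)
phrases <- c("cats are better than dogs", "cats eat kibble",
             "cats are better than hamsters", "cats are awesome",
             "cats are people too", "cats eat mice", "cats meowing",
             "cats in the cradle", "cats eat mice",
             "cats in the cradle lyrics", "cats eat kibble",
             "cats for adoption", "cats are family", "cats eat mice",
             "cats are better than kittens", "cats are evil",
             "cats are weird", "cats eat mice")

# Split each phrase into words and keep the word that follows "cats"
words <- strsplit(phrases, " ", fixed = TRUE)
second_word <- vapply(words, function(w) w[2], character(1))

# Count how often each word follows the root, most frequent first
counts <- sort(table(second_word), decreasing = TRUE)
print(counts)
```

With these 18 phrases, "are" follows "cats" 8 times and "eat" 6 times, which is why those two branches dominate the chart.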

Styling a Word Tree

As with the other googleVis functions, we can set various options to change the root, style and look of the plot.

Here is an example with ‘cats’ set as the root and a few styling options applied. For more details, see the Google documentation.

# copy the data and add a style column that colours each phrase by sentiment
Cats2 <- Cats
Cats2$Phrase.style <- ifelse(Cats2$Sentiment >= 7, "green", 
                             ifelse(Cats2$Sentiment <= 3, "red", "black"))
                             
wt2 <- gvisWordTree(Cats2, textvar = "Phrase", 
                    stylevar = "Phrase.style",
                    options = list(fontName = "Times-Roman",
                                   wordtree = "{word: 'cats'}",
                                   backgroundColor = "#cba"))
plot(wt2)
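Before plotting, it can help to check how many phrases end up with each colour. The sketch below retypes the Sentiment column from the listing above, so it runs without googleVis; the names sentiment, style and style_counts are illustrative only, and the thresholds are the same ones used for Phrase.style.

```r
# Sentiment scores retyped from the Cats listing (assumption: same order as Cats$Sentiment)
sentiment <- c(8, 5, 6, 10, 9, 7, 3, 5, 7, 5, 5, 5, 8, 7, 5, 0, 2, 7)

# Same rule as above: high scores green, low scores red, everything else black
style <- ifelse(sentiment >= 7, "green",
                ifelse(sentiment <= 3, "red", "black"))

# Tabulate the resulting colours
style_counts <- table(style)
print(style_counts)
```

For this data the rule yields 8 green, 3 red and 7 black phrases.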

Implicit and explicit Word Trees

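There are two ways to create word trees: implicitly (default) and explicitly. The choice is specified with the wordtree.format option, which in googleVis is passed as wordtree = "{format: 'explicit'}" together with method = "explicit".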

  • ‘implicit’: The word tree will take a set of phrases, in any order, and construct the tree according to the frequency of the words and sub-phrases.
  • ‘explicit’: We tell the word tree what connects to what, how big to make each sub-phrase, and what colours to use.

Example of an explicit word tree:

# Explicit word tree
exp.data <- data.frame(id = as.numeric(0:9),
                       label = letters[1:10],
                       parent = c(-1, 0, 0, 0, 2, 2, 4, 6, 1, 7),
                       size = c(10, 5, 3, 2, 2, 2, 1, 1, 5, 1),
                       stringsAsFactors = FALSE)

wt3 <- gvisWordTree(exp.data, idvar = "id", textvar = "label", 
                    parentvar = "parent", sizevar = "size",
                    options = list(wordtree = "{format: 'explicit'}"),
                    method = "explicit")
plot(wt3)
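The id and parent columns can be hard to read at a glance. As a minimal base R sketch, the parent links can be followed back to the root to print the chain of labels for each node. It assumes, as in the example above, that a parent of -1 marks the root; the helper path_to_root is hypothetical and not part of googleVis.

```r
# Same data as in the explicit example above
exp.data <- data.frame(id = as.numeric(0:9),
                       label = letters[1:10],
                       parent = c(-1, 0, 0, 0, 2, 2, 4, 6, 1, 7),
                       size = c(10, 5, 3, 2, 2, 2, 1, 1, 5, 1),
                       stringsAsFactors = FALSE)

# Hypothetical helper: walk from a node up to the root (parent == -1)
# and return the labels along the way, root first
path_to_root <- function(node_id, data) {
  labels <- character(0)
  current <- node_id
  while (current != -1) {
    row <- data[data$id == current, ]
    labels <- c(row$label, labels)
    current <- row$parent
  }
  paste(labels, collapse = " ")
}

# One path per node, in the order of exp.data$id
paths <- vapply(exp.data$id, path_to_root, character(1), data = exp.data)
print(paths)
```

For instance, the node labelled "j" (id 9) hangs off the chain a, c, e, g, h, while "i" (id 8) sits under a and b.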

For other chart types, visualisations and documentation see the googleVis vignettes on CRAN.
