6
votes

The R package wordcloud has a very useful function called wordlayout. It takes the initial positions of words and their respective sizes and rearranges them so that they do not overlap. I would like to use the results of this function to draw a geom_text plot in ggplot. I came up with the following example but soon realized that there seems to be a big difference between cex (wordlayout) and size (geom_text), since words drawn with the graphics package appear much larger. Here is my sample code. Plot 1 is the original wordcloud plot, which has no overlaps:

library(wordcloud)
library(tm)
library(ggplot2)

samplesize <- 100
textdf <- data.frame(
  label = sample(stopwords("en"), samplesize, replace = TRUE),
  x = sample(1:1000, samplesize, replace = TRUE),
  y = sample(1:1000, samplesize, replace = TRUE),
  size = sample(1:5, samplesize, replace = TRUE)
)

#plot1
plot.new()
pdf(file="plot1.pdf")
textplot(textdf$x,textdf$y,textdf$label,textdf$size)
dev.off()
#plot2
ggplot(textdf,aes(x,y))+geom_text(aes(label = label, size = size))
ggsave("plot2.pdf")
#plot3
new_pos <- wordlayout(x=textdf$x,y=textdf$y,words=textdf$label,cex=textdf$size)
textdf$x <- new_pos[,1]
textdf$y <- new_pos[,2]
ggplot(textdf,aes(x,y))+geom_text(aes(label = label, size = size))
ggsave("plot3.pdf")
#plot4
# this is how the wordcloud package repositions the words (taken from the textplot function)
textdf$x <- new_pos[,1]+0.5*new_pos[,3]
textdf$y <- new_pos[,2]+0.5*new_pos[,4]
ggplot(textdf,aes(x,y))+geom_text(aes(label = label, size = size))
ggsave("plot4.pdf")

Is there a way to overcome this cex/size difference and reuse wordlayout for ggplot2 plots?


2 Answers

4
votes

cex stands for character expansion and is the factor by which text is magnified relative to the default. The default is specified by cin, which on my installation is set to 0.15 in by 0.2 in (width by height); see ?par for more details.

@hadley explains that ggplot2 sizes are measured in mm. Therefore cex=1 would correspond to size=3.81 (0.15 in) or size=5.08 (0.2 in), depending on whether it is being scaled by the width or the height. Of course, font selection may cause differences.
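The arithmetic behind those two numbers can be checked with a minimal sketch. It assumes the 0.15 in by 0.2 in value of cin quoted above; on another device par("cin") may report something different, and the names cin_in and cin_mm are only illustrative.

```r
# Default character size in inches (width, height), as reported above.
# This is an assumed value; query par("cin") on your own device to check it.
cin_in <- c(width = 0.15, height = 0.20)

# 1 inch = 25.4 mm, so convert to the millimetre units ggplot2 uses for size.
cin_mm <- cin_in * 25.4
cin_mm
```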

In addition, to use absolute sizes you need to put the size specification outside aes(); otherwise ggplot2 treats it as a variable to map and chooses the scale itself, e.g.:

ggplot(textdf,aes(x,y))+geom_text(aes(label = label),size = textdf$size*3.81)
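An alternative that keeps size inside aes() is to add scale_size_identity(), which asks ggplot2 to use the mapped values unchanged. The following is only a sketch: the toy data frame df and the name mm_per_cex are hypothetical, and the factor 3.81 is the width-based estimate from above, not an exact equivalence.

```r
library(ggplot2)

# Hypothetical toy data; `size` plays the role of the cex values in textdf.
df <- data.frame(
  label = c("alpha", "beta", "gamma"),
  x = c(1, 2, 3),
  y = c(1, 2, 3),
  size = c(1, 2, 3)
)

# Assumed conversion factor (0.15 in expressed in mm), taken from the text above.
mm_per_cex <- 3.81

# scale_size_identity() uses the mapped values as-is instead of
# rescaling them to ggplot2's default size range.
p <- ggplot(df, aes(x, y)) +
  geom_text(aes(label = label, size = size * mm_per_cex)) +
  scale_size_identity()
```

Printing p draws the plot; whether the resulting text matches the base graphics output will still depend on the font and device.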
4
votes

Sadly I think you're going to find the short answer is no! I think the package handles the text vector mapping differently from ggplot2, so you can tinker with size and font face/family, etc. but will struggle to replicate exactly what the package is doing.

I tried a few things:

1) Try to plot the grobs from textdf using annotation_custom

require(plyr)  
require(grid)

# FIRST TRY PLOT INDIVIDUAL TEXT GROBS
qplot(0:1000,0:1000,geom="blank") +
  alply(textdf,1,function(x){
  annotation_custom(textGrob(label=x$label,0,0,c("center","center"),gp=gpar(cex=x$size)),x$x,x$x,x$y,x$y)  
})  


2) Run the wordlayout() function which should readjust the text, but difficult to see for what font (similarly doesn't work)

# THEN USE wordlayout() TO GET CO-ORDS
plot.new()
# store the layout so it can be used below (the original call discarded the result)
w <- wordlayout(textdf$x,textdf$y,words=textdf$label,cex=textdf$size,xlim=c(min(textdf$x),max(textdf$x)),ylim=c(min(textdf$y),max(textdf$y)))
plotdata<-cbind(data.frame(rownames(w)),w)
colnames(plotdata)=c("word","x","y","w","h")

# PLOT WORDCLOUD DATA
qplot(0:1000,0:1000,geom="blank") +
  alply(plotdata,1,function(x){
    annotation_custom(textGrob(label=x$word,0,0,c("center","center"),gp=gpar(cex=x$h*40)),x$x,x$x,x$y,x$y)  
  })  


Here's a cheat if you just want to overplot other ggplot layers on top of it (although the co-ords don't seem to match up exactly between the data and the plot). It basically renders the wordcloud to an image, removes the margins, and under-plots it at the same scale:

# make a png file of just the panel
plot.new()
png(filename="bgplot.png")
par(mar=c(0.01,0.01,0.01,0.01))
textplot(textdf$x,textdf$y,textdf$label,textdf$size,xaxt="n",yaxt="n",xlab="",ylab="",asp=1)
dev.off()

# library to get PNG file
require(png)  

# then plot it behind the panel
qplot(0:1000,0:1000,geom="blank") + 
  annotation_custom(rasterGrob(readPNG("bgplot.png"),0,0,1,1,just=c("left","bottom")),0,1000,0,1000) +
  coord_fixed(1,c(0,1000),c(0,1000))
