
I have a data frame of pairwise comparisons in which each value is the similarity between two objects. I compared a real object against several random ones, which gave very small similarities. I also compared random objects against each other, which gave higher similarities. Now I want to put everything together and plot it as a heatmap. The problem is that the very small similarities, which I want to highlight, get the same colour as the not-so-small values from the random-vs-random comparisons. This is obviously a matter of scale, but I don't know how to control the colour scale. The following code generates a heatmap that shows the issue: the first column is yellowish, which is fine, but it is the same colour as other tiles that have much higher, non-comparable values. How can I colour the tiles according to the actual scale?

The code:

library(ggplot2)

set.seed(131)

#number of comparisons in the original data: 1 value versus n=10
n <- 10
#generate real data (very small values)
fakeRealData <- runif(n, min=1e-14, max=2e-14)

#and create the data structure
realD <- cbind.data.frame(rowS=rep("fakeRealData", n), colS=paste("rnd", seq(1, n, by=1), sep=" "), Similarity=fakeRealData, stringsAsFactors=FALSE)

#the same for random data: n=10 random comparisons make an n by n matrix
rndN <- n*n
randomData <- data.frame(matrix(runif(rndN), nrow=n, ncol=n))

#row labels: "rnd 1" repeated n times, then "rnd 2", and so on, one block per
#column of randomData (unlist() below also goes column by column)
rowS <- rep(paste("rnd", seq(1, n, by=1), sep=" "), each=n)

#and create the random data structure
randomPVs <- cbind.data.frame(rowS=rowS, colS=rep(paste("rnd", seq(1, n, by=1), sep=" "), n), Similarity=unlist(randomData), stringsAsFactors=FALSE)

#then put everything together
everything <- rbind.data.frame(randomPVs, realD)

#and finally plot the heatmap
heaT <- ggplot(everything, aes(rowS, colS, fill=Similarity)) + 
    geom_tile() +
    scale_fill_distiller(palette = "YlGn", direction=2) +
    theme_bw() + 
    theme(axis.text.x = element_text(angle = 45, hjust = 1))+
    xlab("")+
    ylab("")

plot(heaT)
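To see why the first column blends in, it may help to look at where each value lands on the colour bar. A continuous fill scale like this one rescales the data linearly from its minimum to its maximum before picking a colour (ggplot2 uses `scales::rescale()` for this step by default), i.e. roughly (x - min) / (max - min). The sketch below applies that formula by hand to a few illustrative values of the same magnitude as in the question; the helper `rescale01` and the numbers are made up for the example.

```r
# Hypothetical helper: linear rescaling to [0, 1], the same idea as the
# default rescaling step of a continuous colour scale
rescale01 <- function(x) (x - min(x)) / (max(x) - min(x))

# Illustrative values: two "real" similarities and three random ones
sims <- c(real1 = 1e-14, real2 = 2e-14,
          rnd_low = 0.01, rnd_mid = 0.5, rnd_high = 0.99)

pos <- rescale01(sims)
print(round(pos, 3))
```

Both real values end up at (essentially) 0 and a random value of 0.01 at about 0.01, so on a linear scale they are only about 1% of the colour range apart and get nearly the same shade.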
Does a transformation help? Try adding trans = scales::log10_trans() to your scale_fill_distiller. PS: you (probably) never need to call rbind.data.frame directly; rbind will work out what to do. - Richard Telford
I tried transforming the data before plotting, but the results were not good. Apparently the function you suggested does the trick, even though the best solution is to print the numbers in the matrix, as in the answer. - gabt

1 Answer


Here are three approaches:

Add geom_text to your plot to show the values when color differences are small.

heaT <- ggplot(everything, aes(rowS, colS)) + 
  geom_tile(aes(fill=Similarity)) +
  scale_fill_distiller(palette = "YlGn", direction=2) +
  geom_text(aes(label = round(Similarity, 2))) +
  theme_bw() + 
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  xlab("") +
  ylab("")

[Figure: heat map with text labels]
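One caveat with round(Similarity, 2): the real similarities (around 1e-14) all round to 0, so their labels read 0. If you want the labels themselves to show the tiny values, signif() keeps a fixed number of significant digits instead of decimal places. A minimal base-R sketch with made-up example values:

```r
x <- c(1.37e-14, 0.0042, 0.5318)

# Two decimal places: both small values collapse to 0
print(round(x, 2))

# Two significant digits: small values stay distinguishable
print(signif(x, 2))
```

In the plot this would be geom_text(aes(label = signif(Similarity, 2))); the labels for the real data should then read something like 1.4e-14 rather than 0.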

Use the values argument of scale_fill_distiller to make the colour scale nonlinear. I added an extra break point at 0.01 to the otherwise linear scale to accentuate the difference between 0 and small nonzero numbers, and left the rest of the scale linear.

heaT <- ggplot(everything, aes(rowS, colS)) + 
  geom_tile(aes(fill=Similarity)) +
  scale_fill_distiller(palette = "YlGn", direction=2, 
                       values = c(0, 0.01, seq(0.05, 1, 0.05))) +
  theme_bw() + 
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  xlab("") +
  ylab("")

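As far as I understand the values argument (it is passed on to scales::gradient_n_pal()), it does not move the colours directly; it bends the mapping from the rescaled data (0 to 1) onto positions along the colour ramp. The sketch below assumes the scales package is installed (ggplot2 depends on it) and that the distiller scale interpolates seven brewer colours; it builds the YlGn ramp with and without the custom values and compares the colours assigned to a few rescaled positions.

```r
library(scales)

# Seven YlGn colours, light to dark (assumed to match what the distiller scale uses)
cols <- brewer_pal(palette = "YlGn")(7)

# Linear ramp: colours evenly spaced over [0, 1]
linear <- gradient_n_pal(cols)

# Same ramp with the answer's breakpoints: the interval [0, 0.01] is stretched
stretched <- gradient_n_pal(cols, values = c(0, 0.01, seq(0.05, 1, 0.05)))

probe <- c(0, 0.01, 0.5, 1)
print(data.frame(position = probe,
                 linear = linear(probe),
                 stretched = stretched(probe)))
```

If I read the scales source correctly, the 22 breakpoints are spread evenly along the ramp, so a rescaled value of 0.01 is pushed to about 1/21 (roughly 0.048), giving the lowest 1% of the data about five times more of the colour range; the endpoints 0 and 1 keep their colours.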

Transform the scale, as Richard suggested in the comments. Note that this changes the values shown in the legend, so either rename the legend or hide it.

heaT <- ggplot(everything, aes(rowS, colS)) + 
  geom_tile(aes(fill=Similarity)) +
  scale_fill_distiller(palette = "YlGn", direction=2, trans = "log10", 
                       name = "log10(Similarity)") +
  theme_bw() + 
  theme(axis.text.x = element_text(angle = 45, hjust = 1))+
  xlab("")+
  ylab("")

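On an untransformed scale, a random similarity of 0.01 sits only about 1% of the way along the colour bar from values around 1e-14. The log transform helps because the rescaling to [0, 1] is then done on log10(Similarity), i.e. on the exponents. A minimal base-R sketch with illustrative values (not the question's data):

```r
sims <- c(real = 1.5e-14, rnd_low = 0.01, rnd_mid = 0.5, rnd_high = 0.99)

# Rescale to [0, 1] after taking log10, as a log-transformed scale does
logs <- log10(sims)
log_pos <- (logs - min(logs)) / (max(logs) - min(logs))
print(round(log_pos, 3))
```

Here the real value stays at 0 while even the smallest random value moves to about 0.86, so the two groups end up at opposite ends of the palette. Keep in mind that a log scale needs strictly positive data: a similarity of exactly 0 has no finite log10.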

Try combinations of these approaches and see which you like best.