12
votes

I'm using xtable to generate tables to put in Latex, and was wondering if there's a way to have conditional formatting of cells so that all significant p-values are in grey? I'm using Knitr in TexShop.

Here's an example using the diamonds data in ggplot2, and running a TukeyHSD test to predict carat from cut.

library(ggplot2)
library(xtable)
summary(data.aov <- aov(carat~cut, data = diamonds))
data.hsd<-TukeyHSD(data.aov)
data.hsd.result<-data.frame(data.hsd$cut)
data.hsd.result

I can then get data.hsd.result into xtable format with:

xtable(data.hsd.result)

In Latex, the output looks like this:

                         diff         lwr         upr        p.adj
Good-Fair         -0.19695197 -0.23342631 -0.16047764 0.000000e+00
Very Good-Fair    -0.23975525 -0.27344709 -0.20606342 0.000000e+00
Premium-Fair      -0.15418175 -0.18762721 -0.12073628 0.000000e+00
Ideal-Fair        -0.34329965 -0.37610961 -0.31048970 0.000000e+00
Very Good-Good    -0.04280328 -0.06430194 -0.02130461 5.585171e-07
Premium-Good       0.04277023  0.02165976  0.06388070 3.256208e-07
Ideal-Good        -0.14634768 -0.16643613 -0.12625923 0.000000e+00
Premium-Very Good  0.08557350  0.06974902  0.10139799 0.000000e+00
Ideal-Very Good   -0.10354440 -0.11797729 -0.08911151 0.000000e+00
Ideal-Premium     -0.18911791 -0.20296592 -0.17526989 0.000000e+00

It it possible to have any p-values < 0.05 to have a grey coloured background automatically or highlighted in some way? Obviously, for this set it would be the whole column, but I'm hoping for something that works with all my data.

2

2 Answers

18
votes

Hello try this :

\documentclass{article}
\usepackage{color}
\begin{document}

<<echo=FALSE, results='asis'>>=
df = data.frame(V1 = LETTERS[1:6], V2 = runif(6, 0, 1))
df$V3 = ifelse(df$V2 < 0.5, paste0("\\colorbox{red}{", df$V2, "}"), df$V2)
library(xtable)
print(xtable(df), sanitize.text.function = function(x) x)
@

\end{document}

EDIT

If you have multiple conditions, one solution is to use package dplyr and function case_when :

set.seed(123)
df <- data.frame(V1 = LETTERS[1:6], V2 = runif(6, 0, 1))

library("dplyr")
df %>% 
  mutate(
    V3 = case_when(
      V2 < 0.5 ~ paste0("\\colorbox{red}{", round(V2, 3), "}"),
      V2 >= 0.5 & V2 < 0.8 ~ paste0("\\colorbox{blue}{", round(V2, 3), "}"),
      TRUE ~ formatC(V2, digits = 3)
    )
  )
#   V1        V2                      V3
# 1  A 0.2875775  \\colorbox{red}{0.288}
# 2  B 0.7883051 \\colorbox{blue}{0.788}
# 3  C 0.4089769  \\colorbox{red}{0.409}
# 4  D 0.8830174                   0.883
# 5  E 0.9404673                    0.94
# 6  F 0.0455565  \\colorbox{red}{0.046}
1
votes

Victorp provide an excellent solution and it gave me such a relief from a hours long struggle. Then later the day I need to impose more than one conditions on same data set, meaning I need two different colors on cells based on different conditions, to solve this, totally based on Victorp's answer, I figured a solution and hope this would help those need this in the future.

    <<echo=FALSE, results='asis'>>=
    df = data.frame(V1 = LETTERS[1:6], V2 = runif(6, 0, 1),V3 = runif(6, 0, 1))
    ## replicate the data frame of which you are going to highlight the cells
    ## the number of duplicates should be equal to number of conditions you want to impose
    temp.1<-df
    temp.2<-df
    ## impose conditions on those temporary data frame separately.
    ## change the columns you want to 
    for (i in colnames(temp.1)[2:3]) {
    temp.1[,i]= ifelse(temp.1[,i] <= 0.5,
                                paste0("\\colorbox{red}{", temp.1[,i], "}"), temp.1[,i])}
    rm(i)


    for (i in colnames(temp.2)[2]) {
    temp.2[,i]= ifelse(temp.2[,i] > 0.5 & temp.2[,i] <=0.8,
                                paste0("\\colorbox{blue}{", temp.2[,i], "}"),temp.2[,i])}
    rm(i)
    ## then record the position of cells under you conditions
    pos.1<-which(df[,] <=0.5,arr.ind = TRUE)
    pos.2<-which(df[,] >0.5 & df[,]<=0.8,arr.ind = TRUE)
    ## replace cells in original data frame that you want to highlight
    ## replace those values in temp which satisfy the condition imposed on temp.1
    if(length(pos.1)>0) {
      temp[pos.1]<-temp.1[pos.1]
    }


    ## replace those values in temp which satisfy the condition imposed on temp.2
    if(length(pos.2)>0) {
      temp[pos.2]<-temp.2[pos.2]
    }
    rm(temp.1,temp.2,pos.1,pos.2)
    @

then you print df in the way you like. This works, however,given the power of R, I do believe there should be much more easier ways for this.