
I am using the tabular function from the tables package to build a summary table of the "df" data frame. I need output that can be processed into a LaTeX table with multirow cells.

Here is the dummy "df" data frame:

age <- c(62, 60, 63, 59, 63, 67, 71, 64, 65, 66, 68, 66, 71, 67, 68, 68, 56, 62, 60, 61, 63, 64, 63, 59)
L3 <- factor(rep(paste(LETTERS[7:16], LETTERS[7:16], LETTERS[7:16], sep=""), c(1,3,2,1,3,5,1,1,6,1)))
L2 <- factor(rep(paste(LETTERS[3:6], LETTERS[3:6], sep=""), c(4,6,6,8)))
L1 <- factor(rep(LETTERS[1:2], c(10,14)))
df <- data.frame(Top=L1, Mid=L2, Low=L3, Age=age, stringsAsFactors=FALSE)

I use the following R command to produce the summary table:

library(tables)
tabular( (Top*Mid*Low*DropEmpty()) ~ (CL3=1) + (Age)*(Format(digits=2, latex=TRUE)*mean + Format(digits=3, latex=TRUE)*sd), data=df)

This command generates the following summary table:

                 Age      
 Top Mid Low CL3 mean sd  
 A   CC  GGG 1   62     NA
         HHH 3   61   2.08
     DD  III 2   65   2.83
         JJJ 1   71     NA
         KKK 3   65   1.00
 B   EE  LLL 5   68   1.87
         MMM 1   68     NA
     FF  NNN 1   56     NA
         OOO 6   62   1.47
         PPP 1   59     NA

Caption: CL3 is just a short name meaning "Count at Level 3" (i.e., the count for each Top-Mid-Low combination).

But I would like to get this, with two extra count columns, CL1 and CL2, giving the counts at the "Top" and "Mid" levels respectively:

                         Age      
 Top CL1 Mid CL2 Low CL3 mean sd  
 A   10  CC  4   GGG 1   62     NA
                 HHH 3   61   2.08
         DD  6   III 2   65   2.83
                 JJJ 1   71     NA
                 KKK 3   65   1.00
 B   14  EE  6   LLL 5   68   1.87
                 MMM 1   68     NA
         FF  8   NNN 1   56     NA
                 OOO 6   62   1.47
                 PPP 1   59     NA

Caption: CL1, CL2, and CL3 are just short names meaning "Count at Level x", where x stands for the Top, Top-Mid, and Top-Mid-Low combinations respectively.

So, could you help me figure out how to get this with the tabular function of the tables package? I need to use this function, or at least another one that can output LaTeX code with multirow cells (e.g., xtable, or bytable from the taRifx package), because I want to export a picture (.EMF, .SVG, or .JPG) of this table with vertically centered multirow cells.


2 Answers

1
votes

tabular doesn't seem to support nested formulas: trying ((Top~(CL1=1))*(Mid~(CL2=1))*Low*DropEmpty()) throws an error about nested formulas. So one idea is to compute the group counts before calling tabular and store them in the data frame as factors, so that they can be crossed with the other row factors.

Something like:

df$CL1 <- factor(ave(as.character(df$Top), as.character(df$Top), FUN = length))
df$CL2 <- factor(ave(as.character(df$Mid), as.character(df$Mid), FUN = length))

tabular( (Top*CL1*Mid*CL2*Low*DropEmpty()) ~ (CL3=1) + (Age)*(Format(digits=2, latex=TRUE)*mean + Format(digits=3, latex=TRUE)*sd), data=df)

 #Top CL1 Mid CL2 Low CL3 mean sd  
 #A   10  CC  4   GGG 1   62     NA
 #                HHH 3   61   2.08
 #        DD  6   III 2   65   2.83
 #                JJJ 1   71     NA
 #                KKK 3   65   1.00
 #B   14  EE  6   LLL 5   68   1.87
 #                MMM 1   68     NA
 #        FF  8   NNN 1   56     NA
 #                OOO 6   62   1.47
 #                PPP 1   59     NA
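
A variant of the counting step, as a minimal sketch in base R only: it counts rows per group with ave() on a row index instead of on the labels, and it counts CL2 within each Top-Mid pair rather than within Mid alone. With this dummy data both approaches give the same numbers, because every Mid label belongs to a single Top level; the nested version is only an assumption about what you would want if a Mid label could repeat under several Top levels. The names n_top and n_mid are made up for illustration.

```r
# Rebuild the grouping columns of the dummy data from the question.
L3 <- factor(rep(paste0(LETTERS[7:16], LETTERS[7:16], LETTERS[7:16]),
                 c(1, 3, 2, 1, 3, 5, 1, 1, 6, 1)))
L2 <- factor(rep(paste0(LETTERS[3:6], LETTERS[3:6]), c(4, 6, 6, 8)))
L1 <- factor(rep(LETTERS[1:2], c(10, 14)))
df <- data.frame(Top = L1, Mid = L2, Low = L3)

# Number of rows in each Top group, repeated on every row of that group.
n_top <- ave(seq_len(nrow(df)), df$Top, FUN = length)

# Number of rows in each Top-Mid pair; drop = TRUE discards pairs that never occur.
n_mid <- ave(seq_len(nrow(df)),
             interaction(df$Top, df$Mid, drop = TRUE),
             FUN = length)

# tabular() crosses factors, so store the counts as factors.
df$CL1 <- factor(n_top)
df$CL2 <- factor(n_mid)

# One row per Top/Mid pair, to compare against the desired table.
unique(df[, c("Top", "CL1", "Mid", "CL2")])
```

The tabular call above can then be used unchanged, since it only needs CL1 and CL2 to exist as factor columns.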
1
votes

Here's one using dplyr and (my) huxtable package:

library(huxtable)
library(dplyr)

# prepare summaries:
df_sum <- df %>% 
      group_by(Top) %>% mutate(CL1 = n()) %>% 
      group_by(Mid, .add = TRUE) %>% mutate(CL2 = n()) %>%   # `.add` replaces the deprecated `add` (dplyr >= 1.0.0)
      group_by(Low, .add = TRUE) %>% mutate(CL3 = n()) %>% 
      summarize(
        mean = mean(Age), 
        sd   = sd(Age), 
        CL1  = CL1[1], 
        CL2  = CL2[1], 
        CL3  = CL3[1]
      ) %>% 
      select(Top, CL1, Mid, CL2, Low, CL3, mean, sd)

# format for LaTeX/HTML output:
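# add_colnames = FALSE keeps row 1 as the first data row, which the row indices below assume
hux_sum <- as_hux(ungroup(df_sum), add_colnames = FALSE)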
rowspan(hux_sum)[c(1,6), 1:2] <- 5
rowspan(hux_sum)[c(1,3,6,8), 3:4] <- c(2,3,2,3)
number_format(hux_sum)[, 1:6] <- 0
hux_sum
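
Since the question asks for LaTeX with vertically centered multirow cells, here is a minimal sketch of that last step. It uses a toy four-row table rather than the summary above, so that it runs on its own; the object names ht and tex are made up for illustration. As far as I know, huxtable's LaTeX output relies on the multirow package for row spans, so the document preamble needs the packages listed by report_latex_dependencies().

```r
library(huxtable)

# Toy table (not the question's data): two Top groups with two rows each.
ht <- as_hux(
  data.frame(Top = c("A", "A", "B", "B"),
             Low = c("GGG", "HHH", "LLL", "MMM")),
  add_colnames = FALSE
)

rowspan(ht)[c(1, 3), 1] <- 2   # merge each Top label over its two rows
valign(ht)[, 1] <- "middle"    # vertically center the merged cells

tex <- to_latex(ht)            # LaTeX source as a single character string
cat(tex)
```

The same two property setters, rowspan() and valign(), should apply to hux_sum in the same way; turning the compiled table into an image is a separate step outside huxtable.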