1
votes

How can I convert the summary run on a data.frame into a data.frame itself? I need a data.frame as an output to knitr::kable in RMarkdown.

In particular, I have this data frame:

d <- data.frame(a=c(1,2,3), b=c(4,5,6))
ds <- summary(d)
class(ds)
# returns "table"

And I need ds in a data.frame format.

The output I would like would be a data.frame with "Min.", "1st Qu.", "Median", etc. as row names, "a" and "b" as column names, and the numbers in the cells.

as.data.frame doesn't work:

ds.df <- as.data.frame(ds)
print(ds.df)
# Output is messed up

The code in this related question doesn't work either:

df.df2 <- data.frame(unclass(summary(ds.df)), check.names = FALSE, stringsAsFactors = FALSE)
print(df.df2)
# Output equally messed up

broom::tidy on a table is deprecated and in any case returns an error:

df.df3 <- broom::tidy(ds)
# Returns error
# Error: Columns 1 and 2 must be named.
# Moreover
# 'tidy.table' is deprecated.

as.data.frame.matrix puts "Min." and the names of the other statistics inside each cell, instead of using them as row names:

ds.df3 <- as.data.frame.matrix(ds)
print(ds.df3)
# Returns "Min." and "1st Qu." inside the cells
# instead of them being row names
1
Well, the same answer from the post you linked to suggests do.call(cbind, lapply(d, summary)), which seems to work for your case. – Ronak Shah
@RonakShah Yes, but I would have preferred a real conversion of the object returned by summary(df). – robertspierre

1 Answer

2
votes

We could use the matrix route

out <- as.data.frame.matrix(ds)
row.names(out) <- NULL

-output

out
             a             b
1 Min.   :1.0   Min.   :4.0  
2 1st Qu.:1.5   1st Qu.:4.5  
3 Median :2.0   Median :5.0  
4 Mean   :2.0   Mean   :5.0  
5 3rd Qu.:2.5   3rd Qu.:5.5  
6 Max.   :3.0   Max.   :6.0  

If we need "Min.", "1st Qu.", etc. as row names, loop over the columns of d with sapply and apply summary to each one:

as.data.frame(sapply(d, summary))

-output

          a   b
Min.    1.0 4.0
1st Qu. 1.5 4.5
Median  2.0 5.0
Mean    2.0 5.0
3rd Qu. 2.5 5.5
Max.    3.0 6.0
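
If a conversion of the summary(d) object itself is preferred, as asked in the comments, one possible sketch is to parse the cell strings directly. This assumes every column is numeric with no NA values, so that each cell has the form "Min.   :1.0  " and the rows line up across columns. The names cells, stat_names, values and out are illustrative only, and the numbers recovered this way are the rounded values that summary printed, not the exact statistics.

```r
d <- data.frame(a = c(1, 2, 3), b = c(4, 5, 6))
ds <- summary(d)

# Drop the "table" class to get a plain character matrix of cells
cells <- unclass(ds)

# The statistic name is everything before the first colon in a cell
stat_names <- trimws(sub(":.*$", "", cells[, 1]))

# The value is everything after the first colon, converted to numeric
values <- apply(cells, 2, function(col) {
  as.numeric(trimws(sub("^[^:]*:", "", col)))
})

# Assemble the data.frame, using the statistic names as row names
out <- as.data.frame(values, row.names = stat_names)

# summary() pads the column names with spaces, so trim them
names(out) <- trimws(colnames(ds))

print(out)
```

For data with NA values, factor columns, or character columns the cell layout differs, so the sapply approach above is likely the more robust choice.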