For each col in my dataframe, perform a function and add its results to a new dataframe?

Question

I have a data frame in the following format (numeric columns, with the first row holding each column's name; values can be missing):

    col1.name  |  col2.name  |  col3.name  |  ...
    132        |  12.1       |  NA         |  ...
    12.4       |  NA         |  14.6       |  ...
    13         |  1441       |  535        |  ...

For each column, I want to calculate its mean, median, and standard deviation and add them to a data frame in the following format:

    col.name   |  mean       |  median     |  sd
    col1.name  |  123        |  456        |  12.2
    col2.name  |  12.1       |  45         |  32.1
    col3.name  |  111        |  14.6       |  69.2
    ...        |  ...        |  ...        | ...

I currently have the following code, but it gives me the error 'x' must be numeric. How can I fix this?

data.frame(ID=hvbp.analysis.df[,1], Means=rowMeans(hvbp.analysis.df[,-1]))
apply(hvbp.analysis.df, 2, mean, na.rm = TRUE)

alistaire · Accepted Answer · 2018-03-02T20:25:16

If you reshape to long form first, e.g. with tidyr::gather, the rest is pretty typical aggregation:

library(tidyverse)

df <- data.frame(col1.name = c(132, 12.4, 13), 
                 col2.name = c(12.1, NA, 1441), 
                 col3.name = c(NA, 14.6, 535))

df %>% 
    gather(col.name, value) %>% 
    group_by(col.name) %>% 
    summarise(mean = mean(value, na.rm = TRUE), 
              median = median(value, na.rm = TRUE), 
              sd = sd(value, na.rm = TRUE))
#> # A tibble: 3 x 4
#>   col.name   mean median     sd
#>   <chr>     <dbl>  <dbl>  <dbl>
#> 1 col1.name  52.5   13.0   68.9
#> 2 col2.name 727.   727.  1010. 
#> 3 col3.name 275.   275.   368.
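
tidyr::gather has since been superseded by tidyr::pivot_longer (tidyr 1.0.0 and later). A minimal sketch of the same aggregation with the newer function, assuming the same example data; the names long_stats and value are just illustrative:

```r
library(dplyr)
library(tidyr)

# Same example data as above
df <- data.frame(col1.name = c(132, 12.4, 13),
                 col2.name = c(12.1, NA, 1441),
                 col3.name = c(NA, 14.6, 535))

long_stats <- df %>%
    # Stack every column into (col.name, value) pairs
    pivot_longer(everything(), names_to = "col.name", values_to = "value") %>%
    group_by(col.name) %>%
    # One row per original column, ignoring missing values
    summarise(mean = mean(value, na.rm = TRUE),
              median = median(value, na.rm = TRUE),
              sd = sd(value, na.rm = TRUE))

long_stats
```

This should give the same three-row summary as the gather version.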

summary and skimr::skim also provide similar summaries.
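
If you would rather stay in base R, the sketch below builds the same table with vapply. It also drops non-numeric columns first, on the assumption that the 'x' must be numeric error comes from a character or factor column (for example, one created when the names were read in as a data row). The names num and col_stats are hypothetical:

```r
# Same example data as above
df <- data.frame(col1.name = c(132, 12.4, 13),
                 col2.name = c(12.1, NA, 1441),
                 col3.name = c(NA, 14.6, 535))

# Keep only numeric columns; rowMeans() stops with
# "'x' must be numeric" if a non-numeric column is passed in
num <- df[vapply(df, is.numeric, logical(1))]

# Apply each summary function column by column, ignoring NAs
col_stats <- data.frame(
    col.name = names(num),
    mean     = vapply(num, mean,   numeric(1), na.rm = TRUE, USE.NAMES = FALSE),
    median   = vapply(num, median, numeric(1), na.rm = TRUE, USE.NAMES = FALSE),
    sd       = vapply(num, sd,     numeric(1), na.rm = TRUE, USE.NAMES = FALSE)
)

col_stats
```

If a column that should be numeric turns out to be character, check it with str(df) and convert it (e.g. with as.numeric) before summarising, rather than silently dropping it.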
