0
votes

I have a data frame in the following format (numeric columns, with the header row giving each column's name; values can be missing):

    col1.name  |  col2.name  |  col3.name  |  ...
    132        |  12.1       |  NA         |  ...
    12.4       |  NA         |  14.6       |  ...
    13         |  1441       |  535        |  ...

For each column, I want to calculate its mean, median, and standard deviation and add them to a data frame in this format:

    col.name   |  mean       |  median     |  sd
    col1.name  |  123        |  456        |  12.2
    col2.name  |  12.1       |  45         |  32.1
    col3.name  |  111        |  14.6       |  69.2
    ...        |  ...        |  ...        | ...

I currently have the following code, but it fails with the error 'x' must be numeric. How can I do this?

data.frame(ID=hvbp.analysis.df[,1], Means=rowMeans(hvbp.analysis.df[,-1]))
apply(hvbp.analysis.df, 2, mean, na.rm = TRUE)

4 Answers

1
votes

If you reshape to long form first, e.g. with tidyr::gather, the rest is pretty typical aggregation:

library(tidyverse)

df <- data.frame(col1.name = c(132, 12.4, 13), 
                 col2.name = c(12.1, NA, 1441), 
                 col3.name = c(NA, 14.6, 535))

df %>% 
    gather(col.name, value) %>% 
    group_by(col.name) %>% 
    summarise(mean = mean(value, na.rm = TRUE), 
              median = median(value, na.rm = TRUE), 
              sd = sd(value, na.rm = TRUE))
#> # A tibble: 3 x 4
#>   col.name   mean median     sd
#>   <chr>     <dbl>  <dbl>  <dbl>
#> 1 col1.name  52.5   13.0   68.9
#> 2 col2.name 727.   727.  1010. 
#> 3 col3.name 275.   275.   368.

summary and skimr::skim also provide similar summaries.
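
In current tidyr releases, gather is marked as superseded in favour of pivot_longer. A sketch of the same pipeline with that function, assuming tidyr 1.0.0 or later and dplyr are installed (the name res is chosen here for illustration only):

```r
library(dplyr)
library(tidyr)

df <- data.frame(col1.name = c(132, 12.4, 13),
                 col2.name = c(12.1, NA, 1441),
                 col3.name = c(NA, 14.6, 535))

res <- df %>%
    # Stack all columns into two columns: the original name and the value
    pivot_longer(everything(), names_to = "col.name", values_to = "value") %>%
    group_by(col.name) %>%
    # One row per original column, ignoring missing values
    summarise(mean = mean(value, na.rm = TRUE),
              median = median(value, na.rm = TRUE),
              sd = sd(value, na.rm = TRUE))
```

The result should have the same shape as the gather version: one row per original column, with the statistics as columns.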

0
votes

First, make sure all your columns really are numeric; they may look numeric without being so. sapply(data, class) returns the class of each column, and str(data) shows the same information. To convert every column:

data=rapply(data,as.numeric,how="replace")

Now you can apply your code to the data.
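
One caveat worth checking before converting: if a column was read in as a factor, as.numeric returns the internal level codes rather than the printed values, so going through as.character first is safer. A minimal sketch, assuming a hypothetical data frame raw in which one column arrived as character because of a stray non-numeric entry (raw, clean, classes and col_means are illustrative names, not from the question):

```r
# Hypothetical example data: col2.name was read as character
raw <- data.frame(col1.name = c(132, 12.4, 13),
                  col2.name = c("12.1", "n/a", "1441"),
                  stringsAsFactors = FALSE)

# Inspect the class of each column
classes <- sapply(raw, class)

# Convert every column via character, so that factor columns yield their
# printed values instead of level codes; entries that cannot be parsed
# become NA (suppressWarnings hides the coercion warning)
clean <- raw
clean[] <- lapply(raw, function(x) suppressWarnings(as.numeric(as.character(x))))

# Column-wise statistics now work
col_means <- colMeans(clean, na.rm = TRUE)
```

Assigning into clean[] keeps the object a data frame with its original column names.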

0
votes

A base R loop over the function names works:

df <- data.frame(col1name = c(132, 12.4, 13), col2name = c(12.1, NA, 1441), col3name = c(NA, 14.6, 535))
new_df <- data.frame(col_name = colnames(df))

# sapply() looks each function up by name and applies it to every column of df
for (i in c('mean', 'median', 'sd'))
{
    new_df[[i]] <- sapply(df, i, na.rm = TRUE)
}

print(new_df)

  col_name      mean median         sd
1 col1name  52.46667  13.00   68.87854
2 col2name 726.55000 726.55 1010.38488
3 col3name 274.80000 274.80  367.97837

0
votes

With data.frame d

d <- data.frame(a=1:3, b=4:6, c=c(5,5,5))

You can do

t(apply(d, 2, function(i) c(mean=mean(i), median=median(i), sd=sd(i))))
#  mean median sd
#a    2      2  1
#b    5      5  1
#c    5      5  0

If you have NAs to handle, pass na.rm = TRUE through the ... argument:

t(apply(d, 2, function(i, ...) c(mean=mean(i,...), median=median(i,...), sd=sd(i,...)), na.rm=TRUE))
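
The calls above return a matrix with the column names as row names. To get the data frame layout the question asks for, the row names can be moved into a column of their own. A sketch using the sample values from the question; stats and result are names chosen here for illustration, and sapply is used in place of apply so the data frame is not coerced to a matrix first:

```r
df <- data.frame(col1.name = c(132, 12.4, 13),
                 col2.name = c(12.1, NA, 1441),
                 col3.name = c(NA, 14.6, 535))

# One row per column of df, one column per statistic
stats <- t(sapply(df, function(x) c(mean   = mean(x, na.rm = TRUE),
                                    median = median(x, na.rm = TRUE),
                                    sd     = sd(x, na.rm = TRUE))))

# Move the row names into a regular column, then reset the row names
result <- data.frame(col.name = rownames(stats), stats)
rownames(result) <- NULL
```

result then has the columns col.name, mean, median and sd, with one row per column of df.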