1
votes

I would like some help on creating formatted tables in R - whether it's just using the normal IDE or R Markdown. There are two main things that I'd like to do:

  • Present the descriptive statistics (Mean, Median, Min, Max) by group based on different columns
  • Present the descriptive statistic based on the total sample (ungrouped data)

Sample data:

   df <- data.frame(Gender = c("F", "M", "F", "M", "M", "M", "M", "F", "M", "M"),
                 Young = c("Y", "N", "Y", "N", "Y", "N", "Y", "N", "Y", "N"),
                 Age = c("14", "25", "13", "24", "14", "25", "13", "24", "10", "26"),
                 Location = c("Suburb", "Rural", "Suburb", "Rural","Suburb", "Rural","Suburb", "Rural","Suburb", "Rural"))

Expected results

Variable Mean Median Max Min
Gender
Female
Male
Location
Suburb
Rural
TOTAL

Is there a way to do this in R?

3
Your sample data seems to be missing the data. Where are the numbers?r2evans

3 Answers

4
votes

You can get all the information that you need by getting the data in long format.

library(dplyr)
library(tidyr)

df <- type.convert(df, as.is = TRUE)

df %>%
  pivot_longer(cols = -Age) %>%
  group_by(name, value) %>%
  summarise(min_age = min(Age), 
            max_age = max(Age), 
            median_age = median(Age), 
            mean_age = mean(Age))

#  name     value  min_age max_age median_age mean_age
#  <chr>    <chr>    <int>   <int>      <int>    <dbl>
#1 Gender   F           13      24         14     17  
#2 Gender   M           10      26         24     19.6
#3 Location Rural       24      26         25     24.8
#4 Location Suburb      10      14         13     12.8
#5 Young    N           24      26         25     24.8
#6 Young    Y           10      14         13     12.8
1
votes

Similar answer using data.table:

> library(data.table)
> df <- data.frame(Gender = c("F", "M", "F", "M", "M", "M", "M", "F", "M", "M"),
+                  Young = c("Y", "N", "Y", "N", "Y", "N", "Y", "N", "Y", "N"),
+                  Age = c("14", "25", "13", "24", "14", "25", "13", "24", 
+                          "10", "26"),
+                  Location = c("Suburb", "Rural", "Suburb", 
+                               "Rural","Suburb", "Rural","Suburb", 
+                               "Rural","Suburb", "Rural"))
> setDT(df)                        # make it a data.table    
> df[,Age:=as.integer(Age)]        # correct age column   
> df[,.(mean=mean(Age), median=median(Age), max=max(Age), min=min(Age)),
+     by=.(Gender,Location)]   
   Gender Location    mean median max min
1:      F   Suburb 13.5000   13.5  14  13
2:      M    Rural 25.0000   25.0  26  24
3:      M   Suburb 12.3333   13.0  14  10
4:      F    Rural 24.0000   24.0  24  24
> 

Or if we want to stratify by one variable at a time:

> df[,.(mean=mean(Age), median=median(Age), max=max(Age),min=min(Age)), 
+    by=.(Gender)]
   Gender    mean median max min
1:      F 17.0000     14  24  13
2:      M 19.5714     24  26  10
> df[,.(mean=mean(Age), median=median(Age), max=max(Age), min=min(Age)), 
+    by=.(Location)]
   Location mean median max min
1:   Suburb 12.8     13  14  10
2:    Rural 24.8     25  26  24
> 

And inspired by Ronak's nice answer, the same as a data.table one-liner:

> melt(df, id.vars="Age")[, .(mean=mean(Age), 
+                             median=median(Age), 
+                             min=min(Age), 
+                             max=max(Age)), by=.(variable,value)]
   variable  value    mean median min max
1:   Gender      F 17.0000     14  13  24
2:   Gender      M 19.5714     24  10  26
3:    Young      Y 12.8000     13  10  14
4:    Young      N 24.8000     25  24  26
5: Location Suburb 12.8000     13  10  14
6: Location  Rural 24.8000     25  24  26
> 
0
votes

Several packages have wrapped functions for this. I usually use describe from {psych} package.

library(tidyverse)

df <- data.frame(Gender = c("F", "M", "F", "M", "M", "M", "M", "F", "M", "M"),
                 Young = c("Y", "N", "Y", "N", "Y", "N", "Y", "N", "Y", "N"),
                 Age = c("14", "25", "13", "24", "14", "25", "13", "24", "10", "26"),
                 Location = c("Suburb", "Rural", "Suburb", "Rural","Suburb", "Rural","Suburb", "Rural","Suburb", "Rural"))

df_summary <- psych::describe(df)

df_summary

         vars  n mean   sd median trimmed  mad min max range  skew kurtosis   se
Gender*      1 10  1.7 0.48    2.0    1.75 0.00   1   2     1 -0.75    -1.57 0.15
Young*       2 10  1.5 0.53    1.5    1.50 0.74   1   2     1  0.00    -2.19 0.17
Age*         3 10  3.5 1.58    3.5    3.50 2.22   1   6     5  0.00    -1.42 0.50
Location*    4 10  1.5 0.53    1.5    1.50 0.74   1   2     1  0.00    -2.19 0.17

You can then use dplyr to do whatever you want.

df_summary %>% select(mean, median, max, min)