1
votes

I have a data frame with one ID column and multiple numeric columns containing density measurements. To make the densities closer to normally distributed, I need to take the log, but because some density values are 0, I need to add 0.5 to all my density measurements to avoid -Inf values when I log-transform. How do I do that using dplyr?

Sample Data:

  ID    `Image Tag` `CD3 Global Den… `CD8 Global Den… `CD20 Global De… `CD3 Tumour Den… `CD8 Tumour Den…
  <chr>       <dbl>            <dbl>            <dbl>            <dbl>            <dbl>            <dbl>
1 IM_10          NA           608.              755.            51.0             868.             1066. 
2 IM_1…          NA            27.5              69.3            0.550            30.4              75.2
3 IM_1…          NA            19.6              17.0            1.03             53.2              42.0
4 IM_1…          NA           109.               89.0           47.7             725.              594. 
5 IM_1…          NA           219.              171.             0.501           531.              416. 
6 IM_1…          NA             4.00              0              0                 5.94              0  

I tried using

df1 <- df %>% group_by(ID) %>% 
  summarise_all(funs(mean(., na.rm=TRUE))) %>% 
  mutate_at(which(sapply(., is.numeric)), funs(sum(0.5)))

but that replaces all my numeric columns with 0.5, instead of adding 0.5 to the original densities.

  ID    `Image Tag` `CD3 Global Den… `CD8 Global Den… `CD20 Global De… `CD3 Tumour Den… `CD8 Tumour Den…
  <chr>       <dbl>            <dbl>            <dbl>            <dbl>            <dbl>            <dbl>
1 IM_10         0.5              0.5              0.5              0.5              0.5              0.5
2 IM_1…         0.5              0.5              0.5              0.5              0.5              0.5
3 IM_1…         0.5              0.5              0.5              0.5              0.5              0.5
4 IM_1…         0.5              0.5              0.5              0.5              0.5              0.5
5 IM_1…         0.5              0.5              0.5              0.5              0.5              0.5
6 IM_1…         0.5              0.5              0.5              0.5              0.5              0.5

Any ideas how to do this?

2
Please provide some sample data and desired result so we can help you better. - davsjob
Thank you davsjob, I edited the question with some example data. - S.Al-Khalidi

2 Answers

0
votes

I assume that you want to summarise each ID and then add 0.5 to every value (that is not NA). This is how I would do it:

# Sample data
df <- structure(
  list(
    ID = c("IM_10", "IM_11", "IM_12", "IM_13", "IM_14", "IM_15"),
    Image_Tag = c(NA, NA, NA, NA, NA, NA),
    CD3_Global_Den = c(608, 27.5, 19.6, 109, 219, 4),
    CD8_Global_Den = c(755, 69.3, 17, 89, 171, 0),
    CD20_Global_De = c(51, 0.55, 1.03, 47.7, 0.501, 0),
    CD3_Tumour_Den = c(868, 30.4, 53.2, 725, 531, 5.94),
    CD8_Tumour_Den = c(1066, 75.2, 42, 594, 416, 0)
  ),
  row.names = c(NA, -6L),
  class = c("tbl_df", "tbl", "data.frame")
)

# Suggested code
library(hablar)
library(dplyr)
options(pillar.sigfig = 6)

df %>% group_by(ID) %>% 
  summarise_all(~mean_(.)) %>% 
  mutate_at(vars(-ID), ~. + 0.5)

which gives the result:

# A tibble: 6 x 7
  ID    Image_Tag CD3_Global_Den CD8_Global_Den CD20_Global_De CD3_Tumour_Den CD8_Tumour_Den
  <chr>     <dbl>          <dbl>          <dbl>          <dbl>          <dbl>          <dbl>
1 IM_10        NA          608.5          755.5       51.5             868.5          1066.5
2 IM_11        NA           28             69.8        1.05             30.9            75.7
3 IM_12        NA           20.1           17.5        1.53             53.7            42.5
4 IM_13        NA          109.5           89.5       48.2             725.5           594.5
5 IM_14        NA          219.5          171.5        1.00100         531.5           416.5
6 IM_15        NA            4.5            0.5        0.5               6.44            0.5
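
For reference, the original attempt returned 0.5 everywhere because funs(sum(0.5)) never refers to the column itself (the `.` placeholder), so each column was replaced by the constant. Below is a minimal sketch of the same summarise-then-offset idea that also applies the log, assuming dplyr 1.0.0 or later (for across() and where()). The toy data and the name df_log are made up for illustration and are not the asker's real data.

```r
library(dplyr)

# Toy data with hypothetical values, two rows for IM_10 so the mean does something
df <- tibble(
  ID = c("IM_10", "IM_10", "IM_15"),
  CD3_Global_Den = c(608, 610, 4),
  CD8_Global_Den = c(755, 757, 0)
)

df_log <- df %>%
  group_by(ID) %>%
  # Average each numeric column per ID, ignoring NA values
  summarise(across(where(is.numeric), ~ mean(.x, na.rm = TRUE))) %>%
  # .x is the current column: add the 0.5 offset, then take the natural log
  mutate(across(where(is.numeric), ~ log(.x + 0.5)))

df_log
```

With the offset in place, the zero density for IM_15 becomes log(0.5) rather than -Inf. If you want to keep the untransformed columns as well, across() accepts a .names argument to write the results to new columns.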
0
votes

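If you only want to add a constant (here 1) to the numeric columns, you can use map_if from purrr. Note that it returns a list, so convert back with as_tibble() if you need a data frame:

df %>% map_if(is.numeric, ~ . + 1) %>% as_tibble()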
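
One thing to be aware of with this approach is the return type. The sketch below, assuming purrr and dplyr 1.0.0 or later are installed, compares map_if with a mutate/across version on a small made-up tibble; the names as_list and as_tbl are only illustrative.

```r
library(dplyr)
library(purrr)

# Small hypothetical example, not the asker's real data
df <- tibble(ID = c("IM_10", "IM_15"), CD8_Global_Den = c(755, 0))

# map_if() applies the function to the numeric columns but returns a plain list
as_list <- df %>% map_if(is.numeric, ~ . + 1)
is.data.frame(as_list)

# mutate() with across() performs the same addition and keeps the tibble
as_tbl <- df %>% mutate(across(where(is.numeric), ~ .x + 1))
is.data.frame(as_tbl)
```

If the result feeds into further dplyr verbs, the mutate version is probably the more convenient of the two.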