I have a data frame with one ID column and multiple numeric columns containing density measurements. To make the densities closer to normally distributed, I want to log-transform them. Because some densities are 0, I need to add 0.5 to every density measurement first so that the log transform does not produce -Inf values. How do I do that using dplyr?
Sample Data:
ID `Image Tag` `CD3 Global Den… `CD8 Global Den… `CD20 Global De… `CD3 Tumour Den… `CD8 Tumour Den…
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 IM_10 NA 608. 755. 51.0 868. 1066.
2 IM_1… NA 27.5 69.3 0.550 30.4 75.2
3 IM_1… NA 19.6 17.0 1.03 53.2 42.0
4 IM_1… NA 109. 89.0 47.7 725. 594.
5 IM_1… NA 219. 171. 0.501 531. 416.
6 IM_1… NA 4.00 0 0 5.94 0
I tried using
df1 <- df %>%
  group_by(ID) %>%
  summarise_all(funs(mean(., na.rm = TRUE))) %>%
  mutate_at(which(sapply(., is.numeric)), funs(sum(0.5)))
but that replaces all my numeric columns with 0.5, instead of adding 0.5 to the original densities.
ID `Image Tag` `CD3 Global Den… `CD8 Global Den… `CD20 Global De… `CD3 Tumour Den… `CD8 Tumour Den…
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 IM_10 0.5 0.5 0.5 0.5 0.5 0.5
2 IM_1… 0.5 0.5 0.5 0.5 0.5 0.5
3 IM_1… 0.5 0.5 0.5 0.5 0.5 0.5
4 IM_1… 0.5 0.5 0.5 0.5 0.5 0.5
5 IM_1… 0.5 0.5 0.5 0.5 0.5 0.5
6 IM_1… 0.5 0.5 0.5 0.5 0.5 0.5
Any ideas how to do this?
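For reference, the attempt above returns 0.5 everywhere because `funs(sum(0.5))` never refers to the column (`.`), so each column is replaced by `sum(0.5)`, which is just 0.5. A minimal sketch of one possible fix follows, assuming dplyr 1.0.0 or later for `across()`. The toy data frame and its full column names are made up for illustration, since the real names are truncated in the printout. On older dplyr versions, something like `mutate_if(is.numeric, funs(. + 0.5))` should be the equivalent.

```r
library(dplyr)

# Toy data standing in for the real data frame
# (hypothetical values and column names)
df <- tibble(
  ID = c("IM_10", "IM_10", "IM_11"),
  `CD3 Global Density` = c(608, 610, 0),
  `CD8 Global Density` = c(755, 0, 0)
)

df1 <- df %>%
  group_by(ID) %>%
  # per-ID mean of every numeric column
  summarise(across(where(is.numeric), ~ mean(.x, na.rm = TRUE))) %>%
  # add the 0.5 offset to each value, then take the natural log
  mutate(across(where(is.numeric), ~ log(.x + 0.5)))

df1
```

To keep the offset values without the log, drop `log()` and use `~ .x + 0.5` in the last step.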