0
votes

I need to create a new column which is a function of two or three other columns, one of which contains some missing data (NAs). However, when I use dplyr's mutate function, the new column contains all NAs.

See the example below:

rand_df <- data.frame(replicate(10,sample(0:10,200,rep=TRUE))) # random df
names(rand_df) <- letters[seq(from=1, to=10)]  #renaming header
rand_df$c[2:20] <- NA  # introducing NAs
head(rand_df)

 a b  c d  e f  g h i j
1  3 1  8 2  4 3  1 9 2 9
2  6 1 NA 1  2 8  8 6 0 9
3  5 7 NA 2  4 1  7 7 3 0
4 10 8 NA 6  6 7  0 2 2 0
5  4 1 NA 9  3 8  2 2 5 2
6 10 8 NA 3 10 2 10 4 5 5

Trying to create a new column

rand_df <- rand_df %>% mutate(k = 141 * min((c/88.42), 1))

head(rand_df):

a b  c d  e f  g h i j  k
1  3 1  8 2  4 3  1 9 2 9 NA
2  6 1 NA 1  2 8  8 6 0 9 NA
3  5 7 NA 2  4 1  7 7 3 0 NA
4 10 8 NA 6  6 7  0 2 2 0 NA
5  4 1 NA 9  3 8  2 2 5 2 NA
6 10 8 NA 3 10 2 10 4 5 5 NA

I know I could simply use a for loop to cycle through rows individually and skip those containing NAs, but I would like to think there's a better way to do this.

3

3 Answers

4
votes

you can use pmin().

    library(dplyr)
    rand_df <- data.frame(replicate(10,sample(0:10,200,rep=TRUE))) # random df
    names(rand_df) <- letters[seq(from=1, to=10)]  #renaming header
    rand_df$c[2:20] <- NA  # introducing NAs
    head(rand_df)
    #>   a b  c d  e  f g  h  i  j
    #> 1 4 9  9 6 10  2 1 10 10 10
    #> 2 7 3 NA 2  5  9 1  2 10  6
    #> 3 0 3 NA 4  5  6 1  0 10  6
    #> 4 0 7 NA 5  3  6 6  9  4  7
    #> 5 4 4 NA 5  4 10 8  5  6  0
    #> 6 1 3 NA 3  0 10 1  3  7  4


    rand_df <- rand_df %>% mutate(k = 141 * pmin((c/88.42), 1))
    head(rand_df)
    #>   a b  c d  e  f g  h  i  j        k
    #> 1 4 9  9 6 10  2 1 10 10 10 14.35196
    #> 2 7 3 NA 2  5  9 1  2 10  6       NA
    #> 3 0 3 NA 4  5  6 1  0 10  6       NA
    #> 4 0 7 NA 5  3  6 6  9  4  7       NA
    #> 5 4 4 NA 5  4 10 8  5  6  0       NA
    #> 6 1 3 NA 3  0 10 1  3  7  4       NA

<sup>Created on 2020-08-17 by the [reprex package](https://reprex.tidyverse.org) (v0.3.0)</sup>

3
votes

The following line of code is failing because min((c/88.42),1)) does not do the calculation based on each row, but using the entire column, so you just have same value repeated:

rand_df <- rand_df %>% mutate(k = 141 * min((c/88.42), 1))

This is a good illustration of the behaviour:

rand_df %>% mutate(k = min(f), k1 = max(f))) 

There are different ways to solve this, but one is to add the row number as a column and then use group_by:

rand_df  %>% 
  mutate(row = row_number()) %>% 
  group_by(row) %>% 
  mutate(k = 141 * min((c/88.42), 1))
2
votes

Seems like you've added extra argument within your ifelse() function. I mean 33.5 is needless here.

Also, next time, please, be sure to ask properly (according to this guide How to make a great R reproducible example)