
I have a data frame df and would like to add a new column containing the minimum value of a second column, computed within each group. The previous posts I found don't address doing this as a new column while preserving all of the original rows and columns of the data frame.

Suppose this sample input:

a <- c(1,1,1,1,2,2,2,2)
b <- c(NA,1,2,2,3,5,6,NA)  
df <- data.frame(a,b)
df

a   b
1   NA          
1   1           
1   2           
1   2           
2   3           
2   5           
2   6           
2   NA          

What I want to achieve is this output:

a   b   Min_b
1   NA  1           
1   1   1           
1   2   1           
1   2   1           
2   3   3           
2   5   3           
2   6   3           
2   NA  3       

Here are my attempts with corresponding output:

df %>% group_by(a) %>% mutate(Min_b = min(b, na.rm = TRUE))

a   b   Min_b
1   NA  1           
1   1   1           
1   2   1           
1   2   1           
2   3   1           
2   5   1           
2   6   1           
2   NA  1       

This gives the overall minimum of column b, not the minimum of b within each group of column a, which is what I want.
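Output like this, where the grouping is silently ignored, is a common sign that another attached package is masking dplyr's mutate(). A quick diagnostic sketch (not part of the original question) uses base R's find() to list which attached packages provide a function with that name; the first entry is the one a bare mutate() call resolves to.

```r
library(dplyr)

# find() lists every attached package that provides an object named
# "mutate", in search-path order. A bare mutate() call resolves to the
# first entry, so "package:plyr" listed ahead of "package:dplyr" means
# plyr's mutate(), which ignores group_by(), is the one being called.
where_mutate <- find("mutate")
print(where_mutate)

# TRUE when a bare mutate() currently resolves to dplyr's version.
print(identical(mutate, dplyr::mutate))
```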

df %>% group_by(a) %>% top_n(-1, wt = b)

a   b
1   1
2   3

The above finds the right values, but it drops the other rows, and it does not work inside mutate():

df %>% group_by(a) %>% mutate(Min_of_b = top_n(-1, wt = b))

Error in is_scalar_integerish(n) : argument "n" is missing, with no default
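The error arises because top_n() takes a data frame as its first argument: inside mutate(), the -1 is taken as that data frame and n is left missing. One way to keep the "compute the per-group values, then attach them" idea is to summarise one minimum per group and join it back onto every original row. This is a sketch, not from the original thread; dplyr::summarise() is written with its namespace so that a masking plyr::summarise() cannot interfere.

```r
library(dplyr)

a <- c(1, 1, 1, 1, 2, 2, 2, 2)
b <- c(NA, 1, 2, 2, 3, 5, 6, NA)
df <- data.frame(a, b)

# One row per group of a, holding the minimum of b (NAs ignored).
group_mins <- df %>%
  group_by(a) %>%
  dplyr::summarise(Min_b = min(b, na.rm = TRUE))

# left_join() keeps every row of df, in its original order, and adds
# the matching group minimum as a new column.
result <- left_join(df, group_mins, by = "a")
print(result)
```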

Thank you for any suggestions on alternative ways to do this!

It would be much easier for us to help if you provided sample data in your question, perhaps as simple as dput(head(df)). Additionally, it is not clear to me what your expected output should look like. - r2evans
It's not clear what your ideal output looks like, but based on what you said I think you should use df %>% group_by(id) %>% mutate(new_column = min(second_column)) instead. - AntoniosK
Check your package version? df %>% group_by(a) %>% mutate(Min_b = min(b, na.rm = TRUE)) works for me... - A5C1D2H2I1M1N2O1R2T1

1 Answer


I figured out my error. I had likely loaded plyr after dplyr, so plyr's mutate() masked dplyr's mutate() and the grouping set by group_by() was ignored. To fix the issue, I detached plyr as follows:

detach(package:plyr)
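If plyr is still needed in the same session, an alternative to detaching it (a sketch, not what the answer above did) is to load plyr before dplyr, or simply to qualify the call as dplyr::mutate() so that package loading order no longer matters:

```r
library(dplyr)

a <- c(1, 1, 1, 1, 2, 2, 2, 2)
b <- c(NA, 1, 2, 2, 3, 5, 6, NA)
df <- data.frame(a, b)

# Writing dplyr::mutate() always calls dplyr's version, even if plyr
# (or any other package exporting mutate()) was attached later, so the
# grouping from group_by() is respected.
df_min <- df %>%
  group_by(a) %>%
  dplyr::mutate(Min_b = min(b, na.rm = TRUE)) %>%
  ungroup()

print(df_min)
```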

After that, the grouped mutate() worked as expected with the following code (using the same data frame as above):

df %>% group_by(a) %>% mutate(Min_b = min(b, na.rm = TRUE))

a   b   Min_b
1   NA  1           
1   1   1           
1   2   1           
1   2   1           
2   3   3           
2   5   3           
2   6   3           
2   NA  3
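For comparison, the same column can be built in base R with ave(), which sidesteps the plyr/dplyr masking issue entirely. This is an alternative sketch, not part of the original answer. ave() applies FUN within each level of the grouping variable and returns a vector as long as its input, so the result can be assigned directly as a new column. Note that a group whose b values are all NA would get Inf (with a warning) from min(..., na.rm = TRUE).

```r
a <- c(1, 1, 1, 1, 2, 2, 2, 2)
b <- c(NA, 1, 2, 2, 3, 5, 6, NA)
df <- data.frame(a, b)

# For each level of df$a, replace every element of df$b with that
# group's minimum (ignoring NAs); row order and other columns are kept.
df$Min_b <- ave(df$b, df$a, FUN = function(x) min(x, na.rm = TRUE))
print(df)
```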