1
votes

Background

I have a dataset, df, where I would like to aggregate multiple columns and create a new column. I need to multiply Type, Span and Population columns and create a new Output column

ID       Status      Type     Span   State   Population

A        Yes         2        70%    Ga      10000

Desired output

ID        Status     Type      Span   State   Population   Output

A         Yes        2         70%    Ga      10000        14000      

dput

structure(list(ID = structure(1L, .Label = "A ", class = "factor"), 
Status = structure(1L, .Label = "Yes", class = "factor"), 
Type = 2L, Span = structure(1L, .Label = "70%", class = "factor"), 
State = structure(1L, .Label = "Ga", class = "factor"), Population = 10000L), class = "data.frame", 
row.names = c(NA, 
-1L))

This is what I have tried

 df %>% 
 mutate(Output = Type * Span * Population)
2

2 Answers

2
votes

Here, we are creating a new column based on the inputs from different column. We can just use mutate to get the Span percent of Population and multiply by 'Type'. Note that 'Span' is not numeric, as it is having %, so we extract the numeric part with parse_number divide by 100, then multiply with Population along with the 'Type'

library(dplyr)
df %>%
  mutate(Output = Type * Population * readr::parse_number(as.character(Span))/100)
#   ID Status Type Span State Population Output
#1 A     Yes    2  70%    Ga      10000  14000

If the columns 'Type', 'Population' are not numeric, it is better to convert to numeric with as.numeric(as.character(df$Type)) and for 'Population' (assuming they are factor class). Another option is type.convert(df, as.is = TRUE) and then work on that modified class dataset

2
votes

We can remove the '%' sign using sub, convert to numeric and multiply values.

This can be done in base R as :

df$output <- with(df, Type * as.numeric(sub('%', '', Span)) * Population/100)
df

#  ID Status Type Span State Population  output
#1 A     Yes    2  70%    Ga      10000   14000