0
votes

I have an 'Agency_Reference' table containing a column 'agency_lookup', with 200 string entries such as:

  1. alpha
  2. beta
  3. gamma etc..

I have a data frame 'TEST' with a million rows, containing a 'Campaign' column with entries such as:

  1. Alpha_xt2010
  2. alpha_xt2014
  3. Beta_xt2016 etc..

I want to go through each entry in the reference table, find which lookup string is present in each entry of the Campaign column, and store the result in a new agency_identifier column in TEST.

My current code is below and is slow to execute. I am requesting guidance on how to optimize it. I would like to learn how to do it the data.table way.

 Agency_Reference <- data.frame(agency_lookup = c('alpha','beta','gamma','delta','zeta'))
 TEST <- data.frame(Campaign = c('alpha_xt123','ALPHA345','Beta_xyz_34','BETa_testing','code_delta_'))
 # Start with NA so campaigns that match no agency stay missing
 TEST$agency_identifier <- NA_character_
 for (agency_lookup in as.vector(Agency_Reference$agency_lookup)) {
   # Case-insensitive match; keep the previous value where this agency is absent
   TEST$agency_identifier <- ifelse(grepl(tolower(agency_lookup), tolower(TEST$Campaign)),
                                    agency_lookup, TEST$agency_identifier)
 }

Expected Output :

 Campaign        agency_identifier
 alpha_xt123     alpha
 ALPHA345        alpha
 Beta_xyz_34     beta
 BETa_testing    beta
 code_delta_     delta

2
Please show a small reproducible example and expected output. - akrun
@akrun: The initial code I posted had errors, so I have edited it to show the code I am currently using. Please let me know if additional info is required to help with this query. - jeganathan velu
Your code is giving errors, especially the for loop. What is the expected output? - akrun

2 Answers

1
votes

This will not answer your question per se, but from what I understand you want to dissect the Campaign column and do something with the values it provides.

Take a look at Tidy data, more specifically the part "Multiple variables stored in one column". I think you'll make some great progress using tidyr::separate. That way you don't have to use a for-loop.

1
votes

Try a single vectorized regex instead of the loop:

TEST <- data.frame(Campaign = c('alpha_xt123','ALPHA345','Beta_xyz_34','BETa_testing','code_delta_'))

pattern = tolower(c('alpha','Beta','gamma','delta','zeta'))

TEST$agency_identifier <- sub(pattern = paste0('.*(', paste(pattern, collapse = '|'), ').*'),
                              replacement = '\\1',
                              x = tolower(TEST$Campaign))
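
One caveat with sub(): a campaign that contains none of the lookup strings comes back unchanged (just lowercased) rather than flagged as missing. Below is a minimal base R sketch that leaves such rows as NA. It assumes the first lookup string found in each campaign is the one wanted, and the row 'no_agency_here' is a hypothetical value added only to show the unmatched case.

```r
Agency_Reference <- data.frame(agency_lookup = c('alpha','beta','gamma','delta','zeta'),
                               stringsAsFactors = FALSE)
TEST <- data.frame(Campaign = c('alpha_xt123','ALPHA345','Beta_xyz_34','BETa_testing',
                                'code_delta_','no_agency_here'),  # last row is hypothetical
                   stringsAsFactors = FALSE)

# One alternation pattern built from the lookup table: "alpha|beta|gamma|delta|zeta"
pattern <- paste(tolower(Agency_Reference$agency_lookup), collapse = '|')

# regexpr() returns the position of the first match per string, or -1 if none
campaign_lower <- tolower(TEST$Campaign)
hit <- regexpr(pattern, campaign_lower)
matched <- as.vector(hit != -1)

# Start with NA everywhere, then fill in only the rows that matched;
# regmatches() returns one value per matched row, in order
TEST$agency_identifier <- NA_character_
TEST$agency_identifier[matched] <- regmatches(campaign_lower, hit)
```

The pattern is scanned once over the whole column instead of once per lookup string, which is where the speed-up over the for loop should come from. If TEST is a data.table, the same vector could presumably be assigned by reference with `:=` instead of `$<-`.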