2
votes

I am new to R, and I have researched vectorization. But I am still trying to train my mind to think in vectorization terms. Very often examples of vectorization instead of loops are either too simple, so it's difficult for me to generalize them, or not present at all.

Can anyone suggest how I can vectorize the following?

Model2 <- subset(Cor.RMA, MODEL == Models.Sort[2,1])
RCM2 <- count(Model2$REPAIR_CODE)
colnames(RCM2) <- c("REPAIR_CODE", "FREQ")
M2M <- merge(RCM.Sort, RCM2, by = "REPAIR_CODE", all.x = TRUE)
M2M.Sort <- M2M[order(M2M$FREQ.x, decreasing = TRUE), ]
M2M.Sort[is.na(M2M.Sort)] <- 0

In the above code, each "2" needs to run from 2 to 85

writeWorksheetToFile(file="CL2 - TC - RC.xlsx", 
                     data = M2M.Sort[ ,c("FREQ.y")], 
                     sheet = "RC by Model", 
                     clearSheets = FALSE,
                     startRow = 6,
                     startCol = 6)

In the above code, "data" should from from "M2M..." to "M85M..." and "startCol" should run from 6 to 89 for an Excel printout.

The data frame this comes from (Cor.RMA) has columns "MODEL", "REPAIR_CODE", and others that are unused. RCM.Sort is a frequency table of each "REPAIR_CODE" across all models that I use as a Master list to adjoin Device-specific Repair Code counts. (left-join: all.x = TRUE) Models.Sort is a frequency table I generated using the "count" function from the plyr package, so I can create subsets for each MODEL. Then I merge a list of each "REPAIR_CODE" that I generated using the "unique" function.

Sample Data:

CASE_NO   DEVICE_TYPE   MODEL   TRIAGE_CODE   REPAIR_CODE
12341     Smartphone    X       TC01          RC01
12342     Smartphone    Y       TC02          RC02
12343     Smartphone    Z       TC01, TC05    RC05
12344     Tablet        AA      TC02          RC37
12345     Wearable      BB      TC05          RC37
12346     Smartphone    X       TC07          RC01
12347     Smartphone    Y       TC04          RC02

I very much appreciate your time and effort if you are willing to help.

1
Could you provide a sample data set? That would make it much easier to understand what you are trying to do. Also, you forgot to define RCM.Sort. As far as I can see you are simply trying to apply count to $REPAIR_CODE for Models.Sort[2:85,1], right? - slamballais
I've made the 2 changes to my query as you suggested. And yes, I need to apply count as you suggest, then feed that into the merge process. - el_dewey
How about aggregate(Cor.RMA$REPAIR_CODE,list(Cor.RMA$MODEL),table)? It's not exactly what you want, but I think that this would be a more coherent format anyway. - slamballais
The only issue with this is the display of the outcome is different in the console as it is when saved as an object, so then the printout to Excel doesn't function well. - el_dewey
Solved that, see answer. - slamballais

1 Answers

1
votes

Alright, this is not what your original script did, but here goes:

models <- c("X","Y","Z","AA","BB") # in your case it would be Models.Sort[2:85,1]
new <- Cor.RMA[Cor.RMA$MODEL %in% models,]
new2 <- aggregate(new$REPAIR_CODE, list(new$MODEL), table)

temp <- unlist(new2[[2]])
temp <- temp[, order(colSums(temp), decreasing = T)]
out <- data.frame(group=new2[,1], temp)
out <- out[order(rowSums(out[,-1]), decreasing = T),]

out
#   group RC01 RC02 RC37 RC05
# 3     X    2    0    0    0
# 4     Y    0    2    0    0
# 1    AA    0    0    1    0
# 2    BB    0    0    1    0
# 5     Z    0    0    0    1

You can then write it easily to an xlsx file, e.g. with:

require(xlsx)
xlsx:::xlsx.write(out,"test.xlsx",row.names=F) 

Edit: Added sorting. Edit2: Fixed sorting.