I have two dataframes (DfA and DfB). Each dataframe has three factor variables: species, type and region. DfA also has a numeric value column, and I want to use it to estimate numeric values in a new column of DfB, based on shared attributes.

I have a function which asks for the species, type and region, then creates a subset of DfA with those attributes and runs an algorithm on the subset to estimate the new value. When I run the function and specify the values manually as a test, it works fine.

If every factor level and combination in DfB has a match in DfA, the function works fine with mapply. But if any row in DfB contains a factor level that is not present in DfA, I get an error ("level sets of factors are different"). Example: if DfA includes data for regions A, B and C, and DfB contains data for regions A, B, C and D, mapply returns the error; if I remove the rows with region D, the mapply call works.
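
The error message quoted above is the one R raises when two factors with different level sets are compared with `==`. A minimal reproduction, assuming the function subsets DfA with that kind of comparison (the question does not show its code, and the vectors `region_a` and `region_b` are made up for illustration), could look like this:

```r
region_a <- factor(c("A", "B", "C"))       # levels as in DfA (assumed)
region_b <- factor(c("A", "B", "C", "D"))  # DfB has the extra level D (assumed)

# Comparing two factors whose level sets differ raises an error;
# tryCatch captures the message instead of stopping the script
msg <- tryCatch(
  region_a == region_b[4],
  error = function(e) conditionMessage(e)
)
print(msg)

# Comparing against a character value skips the level-set check
no_match <- any(region_a == as.character(region_b[4]))
print(no_match)
```

Note that `region_b[4]` still carries all four levels, which is why the comparison fails even though only a single value is being compared.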

How can I tell mapply to skip a row, or put NA in its place, when that row contains a factor level that makes the function fail, and then carry on with the rows for which the function works?

Can you post a representative sample of your two data frames, and the sample code for your function? - Gary Weissman

1 Answer

Rather than catering for a special case, you can drop or add levels in your data frames so that the function always works:

# dropping and setting levels
Z <- as.factor(sample(LETTERS[1:5], 20, replace = TRUE))
levels(Z)
# logical indexing is safer than -which(): it still works when nothing matches
Y <- Z[!(Z %in% LETTERS[4:5])]
levels(Y) # all levels of Z are still listed, even those no longer used
Y <- droplevels(Y) # drop the unused levels
levels(Y)
# bring them back; factor() matches by value, whereas levels(Y) <- ... relabels by position
Y <- factor(Y, levels = levels(Z))
levels(Y)
Y <- factor(Y, levels = LETTERS[1:7]) # expand them
levels(Y)
attr(Y, "levels")
attr(Y, "levels") <- LETTERS[1:8] # keep expanding them
levels(Y)
library(plyr) # provides mapvalues()
Y <- mapvalues(Y, levels(Y), letters[seq_along(levels(Y))]) # change the labels of the levels
levels(Y)
# change the labels of the levels on another variable
n <- length(unique(Y))
x <- factor(Y, labels = LETTERS[(n + 1):(2 * n)])

In your case:

dfa <- data.frame(LVL1 = as.factor(sample(LETTERS[1:2], 20, replace = TRUE)))
dfb <- data.frame(LVL2 = as.factor(sample(LETTERS[2:5], 20, replace = TRUE)))
# union() already removes duplicates, so unique() is not needed
newLevels <- sort(union(levels(dfa$LVL1), levels(dfb$LVL2)))
# give both factors the same level set
dfa$LVL1 <- factor(dfa$LVL1, levels = newLevels)
dfb$LVL2 <- factor(dfb$LVL2, levels = newLevels)
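
Applied to the setup described in the question, a minimal sketch might look like the following. Everything here is an assumption for illustration, since the question shows neither data nor code: the sample rows, the column name `value`, and the function `estimate_value`, which uses a plain subset mean as a stand-in for the real algorithm. The `tryCatch` wrapper is an extra safeguard that turns any remaining error into NA.

```r
# Hypothetical data: DfA covers regions A-C, DfB also contains region D
DfA <- data.frame(
  species = factor(c("s1", "s1", "s2", "s2")),
  type    = factor(c("t1", "t1", "t1", "t2")),
  region  = factor(c("A", "A", "B", "C")),
  value   = c(1, 3, 10, 20)
)
DfB <- data.frame(
  species = factor(c("s1", "s2", "s1")),
  type    = factor(c("t1", "t1", "t1")),
  region  = factor(c("A", "B", "D"))
)

# Give each shared factor column the same level set in both data frames
for (col in c("species", "type", "region")) {
  all_levels <- union(levels(DfA[[col]]), levels(DfB[[col]]))
  DfA[[col]] <- factor(DfA[[col]], levels = all_levels)
  DfB[[col]] <- factor(DfB[[col]], levels = all_levels)
}

# Hypothetical stand-in for the real algorithm: mean of the matching
# subset of DfA, or NA when no rows match
estimate_value <- function(sp, ty, re) {
  sub <- DfA[DfA$species == sp & DfA$type == ty & DfA$region == re, ]
  if (nrow(sub) == 0) return(NA_real_)
  mean(sub$value)
}

# Run row by row; any error that still occurs becomes NA for that row
DfB$estimate <- mapply(
  function(sp, ty, re) {
    tryCatch(estimate_value(sp, ty, re), error = function(e) NA_real_)
  },
  DfB$species, DfB$type, DfB$region
)
print(DfB)
```

With the level sets aligned, the region D row no longer triggers the error; its subset of DfA is simply empty, so the row gets NA while the other rows are estimated as usual.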