R: mapply function returning error: level sets of factors are different

Question

I have two dataframes (DfA and DfB). Each dataframe has three factor variables: species, type and region. DfA also has a numeric value column, and I want to use it to estimate numeric values in a new column of DfB, based on shared attributes.

I have a function which asks for the species, type and region, then creates a subset of DfA with those attributes and runs an algorithm on the subset to estimate the new value. When I run the function and specify the values manually as a test, it works fine.

If every factor level and combination in DfB has a match in DfA, the function works fine with mapply. But if any row in DfB contains a factor level that is not present in DfA, I get an error ("level sets of factors are different"). For example, if DfA includes data for regions A, B and C, and DfB contains data for regions A, B, C and D, mapply returns the error; if I remove the rows with region D, mapply works.

How can I make mapply skip a row, or put NA in for it, when the row contains a factor level that the function cannot handle, and still run the function on the rows for which it works?

can you post a representative sample of your two data frames, and the sample code for your function? — Gary Weissman

Hans Roggeman · Accepted Answer · 2014-03-29T08:13:17

You can drop/add levels to your data.frames to make sure your function works rather than cater for a special case:

# Dropping and setting factor levels
Z = as.factor(sample(LETTERS[1:5], 20, replace = TRUE))
levels(Z)
Y = Z[!(Z %in% LETTERS[4:5])]  # remove the D and E values; logical indexing is safe even if none are present
levels(Y)  # the unused levels are still listed
Y = droplevels(Y)  # drop the unused levels
levels(Y)
levels(Y) = levels(Z)  # bring them back (assignment is by position)
levels(Y)
Y = factor(Y, levels = LETTERS[1:7])  # expand them
levels(Y)
attr(Y, "levels")
attr(Y, "levels") = LETTERS[1:8]  # keep expanding them
levels(Y)
library(plyr)  # provides mapvalues()
Y = mapvalues(Y, levels(Y), letters[1:length(levels(Y))])  # change the labels of the levels
levels(Y)
x = factor(Y, labels = LETTERS[(length(unique(Y)) + 1):(2 * length(unique(Y)))])  # change the labels of the levels on another variable
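
As a side note, the error message itself most likely comes from comparing two factors with `==` inside the subsetting step. That is an assumption, since the function was not posted. A minimal sketch with made-up factors `a` and `b` that reproduces the message, and shows that comparing the labels as character strings avoids it:

```r
# Two factors holding the same value but declared with different level sets.
a <- factor("A", levels = c("A", "B", "C"))
b <- factor("A", levels = c("A", "B", "C", "D"))

# Comparing them directly fails; capture the message instead of stopping.
msg <- tryCatch(as.character(a == b), error = function(e) conditionMessage(e))
print(msg)

# Comparing the underlying labels as character strings sidesteps the level check.
same_label <- as.character(a) == as.character(b)
print(same_label)
```

Converting to character is an alternative to aligning the levels; which one fits better depends on what the rest of the function expects.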

In your case:

dfa = data.frame(LVL1 = as.factor(sample(LETTERS[1:2], 20, replace = TRUE)))
dfb = data.frame(LVL2 = as.factor(sample(LETTERS[2:5], 20, replace = TRUE)))
newLevels = sort(union(levels(dfa$LVL1), levels(dfb$LVL2)))  # one shared level set for both columns
dfa$LVL1 = factor(dfa$LVL1, levels = newLevels)
dfb$LVL2 = factor(dfb$LVL2, levels = newLevels)
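
If you would rather skip the problem rows and get NA for them, as the question asks, one option is to wrap the function in `tryCatch`. The sketch below is only an illustration: the toy data, the column names, `estimate_value` (a stand-in that takes the mean of the matching subset) and `safe_estimate` are all hypothetical, since the real function was not posted.

```r
# Toy data; the column names and values are made up for illustration.
DfA <- data.frame(
  species = factor(c("s1", "s1", "s2", "s2")),
  type    = factor(c("t1", "t1", "t1", "t1")),
  region  = factor(c("A", "A", "B", "C")),
  value   = c(1, 3, 5, 7)
)
DfB <- data.frame(
  species = factor(c("s1", "s2", "s2")),
  type    = factor(c("t1", "t1", "t1")),
  region  = factor(c("A", "B", "D"))  # region D does not occur in DfA
)

# Hypothetical stand-in for the real algorithm: mean of the matching subset.
estimate_value <- function(species, type, region) {
  rows <- as.character(DfA$species) == species &
    as.character(DfA$type) == type &
    as.character(DfA$region) == region
  if (!any(rows)) stop("no matching rows in DfA")
  mean(DfA$value[rows])
}

# Wrapper: turn any error into NA so mapply can carry on with the next row.
safe_estimate <- function(species, type, region) {
  tryCatch(estimate_value(species, type, region),
           error = function(e) NA_real_)
}

# Pass the columns as character vectors so no factor comparison is involved.
DfB$estimate <- mapply(safe_estimate,
                       as.character(DfB$species),
                       as.character(DfB$type),
                       as.character(DfB$region),
                       USE.NAMES = FALSE)
print(DfB)
```

Note that `tryCatch` hides every error, not just the missing-level one, so it may be worth logging `conditionMessage(e)` inside the handler while you are testing.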
