How to avoid NA columns in dcast() output?

Question

How can I avoid NA columns in dcast() output from the reshape2 package?

In this dummy example the dcast() output will include an NA column:

require(reshape2)
data(iris)
iris[ , "Species2"] <- iris[ , "Species"]
iris[ 2:7, "Species2"] <- NA
(x <- dcast(iris, Species ~ Species2, value.var = "Sepal.Width", 
            fun.aggregate = length))
##     Species setosa versicolor virginica NA
##1     setosa     44          0         0  6
##2 versicolor      0         50         0  0
##3  virginica      0          0        50  0

For a somewhat similar use case, table() has an option that avoids this:

table(iris[ , c(5,6)], useNA = "ifany")  ##same output as from dcast()
##            Species2
##Species      setosa versicolor virginica <NA>
##  setosa         44          0         0    6
##  versicolor      0         50         0    0
##  virginica       0          0        50    0
table(iris[ , c(5,6)], useNA = "no")  ##avoid NA columns
##            Species2
##Species      setosa versicolor virginica
##  setosa         44          0         0
##  versicolor      0         50         0
##  virginica       0          0        50

Does dcast() have a similar option that removes NA columns from the output? (The function has a number of rather cryptic options that are tersely documented and that I cannot quite grasp...)

You could do dcast(na.omit(iris), Species ~ Species2, value.var = "Sepal.Width"), but this isn't a very general solution, since na.omit() also drops rows that have NAs in any other column you may care about. — David Arenburg
@DavidArenburg Indeed. I was aware of na.omit(iris)-like solutions, but I was looking for a different approach. I didn't include this requirement in the question to avoid making it too confusing... — landroni
If I had to guess, I'd say it's intended behaviour so you need to consciously remove missing data (instead of doing that accidentally). I would solve it by selecting the data first, so iris[!is.na(iris$Species2),]. — Heroka
@David if it's only NA's in a certain column that need to be removed. — Heroka

pgoel6uc · Accepted Answer · 2017-02-17T21:56:15

Here is how I was able to get around it:

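# Species2 is a factor in the question's example; 'None' is not one of its
# levels, so convert to character first (otherwise the NAs stay NA, with a warning)
iris$Species2 <- as.character(iris$Species2)
iris[is.na(iris)] <- 'None'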

x <- dcast(iris, Species ~ Species2, value.var="Sepal.Width", fun.aggregate = length)

x$None <- NULL

The idea is that you replace all the NAs with 'None', so that dcast creates a column called 'None' rather than 'NA'. Then, you can just delete that column in the next step if you don't need it.
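For comparison, here is a minimal sketch of two ways to get the same result without a placeholder value, following the row-filtering idea from the comments. It assumes the example data from the question; the names df, x1 and x2 are only illustrative, and the second option assumes dcast() labels the missing-value column "NA", as in the output shown in the question.

```r
library(reshape2)

# Rebuild the question's example on a copy so the built-in iris stays untouched
df <- iris
df$Species2 <- df$Species
df[2:7, "Species2"] <- NA

# Option 1: drop rows that are NA in the cast column only, then cast.
# Unlike na.omit(), this ignores NAs in any other column.
x1 <- dcast(df[!is.na(df$Species2), ], Species ~ Species2,
            value.var = "Sepal.Width", fun.aggregate = length)

# Option 2: cast everything, then drop the column labelled "NA"
# (matching a literal NA name as well, to be on the safe side)
x2 <- dcast(df, Species ~ Species2,
            value.var = "Sepal.Width", fun.aggregate = length)
x2 <- x2[, !(colnames(x2) %in% c("NA", NA)), drop = FALSE]
```

Option 1 removes the NA rows before they are counted, while Option 2 counts them and then discards the column. Either way the remaining counts should match those in the table(..., useNA = "no") output above.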
