3
votes

How can I avoid NA columns in dcast() output from the reshape2 package?

In this dummy example the dcast() output will include an NA column:

require(reshape2)
data(iris)
iris[ , "Species2"] <- iris[ , "Species"]
iris[ 2:7, "Species2"] <- NA
(x <- dcast(iris, Species ~ Species2, value.var = "Sepal.Width", 
            fun.aggregate = length))
##     Species setosa versicolor virginica NA
##1     setosa     44          0         0  6
##2 versicolor      0         50         0  0
##3  virginica      0          0        50  0

For a somewhat similar use case, table() has an option that avoids this:

table(iris[ , c(5,6)], useNA = "ifany")  ##same output as from dcast()
##            Species2
##Species      setosa versicolor virginica <NA>
##  setosa         44          0         0    6
##  versicolor      0         50         0    0
##  virginica       0          0        50    0
table(iris[ , c(5,6)], useNA = "no")  ##avoid NA columns
##            Species2
##Species      setosa versicolor virginica
##  setosa         44          0         0
##  versicolor      0         50         0
##  virginica       0          0        50

Does dcast() have a similar option that removes NA columns from the output? If not, how can I avoid getting them? (This function has a number of rather cryptic options that are tersely documented and that I cannot quite grasp...)

4
You could do dcast(na.omit(iris), Species ~ Species2, value.var = "Sepal.Width"), but this isn't a very general solution if you are interested in some other columns too, because na.omit() drops every row that has an NA in any column. - David Arenburg
@DavidArenburg Indeed. I was aware of na.omit(iris)-like solutions, but I was looking for a different approach. I didn't include this requirement in the question to avoid making it too confusing... - landroni
If I had to guess, I'd say it's intended behaviour so you need to consciously remove missing data (instead of doing that accidentally). I would solve it by selecting the data first, so iris[!is.na(iris$Species2),]. - Heroka
@Heroka how would that be better than na.omit? - David Arenburg
@David if it's only NA's in a certain column that need to be removed. - Heroka

4 Answers

1
votes

Here is how I was able to get around it:

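# Species2 is a factor here; convert it to character first, otherwise
# assigning the unknown level 'None' only warns and leaves the NAs in place.
iris$Species2 <- as.character(iris$Species2)
iris[is.na(iris)] <- 'None'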

x <- dcast(iris, Species ~ Species2, value.var="Sepal.Width", fun.aggregate = length)

x$None <- NULL

The idea is that you replace all the NAs with 'None', so that dcast creates a column called 'None' rather than 'NA'. Then, you can just delete that column in the next step if you don't need it.

0
votes

One solution that I've found, and that I'm reasonably happy with, builds on the approach of dropping NA values suggested in the comments. It uses the subset argument of dcast() together with .() from plyr:

require(plyr)
(x <- dcast(iris, Species ~ Species2, value.var = "Sepal.Width",
            fun.aggregate = length, subset = .(!is.na(Species2))))
##     Species setosa versicolor virginica
##1     setosa     44          0         0
##2 versicolor      0         50         0
##3  virginica      0          0        50

For my particular purpose (within a custom function) the following works better:

(x <- dcast(iris, Species ~ Species2, value.var = "Sepal.Width", 
            fun.aggregate = length, subset = .(!is.na(get("Species2")))))
##     Species setosa versicolor virginica
##1     setosa     44          0         0
##2 versicolor      0         50         0
##3  virginica      0          0        50
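
If the column names are only known as strings inside a custom function, the same row filter can also be written without subset or .(). The following is a minimal sketch under that assumption, not the approach from this answer: dcast_no_na and its argument names are hypothetical, and the rows with NA are dropped with plain base R indexing before dcast() is called.

```r
library(reshape2)

# Hypothetical helper (not part of reshape2): count `value_var` by
# `row_var` x `col_var`, dropping rows where `col_var` is NA beforehand.
dcast_no_na <- function(data, row_var, col_var, value_var) {
  keep <- !is.na(data[[col_var]])                 # rows with a known column key
  f <- as.formula(paste(row_var, "~", col_var))   # e.g. Species ~ Species2
  dcast(data[keep, , drop = FALSE], f,
        value.var = value_var, fun.aggregate = length)
}

# Rebuild the example data from the question on a copy of iris.
d <- iris
d$Species2 <- d$Species
d$Species2[2:7] <- NA

x <- dcast_no_na(d, "Species", "Species2", "Sepal.Width")
x
```

Only NAs in the chosen column are removed, so rows with missing values elsewhere are still counted.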
0
votes

You could rename the NA column of the output and then make it NULL. (This works for me).

require(reshape2)
data(iris)
iris[ , "Species2"] <- iris[ , "Species"]
iris[ 2:7, "Species2"] <- NA

(x <- dcast(iris, Species ~ Species2, value.var = "Sepal.Width", 
            fun.aggregate = length)) 

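# x has five columns: Species plus the four cast columns (the last is the NA column).
# Base R names<- is used here so that no extra package is needed.
names(x) <- c("Species", "setosa", "versicolor", "virginica", "newname")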

x$newname <- NULL
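
The rename-then-delete idea can also be done without listing the column names by hand. This is a rough sketch, assuming the unwanted column is named either with the literal string "NA" or with a missing name; it would also drop a genuine category that happens to be called "NA".

```r
library(reshape2)

# Rebuild the example data from the question on a copy of iris.
d <- iris
d$Species2 <- d$Species
d$Species2[2:7] <- NA

x <- dcast(d, Species ~ Species2, value.var = "Sepal.Width",
           fun.aggregate = length)

# Flag columns whose name is the string "NA" or is itself missing,
# then keep everything else.
is_na_col <- names(x) %in% c("NA", NA)
x <- x[, !is_na_col, drop = FALSE]
x
```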
-2
votes
library(dplyr)
library(tidyr)
iris %>%
  filter(!is.na(Species2)) %>%
  group_by(Species, Species2) %>%
  summarize(freq = n()) %>%
  spread(Species2, freq)