I have a large data frame that I would like to cast into wide form using the dcast() function from the reshape2 package. The value column is a character column, but some of its values are numbers stored as strings. I wrote a custom aggregate function to handle this: it returns the mean if a group has any numeric entries, and the first entry if all entries are non-numeric. The function seems to work when called directly, but it throws an error when used as fun.aggregate. The toy example below demonstrates the problem. What I want is a 3x5 data frame: the first column is the grouping variable, followed by 3 columns of numeric values and 1 column of character values.

mean_with_char <- function(x) {
  # Non-numeric strings are coerced to NA (with a warning).
  xnum <- as.numeric(x)
  if (any(!is.na(xnum))) mean(xnum, na.rm = TRUE) else x[1]
}

library(reshape2)

fakedata <- data.frame(grp1 = rep(letters[1:3], times = 20),
                       grp2 = rep(LETTERS[17:20], each = 15),
                       val = rnorm(60))
# Assigning strings coerces the whole val column to character.
fakedata$val[46:60] <- rep(c('foo','bar','bla','bla','bla','bla'), length.out = 15)

# This returns a 3x5 data frame with NA entries.
dcast(fakedata, grp1 ~ grp2, value.var='val', fun.aggregate=mean)

# This returns an error.
dcast(fakedata, grp1 ~ grp2, value.var='val', fun.aggregate=mean_with_char)

Error in vapply(indices, fun, .default) : values must be type 'character', but FUN(X[[1]]) result is type 'double'

It looks like vapply wants all the results to be of one type rather than a mix. One workaround would be to return your numbers (the means) as characters and then convert the types afterwards. I've used readr::type_convert for this sort of thing. - aosmith
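
To make the comment's point concrete, here is a minimal sketch of the type check that vapply() performs. It assumes, based only on the call shown in the error message, that reshape2 builds its template (.default) by running the aggregate on an empty vector; the groups list is a hypothetical stand-in for the cells dcast() would aggregate.

```r
# The question's aggregate: numeric mean when possible, else the first string.
mean_with_char <- function(x) {
  xnum <- suppressWarnings(as.numeric(x))
  if (any(!is.na(xnum))) mean(xnum, na.rm = TRUE) else x[1]
}

# On an empty character vector the function falls through to x[1], which is
# NA_character_. A template derived this way (an assumption about reshape2
# internals) therefore has type character.
template <- mean_with_char(character(0))
typeof(template)

# Hypothetical groups standing in for the cells that dcast() aggregates.
groups <- list(numeric_cell = c("1", "2", "3"), text_cell = c("foo", "bar"))

# vapply() rejects a double result when the template is character.
msg <- tryCatch(
  vapply(groups, mean_with_char, template),
  error = function(e) conditionMessage(e)
)
msg

# Returning character for every group satisfies the template.
mean_as_char <- function(x) as.character(mean_with_char(x))
res <- vapply(groups, mean_as_char, template)
res
```

The last call succeeds because every result is a length-one character vector, which is the idea behind returning the means as strings and converting afterwards.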

1 Answer

Here is the workaround suggested by aosmith. The mean_with_char function now returns only character output, and the numstring2num function then converts columns that consist entirely of numeric strings back to numeric.

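mean_with_char <- function(x) {
  # Non-numeric strings become NA; the coercion warning is expected, so silence it.
  xnum <- suppressWarnings(as.numeric(x))
  # Always return character so that every cell has the same type.
  if (any(!is.na(xnum))) as.character(mean(xnum, na.rm = TRUE)) else x[1]
}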

library(reshape2)

fakedata <- data.frame(grp1 = rep(letters[1:3],times=20), grp2 = rep(LETTERS[17:20],each=15), val=rnorm(60))
fakedata$val[46:60] <- rep(c('foo','bar','bla','bla','bla','bla'), length.out=15)

fakecast <- dcast(fakedata, grp1 ~ grp2, value.var='val', fun.aggregate=mean_with_char)

# Convert columns of a data frame that consist only of numeric strings to numeric.
numstring2num <- function(x) {
  # Leave factors alone: as.numeric() on a factor returns its integer codes.
  if (is.factor(x)) return(x)
  xnum <- suppressWarnings(as.numeric(x))
  if (!anyNA(xnum)) xnum else x
}


# Assigning into fakecast[] keeps the data frame structure.
fakecast[] <- lapply(fakecast, numstring2num)
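
For what it's worth, base R's utils::type.convert() can do a similar column-wise conversion, much like the readr::type_convert approach mentioned in the comment. This is only a sketch of that alternative, not part of the workaround above; cast_chr is a hypothetical stand-in for the all-character result of dcast(), with made-up values.

```r
# Hypothetical stand-in for the all-character output of dcast().
cast_chr <- data.frame(
  grp1 = c("a", "b", "c"),
  Q = c("0.12", "-0.5", "1.3"),
  R = c("2", "4", "6"),
  T = c("foo", "bar", "bla"),
  stringsAsFactors = FALSE
)

# as.is = TRUE keeps non-numeric columns as character instead of factor.
converted <- type.convert(cast_chr, as.is = TRUE)

# Inspect the resulting column types.
sapply(converted, class)
```

One difference to be aware of: type.convert() may pick integer rather than double for columns of whole numbers, and it treats the string "NA" as missing, whereas numstring2num leaves any column with a non-convertible entry untouched.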