I have a function that behaves incorrectly when used inside the mutate function from the dplyr package. The function takes a UK postcode and returns a postal area. It works fine with individual postcodes or vectors of postcodes.

Here is the function:

pArea_parse <- function(x) {
  z <- any(grep('[A-Z][A-Z]', substr(x, 1, 2)))
  y <- any(grep('[A-Z][0-9]', substr(x, 1, 2)))

  if (z) {
    return(substr(x, 1, 2))
  } else if (y) {
    return(substr(x, 1, 1))
  } else if (!y & !z) {
    return(NA)
  }
}

It works:

> x <- "B30 1AA" # plucked randomly from a postcode site
> pArea_parse(x)
[1] "B"

Here is some sample data:

test <- data.frame(id = c(1, 2, 3, 4),
                   post_code = c("B30 1AA", "B30 3FT", "B30 3AZ", "BA1 8TU"))

Here is my dplyr code:

library(dplyr)

test %>% mutate(postal_area = pArea_parse(post_code))

Instead of returning the first letter when there is a letter followed by a number, it returns the letter and the number, even though this doesn't happen with a vector of postcodes or an individual postcode.

  id post_code postal_area
1  1   B30 1AA          B3
2  2   B30 3FT          B3
3  3   B30 3AZ          B3
4  4   BA1 8TU          BA

How can a function do something it's not programmed to do when used in conjunction with mutate? I am stumped!

I think the issue is that your function does not work correctly with vectors. - Kerry Jackson
I think you probably wanted to structure this around ifelse, or even better case_when rather than a traditional if/else clause. The former are vectorized. - joran
How does one properly vectorize a function? Why would not vectorizing my function produce the observed behaviour? Thanks. - CClarke
If you use purrr::map with your function and tidyr::unnest, you could avoid vectorizing. test %>% mutate(postal_area = map(post_code, pArea_parse)) %>% unnest() - AndS.

1 Answer

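Your use of any() and if/else makes your function non-vectorized: if you pass in a vector of values, you do not get the right vector of values out. any() collapses the matches to a single TRUE or FALSE, so if/else picks one branch for the whole vector. Because "BA1 8TU" matches '[A-Z][A-Z]', z is TRUE and substr(x,1,2) is returned for every element. This is not specific to mutate(). If you call your function on the same vector outside of mutate(), you'll get the same result: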

pArea_parse(c("B30 1AA", "B30 3FT", "B30 3AZ", "BA1 8TU"))
# [1] "B3" "B3" "B3" "BA"
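
To see where this goes wrong, it can help to look at the intermediate values. A rough sketch using the same four postcodes (the variable names x and z are only for illustration):

```r
x <- c("B30 1AA", "B30 3FT", "B30 3AZ", "BA1 8TU")

# grep() returns the positions of the matching elements;
# only the fourth postcode starts with two letters
grep('[A-Z][A-Z]', substr(x, 1, 2))
# [1] 4

# any() collapses that to a single value, so `if (z)` takes the
# first branch once, for the whole vector
z <- any(grep('[A-Z][A-Z]', substr(x, 1, 2)))
z
# [1] TRUE

# substr() itself is vectorized, so every element gets two characters
substr(x, 1, 2)
# [1] "B3" "B3" "B3" "BA"
```

With a single postcode such as "B30 1AA" there is no two-letter match, z is FALSE, and the second branch runs, which is why the one-element test looked fine.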

You can make this easier using the dplyr helper function case_when(), which is vectorized. For example:

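library(dplyr)

pArea_parse <- function(x) {
  # grepl() returns one TRUE/FALSE per element, unlike any(grep(...))
  z <- grepl('[A-Z][A-Z]', substr(x, 1, 2))
  y <- grepl('[A-Z][0-9]', substr(x, 1, 2))

  # case_when() evaluates the conditions element by element
  case_when(z ~ substr(x, 1, 2),
            y ~ substr(x, 1, 1),
            TRUE ~ NA_character_)
}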
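
If you would rather not depend on dplyr inside the function, the same logic can probably be written with nested ifelse() from base R, as one of the comments suggests. This is only a sketch; the name pArea_parse_base is made up here for illustration and is not part of any package:

```r
# Hypothetical base-R variant; pArea_parse_base is an illustrative name
pArea_parse_base <- function(x) {
  x <- as.character(x)              # also handles factor columns
  first_two <- substr(x, 1, 2)

  # ifelse() works element by element, so each postcode gets its own branch
  ifelse(grepl('[A-Z][A-Z]', first_two), first_two,
         ifelse(grepl('[A-Z][0-9]', first_two), substr(x, 1, 1),
                NA_character_))
}

pArea_parse_base(c("B30 1AA", "B30 3FT", "B30 3AZ", "BA1 8TU"))
# [1] "B"  "B"  "B"  "BA"
```

Either version can be dropped into mutate(postal_area = ...) unchanged, since both return one value per input element.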