2
votes

I have demographic data set, which includes the age of people in a household. This is collected via a survey and participants are allowed to refuse providing their age.

The result is a data set with one household per row (each with a household ID code), and various household characteristics such as age in the columns. Refused responses as coded as "R", and you could re-create a sample using the code below:

df <- list(Household_ID = c("1A", "1B", "1C", "1D", "1E"),
           AGE1 = c("25", "47", "39", "50", "R"),
           AGE2 = c("66", "23", "71", "R", "16"),
           AGE3 = c("28", "17", "R", "R", "80"),
           AGE4 = c("81", "22", "48", "59", "R"))

df <- as_tibble(df)

> df
# A tibble: 5 x 5
  Household_ID AGE1  AGE2  AGE3  AGE4 
  <chr>        <chr> <chr> <chr> <chr>
1 1A           25    66    28    81   
2 1B           47    23    17    22   
3 1C           39    71    R     48   
4 1D           50    R     R     59   
5 1E           R     16    80    R 

For our intents and purposes we re-code the "R" to "-9" so that we can subsequently convert the format of the AGE columns to integer, and carry out analysis. We usually do this in another software and my objective is to replicate this process in R.

I have managed to do this with the following code:

df <- df %>% mutate(AGE1 = case_when(AGE1 == "R" ~ "-9", TRUE ~ as.character(AGE1)))
df <- df %>% mutate(AGE2 = case_when(AGE2 == "R" ~ "-9", TRUE ~ as.character(AGE2)))
df <- df %>% mutate(AGE3 = case_when(AGE3 == "R" ~ "-9", TRUE ~ as.character(AGE3)))
df <- df %>% mutate(AGE4 = case_when(AGE4 == "R" ~ "-9", TRUE ~ as.character(AGE4)))

Given that this feels clumsy, I tried to find a solution using mutate_if etc. but read that these have been superseded by across(). Hence, I tried to replicate this operation using across():

df <- df %>%
  mutate(across(AGE1:AEG4),
          ~ (case_when(. == "R" ~ "-9")))

But I get the following error:

Error: Problem with `mutate()` input `..2`.
x Input `..2` must be a vector, not a `formula` object.
i Input `..2` is `~(case_when(. == "R" ~ "-9"))`.

Been wrestling with this and googling for a while now but can't figure out what I am missing. Would really appreciate some input on how to get this working, please and thank you.

EDIT: Solved!

df <- df %>%
  mutate(across(AGE1:AGE4, ~ (case_when(.x == "R" ~ "-9", TRUE ~ as.character(.x)))))
3
You need to pass the function to across() itself: df %>% mutate(across(AGE1:AGE4,~case_when(.x == "R" ~ "-9"))) You basically just have a parentheses in the wrong place.MrFlick
now you have a tibble/data.frame full of NAs and some -9 -> from what I understood "R" should be converted to "-9" keep all != "R" as they areDPH
@DPH Good point, I was just trying to fix the code they wrote and point out why the across() wasn't working.MrFlick
Great, thank you both. Now I got this which works: df <- df %>% mutate(across(AGE1:AGE4, ~ (case_when(.x == "R" ~ "-9", TRUE ~ as.character(.x))))) Biased_Observer

3 Answers

2
votes

Why not simply?

df[,2:5][df[, 2:5] == 'R'] <- '-9'

# A tibble: 5 x 5
  Household_ID AGE1  AGE2  AGE3  AGE4 
  <chr>        <chr> <chr> <chr> <chr>
1 1A           25    66    28    81   
2 1B           47    23    17    22   
3 1C           39    71    -9    48   
4 1D           50    -9    -9    59   
5 1E           -9    16    80    -9
2
votes

Or maybe this one which is not much difference from dear @TarJae's interpretation:

library(dplyr)
library(stringr)


df %>%
  mutate(across(AGE1:AGE4, ~ str_replace(., "R", "-9")),
         across(AGE1:AGE4, as.integer))

# A tibble: 5 x 5
  Household_ID  AGE1  AGE2  AGE3  AGE4
  <chr>        <int> <int> <int> <int>
1 1A              25    66    28    81
2 1B              47    23    17    22
3 1C              39    71    -9    48
4 1D              50    -9    -9    59
5 1E              -9    16    80    -9

Data:

df <- list(Household_ID = c("1A", "1B", "1C", "1D", "1E"),
           AGE1 = c("25", "47", "39", "50", "R"),
           AGE2 = c("66", "23", "71", "R", "16"),
           AGE3 = c("28", "17", "R", "R", "80"),
           AGE4 = c("81", "22", "48", "59", "R"))

df <- as_tibble(df)
1
votes

You could use across with replace.

  1. List to tibble with as_tibble()
  2. replace R with -9
  3. integer class for AGE
df %>% 
  as_tibble() %>% 
  mutate(across(everything(), ~replace(., . ==  "R" , "-9"))) %>% 
  type.convert(as.is=TRUE)

Output:

  Household_ID  AGE1  AGE2  AGE3  AGE4
  <chr>        <int> <int> <int> <int>
1 1A              25    66    28    81
2 1B              47    23    17    22
3 1C              39    71    -9    48
4 1D              50    -9    -9    59
5 1E              -9    16    80    -9