1
votes

I have a dataframe with incorrect values for two variables (Lat and Lon). The incorrect values in the dataframe are listed as 999.00, and the correct values should be 42.68 and -72.47, respectively.

I would like an easy way to replace these values using dplyr, but my attempts (see below) have been unsuccessful (errors provided below).

df$Lat2 <- recode(df$Lat, "999.00"="42.68", .default=x)

Error in lapply(x, f) : object 'x' not found

df <- df %>%
mutate(Lat2 = if_else(Lat == 999.00, 42.68, NULL, NULL))

Error in mutate_impl(.data, dots) : Evaluation error: unused argument (recvLat = 999).

df <- df %>%
mutate(Lat2 = ifelse(Lat == 999.00, 42.68, NULL))

Error in mutate_impl(.data, dots) : Evaluation error: replacement has length zero. In addition: Warning message: In rep(no, length.out = length(ans)) : 'x' is NULL so the result will be NULL

df <- df %>%
mutate(Lat2 = case_when(Lat == 999.00 ~ 42.68, TRUE ~ NULL))

Error in mutate_impl(.data, dots) : Evaluation error: subscript out of bounds.

For the latter three attempts, I get the same error if the number are in quotes (i.e. "999.00" and "42.68")

3
Your ifelse statement should have the actual column as the alternative, not NULL, i.e. ifelse(..,..., df$Lat)Sotos
couldn't this be solved using df$Lat <- df$Lat %>% gsub("999.00", "42.68")?huan
@huan... No. df$Lat is a numeric variable, not a stringSotos
Something like: as_tibble(df) %>% mutate(Lon = case_when(revcLat == 999 ~ 42.68), Lat2 = case_when(Lat == 999 ~ -72.47))Mads Obi

3 Answers

2
votes

why not use this.

data=as.data.frame(matrix(0,3,3))
names(data)=c("a","b","c")
data$a[1]=999
data$c[2]=999
data$a[which(data$a==999)]=42.68
data$c[which(data$c==999)]=-72.47
data
      a b      c
1 42.68 0   0.00
2  0.00 0 -72.47
3  0.00 0   0.00
1
votes

Actually, this is a nice case to show the elegance of data.table as well.

library(data.table)

## Create example
data <- data.table(lat = c(999, 0, 0),
                   lon = c(0, 999, 0))

## Reassign values
data[lat==999, lat := 42.68]
data[lon==999, lon := -72.47]

## Print results
data
#      lat    lon
# 1: 42.68   0.00
# 2:  0.00 -72.47
# 3:  0.00   0.00

The downside is that you have to remember that := is needed for assignment.

The upsides are

  • You can refer to variables by name without quotes
  • Memory efficient, especially important for large data sets
  • Doesn't have dependencies
  • The data.table syntax is much simpler and consistent (i, j, group by)
  • You don't need to memorize a bunch of functions with weird names that may and/or may not be spelled Aussie style (like colour or summarise)
  • You can use base R more, which makes your code more portable and widely understood
  • The data.table class inherits the data.frame class, so it's more compatible within R
0
votes

We can place NULL in a list

df %>%
    mutate(Lat2 = ifelse(recvLat == 999.00, 42.68, list(NULL)))
#  recvLat  Lat2
#1   999.0 42.68
#2     1.5  NULL
#3     2.5  NULL

Instead of NULL, it can be NA

df %>%
    mutate(Lat2 = ifelse(recvLat == 999.00, 42.68, NA_real_))
#  recvLat  Lat2
#1   999.0 42.68
#2     1.5    NA
#3     2.5    NA

If we want to do the opposite, just use !=

df %>%
    mutate(Lat2 = ifelse(recvLat != 999.00, 42.68, NA_real_))

Based on the OP's comment,

df %>%
    mutate(Lat2 = ifelse(recvLat == 999.00, 42.68, recvLat))

In base R, we can do this by creating an index

i1 <- df$recvLat == 999
df$recLat[i1] <- 42.68

NOTE: Both the solutions work.

data

df <- data.frame(recvLat = c(999, 1.5,  2.5))