
I have a data frame like the example below that contains city names, and I need to replace some of the names to remove spaces and special characters such as ~ and ´.

df = data.frame( city = c('São Paulo', 'Belo Horizonte', 'Natal', 'Goiânia', 'Manaus'))

The problem is that I need to keep the names that don't need to be changed. I'm using the mutate call below, but it replaces the names that have no spaces or special characters with numbers.

df = df %>% mutate(city_correct = ifelse(city == 'São Paulo', 'Sao.Paulo',
                                  ifelse(city == 'Belo Horizonte', 'Belo.Horizonte',
                                  ifelse(city == 'Goiânia', 'Goiania', city))))

Does anyone know how I can make the function above work?


2 Answers

0
votes

The column is a factor, so either convert it with as.character or create the data frame with stringsAsFactors = FALSE:

df <- data.frame( city = c('São Paulo', 'Belo Horizonte', 'Natal', 
     'Goiânia', 'Manaus'), stringsAsFactors = FALSE)

Now the OP's code works as intended:

library(dplyr)
df %>%
    mutate(city_correct = ifelse(city == 'São Paulo', 'Sao.Paulo', 
      ifelse(city == 'Belo Horizonte', 'Belo.Horizonte',
      ifelse(city == 'Goiânia', 'Goiania', city ))))
#            city   city_correct
#1      São Paulo      Sao.Paulo
#2 Belo Horizonte Belo.Horizonte
#3          Natal          Natal
#4        Goiânia        Goiania
#5         Manaus         Manaus
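The other option, converting the existing factor column with as.character, could look like the sketch below. It assumes the column was created as a factor (stringsAsFactors = TRUE only reproduces that situation) and converts it inside the same mutate call; mutate evaluates its arguments in order, so the nested ifelse already sees the character column.

```r
library(dplyr)

# stringsAsFactors = TRUE reproduces the factor column from the question
df <- data.frame(city = c('São Paulo', 'Belo Horizonte', 'Natal',
                          'Goiânia', 'Manaus'), stringsAsFactors = TRUE)

df <- df %>%
  mutate(city = as.character(city),   # replace the factor with its labels first
         city_correct = ifelse(city == 'São Paulo', 'Sao.Paulo',
                        ifelse(city == 'Belo Horizonte', 'Belo.Horizonte',
                        ifelse(city == 'Goiânia', 'Goiania', city))))
```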

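The issue is that ifelse copies values taken from a factor as their underlying integer codes rather than their labels, which is why numbers such as 4 and 3 show up in the output column. Since R 4.0.0, data.frame() defaults to stringsAsFactors = FALSE, so this happens only on older R versions or when a factor is created explicitly.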
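A minimal sketch of the coercion, using a hypothetical two-element factor instead of the full data frame:

```r
# A factor stores integer codes plus a set of level labels
city <- factor(c("Natal", "Manaus"))
levels(city)       # "Manaus" "Natal" (levels are sorted)
as.integer(city)   # 2 1

# Values taken from the factor arrive as codes, not labels
ifelse(city == "Natal", "Natal", city)                # "Natal" "1"

# Converting to character first keeps the labels
ifelse(city == "Natal", "Natal", as.character(city))  # "Natal" "Manaus"
```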


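In addition to the OP's method, this can be done more simply with chartr (base R), which swaps the accented letters one-for-one, and str_replace_all (stringr), which turns every space into a dot:

library(stringr)
df %>% 
    mutate(city_correct = str_replace_all(chartr('ãâ', 'aa', city), ' ', '.'))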
#           city   city_correct
#1      São Paulo      Sao.Paulo
#2 Belo Horizonte Belo.Horizonte
#3          Natal          Natal
#4        Goiânia        Goiania
#5         Manaus         Manaus
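The chartr mapping covers only ã and â. For names with other accents (é, ç, ô, ...), a more general sketch, assuming the stringi package (which stringr is built on) is installed, transliterates to plain ASCII and then replaces every run of whitespace with a dot:

```r
library(dplyr)
library(stringi)

df <- data.frame(city = c('São Paulo', 'Belo Horizonte', 'Natal',
                          'Goiânia', 'Manaus'), stringsAsFactors = FALSE)

df <- df %>%
  mutate(city_correct = city %>%
           stri_trans_general("Latin-ASCII") %>%   # strip diacritics: São -> Sao
           stri_replace_all_regex("\\s+", "."))     # each run of whitespace -> "."
```

Unlike the hand-written mapping, this also handles cities that are not in the example data.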
0
votes

I'm not sure whether this applies only to this specific case, but if you add a branch for every value in the city column, it works, because the final fallback to city (the factor) is never reached.

df = data.frame( city = c('São Paulo', 'Belo Horizonte', 'Natal', 'Goiânia', 'Manaus'))

df = df %>% mutate(city_correct = ifelse(city == 'São Paulo', 'Sao.Paulo',
                                  ifelse(city == 'Belo Horizonte', 'Belo.Horizonte',
                                  ifelse(city == 'Natal', 'Natal',
                                  ifelse(city == 'Goiânia', 'Goiania',
                                  ifelse(city == 'Manaus', 'Manaus', city))))))

df
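Listing every city avoids the factor codes, but the list grows with the data. A sketch using dplyr::case_when (a different function from the ifelse used above) keeps only the names that change and converts the fallback with as.character, so it also works on a factor column:

```r
library(dplyr)

# stringsAsFactors = TRUE reproduces the factor column from the question
df <- data.frame(city = c('São Paulo', 'Belo Horizonte', 'Natal',
                          'Goiânia', 'Manaus'), stringsAsFactors = TRUE)

df <- df %>%
  mutate(city_correct = case_when(
    city == 'São Paulo'      ~ 'Sao.Paulo',
    city == 'Belo Horizonte' ~ 'Belo.Horizonte',
    city == 'Goiânia'        ~ 'Goiania',
    TRUE                     ~ as.character(city)  # keep every other name as text
  ))
```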