So " xx yy 11 22 33 " will become "xxyy112233". How can I achieve this?
10 Answers
In general, we want a solution that is vectorised, so here's a better test example:
whitespace <- " \t\n\r\v\f" # space, tab, newline,
                            # carriage return, vertical tab, form feed
x <- c(
  " x y ",           # spaces before, after and in between
  " \u2190 \u2192 ", # contains unicode chars
  paste0(            # varied whitespace
    whitespace,
    "x",
    whitespace,
    "y",
    whitespace,
    collapse = ""
  ),
  NA                 # missing
)
## [1] " x y "
## [2] " ← → "
## [3] " \t\n\r\v\fx \t\n\r\v\fy \t\n\r\v\f"
## [4] NA
The base R approach: gsub
gsub replaces all instances of a string (fixed = TRUE) or regular expression (fixed = FALSE, the default) with another string. To remove all spaces, use:
gsub(" ", "", x, fixed = TRUE)
## [1] "xy" "←→"
## [3] "\t\n\r\v\fx\t\n\r\v\fy\t\n\r\v\f" NA
As DWin noted, in this case fixed = TRUE isn't necessary but provides slightly better performance, since matching a fixed string is faster than matching a regular expression.
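To illustrate the difference between the two matching modes, here is a small sketch with a made-up input (the string s is only for illustration and is not part of the question). With fixed = TRUE the pattern is taken literally; with the default it is a regular expression, which matters as soon as the pattern contains a metacharacter such as a dot:

```r
s <- "a.b c"  # hypothetical input, chosen to contain a regex metacharacter

# Literal match: only the dot itself is removed
literal <- gsub(".", "", s, fixed = TRUE)
literal
## [1] "ab c"

# Regex match: "." matches any character, so everything is removed
as_regex <- gsub(".", "", s)
as_regex
## [1] ""

# For a plain space the two modes agree
identical(gsub(" ", "", s, fixed = TRUE), gsub(" ", "", s))
## [1] TRUE
```

For a single space the choice only affects speed, but for patterns containing characters like ".", "(" or "[" it also changes the result.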
If you want to remove all types of whitespace, use:
gsub("[[:space:]]", "", x) # note the double square brackets
## [1] "xy" "←→" "xy" NA
gsub("\\s", "", x) # same; note the double backslash
library(rebus) # space() is exported by the rebus package
gsub(space(), "", x) # same
"[:space:]" is a POSIX character class matching all whitespace characters; it is only valid inside a bracket expression, hence the double square brackets. \s is a shorthand, supported by most regular-expression flavours, that does the same thing.
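Applied to the string from the question, both patterns give the requested result. This is only a quick check, using variable names (s, res_posix, res_short) introduced here for illustration. The backslash is doubled because R string literals use the backslash as their own escape character:

```r
s <- " xx yy 11 22 33 "  # the string from the question

res_posix <- gsub("[[:space:]]", "", s)  # POSIX class inside a bracket expression
res_short <- gsub("\\s", "", s)          # shorthand class, backslash escaped for R

res_posix
## [1] "xxyy112233"
identical(res_posix, res_short)
## [1] TRUE
```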
The stringr approach: str_replace_all and str_trim
stringr provides more human-readable wrappers around the base R functions (though as of Dec 2014, the development version has a branch built on top of stringi, mentioned below). The equivalents of the above commands, using str_replace_all, are:
library(stringr)
str_replace_all(x, fixed(" "), "")
str_replace_all(x, space(), "")
stringr also has a str_trim function, which removes only leading and trailing whitespace.
str_trim(x)
## [1] "x y" "← →" "x \t\n\r\v\fy" NA
str_trim(x, "left")
## [1] "x y " "← → "
## [3] "x \t\n\r\v\fy \t\n\r\v\f" NA
str_trim(x, "right")
## [1] " x y" " ← →"
## [3] " \t\n\r\v\fx \t\n\r\v\fy" NA
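If loading a package is not an option, base R (3.2.0 and later) has trimws(), which plays a similar role to str_trim. A minimal sketch with a made-up string follows. Note that by default trimws() only strips spaces, tabs, newlines and carriage returns, so it is not an exact equivalent when vertical tabs or form feeds are involved:

```r
s <- "  x y  "  # hypothetical input with two spaces on each side

both  <- trimws(s)                   # which = "both" is the default
left  <- trimws(s, which = "left")   # leading whitespace only
right <- trimws(s, which = "right")  # trailing whitespace only

both
## [1] "x y"
left
## [1] "x y  "
right
## [1] "  x y"
```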
The stringi approach: stri_replace_all_charclass and stri_trim
stringi is built upon the platform-independent ICU library, and has an extensive set of string manipulation functions. The equivalents of the above are:
library(stringi)
stri_replace_all_fixed(x, " ", "")
stri_replace_all_charclass(x, "\\p{WHITE_SPACE}", "")
Here "\\p{WHITE_SPACE}" is an alternate syntax for the set of Unicode code points considered to be whitespace, equivalent to "[[:space:]]", "\\s" and space(). For more complex regular expression replacements, there is also stri_replace_all_regex.
stringi also has trim functions.
stri_trim(x)
stri_trim_both(x) # same
stri_trim(x, "left")
stri_trim_left(x) # same
stri_trim(x, "right")
stri_trim_right(x) # same
The function str_squish() from the stringr package of the tidyverse does the magic!
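library(dplyr)
library(stringr)
df <- data.frame(a = c(" aZe aze s", "wxc s aze "),
                 b = c(" 12 12 ", "34e e4 "),
                 stringsAsFactors = FALSE)
# str_squish() is vectorised, so rowwise() is not needed;
# across() replaces the deprecated mutate_all(funs(...)) idiom
df <- df %>%
  as_tibble() %>%
  mutate(across(everything(), str_squish))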
df
# A tibble: 2 x 2
  a         b
  <chr>     <chr>
1 aZe aze s 12 12
2 wxc s aze 34e e4
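This way you can trim leading and trailing whitespace, and collapse internal runs of whitespace to a single space, in every character variable of your data frame. If you would prefer to choose only some of the variables, use mutate or mutate_at. Note that str_squish() keeps the single spaces between words; to remove those as well, apply gsub to the squished result: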
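library(dplyr)
library(stringr)
remove_all_ws <- function(string) {
  # str_squish() turns every run of whitespace into one space;
  # gsub() then drops the remaining spaces
  gsub(" ", "", str_squish(string), fixed = TRUE)
}
df <- df %>% mutate_if(is.character, remove_all_ws)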
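For comparison, the same clean-up can be sketched in base R without dplyr or stringr. This is only one possible way to do it, and the helper name strip_ws is made up for this example. It removes every whitespace character from each character column and leaves other columns untouched:

```r
df <- data.frame(a = c(" aZe aze s", "wxc s aze "),
                 b = c(" 12 12 ", "34e e4 "),
                 stringsAsFactors = FALSE)

# Hypothetical helper: drop every whitespace character from a character vector
strip_ws <- function(string) gsub("[[:space:]]", "", string)

# Find the character columns and rewrite only those
is_chr <- vapply(df, is.character, logical(1))
df[is_chr] <- lapply(df[is_chr], strip_ws)

df$a
## [1] "aZeazes" "wxcsaze"
df$b
## [1] "1212"  "34ee4"
```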