
How do I replace Unicode hex digits with blanks?

While scraping a website, I've found character strings that print as blanks but are not blanks. For example:

print(str)

prints

3 Max. 11

but

print(charToRaw(str))

prints

33 c2 a0 4d 61 78 2e 20 31 31

How can I replace the two bytes c2 a0 with a single blank (" ")?

I have tried

library(stringr)
str_replace_all(str, "[^[:alnum:]]", " ")  

But that also replaces the period.

c2 a0 is the UTF-8 encoding of U+00A0, NO-BREAK SPACE. You'll be better off using a Unicode-aware string function to replace that character with a normal space than dealing with raw UTF-8 bytes. - Shawn

A quick search for 'R string functions' suggests that something like gsub("\u00a0", " ", str) might do the trick. - Shawn

1 Answer


Shawn's suggestion works perfectly - thank you Shawn. Refer to his comments above.

The answer is

gsub("\u00a0", " ", str)
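
To check that the replacement really removes the no-break space, here is a minimal sketch that rebuilds the string from the question and compares the raw bytes before and after. The variable names x and cleaned are my own (I avoided str only because it shadows the base function of that name), and it assumes an R session where "\u00a0" is stored as UTF-8:

```r
# Sample input rebuilt from the question: "3", U+00A0 (no-break space), "Max. 11"
x <- "3\u00a0Max. 11"

# Before: the no-break space appears as the two bytes c2 a0
print(charToRaw(x))

# Replace every U+00A0 with an ordinary space (0x20)
cleaned <- gsub("\u00a0", " ", x)

# After: c2 a0 is gone and a single 20 takes its place; the period (2e) is untouched
print(charToRaw(cleaned))
print(cleaned)
```

Unlike the "[^[:alnum:]]" pattern, this only targets the one character, so punctuation such as the period is left alone.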