4
votes

I'm having a problem with hidden non-ASCII characters (spaces) in my database.

How can I replace them with normal spaces and convert them before being inserted to avoid future problems?

I'm still not 100% sure what's happening, but I think it's with the non-ASCII spaces. Any advice to help track it down will help.


Here's what's happening:

I have a database with keywords and if I search for "test keyword", nothing shows up. I know for a fact that "test keyword" is in the database.

If I search for "test" or "keyword", it will show up.

If I do a query with:

SELECT * FROM keywords WHERE keyword regexp '[^ -~]'; (found here)

It will display "test keyword" - giving me the conclusion there is a non-ASCII character with the space in "test keyword".

4
ASCII refers to the first 128 characters in most character sets, the latin alphabet and numbers and such. I don't think it's what you mean to be saying.Dan Grossman
An ASCII space is a normal space. Perhaps you could give an example of what problem you're experiencing.thomasrutter
That aside, please describe the actual problem you're having, not what you think you should fix even though you're not sure what's happening. Bad questions get bad answers.Dan Grossman
Do you mean control or non-printable characters? Spaces are considered both visible and printable, not hidden.Ray Toal
@Dan Grossman actually ASCII is the first 128 charactersthomasrutter

4 Answers

5
votes

This works with PHP:

str_replace("\xA0", ' ', $keyword)

Now i'm trying to replace all existing ones in the database.

I think this should be working, but it's not:

update keywords set keyword = replace(keyword, char(160), " ") WHERE keyword regexp char(160);

Any ideas?

3
votes

I had the same issue and was able to create a update query to replace (in my case) non breaking spaces.

First I analyzed the binary values of the strings that had those chars (I used Mysql workbench 'Open value in editor" to do so). I realized that in my case the char(s) that I wanted to replace had a hex value of 'a0'.

Next I went to this page http://www.fileformat.info/info/unicode/char/a0/charset_support.htm and checked all the encodings that interpret a0 as a non breaking space.

Next I built this query

UPDATE keywords SET keyword = TRIM(REPLACE(keyword, CONVERT(char(160) USING hp8), ' '));

, I chose hp8 but utf8 worked as well.

It took me some time to reach this solution...so I hope this helps someone with the same problem, not to lose his mind trying to figure a solution.

1
votes

How about:

update keywords
set keyword = replace(keyword, char(160), ' ') 
WHERE keyword LIKE concat('%',char(160),'%');
0
votes

Do you want to remove all non alphanumeric characters?

$string = “Here! is some text, and numbers 12345, and symbols !£$%^&”;

$new_string = preg_replace(“/[^a-zA-Z0-9\s]/”, “”, $string);