4
votes

Short version:

What kind of escape sequence can one use to search for unicode characters in command line mysql?

Long version:

I'm looking for a way to search a column for records containing a unicode sequence, U+200B, in mysql from the command line. I can't figure out which kind of escape to use. I've tried \u200B and x200B and even ​ I finally found one blog that suggested the _utf8 syntax. This will produce the character on the command line:

select _utf8 x'200B';

Now I'm stuck trying to get that working in a "LIKE" query.

This generates the characters, but the % seem to lose their special meaning when placed in the LIKE part:

select _utf8 x'0025200B0025';

I also tried a concat but it didn't work either:

select concat('%', _utf8 x'200B', '%');

More background:

I have some data that has zero width space characters (zwsp) in it, Unicode Point U+200B. This is typically caused by copy/paste from websites that use the zwsp in their output. With most unicode characters, I can just paste the character into the terminal (or create it with a keycode), but since this one is invisible it's a bit more challenging. I can create a file that generates a "%%" sequence and copy/paste it to the terminal and it will work but it leaves my command history and terminal output screwy. I would think there is a straightforward way to do this in MySQL, but so far I've come up short.

Thanks in advance,

-Paul Burney

2

2 Answers

5
votes
select _utf8 x'0025200B0025';

That's not UTF-8, it's UTF-16/UCS-2. You might be able to say SELECT _ucs2 0x0025200B0025 if you have UCS-2 support in your copy of MySQL.

Otherwise, the byte sequence encoding character U+200B in UTF-8 would be 0xE2, 0x80, 0x8B:

select 0xE2808B;
2
votes

If it is Linux then hold Ctrl + Shift + U then release the U and type 200B.