We have a website that uses this query:

SELECT did, kid FROM top_keywords WHERE MATCH('@keyword "^EXAMPLE KEYWORD$"') LIMIT 0, 100;

It works 99% of the time, but it fails for keywords in some character sets. Example:

SELECT did, kid FROM top_keywords WHERE MATCH('@keyword "^εργον$"') LIMIT 0, 100;

This produces the error:

ERROR 1064 (42000): index top_keywords: syntax error, unexpected '$', expecting TOK_KEYWORD or TOK_INT near 'εργον$"'

My Sphinx version is 2.0.6.

My only idea is that it has something to do with the charset_type setting (conf-charset-type).

1 Answer

I tried copy/pasting your word εργον into http://software.hixie.ch/utilities/cgi/unicode-decoder/utf8-decoder

It appears to be composed entirely of non-ASCII UTF-8 characters (i.e. the code points are all above 255).

So ALL of those letters would need to be in charset_table for it to work.
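
The same check can be done locally instead of with the online decoder. This is a minimal Python sketch, assuming the word is pasted in as a Unicode string; the variable names are only illustrative.

```python
word = "εργον"

# Pair each character with its Unicode code point in U+XXXX notation.
code_points = [(ch, f"U+{ord(ch):04X}") for ch in word]
for ch, cp in code_points:
    # Also show the UTF-8 bytes of each character as hex.
    print(ch, cp, ch.encode("utf-8").hex())

# True when every character lies above 255, i.e. outside ASCII and Latin-1.
all_above_255 = all(ord(ch) > 255 for ch in word)
print(all_above_255)
```

The U+XXXX values it prints are in the notation that charset_table entries use for non-ASCII characters.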


I'm guessing they are not in your charset_table (just setting charset_type = utf-8 is NOT enough), in which case they are stripped completely, so the query becomes:

SELECT did, kid FROM top_keywords WHERE MATCH('@keyword "^ $"') LIMIT 0, 100;

... as the letters are all treated as separators, which clearly leaves you with an invalid query.
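
To illustrate that effect, here is a toy model in Python. It is not Sphinx's actual tokenizer: the function `strip_unmapped` and both character sets are hypothetical, and the ASCII-only set is just an assumption standing in for a charset_table that lacks Greek letters.

```python
import string

# Hypothetical stand-in for a charset_table covering only ASCII letters, digits, and '_'.
ASCII_TABLE = set(string.ascii_letters + string.digits + "_")

# The same set extended with the Greek lowercase range U+03B1..U+03C9.
GREEK_TABLE = ASCII_TABLE | {chr(cp) for cp in range(0x03B1, 0x03CA)}


def strip_unmapped(phrase, table):
    """Replace characters missing from `table` with spaces, keeping ^ and $."""
    kept = [ch if ch in table or ch in "^$" else " " for ch in phrase]
    # Collapse runs of separators into a single space (a simplification).
    return " ".join("".join(kept).split())


ascii_result = strip_unmapped("^εργον$", ASCII_TABLE)
greek_result = strip_unmapped("^εργον$", GREEK_TABLE)
print(repr(ascii_result))  # '^ $'
print(repr(greek_result))  # '^εργον$'
```

With the ASCII-only set nothing is left between the two anchors, which matches the empty phrase shown above. Once the Greek range is included, the keyword survives intact.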


Unfortunately I can't give you any good references for a charset_table with international support (I don't know of any!), but perhaps start with the wiki: http://sphinxsearch.com/wiki/doku.php?do=search&id=charset_table