I use Sphinx search version 2.2.11.

Sphinx returns data without accents/diacritics, e.g. "cerny" instead of "černý".

It returns the correct items even when the query itself contains accents/diacritics; only the encoding of the returned text is wrong.

I know I had this problem before, about 3 years ago, but I can't remember how I solved it. I think it was on version 2.1.something then.

Maybe it's somehow badly indexed?

Relevant part of my config:

searchd {
        ...
        collation_server = utf8_general_ci
}

index xxx {
        source = xxxSrc
        path = /var/lib/sphinxsearch/xxx
        charset_table = 0..9, A..Z->a..z, _, a..z, U+0e1->a, U+0c1->a, U+10d->c, U+10c->c, \
        U+10f->d, U+10e->d, U+0e9->e, U+0c9->e, U+11b->e, U+11a->e, U+0ed->i, U+0cd->i, U+148->n, \
        U+147->n, U+0f3->o, U+0d3->o, U+159->r, U+158->r, U+161->s, U+160->s, U+165->t, U+164->t, \
        U+0fa->u, U+0da->u, U+16f->u, U+16e->u, U+0fd->y, U+0dd->y, U+17e->z, U+17d->z
        index_exact_words = 1
        docinfo = extern
        morphology = stem_cz
        min_stemming_len = 5
        min_infix_len = 3
}

Thanks for any help.


1 Answer

OK, this was not actually a Sphinx issue, but a problem on the ODBC/Oracle side.

This fixed it:

export NLS_LANG="CZECH_CZECH REPUBLIC.AL32UTF8"
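
For anyone applying the same fix, here is a minimal sketch of how it might be used. It assumes the variable has to be set in the environment that runs indexer (the process that pulls rows from Oracle over ODBC), and that the index has to be rebuilt afterwards so the stored text is fetched again in UTF-8. The answer above does not spell out either step, so treat both as assumptions. The index name xxx is taken from the config in the question.

```shell
# Tell the Oracle client to deliver text as UTF-8 (AL32UTF8).
export NLS_LANG="CZECH_CZECH REPUBLIC.AL32UTF8"

# Sanity check: the character set is the part after the last dot.
nls_charset="${NLS_LANG##*.}"
echo "Oracle client charset: ${nls_charset}"

# Assumption: rebuild the index from this same shell so indexer inherits
# the variable. Left commented out because it needs a working Sphinx setup.
# indexer --rotate xxx
```

If indexer is started from cron or an init script, the export presumably has to go into that script as well, since a variable set in an interactive shell is not inherited there.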