So, I am building a search engine for a site using Zend_Search_Lucene.
I am currently using Zend_Search_Lucene_Analysis_Analyzer_Common_Utf8_CaseInsensitive, which works fine except for one thing: it distinguishes between accented and unaccented characters.
In Google (and other search engines), searching for "χιονι" also returns results for its variants, such as "χιόνι", which is the correctly accented Greek spelling (χιόνι means snow, by the way). In Lucene (in general, not only Zend_Search_Lucene), from what I've seen this is neither the default behavior nor something that ships with it.
My first attempt at a solution was to copy what Lucene does for case-insensitive search: write an analyzer that removes accents from letters during both indexing and searching, the same way case-insensitive analyzers simply lowercase everything (i.e. $str = strtr($str, 'ό', 'ο')).
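A minimal sketch of that idea in plain PHP, assuming UTF-8 input in precomposed (NFC) form and modern monotonic Greek only (no polytonic breathings or circumflexes). GREEK_ACCENT_MAP and fold_term() are hypothetical names, and the in-memory $index array only stands in for a real Lucene index. The point is that the same folding runs on the indexed text and on the query, so accented and unaccented spellings end up as one term.

```php
<?php
// Sketch only: fold upper/lower case and accented/unaccented Greek to one form.
// Assumptions: UTF-8, precomposed (NFC) input, monotonic Greek, mbstring loaded.
// GREEK_ACCENT_MAP and fold_term() are hypothetical names for illustration.

const GREEK_ACCENT_MAP = [
    'ά' => 'α', 'έ' => 'ε', 'ή' => 'η', 'ί' => 'ι',
    'ό' => 'ο', 'ύ' => 'υ', 'ώ' => 'ω',
    'ϊ' => 'ι', 'ϋ' => 'υ', 'ΐ' => 'ι', 'ΰ' => 'υ',
];

function fold_term(string $term): string
{
    // Lowercase first (as a case-insensitive analyzer would), so the map
    // only needs lowercase keys: 'Ό' becomes 'ό', which then becomes 'ο'.
    $lower = mb_strtolower($term, 'UTF-8');

    // The array form of strtr() replaces whole substrings, so two-byte
    // UTF-8 sequences are matched as units rather than byte by byte.
    return strtr($lower, GREEK_ACCENT_MAP);
}

// Toy stand-in for an index: folded term => ids of documents containing it.
$index = [];
foreach (['doc1' => 'Χιόνι', 'doc2' => 'ΧΙΟΝΙ'] as $docId => $word) {
    $index[fold_term($word)][] = $docId;
}

// The query is folded with the same function before the lookup.
$hits = $index[fold_term('χιονι')] ?? [];
echo implode(', ', $hits), PHP_EOL; // doc1, doc2
```

To plug something like this into Zend_Search_Lucene, one option (untested here) is a custom filter extending Zend_Search_Lucene_Analysis_TokenFilter whose normalize() method returns a new Zend_Search_Lucene_Analysis_Token carrying the folded text, added to the analyzer with addFilter() and registered via Zend_Search_Lucene_Analysis_Analyzer::setDefault() before both indexing and querying.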
This failed only because PHP has no mb_strtr, and strtr called the way shown above translates single bytes, so it breaks multibyte characters like these. preg_replace didn't work either.
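The failure can be reproduced with a short script, assuming the file is saved as UTF-8. 'ό' and 'ο' are each two bytes long, so the three-argument strtr() builds a byte table (0xCF to 0xCE, 0x8C to 0xBF) instead of a character mapping, while the array form matches whole byte sequences. The variable names are illustrative only.

```php
<?php
// Sketch: compare the two calling forms of strtr() on UTF-8 Greek text.
// Assumes this file (and therefore every string literal) is UTF-8.

$accented   = 'ό'; // U+03CC, bytes CF 8C
$unaccented = 'ο'; // U+03BF, bytes CE BF
echo strlen($accented), ' ', bin2hex($accented), ' ', bin2hex($unaccented), PHP_EOL;
// 2 cf8c cebf

// Three-argument form: translates single bytes. 'χ' is CF 87, so its first
// byte is also rewritten (CF -> CE), giving CE 87 = U+0387 GREEK ANO TELEIA.
$broken = strtr('χιόνι', $accented, $unaccented);
echo bin2hex($broken), PHP_EOL; // ce87ceb9cebfcebdceb9

// Array form: replaces the whole two-byte sequence and leaves 'χ' untouched.
$fixed = strtr('χιόνι', [$accented => $unaccented]);
echo $fixed, PHP_EOL; // χιονι
```

The same byte-level issue may explain the preg_replace attempt: without the u modifier, PCRE treats pattern and subject as bytes, so a character class such as [όά] matches individual bytes. That is a guess, since the original pattern isn't shown. For scripts other than Greek, and if the intl extension is installed, a more generic option is Normalizer::normalize($str, Normalizer::FORM_D) followed by preg_replace('/\p{Mn}+/u', '', ...), which decomposes characters and then drops the combining marks.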
Is there a way to make Lucene search in an "accent-insensitive" mode (probably with an analyzer?), or alternatively a way to strip accents from multibyte characters in PHP? (I searched for the latter too, with no results.)
Note that the characters I need to handle are not Western European accented characters, for which there are some PHP unaccent solutions on the web.