I have indexed a property on OrientDB using Lucene's keyword analyzer:
CREATE INDEX Snippet.ssdeep ON Snippet (ssdeep) FULLTEXT ENGINE LUCENE METADATA {"analyzer":"org.apache.lucene.analysis.core.KeywordAnalyzer"}
The field contains simhashes that I have indexed for testing.
Now when I search using Lucene, I get results for exact queries but not for fuzzy queries (despite properly escaping the query text).
For instance, given a field with the value "192:d4e1GDZYDUZrw9AfCB+A66ancCZmx9n2P:2e1GW18A66ac/YP", the following query yields one record:
SELECT FROM Snippet WHERE ssdeep LUCENE "192\\:d4e1GDZYDUZrw9AfCB\\+A66ancCZmx9n2P\\:2e1GW18A66ac\\/YP"
However, this query yields no records:
SELECT FROM Snippet WHERE ssdeep LUCENE "192\\:d4e1GDZYDUZrw9AfCB\\+A66ancCZmx9n2P\\:2e1GW18A66ac\\/YP~0.9"
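For reference, the escaping used in the two queries above can be reproduced with a small sketch like the one below. It assumes the set of special characters that Lucene's classic QueryParser.escape handles, and it assumes that backslashes have to be doubled inside the OrientDB string literal. The helper names (escape_lucene, to_orient_literal, build_query) are hypothetical and only serve the illustration.

```python
# Characters with special meaning in Lucene's classic query syntax
# (assumed to be the same set that QueryParser.escape handles).
LUCENE_SPECIAL = set('\\+-!():^[]"{}~*?|&/')


def escape_lucene(term: str) -> str:
    """Prefix every Lucene special character with a backslash."""
    return "".join("\\" + ch if ch in LUCENE_SPECIAL else ch for ch in term)


def to_orient_literal(lucene_query: str) -> str:
    """Double the backslashes for an OrientDB string literal (assumption)."""
    return lucene_query.replace("\\", "\\\\")


def build_query(value: str, fuzzy=None) -> str:
    """Build the SELECT statement; `fuzzy` is the optional ~ similarity."""
    term = escape_lucene(value)
    if fuzzy is not None:
        # The fuzzy operator is appended after escaping so it stays unescaped.
        term += "~" + str(fuzzy)
    return 'SELECT FROM Snippet WHERE ssdeep LUCENE "' + to_orient_literal(term) + '"'


if __name__ == "__main__":
    value = "192:d4e1GDZYDUZrw9AfCB+A66ancCZmx9n2P:2e1GW18A66ac/YP"
    print(build_query(value))        # exact query
    print(build_query(value, 0.9))   # fuzzy query
```

Both printed statements should match the queries shown above character for character, so the escaping itself does not appear to differ between the exact and the fuzzy case.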
What is preventing Lucene from finding approximate matches? More specifically, is it Lucene (or the KeywordAnalyzer) that is ill-suited to fuzzy searching on such strings, or is the integration between Lucene and OrientDB at fault?
For context, I have other full-text Lucene indexes in the same database that work, but those fields all contain ordinary text and are analyzed with the Simple or Standard analyzers. This is the only field I really need a full-text index on, and it is the one that fails to work.