0
votes

I'm thinking about migration from Sphinx to Crate, but I can't find any documentation for fulltext query syntax. In Sphinx I can search:

("black cat" -catalog) | (awesome creature)

this stands for EITHER exact phrase "black cat" and no term "catalog" in document OR both "awesome" and "creature" at any position in document

black << big << cat

this requires document to contain all "black", "big" and "cat" terms and also requires match position of "black" be less than match position of "big" and so on.

And I need to search at specific place in the document. In sphinx I was able to use proximity operator as follows

hello NEAR/10 (mother|father -dear)

this requires document to contain "hello" term and "mother" or "father" term at most 10 terms away from "hello" and also term "dear" must not be closer than 10 terms to "hello"

The last construction with NEAR is heavily used in my application. Is it all possible in Crate?

1

1 Answers

0
votes

Unfortunately I cannot comment on how it compares to Sphinx, but I will stick to your questions :)

Crate's fulltext search comes with SQL and Lucene's matching power and therefore should be able to handle complex queries. I'll just provide the queries matching your output I think it should be quite readable.

("black cat" -catalog) | (awesome creature)

select * 
from mytable 
where 
  (match(indexed_column, 'black cat')  using phrase 
     and not match(indexed_column, 'catalog')) 
  or match(indexed_column, 'awesome creature') using best_fields with (operator='and');

black << big << cat

select * 
from mytable 
where 
  match(indexed_column, 'black big cat') using phrase with (slob=100000);

This one is tricky, there doesn't seem to be an operator that does exactly the same as in Sphinx, but it could be adjusted with a "slop" value. Depending on the use case there might be another (better) solution as well...

hello NEAR/10 (mother|father -dear)

select * 
from mytable 
where 
  (match(indexed_column, 'hello mother')  using phrase with (slop=10)
     or match(indexed_column, 'hello father') using phrase with (slop = 10))
  and not match(indexed_column, 'hello dear') using phrase with (slop = 10)

They might look a bit clunky compared to Sphinx's language, but they work fine :)

Performance wise, they should still be super fast, thanks to Lucene..

Cheers, Claus