1
votes

I have 4842 documents with a sample format {"ID":"12345","NAME":"name_value","KIND":"kind_value",...,"Secondary":{...},"Tertiary":{...}} where “...” are a few more varying number of key value pairs per object

I have indexed KIND as a full text index using - db.collection.ensureFulltextIndex("KIND") before inserting data.Also, KIND is just a one word string. ie. without spaces

Via AQL following queries were executed:

  1. FOR doc IN FULLTEXT(collection, 'KIND', 'DeploymentFile') RETURN doc --> takes 3.54s (avg)

  2. FOR doc IN collection FILTER doc.KIND == 'DeploymentFile' RETURN doc --> takes 1.16s (avg)

2944 Objects returned in both queries

Q1. Assuming that we have used a fulltext index and I haven't hash indexed KIND, shouldn't the query using FULLTEXT function be faster than the normal == operation (since == doesn't utilize the full text index). If so, what am I doing wrong here?

Q2. Utilizing the fulltext index, can i perform a query which does a CONTAINS string or LIKE string?

---UPDATE Q2.The requirement is searching for a substring within a parent string (which is only one word). The substring can lie anywhere within the parent string. (SQL equivalent of LIKE '%text%')

1
I've tried the queries on my notebook. The ran in 40ms / 56ms using arangosh. Which driver are you using? - fceller

1 Answers

2
votes

Q1: The fulltext index does allow for more complex query. It splits the text at word breaks and checks if a word occurs within a larger text. All of these features are not needed in your example. Therefore it generates more overhead than it is saving.

In your example it would be better to create a skip-list or hash-index and search for equality.

Q2: In the simplest form, a fulltext query contains just the sought word. If multiple search words are given in a query, they should be separated by commas. All search words will be combined with a logical AND by default, and only such documents will be returned that contain all search words. This default behavior can be changed by providing the extra control characters in the fulltext query, which are:

  • +: logical AND (intersection)
  • |: logical OR (union)
  • -: negation (exclusion)

Examples:

  • "banana": searches for documents containing "banana"
  • "banana,apple": searches for documents containing both "banana" AND "apple"
  • "banana,|orange": searches for documents containing either "banana" OR "orange" OR both
  • "banana,-apple": searches for documents that contains "banana" but NOT "apple".

Logical operators are evaluated from left to right.

Each search word can optionally be prefixed with complete: or prefix:, with complete: being the default. This allows searching for complete words or for word prefixes. Suffix searches or any other forms are partial-word matching are currently not supported.

Examples:

  • "complete:banana": searches for documents containing the exact word "banana"
  • "prefix:head": searches for documents with words that start with prefix "head"
  • "prefix:head,banana": searches for documents contain words starting with prefix - "head" and that also contain the exact word "banana".

Complete match and prefix search options can be combined with the logical operators.