1
votes

I have a somewhat complex use-case for the fulltext features in AQL. I have a large, hierarchical document that is returned as the result of a graph traversal. This constructs something like a social network feed. It's analogous to posts of various categories with comments as child documents that contain their own structures. The returned data looks something like this:

[
  {
     "data": {
       "_key": "",
       "_id": "someCollection/someKey",
       "_rev": "",
       "userID": "12345",
       "otherAttributeOfFeedEvent": "",
       .
       .
       .
     },
     "date": "2016-10-25",
     "category": "",
     "children": [
       {
         "category": "",
         "child": "myCollection/childDocumentKey",
         "date": "2016-10-26"
       },
       { sameStructureAsAbove },
       { anotherChildLikeAbove },
    ]
  },
  { etc }
]

Of course, the attributes that would be fulltext searched for each of these event types that go into a feed are different and numerous, and I need to, for a given user input, search them all simultaneously. My initial thought is that, since the _key of each document, no matter whether a parent or child in the feed, is guaranteed to be listed in this structure, I could create some sort of collection that contains all the documents as identified by their keys.

A challenge is that this fulltext search needs to retain the hierarchy. Back to the social network comments analogy, if a user searches a term that exists in a comment (i.e. a child event), the query should return the parent event with a flag on every child event that matched the term, so that the interface can display the context for the search result (else, a secondary query to get the context would be needed).

This hierarchical structure as defined above is generated by a graph traversal on a graph with a structure that looks something like this:

profile ---> event ---> childEvent
     |                    ^
     |                    |
      \------------------/

The query that generates the data looks something like this:

let events = (
    for v, e, p in 1..3 outbound @profileKey graph 'myGraph' options { "uniqueEdges": "global"}
        filter e.type == "hasEvent"
        filter p.edges[0].category in ["cat1", "cat2", "cat3"]
        filter e.category in ["cat1", "cat2", "cat3"]

        let children = (
           for v1, e1, p1 in outbound v._id graph 'myGraph'
              filter e1.type =="hasEvent" or e1.isChildEvent == "True"
              sort (e1.date) desc
              return {category: e1.category, child: v1._id, date: e1.date }
        )

        let date = e.date
        let category = e.category
        let data = v
        return distinct { data: data, date: date, category: category, children: children }
)

for event in events
   sort(event.date) desc
   return event

Bottom line

So to sum up my question: I need to write AQL that will perform fulltext search on several attributes from every document that shows up in the described feed and return a structured result, or something that can be used in a structured result, to display a feed of the same structure as described above containing only events that match or have children that match the fulltext search results.


In my testing, I tried creating a query like this:

    let events = (
            FOR v, e, p in 1..3 OUTBOUND 'myCollection/myDocument' GRAPH 'myGraph' OPTIONS { "uniqueEdges": "global" }
                FILTER e.type == "hasEvent"
                FILTER (p.edges[0].category in ["cat1", "cat2", "cat3"] )  
                FILTER (e.category in ["cat1","cat2","cat3] ) 



            LET children = ( 
                FOR v1, e1, p1 in OUTBOUND v._id GRAPH 'myGraph'
                    FILTER e1.type == "hasEvent" OR e1.isChildEvent == "True"
                    SORT(e1.date) DESC
                    RETURN {category: e1.category, _id: v1._id, date: e1.date}
            )

            let date = e.date
            let category = e.category
            let data = v
            RETURN DISTINCT {data: data, date: date, category: category, children: children}    
    )

    let eventIds = (
        for event in events
            return event.data._id
        )
    let childEventIds = (
        for event in events
            for child in event.children
                return child._id
        )
    let allIds = append(eventIds, childEventIds)

    let allDocs = (for doc in allIds
        return document(doc))

    let firstAttributeMatches = (for doc in fulltext(allDocs, "firstAttribute", @queryTerm)
                                return doc._id)
    let secondAttributeMatches = (for doc in fulltext(allDocs, "secondAttribute", @queryTerm)
                                return doc._id)
    let nthAttributeMatches = (for doc in fulltext(allDocs, "nthAttribute", @queryTerm)
                                return doc._id)
    let results = union_distinct(firstAttributeMatches,secondAttributeMatches,nthAttributeMatches)
    return results

But this had the error: Query: invalid argument type in call to function 'FULLTEXT()' (while executing)

Presumably, even though there are fulltext indices on all of the attributes I used, because I've collected all these documents into a new collection that is not also fulltext indexed, I cannot simply call fulltext() on them. Does this mean my best bet is to just get a list of all the document collections returned by my first query, perform global fulltext searches on those collections, then inner-join the result to the result of my first query? That sounds extremely complex and time-intensive. Is there some simpler way to do what I'm after?


My next try looked more like this:

let events = (
        FOR v, e, p in 1..3 OUTBOUND 'myCollection/myDocument' GRAPH 'myGraph' OPTIONS { "uniqueEdges": "global" }
            FILTER e.type == "hasEvent"
            FILTER (p.edges[0].category in ["cat1", "cat2", "cat3"] )  
                 FILTER (e.category in ["cat1", "cat2", "cat3"] ) 



            LET children = ( 
                FOR v1, e1, p1 in OUTBOUND v._id GRAPH 'myGraph'
                    FILTER e1.type == "hasEvent" OR e1.isChildEvent == "True"
                    SORT(e1.date) DESC
                    RETURN {category: e1.category, _id: v1._id, date: e1.date}
            )

            let date = e.date
            let category = e.category
            let data = v
            RETURN DISTINCT {data: data, date: date, category: category, children: children}    
    )

    let eventIds = (
        for event in events
            return event.data._id
        )
    let childEventIds = (
        for event in events
            for child in event.children
                return child._id
        )
    let allIds = append(eventIds, childEventIds)

    let losCollections = (for id in allIds
        return distinct parse_identifier(id).collection)

    let searchAttrs = ["attr1","attr2","attr3","attrN"]

    for col in losCollections
        for attr in searchAttrs
            return (for doc in fulltext(col, attr, @queryTerm) return doc._id)

But this seems to fail whenever it tries an attribute that isn't a fulltext index in the collection. Maybe there's a way in AQL to check if the attribute has a fulltext index, then only perform the query in that case?

1

1 Answers

1
votes

First a few general remarks:

  1. Currently, a fulltext index can only index documents from one collection and can only look at the string value of a single attribute. Corresponding FULLTEXT searches in AQL will only be able to use a single such index and thus will only look into one collection and one attribute. If this is not enough one has to run multiple FULLTEXT queries and unite the results.

  2. A graph query is faster, if the full path does not have to be built, so instead of

    for v, e, p in 1..3 outbound @profileKey graph 'myGraph' options {"uniqueEdges": "global"}
      filter e.type == "hasEvent"
      filter p.edges[0].category in ["cat1", "cat2", "cat3"]
      filter e.category in ["cat1", "cat2", "cat3"]
    

    one should rather write

    for v, e in 1..3 outbound @profileKey graph 'myGraph' options {"uniqueEdges": "global"}
      filter e.type == "hasEvent"
      filter e.category in ["cat1", "cat2", "cat3"]
    

    which is equivalent but faster (the last filter implies the middle one).

  3. If you have a query of the form

    let events = (... return xyz)
    for event in events
      sort event.date desc
      return event
    

    it is usually better to avoid the subquery by writing

    ...
    let event=xyz
    sort event.date desc
      return event
    

    because then the query engine is not forced to compute the result of the whole subquery before starting with the bottom for statement.

Now I am coming to your concrete question at hand: Both your approaches fail because the FULLTEXT function in AQL can only be used for an existing collection with an existing fulltext index. In particular, it cannot be used to perform a fulltext search on intermediate results produced earlier in the AQL query. That is, because for an efficient full text search a fulltext index structure is needed which does not exist for the intermediate results.

Therefore, my hunch would be that if you want to perform a fulltext search on profiles, events and child-events at the same time that you would have to first perform the fulltext search using an index, and then from each result put together the hierarchy as needed using a graph query.

I see two basic approaches to this. The first would be to do three independent fulltext searches on each of the existing collections, and then run a separate graph query for each result to put together the hierarchy. This would have to be different depending on whether your fulltext search finds a profile, event or child-event. Using subqueries, these three approaches could all be done in a single AQL query.

The second is to have an additional collection for fulltext search, in which there would be a document for each of the documents in all of the three other collections, which contains the attribute to be fulltext searched. Yes, this is a data denormalisation and it needs extra memory space and extra effort when saving and updating the data, but it would probably speed up the fulltext search.

The other idea I would like to mention is that the complexity of your query has reached a level that one should consider writing it in Javascript (run on the server, probably in a Foxx app). There it would be relatively straightforward to implement the query logic in a procedural manner. My hunch would be that one could even improve performance in this way, even if the JS code has to issue multiple AQL queries. At the very least I would expect that the code is better understandable.