2
votes

I'm trying to list vertex objects, one per group name, each being the vertex with the earliest date in its group. I'm attempting to use COLLECT and AGGREGATE:

    FOR v,e IN 1..2 OUTBOUND obj.id
        GRAPH 'CollectionGraph'
        COLLECT groupName = v.groupName 
        AGGREGATE minDate = MIN(v.date) 
        RETURN {groupName, minDate}

This query returns only the group name and date (assuming that the MIN function works the way I intend it to). I'm unsure how to get the entire vertex object for each group, that is, the vertex with the earliest date among those sharing a group name.


1 Answer

3
votes

COLLECT sorts the records into buckets, if you will, and you only get the labels that are on the buckets. Optionally, the number of documents in each bucket can also be returned. Everything else in the same scope is discarded (i.e. the documents v and e). Variables defined outside your FOR loop / traversal remain accessible after COLLECT, however.
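To make the bucket picture concrete, here is a minimal sketch of the counting variant, reusing the traversal from the question. The variable name numDocs is only an illustrative choice. Note that v and e can no longer be referenced after the COLLECT line:

```aql
FOR v, e IN 1..2 OUTBOUND obj.id GRAPH 'CollectionGraph'
    // one bucket per distinct group name, plus the size of each bucket
    COLLECT groupName = v.groupName WITH COUNT INTO numDocs
    // v and e are out of scope here; only groupName and numDocs remain
    RETURN { groupName, numDocs }
```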

If you want to know what's inside each bucket, you need to use the alternative syntax COLLECT ... INTO ... instead. It comes in multiple variants, mainly for performance tuning, but let's stick to the basic one:

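    FOR v, e IN 1..2 OUTBOUND obj.id GRAPH 'CollectionGraph'
        // group by name and keep the members of each bucket in groups
        COLLECT group = v.groupName INTO groups
        RETURN (
            // each g holds the traversal variables, so the vertex is g.v
            FOR g IN groups
                SORT g.v.date
                LIMIT 1
                RETURN g
        )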

group is not used in this example, but it is syntactically required. groups, however, stores what fell INTO each bucket: one object per record, with the traversal variables as attributes (hence g.v.date). We can use it to sort each group by the date attribute and return the first document.

Note: without RETURN ( ), LIMIT 1 would affect the outer FOR loop and the entire query would only ever return one document - which is not what we want. Using a subquery creates a new scope and makes the limit apply to the "inner" loop.
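The subquery above yields a one-element array per group, and each element wraps both v and e. If only the vertex itself is wanted, one of the other variants may be a better fit. The following is a sketch, not a tested solution: it assumes the projection form COLLECT ... INTO groups = v is acceptable for your ArangoDB version, and the name earliest is just illustrative:

```aql
FOR v, e IN 1..2 OUTBOUND obj.id GRAPH 'CollectionGraph'
    // keep only the vertex of each record in the bucket
    COLLECT group = v.groupName INTO groups = v
    // the subquery gets its own scope, so LIMIT 1 applies per group
    LET earliest = (
        FOR doc IN groups
            SORT doc.date
            LIMIT 1
            RETURN doc
    )
    // unwrap the one-element array to get the plain vertex document
    RETURN FIRST(earliest)
```

Projecting only v should also keep the buckets smaller, since the edges are not copied into them.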

You could solve your problem in another way as well:

    FOR v, e IN 1..2 OUTBOUND obj.id GRAPH 'CollectionGraph'
        COLLECT group = v.groupName
        AGGREGATE minDate = MIN(v.date)
        RETURN (
            // look up one document matching the group name and earliest date
            FOR doc IN yourCollection
                FILTER doc.groupName == group AND doc.date == minDate
                LIMIT 1
                RETURN doc
        )

This determines the groups and the earliest date in each group. Then the collection yourCollection is filtered by group name and date, and one matching document is returned per group. A drawback is that only a single collection can be scanned (efficiently), whereas the traversal might give you documents from multiple collections. If you know which collections need to be checked, you could write a subquery for each collection and then combine the results. Compared to my first query above, however, that means more AQL code and probably worse performance.
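For illustration, the multi-collection case could look roughly like this. It is a sketch under assumptions: collectionA and collectionB are hypothetical names standing in for whichever vertex collections the graph actually uses, and fromA / fromB are illustrative variable names:

```aql
FOR v, e IN 1..2 OUTBOUND obj.id GRAPH 'CollectionGraph'
    COLLECT group = v.groupName
    AGGREGATE minDate = MIN(v.date)
    // one subquery per vertex collection (hypothetical names)
    LET fromA = (
        FOR doc IN collectionA
            FILTER doc.groupName == group AND doc.date == minDate
            LIMIT 1
            RETURN doc
    )
    LET fromB = (
        FOR doc IN collectionB
            FILTER doc.groupName == group AND doc.date == minDate
            LIMIT 1
            RETURN doc
    )
    // combine the partial results and keep a single match per group
    RETURN FIRST(APPEND(fromA, fromB))
```

Each additional collection needs another subquery of the same shape, which is where the extra AQL code comes from.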