COLLECT sorts the records into buckets, if you will, and you only get the labels on the buckets. Optionally, the number of documents in each bucket can also be returned (COLLECT ... WITH COUNT INTO ...). Everything else in the same scope is discarded (i.e. the traversal variables v and e). Variables defined outside your FOR loop / traversal, however, are still accessible after COLLECT.
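To make the bucket picture concrete, here is a plain-Python emulation of what the two COLLECT forms hand back. It is only a sketch of the semantics, not how ArangoDB implements COLLECT. The sample rows and the helper names collect_labels and collect_with_count are hypothetical, and the labels are sorted only to keep the output stable.

```python
from collections import Counter

# Hypothetical traversal output: one row per (v, e) pair, reduced to the
# attributes this answer uses. Not real ArangoDB data.
rows = [
    {"v": {"_key": "a", "groupName": "red", "date": "2017-03-02"}, "e": {"_key": "e1"}},
    {"v": {"_key": "b", "groupName": "red", "date": "2017-01-15"}, "e": {"_key": "e2"}},
    {"v": {"_key": "c", "groupName": "blue", "date": "2017-02-10"}, "e": {"_key": "e3"}},
]

def collect_labels(rows):
    """Like COLLECT group = v.groupName: only the distinct labels survive, v and e are gone."""
    return sorted({row["v"]["groupName"] for row in rows})

def collect_with_count(rows):
    """Like COLLECT group = v.groupName WITH COUNT INTO n: each label plus its bucket size."""
    counts = Counter(row["v"]["groupName"] for row in rows)
    return [{"group": g, "n": counts[g]} for g in sorted(counts)]

print(collect_labels(rows))      # ['blue', 'red']
print(collect_with_count(rows))  # [{'group': 'blue', 'n': 1}, {'group': 'red', 'n': 2}]
```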
If you want to know what's inside each bucket, you need the alternative syntax COLLECT ... INTO ... instead. It comes in several variants, mainly for performance tuning, but let's stick to the basic one:
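FOR v, e IN 1..2 OUTBOUND obj.id GRAPH 'CollectionGraph'
  // one bucket per group name; each entry g in groups holds the traversal variables (hence g.v)
  COLLECT group = v.groupName INTO groups
  // the subquery opens a new scope, so SORT and LIMIT apply to each bucket separately
  RETURN (
    FOR g IN groups
      SORT g.v.date
      LIMIT 1
      RETURN g
  )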
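The variable group is not used in this example, but it is syntactically required. groups, however, stores what fell into which bucket (each entry g carries the traversal variables, which is why the date is read as g.v.date). We can use it to sort each group by the date attribute and return the first document, i.e. the one with the earliest date.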
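The same kind of plain-Python emulation can show what INTO adds: every bucket keeps its members, and the subquery then sorts each bucket by date and keeps the first entry. Again this is a sketch of the semantics only, with hypothetical sample rows and helper names (collect_into, earliest_per_group). It assumes dates are ISO 8601 strings, so that plain string order equals chronological order.

```python
from collections import defaultdict

# Hypothetical traversal rows; each one carries the loop variables v and e,
# which is why the query reads the date as g.v.date.
rows = [
    {"v": {"_key": "a", "groupName": "red", "date": "2017-03-02"}, "e": {"_key": "e1"}},
    {"v": {"_key": "b", "groupName": "red", "date": "2017-01-15"}, "e": {"_key": "e2"}},
    {"v": {"_key": "c", "groupName": "blue", "date": "2017-02-10"}, "e": {"_key": "e3"}},
]

def collect_into(rows):
    """Like COLLECT group = v.groupName INTO groups: label -> list of member rows."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[row["v"]["groupName"]].append(row)
    return buckets

def earliest_per_group(rows):
    """Per bucket: SORT g.v.date, LIMIT 1, RETURN g, wrapped in a list like the subquery result."""
    buckets = collect_into(rows)
    return [
        sorted(buckets[label], key=lambda g: g["v"]["date"])[:1]
        for label in sorted(buckets)
    ]

buckets = collect_into(rows)
print({label: [g["v"]["_key"] for g in members] for label, members in buckets.items()})
# {'red': ['a', 'b'], 'blue': ['c']}
print([[g["v"]["_key"] for g in sub] for sub in earliest_per_group(rows)])
# [['c'], ['b']]
```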
Note: without RETURN ( ), LIMIT 1 would affect the outer FOR loop, and the entire query would only ever return one document, which is not what we want. Using a subquery creates a new scope and makes the limit apply to the "inner" loop.
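The scoping point in the note can be modeled the same way. Without the subquery, SORT and LIMIT operate on one flat stream spanning all buckets. With it, they run once per bucket. This is a hypothetical plain-Python model of that behavior (the rows and helper names are made up), not ArangoDB code.

```python
# Hypothetical rows, flattened to the attributes the query touches.
rows = [
    {"_key": "a", "groupName": "red", "date": "2017-03-02"},
    {"_key": "b", "groupName": "red", "date": "2017-01-15"},
    {"_key": "c", "groupName": "blue", "date": "2017-02-10"},
]

def bucketize(rows):
    """COLLECT group = groupName INTO groups: label -> member rows."""
    buckets = {}
    for row in rows:
        buckets.setdefault(row["groupName"], []).append(row)
    return buckets

def without_subquery(rows):
    """FOR g IN groups SORT ... LIMIT 1 in the outer scope: one flat stream, one result."""
    stream = [g for members in bucketize(rows).values() for g in members]
    return sorted(stream, key=lambda g: g["date"])[:1]

def with_subquery(rows):
    """RETURN ( FOR g IN groups ... ): SORT and LIMIT 1 run once per bucket."""
    buckets = bucketize(rows)
    return [sorted(buckets[label], key=lambda g: g["date"])[:1] for label in sorted(buckets)]

print([g["_key"] for g in without_subquery(rows)])                # ['b']
print([[g["_key"] for g in sub] for sub in with_subquery(rows)])  # [['c'], ['b']]
```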
You could solve your problem in another way as well:
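FOR v, e IN 1..2 OUTBOUND obj.id GRAPH 'CollectionGraph'
  // keep only the group label plus the earliest date in each group
  COLLECT group = v.groupName
  AGGREGATE minDate = MIN(v.date)
  // yourCollection is a placeholder for the collection holding the vertices
  RETURN (
    FOR doc IN yourCollection
      FILTER doc.groupName == group AND doc.date == minDate
      LIMIT 1
      RETURN doc
  )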
This determines the groups as well as the earliest date in each group. Then the collection yourCollection is filtered by group name and date, and one document per group is returned. A drawback is that only a single collection can be scanned (efficiently) this way, whereas the traversal might give you documents from multiple collections. If you know which collections need to be checked, you could write a subquery for each collection and then combine the results. Compared to my first query above, however, this means more AQL code and probably worse performance.
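A plain-Python sketch of this second approach also shows the drawback. The two collections, their contents and the helper names are hypothetical. The lookup scans only your_collection, so a group whose earliest vertex lives in another collection comes back empty; running one lookup per collection and combining the results recovers it.

```python
# Hypothetical data: the traversal reaches vertices from two collections.
your_collection = [
    {"_id": "yourCollection/a", "groupName": "red", "date": "2017-03-02"},
    {"_id": "yourCollection/b", "groupName": "red", "date": "2017-01-15"},
]
other_collection = [
    {"_id": "otherCollection/c", "groupName": "blue", "date": "2017-02-10"},
]
traversal_vertices = your_collection + other_collection

def min_date_per_group(vertices):
    """Like COLLECT group = v.groupName AGGREGATE minDate = MIN(v.date)."""
    mins = {}
    for v in vertices:
        g = v["groupName"]
        if g not in mins or v["date"] < mins[g]:
            mins[g] = v["date"]
    return sorted(mins.items())

def lookup(collection, vertices):
    """Per group: FILTER doc.groupName == group AND doc.date == minDate, LIMIT 1."""
    return [
        [doc for doc in collection if doc["groupName"] == group and doc["date"] == min_date][:1]
        for group, min_date in min_date_per_group(vertices)
    ]

print([[d["_id"] for d in sub] for sub in lookup(your_collection, traversal_vertices)])
# [[], ['yourCollection/b']]   <- group "blue" is missed

# One lookup per collection, then keep the first hit per group.
combined = [
    (a + b)[:1]
    for a, b in zip(lookup(your_collection, traversal_vertices),
                    lookup(other_collection, traversal_vertices))
]
print([[d["_id"] for d in sub] for sub in combined])
# [['otherCollection/c'], ['yourCollection/b']]
```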