0
votes

I am trying to count the number of documents that are in each possible state in a particular Arango collection.

This should be possible in 1 pass over all of the documents using a bucket-sort like strategy where you iterate over all documents, if the value for the state hasn't been seen before, you add a counter with a value of 1 to a list. If you have seen that state before, you increment the counter. Once you've reached the end, you'll have a counter for each possible state in the DB that indicates how many documents are currently stored with that state.

I can't seem to figure out how to write this type of logic in AQL to submit as a query. Current strategy is like this:

  1. Loop over all documents, filtering only docs of a particular state.
  2. Loop over all documents, filtering only docs of a different particular state.
  3. ...
  4. All states have been filtered.
  5. Return size of each set

This works, but I'm sure it's much slower than it should be. This also means that if we add a new state, we have to update the query to loop over all docs an additional time, filtering based on the new state. A bucket-sort like query would be quick, and would need no updating as new states are created as well.

If these were the documents:

  • {A}
  • {B}
  • {B}
  • {C}
  • {A}

Then I'd like the result to be { A:2, B:2, C:1 } Where A,B,&C are values for a particular field. Current strategy filters like so

LET docsA = (
    FOR doc in collection
        FILTER doc.state == A
        RETURN doc
)

Then manually construct the return object calling LENGTH on each list of docs

Any help or additional info would be greatly appreciated

1

1 Answers

1
votes

What about using a COLLECT function? (see docs here)

FOR doc IN collection
    COLLECT s = doc.state WITH COUNT INTO c
    RETURN { state: s, count: c }

This would return something like:

[
  { state: 'A', count: 23 },
  { state: 'B', count: 2 },
  { state: 'C', count: 45 }
]

Would that accomplish what you are after?