I am struggling to figure out how to do a nested GROUP BY in Gremlin.

My graph is like this (vertices in [...]):

[survey]<--forSurvey--[ce]--collectedFrom-->[ce sample]--currentResponseState-->[responseState]

  • 1 survey has many CEs.
  • 1 CE has many CE samples.
  • Each CE sample has a current response state.
  • There are 3 possible states (not started, in progress, completed).

I want to count the CE samples by CE and response state. For example:

CE=CE01, State=not started, count=45.

CE=CE01, State=in progress, count=34.

CE=CE01, State=completed, count=25.

CE=CE02, State=not started, count=60.

CE=CE02, State=in progress, count=33.

CE=CE02, State=completed, count=72.

CE=CE03, State=not started, count=44, etc.

I can get the group counts using the following query, which displays the correct counts for each status (in a separate column for each state) with a separate row for each CE, but it doesn't show the CE name:

g.V().hasLabel("survey").has("surveyId","LFS")
.in("forSurvey").as("ce")
.local(out("collectedFrom")
.out("currentResponseState")
.groupCount().by("code")).as("counts")
.select("counts")

But I cannot figure out how to get Gremlin to return the CE name together with the counts. If I include "ce" in the select() projection to get the name of the CE vertex, I get no results at all.

This would be trivial in SQL, where I would just GROUP BY the relevant columns, but Gremlin is a different world.

Any tips on how to do this with DSE Graph 5.0 Gremlin would be greatly appreciated.

Sorry, forgot to explain: the 'groupCount().by("code")' bit is grouping and counting by the status code. - Chris Webster

1 Answer

A sample graph would have been helpful, but let's try. I think you're looking for this traversal:

g.V().has("survey", "surveyId", "LFS").in("forSurvey").as("CE").
  out("collectedFrom").out("currentResponseState").as("State").
  select("CE", "State").by("name").by("code").groupCount()
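
Since the question has no sample graph, here is a minimal sketch for trying this out in the Gremlin Console against an in-memory TinkerGraph. The vertex labels "ce", "ceSample" and "responseState" and the "name" property on the CE vertex are assumptions (the question only gives the edge labels, "surveyId" and "code"), and DSE Graph 5.0 would additionally need a schema, which is not shown. I have not run this on DSE itself. The last traversal is an alternative that nests the state counts under each CE, which may be closer to what a nested GROUP BY would give you.

```groovy
// Gremlin Console sketch (TinkerGraph, not DSE-specific).
// Assumed names: labels "ce", "ceSample", "responseState"; property "name".
graph = TinkerGraph.open()
g = graph.traversal()

// Sample data: CE01 has two "not started" samples, CE02 has one "completed" sample.
g.addV("survey").property("surveyId", "LFS").as("lfs").
  addV("ce").property("name", "CE01").as("ce01").
  addV("ce").property("name", "CE02").as("ce02").
  addV("responseState").property("code", "not started").as("ns").
  addV("responseState").property("code", "completed").as("done").
  addV("ceSample").as("s1").
  addV("ceSample").as("s2").
  addV("ceSample").as("s3").
  addE("forSurvey").from("ce01").to("lfs").
  addE("forSurvey").from("ce02").to("lfs").
  addE("collectedFrom").from("ce01").to("s1").
  addE("collectedFrom").from("ce01").to("s2").
  addE("collectedFrom").from("ce02").to("s3").
  addE("currentResponseState").from("s1").to("ns").
  addE("currentResponseState").from("s2").to("ns").
  addE("currentResponseState").from("s3").to("done").
  iterate()

// Flat variant: one count per (CE, State) pair, like GROUP BY ce, state.
g.V().has("survey", "surveyId", "LFS").in("forSurvey").as("CE").
  out("collectedFrom").out("currentResponseState").as("State").
  select("CE", "State").by("name").by("code").groupCount()

// Nested variant: a map of CE name -> (state code -> count).
g.V().has("survey", "surveyId", "LFS").in("forSurvey").
  group().
    by("name").
    by(out("collectedFrom").out("currentResponseState").groupCount().by("code"))
```

The flat variant should give one entry per CE/State combination, which maps directly onto the rows listed in the question. The nested variant keeps a single entry per CE, which can be easier to consume if you want one row per CE with a column per state.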