Find duplicates in Gremlin

Question

I have data in an AWS Neptune graph database. Every record in it has a keyId property that is supposed to hold a unique value. Some data got duplicated, and the duplicates can be found using the keyId property and the groupCount() step. My question is: can I filter the groupCount() result to keep only the keys with a count greater than one? Or is there a better way to find duplicates by key?

gremlin> g.V().has('keyId').groupCount().by('keyId')
==> [HJ001:2, HJ002:1, HJ003:1, HJ004:2, HJ005:3]

I need only the entries whose count is greater than 1 (not those with a count of 1). Could anyone help me with that?

Kelvin Lawrence · Accepted Answer · 2018-05-24T13:31:48

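If you extend your query as follows, it should give you what you want. The unfold() step splits the count map into individual key/count entries, and the where() step keeps only the entries whose count is greater than 1.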

g.V().has('keyId').groupCount().by('keyId').
      unfold().where(select(values).is(gt(1)))

Cheers, Kelvin
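
For readers who want to see the logic of the query outside of a graph, here is a minimal Python sketch of the same idea on plain in-memory data. It is only an analogue, not Gremlin and not anything Neptune-specific: the key_ids list is made-up sample data chosen to match the counts in the question, and find_duplicates is a hypothetical helper name.

```python
from collections import Counter

# Hypothetical sample data: one entry per record, chosen to match the
# counts shown in the question (HJ001:2, HJ002:1, HJ003:1, HJ004:2, HJ005:3).
key_ids = [
    "HJ001", "HJ001",
    "HJ002",
    "HJ003",
    "HJ004", "HJ004",
    "HJ005", "HJ005", "HJ005",
]


def find_duplicates(values):
    """Return a dict of key -> count for keys that occur more than once."""
    # Analogue of groupCount().by('keyId'): count occurrences of each key.
    counts = Counter(values)
    # Analogue of unfold().where(select(values).is(gt(1))):
    # look at each key/count entry and keep those with count > 1.
    return {key: n for key, n in counts.items() if n > 1}


duplicates = find_duplicates(key_ids)
print(duplicates)  # {'HJ001': 2, 'HJ004': 2, 'HJ005': 3}
```

The two steps map one-to-one onto the traversal: first build the count per key, then filter the individual entries by their count, which drops HJ002 and HJ003.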
