I am using Titan 0.4 with Gremlin for traversals. I need to identify duplicate vertices in the graph and merge them. The graph contains more than 15 million vertices.
gremlin> g.V.has('domain').groupBy{it.domain}{it.id}.cap
==>{google.com=[4], yahoo.com=[16, 24, 20]}
I am able to group the vertices by domain, but I need only those domains (vertices) that exist more than once.
In the above example, I need to return only ==>{yahoo.com=[16, 24, 20]}
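To make the filter concrete, here is a minimal plain-Groovy sketch of what I am after. It runs on a hard-coded copy of the map above rather than on the graph, and the names `grouped` and `duplicates` are just for illustration. I have not verified how best to attach this to the Titan traversal, or whether it is practical with 15 M vertices.

```groovy
// Hard-coded copy of the map printed above; it stands in for the groupBy result.
def grouped = ['google.com': [4], 'yahoo.com': [16, 24, 20]]

// Keep only the entries whose id list has more than one element.
def duplicates = grouped.findAll { domain, ids -> ids.size() > 1 }

println duplicates
```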
The key "domain" is indexed, if that makes any difference.
Any help would be appreciated.