
[Image: graph in which some pairs of nodes are connected by multiple edges]

Consider the above graph. I would like a Gremlin query that returns all pairs of nodes that have multiple edges between them, as shown in the graph.

This graph was obtained using the following Neo4j Cypher query: MATCH (d:dest)-[r]-(n:cust) WITH d,n, count(r) as popular RETURN d, n ORDER BY popular desc LIMIT 5

For example, there are 8 edges between RITUPRAKA... and Asia, so the query returned those 2 nodes along with the edges, and similarly for the other nodes.

Note: the graph also has nodes with only a single edge between them; these nodes are not returned.

I would like the same thing in Gremlin.

I have used the following query: g.V().as('out').out().as('in').select('out','in').groupCount().unfold().filter(select(values).is(gt(1))).select(keys)

It displays out:v[1234],in:v[3456] .....

But instead of displaying the IDs of the nodes, I want to display property values of the nodes, like out:ICIC1234,in:HDFC234

I have modified the query to: g.V().values("name").as('out').out().as('in').values("name").select('out','in').groupCount().unfold().filter(select(values).is(gt(1))).select(keys)

But it shows an error like ClassCastException, along with a message like "each vertex to be traversed use indexes for fast iteration".


2 Answers

4
votes

Your graph doesn't seem to indicate that bi-directional edges are possible, so I will answer with that assumption in mind. Here's a simple sample graph. Please consider including one in future questions, as it makes it much easier than pictures and textual descriptions for those reading your question to understand it and to get started writing a Gremlin traversal to help you:

g.addV().property(id,'a').as('a').
  addV().property(id,'b').as('b').
  addV().property(id,'c').as('c').
  addE('knows').from('a').to('b').
  addE('knows').from('a').to('b').
  addE('knows').from('a').to('c').iterate()

So you can see that vertex "a" has two outgoing edges to "b" and one outgoing edge to "c", thus we should get the "a b" vertex pair. One way to get this is with:

gremlin> g.V().as('out').out().as('in').
......1>   select('out','in').
......2>   groupCount().
......3>   unfold().
......4>   filter(select(values).is(gt(1))).
......5>   select(keys)
==>[out:v[a],in:v[b]]

The above traversal uses groupCount() to count the number of times each pair of "out" and "in" labelled vertices shows up (i.e. the number of edges between them). It uses unfold() to iterate through the Map of <Vertex Pairs,Count> (or more literally <Map<String,Vertex>,Long>) and keeps only those entries that have a count greater than 1 (i.e. multiple edges). The final select(keys) drops the "count" as it is not needed anymore (i.e. we just need the keys, which hold the vertex pairs for the result).
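To make the counting logic concrete outside of Gremlin, here is a minimal plain-Python sketch of what the traversal computes. It is only a model, not TinkerPop code: the edge list mirrors the sample graph above, and the names EDGES and multi_edge_pairs are made up for illustration.

```python
from collections import Counter

# Directed edges of the sample graph as (out, in) pairs:
# a->b twice and a->c once. This list is an assumption standing in for the graph.
EDGES = [("a", "b"), ("a", "b"), ("a", "c")]


def multi_edge_pairs(edges):
    """Return the (out, in) pairs that are connected by more than one edge."""
    # Plays the role of groupCount(): count how often each pair occurs.
    counts = Counter(edges)
    # Plays the role of unfold() + filter(select(values).is(gt(1))) + select(keys):
    # walk the entries, keep those with a count above 1, return only the pair.
    return [pair for pair, count in counts.items() if count > 1]


print(multi_edge_pairs(EDGES))  # [('a', 'b')]
```

Note that the Counter holds one entry per distinct vertex pair in the whole graph, which is where the memory cost of this approach comes from.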

Perhaps another way to go is with this method:

gremlin> g.V().filter(outE()).
......1>   project('out','in').
......2>     by().
......3>     by(out().
......4>        groupCount().
......5>        unfold().
......6>        filter(select(values).is(gt(1))).
......7>        select(keys)).
......8>   select(values)
==>[v[a],v[b]]

This approach with project() forgoes the heavier memory requirements of one big groupCount() over the whole graph in favor of building a smaller Map for each individual Vertex. That Map becomes eligible for garbage collection at the end of the by() (essentially once per initial vertex processed).
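The memory trade-off can be illustrated with a rough plain-Python model. This is a sketch under assumptions, not TinkerPop code: the adjacency dictionary stands in for the graph store, and the names EDGES and multi_edge_pairs_per_vertex are hypothetical.

```python
from collections import Counter, defaultdict

# Directed edges of the sample graph as (out, in) pairs (assumed data).
EDGES = [("a", "b"), ("a", "b"), ("a", "c")]


def multi_edge_pairs_per_vertex(edges):
    """Yield (out, in) pairs with more than one edge, counting per source vertex."""
    # Stand-in for the graph itself: source vertex -> targets, one entry per edge.
    adjacency = defaultdict(list)
    for source, target in edges:
        adjacency[source].append(target)

    for source, targets in adjacency.items():
        # This Counter only covers the neighbours of one vertex and is
        # replaced on the next iteration, so it never grows with the whole graph.
        local_counts = Counter(targets)
        for target, count in local_counts.items():
            if count > 1:
                yield (source, target)


print(list(multi_edge_pairs_per_vertex(EDGES)))  # [('a', 'b')]
```

The result is the same as with a single global count; only the size of the intermediate map differs.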

0
votes

My suggestion is similar to Stephen's, but also includes the edges or rather the whole path (I guess the Cypher query returned the edges too).

g.V().as("dest").outE().inV().as("cust").
  group().by(select("dest","cust")).by(path().fold()).
  unfold().filter(select(values).count(local).is(gt(1))).
  select(values).unfold()
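As a rough illustration of what this grouping does, here is a plain-Python model. It is not Gremlin, and the edge ids e1, e2, e3 as well as the names EDGES and multi_edge_paths are invented for the example: paths are grouped by their (dest, cust) pair, and only groups with more than one path are kept and flattened.

```python
from collections import defaultdict

# Edges as (edge_id, out_vertex, in_vertex). The edge ids are hypothetical.
EDGES = [("e1", "a", "b"), ("e2", "a", "b"), ("e3", "a", "c")]


def multi_edge_paths(edges):
    """Return the vertex-edge-vertex paths of pairs joined by more than one edge."""
    # Plays the role of group().by(select("dest","cust")).by(path().fold()):
    # collect every path under its (out, in) vertex pair.
    groups = defaultdict(list)
    for edge_id, source, target in edges:
        groups[(source, target)].append((source, edge_id, target))
    # Plays the role of the count(local) filter and the final unfold():
    # keep groups holding more than one path and flatten them into one list.
    return [path for paths in groups.values() if len(paths) > 1 for path in paths]


print(multi_edge_paths(EDGES))  # [('a', 'e1', 'b'), ('a', 'e2', 'b')]
```

Unlike the pure count, each result here carries the edge as well as both vertices.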