I'm trying to put together a Gremlin traversal that, for a given vertex, finds other vertices which share the same or similar relationships.

For example, imagine a graph of Persons, where each Person has relationships to other entities such as subjects and grades. Let's say I select a Person called Dave. He is linked to English and History and is in grade B.

Dave -STUDIES-> English
Dave -STUDIES-> History 
Dave -IS_IN-> B

How would I find other Persons who share these relationships using Gremlin?

I've got as far as

g.V('Dave').out()

which finds everything Dave is related to.

How can I use this to find other Persons who share some or all of the same relationships as Dave?

(I've done this in Neo4j, where it's quite straightforward:

match ((p1:Person{name:'Dave'})-[r]->(n)),((p2:Person)-[s]->(n)) 
return distinct p1.name,p2.name, count(p2.name)
order by count(p2.name) desc

)

Thanks!

1 Answer

When asking questions about Gremlin, it is always helpful to include a script that creates your sample graph. For example:

g.addV('person').property('name','dave').as('d').
  addV('person').property('name','rick').as('r').
  addV('person').property('name','mavis').as('m').
  addV('person').property('name','larry').as('l').
  addV('course').property('name','english').as('e').
  addV('course').property('name','history').as('h').
  addV('grade').property('name','b').as('b').
  addE('studies').from('d').to('e').
  addE('studies').from('r').to('e').
  addE('studies').from('m').to('h').
  addE('studies').from('d').to('h').
  addE('studies').from('r').to('h').
  addE('isIn').from('l').to('b').
  addE('isIn').from('d').to('b').iterate()

Here's a fairly direct way to get your answer:

gremlin> g.V().has('person','name','dave').as('d').
......1>   out('studies','isIn').
......2>   in('studies','isIn').
......3>   where(neq('d')).
......4>   dedup().
......5>   values('name')
==>mavis
==>rick
==>larry

First you find "dave" and label that step as "d" so that you can reference its contents later. Then you traverse out() over the edges you want to match on and back in() over those same edges. At this point, you're back at "person" vertices that share a "grade" or "course" with "dave". You want to exclude "dave" himself from the output, so you use the where(neq('d')) step. A "person" who shares more than one "course" or "grade" with "dave" would show up more than once, so you must dedup().
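If it helps to see the same idea outside of Gremlin, here is a rough sketch in plain Python. It is only a model of what the steps compute, not how TinkerPop executes them. The names EDGES and similar_people are made up for this illustration, and the edge list simply mirrors the sample graph script. The result order may differ from what the Gremlin console prints.

```python
# Hypothetical edge list mirroring the sample graph: (source, label, target).
EDGES = [
    ("dave", "studies", "english"),
    ("rick", "studies", "english"),
    ("mavis", "studies", "history"),
    ("dave", "studies", "history"),
    ("rick", "studies", "history"),
    ("larry", "isIn", "b"),
    ("dave", "isIn", "b"),
]


def similar_people(start, edges=EDGES, labels=("studies", "isIn")):
    """Model of out(labels).in(labels).where(neq(start)).dedup()."""
    found = []
    for src, label, target in edges:
        # out(): follow only the start vertex's outgoing edges with a matching label
        if src != start or label not in labels:
            continue
        for other, other_label, other_target in edges:
            # in(): walk back over matching-label edges that arrive at the same target
            if other_target != target or other_label not in labels:
                continue
            # where(neq('d')): skip the start vertex; dedup(): keep each person once
            if other != start and other not in found:
                found.append(other)
    return found


print(similar_people("dave"))
```

The inner loop is the "back in()" part: anyone with a matching edge into the same target counts as a match, whether they share one target with the start vertex or several.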

That's the basic algorithm, but you can get more advanced. Maybe you want to sort those "matches" by the number of things that each "person" has in common with "dave":

gremlin> g.V().has('person','name','dave').as('d').
......1>   out('studies','isIn').
......2>   in('studies','isIn').
......3>   where(neq('d')).
......4>   groupCount().
......5>     by('name').
......6>   order(local).
......7>     by(values, decr)
==>[rick:2,mavis:1,larry:1]
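The counting and ranking done by groupCount() and order(local) can be modeled the same way in plain Python. Again this is only a sketch under assumptions: EDGES and rank_by_shared are hypothetical names, the data mirrors the sample graph script, and the ordering among people with equal counts is not something Gremlin guarantees.

```python
from collections import Counter

# Hypothetical edge list mirroring the sample graph: (source, label, target).
EDGES = [
    ("dave", "studies", "english"),
    ("rick", "studies", "english"),
    ("mavis", "studies", "history"),
    ("dave", "studies", "history"),
    ("rick", "studies", "history"),
    ("larry", "isIn", "b"),
    ("dave", "isIn", "b"),
]


def rank_by_shared(start, edges=EDGES, labels=("studies", "isIn")):
    """Model of out().in().where(neq(start)).groupCount().order(local) descending."""
    counts = Counter()
    for src, label, target in edges:
        # out(): follow only the start vertex's outgoing edges with a matching label
        if src != start or label not in labels:
            continue
        for other, other_label, other_target in edges:
            # in() plus where(neq('d')): everyone else pointing at the same target
            if other_target == target and other_label in labels and other != start:
                counts[other] += 1  # groupCount().by('name')
    # order(local).by(values, descending): sort the entries of the single map
    return sorted(counts.items(), key=lambda entry: entry[1], reverse=True)


print(rank_by_shared("dave"))
```

Note that there is no dedup here: each path back to a person is one shared relationship, and those repeats are exactly what is being counted.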

"rick" has two "matches" with "dave" (he shares both the "english" and "history" courses) and thus has the highest ranking. The use of local in the order() step is important: it means the sort happens within the current traverser (i.e. the Map of name/count). Without that designation, the sort would be applied to the objects in the traversal stream itself. Also note that decr was deprecated in later TinkerPop releases and removed in 3.5.0 in favor of desc, so on those versions the last step is written by(values, desc).