I'd like to create an edge list that shows connections between people and the strength of each connection. The sample graph below contains four people and their attendance at workshops A and B, including the day attended and the number of hours they stayed. I'd like to form connections through the workshop node: two people are connected if they attended the same workshop on the same day, and the connection strength is the smaller of the two people's hours at that workshop.
Here is the sample graph:
g.addV('person').property(id, '1').property('name', 'Alice').next()
g.addV('person').property(id, '2').property('name', 'Bob').next()
g.addV('person').property(id, '3').property('name', 'Carol').next()
g.addV('person').property(id, '4').property('name', 'David').next()
g.addV('workshop').property(id, '5').property('name', 'A').next()
g.addV('workshop').property(id, '6').property('name', 'B').next()
g.V('1').addE('attended').to(g.V('5')).property('hours', 2).property('day', 'Monday').next()
g.V('1').addE('attended').to(g.V('6')).property('hours', 2).property('day', 'Monday').next()
g.V('2').addE('attended').to(g.V('5')).property('hours', 5).property('day', 'Monday').next()
g.V('3').addE('attended').to(g.V('6')).property('hours', 5).property('day', 'Monday').next()
g.V('4').addE('attended').to(g.V('5')).property('hours', 4).property('day', 'Tuesday').next()
g.V('4').addE('attended').to(g.V('6')).property('hours', 4).property('day', 'Monday').next()
g.V('2').addE('attended').to(g.V('6')).property('hours', 1).property('day', 'Monday').next()
This would be step 1, showing minimum hours on each workshop for each pair that took a workshop on the same day:
Note that David doesn't have any connections through workshop A because he attended it on a different day than Alice and Bob.
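To make the expected step 1 rows concrete, here is a small plain-Python sketch (not Gremlin) that hard-codes the sample `attended` edges and applies the same-workshop, same-day rule. The names `ATTENDED` and `pair_strengths` are made up for illustration, and each pair is listed once in alphabetical order, which is an assumption on my part about how the edge list should look.

```python
from itertools import combinations

# Sample 'attended' edges copied from the graph above:
# (person, workshop, day, hours). Hypothetical helper data, not a Neptune API.
ATTENDED = [
    ("Alice", "A", "Monday", 2),
    ("Alice", "B", "Monday", 2),
    ("Bob", "A", "Monday", 5),
    ("Carol", "B", "Monday", 5),
    ("David", "A", "Tuesday", 4),
    ("David", "B", "Monday", 4),
    ("Bob", "B", "Monday", 1),
]

def pair_strengths(edges):
    """Return (person, other, workshop, min_hours) for every pair of people
    who attended the same workshop on the same day."""
    rows = []
    for (p1, w1, d1, h1), (p2, w2, d2, h2) in combinations(edges, 2):
        if p1 != p2 and w1 == w2 and d1 == d2:
            person, other = sorted([p1, p2])  # list each pair once
            rows.append((person, other, w1, min(h1, h2)))
    return sorted(rows)

for row in pair_strengths(ATTENDED):
    print(row)
```

With the sample data this yields seven rows: Alice and Bob appear twice (2 hours on A, 1 hour on B), and David has no row for workshop A.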
We can then find the total strength of the relationship by adding up hours together across workshops for each pair (now Alice and Bob have 3 total hours together, which were across workshops A and B):
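The totals I'm after can be checked with a short plain-Python sketch (again not Gremlin, just a way to pin down the expected result). It hard-codes the sample `attended` edges; `ATTENDED` and `total_strengths` are hypothetical names used only here.

```python
from collections import defaultdict
from itertools import combinations

# Sample 'attended' edges copied from the graph above:
# (person, workshop, day, hours). Hypothetical helper data, not a Neptune API.
ATTENDED = [
    ("Alice", "A", "Monday", 2),
    ("Alice", "B", "Monday", 2),
    ("Bob", "A", "Monday", 5),
    ("Carol", "B", "Monday", 5),
    ("David", "A", "Tuesday", 4),
    ("David", "B", "Monday", 4),
    ("Bob", "B", "Monday", 1),
]

def total_strengths(edges):
    """Sum, per pair of people, the minimum hours over every workshop
    they attended together on the same day."""
    totals = defaultdict(int)
    for (p1, w1, d1, h1), (p2, w2, d2, h2) in combinations(edges, 2):
        if p1 != p2 and w1 == w2 and d1 == d2:
            pair = tuple(sorted((p1, p2)))  # treat (Alice, Bob) == (Bob, Alice)
            totals[pair] += min(h1, h2)
    return dict(totals)

for (person, other), hours in sorted(total_strengths(ATTENDED).items()):
    print(person, other, hours)
```

For the sample graph this gives Alice/Bob 3, Alice/Carol 2, Alice/David 2, Bob/Carol 1, Bob/David 1, and Carol/David 4.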
I'm struggling with how to write this in a Neptune graph using Gremlin. I'm more familiar with Cypher, and could find this type of edge list using something like this:
match (p:Person)-[a:ATTENDED]->(w:Workshop)<-[a2:ATTENDED]-(other:Person)
where a.day = a2.day
and p.name <> other.name
unwind [a.hours, a2.hours] as hrs
with p, w, other, a, min(hrs) as hrs
return p.name, other.name, sum(hrs) as total_hours
This is as far as I've gotten with Gremlin, but I'm not sure how to finish up the summarization:
g.V().
  hasLabel('person').as('p').
  outE().as('e').
  inV().as('ws').
  inE('attended').
  where(eq('e')).by('day').as('e2').
  otherV().
  where(neq('p')).as('other').
  select('p','e','other','e2','ws').
    by(valueMap('name','hours','day'))
Would anyone be able to help?