
I would like to determine what percentage of a node's total conversation duration was spent with neighbors who know a specific person.

For example, when observing node a, we first have to know how much time he spent talking to all of his neighbors, which is computed with the following query:

neo4j-sh (0)$ start a = node(351061) match (a)-[r:TALKED_TO]->(b) return sum(r.duration)
==> +-----------------+
==> | sum(r.duration) |
==> +-----------------+
==> | 12418           |
==> +-----------------+
==> 1 row, 0 ms

Next, we have to check which of his neighbors know a specific person (say c) and sum only the durations of conversations between a and b where b knows c:

neo4j-sh (0)$ start a = node(351061) match (a)-[r:TALKED_TO]->(b)-[p:KNOWS]->(c) return sum(r.duration)
==> +-----------------+
==> | sum(r.duration) |
==> +-----------------+
==> | 21013           |
==> +-----------------+
==> 1 row, 0 ms

What doesn't seem logical here is that the second sum is larger than the first one, whereas the second is supposed to be just a part of the first. Does anyone know what could cause such a result? The error appeared for 7 users out of 15000.


1 Answer


You're not restricting that query to a specific person c. The pattern matches every path that ends in any :KNOWS relationship, so if you have a->b->c and a->b->d, the duration of a->b gets counted twice.

What you probably need to do is this instead:

start a = node(351061), c=node(xxxxx) // set c explicitly
match (a)-[r:TALKED_TO]->(b)
where b-[:KNOWS]->c // putting this in the where clause forces you to set C
return sum(r.duration)

Here's an example in console: http://console.neo4j.org/r/irm0zy

Remember that match broadens and where tightens the results. You can also do this with match, but you need to specify c in start.
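If it helps to see the intended semantics outside of Neo4j, here is a minimal Python sketch of the "where" version: each TALKED_TO relationship is counted once, and only if its end node knows the chosen c. The node names, durations, and helper functions are made up for illustration; they are not from your data or from any Neo4j API.

```python
# Toy in-memory graph; all names and durations are hypothetical.
talked_to = [            # (a, b, duration), one tuple per TALKED_TO relationship
    ("a", "b1", 20),
    ("a", "b2", 20),
    ("a", "b3", 10),
]
knows = {                # b -> set of people that b KNOWS
    "b1": {"c", "d"},
    "b2": {"d"},
    "b3": {"c"},
}


def total_duration(a):
    """Sum of all TALKED_TO durations that start at a."""
    return sum(d for src, _, d in talked_to if src == a)


def duration_with_neighbors_who_know(a, c):
    """Count each TALKED_TO relationship once, only if its end node KNOWS c."""
    return sum(
        d for src, b, d in talked_to
        if src == a and c in knows.get(b, set())
    )


total = total_duration("a")                         # 20 + 20 + 10
part = duration_with_neighbors_who_know("a", "c")   # b1 and b3 know c
share = 100.0 * part / total                        # percentage of the total
print(total, part, share)
```

Because every relationship contributes at most once, the partial sum can never exceed the total, which is the property the original query was missing.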

A good way to test what your aggregate functions are doing is to return all of your named variables (or bind a path and return it). That way you see the aggregation broken into subtotals, like so:

start a=node(1) 
match a-[r:TALKED_TO]->b-[:KNOWS]->c 
return sum(r.duration), a,b,c;
+-----------------------------------------------------------------------------------------------+
| sum(r.duration) | a                       | b                       | c                       |
+-----------------------------------------------------------------------------------------------+
| 20              | Node[1]{name:"person1"} | Node[2]{name:"person2"} | Node[4]{name:"person4"} |
| 20              | Node[1]{name:"person1"} | Node[2]{name:"person2"} | Node[3]{name:"person3"} |
| 20              | Node[1]{name:"person1"} | Node[5]{name:"person5"} | Node[6]{name:"person6"} |
+-----------------------------------------------------------------------------------------------+
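The subtotals above can also be reproduced with a small Python sketch that emulates the pattern match as one row per a->b->c path. The data is an assumption read off the table: I'm assuming person1 has a single TALKED_TO relationship of duration 20 to person2 and another of duration 20 to person5.

```python
from collections import defaultdict

# Assumed toy data matching the table: one TALKED_TO of duration 20 per neighbor.
talked_to = [("person1", "person2", 20), ("person1", "person5", 20)]
knows = [("person2", "person4"), ("person2", "person3"), ("person5", "person6")]

# Emulate the match: one row per (a, b, c) path, carrying r.duration along.
rows = [
    (a, b, c, duration)
    for a, b, duration in talked_to
    for knower, c in knows
    if knower == b
]

# Group by (a, b, c), like "return sum(r.duration), a, b, c".
subtotals = defaultdict(int)
for a, b, c, duration in rows:
    subtotals[(a, b, c)] += duration

# Without grouping keys, the sum runs over every path row.
path_sum = sum(row[3] for row in rows)
# Counting each TALKED_TO relationship exactly once gives the real total.
true_total = sum(duration for _, _, duration in talked_to)

for key, value in subtotals.items():
    print(value, key)
print(path_sum, true_total)
```

Under these assumptions the path-based sum is 60 while the real total is 40, because the person1->person2 duration appears in two rows. That is the same inflation you saw in your second query.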