3
votes

I have duplicate relationships between nodes, e.g.:

A ->{weight: 1} B
A ->{weight: 1} B
A ->{weight: 1} B

and I want to merge these relations into one relation of the form: A->{weight: 3} B for my whole graph.

I tried something like the following:

start n = node(*) 
match (n)-[r:OCCURENCE]->()
Set r.weight = count(*)
count(*)

But my graph is really big, and with this query edges are updated twice for each node A and B. Furthermore, the old relationships are not deleted. I don't know how to handle these two aspects in one query. I hope someone can help.

EDIT:

I tried some other queries with node() and relationship(), e.g.:

start n = node(*) match ()-[r:OCCURENCE]->() set n.SumEdgeWeight = sum(r.weight)

They run horribly slowly. Is there a faster way when I need to update all nodes? I found this topic [1] in the Neo4j community. Would my queries run faster with the Java core API?

[1] https://groups.google.com/forum/#!topic/neo4j/4SSxvNsuQsY

Regards.

2
Expect any query starting with "start n = node(*)" to be extremely slow, as you are running through the whole graph. And yes, the Java API is faster :) - bendaizer
Thank you. So it would be better to calculate as much as possible before creating the graph, for example building the adjacency matrix instead of inserting each edge individually and trying to merge the edges afterwards. I'm a little bit disappointed with Neo4j right now, but I will try the Java core API. Regards. - user2715478
It depends on what you want to do. Neo4j works no miracles, and working with graphs is always hard. Its main strength is its "local" approach to the graph: you don't need to run through the whole graph to query for specific things. The import process is as important as the query process, so if you can simplify things during import, always do it! - bendaizer
Now, in your case, it also depends on the reason for the duplicates. Is it an error that needs to be corrected? In that case it's better to correct it. Otherwise, is it really necessary to have one link that sums up the other 3? As you can see, you can always count the links in your queries when necessary. - bendaizer
I import my data from a CSV file and thought it would be easier to add edges to the CSV file even if they are duplicated, and merge the duplicates afterwards. Apart from that, I want to calculate PMI (en.wikipedia.org/wiki/Pointwise_mutual_information) on the whole graph. Is that use case a problem as well? - user2715478

2 Answers

9
votes

Instead of starting with a very general pattern that matches each node (node(*)) you can start with the more specific pattern that you are after (A-[:OCCURRENCE]->B). This might speed things up a bit.

Instead of counting nodes to arrive at an aggregate weight, you can aggregate the weight values themselves (you seem to be moving towards that in your edit, but there you set the aggregate as a property on a node). If all the relationships in your data have a weight of 1, some kind of counting could work (you could try counting the relationships instead of the nodes), but it is worth having a query that produces the right result by design rather than by accident. Such a query also works with varying weight values, for instance if you import more data in the future and need to merge new [:OCCURRENCE] relationships, perhaps with a weight of 1, with ones that are already merged and in place.
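
To make the difference between counting and summing concrete, here is a minimal sketch in Python rather than Cypher, purely to illustrate the arithmetic. The weights are made up: it assumes one pair of nodes already has a merged relationship of weight 3 and then receives one newly imported relationship of weight 1.

```python
# Hypothetical weights of the OCCURRENCE relationships between one pair A -> B:
# one relationship merged earlier (weight 3) plus one newly imported (weight 1).
weights = [3, 1]

by_count = len(weights)  # what counting the relationships would give
by_sum = sum(weights)    # what summing r.weight would give

print(by_count, by_sum)
```

Counting only gives the intended total while every weight is still 1; summing stays correct after a previous merge.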

Could you try something like this?

MATCH (A)-[r:OCCURRENCE]->(B)
WITH A, COLLECT(r) as oldRels, B, SUM(r.weight) as W
FOREACH(r IN oldRels | DELETE r)
WITH A, W, B
CREATE (A)-[O:OCCURRENCE {weight:W}]->(B);

I take this query to mean something like: For all A-[r:OCCURRENCE]->B patterns in the graph, COLLECT the relationships and bring that collection WITH so they can be deleted later. Also bring WITH the related nodes and the SUM of the relationships' weight. FOREACH of the old relationships, delete it, and bring WITH only the two nodes and the aggregated weight. Create a new relationship and set the weight to the aggregated weight.
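
The same group, sum and replace logic can be sketched outside the database. The following Python snippet is only an illustration of what the query is meant to compute, not how Neo4j executes it; the function name `merge_duplicates` and the tuple layout are assumptions made for this example.

```python
from collections import defaultdict

def merge_duplicates(rels):
    """Collapse (start, end, weight) tuples into one tuple per directed pair.

    Mirrors the query: group by (A, B), sum the weights, and replace the
    old relationships with a single one carrying the total.
    """
    totals = defaultdict(int)
    for start, end, weight in rels:
        totals[(start, end)] += weight
    return [(start, end, total) for (start, end), total in totals.items()]

# Hypothetical sample data: three duplicate A -> B edges and one A -> C edge.
rels = [("A", "B", 1), ("A", "B", 1), ("A", "B", 1), ("A", "C", 2)]
merged = merge_duplicates(rels)
print(merged)
```

Because the key is the ordered pair (start, end), A -> B and B -> A are kept apart, just as the directed pattern in the query keeps them apart.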

2
votes

Though this is an old question, there is some newer APOC functionality that can be used here. You need to install the APOC plugin that matches your version of Neo4j.

MATCH (A)-[r:OCCURRENCE]->(B) 
WITH  A,B,collect(distinct(r.weight)) as values, count(r) as relsCount
MATCH (A)-[r:OCCURRENCE]->(B)
WHERE size(values) = 1 AND relsCount > 1 
WITH A,B,collect(r) as rels
CALL apoc.refactor.mergeRelationships(rels,{properties:"combine"})
YIELD rel RETURN rel

The "combine" option returns the weights of the duplicate relationships in an array, which you can then sum. Alternatively, you can first add the sum to the relationships as in the previous example and then remove this property.

More documentation here