1
votes

My question follows on from this one: cypher-how-get-relation-between-every-two-node-and-the-distance-from-start-node. More detail: there are 2 million companies, and each of them must belong to exactly one leading company; the companies under one leading company are called a group, so every node has the properties groupId and companyId. In addition, companies in different groups may have relationships with each other. QUESTION: given a groupId and the leading company's id, return all relationships within the group, plus the shortest distance from every node in the group to the leading company.

The queries in the answers there have serious performance problems, especially the shortestPath one, so my question is: can we narrow the search scope when using shortestPath so that it only searches nodes with the same property?

Or is there another way to solve the original question?

Sorry, since I am in mainland China I cannot reach console.neo4j.com (even with a VPN), so I put my sample here:

create (a :COMPANY {companyId:"a",groupId:"ag"}),
       (b:COMPANY  {companyId:"b",groupId:"ag"}),
       (c:COMPANY  {companyId:"c",groupId:"ag"}),
       (d:COMPANY {companyId:"d",groupId:"ag"}),
       (e:COMPANY  {companyId:"e",groupId:"eg"})
create (a)-[:INVESTMENT]->(b),
       (b)-[:INVESTMENT]->(c),
       (c)-[:INVESTMENT]->(d),
       (a)-[:INVESTMENT]->(c),
       (d)-[:INVESTMENT]->(b),
       (c)-[:INVESTMENT]->(e) 
return *

Here the nodes a, b, c and d are in the same group and a is the leading company; e is in another group but has a relationship with c. I want to get the node-to-node relationships in the ag group, for example a-b, a-c, b-c, c-d, d-b, and the shortest distance from a to each group member, for example dist.a=0, dist.b=1, dist.c=1, dist.d=2.
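
To make the expected result concrete, here is a minimal Python sketch (not Cypher, and not a Neo4j API) that computes the same output on the sample data in memory. It assumes, as stated in the comments, that direction is ignored for distances and that paths may only pass through nodes of the same group; the function name and data structures are hypothetical and only serve as a reference to check a Cypher query against.

```python
from collections import deque

# Sample data from the question: companyId -> groupId, and directed INVESTMENT edges.
companies = {"a": "ag", "b": "ag", "c": "ag", "d": "ag", "e": "eg"}
investments = [("a", "b"), ("b", "c"), ("c", "d"), ("a", "c"), ("d", "b"), ("c", "e")]


def group_relations_and_distances(group_id, leading_company):
    """Return (in-group relations, distance of each reachable member to the leader)."""
    members = {cid for cid, gid in companies.items() if gid == group_id}
    if leading_company not in members:
        raise ValueError("leading company is not a member of the group")

    # Keep only relationships whose two endpoints are both in the group.
    relations = [(s, t) for s, t in investments if s in members and t in members]

    # Undirected adjacency, since only the distance matters, not the direction.
    neighbours = {cid: set() for cid in members}
    for s, t in relations:
        neighbours[s].add(t)
        neighbours[t].add(s)

    # Breadth-first search from the leader; it never leaves the group.
    dist = {leading_company: 0}
    queue = deque([leading_company])
    while queue:
        current = queue.popleft()
        for nxt in neighbours[current]:
            if nxt not in dist:
                dist[nxt] = dist[current] + 1
                queue.append(nxt)
    return relations, dist


relations, dist = group_relations_and_distances("ag", "a")
print(relations)
print(dist)
```

Because every relationship counts as one step, a single breadth-first pass yields all distances at once, rather than one shortest-path search per group member.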

3
What about sample test data? And an example of the Cypher query you have tried? What about console.neo4j.com? – stdob--
@stdob-- Sorry, I cannot reach the online console; I have updated my question. – WesleyHsiung
Do I understand correctly that the shortest path should consist only of nodes included in the group? And what about the direction of the relationships in a shortest path? – stdob--
And how many companies can there be in one group? – stdob--
Only the distance between them matters; the direction should not be considered. A large group may have 2 thousand companies, but most have hundreds, and the relationships between different groups are also complex. – WesleyHsiung

3 Answers

0
votes

I think this cannot be solved with pure Cypher. You can try using the APOC library: add a temporary property to the relationships and apply the Dijkstra algorithm.

Input params:

{
  "groupId": "ag",
  "leadingCompany": "a"
}

Query:

// Search for a leading company
MATCH (lc:COMPANY {companyId: $leadingCompany, groupId: $groupId})
WITH lc, 
     apoc.create.uuid() as tmpProp // Temporary property name

// All relationships in the group are found. 
// And the value of the temporary property is set ..
MATCH (c1:COMPANY {groupId: $groupId})-[r:INVESTMENT]->(c2:COMPANY {groupId: $groupId})
CALL apoc.create.setRelProperty(r, tmpProp, 1) yield rel
WITH lc, tmpProp, 
     count(r) as tmp

// For each node in the group, need to find short paths to the leading company
MATCH (c:COMPANY {groupId: $groupId})
CALL apoc.algo.dijkstraWithDefaultWeight(lc, c, 'INVESTMENT', tmpProp, 2000000) yield path
WITH tmpProp, c, 
     min(length(path)) as distanceToLeading

// All paths in the group are found, and the temporary property is deleted
MATCH (c)-[r:INVESTMENT]->(:COMPANY {groupId: $groupId})
CALL apoc.create.removeRelProperties(r, [tmpProp]) yield rel
RETURN c as groupNode, distanceToLeading, 
       collect(r) as groupRelations
0
votes

APOC Procedures can help out here, as some of the path expander procedures can be used to find the shortest distance to each node in the group, and there's also a cover() procedure that will find all relationships between a group of nodes.

You'll want to make sure you have an index on :COMPANY(groupId) and :COMPANY(companyId) first. (Labels are case-sensitive, so the query uses :COMPANY to match the sample data in the question.)

MATCH (c:COMPANY{groupId:$groupId})
WITH collect(c) as companies
WITH companies, [c in companies | id(c)] as companyIds, [c in companies 
 WHERE NOT (c)<-[:INVESTMENT]-(:COMPANY{groupId:$groupId})][0] as lead
// for the above, if you already know the lead companyId, just MATCH to the lead instead of this filter
CALL apoc.algo.cover(companyIds) YIELD rel
WITH companies, lead, collect(rel {start:startNode(rel).companyId, end:endNode(rel).companyId}) as relationships
UNWIND companies as company
MATCH path = shortestPath((lead)-[:INVESTMENT*]->(company))
WHERE all(node in nodes(path) WHERE node in companies)
RETURN relationships, collect(company {.companyId, distance:length(path)}) as distance
0
votes

This query will get you the desired output:

 match p=((c:COMPANY{companyId:'a'})-[i:INVESTMENT*0..99]->(l:COMPANY)) 
    where l.groupId=c.groupId 
    with c,i,l,nodes(p) as path  order by c.companyId
    with c,l,collect(distinct l.companyId) as Companies,min(size(path))-1 as Dist
    match pp=shortestpath((cc:COMPANY{companyId:'a'})-[ii:INVESTMENT*0..99]->(ll:COMPANY)) 
    where ll.companyId in Companies
    with c,Companies,Dist,reduce(s='',x in nodes(pp)|s + x.companyId ) as CompanyPath     
return c.companyId,Companies,Dist,CompanyPath order by Dist

You will notice that it does not require advance knowledge of the groupId. If a lead company can be in two groups, you would need to include the groupId in the initial WHERE.