
TL;DR:

I need the most efficient Cypher query that gets the nodes connected to a certain node type through a certain relationship type, retrieves the connections between those nodes, keeps only the 150 most connected ones, and shows them to the user.

I propose one below that uses an APOC relationship index query, but I think it can be made more efficient, so I'm looking for advice.

LONG EXPLANATION:

In my data model I have nodes with the following labels:

:Concept :Context :User :Statement

This is used for text network analysis: the basic idea is that :Concepts appear in :Statements that belong to a certain :Context added by a certain :User.

They also have properties such as uid (the unique ID) and name.

Every :Concept is connected to every other :Concept by a directed relationship of type :TO.

If a :Concept belongs to a :Context, it has an :AT relationship to that :Context.

If a :Concept is made by a :User, it is connected to that :User with a :BY relationship.

I also added properties to the relationships that show which user made each :TO connection and in which context (and statement) it appeared.
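To make the data model concrete, here is a small in-memory Python stand-in for it. It is only a sketch for reasoning about the queries, not how Neo4j stores data; the class and field names (`ToRel`, `ConceptGraph`, `concepts_in`) are hypothetical, chosen to mirror the labels and the relationship properties (user, context, statement, weight) used in the queries in this question.

```python
from dataclasses import dataclass, field

# Hypothetical in-memory model of the graph described above.
# It mirrors the labels and properties in the question; it is NOT Neo4j's storage model.

@dataclass(frozen=True)
class ToRel:
    """A directed :TO relationship between two :Concept nodes."""
    uid: str
    source: str     # uid of the start :Concept
    target: str     # uid of the end :Concept
    user: str       # uid of the :User who made the connection
    context: str    # uid of the :Context the connection appeared in
    statement: str  # uid of the :Statement the connection came from
    weight: int = 1

@dataclass
class ConceptGraph:
    names: dict = field(default_factory=dict)  # concept uid -> name
    at: dict = field(default_factory=dict)     # concept uid -> context uids (:AT)
    by: dict = field(default_factory=dict)     # concept uid -> user uids (:BY)
    to: list = field(default_factory=list)     # every :TO relationship

    def add_concept(self, uid, name, context, user):
        self.names[uid] = name
        self.at.setdefault(uid, set()).add(context)
        self.by.setdefault(uid, set()).add(user)

    def connect(self, rel):
        self.to.append(rel)

    def concepts_in(self, context):
        """Concepts that have an :AT relationship to the given context."""
        return {c for c, ctxs in self.at.items() if context in ctxs}
```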

I need to get a list of nodes and their relationships in a certain context, so I currently use a Cypher/APOC query like this:

// Legacy manual-index lookup: all :TO relationships indexed under this user
CALL apoc.index.relationships('TO','user:15229100-b20e-11e3-80d3-6150cb20a1b9') 
YIELD rel, start, end 
WITH DISTINCT rel, start, end 
// Keep only relationships whose context property points at the "decon" context
MATCH (ctx:Context) 
WHERE rel.context = ctx.uid 
AND ctx.name = "decon" 
RETURN DISTINCT start.uid AS source_id, 
start.name AS source_name, 
end.uid AS target_id, 
end.name AS target_name, 
rel.uid AS edge_id, 
ctx.name AS context_name, 
rel.statement AS statement_id, 
rel.weight AS weight 
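Roughly, the query above does the following, sketched here in Python over plain dicts. It assumes the manual `TO` index is keyed on a `user` property holding the user's uid (which is what the `'user:<uid>'` Lucene query suggests); `context_rows` is a hypothetical name. What it illustrates is that every :TO relationship indexed for that user is fetched and joined against the contexts before anything can be ranked or limited.

```python
def context_rows(rels, contexts, user_uid, context_name):
    """Approximate the APOC query over in-memory data.

    rels:     list of dicts with uid, source, target, user, context, statement, weight
    contexts: dict of context uid -> context name
    """
    # MATCH (ctx:Context) ... ctx.name = "decon": uids of contexts with the wanted name
    wanted = {uid for uid, name in contexts.items() if name == context_name}
    rows = set()  # a set plays the role of RETURN DISTINCT
    for rel in rels:
        # apoc.index.relationships('TO', 'user:<uid>'): only this user's :TO rels
        if rel["user"] != user_uid:
            continue
        # WHERE rel.context = ctx.uid
        if rel["context"] in wanted:
            rows.add((rel["source"], rel["target"], rel["uid"],
                      contexts[rel["context"]], rel["statement"], rel["weight"]))
    return rows
```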

It works pretty well. The problem is that when the graph is large (e.g. more than 1000 nodes and 5000 connections), the query takes too long.

So I want to limit the number of relationships I get.

That is hard to do with the query above: I want to keep only the 150 most connected nodes, and to know which ones those are I first have to fetch all the data.

So I thought that maybe I should change the logic of my request and instead:

1) Query the :Context I'm interested in;

2) Get all the :Concept nodes connected to it;

3) Find all the relations of the retrieved :Concept nodes to one another;

4) Keep the top X (150) most connected :Concept nodes and discard the rest;

5) Show them to the user.
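The five steps can be sketched in Python over in-memory pairs, as a way to pin down what "most connected" means before writing the Cypher. This sketch assumes "most connected" means the number of :TO relationships (in either direction) to other concepts in the same context; `top_connected` and its arguments are hypothetical.

```python
from collections import Counter

def top_connected(at_pairs, to_edges, context_uid, k=150):
    """Steps 1-5 of the plan over in-memory data.

    at_pairs: (concept_uid, context_uid) pairs, one per :AT relationship
    to_edges: (source_uid, target_uid) pairs, one per :TO relationship
    """
    # 1-2) concepts attached to the context
    concepts = {c for c, ctx in at_pairs if ctx == context_uid}
    # 3) :TO relationships with both ends inside that set
    inner = [(s, t) for s, t in to_edges if s in concepts and t in concepts]
    # 4) count connections (in + out) and keep the k best; concepts with no
    #    connections inside the context never enter the Counter
    degree = Counter()
    for s, t in inner:
        degree[s] += 1
        degree[t] += 1
    keep = {c for c, _ in degree.most_common(k)}
    # 5) what gets shown: the kept concepts and the relationships among them
    return keep, [(s, t) for s, t in inner if s in keep and t in keep]
```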

I tried the following query:

MATCH (ctx:Context{name:'decon',by:'15229100-b20e-11e3-80d3-6150cb20a1b9'}) 
WITH ctx 
// Every pair of :Concept nodes in the context...
MATCH (c1:Concept)-[:AT]->(ctx),
(c2:Concept)-[:AT]->(ctx) 
WITH c1, c2 
// ...then the :TO relationships between them
MATCH (c1)-[rel:TO]->(c2) 
RETURN DISTINCT rel;

But it seems to take much, much longer.

I also need to filter the relationships between those nodes so that only the ones made by a certain :User and appearing in certain :Statements are shown.
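One design point worth considering: if the user/statement filter is applied before the connection counts are computed, the top 150 are ranked only on relationships that will actually be shown. A minimal Python illustration, with hypothetical names (`degrees`, `keep_edge`) and made-up sample edges:

```python
from collections import Counter

def degrees(edges):
    """Connections per concept (in + out) over the given :TO edges."""
    deg = Counter()
    for e in edges:
        deg[e["source"]] += 1
        deg[e["target"]] += 1
    return deg

def keep_edge(e, user_uid, statement_uids):
    """Hypothetical filter: made by this user and appearing in one of these statements."""
    return e["user"] == user_uid and e["statement"] in statement_uids

edges = [
    {"source": "a", "target": "b", "user": "u1", "statement": "s1"},
    {"source": "c", "target": "b", "user": "u2", "statement": "s9"},
    {"source": "c", "target": "d", "user": "u2", "statement": "s9"},
    {"source": "c", "target": "e", "user": "u2", "statement": "s9"},
]
# Ranking first would pick "c", whose edges the filter later discards
print(degrees(edges).most_common(1))  # [('c', 3)]
filtered = [e for e in edges if keep_edge(e, "u1", {"s1"})]
print(degrees(filtered))
```

In Cypher terms this suggests putting the user and statement predicates in the WHERE clause that precedes the counting, rather than filtering the final result.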

Does anyone have an idea what else I could try?

PS: The source code is at https://github.com/noduslabs/infranodus/blob/master/lib/entry.js#L573

ANSWER:

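You're generating a cartesian product of those :Concept nodes, which is what slows your query down: with n concepts in the context, the two comma-separated :AT patterns produce a row for every (c1, c2) pair, roughly n² rows, before the :TO relationship between them is even checked.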
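To see the difference in scale, count the rows. Assuming each :Concept has a single :AT relationship to the context, Cypher's relationship uniqueness within one MATCH means c1 and c2 are always different concepts, so n concepts give n*(n-1) ordered pairs. Expanding each concept's own :TO relationships instead touches about n plus the number of those relationships. The sample below is hypothetical: it takes the question's 1000 nodes and 5000 connections as if all were concepts in one context, with 5 outgoing :TO relationships each.

```python
from itertools import permutations

def cartesian_rows(concepts):
    """Rows from MATCH (c1)-[:AT]->(ctx), (c2)-[:AT]->(ctx): all ordered pairs c1 != c2."""
    return sum(1 for _ in permutations(concepts, 2))

def expansion_rows(concepts, out_edges):
    """Rows when each concept is visited once and its own outgoing :TO rels are expanded."""
    return sum(1 + len(out_edges.get(c, ())) for c in concepts)

concepts = [f"c{i}" for i in range(1000)]
# Hypothetical: 5000 :TO relationships spread evenly, 5 per concept
out_edges = {c: [concepts[(i + j) % 1000] for j in range(1, 6)]
             for i, c in enumerate(concepts)}
print(cartesian_rows(concepts))             # 999000
print(expansion_rows(concepts, out_edges))  # 6000
```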

You could try this instead:

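// Concepts in the context that were made by the user
MATCH (c:Concept)-[:AT]->(:Context{name:'decon',by:'15229100-b20e-11e3-80d3-6150cb20a1b9'}) 
WHERE (c)-[:BY]->(:User {uid:'15229100-b20e-11e3-80d3-6150cb20a1b9'})
// AND <additional predicate for desired :Statement>
// Collect them once so each concept can be checked against the whole set
WITH collect(c) as concepts
UNWIND concepts as c
// Count outgoing :TO relationships whose target is also in the set
WITH c, size([(c)-[:TO]->(c2) WHERE c2 in concepts | c2]) as connections
ORDER BY connections DESC
LIMIT 150
RETURN c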
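For checking the logic outside Neo4j, here is a rough Python equivalent of the query above, with hypothetical names. As in the Cypher, only outgoing :TO relationships whose target is also in the collected list are counted; in Cypher, writing the pattern without the arrow, `(c)-[:TO]-(c2)`, would count both directions instead.

```python
def top_concepts(concepts, out_edges, limit=150):
    """Rank concepts by how many of their outgoing :TO edges stay inside the set."""
    members = set(concepts)  # collect(c) AS concepts; a set makes membership checks O(1)
    # size([(c)-[:TO]->(c2) WHERE c2 IN concepts | c2]) AS connections
    connections = {c: sum(1 for t in out_edges.get(c, ()) if t in members)
                   for c in concepts}
    # ORDER BY connections DESC LIMIT 150 (sorted() is stable, so ties keep input order)
    return sorted(concepts, key=connections.__getitem__, reverse=True)[:limit]
```

Collecting first is what avoids the cartesian product: each concept is visited once, and only its own relationships are expanded.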

You'll of course want an index on :Context(by) so that the initial match is quick (in Neo4j 3.x syntax: CREATE INDEX ON :Context(by)).