TL;DR:
I need to find the most efficient Cypher query that gets the nodes connected to a certain node type by a certain type of relation, retrieves the connections between those nodes, keeps only the 150 most connected ones, and shows them to the user.
I propose one below that uses an APOC relationship index query, but I think it can be made more efficient, so I'm looking for your advice.
LONG EXPLANATION:
In my data model I have the following node types: :Concept, :Context, :User and :Statement.
This is used for text network analysis, so the basic idea is that the :Concepts appear in :Statements that belong to a certain :Context added by a certain :User.
They also have properties, such as uid (the unique ID) and name (the name).
Every :Concept is connected to every other :Concept with the :TO type of directed relation.
If a :Concept belongs to a :Context, it has the :AT relation to that :Context.
If a :Concept is made by a :User, it is connected to that user with the :BY type of relation.
I also added properties to the relations, so that each :TO connection shows which user made it and in which context it appeared.
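To make the model concrete, here is a minimal sketch of sample data. The uid and name values are made up for illustration. The relation properties (uid, user, context, statement, weight) and the by property on :Context are inferred from the queries in this post, so treat them as assumptions. How :Statement nodes are linked to the rest is not described here, so the statement node is left unconnected.

```cypher
// Hypothetical sample data; all uid and name values are invented.
CREATE (u:User {uid: 'user-1', name: 'alice'})
CREATE (ctx:Context {uid: 'ctx-1', name: 'decon', by: 'user-1'})
CREATE (s:Statement {uid: 'stmt-1'})
CREATE (a:Concept {uid: 'c-1', name: 'graph'})
CREATE (b:Concept {uid: 'c-2', name: 'query'})
// Concepts belong to the context and are made by the user.
CREATE (a)-[:AT]->(ctx), (b)-[:AT]->(ctx)
CREATE (a)-[:BY]->(u), (b)-[:BY]->(u)
// The :TO relation records who made it, where, and in which statement.
CREATE (a)-[:TO {uid: 'rel-1', user: 'user-1', context: 'ctx-1',
                 statement: 'stmt-1', weight: 1}]->(b)
```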
I need to get a list of nodes and their relationships in a certain context, so I currently use a Cypher / APOC query like this:
CALL apoc.index.relationships('TO','user:15229100-b20e-11e3-80d3-6150cb20a1b9')
YIELD rel, start, end
WITH DISTINCT rel, start, end
MATCH (ctx:Context)
WHERE rel.context = ctx.uid
AND (ctx.name="decon" )
RETURN DISTINCT start.uid AS source_id,
start.name AS source_name,
end.uid AS target_id,
end.name AS target_name,
rel.uid AS edge_id,
ctx.name AS context_name,
rel.statement AS statement_id,
rel.weight AS weight
It works pretty well. The problem is that if the graph is large (e.g. more than 1000 nodes and 5000 connections), the query takes too long.
So I want to be able to limit the number of relations I get.
With the request above that is quite difficult, because I want to keep only the 150 most connected nodes, and I need to fetch all the data first in order to find them.
So I thought that maybe I should change the logic of my request and instead:
1) Query the :Context I'm interested in;
2) Get all the :Concept nodes connected to it;
3) Find all the relations of the retrieved :Concept nodes to one another;
4) Get the top X (150) most connected :Concept nodes, disregard the rest;
5) Show them to the user.
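The five steps above could be sketched in a single query along these lines. This is an untested sketch, not a benchmarked solution: it assumes that rel.context holds the uid of the :Context (as in my first query) and that "most connected" means the number of :TO relations to other concepts in the same context.

```cypher
// Steps 1-2: find the context and collect the concepts attached to it.
MATCH (ctx:Context {name: 'decon', by: '15229100-b20e-11e3-80d3-6150cb20a1b9'})
MATCH (c:Concept)-[:AT]->(ctx)
WITH ctx, collect(c) AS concepts

// Step 3: count each concept's :TO relations (either direction)
// to other concepts of the same context.
UNWIND concepts AS c
MATCH (c)-[rel:TO]-(other:Concept)
WHERE rel.context = ctx.uid AND other IN concepts

// Step 4: keep the 150 concepts with the highest count.
WITH ctx, c, count(DISTINCT rel) AS degree
ORDER BY degree DESC
LIMIT 150
WITH ctx, collect(c) AS top

// Step 5: return only the relations between the kept concepts.
UNWIND top AS c1
MATCH (c1)-[rel:TO]->(c2:Concept)
WHERE rel.context = ctx.uid AND c2 IN top
RETURN c1.uid AS source_id, c1.name AS source_name,
       c2.uid AS target_id, c2.name AS target_name,
       rel.uid AS edge_id, rel.statement AS statement_id,
       rel.weight AS weight
```

Concepts with no :TO relations in the context drop out at step 3, which seems acceptable since they would rank last anyway. The IN checks scan a list, so this may still be slow on very large contexts.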
I tried the following query:
MATCH (ctx:Context {name: 'decon', by: '15229100-b20e-11e3-80d3-6150cb20a1b9'})
WITH ctx
MATCH (c1:Concept)-[:AT]->(ctx),
      (c2:Concept)-[:AT]->(ctx)
WITH c1, c2
MATCH (c1)-[rel:TO]->(c2)
RETURN DISTINCT rel;
But it seems to take much longer.
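One possible reason is that matching c1 and c2 independently builds every pair of concepts in the context before the :TO relations are checked. A variant that starts from c1, follows its :TO relations, and only then checks that c2 is in the same context might avoid that. The sketch below is untested; prefixing it with PROFILE should show the actual plan and the number of database hits, so it can be compared with the query above.

```cypher
// PROFILE runs the query and reports the execution plan with db hits.
PROFILE
MATCH (ctx:Context {name: 'decon', by: '15229100-b20e-11e3-80d3-6150cb20a1b9'})
MATCH (c1:Concept)-[:AT]->(ctx)
// Expand from c1 along :TO instead of pairing every c1 with every c2.
MATCH (c1)-[rel:TO]->(c2:Concept)
WHERE (c2)-[:AT]->(ctx)
RETURN DISTINCT rel;
```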
I also need to filter the relations between those nodes, so that the result only includes relations made by a certain :User and appearing in certain :Statements.
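Since the :TO relations already carry the user, context and statement as properties, this filter could be expressed directly on the relation. In the sketch below, $userId and $statementIds are hypothetical query parameters, and the property names user, context and statement are the ones my first query relies on.

```cypher
// $userId: uid of the user; $statementIds: list of statement uids.
// Both parameters are hypothetical and must be supplied by the caller.
MATCH (ctx:Context {name: 'decon', by: $userId})
MATCH (c1:Concept)-[:AT]->(ctx)
MATCH (c1)-[rel:TO]->(c2:Concept)
WHERE rel.context = ctx.uid
  AND rel.user = $userId
  AND rel.statement IN $statementIds
RETURN c1.uid AS source_id, c2.uid AS target_id,
       rel.uid AS edge_id, rel.statement AS statement_id,
       rel.weight AS weight
```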
Does anyone have an idea of what else I could try?
P.S. The source code is at https://github.com/noduslabs/infranodus/blob/master/lib/entry.js#L573