
I'm writing a Cypher query to load data from my Neo4j DB. This is my data model:

[data model diagram]

So basically what I want is a query that returns a Journal with all of its properties and everything related to it. I've tried the simple query, but it is not performant at all, and the EC2 instance where the DB is hosted quickly runs out of memory: `MATCH p=(j:Journal)-[*0..]-(n) RETURN p`

I managed to write a query using UNION:

```
MATCH p=(j:Journal)<-[:BELONGS_TO]-(at:ArticleType) RETURN p
UNION
MATCH p=(j:Journal)<-[:OWNS]-(jo:JournalOwner) RETURN p
UNION
MATCH p=(j:Journal)<-[:BELONGS_TO]-(s:Section) RETURN p
UNION
MATCH p=(j:Journal)-[:ACCEPTS]->(fc:FileCategory) RETURN p
UNION
MATCH p=(j:Journal)-[:CHARGED_BY]->(a:APC) RETURN p
UNION
MATCH p=(j:Journal)-[:ACCEPTS]->(sft:SupportedFileType) RETURN p
UNION
MATCH p=(j:Journal)<-[:BELONGS_TO|:CHILD_OF*..]-(c:Classification) RETURN p
SKIP 0 LIMIT 100
```

The query works fine and its performance is not bad at all. The only problem I'm finding is with the LIMIT: I've been googling around, and I've seen that post-processing of queries with UNION is not yet supported.

The referenced GitHub issue is not yet resolved, so post-processing of UNION is not yet possible: github link
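(As a side note: if the database can be upgraded, Neo4j 4.0 introduced `CALL { }` subqueries, which allow post-UNION processing by running the UNION inside the subquery and applying `SKIP`/`LIMIT` outside it. A minimal sketch, assuming Neo4j 4.0+, with only two of the branches shown:)

```cypher
// Sketch, assuming Neo4j 4.0+ where CALL { } subqueries exist.
// The UNION runs inside the subquery, so SKIP/LIMIT applies to the
// combined result rather than to a single branch.
CALL {
  MATCH p=(j:Journal)<-[:BELONGS_TO]-(at:ArticleType) RETURN p
  UNION
  MATCH p=(j:Journal)<-[:OWNS]-(jo:JournalOwner) RETURN p
}
RETURN p
SKIP 0 LIMIT 100
```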

Logically, the first thing I tried when I came across this issue was to put the pagination on each individual query, but this had some weird behaviour that didn't make much sense to me.

So I tried to write the query without using UNION, and came up with this:

```
MATCH (j:Journal)
WITH j LIMIT 10
MATCH pa=(j)<-[:BELONGS_TO]-(a:ArticleType)
MATCH po=(j)<-[:OWNS]-(o:JournalOwner)
MATCH ps=(j)<-[:BELONGS_TO]-(s:Section)
MATCH pf=(j)-[:ACCEPTS]->(f:FileCategory)
MATCH pc=(j)-[:CHARGED_BY]->(apc:APC)
MATCH pt=(j)-[:ACCEPTS]->(sft:SupportedFileType)
MATCH pl=(j)<-[:BELONGS_TO|:CHILD_OF*..]-(c:Classification)
RETURN pa, po, ps, pf, pc, pt, pl
```

This query, however, breaks my DB. I feel like I'm missing something essential about writing Cypher queries...

I've also looked into COLLECT and UNWIND in this Neo4j blog post, but couldn't really make sense of it.

How can I paginate my query without removing the UNIONs? Or is there another way of writing the query so that pagination is applied at the Journal level without hurting performance?

--- EDIT ---

Here is the execution plan for my second query

[execution plan image]

Are you able to use APOC? If so, this answer can help: stackoverflow.com/questions/41448935/… - Gabor Szarnyas
@GaborSzarnyas I was trying to stay away from APOC; I wanted to achieve this using plain Cypher, but if it's the only workaround I will look into using it. - Juanpe
@GaborSzarnyas I have installed APOC and I'm playing around with queries. I've followed the answer to the question you supplied, but it has the same behaviour as my first query: the limit is applied to the paths instead of to the Journal nodes. How can I apply a limit to the Journal nodes instead of to the paths? - Juanpe
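(One way to do that with APOC is to apply the `LIMIT` to the Journal nodes first, and only then expand each journal's subgraph. A sketch, assuming the `apoc.path.subgraphAll` procedure from the APOC plugin is available; the `maxLevel` value here is illustrative:)

```cypher
// Limit the Journal nodes first, then expand each one's subgraph,
// so the LIMIT applies to journals rather than to paths.
MATCH (j:Journal)
WITH j SKIP 0 LIMIT 10
CALL apoc.path.subgraphAll(j, {maxLevel: 2})
YIELD nodes, relationships
RETURN j, nodes, relationships
```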

1 Answer


You really don't need UNION for this. When you approach it using UNION, you get all the related nodes for every :Journal node, and only AFTER you've made all those expansions from every :Journal node do you limit your result set. That is a ton of work whose results are then mostly discarded by your LIMIT.

Your second query looks like the more correct approach, matching on :Journal nodes with a LIMIT, and only then matching on the related nodes to prepare the data for return.

You said that the second query breaks your DB. Can you run a PROFILE on the query (or an EXPLAIN, if the query never finishes execution), expand all elements of the plan, and add it to your description?

Also, if you leave out the final MATCH to :Classification, does the query behave correctly?

It would also help to know if you really need the paths returned, or if it's enough to just return the connected nodes.

EDIT

If you want each :Journal and all its connected data on a single row, you need to either be using COLLECT() after each match, or using pattern comprehension so the result is already in a collection.

This will also cut down on unnecessary work. Your initial match (after the LIMIT) generated 31k rows, so all subsequent matches executed 31k times. If you use collect() or pattern comprehension, you'll keep the cardinality down to your initial 10 and prevent redundant matches.

Something like this, if you only want collected paths returned:

```
MATCH (j:Journal)
WITH j LIMIT 10
WITH j,
  [pa = (j)<-[:BELONGS_TO]-(a:ArticleType) | pa] AS pa,
  [po = (j)<-[:OWNS]-(o:JournalOwner) | po] AS po,
  [ps = (j)<-[:BELONGS_TO]-(s:Section) | ps] AS ps,
  [pf = (j)-[:ACCEPTS]->(f:FileCategory) | pf] AS pf,
  [pc = (j)-[:CHARGED_BY]->(apc:APC) | pc] AS pc,
  [pt = (j)-[:ACCEPTS]->(sft:SupportedFileType) | pt] AS pt,
  [pl = (j)<-[:BELONGS_TO|:CHILD_OF*..]-(c:Classification) | pl] AS pl
RETURN pa, po, ps, pf, pc, pt, pl
```
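For completeness, the collect()-after-each-match alternative mentioned above might look like this (a sketch showing three of the relationship types; OPTIONAL MATCH is used so a journal missing one relationship type isn't filtered out of the results):

```cypher
// Collecting after each OPTIONAL MATCH resets cardinality to the
// 10 journals before the next expansion runs.
MATCH (j:Journal)
WITH j LIMIT 10
OPTIONAL MATCH pa=(j)<-[:BELONGS_TO]-(a:ArticleType)
WITH j, collect(pa) AS pa
OPTIONAL MATCH po=(j)<-[:OWNS]-(o:JournalOwner)
WITH j, pa, collect(po) AS po
OPTIONAL MATCH pl=(j)<-[:BELONGS_TO|:CHILD_OF*..]-(c:Classification)
WITH j, pa, po, collect(pl) AS pl
RETURN pa, po, pl
```

The remaining relationship types (Section, FileCategory, APC, SupportedFileType) would follow the same match-then-collect pattern.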