
I'm trying to model a large knowledge graph (using v3.1.1).

My actual graph contains only two types of nodes (Topic, Properties) and a single relationship type (HAS_PROPERTIES).

The graph has about 85M nodes (47M :Topic; the rest are :Properties).

I'm trying to find the most connected :Topic node. I'm using the following query:

MATCH (n:Topic)-[r]-()
RETURN n, count(DISTINCT r) AS num
ORDER BY num

This query, like almost any unfiltered query I try that counts relationships and orders by that count, is extremely slow: after more than 10 minutes there is still no response.

Am I missing indexes, or is there a better syntax?

Is there any chance I can execute this query in a reasonable time?

What does your config look like (heap/pagecache)? How much memory do you have, and what kind of disk? - Michael Hunger
:Properties nodes sound scary; in a property graph you store the properties with the nodes :) - Michael Hunger
You should also limit your results if you are running this in the browser or shell. Otherwise you get 47M records returned, which blows up your browser or terminal (or even the pre-processing in each). - Michael Hunger
Here is the config I'm using: dbms.memory.heap.max_size=6G, dbms.memory.heap.initial_size=3G, dbms.memory.pagecache.size=6g. The disk is an SSD. By the way, the description I've given is not exactly the actual model I'm using; I just tried to represent something extremely simple :) and I appreciate your interest :) - user638564
How big is your graph on disk? - Michael Hunger

1 Answer


Use this:

MATCH (n:Topic)
RETURN n, size( (n)--() ) AS num
ORDER BY num DESC
LIMIT 100

This reads the degree directly from each node instead of expanding and counting every relationship, and the LIMIT keeps the result set small.
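
If only one relationship type or direction matters, the same pattern can be narrowed. This is a minimal sketch, assuming HAS_PROPERTIES points from :Topic to :Properties (the question does not state the direction, so adjust the arrow if your model differs):

```cypher
MATCH (n:Topic)
// Assumption: relationships run (:Topic)-[:HAS_PROPERTIES]->(:Properties)
RETURN n, size( (n)-[:HAS_PROPERTIES]->() ) AS num
ORDER BY num DESC
LIMIT 100
```

Since the graph has a single relationship type, this should give the same counts as the untyped version whenever every relationship leaves the :Topic node.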