2
votes

At my Neo4j/SDN 4 application all of my Cypher queries are based on internal Neo4j IDs.

This is an issue because I can't rely on these IDs at my web application urls. Neo4j can reuse these IDs so there is a good chance that at some time in future under the same ID we can found absolutely another node.

I tried to re-implement this logic based on the following solution: Using the graph to control unique id generation but noticed a query performance degradation.

From a theoretical point of view, should a Cypher query based on the property with @Index(unique = true, primary = true)

for example:

@Index(unique = true, primary = true)
private Long uid;

entity.uid = {someId}

work with the same performance as a Cypher query which is based on internal Neo4j ID:

id(entity) = {someId} 

UPDATED

This is :schema output:

Indexes
   ON :BaseEntity(uid) ONLINE
   ON :Characteristic(lowerName) ONLINE
   ON :CharacteristicGroup(lowerName) ONLINE
   ON :Criterion(lowerName) ONLINE
   ON :CriterionGroup(lowerName) ONLINE
   ON :Decision(lowerName) ONLINE
   ON :FlagType(name) ONLINE (for uniqueness constraint)
   ON :HAS_VALUE_ON(value) ONLINE
   ON :HistoryValue(originalValue) ONLINE
   ON :Permission(code) ONLINE (for uniqueness constraint)
   ON :Role(name) ONLINE (for uniqueness constraint)
   ON :User(email) ONLINE (for uniqueness constraint)
   ON :User(username) ONLINE (for uniqueness constraint)
   ON :Value(value) ONLINE

Constraints
   ON ( flagtype:FlagType ) ASSERT flagtype.name IS UNIQUE
   ON ( permission:Permission ) ASSERT permission.code IS UNIQUE
   ON ( role:Role ) ASSERT role.name IS UNIQUE
   ON ( user:User ) ASSERT user.email IS UNIQUE
   ON ( user:User ) ASSERT user.username IS UNIQUE

As you can see I have an index on :BaseEntity(uid)

BaseEntity is a base class in my entity hierarchy, for example:

@NodeEntity
public abstract class BaseEntity {

    @GraphId
    private Long id;

    @Index(unique = false)
    private Long uid;

    private Date createDate;

    private Date updateDate;

...

}

@NodeEntity
public class Commentable extends BaseEntity {
...
}

@NodeEntity
public class Decision extends Commentable {

    private String name;

}

Will this uid index be used when I'm looking for example for (d:Decision) WHERE d.uid = {uid} ?

PROFILE resuls - internal ID vs indexed property

Query based on internal ID

PROFILE MATCH (parentD)-[:CONTAINS]->(childD:Decision) 
WHERE id(parentD) = 1474333 
MATCH (childD)-[relationshipValueRel1475199:HAS_VALUE_ON]-(filterCharacteristic1475199) 
WHERE id(filterCharacteristic1475199) = 1475199 
WITH relationshipValueRel1475199, childD 
WHERE  ([1, 19][0] <= relationshipValueRel1475199.value <=  [1, 19][1] )  
WITH childD  
MATCH (childD)-[relationshipValueRel1474358:HAS_VALUE_ON]-(filterCharacteristic1474358) 
WHERE id(filterCharacteristic1474358) = 1474358 
WITH relationshipValueRel1474358, childD 
WHERE  (ANY (id IN ['Compact'] WHERE id IN relationshipValueRel1474358.value ))  
WITH childD  
MATCH (childD)-[relationshipValueRel1475193:HAS_VALUE_ON]-(filterCharacteristic1475193) 
WHERE id(filterCharacteristic1475193) = 1475193 
WITH relationshipValueRel1475193, childD 
WHERE  (ANY (id IN ['16:9', '3:2', '4:3', '1:1'] 
WHERE id IN relationshipValueRel1475193.value ))  
WITH childD  
OPTIONAL MATCH (childD)-[vg:HAS_VOTE_ON]->(c) 
WHERE id(c) IN [1474342, 1474343, 1474340, 1474339, 1474336, 1474352, 1474353, 1474350, 1474351, 1474348, 1474346, 1474344] 
WITH childD, vg.avgVotesWeight as weight, vg.totalVotes as totalVotes 
WITH * MATCH (childD)-[ru:CREATED_BY]->(u:User)  
WITH ru, u, childD , toFloat(sum(weight)) as weight, toInt(sum(totalVotes)) as totalVotes  
ORDER BY  weight DESC 
SKIP 0 LIMIT 10 
RETURN ru, u, childD AS decision, weight, totalVotes, 
[ (parentD)<-[:DEFINED_BY]-(entity)<-[:COMMENTED_ON]-(comg:CommentGroup)-[:COMMENTED_FOR]->(childD) | {entityId: id(entity),  types: labels(entity), totalComments: toInt(comg.totalComments)} ] AS commentGroups, 
[ (parentD)<-[:DEFINED_BY]-(c1)<-[vg1:HAS_VOTE_ON]-(childD) | {criterionId: id(c1),  weight: vg1.avgVotesWeight, totalVotes: toInt(vg1.totalVotes)} ] AS weightedCriteria, 
[ (parentD)<-[:DEFINED_BY]-(ch1:Characteristic)<-[v1:HAS_VALUE_ON]-(childD)  WHERE NOT ((ch1)<-[:DEPENDS_ON]-())  | {characteristicId: id(ch1),  value: v1.value, totalHistoryValues: toInt(v1.totalHistoryValues), description: v1.description, valueType: ch1.valueType, visualMode: ch1.visualMode} ] AS valuedCharacteristics

PROFILE output:

Cypher version: CYPHER 3.1, planner: COST, runtime: INTERPRETED. 350554 total db hits in 238 ms.

enter image description here

Query based on indexed property uid

PROFILE MATCH (parentD)-[:CONTAINS]->(childD:Decision) 
WHERE parentD.uid = 61 
MATCH (childD)-[relationshipValueRel1475199:HAS_VALUE_ON]-(filterCharacteristic1475199) 
WHERE filterCharacteristic1475199.uid = 15 
WITH relationshipValueRel1475199, childD 
WHERE  ([1, 19][0] <= relationshipValueRel1475199.value <=  [1, 19][1] )  
WITH childD  
MATCH (childD)-[relationshipValueRel1474358:HAS_VALUE_ON]-(filterCharacteristic1474358) 
WHERE filterCharacteristic1474358.uid = 10 
WITH relationshipValueRel1474358, childD 
WHERE  (ANY (id IN ['Compact'] WHERE id IN relationshipValueRel1474358.value ))  
WITH childD  
MATCH (childD)-[relationshipValueRel1475193:HAS_VALUE_ON]-(filterCharacteristic1475193) 
WHERE filterCharacteristic1475193.uid = 14 
WITH relationshipValueRel1475193, childD 
WHERE  (ANY (id IN ['16:9', '3:2', '4:3', '1:1'] 
WHERE id IN relationshipValueRel1475193.value ))  
WITH childD  
OPTIONAL MATCH (childD)-[vg:HAS_VOTE_ON]->(c) 
WHERE c.uid IN [26, 27, 24, 23, 20, 36, 37, 34, 35, 32, 30, 28] 
WITH childD, vg.avgVotesWeight as weight, vg.totalVotes as totalVotes 
WITH * MATCH (childD)-[ru:CREATED_BY]->(u:User)  
WITH ru, u, childD , toFloat(sum(weight)) as weight, toInt(sum(totalVotes)) as totalVotes  
ORDER BY  weight DESC 
SKIP 0 LIMIT 10 
RETURN ru, u, childD AS decision, weight, totalVotes, 
[ (parentD)<-[:DEFINED_BY]-(entity)<-[:COMMENTED_ON]-(comg:CommentGroup)-[:COMMENTED_FOR]->(childD) | {entityId: id(entity),  types: labels(entity), totalComments: toInt(comg.totalComments)} ] AS commentGroups, 
[ (parentD)<-[:DEFINED_BY]-(c1)<-[vg1:HAS_VOTE_ON]-(childD) | {criterionId: id(c1),  weight: vg1.avgVotesWeight, totalVotes: toInt(vg1.totalVotes)} ] AS weightedCriteria, 
[ (parentD)<-[:DEFINED_BY]-(ch1:Characteristic)<-[v1:HAS_VALUE_ON]-(childD)  WHERE NOT ((ch1)<-[:DEPENDS_ON]-())  | {characteristicId: id(ch1),  value: v1.value, totalHistoryValues: toInt(v1.totalHistoryValues), description: v1.description, valueType: ch1.valueType, visualMode: ch1.visualMode} ] AS valuedCharacteristics

Cypher version: CYPHER 3.1, planner: COST, runtime: INTERPRETED. 671326 total db hits in 426 ms.

enter image description here

Is there any chance to improve the performance based on uid ?

1

1 Answers

5
votes

You are right not to use Neo4j internal ids in web urls, as they can be reused after node is deleted etc..

From performance point of view the internal id is as fast as you can get - it is actually an offset in file with node/relationship records (you could have noticed these are 2 separate id sequences, you can have node with id=z and relationship with same id=x).

Any use of an index has to be slower, because the database does index lookup first, gets the internal id and then reads the node record.

However for vast majority of the applications the difference in performance is negligible - will be likely an much smaller than network latency or general OGM overhead.

If you see a noticeable difference

  • verify the indexes exists in the database (e.g :schema in Neo4j browser)
  • turn on logging and verify your queries have the correct label (set info level for org.neo4j.ogm)
  • if the index exists and query contains right label then use PROFILE to check the query plan

UPDATED

Yes, index will be used for queries like:

MATCH (d:Decision) WHERE d.uid = {uid} ...

which should get generated by

session.load(Decision.class, uid)

if your index is primary or findByUid on DecisionRepository.

Beware that the index might not be used when the where clause appears in the middle of the query:

...
WITH x
MATCH (x)-[...]-(d) WHERE d.uid = {uid} ...

This depends on the query plan and you should use PROFILE to investigate this.