Given the following graph from this question: Cypher 2 not using schema index with OR operator:
CREATE
(:Application {Name: "Test Application", Aliases: ["Test", "App", "TestProject"]}),
(:Application {Name: "Another Application", Aliases: ["A-App", "XYZ", "XYProject"]}),
(:Application {Name: "Database X", Aliases: ["DB-App", "DB", "DB-Project"]}),
(:System {Name: "Server1", Application: "TestProject"}),
(:System {Name: "Server2", Application: "Test Application"}),
(:System {Name: "Server3", Application: "another App"}),
(:System {Name: "Server4", Application: "Some Database"}),
(:System {Name: "Server5", Application: "App"}),
(:System {Name: "Server6", Application: "App XY"}),
(:System {Name: "Server7", Application: "App DB"}),
(:System {Name: "Server8", Application: "Test"}),
(:System {Name: "Server9", Application: "TestProject"}),
(:System {Name: "Server10", Application: "test"}),
(:System {Name: "Server11", Application: "App XY"});
CREATE INDEX ON :Application(Name);
CREATE INDEX ON :Application(Aliases);
CREATE INDEX ON :System(Application);
The production database has the same structure, but with 900 Application nodes and 200,000 System nodes.
I added a new alias (e.g. "Test MiniApp") to one of the applications (it will eventually match ~27,000 additional System nodes in the production database) and ran the following query:
MATCH (a:Application { Name: "Test Application"})
WITH a
MATCH (s:System)
WHERE s.Application IN (a.Aliases + a.Name)
AND NOT (a)-[:InstalledOn]->(s)
CREATE UNIQUE (a)-[:InstalledOn]->(s)
This query uses the schema index on the production database (verified with PROFILE) but simply runs too long, about 5 minutes. I wonder why it takes so long to create a relationship for ~27k nodes that are found via an index.
Neo4j 2.1.6 runs with default settings on a Linux system (SLES 11) with 96 GB RAM.
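To make the matching rule in the query concrete, here is a minimal sketch in plain Python (not Cypher, and not touching Neo4j) that applies the same exact, case-sensitive membership test as `s.Application IN (a.Aliases + a.Name)` to the sample data from the CREATE statement. The helper name `matching_systems` is hypothetical and exists only for illustration; it says nothing about how Neo4j executes the query or why it is slow.

```python
# Sample data copied from the CREATE statement in the question.
applications = [
    {"Name": "Test Application", "Aliases": ["Test", "App", "TestProject"]},
    {"Name": "Another Application", "Aliases": ["A-App", "XYZ", "XYProject"]},
    {"Name": "Database X", "Aliases": ["DB-App", "DB", "DB-Project"]},
]

systems = [
    {"Name": "Server1", "Application": "TestProject"},
    {"Name": "Server2", "Application": "Test Application"},
    {"Name": "Server3", "Application": "another App"},
    {"Name": "Server4", "Application": "Some Database"},
    {"Name": "Server5", "Application": "App"},
    {"Name": "Server6", "Application": "App XY"},
    {"Name": "Server7", "Application": "App DB"},
    {"Name": "Server8", "Application": "Test"},
    {"Name": "Server9", "Application": "TestProject"},
    {"Name": "Server10", "Application": "test"},
    {"Name": "Server11", "Application": "App XY"},
]


def matching_systems(app, all_systems):
    """Hypothetical helper: names of systems whose Application value equals
    the application's Name or one of its Aliases (exact, case-sensitive)."""
    candidates = set(app["Aliases"] + [app["Name"]])
    return [s["Name"] for s in all_systems if s["Application"] in candidates]


app = next(a for a in applications if a["Name"] == "Test Application")
matched = matching_systems(app, systems)
print(matched)
# ['Server1', 'Server2', 'Server5', 'Server8', 'Server9']
```

Note that Server10 ("test") is not matched because the comparison is case-sensitive, and values such as "App XY" do not match because only whole-string equality counts.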
EDIT
The above query matches just a single Application node and is only executed when an application is renamed and/or an alias is added or removed. Since both entities come from external systems at any time, I cannot rely only on the case where a new system is related directly to its application, because the application may not exist at import time. So when someone adds a new alias, etc., to an application, I need to find all matching systems and create that relationship.
:Application to :System as needed to capture that relationship (where s.Application IN (a.Aliases + Name)) - checking a property against a long list of possibilities across 200k System nodes seems like something you don't want to recompute every time, even with an index. Maybe you don't even need the Application property on :System? – FrobberOfBits