2
votes

I have a graph where the nodes are US cities and edges are the cost to travel between cities. I have data about the cost(edges) coming in constantly and need a quick way to insert the edges. Here's what I want to do: Let's say the current incoming data is

"New York, New York; Los Angeles, California; 1000"

  • Case 1(no edge exists between NYC and LA): Create edge with cost of 1000
  • Case 2a(edge exists but cost is higher than 1000): Replace cost with 1000
  • Case 2b(edge exists and cost is lower than 1000): Do nothing

Currently, my cypher query looks like this:

MERGE (a:City{name:"New York, New York"})-[r:TO]->(b:City{name:"Los Angeles"})  
SET r.price = CASE WHEN (NOT exists(r.price) OR r.price>1000)THEN 1000 ELSE r.price END

This took ~100ms to finish on my computer and is too slow for my application. Is there a faster way to do it?

1

1 Answers

0
votes

For one your query isn't going to behave as you intend if the relationship doesn't exist.

MERGE will first attempt to MATCH the entire pattern, and if there is no match (there won't be if the relationship doesn't exist) then it will CREATE the entire pattern. This means it will create a duplicate New York and Los Angeles node with those properties and create the relationship between them, leaving your original New York and Los Angeles nodes alone (and not connecting them).

To avoid this, MATCH on both nodes (unless there's a chance they might not exist yet, in which case use MERGE instead), then MERGE on the relationship between them. You can also use the ON MATCH and ON CREATE clauses to divide up at least one of your cases, eliminating the need to test for a non-existent price:

WITH {basePrice} as basePrice
MATCH (a:City{name:{fromCity}})
MATCH (b:City{name:{toCity}})
MERGE (a)-[r:TO]->(b)
ON CREATE SET r.price = basePrice
ON MATCH SET r.price = CASE WHEN r.price > basePrice THEN basePrice ELSE r.price END

And for the sake of performance, you should have indexes or unique constraints (whichever is appropriate) on the label/property combination.

After the index or constraint is up, if you're still seeing performance problems, you may want to PROFILE your query and add the plan (with nodes expanded) to your description, that can often provide clues on other means of optimization.

EDIT: Dave Bennett's suggestion to parameterize city names and the base price is a good one, altered the query accordingly.