2
votes

My graph is set up like this:

(:User)-[:LIVES_IN_COUNTRY]->(:Country)
(:User)-[:LIVES_IN_STATE]->(:State)
(:User)-[:LIVES_IN_CITY]->(:City)

I'm trying to build a dynamic cypher query to check all relationships but return results from only those that are valid

For e.g if I only know Country name and state name, in that case I only pass valid names for country and state and "Not Available" for city. I have tried something like this:

Match(n:User)
Match(n)-[:LIVES_IN_COUNTRY]->(:Country {name:'Canada'})
Match(n)-[:LIVES_IN_STATE]->(:State {name:'Ontario'}) 
Match(n)-[:LIVES_IN_CITY]->(:City {name:'Not Available'}) 

But, in this case, nothing is returned. How can I get users for Canada and Ontario in this case when City name is unavailable?

Perhaps it is possible using seperate cypher queries but I was wondering if it possible to handle within one cypher query

I have tried to use optional but it returns all users in all countries, states and so on. Not sure if it is possible with optional or something wrong in query

Match(n:User) 
OPTIONAL Match (n)-[:LIVES_IN_COUNTRY]->(:Country{name:'Canada'}) 
OPTIONAL Match (n)-[:LIVES_IN_STATE]->(:State {name:'Ontario'}) 
OPTIONAL Match (n)-[:LIVES_IN_CITY]->(:City {name:'Not Available'}) 
Return n.username,n.country, n.state, n.city
1

1 Answers

3
votes

Try using OPTIONAL MATCH instead of MATCH for country, state, and city. This will handle non-matches without eliminating the row from your results.

EDIT

I missed that this was a general query, not one where you were querying from a specific user.

I think the best approach would be to leave off matches entirely when the data isn't available (if you're assembling the query from Java or another language).

But if you can't do that, there is another approach we can use.

First, I would change your relationship types to just :LIVES_IN, that will let us do some easier matching later.

The approach for this one is to first use OPTIONAL MATCHes on country, state, and city, then collect these into a single list (removing null nodes where the data isn't available), then perform matching to a person where that same person has a :LIVES_IN relationship to all nodes in the collection.

WITH {country:'Canada', state:'Ontario', city:'Not Available'} as params
OPTIONAL MATCH (c:Country{name:params.country})
OPTIONAL MATCH (s:State{name:params.state})
OPTIONAL MATCH (ci:City{name:params.city})
WITH FILTER(loc in [c, s, ci] WHERE loc IS NOT null) as locations
WITH locations, LAST(locations) as place
MATCH (n:User)-[:LIVES_IN]->(place)
WHERE ALL(loc in locations WHERE (n)-[:LIVES_IN]->(loc))
RETURN n

Rather than match on all users then perform the WHERE ALL, we instead narrow our potential set of users to those who live in the narrowest location, if available (city, then state, then country).

EDIT - extending the query to other nodes/relationships

A warning for the above query, it does require at least one location to be present (country, state, or city). If none of these are available, then the query will fail. If you need to account for this possibility, let me know in the comments.

As far as extending this with other nodes and different relationships, it's possible, but it does require a bit more thought to figure out the approach. It helps if several things fall into the same type, and if all things of a type are represented in the db by nodes of the same label.

In your example of sports, you would most likely pass in a list of sport names for the query, and you'd have several :Sports nodes for each sport.

A modified query might look like:

WITH {country:'Canada', state:'Ontario', city:'Not Available', sports:['Hockey', 'Karate']} as params
WITH params, [(s:Sport)-[*0]-() WHERE s.name in params.sports | s] as sports
// keep the location section of your query the same
OPTIONAL MATCH (c:Country{name:params.country})
OPTIONAL MATCH (s:State{name:params.state})
OPTIONAL MATCH (ci:City{name:params.city})
WITH sports, FILTER(loc in [c, s, ci] WHERE loc IS NOT null) as locations
WITH sports, locations, LAST(locations) as place
MATCH (n:User)-[:LIVES_IN]->(place)
WHERE ALL(loc in locations WHERE (n)-[:LIVES_IN]->(loc))
// now do additional filtering on sports played
AND ALL(sport in sports WHERE (n)-[:PLAYS]->(sport))
RETURN n

The most complicated part is the pattern comprehension (only available in Neo4j 3.1 and up) on the second line:

[(s:Sport)-[*0]-() WHERE s.name in sports | s] as sports

While it would have been easier to use (s:Sport) on its own, Cypher doesn't recognize that as a pattern, so we have to fool it by using a pattern of a 0-length relationship (the node matched is only to itself).

The reason for this is it takes your list parameter of sports names and gives you back a list of associated :Sports nodes.

If you pass in an empty list for sports, this pattern comprehension will result in an empty list, and you can continue the query.

If we had tried a different approach, UNWINDing your list into sports name rows (with the intent to match :Sports with those names), if performed on an empty list it would have resulted in no rows, which would prevent the query from executing any further.

EDIT - all fields optional

All fields being optional imposes an efficiency problem.

In my query, I match on users in some location, using the smallest geographic location possible for matching. This is helpful in that the filtering down of :Users at this level is a constant-time operation, as we don't actually have to inspect ALL users and filter, we just follow the relationship from the location to the relevant set of users. With each subsequent matching or filtering, the set of matched users continues to fall, which is good for efficiency.

But if no locations are in the collection, the match and the query will fail. If I use an OPTIONAL MATCH instead, I'll get a null, which will be useless for reuse on the rest of the pattern matches.

The only way around is to match against all :Users first, and then filter from the entire set of users, so the performance of your queries will be impacted by and slow down as the number of :User nodes grows. As long as every field is optional, with no reliable starting place to match to users, there may be no way around this.

WITH {country:'Canada', state:'Ontario', city:'Not Available', sports:['Hockey', 'Karate']} as params
WITH params, [(s:Sport)-[*0]-() WHERE s.name in params.sports | s] as sports
OPTIONAL MATCH (c:Country{name:params.country})
OPTIONAL MATCH (s:State{name:params.state})
OPTIONAL MATCH (ci:City{name:params.city})
WITH sports, FILTER(loc in [c, s, ci] WHERE loc IS NOT null) as locations
// now start matching using the set of all :Users
MATCH (n:User)
WHERE CASE WHEN SIZE(locations) <> 0 THEN
    ALL(loc in locations WHERE (n)-[:LIVES_IN]->(loc))
  ELSE true END
// now do additional filtering on sports played
AND CASE WHEN SIZE(sports) <> 0 THEN 
    ANY(sport in sports WHERE (n)-[:PLAYS]->(sport))
  ELSE true END
RETURN n

EDIT - case statements to help with optional parameters

I realized that the query I gave will fail if there are no locations, or no sports passed in (due to the ALL() function on an empty list, which will be false), so I edited the above query to use CASE statements to only perform those evaluations if the collections aren't empty.

I also changed the function for sports from ALL() to ANY(), to meet your requirement to return users who play any of the sports given, rather than users who play all of the sports given.

EDIT - Adding properties handling.

If we want to get user between age of 18 and 30, we can add further filtering

WITH {country:'Canada', state:'Ontario', city:'Not Available', sports:['Hockey', 'Karate'], ageMin:18, ageMax:25} as params
WITH params, [(s:Sport)-[*0]-() WHERE s.name in params.sports | s] as sports
OPTIONAL MATCH (c:Country{name:params.country})
OPTIONAL MATCH (s:State{name:params.state})
OPTIONAL MATCH (ci:City{name:params.city})
WITH sports, FILTER(loc in [c, s, ci] WHERE loc IS NOT null) as locations, params.ageMin as ageMin, params.ageMax as ageMax
// now start matching using the set of all :Users
MATCH (n:User)
WHERE CASE WHEN SIZE(locations) <> 0 THEN
    ALL(loc in locations WHERE (n)-[:LIVES_IN]->(loc))
  ELSE true END
// now do additional filtering on sports played
AND CASE WHEN ageMin <> 0 THEN
    n.age >= ageMin
  ELSE true END
AND CASE WHEN ageMax <> 0 THEN
    n.age <= ageMax
  ELSE true END 
AND CASE WHEN SIZE(sports) <> 0 THEN 
    ANY(sport in sports WHERE (n)-[:PLAYS]->(sport))
  ELSE true END
RETURN n

EDIT

Your original requirement was that you would be passing in valid place names or "Not Available", but let's see what we can do to be able to handle invalid place names, but still retain proper behavior with "Not Available".

To help us out, let's create an :Invalid node that we'll use to "spike" our locations to prevent any further matching.

CREATE (:Invalid)

Now we'll do an OPTIONAL MATCH on the :Invalid node and include it in the locations, but we'll add a predicate on it so it will only be matched when one of the city, country, or state OPTIONAL MATCHES fails, but the associated parameter isn't 'Not Applicable'.

In the case where you try to OPTIONAL MATCH to 'Alberta' for the state but the node doesn't exist, then the variable for the state node will be null, and since the associated parameter isn't 'Not Applicable', the :Invalid node will be present in the location list, and will prevent any person from matching (since if the location list is non-empty, persons must have :LIVES_IN relationships to all elements of the list)

WITH {country:'Canada', state:'Ontario', city:'Not Available', sports:['Hockey', 'Karate'], ageMin:18, ageMax:25} as params
WITH params, [(s:Sport)-[*0]-() WHERE s.name in params.sports | s] as sports
OPTIONAL MATCH (c:Country{name:params.country})
OPTIONAL MATCH (s:State{name:params.state})
OPTIONAL MATCH (ci:City{name:params.city})
OPTIONAL MATCH (inv:Invalid)
 WHERE (c is null AND params.country <> 'Not Available') OR
       (s is null AND params.state <> 'Not Available') OR
       (ci is null AND params.city <> 'Not Available')
WITH sports, FILTER(loc in [c, s, ci, inv] WHERE loc IS NOT null) as locations, params.ageMin as ageMin, params.ageMax as ageMax
// now start matching using the set of all :Users
MATCH (n:User)
WHERE CASE WHEN SIZE(locations) <> 0 THEN
    ALL(loc in locations WHERE (n)-[:LIVES_IN]->(loc))
  ELSE true END
// now do additional filtering on sports played
AND CASE WHEN ageMin <> 0 THEN
    n.age >= ageMin
  ELSE true END
AND CASE WHEN ageMax <> 0 THEN
    n.age <= ageMax
  ELSE true END 
AND CASE WHEN SIZE(sports) <> 0 THEN 
    ANY(sport in sports WHERE (n)-[:PLAYS]->(sport))
  ELSE true END
RETURN n