0
votes

I'm trying to build a database that works along the lines of http://static.echonest.com/BoilTheFrog/ where you enter 2 artists and you get a list of tracks that seamlessly transitions between them.

I have built a Neo4j database with around 100k inter-related Artists. The artists and their relationships with each other were taken from the Spotify Web API. Each artist has a popularity value (between 0 and 100). I want to find a path that is at least 20 connections long between two artists where all the artists on the path have a minimum popularity. Here's what i've done so far, and it makes sense in my head, but it just runs infinitely and never finishes.

MATCH (start:Artist {name: 'Ed Sheeran'}), (end:Artist {name: 'The Strokes'})
MATCH path = shortestPath((start)-[:`HAS SIMILAR ARTIST`*..20]-(end))
WHERE ALL(x in nodes(path) WHERE x.popularity > 20) 
AND LENGTH(path) = 20
RETURN path
LIMIT 1

My guess is that MATCH path =.. is finding the same path every time and it is then applying the WHERE filter, so it never succeeds. I have seen approaches that filter based on the relationship itself, but the properties I want to filter are on the nodes themselves.

If I instead use

MATCH (start:Artist {name: 'Ed Sheeran'}), (end:Artist {name: 'The Strokes'})
MATCH path = shortestPath((start)-[:`HAS SIMILAR ARTIST`*..20]-(end))
WHERE LENGTH(path) = 20
RETURN path
LIMIT 1

It succeeds, but some of the connections are extremely obscure, so I was hoping to strengthen the relationships with the popularity requirement.

1
For this sort of workload, Cypher usually does not provide the ideal solution. You are better off using the APOC library, which provides Dijkstra's algorithm.Gabor Szarnyas

1 Answers

1
votes

Since you only want paths of exactly length 20, you should specify 20 as the lower bound (as well as the upper bound) for the variable-length path pattern. That should eliminate (or greatly reduce the number of) repeated traversals of shorter paths.

MATCH (start:Artist {name: 'Ed Sheeran'}), (end:Artist {name: 'The Strokes'})
MATCH path = shortestPath((start)-[:`HAS SIMILAR ARTIST`*20..20]-(end))
WHERE ALL(x in nodes(path) WHERE x.popularity > 20) 
RETURN path
LIMIT 1;