2
votes

I’m new to Neo4j and graph theory and I’m trying to figure out if I can use Neo4j to solve a problem I have. Please correct me if I’m using the wrong words to describe stuff. Since I’m new to the subject I haven’t really wrapped my head around what to call everything.

I think the easiest way to describe my problem is with a lot of pictures. Let’s say you have two disjoint subgraphs that look like this.

enter image description here

From the subgraphs above I want to get a list of subgraphs that fulfills one of two criteria.

Criteria 1. If a node has a unique relationship to another node, the nodes and relationship should be returned as a subgraph.

Criteria 2. If the relations are not unique, I'd like the node with the most relationships to be returned, as a subgraph with its relationships and related nodes.

If other nodes come in tie in criteria 2, I want all subgraphs to be returned.

Or put in the context of this graph,

Give me the people who have unique games, and if there are other people having the same games, give me back the person with the most games. If they come in tie, return all people who come in tie. Or actually, return the whole subgraph, not only the person.

To clarify what I am after here is a picture that describes the result I want to get. The ordering of the result is not important.

enter image description here

Disjoint subgraph A, because of Criteria 1, Andrew is the only person who has Bubble Bobble.

Disjoint subgraph B, because of Criteria 1, Johan is the only person who has Puzzle Bobble 1.

Disjoint subgraph C, because of Criteria 2, Julia since she has the most games.

Disjoint subgraph D, because of Criteria 2, Anna since she comes in tie with Julia having the most games.

Worth noting is that Johan's relationship to Puzzle Bobble 2 is not returned because it's not unique and he has not the most games.

Is this a problem you could solve with only Neo4j and is it a good idea?

If you could solve it how would you do it in Cypher?

Create script:

  CREATE  (p1:Person {name:"Johan"}),
      (p2:Person {name:"Julia"}),
      (p3:Person {name:"Anna"}),
      (p4:Person {name:"Andrew"}),

      (v1:Videogame {name:"Puzzle Bobble 1"}),
      (v2:Videogame {name:"Puzzle Bobble 2"}),
      (v3:Videogame {name:"Puzzle Bobble 3"}),
      (v4:Videogame {name:"Puzzle Bobble 4"}),
      (v5:Videogame {name:"Bubble Bobble"}),

       (p1)-[:HAS]->(v1),
       (p1)-[:HAS]->(v2),

       (p2)-[:HAS]->(v2),
       (p2)-[:HAS]->(v3),
       (p2)-[:HAS]->(v4),

       (p3)-[:HAS]->(v2),
       (p3)-[:HAS]->(v3),
       (p3)-[:HAS]->(v4),

       (p4)-[:HAS]->(v5)
1
A person can have a unique game and share a cluster with someone who has more games–who do you want returned in that case? It's Christmas, Andrew gets Puzzle Bobble 1 and Anna gets Arkanoid. Now Andrew and Anna share a cluster, Anna has the most games in the cluster, but Andrew still has a unique game–who do you want returned?jjaderberg
As @jjaderberg indicates, your problem statement is flawed. Not only can a single "cluster" (which seems to be another name for "disjoint subgraph") have many games played by multiple people, but it can also have many "unique games" played by only a single person each. Just because a game has a single player does not mean that that player does not also play other games. So, please clarify what you are really trying to do.cybersam
Thank you all for the feedback, I will update the post! @cybersam Thanks for pointing out that it's called a "disjoint subgraph" and not a "cluster". It really helps out to know what things are called when you try to find information about them.qeshi

1 Answers

1
votes

I feel like this solution might not be quite what you're looking for, but it could be a good start:

MATCH (game:Videogame)<-[:HAS]-(owner:Person)
OPTIONAL MATCH owner-[:HAS]->(other_game:Videogame)
WITH game, owner, count(other_game) AS other_game_count
ORDER BY other_game_count DESC
RETURN game, collect(owner)[0]

Here the query:

  • Finds all of the games and their owners (games without owners will not be matched)
  • Does an OPTIONAL MATCH against any other games those owners might own (by doing an optional match we're saying that it's OK if they own zero)
  • Pass through each game/owner pair along with a count of the number of other games owned by that owner, sorting so that those with the most games come first
  • RETURN the first owner for each game (the ORDER is preserved when doing the collect)