I am learning the basics of Neo4j and am looking at the following credit card fraud example: https://linkurio.us/stolen-credit-cards-and-fraud-detection-with-neo4j. The Cypher query that finds the stores where all compromised users shopped is
MATCH (victim:person)-[r:HAS_BOUGHT_AT]->(merchant)
WHERE r.status = "Disputed"
MATCH (victim)-[t:HAS_BOUGHT_AT]->(othermerchants)
WHERE t.status = "Undisputed" AND t.time < r.time
WITH victim, othermerchants, t ORDER BY t.time DESC
RETURN DISTINCT othermerchants.name as suspicious_store, count(DISTINCT t) as count, collect(DISTINCT victim.name) as victims
ORDER BY count DESC
However, when the number of users increases (let's say to millions of users), this query may become slow, since the initial MATCH has to traverse all nodes labeled person. Is it possible to speed up the query by assigning properties to nodes instead of to relationships (the transactions)? I tried removing the "status" property from the relationships and adding it to the nodes (users, not merchants), so in my case person has one additional property, 'status'. However, when I run the query with the constraint WHERE victim.status = "Disputed"
the query doesn't return anything. I assume I did a lot of things wrong, but I would appreciate comments. For example
MATCH (victim:person)-[r:HAS_BOUGHT_AT]->(merchant)
WHERE victim.status = "Disputed"
returns the correct number of disputed transactions. The same holds when I separately query the number of undisputed transactions. However, when the two are merged into one query, they yield an empty set.
If my approach is mistaken, how can I speed up queries over a large number of nodes (that is, avoid traversing all nodes in the first step)? I will be working with a data set with similar properties, but with around 100 million users, so I would like to index users on additional properties.
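
For concreteness, the kind of index I have in mind would look something like the sketch below. It assumes the older Neo4j schema index syntax (2.x/3.x; newer versions use the form CREATE INDEX FOR (p:person) ON (p.status)) and assumes that status is stored on person nodes, as in my modified data. I have not verified that the planner actually uses this index for the full query above.

```cypher
// Sketch only: schema index on the assumed "status" property of person nodes
// (Neo4j 2.x/3.x syntax).
CREATE INDEX ON :person(status);

// Assumed usage: look up person nodes by status,
// which an index on :person(status) could serve without scanning every person node.
MATCH (victim:person)
WHERE victim.status = "Disputed"
RETURN count(victim);
```

Prefixing the second query with EXPLAIN should show whether it starts from an index seek or from a label scan.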