4
votes

I have a property A on my nodes that holds an array of string values:

n.A=["ABC","XYZ","123","ABC"]

During merges I frequently will write code similar to n.A = n.A + "New Value". The problem I've been running into is that I end up with duplicate values in my arrays; not insurmountable but I'd like to avoid it.

  • How can I write a cypher query that will remove all duplicate values from the array A? Some duplicates have already been inserted at this point, and I'd like to clean them up.
  • When adding new values to existing arrays how can I make sure I only save a copy of the array with distinct values? (may end up being the exact same logic as used to solve my first question)
3

3 Answers

3
votes

Combined some of the information on UNWIND with other troubleshooting and came up with the following Cypher query for removing duplicates from existing array properties.

match (n)
unwind n.system as x
with distinct x, n
with collect(x) as set, n
set n.system = set
3
votes

The query to add a non-duplicate value can be done efficiently (in this example, I assume that the id and newValue parameters are provided):

OPTIONAL MATCH (n {id: {id}})
WHERE NONE(x IN n.A WHERE x = {newValue})
SET n.A = n.A + {newValue};

This query does not create a temporary array, and will only alter the n.A array if it does not already contain the {newValue} string.

[EDITED]

If you want to (a) create the n node if it does not already exist, and (b) append {newValue} to n.A only if {newValue} is not already in n.A, this should work:

OPTIONAL MATCH (n { id: {id} })
FOREACH (x IN (
  CASE WHEN n IS NULL THEN [1] ELSE [] END ) | 
  CREATE ({ id: {id}, A: [{newValue}]}))
WITH n, CASE WHEN EXISTS(n.A) THEN n.A ELSE [] END AS nA
WHERE NONE (x IN nA WHERE x = {newValue})
SET n.A = nA + {newValue};

If the OPTIONAL MATCH fails, then the FOREACH clause will create a new node node (with the {id} and an array containing {newValue}), and the following SET clause will do nothing because n would be NULL.

If the OPTIONAL MATCH succeeds, then the FOREACH clause will do nothing, and the following SET clause will append {newValue} to n.A iff that value does not already exist in n.A. If the SET should be performed, but the existing node did not already have the n.A property, then the query would concatenate an empty array to {newValue} (thus generating an array containing just {newValue}) and set that as the n.A value.

2
votes

Once you've cleaned up existing duplicates, you can use this when adding new values:

match (n)
set n.A = filter (x in n.A where x<>"newValue") + "newValue"