
I want to implement a unique ID property on all nodes in my database, but I need to apply it to existing data. I'm using Ruby to generate the IDs and then running the Cypher query from there. I want to avoid running one query to find the nodes missing the property and then another to set the property on each node individually, since that would require total_nodes + 1 queries.

Initially, I was thinking I could do something like this:

MATCH (n:`#{label}`) WHERE NOT HAS(n.my_id) SET n.my_id = '#{gen_method}' RETURN DISTINCT(true)

Of course, this wouldn't work: Ruby would call gen_method once while building the query string, and Neo4j would then set every matched node's ID to that one value.
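To make the problem concrete, here is a minimal sketch in plain Ruby with no database involved. The gen_method below is a stand-in built on SecureRandom.uuid and the label 'Person' is hypothetical; both are assumptions for illustration, since the real generator and label are not shown in the question.

```ruby
require 'securerandom'

# Stand-in for the real ID generator (assumption: any method that
# returns a fresh string on each call would behave the same way).
def gen_method
  SecureRandom.uuid
end

label = 'Person' # hypothetical label

# Interpolation happens in Ruby while the string is being built,
# so gen_method runs exactly once, before Neo4j ever sees the query.
query = "MATCH (n:`#{label}`) WHERE NOT HAS(n.my_id) SET n.my_id = '#{gen_method}' RETURN DISTINCT(true)"

# Count the UUID literals that ended up in the query text.
uuid_pattern = /[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}/
ids_in_query = query.scan(uuid_pattern)
puts ids_in_query.length
```

The query text carries a single literal ID, which is why every matched node would receive the same value.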

I'm thinking now that it might be best to generate a large number of IDs in Ruby first, then include them in the Cypher query. I'd like to loop through the matched nodes and set the missing property on each one to the ID at the corresponding index in the array. The logic should go something like this:

MATCH NODES WHERE GIVEN PROPERTY IS NULL, LIMIT TO 10,000
CREATE A COLLECTION OF THOSE NODES
SET NEW UUIDS ARRAY (provided by Ruby) AS "IDS_ARRAY"
FOR EACH NODE IN COLLECTION
  SET GIVEN PROPERTY VALUE = CORRESPONDING INDEX POSITION IN "IDS_ARRAY"
RETURN COUNT OF NODES WHERE GIVEN PROPERTY IS NULL

Based on the return value, the Ruby code would know how many more times to run the query. Cypher has a FOREACH loop, but how do I do this, especially if my unique_ids array starts out as a string in the Cypher query?

unique_ids = ['first', 'second', 'third', 'etc']
i = 0
for node in matched_nodes
  node.my_id_property = unique_ids[i]
  i += 1
end

Is it even possible? Is there a different way of handling this that will work?


1 Answer


Got it! Found http://java.dzone.com/articles/neo4j-cypher-creating, which provided a method for doing this, and http://jexp.de/blog/2014/03/quickly-create-a-100k-neo4j-graph-data-model-with-cypher-only/ pointed out the range function. My first draft of the Ruby code that performs this looks like this:

def add_ids_to(model)
  label = model.mapped_label_name
  property = model.primary_key

  loop do
    # Count the nodes that still lack the property.
    total = Neo4j::Session.query("MATCH (n:`#{label}`) WHERE NOT has(n.#{property}) RETURN COUNT(n) as ids").first.ids
    return if total == 0

    # Cap each batch at 900 nodes.
    to_set = [total, 900].min
    # Generate the IDs in Ruby, quoted for use as Cypher string literals.
    new_ids = Array.new(to_set) { "'#{new_id_for(model)}'" }
    # LIMIT goes before COLLECT so only this batch's nodes are collected;
    # a LIMIT after RETURN would only cap the returned rows.
    Neo4j::Session.query("MATCH (n:`#{label}`) WHERE NOT has(n.#{property})
      WITH n LIMIT #{to_set}
      WITH COLLECT(n) as nodes, [#{new_ids.join(',')}] as ids
      FOREACH(i in range(0,#{to_set - 1})|
        FOREACH(node in [nodes[i]]|
          SET node.#{property} = ids[i]))
      RETURN distinct(true)")
  end
end

I think that's all pretty readable. Regarding the queries themselves, I'm using Neo4j.rb and neo4j-core, but I'm skipping the Cypher DSL in this case. I'm limiting each query to a maximum of 900 nodes because that was the highest I could reliably go without running out of memory. Tune that number to your JVM heap size.
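If it helps to inspect the generated Cypher before running it, the string building can be pulled out of the database call. The following is only a sketch: build_id_query and batch_size are hypothetical helper names, 'Person' and 'uuid' are placeholder values, and placing the LIMIT before COLLECT is an assumption of this sketch about keeping the collected list down to one batch.

```ruby
# Hypothetical helper: builds the batch query text without touching the database.
# ids is an array of plain strings generated in Ruby.
def build_id_query(label, property, ids)
  # Quote each ID so it becomes a Cypher string literal.
  quoted = ids.map { |id| "'#{id}'" }.join(',')
  "MATCH (n:`#{label}`) WHERE NOT has(n.#{property}) " \
  "WITH n LIMIT #{ids.size} " \
  "WITH COLLECT(n) AS nodes, [#{quoted}] AS ids " \
  "FOREACH (i IN range(0, #{ids.size - 1}) | " \
  "FOREACH (node IN [nodes[i]] | SET node.#{property} = ids[i])) " \
  "RETURN DISTINCT(true)"
end

# Hypothetical helper: how many IDs to send in the next round,
# given how many nodes still lack the property.
def batch_size(remaining, max = 900)
  [remaining, max].min
end

query = build_id_query('Person', 'uuid', %w[a b c])
puts query
```

The inner FOREACH over the one-element list [nodes[i]] is the same trick as in the method above: it binds a single node to a name so that SET can be applied to it at index i.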