
I'm looking to transform my in-memory plain old C# classes into a Neo4j database. (Class types are node types and derive from, nodes have a List for "linkedTo")

Rather than write a long series of Cypher queries to create nodes and properties and then link them with relationships, I am wondering whether there is anything cleverer I can do.

For example, can I serialize them to JSON and then import that directly into Neo4j? I understand that the .unwind function in the C# Neo4j driver may be of help here, but I do not see good examples of its use, and relationships then need to be matched and created separately.

Is there an optimal method for doing this? I expect to have around 50k nodes.

You're going to end up massaging your classes in some way to get them into the DB. Are you tied to the Neo4j.Driver or can you use Neo4jClient? - Charlotte Skardon
Happy to use either route, just looking for a clean one. - GreyCloud

1 Answer


OK, first off, I'm using Neo4jClient for this and I've added an INDEX to the DB using:

CREATE INDEX ON :MyClass(Id)

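This is important for the way this works: every MERGE below looks a node up by Id, and without the index each lookup would have to scan all MyClass nodes, so the index makes inserting the data a lot quicker.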

I have a class:

public class MyClass
{
    public int Id {get;set;}
    public string AValue {get;set;}
    public ICollection<int> LinkToIds {get;set;} = new List<int>();
}

It has an Id, which I'll be keying off, and a string property, just because. The LinkToIds property is a collection of the Ids that this instance is linked to.

I'm using this method to randomly generate my MyClass instances:

private static ICollection<MyClass> GenerateMyClass(int number = 50000){
    var output = new List<MyClass>();

    Random r = new Random((int) DateTime.Now.Ticks);

    for (int i = 0; i < number; i++)
    {
        var mc = new MyClass { Id = i, AValue = $"Value_{i}" };
        var numberOfLinks = r.Next(1, 10); // upper bound is exclusive: 1 to 9 attempts
        for(int j = 0; j < numberOfLinks; j++){
            var link = r.Next(0, number); // exclusive upper bound, so any Id from 0 to number-1
            if(!mc.LinkToIds.Contains(link) && link != mc.Id)
                mc.LinkToIds.Add(link);
        }
        output.Add(mc);
    }

    return output;
}

Then I use another method to split this into smaller 'batches':

private static ICollection<ICollection<MyClass>> GetBatches(ICollection<MyClass> toBatch, int sizeOfBatch)
{
    var output = new List<ICollection<MyClass>>();

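    if(toBatch.Count == 0) return output; // nothing to batch, and avoids dividing by zero
    if(sizeOfBatch > toBatch.Count) sizeOfBatch = toBatch.Count;

    // Round up so a final, smaller batch is not silently dropped.
    var numBatches = (toBatch.Count + sizeOfBatch - 1) / sizeOfBatch;
    for(int i = 0; i < numBatches; i++){
        // Skip/Take need: using System.Linq;
        output.Add(toBatch.Skip(i * sizeOfBatch).Take(sizeOfBatch).ToList());
    }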

    return output;
}

Then to actually add into the DB:

void Main()
{
    var gc = new GraphClient(new Uri("http://localhost:7474/db/data"), "neo4j", "neo");
    gc.Connect();

    var batches = GetBatches(GenerateMyClass(), 5000);

    var now = DateTime.Now;
    foreach (var batch in batches)
    {
        DateTime bstart = DateTime.Now;
        var query = gc.Cypher
            .Unwind(batch, "node")
            .Merge($"(n:{nameof(MyClass)} {{Id: node.Id}})")
            .Set("n = node")
            .With("n, node")
            .Unwind("node.LinkToIds", "linkTo")
            .Merge($"(n1:{nameof(MyClass)} {{Id: linkTo}})")
            .With("n, n1")
            .Merge("(n)-[:LINKED_TO]->(n1)");

        query.ExecuteWithoutResults();
        Console.WriteLine($"Batch took: {(DateTime.Now - bstart).TotalMilliseconds} ms");
    }
    Console.WriteLine($"Total took: {(DateTime.Now - now).TotalMilliseconds} ms");
}

On my aging machine (5-6 years old now), it takes about 20s to insert 50,000 nodes and around 500,000 relationships.

Let's break down that important call to Neo4j above. The key thing is, as you rightly suggest, UNWIND: here I UNWIND a batch and give each 'row' in that collection the identifier node. I can then access its properties (node.Id) and use them to MERGE a node. After the first UNWIND, I always SET the newly created node (n) to node, so all the properties (in this case just AValue) are set.
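For anyone on the plain Neo4j.Driver rather than Neo4jClient, the fluent chain above corresponds roughly to the Cypher text below. This is a hand-written sketch, not the exact string Neo4jClient emits: the parameter name batch and the $ parameter syntax are assumptions (older servers use {batch} instead), and BatchCypher is a hypothetical holder class.

```csharp
public static class BatchCypher
{
    // Hand-written approximation of the query built by the fluent calls.
    // "$batch" is an assumed parameter name holding a list of maps,
    // one map per MyClass instance (Id, AValue, LinkToIds).
    public const string Text = @"
UNWIND $batch AS node
MERGE (n:MyClass {Id: node.Id})
SET n = node
WITH n, node
UNWIND node.LinkToIds AS linkTo
MERGE (n1:MyClass {Id: linkTo})
WITH n, n1
MERGE (n)-[:LINKED_TO]->(n1)";
}
```

Each line of the text maps onto one fluent call, so the two forms can be read side by side.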

So up to the first With we have a new node created with a MyClass label and all its properties set. This does include the LinkToIds array, which, if you were a tidy person, you might want to remove. I'll leave that to you.
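One way to avoid storing the LinkToIds array on each node is to never send it as a node property in the first place, by splitting each batch into node rows and link rows before the UNWIND. The sketch below is an alternative under stated assumptions, not what the code above does: NodeRow, LinkRow and BatchShaper are hypothetical names, and MyClass is repeated only so the block compiles on its own.

```csharp
using System.Collections.Generic;
using System.Linq;

public class MyClass
{
    public int Id { get; set; }
    public string AValue { get; set; }
    public ICollection<int> LinkToIds { get; set; } = new List<int>();
}

// Hypothetical row shapes: one carries node properties, the other one link.
public class NodeRow { public int Id { get; set; } public string AValue { get; set; } }
public class LinkRow { public int From { get; set; } public int To { get; set; } }

public static class BatchShaper
{
    // Node properties only, so "SET n = node" would not copy the Id array.
    public static List<NodeRow> ToNodeRows(IEnumerable<MyClass> batch) =>
        batch.Select(m => new NodeRow { Id = m.Id, AValue = m.AValue }).ToList();

    // One row per (source Id, target Id) pair.
    public static List<LinkRow> ToLinkRows(IEnumerable<MyClass> batch) =>
        batch.SelectMany(m => m.LinkToIds.Select(to => new LinkRow { From = m.Id, To = to }))
             .ToList();
}
```

The node rows could then feed the same UNWIND, MERGE and SET steps, and the link rows a second query that MERGEs both end nodes by Id and then the relationship. For nodes that have already been written, running MATCH (n:MyClass) REMOVE n.LinkToIds once after the import should tidy them up.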

In the second UNWIND we take advantage of the fact that the LinkToIds property is an Array, and use that to create a 'placeholder' node that will be filled later, then we create a relationship between the n and the n1 placeholder. NB - if we've already created a node with the same id as n1 we'll use that node, and when we get to the same Id during the first UNWIND we'll set all the properties of the placeholder.
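The placeholder behaviour can be mimicked with a small in-memory model. This is purely an illustration of the MERGE-then-SET idea, not how Neo4j works internally; ToyGraph and its members are hypothetical.

```csharp
using System.Collections.Generic;

public static class ToyGraph
{
    // Id -> properties; stands in for :MyClass nodes keyed by Id.
    public static readonly Dictionary<int, Dictionary<string, object>> Nodes =
        new Dictionary<int, Dictionary<string, object>>();

    // (From, To) pairs; stands in for LINKED_TO relationships.
    public static readonly HashSet<(int From, int To)> Links =
        new HashSet<(int From, int To)>();

    // Like MERGE (n:MyClass {Id: id}): reuse the node if it exists,
    // otherwise create it with nothing but its Id (a placeholder).
    public static Dictionary<string, object> MergeNode(int id)
    {
        if (!Nodes.TryGetValue(id, out var props))
        {
            props = new Dictionary<string, object> { ["Id"] = id };
            Nodes[id] = props;
        }
        return props;
    }

    // Like the whole query for one row: merge the node, set its properties,
    // then merge each target (possibly as a placeholder) and the link to it.
    public static void Import(int id, string aValue, IEnumerable<int> linkToIds)
    {
        var n = MergeNode(id);
        n["AValue"] = aValue;
        foreach (var linkTo in linkToIds)
        {
            MergeNode(linkTo);
            Links.Add((id, linkTo)); // the set ignores duplicates, much like MERGE
        }
    }
}
```

Importing node 1 with a link to 7 leaves node 7 holding only its Id; importing node 7 later fills in AValue on that same entry rather than creating a second node.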

It's not the easiest thing to explain, but the best things to look at are MERGE and UNWIND in the Neo4j documentation.