When importing data into the graph from a database, or from any format in which relationships are stored as key columns, I often need to create edges from those keys, which are already properties on the vertices.

How can I go through all vertices and create edges from the foreign keys (FKs) that I have already ingested into the graph?

This needs to be doable programmatically, because I have a lot of data that requires this step. I'm currently using Gremlin.Net because most of the code I use is already C#.

Example: Imagine I have ingested some customers

g.addV('customer').property('id', c_id).property('product', product_id)

And some products

g.addV('product').property('id', product_id)

I want to create edges like: customer[bought-> product]. How can I use the ids to create edges? I can't seem to reference a property in the context of its vertex.

I want to do something like:

g.V().hasLabel('customer').as('c').addE('bought').to(g.V(c.product))

Obviously I cannot do c.product, and any solution using loops is sadly out of the question, since Cosmos Graph does not support them.

So far I've been resorting to looping in C#, but that does not scale even for my sample data.

1 Answer

There might be a nicer way to do this, but I'll offer this:

gremlin> g = TinkerGraph.open().traversal()
==>graphtraversalsource[tinkergraph[vertices:0 edges:0], standard]
gremlin> g.addV('customer').property('id', 321).property('productBought', 123)
==>v[0]
gremlin> g.addV('product').property('id', 123)
==>v[3]
gremlin> g.addV('customer').property('id', 987).property('productBought', 789)
==>v[5]
gremlin> g.addV('product').property('id', 789)
==>v[8]
gremlin> g.V().hasLabel('customer').as('c').
......1>   V().hasLabel('product').as('p').
......2>   where('p', eq('c')).
......3>     by('id').
......4>     by('productBought').
......5>   select('p').
......6>   addE('buys').from('c').to('p')
==>e[10][0-buys->3]
==>e[11][5-buys->8]

The approach above is loosely based on "traversal induced values", which the Apache TinkerPop recipes describe in more detail.

I've seen a lot of questions like this lately, where people want to do joins without edges (i.e. joins on vertex property values). That's not a place where graph queries shine, and for most implementations of Gremlin, likely CosmosDB as well, this will be an expensive operation depending on how much data you have.
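
To give a rough feel for why a property join can get expensive, here is a small plain-Python sketch (not Gremlin, and not how any particular graph engine is implemented). It assumes the same sample data as the console session above; the function names `nested_join` and `indexed_join` are made up for illustration. A naive join compares every customer with every product, while a lookup table keyed by product id needs one lookup per customer.

```python
# Sample data mirroring the console session above (hypothetical in-memory rows).
customers = [
    {"id": 321, "productBought": 123},
    {"id": 987, "productBought": 789},
]
products = [{"id": 123}, {"id": 789}]


def nested_join(customers, products):
    """Compare every customer with every product: C * P comparisons."""
    comparisons = 0
    pairs = []
    for c in customers:
        for p in products:
            comparisons += 1
            if p["id"] == c["productBought"]:
                pairs.append((c["id"], p["id"]))
    return pairs, comparisons


def indexed_join(customers, products):
    """Build a set of product ids once, then do one lookup per customer."""
    product_ids = {p["id"] for p in products}
    return [
        (c["id"], c["productBought"])
        for c in customers
        if c["productBought"] in product_ids
    ]


pairs, comparisons = nested_join(customers, products)
print(pairs, comparisons)
print(indexed_join(customers, products))
```

Both functions return the same customer/product pairs; only the amount of work differs, and that gap grows with the product of the two vertex counts when no index on the join property is available.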

Edges are best created at the point where the relationship is known. So, if you knew at load time that "productBought" existed, it should not have been loaded as a "productBought" property key but as an edge to a "product" vertex. Making these kinds of choices up front in your schema design will save a lot of difficulty later.
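
As a minimal sketch of that idea, the following Python snippet builds one Gremlin edge statement per source row while the data is being loaded, instead of storing the key as a property and joining later. Everything here is an assumption for illustration: the `customers` rows, the `edge_statement` helper, and the use of a custom 'id' property as in the example above. The `to(g.V()...)` form mirrors the attempt in the question; some providers may require an anonymous `__.V()` traversal there instead. The snippet only produces query strings; how they are submitted depends on your driver.

```python
# Hypothetical source rows, e.g. exported from a relational table.
customers = [
    {"id": 321, "productBought": 123},
    {"id": 987, "productBought": 789},
]


def edge_statement(customer_id, product_id):
    """Build one Gremlin statement linking a customer to a product.

    Both vertices are looked up by the 'id' property used in the example.
    int() rejects non-numeric input so arbitrary text cannot end up in the query.
    """
    return (
        f"g.V().has('customer', 'id', {int(customer_id)})"
        f".addE('buys')"
        f".to(g.V().has('product', 'id', {int(product_id)}))"
    )


# One statement per row, generated while the foreign key is still at hand.
statements = [edge_statement(c["id"], c["productBought"]) for c in customers]

for s in statements:
    print(s)
```

Each statement touches only two vertices looked up by key, so the cost stays proportional to the number of rows rather than to the number of customer/product combinations.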