
I created a toy application that adds vertices and edges to a Cosmos DB graph collection and prints the Request Units (RUs) consumed by each request.

This is the output:

Running g.V().drop()
RUs: 34.88 //clean slate

Running g.V('1').addE('knows').to(g.V('1'))
RUs: 1.97  // no edge is actually created, since no vertices exist yet

Running g.addV('person').property('id', '1').property('tenantId', '1')
RUs: 5.71

Running g.V('1').addE('knows').to(g.V('1'))
RUs: 11.4  //1st edge, 1 vertex present

Running g.addV('person').property('id', '2').property('tenantId', '1')
RUs: 5.71 //constant vertex creation cost

Running g.V('1').addE('knows').to(g.V('1'))
RUs: 11.76 //2nd edge, 2 vertices + 1 edge present - cost goes up

Running g.addV('person').property('id', '3').property('tenantId', '2')
RUs: 5.71 //constant vertex creation cost - this vertex is on a different partition

Running g.V('1').addE('knows').to(g.V('1'))
RUs: 12.1 //3rd edge, 3 vertices + 2 edges present - cost goes up

Running g.V('1').addE('knows').to(g.V('1'))
RUs: 12.28 // 4th edge, 3 vertices + 3 edges present - cost goes up

Running g.V('1').addE('knows').to(g.V('1'))
RUs: 12.46 // 5th edge, 3 vertices + 4 edges present - cost goes up

The cost of adding a vertex is constant, but the cost of adding an edge increases with the number of vertices and edges already in the graph.
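To put numbers on that trend, here is a small sketch that takes the RU charges straight from the log above and computes how much each successful edge creation costs relative to the previous one. Nothing here talks to Cosmos DB; it is plain arithmetic on the logged values.

```python
# RU charges copied from the log above for the five requests that created an edge.
edge_rus = [11.4, 11.76, 12.1, 12.28, 12.46]
# RU charges for the three vertex creations.
vertex_rus = [5.71, 5.71, 5.71]

# Change in RU charge from each edge creation to the next, rounded to 2 decimals
# to hide floating-point noise.
deltas = [round(b - a, 2) for a, b in zip(edge_rus, edge_rus[1:])]
# Increase from the 1st to the 5th edge.
total = round(edge_rus[-1] - edge_rus[0], 2)

print("per-edge increase:", deltas)           # [0.36, 0.34, 0.18, 0.18]
print("1st to 5th edge increase:", total)     # 1.06
print("vertex cost constant:", len(set(vertex_rus)) == 1)  # True
```

The per-edge increase is small (under 0.4 RU), but it never goes back down, while the vertex cost does not move at all.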

Any idea why this is happening?

EDIT: I tried the same thing on a collection WITHOUT partitions and whaddaya know?
All creation costs are constant!
I'd really like to understand what inter-partition communication is going on if the collection is partitioned.


2 Answers


For a partitioned collection, you need to modify the query slightly so that both vertex lookups include the partition key:

g.V('1').has('tenantId', '1').addE('knows').to(g.V('1').has('tenantId', '1'))

Looking up a vertex by 'id' alone is inefficient in a partitioned collection, because without the partition key the vertex has to be searched for in all the partitions.
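If the app builds its queries as script strings, as the question's app appears to do, a minimal sketch of a helper that always pins both endpoints to their partition key could look like this. `scoped_add_edge` is a hypothetical helper, not part of any driver, and the default key name `tenantId` is taken from the question. The quoting is deliberately naive and simply rejects characters that would break out of a string literal.

```python
def scoped_add_edge(src_id: str, src_pk: str, dst_id: str, dst_pk: str,
                    label: str, pk_name: str = "tenantId") -> str:
    """Build a Gremlin script that adds an edge while pinning BOTH vertex
    lookups to a partition key value (hypothetical helper, not a library API)."""

    def lit(value: str) -> str:
        # Naive quoting: refuse values that could escape the string literal.
        if "'" in value or "\\" in value:
            raise ValueError(f"unsupported character in {value!r}")
        return f"'{value}'"

    # Each endpoint gets its own partition key filter, so an edge between
    # vertices in different partitions is also expressible.
    src = f"g.V({lit(src_id)}).has({lit(pk_name)}, {lit(src_pk)})"
    dst = f"g.V({lit(dst_id)}).has({lit(pk_name)}, {lit(dst_pk)})"
    return f"{src}.addE({lit(label)}).to({dst})"


print(scoped_add_edge("1", "1", "1", "1", "knows"))
# g.V('1').has('tenantId', '1').addE('knows').to(g.V('1').has('tenantId', '1'))
```

Centralizing the query in one function makes it harder to forget the partition key filter on one of the two lookups.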


What Gremlin queries are you using to add vertices and edges?

Note that adding a vertex is a single write, while adding an edge can involve two reads (one for each endpoint vertex) and a write.

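Writes typically cost about the same regardless of how much data is already stored, but the cost of reads can grow with the amount of data in the system, depending on how you read it (for example, a lookup scoped to a partition key versus one that has to search every partition).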
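The reads-versus-writes point can be illustrated with a deliberately simplified toy model. `ToyStore` and `add_edge_cost` are hypothetical and do NOT reproduce Cosmos DB internals or its RU formula; the model only counts how many stored items each operation touches. A keyed read (partition key + id) touches one item, while an id-only read scans everything stored.

```python
class ToyStore:
    """Hypothetical in-memory store keyed by (partition key, id).
    NOT a model of Cosmos DB internals; it only counts items touched."""

    def __init__(self):
        self.items = {}  # (partition_key, item_id) -> item

    def write(self, pk, item_id, item):
        self.items[(pk, item_id)] = item
        return 1  # a write touches one item, however large the store is

    def point_read(self, pk, item_id):
        # Partition key + id: a direct keyed lookup touching one item.
        return self.items.get((pk, item_id)), 1

    def scan_read(self, item_id):
        # Id only: without the partition key, every stored item is examined.
        matches = [item for (_, iid), item in self.items.items() if iid == item_id]
        return matches, len(self.items)


def add_edge_cost(store, src, dst, pk, scoped):
    """Items touched by a toy 'add edge': read both endpoints, then write the edge."""
    read = (lambda v: store.point_read(pk, v)) if scoped else store.scan_read
    _, r1 = read(src)
    _, r2 = read(dst)
    edge_id = f"e{len(store.items)}"  # unique because the store only grows
    w = store.write(pk, edge_id, {"label": "knows", "out": src, "in": dst})
    return r1 + r2 + w


for scoped in (True, False):
    store = ToyStore()
    store.write("1", "1", {"label": "person"})  # vertex '1' in partition '1'
    costs = [add_edge_cost(store, "1", "1", "1", scoped) for _ in range(5)]
    print("scoped" if scoped else "unscoped", costs)
# scoped [3, 3, 3, 3, 3]
# unscoped [3, 5, 7, 9, 11]
```

In this model only the unscoped variant's cost grows with the amount of stored data; the write part stays constant either way. Real RU charges depend on Cosmos DB's own indexing and storage, which this sketch does not attempt to predict.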