I'm looking at performing graph aggregate (groupBy, groupCount) queries across edges on a TitanGraph DB over two data sets:

  1. About 10,000 nodes and about 1 million edges

  2. About 200,000 nodes and about 1 billion edges

Does anyone know at what point I need to put in the effort to install Faunus to be able to run this type of Gremlin query within, say, 1 minute?

At 10000 nodes and 1M edges, you shouldn't have problems with plain Gremlin (no Faunus). See the code below where I generate a graph of approximately that size using Furnace:

gremlin> g = TitanFactory.open('/tmp/titan/generated')
==>titangraph[local:/tmp/titan/generated]
gremlin> import com.tinkerpop.furnace.generators.*
==>import com.tinkerpop.gremlin.*
==>import com.tinkerpop.gremlin.java.*
...
==>import com.tinkerpop.furnace.generators.*
gremlin> for (int i=0;i<10000;i++) g.addVertex(i)
==>null
gremlin> r = new java.util.Random()
==>java.util.Random@137f0ced
gremlin> generator = new DistributionGenerator("knows", { it.setProperty("weight", r.nextInt(100)) } as EdgeAnnotator)
==>com.tinkerpop.furnace.generators.DistributionGenerator@111a3ce4
gremlin> generator.setOutDistribution(new PowerLawDistribution(2.1))
==>null
gremlin> generator.generate(g,1000000)
==>1042671
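For readers without Furnace on the classpath, here is a library-free Java sketch of the same idea: power-law out-degrees, uniformly random in-vertices, and a random `weight` in [0, 100) on each edge. It is not Furnace's `DistributionGenerator` implementation. The class and method names (`PowerLawEdges`, `generate`) are hypothetical, treating 2.1 as the power-law exponent is an assumption about what `PowerLawDistribution(2.1)` means, and rescaling the raw degrees to hit a target edge count is a simplification made only for this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/**
 * Hypothetical, library-free sketch of a power-law edge generator.
 * This is NOT Furnace's DistributionGenerator; it only mimics its intent.
 */
public class PowerLawEdges {

    /** A directed edge carrying an integer "weight" property. */
    public static final class Edge {
        public final int out;
        public final int in;
        public final int weight;

        Edge(int out, int in, int weight) {
            this.out = out;
            this.in = in;
            this.weight = weight;
        }
    }

    /**
     * Draws x >= 1 from a continuous power law p(x) ~ x^(-alpha) by inverse-transform
     * sampling, floors it to an integer, and caps it at maxDegree to tame the heavy tail.
     */
    static int rawOutDegree(Random r, double alpha, int maxDegree) {
        double u = r.nextDouble();                          // u in [0, 1)
        double x = Math.pow(1.0 - u, -1.0 / (alpha - 1.0)); // x in [1, +inf)
        return (int) Math.min(Math.floor(x), maxDegree);
    }

    /**
     * Generates roughly targetEdges edges over vertices 0..numVertices-1.
     * Raw power-law out-degrees are rescaled so their sum approximates targetEdges
     * (an assumption of this sketch); in-vertices are chosen uniformly at random and
     * each edge gets a weight in [0, 100), like r.nextInt(100) in the Gremlin session.
     */
    public static List<Edge> generate(int numVertices, double alpha, int targetEdges, long seed) {
        Random r = new Random(seed);
        int[] raw = new int[numVertices];
        long rawSum = 0;
        for (int v = 0; v < numVertices; v++) {
            raw[v] = rawOutDegree(r, alpha, numVertices);
            rawSum += raw[v];                               // every raw degree is >= 1, so rawSum > 0
        }
        double scale = (double) targetEdges / rawSum;

        List<Edge> edges = new ArrayList<>(targetEdges);
        for (int v = 0; v < numVertices; v++) {
            long degree = Math.round(raw[v] * scale);       // rounding keeps the total near targetEdges
            for (long k = 0; k < degree; k++) {
                edges.add(new Edge(v, r.nextInt(numVertices), r.nextInt(100)));
            }
        }
        return edges;
    }

    public static void main(String[] args) {
        // Same order of magnitude as the Gremlin session: 10,000 vertices, ~1M edges.
        List<Edge> edges = generate(10_000, 2.1, 1_000_000, 42L);
        System.out.println("generated " + edges.size() + " edges");
    }
}
```

Like Furnace's `generate(g, 1000000)` returning 1042671, the result is only approximately the requested size: each vertex's rounded degree can be off by at most half an edge.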

Recalling your earlier post on aggregates, I execute essentially the same query on this data set.

gremlin> start=System.currentTimeMillis();m=g.E.groupBy{it.getProperty("weight")}{it}.cap.next();System.currentTimeMillis()-start
==>1415
gremlin> m.size()
==>100

As you can see, this traversal takes about 1.4 seconds (about 500 ms on TinkerGraph, which is entirely in memory).
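To make explicit what the traversal computes, here is a minimal Java sketch that groups an in-memory edge list by weight, plus a groupCount-style variant that only counts. It is an illustration only: the `Edge` class and the method names are hypothetical stand-ins, not Titan or Blueprints API, and timings from it are not comparable to the Titan numbers above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

/**
 * Hypothetical in-memory sketch of what g.E.groupBy{weight}{it}.cap.next() builds,
 * plus a groupCount-style variant. No Titan/Blueprints classes are used.
 */
public class WeightGroupBy {

    /** Stand-in for a graph edge: only the "weight" property matters here. */
    public static final class Edge {
        public final int weight;

        Edge(int weight) {
            this.weight = weight;
        }
    }

    /** Creates n edges with weights uniform in [0, 100), mirroring r.nextInt(100). */
    public static List<Edge> randomEdges(int n, long seed) {
        Random r = new Random(seed);
        List<Edge> edges = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            edges.add(new Edge(r.nextInt(100)));
        }
        return edges;
    }

    /** groupBy: weight -> every edge with that weight (one full pass over the edges). */
    public static Map<Integer, List<Edge>> groupBy(List<Edge> edges) {
        Map<Integer, List<Edge>> groups = new HashMap<>();
        for (Edge e : edges) {
            groups.computeIfAbsent(e.weight, w -> new ArrayList<>()).add(e);
        }
        return groups;
    }

    /** groupCount-style: weight -> number of edges with that weight (also one full pass). */
    public static Map<Integer, Long> groupCount(List<Edge> edges) {
        Map<Integer, Long> counts = new HashMap<>();
        for (Edge e : edges) {
            counts.merge(e.weight, 1L, Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Edge> edges = randomEdges(1_000_000, 42L);
        long start = System.currentTimeMillis();
        Map<Integer, List<Edge>> m = groupBy(edges);
        long elapsed = System.currentTimeMillis() - start;
        System.out.println(m.size() + " groups in " + elapsed + " ms");
    }
}
```

Both methods touch every edge exactly once, so their cost grows linearly with the edge count. That linear scan is what matters when moving from 1M to 1B edges, whatever the storage backend.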

At 1B edges you will likely need Faunus. I don't think you could iterate over all those edges in under a minute, even if you could somehow fit them all in memory. Note that even with Faunus you might not get one-minute query/answer times; you will need to experiment a bit.
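A rough back-of-the-envelope extrapolation from the Titan timing in the session, assuming the cost stays linear in the number of edges (optimistic once the data no longer fits in memory):

```latex
\[
\frac{1415\ \text{ms}}{1{,}042{,}671\ \text{edges}} \approx 1.36\ \mu\text{s/edge},
\qquad
10^{9}\ \text{edges} \times 1.36\ \mu\text{s/edge} \approx 1360\ \text{s} \approx 23\ \text{min}.
\]
```

Even under that optimistic assumption, a single-threaded pass over 1B edges lands far above the one-minute target.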