3
votes

My graph contains an undirected topological network data and my goal is to build a query that finds all sub networks that apply to specific networking rules, create vertex for each subnetwork and connect those who have path between them. The intention is to minimize the big graph by replacing each subnetwork-subgraph in one vertex. To find all subnetworks I took 'connected components' query from gremlin recopies And added my networking rules to the stopping conditions. But right now I'm having hard time connecting this sub network to each other.

I'm providing here sample graph script (using different networking domain) that contains PC, Routers and other equipment nodes. Query should find all LANs by grouping connected PCs, and for each LAN return other LAN ids that have path to it.

Direction has no meaning in this graph, and path between subgraphs may contain many types of nodes (routers, equipment etc.).
My GraphDB is OrientDB.

Networking Graph Image

Result should look like this:

==>LAN 1: {pcs: [1, 2, 3], connected LANs: [LAN 2, LAN 3]}  
==>LAN 2: {pcs: [4, 5, 6], connected LANs: [LAN 1]}  
==>LAN 3: {pcs: [8, 7], connected LANs: [LAN 1]}  

This is query's first part (finding all sub networks):

g.V().hasLabel('PC').emit(cyclicPath().or().not(both())).
 repeat(__.where(without('a')).store('a').both()).until(or(cyclicPath(), hasLabel('Router'))).
 group().by(path().unfold().limit(1)).
 by(path().local(unfold().filter(hasLabel('PC')).values('id')).unfold().dedup().fold()).unfold()

My questions are:

  1. I can identify connectivity between sub networks by traversing some arbitrary node from every sub network till I reach node that exist on other sub network. How do I write it in gremlin?
  2. How can I create new graph out of this query results?
  3. What is the performance of this type of query in a big graph, say 30M nodes?

Create graph script:

g = TinkerGraph.open().traversal()
g.addV("PC").property("id","1").as("pc1").
addV("PC").property("id","2").as("pc2").
addV("PC").property("id","3").as("pc3").
addV("PC").property("id","4").as("pc4").
addV("PC").property("id","5").as("pc5").
addV("PC").property("id","6").as("pc6").
addV("PC").property("id","7").as("pc7").
addV("PC").property("id","8").as("pc8").
addV("Router").property("id","9").as("router1").
addV("Router").property("id","10").as("router2").
addV("Equipment").property("id","11").as("eq1").
addV("Equipment").property("id","12").as("eq2").
addV("Equipment").property("id","13").as("eq3").
addV("Equipment").property("id","14").as("eq4").
addE("Line").from("pc1").to("pc2").
addE("Line").from("pc1").to("eq3").
addE("Line").from("pc2").to("pc3").
addE("Line").from("pc3").to("eq1").
addE("Line").from("pc3").to("eq3").
addE("Line").from("pc4").to("pc5").
addE("Line").from("pc4").to("pc6").
addE("Line").from("pc5").to("pc6").
addE("Line").from("pc7").to("pc8")
addE("Line").from("router1").to("pc7").
addE("Line").from("router1").to("pc8").
addE("Line").from("router1").to("eq2").
addE("Line").from("router2").to("eq4").
addE("Line").from("eq1").to("router1").
addE("Line").from("eq3").to("router2").
addE("Line").from("eq4").to("pc4").
iterate()
1
You can use this workspace to demonstrate your problem: gremlify.com/tgpdyk6n3ro - gremlify

1 Answers

1
votes

This isn't a great answer because I think that I have to jump to your last question and ignore the first two of the three:

What is the performance of this type of query in a big graph, say 30M nodes?

If you modified the "Connected Component" recipe found here then I assume you read further down about the general expense of this sort of query for both OLTP and OLAP. I'd imagine that for 30M vertices you should be looking at OLAP-based processing (as opposed to that script you presented above). I suppose you might be able to do it with TinkerGraph/GraphComputer on a large enough machine with a lot of memory, but this might just be a job for SparkGraphComputer as suggested toward the end of the recipe.

I think that your first two questions seem to depend on your approach to and success around the third question and that those initial questions might get more focused or even change a bit once you get that far. Perhaps it would be best to try to get your OLAP approach to "connected components" settled and then come back with some more specific questions.