My graph contains an undirected topological network data and my goal is to build a query that finds all sub networks that apply to specific networking rules, create vertex for each subnetwork and connect those who have path between them. The intention is to minimize the big graph by replacing each subnetwork-subgraph in one vertex. To find all subnetworks I took 'connected components' query from gremlin recopies And added my networking rules to the stopping conditions. But right now I'm having hard time connecting this sub network to each other.
I'm providing here sample graph script (using different networking domain) that contains PC, Routers and other equipment nodes. Query should find all LANs by grouping connected PCs, and for each LAN return other LAN ids that have path to it.
Direction has no meaning in this graph, and path between subgraphs may contain many types of nodes (routers, equipment etc.).
My GraphDB is OrientDB.
Result should look like this:
==>LAN 1: {pcs: [1, 2, 3], connected LANs: [LAN 2, LAN 3]}
==>LAN 2: {pcs: [4, 5, 6], connected LANs: [LAN 1]}
==>LAN 3: {pcs: [8, 7], connected LANs: [LAN 1]}
This is query's first part (finding all sub networks):
g.V().hasLabel('PC').emit(cyclicPath().or().not(both())).
repeat(__.where(without('a')).store('a').both()).until(or(cyclicPath(), hasLabel('Router'))).
group().by(path().unfold().limit(1)).
by(path().local(unfold().filter(hasLabel('PC')).values('id')).unfold().dedup().fold()).unfold()
My questions are:
- I can identify connectivity between sub networks by traversing some arbitrary node from every sub network till I reach node that exist on other sub network. How do I write it in gremlin?
- How can I create new graph out of this query results?
- What is the performance of this type of query in a big graph, say 30M nodes?
Create graph script:
g = TinkerGraph.open().traversal()
g.addV("PC").property("id","1").as("pc1").
addV("PC").property("id","2").as("pc2").
addV("PC").property("id","3").as("pc3").
addV("PC").property("id","4").as("pc4").
addV("PC").property("id","5").as("pc5").
addV("PC").property("id","6").as("pc6").
addV("PC").property("id","7").as("pc7").
addV("PC").property("id","8").as("pc8").
addV("Router").property("id","9").as("router1").
addV("Router").property("id","10").as("router2").
addV("Equipment").property("id","11").as("eq1").
addV("Equipment").property("id","12").as("eq2").
addV("Equipment").property("id","13").as("eq3").
addV("Equipment").property("id","14").as("eq4").
addE("Line").from("pc1").to("pc2").
addE("Line").from("pc1").to("eq3").
addE("Line").from("pc2").to("pc3").
addE("Line").from("pc3").to("eq1").
addE("Line").from("pc3").to("eq3").
addE("Line").from("pc4").to("pc5").
addE("Line").from("pc4").to("pc6").
addE("Line").from("pc5").to("pc6").
addE("Line").from("pc7").to("pc8")
addE("Line").from("router1").to("pc7").
addE("Line").from("router1").to("pc8").
addE("Line").from("router1").to("eq2").
addE("Line").from("router2").to("eq4").
addE("Line").from("eq1").to("router1").
addE("Line").from("eq3").to("router2").
addE("Line").from("eq4").to("pc4").
iterate()