1
votes

Suppose you have an graph that you've made from an edgelist, and there are a couple hundred vertices. What I'm looking to do is to identify the initial set of vertices from which all subsequent ones are related to (like a mother, or family tree).

This is a data set that represents 'ice islands', large tabular sheets of ice that break off from glaciers and float around the sea. The initial fractures represent the root nodes. The subsequent vertices are re-observations of these pieces that are either smaller (melted islands), or pieces that have broken off (so the source vertex has a network of two edges and goes on to form two new vertices).

Is there a piece of code or a function that can do this easily for me? If I add labels to my plot it's impossible to read. Most of the methods of manipulating root nodes that I've been able to find involve small sample data sets where you just arbitrarily name things in the graph, or use the vertex's actual name. My data is stuff coming from a huge established CSV with super long number-character names. It makes it difficult.

I'm also super new to coding and R is a nightmare for me to use. Please be gentle and use simple examples! I can attach my code if you think it helps, all my data is being pulled out from a server and I don't know if it will be very clear from your perspective.

Thanks.

2
When asking for help, you should provide a reproducible example with sample input data and the desired result for that input. That way possible solutions can be tested and verified. You need to be specific about what our programming question is. If you have general questions about data analysis or algorithms, you'd probably be better off asking over at Data ScienceMrFlick

2 Answers

3
votes

For any node, n, you can find the number of edges into the node using neighbors(g, n, mode="in"). A node is an initial vertex if it does not have any edges coming into it. So you can just test all of the nodes for how many edges enter the node and select those for which the answer is zero.

Here is a simple example graph:

library(igraph)
set.seed(2017)
g = erdos.renyi.game(12, 20, type="gnm", directed=TRUE)
plot(g)

Example Graph

Now we can find the root nodes.

which(sapply(sapply(V(g), 
    function(x) neighbors(g,x, mode="in")), length) == 0)
[1] 1 2

This says that nodes 1 and 2 are sources.

Since you say that you are a beginner, let me explain this just a little.

function(x) neighbors(g,x, mode="in") is a function that takes a node as an argument and uses neighbors to return a list of nodes y that have a link from y to x (the parents of x).

sapply(V(g), function(x) neighbors(g,x, mode="in")) applies that function to all of the nodes in the graph, and so gives a list of the parents for every node. We are interested in the nodes that have no parents so we want the nodes for which the length of this list is zero. Thus, we apply length to the list of parents and check which lengths are zero.

1
votes

enter code hereThe solution of G5W fails if the graph contains a self-loop. An alternative approach:

library(igraph)
set.seed(2017)
g = erdos.renyi.game(12, 20, type="gnm", directed=TRUE)
rts <- V(g)[degree(g, mode=c("in"), loops = F) == 0]        # find roots, that is, in-degree is zero
paste(c("Roots: ", rts), collapse=" ")
plot(g, layout=layout_with_sugiyama(g)$layout)              # plot graph in layers

g2 <- simplify(g, remove.loops = T, remove.multiple = T)    # reduce to simple graph without loops
stopifnot(max(clusters(g2, mode=c("strong"))$csize) == 1)   # stop when cycles

E(g2)$weight <- -1                                          # shortest path is longest negative
dis <- (-distances(g2, mode="in") )                         # matrix VxV of depth of layers, depth from top to bottom
lay = as.matrix(apply(dis, 1, max))                         # maximum of distances in successors
as.matrix(apply(-distances(g2, mode="out"), 1, max))        # or reverse from bottom to top

plot(g, layout=layout_with_sugiyama(g, layer=lay)$layout)