
This is a follow-up question to a recent issue I encountered when calculating graph depth. It involves tidyverse and tidygraph. After reading up on tidygraph I decided to give it a proper try, but I ran into a new problem in my workflow.

When I use the group_by() verb from dplyr to create a graph for each group, the guess_df_type() function in as_tbl_graph() from tidygraph does not do what I'm looking for, and I can't find a way to set the from and to values as intended. Here's a reproducible example:

library(tidygraph)
library(tidyverse)

tmp <- tibble(
  id_head = as.integer(c(4,4,4,4,4,4,5,5,5,5)),
  id_sec  = as.integer(c(1,1,1,2,2,2,1,1,2,2)),
  token   = as.integer(c(1,2,3,1,2,3,1,2,1,2)),
  head    = as.integer(c(2,2,2,1,1,2,2,2,2,2)),
  root    = as.integer(c(2,2,2,1,1,1,2,2,2,2))
) 
tmp %>%
  group_by(id_head, id_sec) %>% 
  as_tbl_graph()

The result is:

# A tbl_graph: 4 nodes and 10 edges
#
# An undirected multigraph with 1 component
#
# Node Data: 4 x 1 (active)
   name
  <chr>
1     4
2     5
3     1
4     2
#
# Edge Data: 10 x 5
   from    to token  head  root
  <int> <int> <dbl> <dbl> <dbl>
1     1     3     1     2     2
2     1     3     2     2     2
3     1     3     3     2     2
# ... with 7 more rows

The nodes are not taken from the token column but from both id_head and id_sec.

After looking into it further, I renamed token and head to from and to, which at least solves the first issue:

tmp %>% 
  rename(
    from = token,
    to = head
  ) %>% 
  as_tbl_graph(directed = FALSE) 

This results in:

# A tbl_graph: 3 nodes and 10 edges
#
# An undirected multigraph with 1 component
#
# Node Data: 3 x 1 (active)
   name
  <chr>
1     1
2     2
3     3
#
# Edge Data: 10 x 5
   from    to id_head id_sec  root
  <int> <int>   <int>  <int> <int>
1     1     2       4      1     2
2     2     2       4      1     2
3     2     3       4      1     2
# ... with 7 more rows

Let me describe the remaining issue. When I try to use group_by(id_head, id_sec) on the graph itself, the result is an error:

tmp %>% 
  as_tbl_graph() %>%
  group_by(id_head, id_sec)

Error in grouped_df_impl(data, unname(vars), drop) :
  Column id_head is unknown

So either way, I do not understand how to use group_by with tidygraph. Any help is very much appreciated! Thanks in advance.

Also, sorry for using igraph as a tag; it should be tidygraph, but that tag does not exist yet. tidygraph is built upon igraph and the tidyverse, though.

1 Answer


For the first question, I’m a bit unsure how your data.frame should be parsed into a graph. tidygraph contains documentation about all the graph representations it understands, and I suggest you consult it.
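
As far as I understand it, as_tbl_graph() reads a plain data.frame as an edge list: columns named from and to are used as the endpoints if they exist, otherwise the first two columns are. That would explain why id_head and id_sec ended up as nodes in your first attempt. A minimal sketch, using a made-up three-row edge list (the names edges, g_by_position and g_by_name are only for illustration):

```r
library(tidygraph)

# Hypothetical toy edge list: no columns are called from/to yet
edges <- data.frame(
  token   = c(1L, 2L, 3L),
  head    = c(2L, 2L, 2L),
  id_head = c(4L, 4L, 4L)
)

# Without from/to columns, the first two columns (token, head)
# are taken as the endpoints of each edge
g_by_position <- as_tbl_graph(edges, directed = FALSE)

# With explicit from/to names, column order no longer matters
edges_named <- dplyr::rename(edges[, c("id_head", "token", "head")],
                             from = token, to = head)
g_by_name <- as_tbl_graph(edges_named, directed = FALSE)
```

Both calls should describe the same three edges; the remaining column (id_head) is carried along as an edge attribute.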

For the second question, it is simply a matter of the nodes being active while the edges contain the variables you want to group on. Activate the edges before grouping:

tmp %>% 
  rename(
    from = token,
    to = head
  ) %>%
  as_tbl_graph() %>%
  activate(edges) %>%
  group_by(id_head, id_sec)
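
If what you are really after is one separate graph per (id_head, id_sec) combination rather than a grouped edge table, one possible approach is to split the data frame first and convert each piece on its own. This is only a sketch under that assumption about your goal; it relies on dplyr's group_split() and base lapply(), and the name graphs is just for illustration:

```r
library(dplyr)
library(tidygraph)

tmp <- tibble(
  id_head = as.integer(c(4,4,4,4,4,4,5,5,5,5)),
  id_sec  = as.integer(c(1,1,1,2,2,2,1,1,2,2)),
  token   = as.integer(c(1,2,3,1,2,3,1,2,1,2)),
  head    = as.integer(c(2,2,2,1,1,2,2,2,2,2)),
  root    = as.integer(c(2,2,2,1,1,1,2,2,2,2))
)

# Split the edge list into one data frame per group,
# then turn each piece into its own undirected graph
graphs <- tmp %>%
  rename(from = token, to = head) %>%
  group_split(id_head, id_sec) %>%
  lapply(as_tbl_graph, directed = FALSE)

# One list element per (id_head, id_sec) combination
length(graphs)
```

Each element of the list is an ordinary tbl_graph, so the usual activate() and dplyr verbs can be applied to it individually.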