
I am doing social network analysis and working with two data frames. Data frame A ("nodes") holds the information for each node of the network (i.e. id and name). Data frame B ("links") has two columns, "from" and "to", which show how the nodes are connected: each row represents a link "from" one node "to" another.

I want to use the networkD3 package to visualize the network, but it has a requirement: ids must start from zero and be consecutive (0, 1, 2, etc.). Because my nodes and links are a random subset of a larger database, their ids are not consecutive. I sorted the "nodes" data frame by id and created a new column (new_id) with consecutive numbers starting from zero. Now I don't know how to update the "links" data frame to use the new_id values. Currently, I convert the values in the "links" data frame to characters and then revalue them using the plyr package, but I need to do this for a much larger dataset. Here is a sample of the two data frames:

set.seed(10)
nodes_df <- data.frame(id = c(1,3,5,6,8,10), 
     name = c("Agriculture", "Agriculture_in_Mesoamerica", "Agriculture_in_ancient_Greece",
     "Agriculture_in_ancient_Rome", "Agriculture_in_India", "Agriculture_in_China"), 
     new_id = seq(0,5))

links_df <- data.frame(from = c(3,3,5,6,8,10), 
           to = c(1,5,6,8,10,3))
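The sorting and renumbering step described above can be sketched in base R as follows. This is a sketch under one assumption: that node rows may arrive in any order, so the shuffled sample below is hypothetical rather than the asker's actual data.

```r
# Hypothetical shuffled subset: ids are neither sorted nor consecutive
nodes_df <- data.frame(id = c(10, 1, 5, 3, 8, 6),
                       name = c("Agriculture_in_China", "Agriculture",
                                "Agriculture_in_ancient_Greece", "Agriculture_in_Mesoamerica",
                                "Agriculture_in_India", "Agriculture_in_ancient_Rome"))

# Sort by id, reset the row names, then number the rows 0, 1, 2, ...
nodes_df <- nodes_df[order(nodes_df$id), ]
rownames(nodes_df) <- NULL
nodes_df$new_id <- seq_len(nrow(nodes_df)) - 1L  # 1L keeps the column integer
nodes_df
```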

In summary, I need to update the values in the links_df to correspond to the new_id values from the nodes_df.

Thank you so much in advance. I hope I was clear enough. Best regards,

This looks like either merge or just nodes_df$new_id[ match(links_df$from, nodes_df$id) ]. - r2evans
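A minimal sketch of the match() lookup suggested in the comment, applied to both columns of the sample data (the name column is left out because it plays no part in the lookup). match() returns, for each link endpoint, the row of nodes_df holding that id; indexing new_id by those rows translates old ids into new ones.

```r
nodes_df <- data.frame(id = c(1, 3, 5, 6, 8, 10), new_id = 0:5)
links_df <- data.frame(from = c(3, 3, 5, 6, 8, 10), to = c(1, 5, 6, 8, 10, 3))

# Replace each old id with the new_id of the node row it matches
links_df$from <- nodes_df$new_id[match(links_df$from, nodes_df$id)]
links_df$to   <- nodes_df$new_id[match(links_df$to,   nodes_df$id)]
links_df
#>   from to
#> 1    1  0
#> 2    1  2
#> 3    2  3
#> 4    3  4
#> 5    4  5
#> 6    5  1
```

Any id in links_df that is missing from nodes_df comes back as NA, which makes such links easy to spot.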

3 Answers


In base R you can use merge() and extract the column you need:

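# merge() sorts its result by the key column, so keep a row index
# to put the looked-up values back in the original order of links_df
links_df$row <- seq_len(nrow(links_df))
m <- merge(links_df, nodes_df, 
           by.x = "to", by.y = "id",
           all.x = TRUE)
links_df$new_to <- m$new_id[order(m$row)]
m <- merge(links_df, nodes_df, 
           by.x = "from", by.y = "id",
           all.x = TRUE)
links_df$new_from <- m$new_id[order(m$row)]
links_df$row <- NULL
links_df <- links_df[, c("from", "to", "new_from", "new_to")] # Reordering columns
links_df
  from to new_from new_to
1    3  1        1      0
2    3  5        1      2
3    5  6        2      3
4    6  8        3      4
5    8 10        4      5
6   10  3        5      1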
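Why the row order matters with merge(): by default (sort = TRUE) merge() returns its rows sorted on the key columns, so the new_id column it produces follows the sorted key rather than the row order of links_df. A small check on the sample data:

```r
nodes_df <- data.frame(id = c(1, 3, 5, 6, 8, 10), new_id = 0:5)
links_df <- data.frame(from = c(3, 3, 5, 6, 8, 10), to = c(1, 5, 6, 8, 10, 3))

m <- merge(links_df, nodes_df, by.x = "to", by.y = "id", all.x = TRUE)
m$to           # rows now follow the sorted key...
#> [1]  1  3  5  6  8 10
links_df$to    # ...not the original link order
#> [1]  1  5  6  8 10  3

# A match() lookup keeps the original row order of links_df
nodes_df$new_id[match(links_df$to, nodes_df$id)]
#> [1] 0 2 3 4 5 1
```

Assigning m$new_id straight back to links_df$new_to would therefore give 0 1 2 3 4 5, which is misaligned with the "to" column.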

An alternative to merging or joining is to use recode(). A tidyverse-based solution could look as follows.

library(dplyr)
library(tibble)

swap <- deframe(tibble(id = nodes_df$id, new_id = nodes_df$new_id))

links_df %>%
  mutate(new_from = recode(from, !!!swap),
         new_to = recode(to, !!!swap))

#   from to new_from new_to
# 1    3  1        1      0
# 2    3  5        1      2
# 3    5  6        2      3
# 4    6  8        3      4
# 5    8 10        4      5
# 6   10  3        5      1
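The same named-vector lookup also works without dplyr. This is a base R sketch of the idea, not part of the answer above: setNames() builds the old-id to new-id vector that deframe() produces, and indexing it by name does the recoding. Names are character, so the ids have to be converted with as.character() before indexing.

```r
nodes_df <- data.frame(id = c(1, 3, 5, 6, 8, 10), new_id = 0:5)
links_df <- data.frame(from = c(3, 3, 5, 6, 8, 10), to = c(1, 5, 6, 8, 10, 3))

# Named lookup vector: names are the old ids, values are the new ids
swap <- setNames(nodes_df$new_id, nodes_df$id)

# Index by name; unname() drops the names carried over from swap
links_df$new_from <- unname(swap[as.character(links_df$from)])
links_df$new_to   <- unname(swap[as.character(links_df$to)])
links_df
#>   from to new_from new_to
#> 1    3  1        1      0
#> 2    3  5        1      2
#> 3    5  6        2      3
#> 4    6  8        3      4
#> 5    8 10        4      5
#> 6   10  3        5      1
```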


Technically speaking, networkD3 expects the values in the links data frame to be the (zero-based) index of the nodes they refer to in the nodes data frame. So the first row/node in the nodes data frame is 0, and so forth.

You can use match() to determine the 1-based index of each element in a vector in a target vector, and subtract 1 to get a 0-based index.

links_df$from
#> [1]  3  3  5  6  8 10
nodes_df$id
#> [1]  1  3  5  6  8 10
match(links_df$from, nodes_df$id) - 1
#> [1] 1 1 2 3 4 5

links_df$to
#> [1]  1  5  6  8 10  3
nodes_df$id
#> [1]  1  3  5  6  8 10
match(links_df$to, nodes_df$id) - 1
#> [1] 0 2 3 4 5 1

Created on 2021-03-28 by the reprex package (v1.0.0)
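Because the question's nodes and links are random subsets of a larger database, some links may point to nodes that are not in the nodes subset, and match() returns NA for those. The function below is a hypothetical helper (reindex_links is not part of networkD3 or any package) that applies the match() - 1 conversion to both columns and drops such links. The output column names source and target are arbitrary; their names are what you would pass to networkD3 as the Source and Target arguments. The indices refer to row positions, so the nodes data frame must be passed to networkD3 in the same row order.

```r
# Hypothetical helper: convert link endpoints to zero-based row indices of
# `nodes`, dropping links whose endpoints are not present in `nodes`
reindex_links <- function(links, nodes, id_col = "id") {
  src <- match(links$from, nodes[[id_col]]) - 1L
  tgt <- match(links$to,   nodes[[id_col]]) - 1L
  keep <- !is.na(src) & !is.na(tgt)
  if (any(!keep)) {
    message(sum(!keep), " link(s) refer to nodes not in `nodes` and were dropped")
  }
  data.frame(source = src[keep], target = tgt[keep])
}

nodes_df <- data.frame(id = c(1, 3, 5, 6, 8, 10))
# The last link points to id 99, which is not in nodes_df
links_df <- data.frame(from = c(3, 3, 5, 6, 8, 10, 10),
                       to   = c(1, 5, 6, 8, 10, 3, 99))
reindex_links(links_df, nodes_df)
```

Whether to drop dangling links or add the missing nodes instead depends on the analysis; dropping is just the simplest choice for a sketch.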