
I have three variables.

var1 and var2 are names, e.g. of database tables.

var3 stores (in a string) the column(s) of the database tables which are necessary to join the two tables.

Without loss of generality, each database table has only one column to join with another table.

Is it possible in R/dplyr to get a matrix/tibble where var1 gives the rows, var2 gives the columns, and var3 is the value for each var1*var2 combination?

An example dataframe:

df <- data.frame(var1 = paste0("tab",c(seq(1:5),1,1)), 
                 var2 = paste0("tab",c(2,4,5,3,1,3,4)), 
                 var3 = letters[1:7])

Expected result:

      tab1 tab2 tab3 tab4 tab5
 tab1    -    a    f    g    -
 tab2    -    -    -    b    -
 tab3    -    -    -    -    c
 tab4    -    -    d    -    -
 tab5    e    -    -    -    -

How can I get it?

Thank you!


2 Answers

3
votes
library(dplyr)
library(tidyr)
library(tibble)  # provides column_to_rownames()

df %>% pivot_wider(names_from = "var1", values_from = "var3") %>%
  arrange(var2) %>% column_to_rownames("var2") %>% t()

gives

     tab1 tab2 tab3 tab4 tab5
tab1 NA   "a"  "f"  "g"  NA  
tab2 NA   NA   NA   "b"  NA  
tab3 NA   NA   NA   NA   "c" 
tab4 NA   NA   "d"  NA   NA  
tab5 "e"  NA   NA   NA   NA

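Use as.data.frame() to convert the result into a data frame; as_tibble() does not keep the row names by default. If you want the NAs shown as "-", replace them, for example with coalesce() or by passing values_fill = "-" to pivot_wider().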
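As a minimal sketch of that last step, the variation below fills the empty cells with "-" during the pivot and ends up with a data frame directly, so no transposition is needed. It is not the exact pipeline above: it pivots on var2 instead of var1, and it assumes tidyr >= 1.1.0, where values_fill accepts a single value and names_sort is available. The name res is only illustrative.

```r
library(dplyr)
library(tidyr)
library(tibble)

df <- data.frame(var1 = paste0("tab", c(1:5, 1, 1)),
                 var2 = paste0("tab", c(2, 4, 5, 3, 1, 3, 4)),
                 var3 = letters[1:7])

res <- df %>%
  # one column per var2 value, sorted by name; missing combinations become "-"
  pivot_wider(names_from = var2, values_from = var3,
              values_fill = "-", names_sort = TRUE) %>%
  arrange(var1) %>%
  # move var1 into the row names; the result is a plain data frame
  column_to_rownames("var1")

res["tab1", "tab2"]  # "a"
res["tab2", "tab1"]  # "-"
```

Because the rows already come from var1 and the columns from var2, the result has the orientation asked for in the question.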

1
votes

This base R solution is not very elegant, but it does the job.

# create positions for final table (assumes each name ends in its row/column number)
df$row <- as.numeric(sub("\\D+", "", df$var1))
df$col <- as.numeric(sub("\\D+", "", df$var2))

# create vector of tables (combining names from var1 and var2)
tables <- unique(c(df$var1, df$var2))

m <- matrix("-", nrow = length(tables), ncol = length(tables),
            dimnames = list(tables, tables))
for (i in seq_len(nrow(df)))
  m[df$row[i], df$col[i]] <- df$var3[i]

Output

#      tab1 tab2 tab3 tab4 tab5
# tab1 "-"  "a"  "f"  "g"  "-" 
# tab2 "-"  "-"  "-"  "b"  "-" 
# tab3 "-"  "-"  "-"  "-"  "c" 
# tab4 "-"  "-"  "d"  "-"  "-" 
# tab5 "e"  "-"  "-"  "-"  "-"
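The loop above relies on each table name ending in a number that matches its position. A possible variation, sketched here under the assumption that indexing by name is acceptable, uses a two-column character matrix as the index so that cells are matched against the dimnames and no loop is needed. The sort() call and stringsAsFactors = FALSE are additions for this sketch, not part of the answer above.

```r
df <- data.frame(var1 = paste0("tab", c(1:5, 1, 1)),
                 var2 = paste0("tab", c(2, 4, 5, 3, 1, 3, 4)),
                 var3 = letters[1:7],
                 stringsAsFactors = FALSE)  # keep names as character in R < 4.0

# all table names that occur in either column
# (sorted alphabetically, so "tab10" would come before "tab2")
tables <- sort(unique(c(df$var1, df$var2)))

m <- matrix("-", nrow = length(tables), ncol = length(tables),
            dimnames = list(tables, tables))

# each row of the index matrix is a (row name, column name) pair
m[cbind(df$var1, df$var2)] <- df$var3

m["tab1", "tab4"]  # "g"
```

This also works when the names carry no numeric suffix at all, since the lookup goes through the row and column names rather than through extracted numbers.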