
I am trying to plot Sankey diagrams using sankeyNetwork() from the networkD3 package. The visualization works well on sample data such as this:

Nodes

node
1124107186
1124132760
1124119016
20150517
/matte-low-dome-49354
/accounts/account-order-list.html
/Stepp

and links:

source  target  value
0       3       5
1       3       9
2       3       1
3       4       6
3       5       12
3       6       8
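In networkD3 the source and target columns are zero-based positions in the nodes table, so 0 is the first node (1124107186) and 3 is the fourth (20150517). A small Python sketch, not part of the original question, that decodes the sample links back into node names makes this mapping explicit; the helper `decode_links` is a hypothetical name used only for illustration:

```python
# Sample nodes from the question, in row order; index 0 is the first row.
nodes = [
    "1124107186",
    "1124132760",
    "1124119016",
    "20150517",
    "/matte-low-dome-49354",
    "/accounts/account-order-list.html",
    "/Stepp",
]

# Sample links as (source, target, value) tuples of zero-based node indices.
links = [(0, 3, 5), (1, 3, 9), (2, 3, 1), (3, 4, 6), (3, 5, 12), (3, 6, 8)]


def decode_links(nodes, links):
    """Replace each zero-based index with the node name it points to."""
    return [(nodes[s], nodes[t], v) for s, t, v in links]


for source, target, value in decode_links(nodes, links):
    print(f"{source} -> {target}: {value}")
```

Read this way, the sample describes user IDs (GUIDs) flowing into a date, and the date flowing out to pages (URIs).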

But it is difficult to prepare the links table from a CSV file in this format:

       URI                          DATE_KEY    TIME_KEY GUID_KEY
/matte-low-dome-49354               20150517    145755  1124107186
/matte-low-dome-49355               20150517    145755  1124107186
/accounts/account-order-list.html   20150517    143857  1124132760
/accounts/account-order-list.html   20150517    143857  1124132760
/Stepp                              20150517    143416  1124119016
/Stepp                              20150517    143415  1124119016
/platinum-47184                     20150517    145637  1124107186

Is there a reproducible way to derive the source and target node indices (row numbers) for such a dataset?
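One scripted approach, sketched here in Python with only the standard library (this is not the method used in the answer below), is to walk each CSV row along a fixed path of columns, give every distinct value a zero-based index the first time it is seen, and count how many rows produce each (source, target) pair. The path GUID_KEY → DATE_KEY → URI is an assumption inferred from the sample links, the file is assumed to be comma-separated with the header shown above, and `build_sankey_tables` is a hypothetical helper name:

```python
import csv
import io
from collections import Counter

# ASSUMPTION: each row flows GUID_KEY -> DATE_KEY -> URI, matching the
# sample links (user IDs feed the date, the date feeds the pages).
STEPS = ("GUID_KEY", "DATE_KEY", "URI")


def build_sankey_tables(csv_text, steps=STEPS):
    """Return (nodes, links) using zero-based node indices.

    nodes: distinct values in first-seen order.
    links: (source_index, target_index, row_count) tuples.
    """
    index = {}  # node name -> zero-based position in `nodes`
    nodes = []
    pair_counts = Counter()

    def node_id(name):
        # Assign the next free index the first time a value appears.
        if name not in index:
            index[name] = len(nodes)
            nodes.append(name)
        return index[name]

    for row in csv.DictReader(io.StringIO(csv_text)):
        # Every consecutive pair of columns on the path is one link.
        for left, right in zip(steps, steps[1:]):
            source = node_id(row[left].strip())
            target = node_id(row[right].strip())
            pair_counts[(source, target)] += 1

    links = [(s, t, n) for (s, t), n in pair_counts.items()]
    return nodes, links


# The rows from the question, written as comma-separated values.
sample = """URI,DATE_KEY,TIME_KEY,GUID_KEY
/matte-low-dome-49354,20150517,145755,1124107186
/matte-low-dome-49355,20150517,145755,1124107186
/accounts/account-order-list.html,20150517,143857,1124132760
/accounts/account-order-list.html,20150517,143857,1124132760
/Stepp,20150517,143416,1124119016
/Stepp,20150517,143415,1124119016
/platinum-47184,20150517,145637,1124107186
"""

nodes, links = build_sankey_tables(sample)
print(nodes)
print(links)
```

The node order differs from the hand-built sample because indices are assigned in first-seen order, but the links stay consistent with it. Values are keyed by their text alone, so a GUID that happened to equal a URI would be merged into one node; prefixing each value with its column name avoids that. The two lists can be written out with `csv.writer` and read in R with read.csv() as the Nodes and Links data frames.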


Answer


I figured out a way to do it in Excel using VLOOKUP and a pivot table. I assigned a row number to each unique value using the ROW() function:

variable          row
20150517           1
20150518           2
/platinum-47184    3
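The ROW()-based numbering step could be scripted instead of done by hand. A minimal Python sketch, assuming values are numbered in the order they first appear; `number_unique_values` is a hypothetical helper, and its `start` parameter exists only to show both conventions, since the table above starts at 1 while networkD3 expects zero-based indices:

```python
def number_unique_values(values, start=0):
    """Give each distinct value a sequence number in first-seen order.

    start=1 mimics the ROW()-based numbering in the table above;
    start=0 gives the zero-based indices networkD3 expects.
    """
    numbers = {}
    for value in values:
        if value not in numbers:
            numbers[value] = start + len(numbers)
    return numbers


# Values taken from the lookup table above.
lookup = number_unique_values(["20150517", "20150518", "/platinum-47184"], start=1)
print(lookup)
```

The VLOOKUP step then becomes a dictionary lookup, for example `lookup["/platinum-47184"]`.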

Then I did a VLOOKUP on each value's name in another table, which gave me the source and the target as their sequence numbers. I then ran a pivot table to count each unique combination of values, like this:

source                   target    value
/matte-low-dome-49354    20150517  12
/matte-low-dome-49355    20150517  6
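The pivot-table count can also be reproduced in code. A minimal Python sketch that counts each unique (source, target) name pair with `collections.Counter` and then converts the names to zero-based indices; the helper names `pivot_count` and `to_links` are hypothetical, and the input pairs are URI → DATE_KEY taken from the question's sample rows (so the counts differ from the 12 and 6 shown above, which came from the full file):

```python
from collections import Counter

# One (source, target) pair per CSV row: URI -> DATE_KEY, as in the pivot above.
pairs = [
    ("/matte-low-dome-49354", "20150517"),
    ("/matte-low-dome-49355", "20150517"),
    ("/accounts/account-order-list.html", "20150517"),
    ("/accounts/account-order-list.html", "20150517"),
    ("/Stepp", "20150517"),
    ("/Stepp", "20150517"),
    ("/platinum-47184", "20150517"),
]


def pivot_count(pairs):
    """Count each unique (source, target) combination, like a pivot table."""
    return Counter(pairs)


def to_links(counts):
    """Map names to zero-based indices and emit (source, target, value) links."""
    index = {}
    for source, target in counts:
        for name in (source, target):
            index.setdefault(name, len(index))  # first-seen order
    links = [(index[s], index[t], n) for (s, t), n in counts.items()]
    return index, links


counts = pivot_count(pairs)
index, links = to_links(counts)
print(links)
```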

I was able to use this as the input for making a Sankey plot.

This is not a very programmatic or reproducible way of doing it, but it serves the purpose.