Sankey Diagram with multiple columns and weight column - using networkD3 package

Question

I am trying to make an interactive Sankey diagram with the networkD3 package. I have a dataset with eight columns.

df <- read.csv(header = TRUE, as.is = TRUE, text = '
clientcode,year1,year2,year3,year4,year5,year6,year7
1,DBC,DBBC,DBBC,DBC,DBC,"Not in care","Not in care"
2,DBC,DBBC,DBBC,"Not in care","Not in care","Not in care","Not in care"
3,DBC,DBBC,"Not in care","Not in care","Not in care","Not in care","Not in care"
4,DBC,DBBC,"Not in care","Not in care","Not in care","Not in care","Not in care"
5,DBC,DBBC,DBBC,"Not in care","Not in care","Not in care","Not in care"
')

I am using the code from this answer, the part starting with "This question comes up a lot...": https://stackoverflow.com/a/52237151/4389763

This is the code I have:

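library(dplyr)  # %>%, select, mutate, group_by, arrange, lead, filter
library(tidyr)  # gather

# keep only the yearly status columns
df <- df %>% select(year1, year2, year3, year4, year5, year6, year7)

# reshape to long format and pair each year's status with the next year's
links <-
  df %>%
  mutate(row = row_number()) %>%
  gather('column', 'source', -row) %>%
  mutate(column = match(column, names(df))) %>%
  group_by(row) %>%
  arrange(column) %>%
  mutate(target = lead(source)) %>%
  ungroup() %>%
  filter(!is.na(target))

# suffix each status with its column index so every year gets its own node
links <-
  links %>%
  mutate(source = paste0(source, '_', column)) %>%
  mutate(target = paste0(target, '_', column + 1)) %>%
  select(source, target)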

nodes <- data.frame(name = unique(c(links$source, links$target)))

links$source <- match(links$source, nodes$name) - 1
links$target <- match(links$target, nodes$name) - 1
links$value <- 1

nodes$name <- sub('_[0-9]+$', '', nodes$name)

library(networkD3)
library(htmlwidgets)

sankeyNetwork(Links = links, Nodes = nodes, Source = 'source',
          Target = 'target', Value = 'value', NodeID = 'name')

But I don't know how to add the value of each flow. For example, DBC to DBBC occurs five times from year1 to year2, and DBBC to DBBC occurs three times from year2 to year3. With the code above, every occurrence is drawn as a separate link with value 1, and I would like to see the total value of each flow instead.

Like this example of a Sankey diagram, where you can see the total for, say, group_A to group_C rather than every occurrence.

Is it also possible to show percentages in the mouseover tooltip? For example, for Year1 = DBC to Year2 = DBBC the value is 5 out of 5, so the percentage is 100%.

Can someone help me? Thank you.

also, you will have to explain what the weight means... as it is, you have one value for weight per row, but each row has multiple links... if you want that to give the "value" for each link, then you're missing a bunch of data — CJ Yetman
Thanks for your reaction. I have changed the question and added an example. I hope you can help me. @CJYetman — SuGer

SuGer · Accepted Answer · 2018-09-18T12:36:43

I have changed the code so that identical links are counted, rather than each occurrence getting a value of 1.

Instead of:

links$value <- 1

The new code:

links <- links %>% group_by(source, target) %>% tally()
names(links)[3] <- "value"
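
For the percentage part of the question, one possible approach is to compute each link's share of its source node's total next to the count. The following is only a minimal sketch under assumptions: `links_raw` is a hypothetical toy table shaped like `links` before aggregation (string labels are used instead of the zero-based indices purely for readability), and `links_agg` and `pct` are names made up for this illustration. It uses `count()` with the `name` argument, which requires a reasonably recent dplyr.

```r
library(dplyr)

# Hypothetical toy input: one row per client per year-to-year transition,
# mirroring the year1 -> year2 and year2 -> year3 flows of the example data.
links_raw <- data.frame(
  source = c(rep("DBC_1", 5), rep("DBBC_2", 5)),
  target = c(rep("DBBC_2", 5),
             "DBBC_3", "DBBC_3", "Not in care_3", "Not in care_3", "DBBC_3"),
  stringsAsFactors = FALSE
)

links_agg <- links_raw %>%
  # one row per distinct source/target pair, with the number of occurrences
  count(source, target, name = "value") %>%
  # share of each link within all links leaving the same source node
  group_by(source) %>%
  mutate(pct = 100 * value / sum(value)) %>%
  ungroup()

print(links_agg)
```

The `value` column can be passed to `sankeyNetwork()` as before. The sketch stops at computing `pct`; showing it in the tooltip would need additional customisation of the widget that is not covered here.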
