I am trying to reorganize some data from long to wide format.
There are many individuals (MRNs), each of whom had sequencing done a different number of times (seq_date) and I want to create a data frame that shows the change in the Val over time.
The data frame I'm starting with looks like this:
dat_data <- data.frame(
MRN = c("012345", "012345", "012345", "012345", "012345", "012345"),
seq_date = c("1-Aug-18", "27-Mar-19", "27-Mar-19", "27-Mar-19", "7-May-19", "7-May-19"),
Gene = c("SRSF2", "TET2", "IDH1", "SRSF2", "IDH1", "SRSF2"),
AA = c("p.A2B", "p.C2D", "p.E2F", "p.A2B", "p.E2F", "p.A2B"),
Val = c("0.1", "0.2", "0.3", "0.4", "0.5", "0.6")
)
> dat_data
MRN seq_date Gene AA Val
1 012345 1-Aug-18 SRSF2 p.A2B 0.1
2 012345 27-Mar-19 TET2 p.C2D 0.2
3 012345 27-Mar-19 IDH1 p.E2F 0.3
4 012345 27-Mar-19 SRSF2 p.A2B 0.4
5 012345 7-May-19 IDH1 p.E2F 0.5
6 012345 7-May-19 SRSF2 p.A2B 0.6
I'm trying to figure out how to add a column containing D1, D2, D3, etc. based on when the samples were collected:
MRN seq_date Gene AA Val ord_date
1 012345 1-Aug-18 SRSF2 p.A2B 0.1 D1
2 012345 27-Mar-19 TET2 p.C2D 0.2 D2
3 012345 27-Mar-19 IDH1 p.E2F 0.3 D2
4 012345 27-Mar-19 SRSF2 p.A2B 0.4 D2
5 012345 7-May-19 IDH1 p.E2F 0.5 D3
6 012345 7-May-19 SRSF2 p.A2B 0.6 D3
and subsequently use gather / spread to create a wide-format data frame that looks like this:
MRN Gene AA D1 D2 D3
1 012345 SRSF2 p.A2B 0.1 0.4 0.6
2 012345 IDH1 p.E2F 0 0.3 0.5
3 012345 TET2 p.C2D 0 0.2 0
I am most familiar w/ dplyr and have been trying mutate = case_when for the first step and gather / spread for the second, without success. Would greatly appreciate any help.