My dataframe (df):
ID1 | ID2 | V1 | V2 | V3
A | B | var1 | foo | 1
C | D | var2 | bar | 2
E | F | var3 | foo | 3
G | F | var3 | foo | 3
H | I | var4 | zap | 2
...
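To make this reproducible, the rows shown above can be built as below. This is only a sketch: it covers just the five rows shown, and the column types (character IDs, numeric V3) are my assumption about how the data is stored.

```r
# Sample rows from the table above; the real data frame has more rows.
# Column types are assumed: character for ID1, ID2, V1, V2 and numeric for V3.
df <- data.frame(
  ID1 = c("A", "C", "E", "G", "H"),
  ID2 = c("B", "D", "F", "F", "I"),
  V1  = c("var1", "var2", "var3", "var3", "var4"),
  V2  = c("foo", "bar", "foo", "foo", "zap"),
  V3  = c(1, 2, 3, 3, 2),
  stringsAsFactors = FALSE
)
```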
ID1 and ID2 contain overlapping values, because the data is a long-format version of an upper matrix triangle with identical comparisons (e.g. A, A) removed and some additional metadata (V1, V2, V3) added.
The rows must be grouped by V1, V2 and V3, and the final output should be one list of unique IDs per group, drawn from both ID1 and ID2, with each list written to a separate file.
So far I've grouped the data, but I'm stuck on how to iterate over each of dplyr's groups and obtain the grouping values for each.
Pseudocode for what I have in mind is below:
# Group
cluster <- df %>% group_by(V1, V2, V3)
# [?] loop through each group in cluster
# [?] get the group's values as x, y and z
# Get the IDs from both columns and merge them
ID1 <- df %>% filter(V1 == x, V2 == y, V3 == z) %>%
  pull(ID1)
ID2 <- df %>% filter(V1 == x, V2 == y, V3 == z) %>%
  pull(ID2)
merged <- c(ID1, ID2)
merged_unique <- unique(merged)
# Write out to a file named after the group values
fileConn <- file(paste(x, y, z, "txt", sep = "."))
writeLines(merged_unique, fileConn)
close(fileConn)
I would like my final output to be:
- file var1.foo.1.txt :
A
B
- file var2.bar.2.txt :
C
D
- file var3.foo.3.txt :
E
F
G
- file var4.zap.2.txt :
H
I
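For reference, one possible direction is sketched below. It is not a confirmed solution. It assumes dplyr 0.8.0 or later (for group_walk()), a hypothetical output directory out_dir (a temp directory here), and that sorting the IDs within each file is acceptable.

```r
library(dplyr)

# The five sample rows from the table above; the real data has more.
df <- data.frame(
  ID1 = c("A", "C", "E", "G", "H"),
  ID2 = c("B", "D", "F", "F", "I"),
  V1  = c("var1", "var2", "var3", "var3", "var4"),
  V2  = c("foo", "bar", "foo", "foo", "zap"),
  V3  = c(1, 2, 3, 3, 2),
  stringsAsFactors = FALSE
)

out_dir <- tempdir()  # hypothetical output location, change as needed

df %>%
  group_by(V1, V2, V3) %>%
  # group_walk() calls the function once per group:
  #   rows = that group's rows (without the grouping columns)
  #   key  = a one-row tibble holding that group's V1, V2, V3
  group_walk(function(rows, key) {
    # Pool both ID columns, drop duplicates, and sort (sorting is an assumption)
    ids <- sort(unique(c(as.character(rows$ID1), as.character(rows$ID2))))
    # File name follows the V1.V2.V3.txt pattern from the pseudocode
    path <- file.path(out_dir, paste(key$V1, key$V2, key$V3, "txt", sep = "."))
    writeLines(ids, path)
  })
```

With the sample rows, this should write one file per distinct (V1, V2, V3) combination, named as in the list above.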
Any help is appreciated.