R - Group by, then iterate through group and extract original column values

Question

My dataframe (df):

ID1 | ID2 |  V1  |  V2 |  V3
A   | B   | var1 | foo |  1   
C   | D   | var2 | bar |  2
E   | F   | var3 | foo |  3
G   | F   | var3 | foo |  3
H   | I   | var4 | zap |  2
...

ID1 and ID2 contain overlapping values, as it's a long-format version of an upper matrix triangle with identical comparisons (e.g. A, A) removed and some additional metadata (V1, V2, V3) added.

The above must be grouped by V1, V2 and V3, and the final output should be a list of the unique IDs (taken from both ID1 and ID2, which contain overlapping values) that make up each group, with each list written to a separate file.

So far, I've grouped the variables, but I'm stuck on how to iterate through each of dplyr's groups and obtain the values for each.

A pseudocode of what I have in mind is below:

# Group
cluster <- df %>% group_by(V1,V2,V3) 

[?] # loop through each group in cluster
    
    [?] # get group values as x, y and z
    
    # Get IDs into lists and merge
    ID1 <- df %>% filter(V1 == x, V2 ==y, V3 == z) %>%
           pull(ID1)

    ID2 <- df %>% filter(V1 == x, V2 ==y, V3 == z) %>%
           pull(ID2)

    merged <- c(ID1,ID2) 
   
    merged_unique <- unique(unlist(merged))

    # Print out to file
    fileConn <- file(paste(x, y, z, "txt", sep = "."))
    writeLines(merged_unique, fileConn)
    close(fileConn)

I would like my final output to be:

file var1.foo.1.txt :

A
B

file var2.bar.2.txt :

C
D

file var3.foo.3.txt :

E
F
G

file var4.zap.2.txt :

H
I

Any help is appreciated.

Illari Illari · Accepted Answer · 2020-10-09T09:47:03

Generating your "data":

df <- data.frame(
    ID1 = c("A", "B", "C", "E", "G", "H"),
    ID2 = c("B", "B", "D", "Fe", "Fe", "I"),
    V1  = c("var1", "var1", "var2", "var3", "var3", "var4"),
    V2  = c("foo", "foo", "bar", "foo", "foo", "zed"),
    V3  = c(1, 1, 2, 3, 3, 2)
)

Clustering the data and getting the unique clusters:

library(dplyr)
df_clust <- df %>% group_by(V1,V2,V3) 
df_tally <- df_clust %>% tally()
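If only the distinct group combinations are needed and not the per-group counts, dplyr's `group_keys()` may be a more direct way to get them. A minimal sketch, assuming the same toy data as in this answer (the name `keys` is only illustrative):

```r
library(dplyr)

# Same toy data as used in this answer
df <- data.frame(
    ID1 = c("A", "B", "C", "E", "G", "H"),
    ID2 = c("B", "B", "D", "Fe", "Fe", "I"),
    V1  = c("var1", "var1", "var2", "var3", "var3", "var4"),
    V2  = c("foo", "foo", "bar", "foo", "foo", "zed"),
    V3  = c(1, 1, 2, 3, 3, 2)
)

# One row per distinct (V1, V2, V3) combination, without a count column
keys <- df %>% group_by(V1, V2, V3) %>% group_keys()
print(keys)
```

The result has the same rows as `df_tally` minus the `n` column, so it could stand in for `df_tally` when the counts are not used.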

Looping through, assuming only two ID columns and 3 feature columns, and printing each result to a new file:

for (i in seq_len(nrow(df_tally))) {
    # Rows of df that belong to the i-th group
    grp <- df %>% filter(V1 == df_tally$V1[i], V2 == df_tally$V2[i], V3 == df_tally$V3[i])

    # Combine both ID columns and drop duplicates
    # (as.character() so that writeLines() always receives character data)
    mergeID <- unique(c(as.character(grp$ID1), as.character(grp$ID2)))

    # Write the IDs of this group to their own file
    filename <- paste("TEST_", i, ".txt", sep = "")
    fileConn <- file(filename)
    writeLines(mergeID, fileConn)
    close(fileConn)
}
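The loop above names the files TEST_1.txt, TEST_2.txt, and so on. If the files should instead be named after the group values, as asked in the question (for example var1.foo.1.txt), one possible variant uses dplyr's `group_walk()`, which calls a function once per group and passes it the group's rows and its key. This is a sketch, not a drop-in replacement: `out_dir` is a hypothetical output location, and the data is the toy data from this answer.

```r
library(dplyr)

# Same toy data as used in this answer
df <- data.frame(
    ID1 = c("A", "B", "C", "E", "G", "H"),
    ID2 = c("B", "B", "D", "Fe", "Fe", "I"),
    V1  = c("var1", "var1", "var2", "var3", "var3", "var4"),
    V2  = c("foo", "foo", "bar", "foo", "foo", "zed"),
    V3  = c(1, 1, 2, 3, 3, 2)
)

# Hypothetical output directory; change to wherever the files should go
out_dir <- tempdir()

df %>%
    group_by(V1, V2, V3) %>%
    group_walk(function(rows, key) {
        # rows: the rows of this group (grouping columns removed)
        # key:  one-row tibble holding this group's V1, V2 and V3
        ids <- unique(c(as.character(rows$ID1), as.character(rows$ID2)))
        fname <- paste(key$V1, key$V2, key$V3, "txt", sep = ".")
        writeLines(ids, file.path(out_dir, fname))
    })
```

The IDs are written in the order they first appear (all of ID1, then any new values from ID2); wrapping the `unique()` call in `sort()` would give alphabetical order if that matters.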
