I ran into the following problem while working with my data. The data looks like this:

row_number      var1 var2
1               1921 16
2               1922 16
3               1921 17
4               1922 17
5               1703 29
6               1704 29
7               1705 29
8               1703 30
9               1704 30
10              1705 30
11              1703 31
12              1704 31
13              1705 31

I want to form pairs such that each var1 value and each var2 value is used only once.

In other words, rows 1~4 form one group, from which I only need to keep the 1st and 4th rows. Rows 5~13 form another group, from which I only need to keep these pairs (1703 29, 1704 30, 1705 31). That is, I want this outcome:

row_number      var1 var2
1               1921 16
4               1922 17
5               1703 29
9               1704 30
13              1705 31


My real data has many more observations.

If 5-13 is another group, why is 9 included? - akrun
Rows 5 to 13 (9 rows in total) are another group. - John legend2
I understand that, but the number of elements in the first one, i.e. 1-4, is 4, while from 5-9 it is 5. - akrun
For 5~13 the number of elements is 9, because there are three unique var1 values and three unique var2 values, so 3*3 = 9. - John legend2

1 Answer

Suppose your data is in a dataframe named d. Then

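# Start with a placeholder row so that rbind() has matching columns
out <- data.frame(row_number = NA, var1 = NA, var2 = NA)
for (i in seq_len(nrow(d))) {
  # Keep row i only if neither its var1 nor its var2 is already in out
  if (!(d[i, "var1"] %in% out[, "var1"]) && !(d[i, "var2"] %in% out[, "var2"])) {
    out <- rbind(out, d[i, ])
  }
}
out <- out[-1, ]  # drop the placeholder row
out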
#    row_number var1 var2
# 2           1 1921   16
# 4           4 1922   17
# 5           5 1703   29
# 9           9 1704   30
# 13         13 1705   31

gives you your desired result by iterating through rows of d and extracting only rows where neither var1 nor var2 has previously appeared in the output dataframe.
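
If the data is large, growing a data frame with rbind() inside a loop can get slow. A minimal sketch of the same greedy idea, marking rows in a logical vector instead, might look like the following. The helper name pick_unique_pairs and the vector keep are made up for illustration, and the sample data is simply retyped from the question.

```r
# Sample data retyped from the question
d <- data.frame(
  row_number = 1:13,
  var1 = c(1921, 1922, 1921, 1922, 1703, 1704, 1705,
           1703, 1704, 1705, 1703, 1704, 1705),
  var2 = c(16, 16, 17, 17, 29, 29, 29, 30, 30, 30, 31, 31, 31)
)

# Hypothetical helper: keep a row only if neither its var1 nor its var2
# already occurs among the rows kept so far (same rule as the loop above).
pick_unique_pairs <- function(d) {
  keep <- logical(nrow(d))  # all FALSE to begin with
  for (i in seq_len(nrow(d))) {
    if (!(d$var1[i] %in% d$var1[keep]) && !(d$var2[i] %in% d$var2[keep])) {
      keep[i] <- TRUE
    }
  }
  d[keep, ]
}

result <- pick_unique_pairs(d)
result$row_number
# [1]  1  4  5  9 13
```

Like the loop above, this is a greedy pass from top to bottom, so which pairs are kept depends on the row order of d.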