I can't seem to find an answer to this by searching SO. I'm trying to select a subset of a data.frame based on four conditions (lon1, lon2, lat1 and lat2). I have a huge dissimilarity matrix that has been vectorized, with the site coordinates (lon1, lon2, lat1 and lat2) cbind-ed to it. Here is an example data frame:
```r
out1 <- data.frame(lon1 = sample(1:10), lon2 = sample(1:10),
                   lat1 = sample(1:10), lat2 = sample(1:10),
                   dissimilarity = sample(seq(0, 1, .1), 10))
> out1
   lon1 lon2 lat1 lat2 dissimilarity
1     2    6    4    4           0.6
2     4    2    1    3           1.0
3    10    9    2    6           0.0
4     3    1   10    8           0.5
5     9    5    9    1           0.8
6     5    7    5    9           0.9
7     1    8    6    7           0.2
8     8    3    8    5           0.7
9     7    4    3   10           0.3
10    6   10    7    2           0.1
```
```r
out2 <- out1[c(2, 5, 6, 8), ]
> out2
  lon1 lon2 lat1 lat2 dissimilarity
2    4    2    1    3           1.0
5    9    5    9    1           0.8
6    5    7    5    9           0.9
8    8    3    8    5           0.7
```
I tried using the `%in%` operator a few times in this manner:

```r
test <- out1[(out1$lon1 %in% out2$lon1) & (out1$lon2 %in% out2$lon2) &
             (out1$lat1 %in% out2$lat1) & (out1$lat2 %in% out2$lat2), ]
```
This seems to work for the basic example I provide here. But when I apply it to my huge data frame (with many lats and lons repeated), I get back a larger subset than the unique combinations I require. I assume this is because `%in%` (which is built on `match`) can only match one vector at a time. So it checks condition1 & condition2 & condition3 & condition4 separately: a row passes as long as each of its four values occurs somewhere in the corresponding column of out2, even if those values come from different rows of out2. As a result, it returns a subset that is the same as the original out1. I want only the rows where all four values match the same row of out2. This way I'll get a subset of the data for the pairwise dissimilarities I'm interested in.
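The suspected cause can be checked on a tiny made-up example. This is a minimal sketch; the data frames `keys` and `big` are hypothetical and not taken from the real data. Row 3 of `big` combines the lon values of the first key row with the lat values of the second key row, so it is not one of the wanted site pairs, yet it passes all four column-wise `%in%` tests.

```r
# Hypothetical key table: the two site pairs we actually want
keys <- data.frame(lon1 = c(1, 2), lon2 = c(3, 4),
                   lat1 = c(5, 6), lat2 = c(7, 8))

# Hypothetical data: rows 1 and 2 are true matches; row 3 mixes the
# lon values of key row 1 with the lat values of key row 2
big <- data.frame(lon1 = c(1, 2, 1), lon2 = c(3, 4, 3),
                  lat1 = c(5, 6, 6), lat2 = c(7, 8, 8),
                  dissimilarity = c(0.1, 0.2, 0.3))

# Column-wise %in%: each condition is tested independently of the others
colwise <- big[(big$lon1 %in% keys$lon1) & (big$lon2 %in% keys$lon2) &
               (big$lat1 %in% keys$lat1) & (big$lat2 %in% keys$lat2), ]

nrow(colwise)  # 3: the mixed row 3 is kept as a false positive
```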
Any ideas on how to subset by row based on a unique combination of four variables would be greatly appreciated.
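One common way to get row-wise matching is to treat the four coordinates as a single composite key, so the membership test is applied to whole rows instead of to each column separately. The sketch below re-enters the printed out1 by hand and assumes the column names from the question; `key_cols` and `row_key` are hypothetical names introduced for illustration. It builds one key string per row with `do.call(paste, ...)` and then uses `%in%` on those strings, with `merge()` on the four key columns shown as an alternative.

```r
# Example data: the out1 table printed above, entered by hand
out1 <- data.frame(lon1 = c(2, 4, 10, 3, 9, 5, 1, 8, 7, 6),
                   lon2 = c(6, 2, 9, 1, 5, 7, 8, 3, 4, 10),
                   lat1 = c(4, 1, 2, 10, 9, 5, 6, 8, 3, 7),
                   lat2 = c(4, 3, 6, 8, 1, 9, 7, 5, 10, 2),
                   dissimilarity = c(0.6, 1.0, 0.0, 0.5, 0.8,
                                     0.9, 0.2, 0.7, 0.3, 0.1))
out2 <- out1[c(2, 5, 6, 8), ]

# The four columns that together identify one pairwise comparison
key_cols <- c("lon1", "lon2", "lat1", "lat2")

# Hypothetical helper: collapse each row's key columns into one string,
# e.g. lon1 = 4, lon2 = 2, lat1 = 1, lat2 = 3 becomes "4_2_1_3"
row_key <- function(df) do.call(paste, c(df[key_cols], sep = "_"))

# Keep rows of out1 whose complete four-value key occurs in out2
test <- out1[row_key(out1) %in% row_key(out2), ]

# Alternative: inner join on the four key columns
# (the result is sorted by the keys and gets new row names)
test2 <- merge(out1, out2[key_cols], by = key_cols)
```

The `%in%` version keeps out1's original row order and row names and never duplicates rows. `merge()` sorts by the key columns and would repeat rows if out2 contained the same key combination more than once. Both compare coordinates exactly, so they assume the values are stored identically in out1 and out2.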
What is your expected output? - Nishanth