
I have two logical vectors and I want to measure how similar they are in terms of where their TRUE values fall. For example, take these two vectors:

df <- data.frame(c(TRUE, FALSE, FALSE, FALSE, TRUE, TRUE, FALSE, TRUE),
                 c(FALSE, TRUE, FALSE, TRUE, FALSE, TRUE, FALSE, TRUE))

And I tried this:

sum(df[[1]] & df[[2]])
[1] 2

The problem is that this only counts the TRUE values that sit at the same position in both vectors. I'd like to know how close the vectors are overall, so that I can compare different vectors with the same method. I know there are ways to do this for numerical vectors (Euclidean distance, for example), but I couldn't find an equivalent for logical vectors.

EDIT: It is important that the positions of the values affect the similarity between two vectors. For example, in this matrix:

  [,1] [,2] [,3] [,4]
a    1    0    0    0
b    0    1    0    0
c    0    0    0    1

The similarity between vectors a and b should be greater than the similarity between b and c.

Maybe try: sum(abs(df[1] - df[2])) or sqrt(sum((df[1] - df[2])^2)) - GKi
You only have two values, T and F. Measuring how close the T values are is the same as measuring how close the F values are. Think of them as numbers, 1s and 0s. You can measure the distance between two vectors using the norm of their difference. That could be the L1 norm (which, for 0/1 data, counts how many values differ), the L2 norm (Euclidean distance), or any other norm that fits your needs. - Aziz
Sounds more like a stats question. Voted to move to stats.stackexchange.com. See e.g. "How to get correlation between two categorical variable". - Henrik
or sum(df[1] != df[2]) - GKi
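
The norms suggested in the comments above can be sketched in base R as follows. This is a minimal illustration on the vectors from the question; the names `a`, `b`, `l1`, `hamming`, `l2`, and `manhattan` are chosen here for the example and do not appear in the original post.

```r
# The two logical vectors from the question (names a and b are illustrative).
a <- c(TRUE, FALSE, FALSE, FALSE, TRUE, TRUE, FALSE, TRUE)
b <- c(FALSE, TRUE, FALSE, TRUE, FALSE, TRUE, FALSE, TRUE)

# Arithmetic on logicals coerces TRUE/FALSE to 1/0.
# L1 norm of the difference: for 0/1 data this is the number of
# positions where the two vectors disagree (the Hamming distance).
l1 <- sum(abs(a - b))
hamming <- sum(a != b)

# L2 norm of the difference (Euclidean distance).
l2 <- sqrt(sum((a - b)^2))

# stats::dist() computes the same quantities between the rows of a matrix.
m <- rbind(a = as.numeric(a), b = as.numeric(b))
manhattan <- as.numeric(dist(m, method = "manhattan"))

print(c(l1 = l1, hamming = hamming, l2 = l2, manhattan = manhattan))
```

Note that these measures only count mismatches and ignore where they occur, so for the vectors in the EDIT they give the same distance between a and b as between b and c.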

1 Answer


The ade4 package has a convenient function, dist.binary(), that calculates various distances and indices for binary data (think of TRUE/FALSE as 1/0). You might want to look up the simple matching coefficient or the Jaccard index; here is a paper dealing with similarity measures on categorical data.

For instance, using the simple matching coefficient (note that dist.binary() returns a distance d = sqrt(1 - s), where s is the similarity coefficient):

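# Name the two columns, then build a numeric matrix with one vector per row
names(df) <- c("a", "b")
df <- t(sapply(df, as.numeric))

# method = 2L selects the simple matching coefficient (Sokal & Michener)
ade4::dist.binary(df, method = 2L)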
          a
b 0.7071068
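
To see where that number comes from, the coefficients can be computed by hand in base R. This is a sketch under the assumption, taken from the ade4 documentation, that dist.binary() reports d = sqrt(1 - s); the variable names (`n11`, `n00`, `smc`, and so on) are illustrative only.

```r
# The two logical vectors from the question.
a <- c(TRUE, FALSE, FALSE, FALSE, TRUE, TRUE, FALSE, TRUE)
b <- c(FALSE, TRUE, FALSE, TRUE, FALSE, TRUE, FALSE, TRUE)

# 2 x 2 contingency counts over positions.
n11 <- sum(a & b)    # TRUE in both
n00 <- sum(!a & !b)  # FALSE in both
n10 <- sum(a & !b)   # TRUE only in a
n01 <- sum(!a & b)   # TRUE only in b

# Simple matching coefficient: share of positions that agree.
smc <- (n11 + n00) / length(a)

# Jaccard index: like SMC, but joint FALSEs are ignored.
jaccard <- n11 / (n11 + n10 + n01)

# Distance of the form sqrt(1 - s), matching the output above for method = 2L.
d_smc <- sqrt(1 - smc)

print(c(smc = smc, jaccard = jaccard, d_smc = d_smc))
```

Here smc is 0.5 (four of the eight positions agree), so d_smc is sqrt(0.5), about 0.7071068, which agrees with the dist.binary() result.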