2
votes

I have a data frame, tmp:

  class          x          y
1   A -2.8959969 -0.3192259
2   B -0.2401775  0.5801373

I compute dist(tmp, method="euclidean", diag=TRUE, upper=FALSE, p=2), which yields:

         1        2
1 0.000000         
2 3.434144 0.000000

I cannot figure out how this is the Euclidean distance. Computing what I think the Euclidean distance should be, I get:

((A_{x} - B_{x})^2 + (A_{y} - B_{y})^2 )^0.5 = 
((-2.8959969 + 0.2401775)^2 + (-0.3192259 - 0.5801373)^2)^0.5 = 
2.803967

This differs from what dist() returns.
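The hand calculation can be checked directly in R. This is only a minimal sketch; the vector names a and b and the variable d_manual are made up for illustration and do not appear in the original data frame.

```r
# Coordinates of the two rows, copied from the data frame in the question.
a <- c(x = -2.8959969, y = -0.3192259)  # row 1, class A
b <- c(x = -0.2401775, y =  0.5801373)  # row 2, class B

# Euclidean distance as defined in the docs: sqrt(sum((x_i - y_i)^2)).
d_manual <- sqrt(sum((a - b)^2))
print(d_manual)
```

Rounded to six decimals this should agree with the 2.803967 computed by hand, so the arithmetic itself looks right and the discrepancy has to come from how dist() handles the input.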

According to the docs:

Available distance measures are (written for two vectors x and y):

‘euclidean’: Usual distance between the two vectors (2 norm aka L_2), sqrt(sum((x_i - y_i)^2)).

Where am I going wrong?

1
try dist(tmp[-1]) - Sandipan Dey
dist(tmp[-1]) gets rid of my class column and works. In my case above, how is R treating the characters in the class column? - irritable_phd_syndrome
As the factor values, as.numeric(tmp$class). - A. Webb
This still strikes me as a bit mysterious. If A and B are being treated as the factor values then they would be integers, but if you look at the output and work backwards it seems that you would need (A-B)^2 = 3.93, which isn't consistent with A,B being converted to integer values. Whatever is happening, it isn't that. - John Coleman
Yeah, agreed A,B != 1,2 as you'd expect from factor values. - irritable_phd_syndrome

1 Answer

1
votes

Use dist(tmp[-1]), which drops the non-numeric class column.

Passing tmp directly seems to behave strangely. Is this something that should be reported as a bug?
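One possible explanation, offered as a sketch rather than a confirmed diagnosis: the help page for dist() says that missing values are allowed, that rows' NA entries are excluded from the calculation, and that for the Euclidean distance the sum is then scaled up in proportion to the number of columns used. Assuming the class column cannot be converted to a number and ends up as NA (an assumption, not something verified against the internals), two of three columns are used and the sum of squares is multiplied by 3/2. The code below models that NA column explicitly; the names m, d_numeric, d_with_na and d_rescaled are hypothetical.

```r
# Rebuild the two rows from the question.
tmp <- data.frame(class = c("A", "B"),
                  x = c(-2.8959969, -0.2401775),
                  y = c(-0.3192259,  0.5801373))

# Distance using only the numeric columns.
d_numeric <- as.numeric(dist(tmp[-1]))

# Assumption: the class column effectively becomes NA inside dist().
# Model that explicitly with a numeric matrix that has an all-NA column.
m <- cbind(class = NA_real_, as.matrix(tmp[-1]))
d_with_na <- as.numeric(dist(m))

# Documented NA handling: the sum of squares is scaled by
# (total columns / columns used) = 3/2 before taking the square root.
d_rescaled <- d_numeric * sqrt(3 / 2)

print(c(d_numeric, d_with_na, d_rescaled))
```

Under that assumption the rescaled value is 2.803967 * sqrt(3/2), which is about 3.434144 and matches the output in the question. It also accounts for the extra 3.93 in the sum of squares noted in the comments, since that is half of 7.862231, the sum over the two numeric columns. If this is what happens, the result is documented behaviour rather than a bug, though dropping the column with tmp[-1] remains the right fix.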