Euclidean distance in R using two variables in a matrix

Question

I am quite new to R and I am trying to compute the gross distance (the sum of the Euclidean distances between consecutive data points) from two variables in my matrix, and the net distance (the Euclidean distance between the first and last points of my data). Some background on my data: it is normally a csv file comprising 5 variables: track of cells (called A), time interval (T), X and Y position of each cell, and V = velocity. There are around 90 tracks per data set, and each track should be treated independently of the others.

dput(head(t1))
structure(list(A = c(0L, 0L, 0L, 0L, 0L, 0L), T = 0:5, X = c(668L, 
668L, 668L, 668L, 668L, 668L), Y = c(259L, 259L, 259L, 259L, 
259L, 259L), V = c(NA, 0, 0, 0, 0, 0)), .Names = c("A", "T", 
"X", "Y", "V"), row.names = c(NA, 6L), class = "data.frame")

I was not aware of the dist() function before, so I made my own function:

GD.data <- function (trackdata)
{A= trackdata(, 1); V=trackdata(, 5);
 for (i in min(A):max(A))
   while (A<=i) {GD(i) = (sum (V)*(1/25))
                 return (GD(i))}

This did not work. I used A as the identifier of the track, and since gross distance can also be computed as distance = velocity * (t1 - t0), I summed all the velocities and multiplied by my time interval (which is constantly 1/25 s).

How do I use the dist() function with my A as identifier? I need this since the computation of each track should be separate. Thanks!

Thanks! But he is computing from one driver (track) only. What about computing with many tracks in one matrix (in his case, many drivers)? Thanks — Kaye11
When you say it didn't work, what error messages did you get? I am assuming at least: argument "x" is missing, with no default? You have a dataframe (trackdata) and are trying to reference the columns using e.g. (,1). You need to use [, which is the subsetting function, i.e. trackdata[,1] will get you the first column. HTH! — Simon O'Hanlon
It would help a lot if you would post the output from dput(head(trackdata)) into your question as a code block, then we can recreate a small sample of your data on our computers. — Simon O'Hanlon
@SimonO101 When I run it on the console, it sort of waits for another command because the prompt arrow does not show up. So I don't know if it really does not work or if I ended it the wrong way. — Kaye11

Simon O'Hanlon · Accepted Answer · 2013-04-22T13:43:31

Since you have velocity measured at constant time intervals, you can sum it over each track (and multiply the sum by the time step, 1/25 s in your case) to get the total Euclidean distance moved. The base R function aggregate does the summing of the V data by each track identifier A, which is what the command below does:

aggregate( V ~ A , data = t1 , sum , na.rm = TRUE )

Basically this says, aggregate V for each value of A. The aggregation function is sum (you can imagine this could easily be the mean velocity for each track by using mean instead of sum). We pass an additional argument to sum which is na.rm, telling it to ignore NAs in the data (which I assume are at t = 0 for each track).
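
If V is a speed in distance units per second, the summed velocities still need to be multiplied by the sampling interval to become a distance. A minimal sketch, assuming the 1/25 s interval mentioned in the question; the `tracks` data frame and its values are made up for illustration and are not the asker's data:

```r
# Toy data: two tracks (A = 0 and A = 1); all values are invented
tracks <- data.frame(
  A = c(0, 0, 0, 1, 1, 1),
  T = c(0, 1, 2, 0, 1, 2),
  X = c(0, 3, 3, 10, 10, 13),
  Y = c(0, 4, 8, 10, 14, 18),
  V = c(NA, 125, 100, NA, 100, 125)  # step length divided by dt
)

dt <- 1 / 25  # assumed sampling interval in seconds (from the question)

# Sum V within each track, then scale by dt to turn it into a distance
gross <- aggregate(V ~ A, data = tracks,
                   FUN = function(v) sum(v, na.rm = TRUE) * dt)
names(gross)[2] <- "gross_distance"

print(gross)  # each toy track: 225 * (1/25) = 9 distance units
```

The formula interface of aggregate drops rows with NA in V by default, so the `na.rm = TRUE` inside the function is only a safeguard.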

Calculating 'as the crow flies' distance between first and last position by track:

For this we can split the dataframe into sub-dataframes by the track identifier A and then operate on each subset of the data, using lapply to apply a simple hypotenuse calculation to the first and last row of each sub-dataframe.

## Split the data
dfs <- split(t1,t1$A)

## Find hypotenuse between first and last rows for each A
lapply( dfs , function(x){
  j <- nrow(x)
  ## Names chosen to avoid masking the base functions str() and dist()
  first <- x[1,c("X","Y")]
  last <- x[j,c("X","Y")]
  net <- sqrt( sum( (last - first)^2 ) )
  return( net )
} )
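
As a cross-check that does not rely on V, both distances can also be computed directly from the X and Y columns. The sketch below assumes the rows of each track can be ordered by T; `path_summary` and the toy `tracks` data are hypothetical names and values, not part of the original answer:

```r
# Toy data: two tracks (A = 0 and A = 1); all values are invented
tracks <- data.frame(
  A = c(0, 0, 0, 1, 1, 1),
  T = c(0, 1, 2, 0, 1, 2),
  X = c(0, 3, 3, 10, 10, 13),
  Y = c(0, 4, 8, 10, 14, 18)
)

# Gross and net distance for a single track (hypothetical helper)
path_summary <- function(trk) {
  trk <- trk[order(trk$T), ]                    # put rows in time order
  step <- sqrt(diff(trk$X)^2 + diff(trk$Y)^2)   # length of each step
  n <- nrow(trk)
  c(gross = sum(step),                          # total path length
    net   = sqrt((trk$X[n] - trk$X[1])^2 + (trk$Y[n] - trk$Y[1])^2))
}

# One row per track, columns "gross" and "net"
result <- t(sapply(split(tracks, tracks$A), path_summary))
print(result)
```

Using sapply instead of lapply returns a matrix rather than a list, which is easier to write back to a csv file when there are around 90 tracks.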
