
I have a data frame (df) with three columns, like so (all numbers are random):

ID  Lat    Lon
1   25.32 -63.32
1   25.29 -64.21
1   24.12 -62.43
2   12.42  54.64
2   12.11  53.43
.   ....   ....

Basically, I want the centroid for each ID, like so:

ID  Lat    Lon    Cent_lat   Cent_lon
1   25.32 -63.32  25.31      -63.25
1   25.29 -64.21  25.31      -63.25
1   24.12 -62.43  25.31      -63.25
2   12.42  54.64  12.20       53.60
2   12.11  53.43  12.20       53.60

I tried the following:

library(geosphere)
library(rgeos)
library(dplyr)

df1 <- by(df,df$ID,centroid(df$Lat, df$Long))

But this gave me this error:

Error in (function (classes, fdef, mtable): unable to find an inherited method for function ‘centroid’ for signature ‘"numeric"’

I even tried

df1 <- by(df,df$ID,centroid(as.numeric(df$Lat), as.numeric(df$Long)))

But this gave me this error:

Error in (function (classes, fdef, mtable) : unable to find an inherited method for function ‘centroid’ for signature ‘"function"’

Isn't the centroid of three points the average of their components (mean(long), mean(lat))? – lmo
We have more than three points in most cases, and the average method would only work if the Earth were flat :-) – Anubhav Dikshit
To use centroid you need a polygon as a matrix object, or a data frame with appropriate row names for each point. – Robert
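On the flat-earth point raised in the comments: if what you actually want is a mean centre of the points, rather than the area centroid of a polygon (which is what geosphere's centroid computes), one standard approach is to average the points as 3D unit vectors and convert the mean vector back to longitude/latitude. The base-R sketch below uses a hypothetical helper, sphere_mean(), which is not part of geosphere; the pair of points straddling the antimeridian shows where the plain mean of longitudes goes wrong.

```r
# Hypothetical helper (not a geosphere function): mean centre of lon/lat points
# computed on the unit sphere instead of on the flat lon/lat plane
sphere_mean <- function(lon, lat) {
  # Degrees -> radians
  lam <- lon * pi / 180
  phi <- lat * pi / 180
  # Turn each point into a 3D unit vector and average the components
  x <- mean(cos(phi) * cos(lam))
  y <- mean(cos(phi) * sin(lam))
  z <- mean(sin(phi))
  # Convert the mean vector back to longitude/latitude in degrees
  c(lon = atan2(y, x) * 180 / pi,
    lat = atan2(z, sqrt(x^2 + y^2)) * 180 / pi)
}

# Two points on either side of the antimeridian
lon <- c(179, -179)
lat <- c(0, 0)
c(lon = mean(lon), lat = mean(lat))  # naive mean: lon 0, on the far side of the globe
sphere_mean(lon, lat)                # lon 180, i.e. on the antimeridian between them
```

Note that this averages points; it does not weight by area, so for an irregular polygon it will generally differ from centroid().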

5 Answers

3
votes
library(geosphere)
library(ggplot2)   # map_data() also needs the maps package installed
library(dplyr)

states <- map_data("state")

head(states)
##        long      lat group order  region subregion
## 1 -87.46201 30.38968     1     1 alabama      <NA>
## 2 -87.48493 30.37249     1     2 alabama      <NA>
## 3 -87.52503 30.37249     1     3 alabama      <NA>
## 4 -87.53076 30.33239     1     4 alabama      <NA>
## 5 -87.57087 30.32665     1     5 alabama      <NA>
## 6 -87.58806 30.32665     1     6 alabama      <NA>

# centroid() expects a 2-column matrix with longitude first, then latitude
cntrd <- function(x) {
  data.frame(centroid(as.matrix(x[, c("long", "lat")])))
}

by(states, states$group, cntrd) %>% head()
## $`1`
##         lon      lat
## 1 -86.82976 32.82735
## 
## $`2`
##         lon      lat
## 1 -111.6698 34.34309
## 
## $`3`
##         lon      lat
## 1 -92.43826 34.92167
## 
## $`4`
##         lon      lat
## 1 -119.6713 37.40289
## 
## $`5`
##         lon      lat
## 1 -105.5526 39.02653
## 
## $`6`
##         lon      lat
## 1 -72.72553 41.62706

group_by(states, group) %>%
  do(cntrd(.))
## Source: local data frame [63 x 3]
## Groups: group [63]
## 
##    group        lon      lat
##    <dbl>      <dbl>    <dbl>
## 1      1  -86.82976 32.82735
## 2      2 -111.66978 34.34309
## 3      3  -92.43826 34.92167
## 4      4 -119.67130 37.40289
## 5      5 -105.55264 39.02653
## 6      6  -72.72553 41.62706
## 7      7  -75.51543 39.00879
## 8      8  -77.03411 38.91083
## 9      9  -82.51260 28.69498
## 10    10  -83.46361 32.67562
## # ... with 53 more rows
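The by() and group_by()/do() calls above return one row per group, whereas the question asks for the centroid repeated on every row. A minimal base-R sketch of attaching a per-group table back to the rows is a merge() on the ID column. To keep the sketch self-contained, the centroid values are typed in by hand: they are copied from what the data.table answer further down prints for the question's sample data. In practice, cents would come from one of the calls above, with its group column renamed to ID.

```r
# The question's sample data (as used in the data.table answer)
df <- data.frame(ID  = c(1, 1, 1, 2, 2, 2),
                 Lat = c(25.32, 25.29, 24.12, 12.42, 12.11, 12.22),
                 Lon = c(-63.32, -64.21, -62.43, 54.64, 53.43, 53.23))

# One row per group; values copied from the data.table answer's printed output
cents <- data.frame(ID       = c(1, 2),
                    Cent_lon = c(-63.32000, 53.76667),
                    Cent_lat = c(24.91126, 12.25003))

# Attach each group's centroid to every row with that ID
out <- merge(df, cents, by = "ID", sort = FALSE)
out
```

dplyr users can do the same with left_join(df, cents, by = "ID").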
3
votes

To use centroid you need polygons with longitude and latitude, in that order. See this example:

library(geosphere)

# Two closed polygons as (lon, lat) vertex rows: 5 rows for ID 1, 6 rows for ID 2
df <- rbind(c(-180, -20), c(-160, 5), c(-60, 0), c(-160, -60), c(-180, -20),
            c(-100, -50), c(-160, -60), c(-180, -10), c(-160, 10), c(-60, 0), c(-100, -50))
df <- data.frame(ID = rep(c(1, 2), times = c(5, 6)), Lon = df[, 1], Lat = df[, 2])
df1 <- by(df[, c("Lon", "Lat")], df$ID, centroid)  # one 1 x 2 (lon, lat) matrix per ID
df1
# Copy each group's centroid onto every row of that group
df[, c("Cent_lon", "Cent_lat")] <- NA
for (i in names(df1)) {
  df[df$ID == i, "Cent_lon"] <- df1[[i]][1, 1]
  df[df$ID == i, "Cent_lat"] <- df1[[i]][1, 2]
}
df

Each row now carries its own group's centroid (rounded to five decimals): the ID 1 rows get Cent_lon -133.33333 and Cent_lat -23.89340, and the ID 2 rows get Cent_lon -127.66065 and Cent_lat -26.10686.

You can use plotArrows (also from geosphere) to see the polygon:

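# Split the (Lon, Lat) columns into one data frame per ID
pol <- split(df[, 2:3], df$ID)
# plotArrows() needs a matrix, so convert before drawing polygon 1
plotArrows(as.matrix(pol[[1]]))
# Overlay the centroid of polygon 1 in blue
points(df1[[1]], col = 4)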


3
votes

Here's a data.table approach. As @czeinerb mentioned, Lon is the first column of the centroid function's input and Lat is the second. We wrap the centroid function below so that, in the data.table aggregation, it receives a matrix with 2 columns (Lon|Lat), which is the input geosphere's centroid function requires.

# Import packages
library(geosphere)
library(data.table) # Using a data.table approach

# Sample data
df = data.frame("ID" = c(1, 1, 1, 2, 2, 2), "Lat" = c(25.32, 25.29, 24.12, 12.42, 12.11, 12.22), "Lon" = c(-63.32, -64.21, -62.43, 54.64, 53.43, 53.23))

df

  ID   Lat    Lon
1  1 25.32 -63.32
2  1 25.29 -64.21
3  1 24.12 -62.43
4  2 12.42  54.64
5  2 12.11  53.43
6  2 12.22  53.23

# Convert to data.table
setDT(df)

# Re-define centroid function - Lon is first argument and Lat is second
# Geosphere takes a matrix with two columns: Lon|Lat, so we use cbind to coerce the data to this form
findCentroid <- function(Lon, Lat, ...){
  centroid(cbind(Lon, Lat), ...)
}

# Find centroid Lon and Lat by ID, as required
df[, c("Cent_lon", "Cent_lat") := as.list(findCentroid(Lon, Lat)), by = ID]
df

   ID   Lat    Lon  Cent_lon Cent_lat
1:  1 25.32 -63.32 -63.32000 24.91126
2:  1 25.29 -64.21 -63.32000 24.91126
3:  1 24.12 -62.43 -63.32000 24.91126
4:  2 12.42  54.64  53.76667 12.25003
5:  2 12.11  53.43  53.76667 12.25003
6:  2 12.22  53.23  53.76667 12.25003
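Why does centroid() want an ordered polygon rather than a loose set of points? As I understand the geosphere documentation, it does not use spherical trigonometry here: it projects the polygon to the Mercator coordinate reference system, computes the ordinary planar polygon (shoelace) centroid, and projects the result back to longitude/latitude. The base-R sketch below approximates that with a hypothetical helper, mercator_centroid(), using a spherical Mercator on a unit sphere (an assumption; geosphere's internals may differ in detail). For ID 1 it agrees with the output above: lon -63.32 and lat about 24.91126.

```r
# Hypothetical helper (not a geosphere function): centroid of a lon/lat polygon
# computed on a spherical Mercator projection of the unit sphere
mercator_centroid <- function(lon, lat) {
  # Forward Mercator: x is longitude in radians, y = ln(tan(pi/4 + phi/2))
  x <- lon * pi / 180
  y <- log(tan(pi / 4 + lat * pi / 360))
  # Edge i runs from vertex i to vertex i + 1 and the last vertex wraps to the
  # first, so repeating the first vertex at the end does not change the result
  x1 <- c(x[-1], x[1])
  y1 <- c(y[-1], y[1])
  cross <- x * y1 - x1 * y
  a <- sum(cross) / 2                    # signed planar (shoelace) area
  cx <- sum((x + x1) * cross) / (6 * a)  # planar polygon centroid
  cy <- sum((y + y1) * cross) / (6 * a)
  # Inverse Mercator back to degrees
  c(lon = cx * 180 / pi,
    lat = (2 * atan(exp(cy)) - pi / 2) * 180 / pi)
}

# ID 1 from the sample data above (three vertices, so the ring is a triangle)
mercator_centroid(lon = c(-63.32, -64.21, -62.43), lat = c(25.32, 25.29, 24.12))
```

For a triangle the planar centroid is the mean of the projected vertices, so the longitude equals the plain mean exactly, while the latitude comes out slightly above the plain mean (24.91) because of the Mercator stretch. Because the formula divides by the signed area, it needs the vertices in ring order and fails for rings whose signed area is zero.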
1
votes

The centroid function in the geosphere package takes a matrix as its data argument; the "Arguments" section of its documentation reads: "x a 2-column matrix (longitude/latitude)"

https://cran.r-project.org/web/packages/geosphere/geosphere.pdf

Also, longitude is the first and latitude is the second column, not the other way around :)

So the code in your case could look like this:

library(geosphere)

df <- data.frame(ID  = c(1, 1, 1, 2, 2, 2, 2),
                 Lon = c(-63.32, -64.43, -62.43, 54.64, 53.43, 54.64, 53.43),
                 Lat = c(25.32, 25.29, 24.12, 12.42, 12.11, 11.11, 10.55))
mx <- as.matrix(df)

# Columns 2:3 are (Lon, Lat); column 1 is the grouping ID
(mx1 <- by(mx[, 2:3], mx[, 1], centroid))

With the output:

INDICES: 1
           lon      lat
[1,] -63.39333 24.91126
-----------------------------------------------------------------
INDICES: 2
     lon lat
[1,] Inf  90
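The Inf/90 result for ID 2 has a plausible explanation. In the order given, every edge of that ring joins longitude 54.64 to longitude 53.43, so the ring is a self-intersecting "bow tie". Writing the shoelace area as the sum of (x_i + x_{i+1})(y_{i+1} - y_i) / 2, every term has the same x-sum, and the y-differences around a closed ring add up to zero, so the signed area is exactly zero. This also holds after a Mercator projection, which leaves x proportional to longitude. A centroid formula that divides by that area cannot return a finite value. The base-R sketch below uses a hypothetical helper, shoelace_area(), in plain degrees, and shows that sorting the vertices by angle around their mean gives a proper, non-crossing ring.

```r
# Hypothetical helper: planar shoelace formula for the signed area of a ring.
# Edge i runs from vertex i to vertex i + 1; the last vertex wraps to the first.
shoelace_area <- function(x, y) {
  nxt <- c(seq_along(x)[-1], 1)
  sum(x * y[nxt] - x[nxt] * y) / 2
}

# ID 2 from the example above, in the order given
lon <- c(54.64, 53.43, 54.64, 53.43)
lat <- c(12.42, 12.11, 11.11, 10.55)
shoelace_area(lon, lat)             # essentially 0: the two lobes cancel

# Sort the vertices by angle around their mean to get a simple (non-crossing) ring
ord <- order(atan2(lat - mean(lat), lon - mean(lon)))
shoelace_area(lon[ord], lat[ord])   # about 1.736 (planar square degrees)
```

Sorting by angle is only a heuristic that works for roughly convex point sets. If your points are not vertices of a meaningful polygon, a mean of the points (such as a spherical mean) may be the better target.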
0
votes

?centroid says that it only takes a 2-column matrix as its argument. The ID column you have makes the matrix three columns wide.

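library(geosphere)
# NB: these rows are (lat, lon), whereas centroid() expects (lon, lat); the points of both IDs are also pooled into one polygon
df <- rbind(c(25.32, -63.32), c(25.29, -64.32), c(24.12, -62.43), c(12.42, 54.64), c(12.11, 53.43))
centroid(df)

          lon       lat
[1,] 24.27109 -60.37098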