2
votes

My current data frame in R has only 2 columns, namely longitude and latitude. There are around 1500 records (rows) and they include lots of duplicates.

An extract of the data frame is shown below:

longitude   latitude
57.408999   -20.208104
57.667991   -20.13641
57.539122   -20.103416
57.502332   -20.124798
57.414653   -20.261872
57.65949    -20.126768
57.468383   -20.223031
57.754464   -20.25823
57.754464   -20.25823
57.680745   -20.121893
57.65949    -20.179457
57.669408   -20.177538
57.702715   -20.211515

I want to convert this data frame into the following format:

longitude   latitude    emp emp2
57.408999   -20.208104  1   0.1
57.667991   -20.13641   11  1.1
57.539122   -20.103416  16  1.6
57.502332   -20.124798  10  1
57.414653   -20.261872  1   0.1
57.65949    -20.126768  2   0.2
57.468383   -20.223031  17  1.7
57.754464   -20.25823   9   0.9
57.754464   -20.25823   13  1.3
57.680745   -20.121893  13  1.3
57.65949    -20.179457  4   0.4
57.669408   -20.177538  3   0.3
57.702715   -20.211515  1   0.1

emp will be a new column which is the frequency of each longitude and latitude. Thus my data frame will now only have unique longitude and latitude with their respective counts.

emp2 is simply the value of emp divided by 10

Can this be done with R? If yes, any help would be highly appreciated.

Since I am new to R, I am confused as to where to start to solve the issue.

2

2 Answers

2
votes

An easy way with dplyr would be

library(dplyr)
df %>%
  group_by(longitude, latitude) %>%
  summarise(emp = n(), 
            emp2 = emp/10)
1
votes

Alternative base R solution using aggregate.

attach(df)
df <- aggregate(df, by=list(longitude, latitude), FUN=length)
colnames(df) <- c('longitude', 'latitude', 'emp', 'emp2')
df$emp2 <- df$emp2 / 10