I am developing a workflow processing script for working with sf objects in R. sf is the simple features class of objects, which provides a means of dealing with spatial data in the tidyverse. However, I am having serious difficulty running standard group_by() %>% summarize() %>% mutate() pipelines on data stored as sf. The pipeline works after the object is converted to a data frame, but not on the sf object itself.
Essentially, I am trying to group lower-level geographies by higher-level geographies and output summary variables. I then need to mutate a variable in the new summarized sf object that computes a sum across multiple variables and divides it by another variable. With sf objects this last operation throws the error "x 'x' must be numeric", but the identical operation works on a data frame of the same data (just without the geometry). I have also verified that every column I intend to pass to the rowSums function is numeric.
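For context, one thing worth checking is what select() actually returns when it is called on an sf object. The sketch below uses a tiny made-up sf object (the name toy and its columns a and b are hypothetical, not part of the nc data) and assumes the sf and dplyr packages are installed. As far as I understand, the geometry column in sf is "sticky", so it stays in the result of select() even when it was not requested, and rowSums() would then receive a non-numeric column.

```r
library(sf)
library(dplyr)

# Tiny made-up sf object (hypothetical data, not the nc dataset)
toy <- st_sf(
  a = c(1, 2),
  b = c(3, 4),
  geometry = st_sfc(st_point(c(0, 0)), st_point(c(1, 1)))
)

# select() on an sf object: inspect which columns come back
picked <- dplyr::select(toy, a:b)
names(picked)
sapply(picked, function(col) class(col)[1])

# Dropping the geometry first leaves only the requested numeric columns
plain <- dplyr::select(st_drop_geometry(toy), a:b)
names(plain)
rowSums(plain)
```

If names(picked) includes the geometry column, that would explain why class() on the individual columns reports "numeric" while rowSums() still complains.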
Full reprex below. In the first example, the operation fails on the sf version of the sample data. In the second example, with as.data.frame() called before separate(), the process succeeds, but this drops the geometries, which are crucial for my analysis.
Thanks, all!
library(sf)
#> Warning: package 'sf' was built under R version 4.0.2
#> Linking to GEOS 3.8.1, GDAL 3.1.1, PROJ 6.3.1
library(tidyverse)
#> Warning: package 'ggplot2' was built under R version 4.0.2
#> Warning: package 'tibble' was built under R version 4.0.2
#> Warning: package 'tidyr' was built under R version 4.0.2
#> Warning: package 'dplyr' was built under R version 4.0.2
library(dplyr)
library(spdep)
#> Loading required package: sp
#> Loading required package: spData
#> To access larger datasets in this package, install the spDataLarge
#> package with: `install.packages('spDataLarge',
#> repos='https://nowosad.github.io/drat/', type='source')`
library(stringi)
#> Warning: package 'stringi' was built under R version 4.0.2
nc <- st_read(system.file("shapes/sids.shp", package="spData")[1], quiet=TRUE)
st_crs(nc) <- "+proj=longlat +datum=NAD27"
row.names(nc) <- as.character(nc$FIPSNO)
names(nc)
#> [1] "CNTY_ID" "AREA" "PERIMETER" "CNTY_" "NAME" "FIPS"
#> [7] "FIPSNO" "CRESS_ID" "BIR74" "SID74" "NWBIR74" "BIR79"
#> [13] "SID79" "NWBIR79" "east" "north" "x" "y"
#> [19] "lon" "lat" "L_id" "M_id" "geometry"
nc %>%
  separate(CNTY_ID, into = c("ID1", "ID2"), sep = 2, remove = FALSE) %>%
  group_by(ID1) %>%
  dplyr::summarize(AREA = sum(AREA, na.rm = TRUE),
                   BIR74 = sum(BIR74, na.rm = TRUE),
                   SID74 = sum(SID74, na.rm = TRUE),
                   NWBIR74 = sum(NWBIR74, na.rm = TRUE)
  ) %>%
  mutate(stupid_var = rowSums(dplyr::select(., 'SID74':'NWBIR74')) / BIR74)
#> `summarise()` ungrouping output (override with `.groups` argument)
#> Error: Problem with `mutate()` input `stupid_var`.
#> x 'x' must be numeric
#> ℹ Input `stupid_var` is `rowSums(dplyr::select(., "SID74":"NWBIR74"))/BIR74`.
class(nc$SID74)
#> [1] "numeric"
class(nc$NWBIR74)
#> [1] "numeric"
class(nc$BIR74)
#> [1] "numeric"
nc %>%
  as.data.frame() %>%
  separate(CNTY_ID, into = c("ID1", "ID2"), sep = 2, remove = FALSE) %>%
  group_by(ID1) %>%
  dplyr::summarize(AREA = sum(AREA, na.rm = TRUE),
                   BIR74 = sum(BIR74, na.rm = TRUE),
                   SID74 = sum(SID74, na.rm = TRUE),
                   NWBIR74 = sum(NWBIR74, na.rm = TRUE)
  ) %>%
  mutate(stupid_var = rowSums(dplyr::select(., 'SID74':'NWBIR74')) / BIR74)
#> `summarise()` ungrouping output (override with `.groups` argument)
#> # A tibble: 5 x 6
#>   ID1    AREA  BIR74 SID74 NWBIR74 stupid_var
#>   <chr> <dbl>  <dbl> <dbl>   <dbl>      <dbl>
#> 1 18    2.53   36723    89   12788      0.351
#> 2 19    4.03  132525   203   38392      0.291
#> 3 20    3.94  111540   237   35281      0.318
#> 4 21    1.63   38117   106   14915      0.394
#> 5 22    0.494  11057    32    3723      0.340
Created on 2020-09-21 by the reprex package (v0.3.0)
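A possible workaround, offered as a sketch rather than a confirmed fix: drop the geometry only inside the rowSums() call, so that the object flowing through the pipe remains an sf. The example uses a small made-up sf object (toy, grp, births, sid, nwb and ratio are hypothetical names standing in for the nc columns) and assumes sf and dplyr are installed.

```r
library(sf)
library(dplyr)

# Made-up point data standing in for the lower-level geographies
toy <- st_sf(
  grp = c("x", "x", "y"),
  births = c(10, 20, 30),
  sid = c(1, 2, 3),
  nwb = c(4, 5, 6),
  geometry = st_sfc(st_point(c(0, 0)), st_point(c(1, 1)), st_point(c(2, 2)))
)

result <- toy %>%
  group_by(grp) %>%
  summarize(births = sum(births), sid = sum(sid), nwb = sum(nwb)) %>%
  # st_drop_geometry() is applied only to the copy handed to select(),
  # so rowSums() sees numeric columns while the piped object stays sf
  mutate(ratio = rowSums(dplyr::select(st_drop_geometry(.), sid:nwb)) / births)

class(result)
result$ratio
```

When only a couple of columns are involved, writing the sum out directly, for example mutate(ratio = (sid + nwb) / births), avoids the select() call altogether.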