4
votes

I have a list of points that I want to overlay on a map of San Francisco using ggplot2. Each point is a longitude, latitude pair. I want the resulting map to be in a longitude/latitude coordinate system. I managed to reproduce Hadley Wickham's directions for plotting polygon shapefiles using his example file. I am using R 2.15.1 for Windows.

However, I tried to use cdp files downloaded from the UScensus2010cdp package. Here's my code snippet:

require("rgdal") 
require("maptools")
require("ggplot2")
require("sp")
require("plyr")
gpclibPermit() # required for fortify method
require(UScensus2010)
require(UScensus2010cdp)
data(california.cdp10)
sf <- city(name = "san francisco", state="ca")
sf.points = fortify(sf)

I get the following error:

Using name to define regions.
Error in unionSpatialPolygons(cp, invert(polys)) : input lengths differ
In addition: Warning message:
In split(as.numeric(row.names(attr)), addNA(attr[, region], TRUE)) :
   NAs introduced by coercion

Does anybody know:

  1. What is a good value to give to the region parameter of fortify()?
  2. If that fails, a source of map data with untransformed lat/long coordinates for San Francisco that ggplot2 can draw?
  3. Alternatively, I found here another map of San Francisco, whose data is translated. Can you tell me how to either translate this data to raw lat/long or make the reverse translation for my set of points?
1
the link within the link to UScensus2010cdp package appears to be brokenmnel
Thanks, I edited this to point to the link you gave.Yuval F

1 Answers

7
votes

note:

The issue

The issue arises from the fact that fortify.SpatialPolygonsDataFrame relies on converting the row.names to numeric, and the rownames of your data are the identifiers.

ggplot2:::fortify.SpatialPolygonsDataFrame 

function (model, data, region = NULL, ...) 
{
    attr <- as.data.frame(model)
    if (is.null(region)) {
        region <- names(attr)[1]
        message("Using ", region, " to define regions.")
    }
    polys <- split(as.numeric(row.names(attr)), addNA(attr[, 
        region], TRUE))
    cp <- polygons(model)
    try_require(c("gpclib", "maptools"))
    unioned <- unionSpatialPolygons(cp, invert(polys))
    coords <- fortify(unioned)
    coords$order <- 1:nrow(coords)
    coords
}

In your case

row.names(sf@data)
## [1] "california_586" "california_590" "california_616"

are the identifiers you wish to use as the region parameters, as place state and name do not uniquely identify the three polygons.

# as.character used to coerce from factor
lapply(lapply(sf@data[,c('place','state','name')], unique), as.character)
## $place
## [1] "67000"
## 
## $state
## [1] "06"
## 
## $name
## [1] "San Francisco"

As a character vector where the elements begin with alphabetic characters, when coerced to numeric, it becomes NA

as.numeric(rownames(sf@data))
## [1] NA NA NA
## Warning message:
## NAs introduced by coercion

Which is one of the warnings given

Solution

  1. Define a column to be the rownames
  2. Set the row.names to NULL or 1:nrow(sf@data)

So..

# rownames
sf@data[['place_id']] <- rownames(sf@data)
row.names(sf@data) <- NULL

# fortify
sf_ggplot <- fortify(sf, region = 'place_id')
# merge to add the original data
sf_ggplot_all <- merge(sf_ggplot, sf@data, by.x = 'id', by.y = 'place_id')
# very basic and uninteresting plot
ggplot(sf_ggplot_all,aes(x=long,y=lat, group = group)) + 
  geom_polygon(aes(fill =pop2000)) + 
  coord_map()

enter image description here