(Apologies for the bad title. English is not my native language and I couldn't think of a good way to summarise the question.)
I have a dataset of various US county variables and a shapefile of US counties. I've merged the two without any problem, and now I'm trying to map a variable across the counties of one particular state. But when I filter the merged data down to that state, the result contains not only the counties in that state but also every county in another state that shares a name with one of them. I don't understand why this happens. As far as I can tell, the filter should select only the counties in the specified state.
I'm using the sf, tmap, tmaptools, dplyr, ggplot2, and leaflet packages. Here's the code I'm using:
library(sf)
library(dplyr)
library(tmap)

mydata <- readr::read_csv("county_facts.csv")
mymap <- st_read("cb_2014_us_county_500k.shp")
map_and_data <- inner_join(mymap, mydata, by = c("NAME" = "area_name"))
tm_shape(map_and_data[map_and_data$state_abbreviation == "SC", ]) +
  tm_polygons("AGE135214", id = "NAME", palette = "Greens")
Here AGE135214 is the variable I'm plotting, and the county names are in the column "NAME" in the shapefile and "area_name" in the data set. In this example I'm trying to plot South Carolina. I attempted a workaround by filtering the data to South Carolina before merging it with the shapefile:
map_and_data2 <- inner_join(mymap, mydata[mydata$state_abbreviation == "SC", ], by = c("NAME" = "area_name"))
But the new merged data frame still included the erroneous namesake counties from other states.
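To narrow it down, here is a toy version that seems to show the same behaviour. The rows are made up for illustration (they are not from the real files), and I'm assuming the shapefile carries a state FIPS column such as STATEFP. It uses the same join on county name and the same filter as above:

```r
library(dplyr)

# Made-up stand-ins for the shapefile attributes and the CSV (NOT the real data)
toy_map <- data.frame(
  NAME    = c("Union", "Union", "Aiken"),
  STATEFP = c("45", "37", "45")  # assumed FIPS codes: 45 = South Carolina, 37 = North Carolina
)
toy_data <- data.frame(
  area_name          = c("Union", "Union", "Aiken"),
  state_abbreviation = c("SC", "NC", "SC"),
  AGE135214          = c(5.1, 6.2, 5.8)  # invented values
)

# Same join as above: rows are matched on the county name only
toy_joined <- inner_join(toy_map, toy_data, by = c("NAME" = "area_name"))

# Same filter as above: keep rows whose data-side state is SC
toy_sc <- toy_joined[toy_joined$state_abbreviation == "SC", ]
print(toy_sc)
```

In this toy case the filtered result still has a row for the Union polygon with STATEFP 37, carrying the SC values, which looks like what I'm seeing with the real data.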
I'm new to programming, so apologies if there is a super obvious solution. Any help is greatly appreciated!
The data and shapefiles are from https://www.kaggle.com/benhamner/2016-us-election, if that helps.