0
votes

I'm trying to filter a dataframe by country and legislative election, in one step that is replicable for multiple different political party families.

What I did so far was to filter the main dataset by party family (parfam == '10') and "recent" elections (date > '201000'), while excluding countries with no relevant data (! country %in% nodata, nodata being a list of values I'd already created):

eco <- filter(CMPdataset, parfam == '10' & date > '201000' & ! country %in% nodata)

Due to some countries having multiple elections coded into the overarching dataset CMPdataset in the time-period after 2010, I went through the data manually and eliminated all the unnecessary ones by hand using:

eco <- eco[-c(1,8,10,11,13,14,18,20,21,22,23,27,28,31,32,34,35,37), ]

As you can see, this can be quite tedious for larger dataframes, though. So I thought I'd combine the formulae I know and came up with the following (edate is a variable with the specific election date in the format YYYY-MM-DD, I made a list of all the specific elections I include under the name included_elections):

eco2 <- filter(CMPdataset, parfam == '10' & ! country %in% nodata & edate %in% included_elections)

However, this yields no results, and I have no clue why! I could just stick to doing it all by hand, but it's quite tedious and not easily replicable, which is why I'd really prefer a solution like this. Any help would be greatly appreciated!

2
Can you show a small reproducible example and the expected output based on that? - akrun
Can you provide, at a minimum, dput(head(CMPdataset$edate)) and dput(head(included_elections))? The dates might be encoded differently. - Frank
@Frank > dput(head(CMPdataset$edate)) structure(c(-9237, -9237, -9237, -9237, -9237, -7774), class = "Date") > dput(head(included_elections2)) c("2014-09-14", "2013-09-09", "2011-09-15", "2011-04-17", "2013-04-27", "2010-06-13") - luca_s
The reason for your immediate error is that you need to convert included_elections to date format, included_elections <- as.Date(included_elections). But @iod's approach is a better long-term solution. - Frank
@Frank thanks for the tip, this worked out well! Doesn't look like I can accept a comment as a "correct answer", though :( iod's approach would work, but in some cases I need the second-to-last election, which is why such a general approach unfortunately doesn't work in my specific case. - luca_s

2 Answers

0
votes

Thanks for providing the dput output. The reason for your immediate error is that you need to convert included_elections to date format:

included_elections <- as.Date(included_elections)

That said, something more systematic that incorporates the conditions you want (for example, when you want the last election and when you want the second-last), along the lines of @iod's approach, is a better long-term solution.
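As a minimal sketch of the class mismatch and the fix: the vectors below are toy stand-ins for CMPdataset$edate and included_elections (the values are illustrative, not taken from the real dataset). Per the dput output, edate is stored as class Date while included_elections is a character vector, so the sketch converts the latter once and then compares like with like.

```r
# Toy stand-ins for the question's objects (illustrative values only).
edate <- as.Date(c("2014-09-14", "2012-05-06", "2013-09-09"))
included_elections <- c("2014-09-14", "2013-09-09")  # character, as in the dput output

class(edate)               # "Date"
class(included_elections)  # "character"

# Convert once so both sides of %in% have the same class.
# as.Date() parses "YYYY-MM-DD" strings without needing a format argument.
included_elections <- as.Date(included_elections)

# Logical vector marking the rows whose election date is in the list.
keep <- edate %in% included_elections
```

With both sides stored as Date, the same edate %in% included_elections condition can be used inside filter() as in the question.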

0
votes
CMPdataset %>% group_by(country) %>% 
filter(parfam == '10', !country %in% nodata, date == max(date), date > 201000)

date==max(date) will filter the data frame so that within each group (i.e., country), only the rows for the latest election are kept. (Also, there is no need for & between the conditions: filter() joins comma-separated conditions with & by default.)
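For the case raised in the comments, where some countries need the second-to-last election instead of the latest, one possible variation is sketched below. It assumes dplyr is installed; the toy CMPdataset, the lookup table pick, and the columns nth_latest and rank_from_latest are hypothetical names introduced only for illustration. It also filters before grouping, so the ranking is computed only among the rows of interest.

```r
library(dplyr)

# Toy stand-in for CMPdataset; column names follow the question, values are made up.
CMPdataset <- tibble(
  country = c(11, 11, 11, 12, 12, 12),
  parfam  = c("10", "10", "10", "10", "10", "20"),
  edate   = as.Date(c("2010-09-19", "2014-09-14", "2018-09-09",
                      "2011-09-15", "2015-06-18", "2015-06-18"))
)

# Hypothetical lookup: which election to keep per country,
# counted back from the most recent (1 = latest, 2 = second-to-last).
pick <- tibble(country = c(11, 12), nth_latest = c(2L, 1L))

eco <- CMPdataset %>%
  filter(parfam == "10", edate > as.Date("2010-01-01")) %>%  # subset first
  group_by(country) %>%
  mutate(rank_from_latest = dense_rank(desc(edate))) %>%     # 1 = latest election
  ungroup() %>%
  inner_join(pick, by = "country") %>%
  filter(rank_from_latest == nth_latest) %>%
  select(country, parfam, edate)
```

Changing the party family then only requires changing the parfam value, and exceptions are recorded in the lookup table instead of as hand-picked row numbers.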