0
votes

I have a column of dates from which I am trying to create a list of years for each row. For example, here are a few rows of my data:

1997-2001
1994
2007-2009; 2013-2015; 2016
2007-2008; 2014

For the first row I want a list containing 1997, 1998, 1999, 2000, and 2001. For the second row I want a list containing just 1994. For the third row I want a list containing 2007, 2008, 2009, 2013, 2014, 2015, and 2016, and so on. Is there a way to do this?

2
What data type does your column hold: character, factor, or something else? Could you post the dput() output of your column here? - 989
Here are some solutions, though I don't know which one is best: r.789695.n4.nabble.com/… - leekaiinthesky

2 Answers

3
votes

It's ugly, but it gets the job done:

lapply(strsplit(df$date, '\\s*;\\s*'), function(x) {
  unlist(lapply(strsplit(x, '-'), function(y) {
    z <- as.integer(y)
    if (length(z) == 1L) z else z[1L]:z[2L]
  }))
})
## [[1]]
## [1] 1997 1998 1999 2000 2001
##
## [[2]]
## [1] 1994
##
## [[3]]
## [1] 2007 2008 2009 2013 2014 2015 2016
##
## [[4]]
## [1] 2007 2008 2014
##

Data

df <- data.frame(date = c('1997-2001', '1994', '2007-2009; 2013-2015; 2016', '2007-2008; 2014'),
                 stringsAsFactors = FALSE)

Note: If your input vector is a factor, as opposed to a character vector, then you'll have to wrap it in as.character() before passing it to the initial strsplit() call.
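As a minimal sketch of that note, the same approach can be wrapped in a function that coerces its input first, so it should work for both factors and character vectors. The name expand_years is hypothetical and not part of the answer above; seq() is used in place of `:` only to make the range expansion explicit.

```r
# Hypothetical helper: expand strings such as "2007-2009; 2013" into integer years.
expand_years <- function(x) {
  # as.character() makes factor input safe for strsplit()
  parts <- strsplit(as.character(x), "\\s*;\\s*")
  lapply(parts, function(p) {
    unlist(lapply(strsplit(p, "-"), function(y) {
      z <- as.integer(y)
      # a single year stays as is; a "start-end" pair becomes the full range
      if (length(z) == 1L) z else seq(z[1L], z[2L])
    }))
  })
}

# Example input stored as a factor (assumed here for illustration)
dates <- factor(c("1997-2001", "1994", "2007-2009; 2013-2015; 2016", "2007-2008; 2014"))
years <- expand_years(dates)
```

Here years is a list with one integer vector per input row, in the same order as dates.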

1
votes

bgoldst's answer resolved the problem, but here's another way you could do it.

You can use gsub to convert your semicolons to commas and dashes to colons like so (where df is the data frame and x is the column containing the data):

df$x<-gsub("-",":",df$x)
df$x<-gsub(";",",",df$x)

which would give you:

1997:2001
1994
2007:2009, 2013:2015, 2016
2007:2008, 2014

Then use a for-loop to evaluate all those expressions:

years <- vector("list", nrow(df))
for (i in seq_len(nrow(df))) {
  # build e.g. "c(2007:2009, 2013:2015, 2016)" and evaluate it
  years[[i]] <- eval(parse(text = paste0("c(", df$x[i], ")")))
}

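As above, if your input is a factor rather than a character vector, make sure the text passed to parse() is character, for example by replacing df$x[i] with as.character(df$x[i]). Note that gsub() itself returns a character vector, so after the substitutions above this may already be the case.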
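Because eval(parse()) runs whatever text it is given, it may be worth checking that each string contains only digits, colons, commas, and whitespace before evaluating it. A rough sketch, assuming the two gsub() substitutions have already been applied; the names x and safe_pattern are illustrative, not from the answer:

```r
# Example input, already converted by the two gsub() calls
x <- c("1997:2001", "1994", "2007:2009, 2013:2015, 2016", "2007:2008, 2014")

# Accept only digits, colons, commas, and whitespace; stop otherwise
safe_pattern <- "^[0-9:,[:space:]]+$"
stopifnot(all(grepl(safe_pattern, x)))

# lapply() returns the list directly, so no explicit loop is needed
years <- lapply(x, function(s) eval(parse(text = paste0("c(", s, ")"))))
```

One difference from the strsplit() approach: a bare year such as 2016 is parsed as a double, so elements that contain one come back as numeric rather than integer. Wrapping the result in as.integer() would make the types uniform if that matters.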