
I can use the polygon function in R to highlight on a figure the days I would like to exclude from my data:

require(gamair)
data(cairo)
data1 <- within(cairo, Date <- as.Date(paste(year, month, day.of.month, sep = "-")))
data1 <- data1[,c('Date','temp')]
plot(data1)
dd <- data.frame(year = seq(1995,2005),
                 istart = c(341,355,356,370,371,380,360,400,378,360,360),
                 iend = c(450,400,380,390,420,410,425,450,421,430,400))

dates <- paste(dd[,1], '-01', '-01', sep = '')
istart <- as.Date(dates) + dd[,2]
iend <- as.Date(dates) + dd[,3]

for (i in 1:length(iend)){
  polygon(c(istart[i],iend[i],iend[i],istart[i]),c(0,0,110,110),
          col=rgb(1, 0, 0,0.5), border=NA)
}


I now wonder whether it is possible to remove these highlighted periods from data1 to generate a new time series, data2, that does not include the highlighted values.

I can remove the individual days specified in istart and iend, but I can't seem to remove the whole range of values between these dates. How can this be done?


2 Answers


You can try the following code:

ret <- rep(FALSE, NROW(data1))
for (i in seq_along(istart)) {
    ret <- ret | ((data1$Date >= istart[i]) & (data1$Date <= iend[i]))
}
data2 <- data1[!ret, ]
plot(data2, pch = ".")
for (i in 1:length(iend)){
  polygon(c(istart[i],iend[i],iend[i],istart[i]),c(0,0,110,110),
          col=rgb(1, 0, 0,0.5), border=NA)
}

So, for each pair of istart and iend values, you OR in a logical vector marking the rows that fall within that interval; ret ends up TRUE for every row inside at least one interval. All that remains is to select the rows of data1 that are not within any of these intervals.
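The same mask can also be built without the explicit loop. The sketch below is one possible vectorised variant, not part of the original answer: it uses outer() to compare every date against every interval bound at once. It runs on small hypothetical toy data instead of the cairo set, so the dates and values are illustrative only.

```r
# Hypothetical toy data standing in for data1 (the real data come from gamair::cairo)
data1 <- data.frame(Date = seq(as.Date("2000-01-01"), as.Date("2000-01-31"), by = "day"),
                    temp = 1:31)
istart <- as.Date(c("2000-01-05", "2000-01-20"))
iend   <- as.Date(c("2000-01-07", "2000-01-22"))

# Each outer() call gives an nrow(data1) x length(istart) logical matrix;
# element [r, k] is TRUE when row r lies inside interval k (both ends inclusive)
inside <- outer(data1$Date, istart, ">=") & outer(data1$Date, iend, "<=")

# A row is dropped if it falls inside at least one interval
ret <- rowSums(inside) > 0
data2 <- data1[!ret, ]
nrow(data2)  # 25
```

The trade-off is memory: the intermediate matrix has one column per interval, which is negligible for a few thousand daily rows and eleven intervals but grows with both.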

(I changed the plotting symbol to "." to make it easier to see that all values inside the intervals have indeed been filtered out.)



Using mapply, you can build a vector of all the dates you'd like to exclude from your data:

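exclude = unlist(mapply(function(istart, iend) {seq(istart, iend, "days")}, istart, iend))
exclude = as.Date(exclude, origin = "1970-01-01")  # unlist() drops the Date class, so restore it before matching
data2 = data1[!(data1$Date %in% exclude), ]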
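An alternative way to collect the excluded dates keeps the Date class throughout instead of restoring it afterwards. This sketch swaps mapply() for Map() (which is mapply() with SIMPLIFY = FALSE) and combines the pieces with do.call(c, ...); the intervals and toy data are hypothetical and only for illustration.

```r
# Hypothetical intervals for illustration
istart <- as.Date(c("2000-01-05", "2000-01-20"))
iend   <- as.Date(c("2000-01-07", "2000-01-22"))

# Map() returns a list with one Date sequence per interval
ranges <- Map(function(s, e) seq(s, e, by = "day"), istart, iend)

# unlist() keeps only the underlying day counts; the class attribute is gone
inherits(unlist(ranges), "Date")  # FALSE

# c() dispatches on the first argument's class, so do.call(c, ...) keeps Date
exclude <- do.call(c, ranges)
inherits(exclude, "Date")         # TRUE

# Hypothetical toy data standing in for data1
data1 <- data.frame(Date = seq(as.Date("2000-01-01"), as.Date("2000-01-31"), by = "day"),
                    temp = 1:31)
data2 <- data1[!(data1$Date %in% exclude), ]
```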

Additionally, there's a shorter way to define your istart and iend vectors:

istart = seq(as.Date("1995-01-01"), as.Date("2005-01-01"), "years") + c(341,355,356,370,371,380,360,400,378,360,360)
iend = seq(as.Date("1995-01-01"), as.Date("2005-01-01"), "years") + c(450,400,380,390,420,410,425,450,421,430,400)
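A quick way to convince yourself that the shorter definition matches the question's paste-based one is to build both and compare them. The sketch below does this for istart only; the variable names istart_a and istart_b are hypothetical.

```r
offsets_start <- c(341, 355, 356, 370, 371, 380, 360, 400, 378, 360, 360)
years <- 1995:2005

# Question's approach: build "YYYY-01-01" strings, convert, then add the offsets
istart_a <- as.Date(paste(years, "-01", "-01", sep = "")) + offsets_start

# Shorter approach: a yearly sequence of 1 January dates, then add the offsets
istart_b <- seq(as.Date("1995-01-01"), as.Date("2005-01-01"), "years") + offsets_start

all(istart_a == istart_b)  # TRUE
istart_b[1]                # "1995-12-08"
```

Both rely on adding a plain number of days to a Date, so offsets above 365 simply roll over into the following year.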