Delete next row after every count/list of day; in R

Question

I am finding it difficult to wrap my head around this: In the dataframe below I want to delete the next row after every count/list of, say, Thursday, same for Friday and so on. I would prefer not using a loop since the data is big.

mydata<- read.table(header=TRUE, text=" 

    Date     AAPL.ret  Weekday Thursday

1 2001-01-04 0.000000000 Thursday 1 2 2001-01-04 0.000000000 Thursday 1 3 2001-01-04 -0.025317808 Thursday 1 4 2001-01-04 0.014545711 Thursday 1 5 2001-01-04 0.007194276 Thursday 1 6 2001-01-04 -0.007194276 Thursday 1 7 2001-01-05 -0.0278569545 Friday 0 8 2001-01-05 0.0056338177 Friday 0 9 2001-01-05 0.0037383221 Friday 0 10 2001-01-05 0.0000000000 Friday 0 11 2002-02-25 3.511856e-03 Monday 0 12 2002-02-25 -3.511856e-03 Monday 0 13 2002-02-25 -4.398505e-04 Monday 0 14 2002-02-25 -2.643173e-03 Monday 0 15 2002-02-25 4.401416e-03 Monday 0 16 2002-02-26 9.189066e-03 Tuesday 0 17 2002-02-26 -8.243166e-04 Tuesday 0 18 2002-02-26 9.533751e-03 Tuesday 0 19 2002-02-26 4.527688e-03 Tuesday 0 20 2002-02-26 4.105933e-04 Tuesday 0 ............. 100 2002-03-01 8.717651e-03 Friday 0 101 2002-03-01 1.990115e-02 Friday 0 102 2002-03-01 -1.344387e-03 Friday 0 103 2002-03-01 -1.445373e-02 Friday 0 ") The output I need should be like this:

    Date     AAPL.ret  Weekday Thursday

1 2001-01-04 0.000000000 Thursday 1 2 2001-01-04 0.000000000 Thursday 1 3 2001-01-04 -0.025317808 Thursday 1 4 2001-01-04 0.014545711 Thursday 1 5 2001-01-04 0.007194276 Thursday 1 6 2001-01-04 -0.007194276 Thursday 1 7 2001-01-05 0.0056338177 Friday 0 8 2001-01-05 0.0037383221 Friday 0 9 2001-01-05 0.0000000000 Friday 0 11 2002-02-25 -3.511856e-03 Monday 0 12 2002-02-25 -4.398505e-04 Monday 0 13 2002-02-25 -2.643173e-03 Monday 0 14 2002-02-25 4.401416e-03 Monday 0 15 2002-02-26 -8.243166e-04 Tuesday 0 16 2002-02-26 9.533751e-03 Tuesday 0 17 2002-02-26 4.527688e-03 Tuesday 0 18 2002-02-26 4.105933e-04 Tuesday 0 ............. 100 2002-03-01 1.990115e-02 Friday 0 101 2002-03-01 -1.344387e-03 Friday 0 102 2002-03-01 -1.445373e-02 Friday 0

Thank you in advance. Sorry if I have wrongfully asked the question. This is my first time of asking a question here; I have tried to follow the rules as best as I can; especially how the table should appear.

The codes I have tried, I believe, are really far from the answer I desire. Just counting and subsetting; below. table(ret.df$Weekday=="Thursday") r1<-ret.df[!(ret.df$Weekday=="Thursday"),]

I hope my question less vague now.

A follow up from the previous answer:

removing rows based on condition in ret_1ON

ret_1ON<- ret.df[duplicated(ret.df$Date)|1:nrow(ret.df)==1,]

dim(ret_1ON)

[1] 98734 4

head(ret_1ON)

    Date     AAPL.ret  Weekday Thursday

1 2001-01-04 0.000000000 Thursday 1 2 2001-01-04 0.000000000 Thursday 1 3 2001-01-04 -0.025317808 Thursday 1 4 2001-01-04 0.014545711 Thursday 1 5 2001-01-04 0.007194276 Thursday 1 6 2001-01-04 -0.007194276 Thursday 1

tail(ret_1ON)
        Date      AAPL.ret  Weekday Thursday

99994 2006-01-19 0.0013771520 Thursday 1 99995 2006-01-19 -0.0007321584 Thursday 1 99996 2006-01-19 -0.0029026141 Thursday 1 99997 2006-01-19 -0.0002511616 Thursday 1 99998 2006-01-19 0.0011297309 Thursday 1 99999 2006-01-19 -0.0002509410 Thursday 1

I'm wandering why the last item in tail is not 98734 but rather 99999?

dim(ret.df)

[1] 99999 4 which means the condition was effected, though.

What is your desired output from this data? Is it just "remove row 79"? — r2evans
@r2evans removing row 79 is only the first thing I need, then the same process repeats for all the days in the data. I have only shown a part of the data. — OJP
Yes, so in this data, the only difference is row 79, right? (You don't have to show this much data to accomplish the question, btw. You might grab 5-8 rows from each day, showing multiple days, so that the answers you receive may be more flexible/correct.) — r2evans
Are you intending to remove the first row of each day? Your description is a little vague. Could you please modify your data so that 95% of it isn't the same day? Also, please indicate your desired output and/or which rows (numerically) you think should be removed. Lastly, what code have you tried so far? — r2evans

akrun akrun · Accepted Answer · 2017-04-14T03:27:00

We can do this with data.table

library(data.table)
setDT(mydata)[, .SD[(seq_len(.N) != 1)], Date]

if we wanted to keep the first row of the dataset

setDT(mydata)[, .SD[(seq_len(.N) != 1)|seq_len(.N)==.I[1]], Date]

Or with dplyr

library(dplyr)
mydata %>%
      group_by(Date) %>%
      filter(row_number() != 1)

Or using base R, if the 'Date' column is ordered

mydata[duplicated(mydata$Date),]

or with including the first row

mydata[duplicated(mydata$Date)|1:nrow(mydata)==1,]

Delete next row after every count/list of day; in R

removing rows based on condition in ret_1ON

1 Answers