1
votes

I am finding it difficult to wrap my head around this: In the dataframe below I want to delete the next row after every count/list of, say, Thursday, same for Friday and so on. I would prefer not using a loop since the data is big.

mydata<- read.table(header=TRUE, text=" 

    Date     AAPL.ret  Weekday Thursday

1 2001-01-04 0.000000000 Thursday 1 2 2001-01-04 0.000000000 Thursday 1 3 2001-01-04 -0.025317808 Thursday 1 4 2001-01-04 0.014545711 Thursday 1 5 2001-01-04 0.007194276 Thursday 1 6 2001-01-04 -0.007194276 Thursday 1 7 2001-01-05 -0.0278569545 Friday 0 8 2001-01-05 0.0056338177 Friday 0 9 2001-01-05 0.0037383221 Friday 0 10 2001-01-05 0.0000000000 Friday 0 11 2002-02-25 3.511856e-03 Monday 0 12 2002-02-25 -3.511856e-03 Monday 0 13 2002-02-25 -4.398505e-04 Monday 0 14 2002-02-25 -2.643173e-03 Monday 0 15 2002-02-25 4.401416e-03 Monday 0 16 2002-02-26 9.189066e-03 Tuesday 0 17 2002-02-26 -8.243166e-04 Tuesday 0 18 2002-02-26 9.533751e-03 Tuesday 0 19 2002-02-26 4.527688e-03 Tuesday 0 20 2002-02-26 4.105933e-04 Tuesday 0 ............. 100 2002-03-01 8.717651e-03 Friday 0 101 2002-03-01 1.990115e-02 Friday 0 102 2002-03-01 -1.344387e-03 Friday 0 103 2002-03-01 -1.445373e-02 Friday 0 ") The output I need should be like this:

    Date     AAPL.ret  Weekday Thursday

1 2001-01-04 0.000000000 Thursday 1 2 2001-01-04 0.000000000 Thursday 1 3 2001-01-04 -0.025317808 Thursday 1 4 2001-01-04 0.014545711 Thursday 1 5 2001-01-04 0.007194276 Thursday 1 6 2001-01-04 -0.007194276 Thursday 1 7 2001-01-05 0.0056338177 Friday 0 8 2001-01-05 0.0037383221 Friday 0 9 2001-01-05 0.0000000000 Friday 0 11 2002-02-25 -3.511856e-03 Monday 0 12 2002-02-25 -4.398505e-04 Monday 0 13 2002-02-25 -2.643173e-03 Monday 0 14 2002-02-25 4.401416e-03 Monday 0 15 2002-02-26 -8.243166e-04 Tuesday 0 16 2002-02-26 9.533751e-03 Tuesday 0 17 2002-02-26 4.527688e-03 Tuesday 0 18 2002-02-26 4.105933e-04 Tuesday 0 ............. 100 2002-03-01 1.990115e-02 Friday 0 101 2002-03-01 -1.344387e-03 Friday 0 102 2002-03-01 -1.445373e-02 Friday 0

Thank you in advance. Sorry if I have wrongfully asked the question. This is my first time of asking a question here; I have tried to follow the rules as best as I can; especially how the table should appear.

The codes I have tried, I believe, are really far from the answer I desire. Just counting and subsetting; below. table(ret.df$Weekday=="Thursday") r1<-ret.df[!(ret.df$Weekday=="Thursday"),]

I hope my question less vague now.

A follow up from the previous answer:

removing rows based on condition in ret_1ON

ret_1ON<- ret.df[duplicated(ret.df$Date)|1:nrow(ret.df)==1,]

dim(ret_1ON)

[1] 98734 4

head(ret_1ON)

    Date     AAPL.ret  Weekday Thursday

1 2001-01-04 0.000000000 Thursday 1 2 2001-01-04 0.000000000 Thursday 1 3 2001-01-04 -0.025317808 Thursday 1 4 2001-01-04 0.014545711 Thursday 1 5 2001-01-04 0.007194276 Thursday 1 6 2001-01-04 -0.007194276 Thursday 1

tail(ret_1ON)
        Date      AAPL.ret  Weekday Thursday

99994 2006-01-19 0.0013771520 Thursday 1 99995 2006-01-19 -0.0007321584 Thursday 1 99996 2006-01-19 -0.0029026141 Thursday 1 99997 2006-01-19 -0.0002511616 Thursday 1 99998 2006-01-19 0.0011297309 Thursday 1 99999 2006-01-19 -0.0002509410 Thursday 1

I'm wandering why the last item in tail is not 98734 but rather 99999?

dim(ret.df)

[1] 99999 4 which means the condition was effected, though.

1
What is your desired output from this data? Is it just "remove row 79"?r2evans
@r2evans removing row 79 is only the first thing I need, then the same process repeats for all the days in the data. I have only shown a part of the data.OJP
Yes, so in this data, the only difference is row 79, right? (You don't have to show this much data to accomplish the question, btw. You might grab 5-8 rows from each day, showing multiple days, so that the answers you receive may be more flexible/correct.)r2evans
Are you intending to remove the first row of each day? Your description is a little vague. Could you please modify your data so that 95% of it isn't the same day? Also, please indicate your desired output and/or which rows (numerically) you think should be removed. Lastly, what code have you tried so far?r2evans
@r2evans Thank you. Yes, you are right.OJP

1 Answers

0
votes

We can do this with data.table

library(data.table)
setDT(mydata)[, .SD[(seq_len(.N) != 1)], Date]

if we wanted to keep the first row of the dataset

setDT(mydata)[, .SD[(seq_len(.N) != 1)|seq_len(.N)==.I[1]], Date]

Or with dplyr

library(dplyr)
mydata %>%
      group_by(Date) %>%
      filter(row_number() != 1)

Or using base R, if the 'Date' column is ordered

mydata[duplicated(mydata$Date),]

or with including the first row

mydata[duplicated(mydata$Date)|1:nrow(mydata)==1,]