> #transforming length of time
> transLOT<-log(LengthofTimemin)
> 
> #checking for outliers
> fit<-lm(transLOT~DielEnd+TideEnd+TideStart+Moonphase+TideStart*Moonphase, data=resdata)
> outlierTest(fit)
    rstudent unadjusted p-value Bonferonni p
295 4.445284         1.1025e-05    0.0052808
> 
> #getting rid of the outlier data in row 295
> rdata<-resdata[-295, ]
> print(rdata[294:296,5:10])
# A tibble: 3 × 6
  DepartureDate       DepartureTime        LengthofTime LengthofTimemin EventLengthCategories
         <dttm>              <dttm>              <dttm>           <dbl>                 <chr>
1    2016-09-19 1899-12-30 23:46:46 1899-12-30 00:05:49        5.816667                  5-15
2    2016-09-20 1899-12-30 01:55:28 1899-12-30 00:01:20        1.333333                    <5
3    2016-09-20 1899-12-30 04:07:28 1899-12-30 00:01:21        1.350000                    <5
> newfit<-lm(transLOT~DielEnd+TideEnd+TideStart+Moonphase+TideStart*Moonphase, na.action=na.exclude, data=rdata)
Error in model.frame.default(formula = transLOT ~ DielEnd + TideEnd +  : 
  variable lengths differ (found for 'DielEnd')
> #now all of a sudden the variable lengths differ

I understand that the problem comes from removing the row of data, but I assumed that na.exclude would account for it. After searching thoroughly, I still can't determine why this error occurs.
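The same error can be reproduced without the original data. This sketch uses a small made-up data frame; `toy`, `x`, `y` and `logy` are hypothetical stand-ins for `resdata`, `DielEnd`, `LengthofTimemin` and `transLOT`.

```r
# Hypothetical toy data standing in for resdata
set.seed(1)
toy <- data.frame(x = rnorm(10), y = rexp(10))

# Response created OUTSIDE the data frame, as in the question
logy <- log(toy$y)

# Drop one row from the data frame only; logy still has 10 elements
toy2 <- toy[-5, ]

# lm() finds logy in the workspace (10 values) and x in toy2 (9 values),
# so model.frame() stops with a length error; na.exclude never gets a say
msg <- tryCatch(
  {
    lm(logy ~ x, data = toy2, na.action = na.exclude)
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)
```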


This happens because in your first step you created a separate variable, transLOT<-log(LengthofTimemin), outside of your data frame. When you remove a row from the data, transLOT is unchanged. Worse than the differing lengths, your data no longer line up: if the different lengths were ignored, every row after the one you removed would be "off by one" relative to the response. na.exclude can't help here, because na.action only deals with NA values, and model.frame() checks that all variables have the same length before it applies na.action.
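To see what na.exclude is actually for, here is a small sketch with hypothetical toy data containing one genuinely missing response value. It compares na.omit and na.exclude: both fit on the complete rows, but na.exclude pads residuals back to the original row count with NA.

```r
# Hypothetical toy data with one missing response value (row 3)
toy <- data.frame(x = 1:6, y = c(2.1, 3.9, NA, 8.2, 9.8, 12.1))

fit_omit    <- lm(y ~ x, data = toy, na.action = na.omit)
fit_exclude <- lm(y ~ x, data = toy, na.action = na.exclude)

# Both models are fitted on the same 5 complete rows
nobs(fit_omit)     # 5
nobs(fit_exclude)  # 5

# na.exclude pads residuals back to the original 6 rows, with NA in row 3;
# it handles NA values, not vectors of different lengths
length(residuals(fit_omit))     # 5
length(residuals(fit_exclude))  # 6
```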

The simple solution is to create your transLOT variable in the data frame. Then, whenever you do things to the data (like remove rows), the same thing is done to transLOT.

resdata$transLOT <- log(resdata$LengthofTimemin)
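A sketch of the corrected workflow follows, again on hypothetical toy data (`toy`, `x`, `y`, `logy` stand in for `resdata`, the predictors, `LengthofTimemin` and `transLOT`). Row 5 plays the role of the flagged outlier. Because the transformed response lives in the data frame, removing the row removes it from every variable at once.

```r
set.seed(42)
toy <- data.frame(x = rnorm(20), y = rexp(20))

# Create the transformed response INSIDE the data frame
toy$logy <- log(toy$y)

fit <- lm(logy ~ x, data = toy)

# Remove the (hypothetically) flagged row; logy is dropped together with x and y
toy2 <- toy[-5, ]

newfit <- lm(logy ~ x, data = toy2, na.action = na.exclude)
nobs(fit)     # 20
nobs(newfit)  # 19
```

An alternative that avoids the extra column is to write the transformation in the formula itself, for example lm(log(y) ~ x, data = toy2). The log is then evaluated on whatever rows the data argument contains.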

Note that I also use resdata$LengthofTimemin rather than the bare LengthofTimemin that you seem to have in your workspace. Did you use attach() at some point? You shouldn't use attach() for exactly this reason. Keep variables in the data frame!
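The attach() pitfall can be shown in a few lines. This sketch uses a hypothetical data frame `toy`. attach() places a copy of the columns on the search path, so later changes to the data frame are not reflected in the attached variables.

```r
toy <- data.frame(x = 1:5, y = c(2, 4, 6, 8, 10))

attach(toy)          # puts a COPY of toy's columns on the search path
toy <- toy[-1, ]     # remove a row from the data frame in the workspace

n_df       <- length(toy$y)  # 4: the data frame changed
n_attached <- length(y)      # 5: the attached copy did not
detach(toy)

c(n_df, n_attached)
```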