I'm trying to use tapply to get the average weight of turtles caught per day. Every approach I've tried either fails with an "arguments must have same length" error or returns NA for every date (the date column, DateCt, is of class POSIXct).
I've tried:
calling tapply on the weight column and the date column -> "arguments must have same length" error
removing records with NA values in the weight column of my dataframe, then calling tapply on the weight column and the date column -> same error
calling tapply on na.omit() of the weight column and the date column subset to the rows with non-NA weights -> same error
calling tapply on na.omit() of the weight column and the factor-coerced date column subset to the rows with non-NA weights -> returns NA for every level of the factor-coerced date column
Head of the original dataframe:
> head(stinkpotData)
       Date     DateCt  Species Turtle.ID ID.Code             Location Recapture Weight.g C.Length.mm
1  6/1/2001 2001-06-01 Stinkpot         1       1   keck lab dock site         0      190          95
2  6/1/2001 2001-06-01 Stinkpot         2      10        Right of dock         0      200         100
3  8/9/2001 2001-08-09 Stinkpot         2      10 #4 Deep Right of lab         1      175         104
4 8/27/2001 2001-08-27 Stinkpot         2      10 #4 Deep Right of lab         1      175         105
5  6/1/2001 2001-06-01 Stinkpot         3      11        Right of dock         0      200         109
6 10/3/2001 2001-10-03 Stinkpot         3      11 #4 Deep Right of lab         1      205         109
  C.Width.mm Female.1.Male.2 Rotation                                  Marks
1         70            <NA>     <NA>                                   <NA>
2         72            <NA>     <NA>                                   <NA>
3         72               2     <NA>                                   Male
4         71               2     <NA>    male, 1 small leech Right front leg
5         74            <NA>     <NA>                          algae covered
6         76               2     <NA> male, 1 lg & 1 sm leech right rear leg
Head of the dataframe with NA-weight records removed (I checked that the NAs were actually removed):
> head(noNAWeightsDf)
       Date     DateCt  Species Turtle.ID ID.Code             Location Recapture Weight.g C.Length.mm
1  6/1/2001 2001-06-01 Stinkpot         1       1   keck lab dock site         0      190          95
2  6/1/2001 2001-06-01 Stinkpot         2      10        Right of dock         0      200         100
3  8/9/2001 2001-08-09 Stinkpot         2      10 #4 Deep Right of lab         1      175         104
4 8/27/2001 2001-08-27 Stinkpot         2      10 #4 Deep Right of lab         1      175         105
5  6/1/2001 2001-06-01 Stinkpot         3      11        Right of dock         0      200         109
6 10/3/2001 2001-10-03 Stinkpot         3      11 #4 Deep Right of lab         1      205         109
  C.Width.mm Female.1.Male.2 Rotation                                  Marks
1         70            <NA>     <NA>                                   <NA>
2         72            <NA>     <NA>                                   <NA>
3         72               2     <NA>                                   Male
4         71               2     <NA>    male, 1 small leech Right front leg
5         74            <NA>     <NA>                          algae covered
6         76               2     <NA> male, 1 lg & 1 sm leech right rear leg
Calling tapply on the columns in the original dataframe:
> tapply(stinkpotData$Weight.g, stinkpotData$DateCt, FUN = mean)
Error in tapply(stinkpotData$Weight.g, stinkpotData$DateCt, FUN = mean) :
  arguments must have same length
Calling tapply on the columns in the noNA dataframe:
> tapply(noNAWeightsDf$Weight.g, noNAWeightsDf$DateCt, FUN = mean)
Error in tapply(noNAWeightsDf$Weight.g, noNAWeightsDf$DateCt, FUN = mean) :
  arguments must have same length
Calling tapply on na.omit() of the weight column and the date column subset to the rows with non-NA weights:
> tapply(na.omit(stinkpotData$Weight.g), stinkpotData$DateCt[!is.na(stinkpotData$Weight.g)], FUN = mean)
Error in tapply(na.omit(stinkpotData$Weight.g), stinkpotData$DateCt[!is.na(stinkpotData$Weight.g)], :
  arguments must have same length
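For reference, tapply raises this error whenever X and INDEX (or any component of a list-valued INDEX) differ in length. The sketch below reproduces the message with made-up vectors (`w`, `g_short`, and `g_ok` are hypothetical, not the turtle data) and shows a quick check that may help narrow down which column is responsible. Whether this is what happens with stinkpotData is an assumption; the post does not show enough to confirm it.

```r
# Hypothetical toy vectors, not the actual turtle data
w       <- c(190, 200, 175, 205)                                  # four weights
g_short <- c("2001-06-01", "2001-06-01", "2001-08-09")            # only three groups
g_ok    <- c("2001-06-01", "2001-06-01", "2001-08-09", "2001-10-03")

# Mismatched lengths trigger the same error message as in the question
err <- tryCatch(tapply(w, g_short, FUN = mean),
                error = function(e) conditionMessage(e))
print(err)

# Matching lengths work
res <- tapply(w, g_ok, FUN = mean)
print(res)

# tapply treats a list INDEX as several grouping factors, so it can be
# worth checking whether the date column is list-like (POSIXlt is, POSIXct is not)
ct <- as.POSIXct("2001-06-01", tz = "UTC")
lt <- as.POSIXlt(ct)
print(c(ct_is_list = is.list(ct), lt_is_list = is.list(lt)))
```

On the real data, the analogous checks would be length(), class(), and is.list() on stinkpotData$Weight.g and stinkpotData$DateCt.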
Calling tapply on na.omit() of the weight column and the factor-coerced date column subset to the rows with non-NA weights:
> tapply(na.omit(stinkpotData$Weight.g), as.factor(stinkpotData$DateCt[!is.na(stinkpotData$Weight.g)]), FUN = mean)
2001-01-07 2001-06-01 2001-06-04 2001-06-06 2001-06-07 2001-06-11 2001-06-12 2001-06-15 2001-06-19
        NA         NA         NA         NA         NA         NA         NA         NA         NA
2001-06-20 2001-06-25 2001-06-27 2001-06-29 2001-07-03 2001-07-09 2001-07-11 2001-07-13 2001-07-16
        NA         NA         NA         NA         NA         NA         NA         NA         NA ................etc
There were 50 or more warnings (use warnings() to see the first 50)
Calling warnings() after the above call gives:
> warnings()
Warning messages:
1: In mean.default(X[[i]], ...) :
argument is not numeric or logical: returning NA
2: In mean.default(X[[i]], ...) :
argument is not numeric or logical: returning NA
3: In mean.default(X[[i]], ...) :
argument is not numeric or logical: returning NA
.......................etc
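This warning comes from mean() being handed something that is neither numeric nor logical, which is what happens when the weights are stored as a factor. A minimal sketch with a made-up factor (`w_factor` is hypothetical, not the real column) reproduces the warning and shows how class() reveals the problem:

```r
# Hypothetical weights stored as a factor, e.g. after reading text data
w_factor <- factor(c("190", "200", "175", "205"))

print(class(w_factor))       # "factor"
print(is.numeric(w_factor))  # FALSE

# Capture the warning text that mean() emits for a factor
warn_msg <- tryCatch(mean(w_factor),
                     warning = function(wn) conditionMessage(wn))
print(warn_msg)

# The value returned alongside that warning is NA
m <- suppressWarnings(mean(w_factor))
print(m)
```

Running class(stinkpotData$Weight.g) would be the corresponding check on the real data.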
EDIT:
split(na.omit(stinkpotData$Weight.g), as.factor(stinkpotData$DateCt[!is.na(stinkpotData$Weight.g)]))
This gave a list of the individual turtle weights for each date. I verified that it was of mode list; its elements were of mode numeric and class factor. lapply on the split list with FUN = mean still returned NA for each level of date. I can get the means of individual elements of the split list if they are coerced to vectors, but that's not quite what I need.
EDIT 2: Finally got the result I wanted, but the steps to get there seem over-complicated and I still don't understand why using tapply won't work. I had to call split as in the first edit, then coerce each element of the resultant list to class numeric (originally returned as class factor) with lapply, then call mean on every element with lapply:
weightsDateList = split(na.omit(stinkpotData$Weight.g), as.factor(stinkpotData$DateCt[!is.na(stinkpotData$Weight.g)]))
weightsDateList = lapply(weightsDateList, FUN = as.numeric)
weightsDateList = lapply(weightsDateList, FUN = mean)
EDIT 3: I realize now that the results I get from the solution in EDIT 2 and from calling tapply severely underestimate the means, so I'm still lost.
EDIT 4: I realized that converting the weights to class numeric returned each weight's integer level code from when the column was a factor, not the weight itself, which explains the severe underestimation of the means.
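The effect described in EDIT 4 can be seen on a small made-up factor (`w_factor` is hypothetical): as.numeric() on a factor yields the integer level codes, whereas going through as.character() first recovers the printed values. This is a sketch of the usual base R idiom, assuming the factor labels are all valid numbers.

```r
# Hypothetical weights stored as a factor; levels sort as "175" "190" "200" "205"
w_factor <- factor(c("190", "200", "175", "205"))

codes  <- as.numeric(w_factor)                # integer level codes: 2 3 1 4
values <- as.numeric(as.character(w_factor))  # the weights themselves

print(codes)
print(values)

# The mean of the codes is far below the mean of the real weights
print(mean(codes))   # 2.5
print(mean(values))  # 192.5
```

If any label is not a valid number, as.numeric(as.character(...)) turns it into NA with a warning, which is also a convenient way to spot stray text in the column.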
I want the tapply call to return every date with turtle weight(s) and its respective average weight of turtles caught on those dates. Thanks and I apologize if I'm missing something easy.
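Assuming the weights have been converted to true numeric values (see EDIT 4), a grouped mean of the kind asked for could look like the sketch below. The data frame `toy` is made up for illustration: it reuses the six DateCt and Weight.g values from the head() output above and omits every other column. Grouping on the formatted date string is one option among several.

```r
# Hypothetical two-column stand-in for stinkpotData, with numeric weights
toy <- data.frame(
  DateCt   = as.POSIXct(c("2001-06-01", "2001-06-01", "2001-08-09",
                          "2001-08-27", "2001-06-01", "2001-10-03"),
                        tz = "UTC"),
  Weight.g = c(190, 200, 175, 175, 200, 205)
)

# Group by calendar day; na.rm = TRUE skips missing weights within a day
day <- format(toy$DateCt, "%Y-%m-%d")
avg <- tapply(toy$Weight.g, day, FUN = mean, na.rm = TRUE)
print(avg)
```

The result is a named array with one entry per day, which is the shape described above: every date with at least one weight, paired with its average.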
aggregate(Weight.g ~ DateCt, data = stinkpotData, mean) – heds1
data.table and dplyr both offer much easier grouping facilities. I'm quite partial to data.table but I recommend checking out both & seeing what suits you – MichaelChirico
tapply but I'm quite partial to base R. Many of its methods offer grouping facilities: tapply, by, split, ave, aggregate to name a few. I recommend checking these out & seeing what suits you. – Parfait
dput a few rows of your actual original dataframe that reproduces this error. Did you check NAs in DateCt? – Parfait