I have a dataset Dat with species (SP), area (AR), and time (TM, stored as POSIXct). I want to subset the records of individuals that were present together with Species A: within half an hour before or after A was recorded, and in the same area or one of the two adjacent areas (area - 1 or area + 1). For example, if Species A was present at 1:00 in area 4, I want all species present from 12:30 to 1:30 on the same day in areas 3, 4, and 5. As an example:

SP         TM      AR
B  1-jan-03 07:22  1
F  1-jan-03 09:22  4
A  1-jan-03 09:22  1
C  1-jan-03 08:17  3
D  1-jan-03 09:20  1
E  1-jan-03 06:55  4
D  1-jan-03 09:03  1
E  1-jan-03 09:12  2
F  1-jan-03 09:45  1
B  3-jan-03 09:15  1
A  3-jan-03 10:30  5
F  3-jan-03 07:30  5
F  3-jan-03 10:20  6
D  3-jan-03 10:05  4

The desired result for this dummy table would be:

SP         TM      AR
A  1-jan-03 09:22  1
D  1-jan-03 09:20  1
D  1-jan-03 09:03  1
E  1-jan-03 09:12  2
F  1-jan-03 09:45  1
A  3-jan-03 10:30  5
F  3-jan-03 10:20  6
D  3-jan-03 10:05  4 

Note: Species A appears repeatedly throughout the dataset, in any area from 1 to 81 and at any time. In a previous pair of posts I split this question in two so that I could learn how to combine the code, but my specification of the problem was flawed. Many thanks to the users Thelatemail and Jason, who provided helpful answers ("Subsetting based on co-occurrence within a time window" and "Subsetting neighboring fields"). The feedback was:

with(dat, dat[
  (
    SP == "A" |
    Area %in% c(Area[SP == 'A'] - 1, Area[SP == 'A'], Area[SP == 'A'] + 1)
  ) &
  apply(
    sapply(Time[SP == "A"],
           function(x) abs(difftime(Time, x, units = "mins")) <= 30),
    1, any
  ),
])

This worked partially: it subsets by time window, but not by area. I suspect the problem lies in how POSIXct interacts with the subsetting, since a time window covers many different times. Would another apply call be necessary to handle the area interval? Any help is much appreciated.

It would be great if you could insert links to your previous questions. Thanks. - Henrik
Remove the line SP=='A' | and you should have what you need. See if you can say, in words, what each line is doing and follow the logic in the subsetting. It will be a good exercise and will help with your understanding of R (see my edit) - Justin
You could rewrite c(Area[SP=='A']-1, Area[SP=='A'], Area[SP=='A']+1) as Area[SP=='A'] + c(-1, 0, 1) (but I don't think %in% works the way you expect). And naming intermediate results would make your code much easier to understand - hadley
@Justin, do you get F 2003-01-01 09:22:00 4 included when you run your code? It is close in space, but at the wrong time (A is in Area 5, but on the 3rd), or vice versa: close in time, but on the wrong site (A is seen same time, but is then in Area 1). Could it be the any in apply that allows for this, i.e. 'any' true time is resulting in an aggregated TRUE, regardless of space for the single TRUE? I apologize in advance if I have messed things up. - Henrik
I haven't actually run any of the code... - Justin

1 Answer

Here is a possible solution, very much inspired by the previous, nice answers from @thelatemail and @Justin. Unlike those, it accounts for time in the boolean expression for space (see my comments on this question).

Using sapply, we 'loop' over each time at which Species A was registered (time[SP == "A"]) and create a boolean matrix mm with one column per registration of A. Each row of mm corresponds to a row of dat, and each entry tests whether that record is close in both space and time to the registration of A in that column.

mm <- with(dat,
           sapply(time[SP == "A"], function(x)
             abs(AR - AR[SP == "A" & time == x]) <= 1 &
                    abs(difftime(time, x, units = "mins")) <= 30))

# select rows from data where at least one column in mm is TRUE    
dat[rowSums(mm) > 0, ]

#    SP                time AR
# 3   A 2003-01-01 09:22:00  1
# 5   D 2003-01-01 09:20:00  1
# 7   D 2003-01-01 09:03:00  1
# 8   E 2003-01-01 09:12:00  2
# 9   F 2003-01-01 09:45:00  1
# 11  A 2003-01-03 10:30:00  5
# 13  F 2003-01-03 10:20:00  6
# 14  D 2003-01-03 10:05:00  4
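
For anyone who wants to reproduce this, here is a minimal, self-contained sketch. It rebuilds the example table from the question and applies the same space-and-time test, but loops over the row indices of Species A rather than over its times. That is only a variation I am suggesting, on the assumption that A might be recorded at the same time in more than one area, in which case AR[SP == "A" & time == x] would return several values. The names a_rows, near_a, close_in_space, close_in_time and result are hypothetical, and the "UTC" time zone is an assumption made just to keep the example deterministic.

```r
# Rebuild the example data from the question (columns named as in the answer).
dat <- data.frame(
  SP = c("B", "F", "A", "C", "D", "E", "D", "E", "F", "B", "A", "F", "F", "D"),
  time = as.POSIXct(
    c("2003-01-01 07:22", "2003-01-01 09:22", "2003-01-01 09:22",
      "2003-01-01 08:17", "2003-01-01 09:20", "2003-01-01 06:55",
      "2003-01-01 09:03", "2003-01-01 09:12", "2003-01-01 09:45",
      "2003-01-03 09:15", "2003-01-03 10:30", "2003-01-03 07:30",
      "2003-01-03 10:20", "2003-01-03 10:05"),
    format = "%Y-%m-%d %H:%M", tz = "UTC"),  # time zone is an assumption
  AR = c(1, 4, 1, 3, 1, 4, 1, 2, 1, 1, 5, 5, 6, 4),
  stringsAsFactors = FALSE
)

# Row indices of every registration of Species A.
a_rows <- which(dat$SP == "A")

# One logical column per registration of A: TRUE where a record lies within
# one area AND within 30 minutes of that particular registration.
near_a <- vapply(a_rows, function(i) {
  close_in_space <- abs(dat$AR - dat$AR[i]) <= 1
  close_in_time  <- abs(as.numeric(difftime(dat$time, dat$time[i],
                                            units = "mins"))) <= 30
  close_in_space & close_in_time
}, logical(nrow(dat)))

# Keep rows that are close to at least one registration of A.
result <- dat[rowSums(near_a) > 0, ]
result
```

Because space and time are tested together for each registration of A, a record such as F at 09:22 in area 4 is dropped: it is close in time to one registration and close in space to the other, but never both at once.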