
I am attempting to:

  1. calculate the difference in call duration between police units responding to the same call
  2. identify the longest duration among a group of calls with the same call ID
  3. arrange in descending order of duration

My steps to do so are found in the code snippets below.

First, I sort by ID in descending order (several rows can share the same ID), and within each ID by call duration in hours, also descending.

Then, I convert my data.frame to a data.table.

Then, I number the rows within each ID in order of duration (descending).

call_duration_diff_by_unit[, duration_seq := seq(CALL_DURATION_HOURS), by = c("ID")]

This is where the problem occurs: I get an error that says

"Error in `[.data.table`(call_duration_diff_by_unit, , `:=`(duration_seq, : Supplied 2 items to be assigned to group 1 of size 1 in column 'duration_seq'. The RHS length must either be 1 (single values are ok) or match the LHS length exactly. If you wish to 'recycle' the RHS please use rep() explicitly to make this intent clear to readers of your code."

The only explanation for this error I have found was specific to a different package that I am not using. I understand the concept of "recycling" now, but I am not sure how it applies to this scenario, since there aren't two vectors with different lengths.

Could R be reading the by = c("ID") part incorrectly as a second input?

call_duration_diff_by_unit <- cad_cfs_data %>% 
  arrange(desc(ID), desc(CALL_DURATION_HOURS))

call_duration_diff_by_unit <- 
  data.table(call_duration_diff_by_unit)

call_duration_diff_by_unit[, duration_seq := seq(CALL_DURATION_HOURS), by = c("ID")]

I expected it to make a unique numeric ID (assigning 1 to the longest duration) for each group of unique call IDs. Instead, I get the error and it doesn't save the variable "duration_seq" for use later down in the code.

The two lengths which need to match are the length of the vectors (columns) in the given ID group (i.e. the number of rows for which ID is equal to the given value), and the output of the RHS of :=. - IceCreamToucan
I think you actually want DT[, .(duration_seq = seq(...)), by = ...] but I'm not sure from the description. The error message is pretty clear: you assign a vector into the data.table that doesn't match its number of rows. - Roland
Thanks @IceCreamToucan and @Roland! I guess I don't see how I'm assigning a vector to the data.table that doesn't match the number of rows. I'm using a function (seq) that should automatically create a numeric sequence that exactly matches the number of rows (and restarts at 1 each time a new ID starts). Can you explain which vector in the code it could be saying doesn't match? - Alice Kassinger
Based on what you just said, I think you should use seq_along instead of seq - IceCreamToucan
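
To illustrate the seq_along suggestion: when seq() is given a single number n, it returns 1:n rather than a sequence of length one. So a group that contains only one row, with a duration of (for example) 2.7 hours, produces two values for a group of size 1, which matches the shape of the error above. A minimal base R sketch, using made-up durations rather than the original data:

```r
# Made-up durations for illustration; not taken from the original data.
one_row_group   <- 2.7                # a call ID with a single responding unit
three_row_group <- c(4.2, 2.0, 0.3)   # a call ID with three responding units

# With a single number, seq() counts from 1 up to that number.
len_seq_one <- length(seq(one_row_group))          # 2 values for 1 row

# With a longer vector, seq() happens to behave like seq_along().
len_seq_three <- length(seq(three_row_group))      # 3 values for 3 rows

# seq_along() always returns one index per element, whatever the values are.
len_along_one   <- length(seq_along(one_row_group))    # 1
len_along_three <- length(seq_along(three_row_group))  # 3
```

This is why the original code only fails on some groups: it works whenever an ID has two or more rows and breaks on single-row IDs whose duration is 2 hours or more.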

1 Answer


I think what you are looking for can be done more easily with the special symbols in data.table. .N is very helpful here: it holds the number of rows in the data.table, and if you specify a group with by, it holds the number of rows within the current group. So the code would look like this:

call_duration_diff_by_unit[, duration_seq := 1:.N, by = c("ID")]
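
Here is a small self-contained sketch of that approach on toy data. The table dt and its values are hypothetical stand-ins for call_duration_diff_by_unit, and I use seq_len(.N), which should give the same result as 1:.N here:

```r
library(data.table)

# Hypothetical toy data: three call IDs, one of them with a single unit.
dt <- data.table(
  ID = c(3, 3, 2, 2, 2, 1),
  CALL_DURATION_HOURS = c(1.5, 0.4, 4.2, 2.0, 0.3, 2.7)
)

# Sort by ID, then by duration, both descending (mirrors the arrange() step).
setorder(dt, -ID, -CALL_DURATION_HOURS)

# Number the rows within each ID; 1 marks the longest duration.
# .N is the number of rows in the current group, so the lengths always match.
dt[, duration_seq := seq_len(.N), by = ID]
```

Because the sequence is built from the group size rather than from the duration values, the single-row ID gets duration_seq = 1 and no error is raised.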

Is this what you are going for?
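
For the other two goals in the question (the longest duration per call ID, and the difference between units), something along these lines might work. This is only a sketch on hypothetical toy data, and it assumes that "difference" means the gap between each unit's duration and the longest duration for the same ID; the column names longest_hours and diff_from_longest are made up for illustration:

```r
library(data.table)

# Hypothetical toy data with the same two columns as in the question.
dt <- data.table(
  ID = c(3, 3, 2, 2, 2, 1),
  CALL_DURATION_HOURS = c(1.5, 0.4, 4.2, 2.0, 0.3, 2.7)
)

# Longest duration per call ID, repeated on every row of that ID.
dt[, longest_hours := max(CALL_DURATION_HOURS), by = ID]

# Assumed meaning of "difference": gap between each unit and the longest unit.
dt[, diff_from_longest := longest_hours - CALL_DURATION_HOURS]

# Order calls by their longest duration, then units within each call.
setorder(dt, -longest_hours, -CALL_DURATION_HOURS)
```

If you instead want the difference between consecutive units within an ID, the subtraction would need to change, but the grouping with by = ID stays the same.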