You just need %in% to compare against a vector of length > 1, i.e.
subset(taxes, State %in% c('AL', 'MO', 'TX'))
# State amount
#4 MO 14143
#27 TX 11517
#30 AL 14465
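The same filter can also be written with plain bracket indexing, because %in% returns one logical value per element of its left-hand side no matter how long the right-hand side is. A minimal sketch on a small made-up data frame (the name `toy` and its values are hypothetical, not the data from the question):

```r
# Hypothetical toy data, for illustration only
toy <- data.frame(State = c("AK", "AL", "MO", "NY", "TX"),
                  amount = c(10, 20, 30, 40, 50),
                  stringsAsFactors = FALSE)

# One TRUE/FALSE per row, regardless of the length of the right-hand side
keep <- toy$State %in% c("AL", "MO", "TX")
keep
# [1] FALSE  TRUE  TRUE FALSE  TRUE

# Selects the same rows as subset(toy, State %in% c("AL", "MO", "TX"))
toy[keep, ]
```

Rows come back in their original order in the data, not in the order of the lookup vector.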
Or using data.table, we convert the 'data.frame' to 'data.table' (setDT(taxes)), set the key column as 'State' and extract the rows that have 'MO', 'TX', 'AL' in 'State'.
library(data.table)
setDT(taxes, key='State')[c('MO', 'TX', 'AL')]
# State amount
#1: MO 14143
#2: TX 11517
#3: AL 14465
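If data.table is not an option, a similar "rows in the order I asked for" lookup can be sketched in base R with match(), which returns the position of the first match for each requested value. This is only an illustration on made-up data (`toy` and `wanted` are hypothetical names), and it assumes each state appears at most once:

```r
# Hypothetical toy data, for illustration only
toy <- data.frame(State = c("AK", "AL", "MO", "NY", "TX"),
                  amount = c(10, 20, 30, 40, 50),
                  stringsAsFactors = FALSE)

wanted <- c("MO", "TX", "AL")

# Position in toy$State of each requested value (NA if it is absent)
idx <- match(wanted, toy$State)
idx
# [1] 3 5 2

# Rows returned in the order of 'wanted'
toy[idx, ]
```

A value that is not present in 'State' gives NA in `idx` and therefore a row of NAs, so it may be worth dropping NAs first.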
To understand why your code didn't work, let's check the logical vector output.
with(taxes, State==c('AL', 'MO', 'TX'))
# [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
# [13] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
# [25] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
# [37] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
# [49] FALSE FALSE
# Warning messages:
# 1: In is.na(e1) | is.na(e2) :
#   longer object length is not a multiple of shorter object length
None of the elements were TRUE for this example. The comparison relies on recycling: the shorter vector is repeated until it is as long as the longer one. So the first 3 elements of 'State' are compared with 'AL', 'MO', and 'TX', in that order
taxes$State[1:3] == c('AL', 'MO', 'TX')
#[1] FALSE FALSE FALSE
Here, the comparison is element-by-element between corresponding values of the two vectors, and as
taxes$State[1:3]
#[1] AK AL AR
does not match 'AL', 'MO', and 'TX' at the corresponding positions, it returns FALSE.
In the same way, the comparison continues along the whole length of the 'State' column, i.e. the next comparison is
taxes$State[4:6] == c('AL', 'MO', 'TX')
#[1] FALSE FALSE FALSE
Here also all are FALSE, as the corresponding 'State' elements were 'AZ', 'CA', and 'CO'. We get a warning at the end because the number of rows is not a multiple of 3
nrow(taxes)
#[1] 50
50%%3!=0
If the nrow of the dataset were 51, the warning would not appear, but as the comparison is still based on position, we may not get the result as intended.
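The silent case is the more dangerous one. A small sketch with a made-up vector `x` (hypothetical, not from the question) whose length is a multiple of 3 shows == giving no warning and still missing matches, while %in% finds them all:

```r
# Hypothetical vector of length 6, a multiple of 3, so no recycling warning
x <- c("AL", "MO", "TX", "TX", "AL", "MO")

# The right-hand side is recycled to AL MO TX AL MO TX and compared by position
x == c("AL", "MO", "TX")
# [1]  TRUE  TRUE  TRUE FALSE FALSE FALSE

# %in% asks "is each element of x anywhere in the set?", independent of position
x %in% c("AL", "MO", "TX")
# [1] TRUE TRUE TRUE TRUE TRUE TRUE
```

Every element of `x` is one of the three states, yet == reports only the first three as matches because the last three sit at the "wrong" positions.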
data
set.seed(24)
taxes <- data.frame(State=sample(state.abb),
amount=sample(400:20000, 50, replace=TRUE), stringsAsFactors=FALSE)