I want to subset a data.table repeatedly, changing the column index z each time, and at the same time filter rows by whether that column's value is in some vector (using %in%).

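library(data.table)
library(ggplot2)  # provides the diamonds dataset

dt <- setDT(copy(diamonds))
# convert every column to character
dt <- setDT(data.frame(lapply(dt, as.character), stringsAsFactors=FALSE))
z <- 4  # index of the column to filter on
subset_by <- unique(dt[,z])[1:2]
### obviously does not work
### dt1 <- dt[z %in% subset_by]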

I am looking for the most memory-efficient way to do this, and I am sure there is a way without using colnames, but I just cannot find it. I looked at a lot of posts, with this being the most relevant

Your dataset is no longer a data.table after converting to data.frame in the 2nd line. You need dt[z %in% subset_by,] - akrun
Sorry, fixed it. That's what I get when I try to create a reproducible example! - J. Doe.
After converting to data.table, the way to subset a column should be unique(dt[[z]])[1:2] - akrun
Thank you, but I am not looking to simply select the first 2 (or any number of) rows. I want to subset based on some values using the %in% operator - J. Doe.
After that you can use i1 <- dt[, .I[.SD[[1]] %in% subset_by],.SDcols = z] ; dt[i1] - akrun

1 Answer

If we are subsetting based on a column index or name, we can specify it in .SDcols, then use .I to get the row numbers where that column matches

i1 <- dt[, .I[.SD[[1]] %chin% subset_by], .SDcols = z]
dt[i1]
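
To see what this idiom does step by step, here is a minimal sketch on a small made-up table. The table toy, its columns, and the values in subset_by are assumptions for illustration and are not from the question. .SDcols = z restricts .SD to the z-th column, .SD[[1]] extracts that column as a vector, and .I supplies the row numbers where the condition holds.

```r
library(data.table)

# Toy data, made up for illustration; all columns are character
toy <- data.table(
  a = c("p", "q", "r", "s"),
  b = c("u", "v", "u", "w")
)
z <- 2                     # index of the column to filter on
subset_by <- c("u", "w")   # values to keep

# .SDcols = z restricts .SD to column z; .I holds the row numbers of toy
i1 <- toy[, .I[.SD[[1]] %chin% subset_by], .SDcols = z]
res <- toy[i1]

# A two-step alternative: build the logical filter outside toy[...] first
keep <- toy[[z]] %chin% subset_by
res2 <- toy[keep]
```

Both res and res2 should contain the same rows. %chin% is data.table's %in% for character vectors, so it applies here only because the columns were converted to character.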

Note that to extract a single column as a vector from a data.table/tbl_df/data_frame, use either [[ or $

subset_by <- unique(dt[[z]])[1:2]
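
If "recursively" in the question means repeating the filter for several column indices, the idea could be wrapped in a small function. The sketch below is only one possible way to do it: filter_by_index, col_idx, and the toy table are hypothetical names introduced here. The toy table deliberately has a column named z, as diamonds does, which is why the row filter is computed with [[ outside of dt[...]: inside the brackets, a bare z would refer to the column named z rather than to the index variable.

```r
library(data.table)

# Hypothetical helper: keep rows whose col_idx-th column has a value in `values`
filter_by_index <- function(dt, col_idx, values) {
  # computed outside dt[...] so col_idx cannot be mistaken for a column name
  rows <- which(dt[[col_idx]] %chin% values)
  dt[rows]
}

# Toy data, made up for illustration; note the column that is itself named z
toy <- data.table(
  x = c("a", "b", "a", "c"),
  y = c("k", "k", "m", "m"),
  z = c("1", "2", "3", "1")
)

# Apply the filter for each column index in turn,
# keeping the first two unique values of that column
results <- lapply(seq_along(toy), function(col_idx) {
  subset_by <- unique(toy[[col_idx]])[1:2]
  filter_by_index(toy, col_idx, subset_by)
})
```

results is a list with one filtered data.table per column index; the original toy table is left unmodified.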