2
votes

I was wondering how to subset a dataframe like the one below based on the values of one of its columns (such as ids). For a single value you can use the broadcast equals operator, as in df2. However, I cannot find an operator for subsetting based on a list such as ids, because the .in operator does not seem to work with dataframes. Is there another operator I could use?

  df = DataFrame(ids = [1, 1000, 10000, 100000,1,2,3,4], B = [1,2,3,4,123,6,2,7], D = ["N", "M", "I", "J","hi","CE", "M", "S"])
  df2= df[df[:pmid] .== 1000, :]
  ids = [2,3, 10000]
  df3= df[df[:pmid] .in ids,:]

As of right now df3 gives me a bounds error.

Also, I am running this on Julia 0.6.4.

1 Answer

3
votes

I guess there's a typo in your first line: ids= should be pmid=, since you're filtering by that name later.

As for df3, the correct syntax should be (I tried it on Julia 1.0.2):

df3= df[in.(df[:pmid], [ids]),:]

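Note the added [] around ids: it has to be a vector of vectors. Broadcasting would otherwise try to pair the elements of ids with the elements of the column one by one; wrapping ids in a one-element vector makes the whole list count as a single collection that each column value is checked against.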
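To see what the wrapping does, here is a minimal sketch on plain vectors, with no DataFrames dependency. It assumes Julia 0.7 or later, where Ref(ids) can be used instead of [ids] to shield a collection from broadcasting. The names mask, mask_vec and selected are only illustrative.

```julia
# Same values as the question's column and lookup list.
pmid = [1, 1000, 10000, 100000, 1, 2, 3, 4]
ids = [2, 3, 10000]

# Broadcast `in` over pmid while treating ids as one collection.
# Ref(ids) is an assumption here (Julia 0.7+), not part of the original answer.
mask = in.(pmid, Ref(ids))

# Wrapping in a one-element vector, as in the answer, gives the same mask.
mask_vec = in.(pmid, [ids])

# Keep only the elements whose mask entry is true.
selected = pmid[mask]   # [10000, 2, 3]
```

A Boolean mask like this is what goes in the row position when indexing the dataframe, as in df[mask, :].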

I'd also like to point you to the DataFramesMeta.jl package, which provides a much clearer syntax:

using DataFramesMeta
@where df (in.(:pmid, [ids]))

There was also quite an interesting discussion on discourse.julialang.org regarding syntax for filtering by list, including performance tips.