2
votes

I'm trying to subset a DataFrame in Julia as follows:

using DataFrames
df = DataFrame(a=[1,2,3], b=["x", "y", "z"])
df2 = df[df.a == 2, :]

I'd expect to get back just the second row, but instead I get an error:

ERROR: BoundsError: attempt to access "attempt to access a data frame with 3 rows at index false"

What does this error mean and how do I subset the DataFrame?

2
Why do you keep asking and answering your own questions? - Oscar Smith
To make it easier for other people to google errors from common mistakes and find answers more quickly. My understanding is that this is encouraged. - J. Blauvelt

2 Answers

2
votes

Just to mention other options: you can also use the filter function here:

julia> filter(row -> row.a == 2, df)
1×2 DataFrame
│ Row │ a     │ b      │
│     │ Int64 │ String │
├─────┼───────┼────────┤
│ 1   │ 2     │ y      │

or, using findall to get the positions of the matching rows:

julia> df[findall(==(2), df.a), :]
1×2 DataFrame
│ Row │ a     │ b      │
│     │ Int64 │ String │
├─────┼───────┼────────┤
│ 1   │ 2     │ y      │
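
When selecting rows by position, keep in mind that filter applied to a vector returns the matching values, whereas findall returns their positions. The two only coincide when a value happens to equal its own index. A minimal sketch with a plain vector (the values 10, 20, 30 are made up for illustration and are not from the question):

```julia
a = [10, 20, 30]

# filter returns the matching values themselves.
vals = filter(==(20), a)

# findall returns the positions of the matching elements.
idx = findall(==(20), a)

# Only the positions are meaningful as indices into `a` (or as row indices).
picked = a[idx]

println(vals)    # [20]
println(idx)     # [2]
println(picked)  # [20]
```

Using vals as an index here would ask for element 20 of a 3-element vector and throw a BoundsError.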
1
votes

Fortunately, you only need to add one character: a dot (.). The dot enables broadcasting for any Julia function or operator, including ==. Your code would therefore be:

using DataFrames
df = DataFrame(a=[1,2,3], b=["x", "y", "z"])
df2 = df[df.a .== 2, :]  # .== compares element by element, giving one Bool per row

Without the broadcast, df.a == 2 returns false because it compares the Vector [1,2,3], as a whole, to the scalar 2. A 3-element Vector is never equal to a scalar, so the expression evaluates to a single false rather than to one result per element.

The error tells you that you're trying to index the DataFrame with false, which is not a valid row index for a DataFrame with 3 rows. Broadcasting with . instead produces a 3-element Boolean vector (a BitVector) with one entry per row, which is a valid way to index the rows of a DataFrame with 3 rows.
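
The difference between the two comparisons can be checked without DataFrames at all. The following is a small sketch using a plain vector in place of the column df.a (the variable names are illustrative only):

```julia
a = [1, 2, 3]

# Plain == compares the whole vector to the scalar and yields a single Bool.
whole = (a == 2)

# Dotted .== compares element by element and yields a 3-element BitVector.
mask = (a .== 2)

# A Boolean mask of matching length is a valid index; it keeps the `true` positions.
picked = a[mask]

println(whole)   # false
println(picked)  # [2]
```

The same mask, passed as the row index of the DataFrame, keeps only the rows where it is true.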

For more on broadcasting, see the section on dot syntax for vectorizing functions in the official Julia manual.