The pandas cut() documentation states that: "Out of bounds values will be NA in the resulting Categorical object." This makes things difficult when the upper bound is not necessarily clear or important. For example:
cut(weight, bins=[10, 50, 100, 200])
Will produce the bins:
[(10, 50] < (50, 100] < (100, 200]]
So cut(250, bins=[10, 50, 100, 200]) will produce a NaN, as will cut(5, bins=[10, 50, 100, 200]). What I'm trying to do is produce something like > 200 for the first example and < 10 for the second.
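To make the behavior concrete, here is a minimal sketch of what I am seeing. The weight values are made up for illustration; only the bins come from the example above.

```python
import pandas as pd

# Hypothetical sample data: one value below the lowest edge (5)
# and one above the highest edge (250).
weight = pd.Series([5, 30, 75, 150, 250])

binned = pd.cut(weight, bins=[10, 50, 100, 200])

# The first and last entries fall outside every bin, so they come back as NaN.
print(binned)
print(binned.isna().tolist())
```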
I realize I could do cut(weight, bins=[-float("inf"), 10, 50, 100, 200, float("inf")]) or the equivalent, but the report style I am following doesn't allow things like (200, inf]. I realize too that I could specify custom labels via the labels parameter on cut(), but that means remembering to adjust them every time I adjust bins, which could be often.
Have I exhausted all the possibilities, or is there something in cut() or elsewhere in pandas that would help me do this? I'm thinking about writing a wrapper function for cut() that would automatically generate the labels in the desired format from the bins, but I wanted to check here first.
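For reference, a rough sketch of the kind of wrapper I have in mind. The name cut_open_ended and the exact label format are my own assumptions, not anything provided by pandas; it just pads the bins with infinite edges and builds the labels from the bins so the two cannot drift apart.

```python
import numpy as np
import pandas as pd

def cut_open_ended(values, bins, **kwargs):
    """Hypothetical wrapper around pd.cut with open-ended outer bins.

    Labels are derived from `bins`, so changing the bins updates the
    labels automatically.
    """
    bins = list(bins)
    # Pad with infinite edges so nothing falls out of bounds.
    edges = [-np.inf] + bins + [np.inf]
    labels = (
        [f"< {bins[0]}"]
        + [f"({lo}, {hi}]" for lo, hi in zip(bins[:-1], bins[1:])]
        + [f"> {bins[-1]}"]
    )
    # Caveat: with the default right-closed intervals, a value exactly equal
    # to the lowest edge (10 here) lands in the "< 10" bin.
    return pd.cut(values, bins=edges, labels=labels, **kwargs)

result = cut_open_ended(pd.Series([5, 30, 250]), [10, 50, 100, 200])
print(result.tolist())  # ['< 10', '(10, 50]', '> 200']
```

Extra keyword arguments are passed straight through to pd.cut, though passing labels or bins again would conflict with the ones built here.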
the_data.max()+1 or something, but I think you'll have to set the label manually if you want that specific format. – BrenBarn