How can I reduce the number of buckets created by a Crossfilter group function?

Question

By default, when creating a Crossfilter group on a Crossfilter dimension, the size of the group will be equal to the number of unique values in the dimension. For instance, if I do this:

var array = [1,1,1,2,2,2,3,3,4,5,5,6,6,7];
var ndx = crossfilter(array);
var dimension = ndx.dimension(function(d) { return d; });
var group = dimension.group();
// group.size() will equal 7: one bucket per unique value in the dimension

This is useful for creating a histogram and showing the distribution of a dimension.

However, with hundreds of unique values, that group becomes impractical for a histogram: the bars become too narrow for the available view frame, or too small for a viewer to discern. (Unlike the Crossfilter examples, I'm using rectangles instead of paths to have better control over colors.)

I would like to decrease the number of buckets the group creates by folding adjacent buckets of unique values into one another.

For instance, if I had a group with 300 unique-value buckets, I would like to be able to reduce that number to 20 (let's assume an even split for now), where the first 15 values of the original 300 are folded into one bucket, the next 15 into another, and so on until only 20 buckets are created from the original 300.
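The folding I have in mind can be sketched in plain JavaScript as follows. This is only an illustration under assumptions: `makeRankBucketer` is a hypothetical helper name, the sample values are made up, and nothing here touches Crossfilter.

```javascript
// Hypothetical helper: maps each unique value to one of `bucketCount`
// buckets according to its rank among the sorted unique values.
function makeRankBucketer(values, bucketCount) {
  // Sorted list of unique values.
  var unique = values.slice()
    .sort(function (a, b) { return a - b; })
    .filter(function (v, i, arr) { return i === 0 || v !== arr[i - 1]; });
  var perBucket = Math.ceil(unique.length / bucketCount);
  // Look-up table from value to rank.
  var rank = {};
  unique.forEach(function (v, i) { rank[v] = i; });
  return function (v) { return Math.floor(rank[v] / perBucket); };
}

// Made-up data: 300 unique values folded into 20 buckets of 15 values each.
var values = [];
for (var n = 0; n < 300; n++) values.push(n * 2);
var bucketOf = makeRankBucketer(values, 20);

var firstBucket = bucketOf(0);     // rank 0   → bucket 0
var lastOfFirst = bucketOf(28);    // rank 14  → bucket 0
var startOfSecond = bucketOf(30);  // rank 15  → bucket 1
var lastBucket = bucketOf(598);    // rank 299 → bucket 19
```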

I could do this easily enough with plain JavaScript, but I need to keep the representation tied to the Crossfilter object. Is there a way to do this with a Crossfilter method?

Tom Tom · Accepted Answer · 2014-06-22T11:51:11

I use something along these lines:

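var array = [1,1,1,2,2,2,3,3,4,5,5,6,6,7];
var ndx = crossfilter(array);
// Quantize scale (d3 v3 API; d3.scaleQuantize in v4 and later):
// splits the domain [0, 10] into three equal intervals mapped to 1, 2, 3.
var scale = d3.scale.quantize().domain([0, 10]).range(d3.range(1, 4));
// The dimension's value for each record is its bucket key.
var dimension = ndx.dimension(function(d) { return scale(d); });
// or, more concisely: var dimension = ndx.dimension(scale);
var group = dimension.group();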

This creates a scale function that maps the continuous domain [0, 10] onto the discrete range [1, 2, 3], "rounding" each input to the bucket it falls in. See quantize scale.
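To make the "rounding" concrete, here is a minimal plain-JavaScript sketch of what such a scale does. The `quantize` function below is a hypothetical stand-in written for illustration, not d3's actual implementation; it assumes an ascending two-element domain.

```javascript
// Hypothetical stand-in for a quantize scale; not d3's actual code.
function quantize(domain, range) {
  var lo = domain[0], hi = domain[1], n = range.length;
  return function (x) {
    // Position of x within the domain, scaled to a bucket index.
    var i = Math.floor((x - lo) / (hi - lo) * n);
    // Clamp so values at or beyond the ends land in the first/last bucket.
    return range[Math.max(0, Math.min(n - 1, i))];
  };
}

var sketchScale = quantize([0, 10], [1, 2, 3]);
var mapped = [0, 3, 4, 9, 10].map(sketchScale);
// mapped → [1, 1, 2, 3, 3]
```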

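Note that d3.range(1, 4) excludes its upper bound, so the range is [1, 2, 3]. The domain [0, 10] is split into three equal intervals, each mapped to one range value, so: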

scale(0)  // 1
scale(9)  // 3

The dimension is then created with this remapping function, and the group counts the records that fall into each bucket.

The result of group.all() is:

[{key: 1, value: 8}, {key: 2, value: 5}, {key: 3, value: 1}]
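These counts can be sanity-checked without Crossfilter or d3. The sketch below is a hand-rolled tally, assuming equal-width buckets over [0, 10]; `bucketKey` is a hypothetical helper, not part of either library.

```javascript
var data = [1,1,1,2,2,2,3,3,4,5,5,6,6,7];
var bucketCount = 3, lo = 0, hi = 10;

// Hypothetical key function: equal-width buckets over [lo, hi], keys 1..bucketCount.
function bucketKey(v) {
  var i = Math.floor((v - lo) / (hi - lo) * bucketCount);
  return Math.max(0, Math.min(bucketCount - 1, i)) + 1;
}

// Tally records per key, then emit {key, value} pairs sorted by key.
var counts = {};
data.forEach(function (v) {
  var k = bucketKey(v);
  counts[k] = (counts[k] || 0) + 1;
});
var tally = Object.keys(counts)
  .map(Number)
  .sort(function (a, b) { return a - b; })
  .map(function (k) { return { key: k, value: counts[k] }; });
// tally → [{key: 1, value: 8}, {key: 2, value: 5}, {key: 3, value: 1}]
```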

You will probably need to convert the keys back into the original domain to use them for ticks or labels on your histogram; scale.invertExtent does that:

scale.invertExtent(1)  // [0, 3.33..]
scale.invertExtent(2)  // [3.33.., 6.66..]
scale.invertExtent(3)  // [6.66.., 10]
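As a rough illustration of how those extents could become histogram labels, here is a plain-JavaScript sketch. `bucketExtent` is a hypothetical stand-in for scale.invertExtent on an equal-width scale with keys 1..n, and the label format is just one possible choice.

```javascript
// Hypothetical stand-in for scale.invertExtent on an equal-width quantize
// scale whose keys are 1..bucketCount; not d3's actual code.
function bucketExtent(domain, bucketCount, key) {
  var lo = domain[0], span = domain[1] - domain[0];
  return [lo + span * (key - 1) / bucketCount, lo + span * key / bucketCount];
}

// The group.all() result shown earlier in this answer.
var buckets = [{key: 1, value: 8}, {key: 2, value: 5}, {key: 3, value: 1}];

// One label per bucket: "<start> to <end>: <count>".
var labels = buckets.map(function (b) {
  var extent = bucketExtent([0, 10], 3, b.key);
  return extent[0].toFixed(2) + " to " + extent[1].toFixed(2) + ": " + b.value;
});
// labels → ["0.00 to 3.33: 8", "3.33 to 6.67: 5", "6.67 to 10.00: 1"]
```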
